Re: [p2pi] Refining the ALTO problem statement [Was: Re: discussing P2PI-related standardization in Dublin]

Laird Popkin <laird@pando.com> Fri, 13 June 2008 12:14 UTC

Date: Fri, 13 Jun 2008 08:14:53 -0400
From: Laird Popkin <laird@pando.com>
To: stefano previdi <sprevidi@cisco.com>
Message-ID: <565621476.393261213359293462.JavaMail.root@dkny.pando.com>
In-Reply-To: <1098693626.393241213358903592.JavaMail.root@dkny.pando.com>
MIME-Version: 1.0
X-Originating-IP: [71.187.207.81]
Cc: p2pi@ietf.org
Subject: Re: [p2pi] Refining the ALTO problem statement [Was: Re: discussing P2PI-related standardization in Dublin]

This is a great discussion.

One point I'll make related to the signaling protocol for querying the "oracle" is that while the "send in a list of peer IP addresses and receive a sorted list back" approach is appealing in its simplicity, it has some serious limitations that would need to be taken into account. 

It's important to keep in mind that optimizing peer selection for p2p transfers has the most impact for large swarms. That is, if there are only 20 people on the planet that can serve you data, (1) it doesn't take long to connect to all of them, so optimized peer selection doesn't save much peer connection time, and (2) the odds of any of them being in the same ISP are fairly low, so optimized peer selection doesn't affect data flow much. Optimizing peer selection is therefore focused on swarms with very large numbers of peers. These are a small percentage of swarms, of course, but they are responsible for a disproportionately large volume of data. For example, when Yale surveyed "major tracker sites" they saw that a very small percentage of swarms had 100+ peers, but that those swarms were responsible for well over 50% of downloads. For perspective, at Pando we fairly often see swarms with thousands to tens of thousands of peers.

It is for these large swarms that the "list of IPs" approach runs into some pragmatic issues:
- The "oracle" must be sent the current complete set of IPs in each swarm. This means that the queries would be very large (e.g. a list of 1,000 IP addresses). This is not very cacheable, because swarm composition is highly dynamic. And it has to be a complete list, or the value of the guidance degrades dramatically. (Conceivably this could be implemented as 'delta' updates, but that raises oracle state issues, makes the software and protocol much more complex, etc.)
- The "oracle" would have to maintain state (or recompute on every query) for every swarm in every p2p network.
- The volume of queries would be very large (perhaps on the order of 100K queries per second, across all p2p networks operating within an ISP).
- The queries cannot be cached (because the first announce is the most important to optimize, and following queries need to include all new peers, and each query generates a response unique to the peer being provided the guidance).
- The ISP would need to scale the "oracle" capacity to satisfy this volume.
- It exposes the entire membership and data flow of the p2p network to the ISPs, which raises obvious legal and privacy issues. Imagine asking your lawyers whether they want to run The Pirate Bay's trackers.
- It makes every Tracker announce slow, expensive and unreliable instead of fast, cheap and reliable.
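
As a rough illustration of the scale implied by the figures above, the sketch below simply multiplies them out. The 4-bytes-per-address figure is an assumption (packed IPv4, no framing, no response traffic), and so is the idea that every query carries a full 1,000-address list, so treat the result as an order-of-magnitude estimate rather than a measurement. The function names are hypothetical.

```python
# Back-of-envelope load estimate for the "list of IPs" approach.
# Values marked "from the text" are quoted in the message above;
# everything else is an assumption made for illustration only.

PEERS_PER_QUERY = 1_000        # from the text: "a list of 1,000 IP addresses"
QUERIES_PER_SECOND = 100_000   # from the text: "on the order of 100K queries per second"
BYTES_PER_IPV4_ADDRESS = 4     # assumption: packed IPv4, no protocol overhead


def query_payload_bytes(peers: int, bytes_per_address: int) -> int:
    """Minimum size of one query that carries a full peer list."""
    return peers * bytes_per_address


def oracle_inbound_bytes_per_second(peers: int, bytes_per_address: int,
                                    queries_per_second: int) -> int:
    """Inbound payload the oracle must absorb if every query is full-size."""
    return query_payload_bytes(peers, bytes_per_address) * queries_per_second


payload = query_payload_bytes(PEERS_PER_QUERY, BYTES_PER_IPV4_ADDRESS)
inbound = oracle_inbound_bytes_per_second(
    PEERS_PER_QUERY, BYTES_PER_IPV4_ADDRESS, QUERIES_PER_SECOND)

print(f"payload per query: {payload} bytes")
print(f"oracle inbound:    {inbound / 1e6:.0f} MB/s")
```

Under these assumptions the oracle would receive about 4 KB per query and roughly 400 MB/s of address lists alone, before counting responses or any per-swarm state.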

This is why we shifted (in the P4P approach) away from the more obvious IP list approach and toward moving the guidance into the Tracker's memory. Admittedly that does require ISPs to expose some abstract guidance information (IP prefixes and percentages), but in return:
- The communication from the "oracle" to the p2p network is a single asynchronous update (e.g. as little as one message every hour) instead of one message per peer announce per swarm.
- The p2p network's Tracker can process peer requests in-memory, without having to make a blocking network call to an external "oracle". This is several orders of magnitude faster, more reliable, and far less resource intensive.
- There's minimal information exposed between the ISP and the P2P network (which is, from a legal and privacy angle, better for both P2P and ISP).
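
To make the in-memory idea concrete, here is a minimal sketch of a Tracker answering announces from a locally held guidance table. It is not the P4P implementation. The class and function names are hypothetical, and the meaning given to a "percentage" (the share of returned peers drawn from the requester's own prefix) is an assumption made for illustration.

```python
import ipaddress
import random


class GuidanceTable:
    """Hypothetical in-memory copy of ISP guidance: prefix -> percentage."""

    def __init__(self, entries):
        # entries maps a prefix string to the percentage (0-100) of returned
        # peers that should come from that same prefix (assumed semantics).
        self._entries = [(ipaddress.ip_network(p), pct)
                         for p, pct in entries.items()]
        # Most specific prefix first, so the longest match wins.
        self._entries.sort(key=lambda e: e[0].prefixlen, reverse=True)

    def lookup(self, ip):
        addr = ipaddress.ip_address(ip)
        for net, pct in self._entries:
            if addr in net:
                return net, pct
        return None, 0


def select_peers(table, requester_ip, swarm, want, rng=random):
    """Pick up to `want` peers for the requester without any network call."""
    candidates = [p for p in swarm if p != requester_ip]
    net, pct = table.lookup(requester_ip)
    if net is None:
        # No guidance covers this requester: fall back to random selection.
        return rng.sample(candidates, min(want, len(candidates)))
    local = [p for p in candidates if ipaddress.ip_address(p) in net]
    remote = [p for p in candidates if ipaddress.ip_address(p) not in net]
    n_local = min(len(local), want * pct // 100)
    n_remote = min(len(remote), want - n_local)
    return rng.sample(local, n_local) + rng.sample(remote, n_remote)


# Example: one prefix with 75% locality, and a 40-peer swarm.
table = GuidanceTable({"10.0.0.0/8": 75})
swarm = (["10.0.0.%d" % i for i in range(1, 21)]
         + ["192.0.2.%d" % i for i in range(1, 21)])
picked = select_peers(table, "10.0.0.1", swarm, 8, random.Random(0))
print(picked)
```

In this sketch the periodic ISP update would just mean building a new GuidanceTable and swapping it in; each announce is then served from memory, and the ISP never sees the swarm's membership.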

- Laird Popkin, CTO, Pando Networks
  mobile: 646/465-0570

----- Original Message -----
From: "stefano previdi" <sprevidi@cisco.com>
To: "Enrico Marocco" <enrico.marocco@telecomitalia.it>, p2pi@ietf.org
Sent: Friday, June 13, 2008 4:52:29 AM (GMT-0500) America/New_York
Subject: Re: [p2pi] Refining the ALTO problem statement [Was: Re: discussing P2PI-related standardization in Dublin]

Enrico,

I mostly agree with everything below. My suggestion (as a routing guy
more than an apps guy) is also to scope what could be extended/enhanced in
routing protocols in order to improve localization mechanisms (whatever they
are). 

As an example, we have extended the ISIS routing protocol in order to carry
some application-related information (at this stage it is an empty container),
and these kinds of enhancements may well have a use in localization/oracle
services.

s.


> From: Enrico Marocco <enrico.marocco@telecomitalia.it>
> 
> Peterson, Jon wrote:
>> The proposal for the ALTO BoF (which roughly covers (3)) is now in the
>> BoF tracker. You can read about it at its home page:
>> 
>> http://alto.tilab.com/
> [...]
>> we hope that discussions surrounding the BoF could elucidate the proper
>> division of labor and approach to this problem.
> 
> To follow up on Jon's thoughts and to reflect discussions that happened
> during the MIT workshop and on the mailing list, Vijay and I are going
> to update the problem statement draft the ALTO BoF proposal is based on
> (draft-marocco-alto-problem-statement-00).
> 
> As a prelude to focus the ALTO BoF, it would be great to start
> coalescing community agreement on ensuring that the scope of the problem
> is well defined and understood.
> 
> Essentially, the problem that ALTO will attempt to solve is this:
> applications (P2P and otherwise) that make intensive use of the network
> bandwidth or want to make decisions to optimize some function (see use
> cases) may be interested in choosing all or some of their peers based on
> recommendations provided by some network entity -- colloquially called
> an oracle -- that has more of a global view on the peers in the network,
> their topological distances, and their capabilities than the application
> itself does.
> 
> In thinking about the problem, it may help to consider the following:
> 
> + Use cases: current version of the draft identifies four use cases for
>   ALTO solutions (file-sharing, VoIP, p2p streaming and DHTs) but others
>   come to mind (e.g. cache/http-mirror selection, CDN).  At the MIT
>   workshop, Ted Hardie coined the term "unattended consequences" of
>   peer-to-peer nodes where traffic runs constantly for as long as the
>   resources to that node are popular.  Video feeds to remote stations
>   were identified as a category.
> 
> + Third-party services: even if topology information is directly
>   available to network operators, nothing should prevent third parties
>   from providing peer selection services as an alternative to, or in
>   competition with, ISPs.  An incomplete list of proposed solutions for providing
>   such services is online at http://alto.tilab.com/resources.html;
> 
> + Type(s) of information that are likely provided by the oracle (note
>   that a third-party oracle may not have access to some of this
>   information):
>   * Given a list of peer candidate IP addresses, sort them according to
>     preferences (topological proximity, bandwidth constraints, etc.) and
>     return to querying application;
>   * Ability for an application to determine its network location and
>     proximity to other nodes;
>   * Support for cost and charging information for application so that
>     applications can make appropriate trade-offs;
>   * Capabilities and characteristics of peers (i.e., last hop bandwidth,
>     etc.);
>   * Network policies;
>   * ...
> 
> + Type(s) of information that is provided to the oracle for rendering an
>   equitable decision:
>   * List of peer candidate IP addresses;
>   * Parameter to optimize (cost, delay, ...);
>   * ...
> 
> This is mainly what we have captured so far; however, feedback,
> suggestions and contributions are not only welcome, but at this time
> even required for successfully setting up a BoF.
> 
> Besides, the draft currently identifies two elements for which
> standardization work would be required, namely a discovery mechanism
> for locating oracles and a signaling protocol for querying them.  Do
> people think the core solution for ALTO should include anything else?
> 
> -- 
> Ciao,
> Enrico
> _______________________________________________
> p2pi mailing list
> p2pi@ietf.org
> https://www.ietf.org/mailman/listinfo/p2pi


_______________________________________________
p2pi mailing list
p2pi@ietf.org
https://www.ietf.org/mailman/listinfo/p2pi