Re: [p2pi] Refining the ALTO problem statement [Was: Re: discussing P2PI-related standardization in Dublin]
"Y. R. Yang" <yry@cs.yale.edu> Sun, 15 June 2008 05:23 UTC
Return-Path: <p2pi-bounces@ietf.org>
X-Original-To: p2pi-archive@ietf.org
Delivered-To: ietfarch-p2pi-archive@core3.amsl.com
Received: from [127.0.0.1] (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 9B5E23A684E; Sat, 14 Jun 2008 22:23:51 -0700 (PDT)
X-Original-To: p2pi@core3.amsl.com
Delivered-To: p2pi@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id EE3793A681E for <p2pi@core3.amsl.com>; Sat, 14 Jun 2008 22:23:49 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.284
X-Spam-Level:
X-Spam-Status: No, score=-2.284 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, SARE_MILLIONSOF=0.315]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xLdT8QAenQDE for <p2pi@core3.amsl.com>; Sat, 14 Jun 2008 22:23:47 -0700 (PDT)
Received: from netra.cs.yale.edu (netra.cs.yale.edu [128.36.229.21]) by core3.amsl.com (Postfix) with ESMTP id A74813A680F for <p2pi@ietf.org>; Sat, 14 Jun 2008 22:23:46 -0700 (PDT)
Received: from cyndra.cs.yale.edu (cyndra.cs.yale.edu [128.36.229.87]) by netra.cs.yale.edu (Postfix) with ESMTP id A517E409483; Sun, 15 Jun 2008 01:24:23 -0400 (EDT)
Date: Sun, 15 Jun 2008 01:24:23 -0400
From: "Y. R. Yang" <yry@cs.yale.edu>
To: p2pi@ietf.org
Message-ID: <Pine.LNX.4.64.0806150122570.425@cyndra.cs.yale.edu>
MIME-Version: 1.0
Subject: Re: [p2pi] Refining the ALTO problem statement [Was: Re: discussing P2PI-related standardization in Dublin]
X-BeenThere: p2pi@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: P2P Infrastructure Discussion <p2pi.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/p2pi>, <mailto:p2pi-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/pipermail/p2pi>
List-Post: <mailto:p2pi@ietf.org>
List-Help: <mailto:p2pi-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/p2pi>, <mailto:p2pi-request@ietf.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: p2pi-bounces@ietf.org
Errors-To: p2pi-bounces@ietf.org
Please see one pointer below. On Fri, 13 Jun 2008, 4:19pm -0400, Laird Popkin wrote: > Comments below. > > On Jun 13, 2008, at 9:01 AM, stefano previdi wrote: > > > Hi Laird, > > > > I understand your concern and let me try to answer from a routing > > implementation perspective. My thoughts are as follows: > > > > - the options of having a list of IP addresses to rank may be faced to > > some limitations but I probably have different view. I believe one > > alternative is to have a ranking of _pair_ of IP addresses (I > > believe there's a proposal made already on this direction). It > > doesn't address scalability per se but it allows other form of > > oracle/proximity capability. > > > > - From a routing algorithm perspective (and its implementation). The > > localization of 10 Vs. 1000 addresses may not incur an exponential > > complexity. I fact, routing algorithms/implementations are faced to > > this problem for years now... ;-) In the last years a lot of work > > has been done in the routing layer in order to improve performance > > and to minimize the impact of routing computations. We are now > > leveraging all this work in order to address other type of routing > > paradigms (using the same input). > > > > - State can be kept even if it results from different computations. > > This allows an "oracle/proximity" provider to minimize the amount of > > computation and to be able do deliver results on-the-fly. > > I'm not concerned about the computational cost of figuring out which > peers to connect to which - we do that now in our Trackers, and with > enough pre-computation it's remarkably fast, though of course not quite > as fast as picking random numbers from a list. But we don't mind a > little CPU overhead if it generates performance and efficiency. :-) > > The main concern that I have is that I can't take an extremely high > volume, fast, in-memory operation and turn it into a blocking external > message, because that makes my system orders of magnitude slower and > more fragile. > > The issue isn't the algorithm, it's where we draw the "line" through the > system where data crosses the wire. If the high-volume interactions > (mapping IP's to network locations, finding nearby locations, and > returning a sorted list of nearby IP's) can be done in memory, and only > periodic map updates are done over the wire, that's more efficient then > if the high-volume interactions all have to cross the wire. > > Now perhaps we can find a way to have "sorting lists of IP's" work > really well. As far as the ALTO BOF goes, I just don't want us to assume > that approach, because I tried it last year and hit a wall, and I don't > want us to collectively hit that wall without at least knowing that > there's a wall in the vicinity. Laird pointed out exactly the reason that we moved the computation from ISPs to applications. For example, in our original P4P overview article (http://www.dcia.info/documents/P4P_Overview.pdf section 3.1), the intelligtence was at the ISPs. In the current version (http://cs-www.cs.yale.edu/homes/yry/projects/p4p/p4p-sigcomm08.pdf) to be more scalable, extensible and application agonostic (see the beginning of Section 4 for a list of requirements), we moved application decisions to applications, where they should be. I can envision that both are possible but we prefer the current approach after our implementation effort last year. Sorting a list of IP addresses is a special case allowed by P4P but has its problems (please see ISP Use Cases and P2P Use Cases at page 4). Richard > > - There are apps models that have some form of hierarchy (e.g: > > trackers) and some others not. Should we have different models in > > the oracle too, or should we try to be as much as possible > > transparent/agnostic ? I'd vote for the latter. > > This is a very good question. In principle I agree that a single, more > 'generic' solution would be better. But the communication patterns for > tracker-based and trackerless p2p networks are quite different, and the > scaling requirements are so extreme, that it's not clear to me what a > single approach can work well for both. But that's what the ALTO > discussions will help explore. > > For a tracker-based system, my instinct is that a "map-based" approach > is more efficient, because there are a small number of Trackers and they > have plenty of resources to keep a map in memory. For a trackerless > system, my instinct is that a "sort IP list" approach might be more > appropriate, since each peer only knows about a small piece of the swarm > (so the lists aren't too big) and it seems inefficient to send a complex > "map" to millions of peers. > > > - I can't speak much about the legal/psychological issue (disclosing > > p2p addresses to SP) as I come from bottom layers, but what I can > > say is that, technically, the SP doesn't need any form of oracle in > > order to understand who among his customers is a p2p addicted... ;-) > > I am sure that's true - uplink utilization probably correlates pretty > well with p2p application usage. But if the p2p network sent IP lists > for every swarm to the oracle, that means that the ISP would know > exactly who was in each swarm for each unique piece of content in the > swarm, and which p2p network that piece of content was in, and (to some > degree) what the content is. This is broadcasting a level of detail > around user behavior that is, to be frank, a little unsettling. > > > - In one point we're in violent agreement: This is a great discussion. > > Yep. I am enjoying this quite a bit. > > > > Thanks. > > > > s. > > > > > >> From: Laird Popkin <laird@pando.com> > >> > >> This is a great discussion. > >> > >> One point I'll make related to the signaling protocol for querying > >> the > >> "oracle" is that while the "send in a list of peer IP addresses and > >> receive a > >> sorted list back" approach is appealing in its simplicity, it has > >> some serious > >> limitations that would need to be taken into account. > >> > >> It's important to keep in mind that optimizing peer selection for p2p > >> transfers has the most impact for large swarms. That is, if there > >> are only 20 > >> people on the planet that can serve you data, (1) it doesn't take > >> too long to > >> connect to all of them so optimized peer selection doesn't save > >> much peer > >> connection time, and (2) the odds of any of them being in the same > >> ISP is > >> fairly low, so optimized peer selection doesn't affect data flow > >> much. > >> Optimizing peer selection is focused on swarms with very large > >> numbers of > >> peers. For example, we often see swarms with thousands or tens of > >> thousands of > >> peers. These are a small percentage of swarms, of course, but they > >> are > >> responsible for a disproportionately large volume of data. For > >> example, when > >> Yale surveyed "major tracker sites" they saw that a very small > >> percentage of > >> swarms had 100+ peers, but that those swarms were responsible for > >> well over > >> 50% of downloads. For perspective, at Pando, we see swarms with > >> thousands to > >> tens of thousands of peers fairly often. > >> > >> It is for these large sarms where the "list of IPs" approach runs > >> into some > >> pragmatic issues: > >> - The "oracle" must be sent the current complete set of IPs in each > >> swarm. > >> This means that the queries would be very large (e.g. a list of > >> 1,000 IP > >> addresses). This is not very cacheable, because swarm composition > >> is highly > >> dynamic. And it has to be a complete list, or the value of the > >> guidance > >> degrades dramatically. (Conceivable this could be implemented as > >> 'delta' > >> updates, but that raises oracle state issues, makes the software > >> and protocol > >> much more complex, etc.). > >> > >> - The "oracle" would have to maintain state (or recompute on every > >> query) for > >> every swarm in every p2p network. > >> - The volume of queries would be very large (perhaps on the order > >> of 100K > >> queries per second, across all p2p networks operating within an ISP). > >> - The queries cannot be cached (because the first announce is the > >> most > >> important to optimize, and following queries need to include all > >> new peers, > >> and each query generates a response unique to the peer being > >> provided the > >> guidance). > >> - The ISP would need to scale the "oracle" capacity to satisfy this > >> volume. > >> - It exposes the entire membership and data flow of the p2p network > >> to the > >> ISPs, which raises obvious legal and privacy issues. Imagine asking > >> your > >> lawyers whether they want to run Pirate's Bay's Trackers. > >> - It makes every Tracker announce slow, expensive and unreliable > >> instead of > >> fast, cheap and reliable. > >> > >> This is why we shifted (in the P4P approach) away from the more > >> obvious IP > >> list approach to moving the guidance into the Tracker's memory. > >> Admittedly > >> that does require ISPs to expose some abstract guidance information > >> (IP > >> prefixes and percentages), but in return: > >> - The communication from the "oracle" to the p2p network is a single > >> asynchronous update (e.g. as little as one message every hour) > >> instead of one > >> message per peer announce per swarm. > >> - The p2p network's Tracker can process peer requests in-memory, > >> without > >> having to make a blocking network call to an external "oracle". > >> This is > >> several orders of magnitude faster, more reliable, and far less > >> resource > >> intensive. > >> - There's minimal information exposed between the ISP and the P2P > >> network > >> (which is, from a legal and privacy angle, better for both P2P and > >> ISP). > >> > >> - Laird Popkin, CTO, Pando Networks > >> mobile: 646/465-0570 > >> > >> ----- Original Message ----- > >> From: "stefano previdi" <sprevidi@cisco.com> > >> To: "Enrico Marocco" <enrico.marocco@telecomitalia.it>, p2pi@ietf.org > >> Sent: Friday, June 13, 2008 4:52:29 AM (GMT-0500) America/New_York > >> Subject: Re: [p2pi] Refining the ALTO problem statement [Was: Re: > >> discussing > >> P2PI-related standardization in Dublin] > >> > >> Enrico, > >> > >> I mostly agree mostly with everything below. My suggestion (as a > >> routing guy > >> more than an apps guy) is also to scope what could be extended/ > >> enhanced in > >> routing protocols in order to improve localization mechanisms > >> (whatever they > >> are). > >> > >> As an example, we have extended ISIS routing protocol in order to > >> carry some > >> application related information (at this stage is an empty > >> container) and > >> probably these kind of enhancements may have a use in localization/ > >> oracle > >> services. > >> > >> s. > >> > >> > >>> From: Enrico Marocco <enrico.marocco@telecomitalia.it> > >>> > >>> Peterson, Jon wrote: > >>>> The proposal for the ALTO BoF (which roughly covers (3)) is now > >>>> in the > >>>> BoF tracker. You can read about it at its home page: > >>>> > >>>> http://alto.tilab.com/ > >>> [...] > >>>> we hope that discussions surrounding the BoF could elucidate the > >>>> proper > >>>> division of labor and approach to this problem. > >>> > >>> To follow up on Jon's thoughts and to reflect discussions that > >>> happened > >>> during the MIT workshop and on the mailing list, Vijay and I are > >>> going > >>> to update the problem statement draft the ALTO BoF proposal is > >>> based on > >>> (draft-marocco-alto-problem-statement-00). > >>> > >>> As a prelude to focus the ALTO BoF, it would be great to start > >>> coalescing community agreement on ensuring that the scope of the > >>> problem > >>> is well defined and understood. > >>> > >>> Essentially, the problem that ALTO will attempt to solve is this: > >>> applications (P2P and otherwise) that make intensive use of the > >>> network > >>> bandwidth or want to make decisions to optimize some function (see > >>> use > >>> cases) may be interested in choosing all or some of their peers > >>> based on > >>> recommendations provided by some network entity -- colloquially > >>> called > >>> an oracle -- that has more of a global view on the peers in the > >>> network, > >>> their topological distances, and their capabilities than the > >>> application > >>> itself does. > >>> > >>> In thinking about the problem, it may help to consider the > >>> following: > >>> > >>> + Use cases: current version of the draft identifies four use > >>> cases for > >>> ALTO solutions (file-sharing, VoIP, p2p streaming and DHTs) but > >>> others > >>> come to mind (e.g. cache/http-mirror selection, CDN). At the MIT > >>> workshop, Ted Hardie coined the term "unattended consequences" of > >>> peer-to-peer nodes where traffic runs constantly for as long as the > >>> resources to that node are popular. Video feeds to remote stations > >>> were identified as a category. > >>> > >>> + Third-party services: even if topology information is directly > >>> available to network operators, nothing should prevent third- > >>> parties > >>> to provide peer selection services alternatively to or in > >>> competition > >>> with ISPs. An incomplete list of proposed solutions for providing > >>> such services is online at http://alto.tilab.com/resources.html; > >>> > >>> + Type(s) of information that are likely provided by the oracle > >>> (note > >>> that a third-party oracle may not have access to some of this > >>> information): > >>> * Given a list of peer candidate IP addresses, sort them > >>> according to > >>> preferences (topological proximity, bandwidth constraints, > >>> etc.) and > >>> return to querying application; > >>> * Ability for an application to determine its network location and > >>> proximity to other nodes; > >>> * Support for cost and charging information for application so that > >>> applications can make appropriate trade-offs; > >>> * Capabilities and characteristics of peers (i.e., last hop > >>> bandwidth, > >>> etc.); > >>> * Network policies; > >>> * ... > >>> > >>> + Type(s) of information that is provided to the oracle for > >>> rendering an > >>> equitable decision: > >>> * List of peer candidate IP addresses; > >>> * Parameter to optimize (cost, delay, ...); > >>> * ... > >>> > >>> This is mainly what we have captured so far; however, feedback, > >>> suggestions and contributions are not only welcome, but at this time > >>> even required for the successful setting of a BOF. > >>> > >>> Besides, the draft currently identifies two elements for which > >>> standardization work would be required, namely a discovery > >>> mechanisms > >>> for locating oracles and a signaling protocol for querying them. Do > >>> people think the core solution for ALTO should include anything > >>> else? > >>> > >>> -- > >>> Ciao, > >>> Enrico > >>> _______________________________________________ > >>> p2pi mailing list > >>> p2pi@ietf.org > >>> https://www.ietf.org/mailman/listinfo/p2pi > >> > >> > >> _______________________________________________ > >> p2pi mailing list > >> p2pi@ietf.org > >> https://www.ietf.org/mailman/listinfo/p2pi > > > > > > _______________________________________________ > p2pi mailing list > p2pi@ietf.org > https://www.ietf.org/mailman/listinfo/p2pi > _______________________________________________ p2pi mailing list p2pi@ietf.org https://www.ietf.org/mailman/listinfo/p2pi
- Re: [p2pi] Refining the ALTO problem statement [W… Enrico Marocco
- Re: [p2pi] Refining the ALTO problem statement [W… Laird Popkin
- Re: [p2pi] Refining the ALTO problem statement [W… stefano previdi
- Re: [p2pi] Refining the ALTO problem statement [W… Stephane Bortzmeyer
- Re: [p2pi] Refining the ALTO problem statement [W… Vijay K. Gurbani
- Re: [p2pi] Refining the ALTO problem statement [W… Laird Popkin
- Re: [p2pi] Refining the ALTO problem statement [W… Stephane Bortzmeyer
- Re: [p2pi] Refining the ALTO problem statement [W… Stephane Bortzmeyer
- Re: [p2pi] Refining the ALTO problem statement [W… Vijay K. Gurbani
- Re: [p2pi] Refining the ALTO problem statement [W… Vijay K. Gurbani
- Re: [p2pi] Refining the ALTO problem statement [W… Laird Popkin
- Re: [p2pi] Refining the ALTO problem statement [W… Vijay K. Gurbani
- Re: [p2pi] Refining the ALTO problem statement [W… Enrico Marocco
- Re: [p2pi] Refining the ALTO problem statement [W… Enrico Marocco
- Re: [p2pi] Refining the ALTO problem statement [W… Laird Popkin
- Re: [p2pi] Refining the ALTO problem statement [W… Y. R. Yang
- Re: [p2pi] Refining the ALTO problem statement [W… Stephane Bortzmeyer
- Re: [p2pi] Refining the ALTO problem statement [W… stefano previdi