Re: [p2pi] Refining the ALTO problem statement [Was: Re: discussing P2PI-related standardization in Dublin]

"Y. R. Yang" <yry@cs.yale.edu> Sun, 15 June 2008 05:23 UTC

Return-Path: <p2pi-bounces@ietf.org>
X-Original-To: p2pi-archive@ietf.org
Delivered-To: ietfarch-p2pi-archive@core3.amsl.com
Received: from [127.0.0.1] (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 9B5E23A684E; Sat, 14 Jun 2008 22:23:51 -0700 (PDT)
X-Original-To: p2pi@core3.amsl.com
Delivered-To: p2pi@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id EE3793A681E for <p2pi@core3.amsl.com>; Sat, 14 Jun 2008 22:23:49 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.284
X-Spam-Level:
X-Spam-Status: No, score=-2.284 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, SARE_MILLIONSOF=0.315]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xLdT8QAenQDE for <p2pi@core3.amsl.com>; Sat, 14 Jun 2008 22:23:47 -0700 (PDT)
Received: from netra.cs.yale.edu (netra.cs.yale.edu [128.36.229.21]) by core3.amsl.com (Postfix) with ESMTP id A74813A680F for <p2pi@ietf.org>; Sat, 14 Jun 2008 22:23:46 -0700 (PDT)
Received: from cyndra.cs.yale.edu (cyndra.cs.yale.edu [128.36.229.87]) by netra.cs.yale.edu (Postfix) with ESMTP id A517E409483; Sun, 15 Jun 2008 01:24:23 -0400 (EDT)
Date: Sun, 15 Jun 2008 01:24:23 -0400
From: "Y. R. Yang" <yry@cs.yale.edu>
To: p2pi@ietf.org
Message-ID: <Pine.LNX.4.64.0806150122570.425@cyndra.cs.yale.edu>
MIME-Version: 1.0
Subject: Re: [p2pi] Refining the ALTO problem statement [Was: Re: discussing P2PI-related standardization in Dublin]
X-BeenThere: p2pi@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: P2P Infrastructure Discussion <p2pi.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/p2pi>, <mailto:p2pi-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/pipermail/p2pi>
List-Post: <mailto:p2pi@ietf.org>
List-Help: <mailto:p2pi-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/p2pi>, <mailto:p2pi-request@ietf.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: p2pi-bounces@ietf.org
Errors-To: p2pi-bounces@ietf.org

Please see one pointer below.

On Fri, 13 Jun 2008, 4:19pm -0400, Laird Popkin wrote:

> Comments below.
> 
> On Jun 13, 2008, at 9:01 AM, stefano previdi wrote:
> 
> > Hi Laird,
> >
> > I understand your concern and let me try to answer from a routing
> > implementation perspective. My thoughts are as follows:
> >
> > - the options of having a list of IP addresses to rank may be faced to 
> >   some limitations but I probably have different view. I believe one 
> >   alternative is to have a ranking of _pair_ of IP addresses (I 
> >   believe there's a proposal made already on this direction). It 
> >   doesn't address scalability per se but it allows other form of 
> >   oracle/proximity capability.
> >
> > - From a routing algorithm perspective (and its implementation). The 
> >   localization of 10 Vs. 1000 addresses may not incur an exponential 
> >   complexity. I fact, routing algorithms/implementations are faced to 
> >   this problem for years now... ;-) In the last years a lot of work 
> >   has been done in the routing layer in order to improve performance 
> >   and to minimize the impact of routing computations. We are now 
> >   leveraging all this work in order to address other type of routing 
> >   paradigms (using the same input).
> >
> > - State can be kept even if it results from different computations.  
> >   This allows an "oracle/proximity" provider to minimize the amount of 
> >   computation and to be able do deliver results on-the-fly.
> 
> I'm not concerned about the computational cost of figuring out which 
> peers to connect to which - we do that now in our Trackers, and with 
> enough pre-computation it's remarkably fast, though of course not quite 
> as fast as picking random numbers from a list. But we don't mind a 
> little CPU overhead if it generates performance and efficiency. :-)
> 
> The main concern that I have is that I can't take an extremely high 
> volume, fast, in-memory operation and turn it into a blocking external 
> message, because that makes my system orders of magnitude slower and 
> more fragile.
> 
> The issue isn't the algorithm, it's where we draw the "line" through the 
> system where data crosses the wire. If the high-volume interactions 
> (mapping IP's to network locations, finding nearby locations, and 
> returning a sorted list of nearby IP's) can be done in memory, and only 
> periodic map updates are done over the wire, that's more efficient then 
> if the high-volume interactions all have to cross the wire.
> 
> Now perhaps we can find a way to have "sorting lists of IP's" work 
> really well. As far as the ALTO BOF goes, I just don't want us to assume 
> that approach, because I tried it last year and hit a wall, and I don't 
> want us to collectively hit that wall without at least knowing that 
> there's a wall in the vicinity.

Laird pointed out exactly the reason that we moved the computation from 
ISPs to applications. For example, in our original P4P overview article 
(http://www.dcia.info/documents/P4P_Overview.pdf section 3.1), the 
intelligtence was at the ISPs. In the current version 
(http://cs-www.cs.yale.edu/homes/yry/projects/p4p/p4p-sigcomm08.pdf) to 
be more scalable, extensible and application agonostic (see the beginning 
of Section 4 for a list of requirements), we moved application decisions 
to applications, where they should be. I can envision that both are 
possible but we prefer the current approach after our implementation 
effort last year. Sorting a list of IP addresses is a special case allowed 
by P4P but has its problems (please see ISP Use Cases and P2P Use Cases at 
page 4).

Richard

> > - There are apps models that have some form of hierarchy (e.g:  
> >   trackers) and some others not. Should we have different models in 
> >   the oracle too, or should we try to be as much as possible 
> >   transparent/agnostic ? I'd vote for the latter.
> 
> This is a very good question. In principle I agree that a single, more 
> 'generic' solution would be better. But the communication patterns for 
> tracker-based and trackerless p2p networks are quite different, and the 
> scaling requirements are so extreme, that it's not clear to me what a 
> single approach can work well for both. But that's what the ALTO 
> discussions will help explore.
> 
> For a tracker-based system, my instinct is that a "map-based" approach 
> is more efficient, because there are a small number of Trackers and they 
> have plenty of resources to keep a map in memory. For a trackerless 
> system, my instinct is that a "sort IP list" approach might be more 
> appropriate, since each peer only knows about a small piece of the swarm 
> (so the lists aren't too big) and it seems inefficient to send a complex 
> "map" to millions of peers.
> 
> > - I can't speak much about the legal/psychological issue (disclosing 
> >   p2p addresses to SP) as I come from bottom layers, but what I can 
> >   say is that, technically, the SP doesn't need any form of oracle in 
> >   order to understand who among his customers is a p2p addicted... ;-)
> 
> I am sure that's true - uplink utilization probably correlates pretty 
> well with p2p application usage. But if the p2p network sent IP lists 
> for every swarm to the oracle, that means that the ISP would know 
> exactly who was in each swarm for each unique piece of content in the 
> swarm, and which p2p network that piece of content was in, and (to some 
> degree) what the content is. This is broadcasting a level of detail 
> around user behavior that is, to be frank, a little unsettling.
> 
> > - In one point we're in violent agreement: This is a great discussion.
> 
> Yep. I am enjoying this quite a bit.
> 
> 
> > Thanks.
> >
> > s.
> >
> >
> >> From: Laird Popkin <laird@pando.com>
> >>
> >> This is a great discussion.
> >>
> >> One point I'll make related to the signaling protocol for querying  
> >> the
> >> "oracle" is that while the "send in a list of peer IP addresses and  
> >> receive a
> >> sorted list back" approach is appealing in its simplicity, it has  
> >> some serious
> >> limitations that would need to be taken into account.
> >>
> >> It's important to keep in mind that optimizing peer selection for p2p
> >> transfers has the most impact for large swarms. That is, if there  
> >> are only 20
> >> people on the planet that can serve you data, (1) it doesn't take  
> >> too long to
> >> connect to all of them so optimized peer selection doesn't save  
> >> much peer
> >> connection time, and (2) the odds of any of them being in the same  
> >> ISP is
> >> fairly low, so optimized peer selection doesn't affect data flow  
> >> much.
> >> Optimizing peer selection is focused on swarms with very large  
> >> numbers of
> >> peers. For example, we often see swarms with thousands or tens of  
> >> thousands of
> >> peers. These are a small percentage of swarms, of course, but they  
> >> are
> >> responsible for a disproportionately large volume of data. For  
> >> example, when
> >> Yale surveyed "major tracker sites" they saw that a very small  
> >> percentage of
> >> swarms had 100+ peers, but that those swarms were responsible for  
> >> well over
> >> 50% of downloads. For perspective, at Pando, we see swarms with  
> >> thousands to
> >> tens of thousands of peers fairly often.
> >>
> >> It is for these large sarms where the "list of IPs" approach runs  
> >> into some
> >> pragmatic issues:
> >> - The "oracle" must be sent the current complete set of IPs in each  
> >> swarm.
> >> This means that the queries would be very large (e.g. a list of  
> >> 1,000 IP
> >> addresses). This is not very cacheable, because swarm composition  
> >> is highly
> >> dynamic. And it has to be a complete list, or the value of the  
> >> guidance
> >> degrades dramatically. (Conceivable this could be implemented as  
> >> 'delta'
> >> updates, but that raises oracle state issues, makes the software  
> >> and protocol
> >> much more complex, etc.).
> >>
> >> - The "oracle" would have to maintain state (or recompute on every  
> >> query) for
> >> every swarm in every p2p network.
> >> - The volume of queries would be very large (perhaps on the order  
> >> of 100K
> >> queries per second, across all p2p networks operating within an ISP).
> >> - The queries cannot be cached (because the first announce is the  
> >> most
> >> important to optimize, and following queries need to include all  
> >> new peers,
> >> and each query generates a response unique to the peer being  
> >> provided the
> >> guidance).
> >> - The ISP would need to scale the "oracle" capacity to satisfy this  
> >> volume.
> >> - It exposes the entire membership and data flow of the p2p network  
> >> to the
> >> ISPs, which raises obvious legal and privacy issues. Imagine asking  
> >> your
> >> lawyers whether they want to run Pirate's Bay's Trackers.
> >> - It makes every Tracker announce slow, expensive and unreliable  
> >> instead of
> >> fast, cheap and reliable.
> >>
> >> This is why we shifted (in the P4P approach) away from the more  
> >> obvious IP
> >> list approach to moving the guidance into the Tracker's memory.  
> >> Admittedly
> >> that does require ISPs to expose some abstract guidance information  
> >> (IP
> >> prefixes and percentages), but in return:
> >> - The communication from the "oracle" to the p2p network is a single
> >> asynchronous update (e.g. as little as one message every hour)  
> >> instead of one
> >> message per peer announce per swarm.
> >> - The p2p network's Tracker can process peer requests in-memory,  
> >> without
> >> having to make a blocking network call to an external "oracle".  
> >> This is
> >> several orders of magnitude faster, more reliable, and far less  
> >> resource
> >> intensive.
> >> - There's minimal information exposed between the ISP and the P2P  
> >> network
> >> (which is, from a legal and privacy angle, better for both P2P and  
> >> ISP).
> >>
> >> - Laird Popkin, CTO, Pando Networks
> >>  mobile: 646/465-0570
> >>
> >> ----- Original Message -----
> >> From: "stefano previdi" <sprevidi@cisco.com>
> >> To: "Enrico Marocco" <enrico.marocco@telecomitalia.it>, p2pi@ietf.org
> >> Sent: Friday, June 13, 2008 4:52:29 AM (GMT-0500) America/New_York
> >> Subject: Re: [p2pi] Refining the ALTO problem statement [Was: Re:  
> >> discussing
> >> P2PI-related standardization in Dublin]
> >>
> >> Enrico,
> >>
> >> I mostly agree mostly with everything below. My suggestion (as a  
> >> routing guy
> >> more than an apps guy) is also to scope what could be extended/ 
> >> enhanced in
> >> routing protocols in order to improve localization mechanisms  
> >> (whatever they
> >> are).
> >>
> >> As an example, we have extended ISIS routing protocol in order to  
> >> carry some
> >> application related information (at this stage is an empty  
> >> container) and
> >> probably these kind of enhancements may have a use in localization/ 
> >> oracle
> >> services.
> >>
> >> s.
> >>
> >>
> >>> From: Enrico Marocco <enrico.marocco@telecomitalia.it>
> >>>
> >>> Peterson, Jon wrote:
> >>>> The proposal for the ALTO BoF (which roughly covers (3)) is now  
> >>>> in the
> >>>> BoF tracker. You can read about it at its home page:
> >>>>
> >>>> http://alto.tilab.com/
> >>> [...]
> >>>> we hope that discussions surrounding the BoF could elucidate the  
> >>>> proper
> >>>> division of labor and approach to this problem.
> >>>
> >>> To follow up on Jon's thoughts and to reflect discussions that  
> >>> happened
> >>> during the MIT workshop and on the mailing list, Vijay and I are  
> >>> going
> >>> to update the problem statement draft the ALTO BoF proposal is  
> >>> based on
> >>> (draft-marocco-alto-problem-statement-00).
> >>>
> >>> As a prelude to focus the ALTO BoF, it would be great to start
> >>> coalescing community agreement on ensuring that the scope of the  
> >>> problem
> >>> is well defined and understood.
> >>>
> >>> Essentially, the problem that ALTO will attempt to solve is this:
> >>> applications (P2P and otherwise) that make intensive use of the  
> >>> network
> >>> bandwidth or want to make decisions to optimize some function (see  
> >>> use
> >>> cases) may be interested in choosing all or some of their peers  
> >>> based on
> >>> recommendations provided by some network entity -- colloquially  
> >>> called
> >>> an oracle -- that has more of a global view on the peers in the  
> >>> network,
> >>> their topological distances, and their capabilities than the  
> >>> application
> >>> itself does.
> >>>
> >>> In thinking about the problem, it may help to consider the  
> >>> following:
> >>>
> >>> + Use cases: current version of the draft identifies four use  
> >>> cases for
> >>>  ALTO solutions (file-sharing, VoIP, p2p streaming and DHTs) but  
> >>> others
> >>>  come to mind (e.g. cache/http-mirror selection, CDN).  At the MIT
> >>>  workshop, Ted Hardie coined the term "unattended consequences" of
> >>>  peer-to-peer nodes where traffic runs constantly for as long as the
> >>>  resources to that node are popular.  Video feeds to remote stations
> >>>  were identified as a category.
> >>>
> >>> + Third-party services: even if topology information is directly
> >>>  available to network operators, nothing should prevent third- 
> >>> parties
> >>>  to provide peer selection services alternatively to or in  
> >>> competition
> >>>  with ISPs.  An incomplete list of proposed solutions for providing
> >>>  such services is online at http://alto.tilab.com/resources.html;
> >>>
> >>> + Type(s) of information that are likely provided by the oracle  
> >>> (note
> >>>  that a third-party oracle may not have access to some of this
> >>>  information):
> >>>  * Given a list of peer candidate IP addresses, sort them  
> >>> according to
> >>>    preferences (topological proximity, bandwidth constraints,  
> >>> etc.) and
> >>>    return to querying application;
> >>>  * Ability for an application to determine its network location and
> >>>    proximity to other nodes;
> >>>  * Support for cost and charging information for application so that
> >>>    applications can make appropriate trade-offs;
> >>>  * Capabilities and characteristics of peers (i.e., last hop  
> >>> bandwidth,
> >>>    etc.);
> >>>  * Network policies;
> >>>  * ...
> >>>
> >>> + Type(s) of information that is provided to the oracle for  
> >>> rendering an
> >>>  equitable decision:
> >>>  * List of peer candidate IP addresses;
> >>>  * Parameter to optimize (cost, delay, ...);
> >>>  * ...
> >>>
> >>> This is mainly what we have captured so far; however, feedback,
> >>> suggestions and contributions are not only welcome, but at this time
> >>> even required for the successful setting of a BOF.
> >>>
> >>> Besides, the draft currently identifies two elements for which
> >>> standardization work would be required, namely a discovery  
> >>> mechanisms
> >>> for locating oracles and a signaling protocol for querying them.  Do
> >>> people think the core solution for ALTO should include anything  
> >>> else?
> >>>
> >>> -- 
> >>> Ciao,
> >>> Enrico
> >>> _______________________________________________
> >>> p2pi mailing list
> >>> p2pi@ietf.org
> >>> https://www.ietf.org/mailman/listinfo/p2pi
> >>
> >>
> >> _______________________________________________
> >> p2pi mailing list
> >> p2pi@ietf.org
> >> https://www.ietf.org/mailman/listinfo/p2pi
> >
> >
> 
> _______________________________________________
> p2pi mailing list
> p2pi@ietf.org
> https://www.ietf.org/mailman/listinfo/p2pi
> 
_______________________________________________
p2pi mailing list
p2pi@ietf.org
https://www.ietf.org/mailman/listinfo/p2pi