Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted

Laird Popkin <laird@pando.com> Mon, 10 November 2008 03:40 UTC

Date: Sun, 9 Nov 2008 22:39:57 -0500 (EST)
From: Laird Popkin <laird@pando.com>
To: "Y. Richard Yang" <yry@cs.yale.edu>
Cc: p2pi@ietf.org
Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted

To explain a bit of the test methodology, downloaders were globally distributed, and every downloader was assigned randomly to one of the five swarms at the time that they started the download. The result of this is that while there were quite comparable numbers of downloaders in each swarm globally, the number of downloaders within a given ISP for each swarm has random variation. For example, one ISP might have a few percent more downloaders of the 'random' swarm while another ISP might have a few percent more downloaders of the 'P4P generic weight matrix' swarm. As with any statistical analysis of real world data, you should expect some random variation based on sample size. In particular, within Comcast there happened to be a few percent more 'random' downloaders than for the guided swarms.
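
As a back-of-the-envelope illustration of that variation (a sketch, not part of the trial analysis): if each downloader is independently assigned to one of five swarms with equal probability, the count for one swarm within an ISP follows a binomial distribution. The ISP sizes of 5000 and 500 below are hypothetical numbers chosen for illustration; per-ISP counts are not given here.

```python
import math


def relative_spread(n_downloaders: int, n_swarms: int = 5) -> float:
    """Relative standard deviation of one swarm's downloader count in an ISP.

    Assumes each downloader is assigned independently and uniformly at
    random to one of n_swarms swarms (binomial model).
    """
    if n_downloaders <= 0 or n_swarms <= 0:
        raise ValueError("counts must be positive")
    p = 1.0 / n_swarms
    mean = n_downloaders * p
    std = math.sqrt(n_downloaders * p * (1.0 - p))
    return std / mean


if __name__ == "__main__":
    # Hypothetical ISP populations, for illustration only.
    for n in (5000, 500):
        print(n, f"{relative_spread(n):.1%}")
```

Under this model an ISP with 5000 downloaders would see a typical swing of roughly 3% in any one swarm's count, and a smaller ISP proportionally more, which is consistent with "a few percent" differences arising by chance alone.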

Looking at the numbers of downloads and their completion rates, all of the swarms had completion rates that were extremely high, with the guided (P4P and PNA) having a higher completion rate (about 98%) and the unguided (random) downloads having a slightly lower completion rate (about 97%).

The standard technique for handling this kind of sampling variation is to look at percentages rather than absolute values. That is, instead of looking at the number of bytes downloaded for each swarm from external sources (for example), it's better to compare the percentage of downloaded data that came from external sources. It's not particularly meaningful that one swarm had a few percent more downloads within one ISP than another swarm, but comparing cancellation rates, or the ratio of internal to external download volumes, is rather illuminating.
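
A minimal sketch of that normalization, with made-up byte counts (these are not trial results, and the swarm names and the external_share helper are only illustrative): two swarms with different total volumes are compared by the share of data that came from outside the ISP.

```python
def external_share(internal_mb: float, external_mb: float) -> float:
    """Percentage of a swarm's downloaded data that came from external peers."""
    total = internal_mb + external_mb
    if total <= 0:
        raise ValueError("swarm has no downloaded data")
    return 100.0 * external_mb / total


# Hypothetical volumes in MB: (internal, external). Illustration only.
swarms = {
    "random": (100.0, 900.0),   # total 1000 MB
    "guided": (330.0, 770.0),   # total 1100 MB, i.e. a larger sample
}

if __name__ == "__main__":
    for name, (internal, external) in swarms.items():
        print(name, f"{external_share(internal, external):.1f}% external")
```

Here the guided swarm moved more bytes in total, so its absolute volumes are not directly comparable with the random swarm's, but the external share (70% versus 90% in this made-up example) is independent of how many downloaders happened to land in each swarm.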

- Laird Popkin, CTO, Pando Networks
  mobile: 646/465-0570

----- Original Message -----
From: "Y. Richard Yang" <yry@cs.yale.edu>
To: "Robb Topolski" <robb@funchords.com>
Cc: p2pi@ietf.org
Sent: Sunday, November 9, 2008 4:57:32 PM (GMT-0500) America/New_York
Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted

Hi Robb,

Thanks for the excellent comments. Please see below.

Robb Topolski wrote:
> It seems like a broken statistic given this experiment.
>
> If it were tightly controlled (and thus less Internet-like), all of
> the downloads would have completed and the download byte amounts would
> be virtually the same.
>
>   
Welcome to the real world :-) Large-scale and yet tightly controlled
experiments are hard to conduct (if you have an idea or are interested
in more discussion, that would be fantastic). We did quite a few
controlled experiments using clients on PlanetLab, but we could get
only a couple hundred users if we were lucky. Nor are tightly
controlled experiments necessarily desirable, because we would not know
what we were missing (your example of P2P streaming is an excellent
one). One objective of real experiments is unexpected discovery.
Sometimes the discoveries are pleasant, and sometimes they are not.
Download aborts were one unexpected discovery. But given the small
fraction of aborts, I think they do not change the reported results.

For full disclosure, another unexpected user behavior we discovered
during our log processing was that users sometimes paused their
download process (e.g. put the machines to sleep). We spent quite a lot
of time on this issue. Unfortunately the logs we had did not capture
such events. We were careful to use statistically more robust metrics
(e.g., the download rates at different percentiles instead of the
average). We tried to check the impact of such events. For example, we
also computed statistics after filtering to make sure our statistics
did not change much. One filter we applied was that if a user had no
activity for too long (we tried two minutes and some other numbers), we
flagged that the user might have paused, and we ignored that user's
data. We did not see a large change in the statistics we computed
before and after such filtering.
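
The filtering check described above might be sketched as follows. This is only an illustration under stated assumptions: the record layout (a per-user "rate" and a list of "activity" timestamps in seconds), the helper names, and the sample values are all hypothetical, and the nearest-rank percentile is one common definition, not necessarily the one used in the analysis.

```python
import math


def max_gap(timestamps):
    """Longest interval in seconds between consecutive activity timestamps."""
    ts = sorted(timestamps)
    return max((b - a for a, b in zip(ts, ts[1:])), default=0.0)


def drop_paused(users, threshold_s=120.0):
    """Keep only users whose longest inactivity gap is within the threshold."""
    return [u for u in users if max_gap(u["activity"]) <= threshold_s]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-user records (rate in KB/s, activity times in seconds).
users = [
    {"rate": 100.0, "activity": [0, 30, 60, 90]},
    {"rate": 10.0, "activity": [0, 30, 400, 430]},  # 370 s gap: likely paused
    {"rate": 80.0, "activity": [0, 60, 120]},
]

if __name__ == "__main__":
    kept = drop_paused(users, threshold_s=120.0)
    for label, group in (("all users", users), ("after filtering", kept)):
        rates = [u["rate"] for u in group]
        print(label, "median rate:", percentile(rates, 50))
```

Comparing the same percentile before and after filtering, and repeating with other thresholds, shows whether suspected pauses move the statistic; percentiles are less sensitive to a few extreme records than the mean.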

Of course, we learn from such experiences. In our current experiments,
we are designing a new log format to try to capture more events. We
might be surprised again and go back to log more. We will be happy to
share and work with others on robust experiment design and statistics
collection.
> One might argue that there is virtue to these observations because a
> downloader who cancels a slow download is a bad thing, because they
> ultimately did not get what they attempted to download.
>
> But is there virtue in an aborted download?  If they canceled the
> download in frustration or because they simply couldn't stay connected
> to the swarm long enough before being forced to quit by some other
> personal obligation, then one could say that an aborted download is
> a bad thing. OTOH, an incomplete download is a good thing when a
> downloader may have changed his mind (wanted to hear/see/do something
> else) with less bandwidth wasted. There's just no way to know why the
> user aborted the download.
>
>   
I agree it would be fantastic if we could know the reason for an
aborted download. One thought that came to mind immediately is a pop-up
window after a download abort asking the user for the reason(s). I do
not know of any P2P clients doing this right now, and I am not sure how
many users would respond (e.g. rather than clicking cancel). But it is
an excellent suggestion! We will certainly talk to the P2P developers
we are working with to see if we can add such an option in one of our
log plug-ins.
> Even if we did know, this experiment doesn't deal with other possible
> P4P-advantaged uses such as streaming-P2P video delivery (a mode more
> prone to user "taste-testing" before completing a download than
> traditional file-transfer models).
>
>   
Absolutely, good comment. This is why we are focusing on P2P streaming,
where channel hopping is a common user behavior
(http://ccr.sigcomm.org/online/?q=node/404), and startup delay is a
major performance metric.

Hope to hear more such good comments!

Richard
> Robb
>
> 2008/11/6 Ye WANG <wangye.thu@gmail.com>:
>   
>> Hi Haibin,
>>
>> Yes, the Random swarm has notably fewer finished downloads than Generic or
>> Coarse Grained during the period.
>>
>> Since the swarm sizes (# downloading peers) are roughly equal across all
>> five swarms (Richard explained this in detail), we suspect that a portion
>> of slow peers terminated/discarded their downloads.  This is almost the same
>> hypothesis pointed out by Rich Woundy.
>>
>> Another piece of evidence is that we do notice significantly slower peers in
>> the Random swarm, e.g., the slowest Random peer took 7268s (>2 hours) to
>> download the video, but the slowest Generic peer took 2725s (<1 hour) and the
>> slowest Coarse Grained peer took 3114s (<1 hour).  Hours of downloading may
>> make users impatient.  If the "tail" peers in the Random swarm suffer much
>> lower download rates, presumably the number of "terminated" peers may be
>> larger in the Random swarm.
>>
>> On Thu, Nov 6, 2008 at 4:09 AM, Song Haibin <melodysong@huawei.com> wrote:
>>     
>>> Hi Richard and all,
>>>
>>>       
>>>> The access download of each swarm should be equal to the sum of those
>>>> downloaded by the clients in each swarm. So if the number of downloads in
>>>> each swarm is the same and the amount downloaded is the same, then each
>>>> swarm should have the same access download.
>>>>         
>>> Song Haibin: From section 4.1, we can see that "The results of the trial
>>> indicated that P4P can improve the speed of downloads to P2P clients", so
>>> if the statistics are collected during a fixed period (from July 2 to
>>> July 17, 2008), then the amount downloaded by the P4P swarms will be
>>> greater than that of the random swarm. I don't think each swarm downloaded
>>> the same amount of chunk files during the statistics period.
>>>       
>>     
>>> Best Regards,
>>> Song Haibin
>>> Email: melodysong@huawei.com
>>> Skype: alexsonghw
>>>
>>>
>>>
>>>       
>>>> -----Original Message-----
>>>> From: p2pi-bounces@ietf.org [mailto:p2pi-bounces@ietf.org] On Behalf Of Y. R. Yang
>>>> Sent: Thursday, November 06, 2008 10:40 AM
>>>> To: Woundy, Richard
>>>> Cc: p2pi@ietf.org; Livingood, Jason
>>>> Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted
>>>>
>>>>
>>>> Hi Rich and others,
>>>>
>>>> The access download of each swarm should be equal to the sum of those
>>>> downloaded by the clients in each swarm. So if the number of downloads in
>>>> each swarm is the same and the amount downloaded is the same, then each
>>>> swarm should have the same access download.
>>>>
>>>> First look at the amount downloaded. There can be some differences due
>>>> to duplicated chunks and the way we detected data chunks (there are also
>>>> control data in the logs), but the difference appears to be small.
>>>>
>>>> Now let's look at the number of downloads. During the test, each client is
>>>> uniformly assigned to a swarm. Given the large number of clients, each
>>>> swarm should have about the same number of clients. But there can be two
>>>> factors for us to see different numbers of *reported* downloads: (1) some
>>>> clients are old and may not report or the reporting of logs was not
>>>> successful; and (2) different # of clients finished downloading (if a
>>>> client does not finish downloading, it does not report. Laird, please
>>>> correct me if I am wrong). I believe the first factor should be small due
>>>> to uniform random assignment of peers to swarms.
>>>>
>>>> I just looked at the data available. We did detect a smaller number of
>>>> finished downloads with Random than with the P4P swarms. For example, from
>>>> July 3 to July 10, the detected # of finished downloads of Generic is about
>>>> 5% more than Random, and Coarse is 7% more than Random. From July 10 to
>>>> July 17, Generic is 10.5% more than Random, and Coarse is 4.7% more.
>>>> Looking at the traffic volume in Section 4.2, I see that Generic is about
>>>> 7% higher, and Coarse is about 8.5% higher. Note that # of finished
>>>> downloads and volume are different due to duplicated chunks and missing
>>>> logs.
>>>>
>>>> So I would like to support the theory/guess of Rich that some users
>>>> terminated the download prematurely and faster downloads may result in
>>>> fewer such terminations. But it may also include factors in (1)
>>>> differences in initial assignment due to random numbers; and (2) # of
>>>> finished but non-reporting clients.
>>>>
>>>> If you have any other suggestions, we will be more than happy to look
>>>> into the available data more.
>>>>
>>>> Richard
>>>>
>>>> On Wed, 5 Nov 2008, 6:26pm -0500, Woundy, Richard wrote:
>>>>
>>>>         
>>>>> My current theory/guess is that some users may have terminated the
>>>>> download prematurely, e.g. due to user impatience. So faster downloads
>>>>> (e.g. thanks to P4P) may result in fewer user terminations.
>>>>>
>>>>>
>>>>>
>>>>> Laird is checking the data to see if we can confirm that, or find
>>>>> another explanation.
>>>>>
>>>>>
>>>>>
>>>>> -- Rich
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>>
>>>>> From: Laird Popkin [mailto:laird@pando.com]
>>>>> Sent: Wednesday, November 05, 2008 6:12 PM
>>>>> To: Robb Topolski
>>>>> Cc: Livingood, Jason; p2pi@ietf.org; Woundy, Richard
>>>>> Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted
>>>>>
>>>>>
>>>>>
>>>>> That's a good question, and Richard and I spoke about this yesterday.
>>>>> I'll be looking into the data to see what the cause is.
>>>>>
>>>>> - Laird Popkin, CTO, Pando Networks
>>>>>   mobile: 646/465-0570
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Robb Topolski" <robb@funchords.com>
>>>>> To: "Richard Woundy" <Richard_Woundy@cable.comcast.com>
>>>>> Cc: "Jason Livingood" <Jason_Livingood@cable.comcast.com>,
>>>>> p2pi@ietf.org
>>>>> Sent: Wednesday, November 5, 2008 5:42:51 PM (GMT-0500) America/New_York
>>>>> Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted
>>>>>
>>>>> I don't get the part where access network download consumption increased
>>>>> as a result of using P4P (section 4.2).  Can someone explain how that
>>>>> could happen?
>>>>>
>>>>> Robb
>>>>>
>>>>> On Wed, Nov 5, 2008 at 2:29 PM, Woundy, Richard
>>>>> <Richard_Woundy@cable.comcast.com> wrote:
>>>>>
>>>>> Reinaldo,
>>>>>
>>>>> I can answer the easy questions. We will need some assistance from Pando
>>>>> (and Yale) for some of the other ones.
>>>>>
>>>>>
>>>>>           
>>>>>> What was the file size in those experiments?
>>>>>>             
>>>>> 21 megabytes. From section 2: "Pando distributed a special 21 MB
>>>>> licensed video file as in order to measure the effectiveness of P4P
>>>>> iTrackers."
>>>>>
>>>>>
>>>>>           
>>>>>> How long would it take to download the file in the three different
>>>>>> scenarios? I know that more consumed bandwidth in access might lead one
>>>>>> to conclude that file was downloaded faster...
>>>>>>
>>>>>
>>>>> To clarify, most of the raw data (download speed and Internet
>>>>> peering/transit traffic volumes) were collected by Pando Networks from
>>>>> their P2P clients, not collected by Comcast across its links. So my
>>>>> assumption is that the Pando client used the content size (21 MB), and
>>>>> divided by the download time to get the speed.
>>>>>
>>>>>
>>>>>           
>>>>>> Was the file already seeded in Comcast's network? More specifically,
>>>>>> how was file propagation done?
>>>>>>
>>>>>
>>>>> Any seeding happened outside of Comcast's network, and outside of
>>>>> Comcast's control. That's really a question for Pando.
>>>>>
>>>>>
>>>>>           
>>>>>> Was PEX, DHT and others enabled in the clients?
>>>>>>             
>>>>> Pando would know whether PEX was enabled. It would be safe to assume
>>>>> that with respect to this trial, DHT was NOT enabled, since Pando
>>>>> supplied the tracker. (The pTracker in the draft is a tracker operated
>>>>> by Pando.)
>>>>>
>>>>>
>>>>>           
>>>>>> Was local peer discovery enabled in the clients?
>>>>>>             
>>>>> Pando would know.
>>>>>
>>>>>
>>>>>           
>>>>>> BTW, can broadcast/multicast peer discovery work in Cable networks?
>>>>>>             
>>>>> Do you mean something like this:
>>>>> http://bittorrent.org/beps/bep_0026.html?
>>>>>
>>>>> If so, peer discovery probably would not work over the typical last mile
>>>>> cable network. Maybe I'm wrong, but I see this protocol as intended for
>>>>> peer discovery within one's home network / LAN / WiFi network, not over
>>>>> a cable network.
>>>>>
>>>>>
>>>>>           
>>>>>> So, were clients allowed to become seeders to the outside of Comcast's
>>>>>> network?
>>>>>>
>>>>>
>>>>> Yes, they were.
>>>>>
>>>>> As a related item, look closely at section 4.2. The amount of aggregate
>>>>> uploaded data from Comcast clients (per swarm) was about 140,000 MB. The
>>>>> amount of aggregate downloaded data from Comcast clients (per swarm) was
>>>>> about 60,000 MB or so. So the typical Comcast client uploaded more than
>>>>> twice the amount of data that it downloaded.
>>>>>
>>>>>
>>>>>           
>>>>>> How much of the swarm was within Comcast and outside?
>>>>>>             
>>>>> Most of the swarm was outside of Comcast. Unfortunately I don't have
>>>>> access to the size of the global swarm, but I would guess that Comcast
>>>>> clients represented no more than 15% of the swarm, and maybe as little
>>>>> as 5%. Those guesses are based on the behavior of the random swarm, e.g.
>>>>> Comcast clients uploaded to non-Comcast clients 94% of the time in the
>>>>> random swarm.
>>>>>
>>>>> -- Rich
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: p2pi-bounces@ietf.org [mailto:p2pi-bounces@ietf.org] On Behalf Of Reinaldo Penno
>>>>> Sent: Wednesday, November 05, 2008 11:23 AM
>>>>> To: Livingood, Jason; p2pi@ietf.org
>>>>> Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted
>>>>>
>>>>> Hello Jason/Rich,
>>>>>
>>>>> This is such an interesting draft. I'm surprised there are no questions
>>>>> about it. Maybe everybody else is part of P4P one way or another and I'm
>>>>> not in the 'in' crowd (;-) so I have questions.
>>>>>
>>>>> * What was the file size in those experiments? Some post long ago said
>>>>> the file size in some P4P experiments was really small, as opposed to the
>>>>> top 100 torrents where the file size is ~1Gb. I was curious what the
>>>>> optimization payback is in terms of download time for large files as
>>>>> opposed to small files.
>>>>>
>>>>> * How long would it take to download the file in the three different
>>>>> scenarios? I know that more consumed bandwidth in access might lead one
>>>>> to conclude that file was downloaded faster but I'm not sure this is a
>>>>> straightforward conclusion.
>>>>>
>>>>> * Was the file already seeded in Comcast's network? More specifically,
>>>>> how was file propagation done? Did all clients start from scratch and
>>>>> have to start pulling the file from some other side of the world and
>>>>> then exchanging pieces? This is mainly due to the discussion in 4.2.
>>>>>
>>>>> * Was PEX, DHT and others enabled in the clients?
>>>>>
>>>>> * Was local peer discovery enabled in the clients? BTW, can
>>>>> broadcast/multicast peer discovery work in Cable networks?
>>>>>
>>>>> * If more clients finish downloading faster and become seeders you would
>>>>> think that for popular content Comcast's upstream bandwidth would
>>>>> increase due to the number of seeders in its network. So, were clients
>>>>> allowed to become seeders to the outside of Comcast's network? How much
>>>>> of the swarm was within Comcast and outside?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Reinaldo
>>>>>
>>>>> On 11/3/08 12:49 PM, "Livingood, Jason"
>>>>> <Jason_Livingood@cable.comcast.com>
>>>>> wrote:
>>>>>
>>>>>           
>>>>>> For some reason the URL was cut to two lines - trying again:
>>>>>>
>>>>>>
>>>>>> http://www.ietf.org/internet-drafts/draft-livingood-woundy-p4p-experiences-02.txt
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> -----Original Message-----
>>>>>>> From: p2pi-bounces@ietf.org [mailto:p2pi-bounces@ietf.org] On
>>>>>>> Behalf Of Livingood, Jason
>>>>>>> Sent: Monday, November 03, 2008 3:35 PM
>>>>>>> To: p2pi@ietf.org
>>>>>>> Subject: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted
>>>>>>>
>>>>>>> A draft at
>>>>>>> http://www.ietf.org/internet-drafts/draft-livingood-woundy-p4p
>>>>>>> -experienc
>>>>>>> es-02.txt may be of interest to folks that have been
>>>>>>> interested in P2Pi and ALTO.  We have requested time on the
>>>>>>> ALTO agenda at IETF 73 to present this.
>>>>>>>
>>>>>>> Regards
>>>>>>> Jason
>>>>>>> _______________________________________________
>>>>>>> p2pi mailing list
>>>>>>> p2pi@ietf.org
>>>>>>> https://www.ietf.org/mailman/listinfo/p2pi
>>>>>>>
>>>>>>>               
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robb Topolski (robb@funchords.com)
>>>>> Hillsboro, Oregon USA
>>>>> http://www.funchords.com/
>>>>>
>>
>>
>>     
>
>
>
>   


_______________________________________________
p2pi mailing list
p2pi@ietf.org
https://www.ietf.org/mailman/listinfo/p2pi