Re: [p2pi] Real life torrent statistics

"Stas Khirman" <stas@khirman.com> Wed, 20 August 2008 00:21 UTC

From: Stas Khirman <stas@khirman.com>
To: 'Laird Popkin' <laird@pando.com>, 'The 8472' <the8472@infinite-source.de>
Date: Tue, 19 Aug 2008 17:21:46 -0700
Cc: p2pi@ietf.org, p4pwg@yahoogroups.com
Subject: Re: [p2pi] Real life torrent statistics

Quite frankly, I do not think the collected statistics can support one point
of view or the other - the sample is too small. I'll be glad to share more
data points with this community, probably in a few days.

 

But just for reference, from my past data collection: for the popular
http://tracker.prq.to/ tracker, the median torrent swarm size is only 38
peers. In general, I saw that 10% of TV-show torrents account for 80% of
peers. For movie content, it is close to the classical 20/80 ratio.
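
A minimal sketch of how these two figures (median swarm size, and the share
of peers held by the largest torrents) could be computed from a list of
per-torrent peer counts. The function name `top_share` and the sample numbers
are assumptions made up for illustration; they are not the tracker data.

```python
from statistics import median


def top_share(swarm_sizes, top_fraction):
    """Fraction of all peers that sit in the largest `top_fraction` of swarms."""
    ordered = sorted(swarm_sizes, reverse=True)
    # Always count at least one swarm, even for very small inputs.
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


# Made-up peer counts for ten torrents (illustration only, not measured data).
sizes = [800, 40, 40, 30, 30, 20, 20, 10, 5, 5]

median_size = median(sizes)          # middle of the sorted peer counts
head_share = top_share(sizes, 0.10)  # share of peers in the top 10% of torrents
print(median_size, head_share)
```

With real data, `sizes` would hold one peer count per torrent as reported by
the tracker.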

 

  _____  

From: Laird Popkin [mailto:laird@pando.com] 
Sent: Tuesday, August 19, 2008 5:05 PM
To: The 8472
Cc: p2pi@ietf.org; p4pwg@yahoogroups.com; Stas Khirman
Subject: Re: [p2pi] Real life torrent statistics

 

You're right that swarms with 10 peers can't be optimized - the p2p network
will connect all 10 peers to each other and move as much data as possible.

 

That being said, I suspect that a small number of very popular swarms that
can be optimized would have a significant impact on overall data flow, because
one swarm with 20,000 peers balances out a lot of the 'long tail'. Haiyong Xie
of Yale mentioned to me that there was an analysis of this last year, by
spidering one of the popular torrent web sites, and their conclusion was
that well over 50% of the downloaders were in swarms with 100+ active peers,
which is where they estimate that P4P optimization applies. Similarly, when
I was in the music business and tracked such things, 2% of the files in the
p2p networks accounted for the large majority of the download activity.

 

I think that it would be a very interesting analysis, spidering some torrent
web sites and seeing how the distribution of bandwidth is balanced between
the 'head' and the 'long tail'.
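
A rough sketch of that head-versus-tail comparison, assuming the spider
yields one active-peer count per swarm. It uses peer counts as a stand-in for
bandwidth, which is an assumption; the 100-peer threshold is the one quoted
above, and the function name and sample numbers are hypothetical.

```python
def share_in_large_swarms(swarm_sizes, threshold=100):
    """Fraction of all peers that are in swarms with at least `threshold` peers."""
    total = sum(swarm_sizes)
    if total == 0:
        return 0.0
    return sum(size for size in swarm_sizes if size >= threshold) / total


# Made-up example: one 20,000-peer swarm next to 1,000 swarms of 10 peers each.
sizes = [20000] + [10] * 1000
head = share_in_large_swarms(sizes)
print(f"{head:.1%} of peers are in swarms with 100+ peers")
```

In this invented case the single large swarm holds two thirds of all peers,
even though it is only one of 1,001 torrents.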


- Laird Popkin, CTO, Pando Networks
  mobile: 646/465-0570

----- Original Message -----
From: "The 8472" <the8472@infinite-source.de>
To: "Stas Khirman" <stas@khirman.com>
Cc: p2pi@ietf.org, p4pwg@yahoogroups.com
Sent: Tuesday, August 19, 2008 4:40:39 PM (GMT-0800) America/Los_Angeles
Subject: Re: [p2pi] Real life torrent statistics

Stas Khirman wrote: 

To estimate the feasibility of ALTO/P4P for real-life torrents, I collected
<ip,port> information for peers from one of the most popular "PirateBay"
torrents (almost 20k peers) and mapped their IPs to the corresponding ASes.
Please find attached my working notes with some interesting statistics.
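
The IP-to-AS mapping step could look roughly like the sketch below. It is not
the method used for the attached notes: the prefix table is hypothetical
(documentation prefixes and documentation AS numbers), and a real run would
load prefixes from a BGP routing table dump instead.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-AS table, for illustration only.
PREFIX_TO_AS = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "203.0.113.0/25": 64502,
    "203.0.113.0/24": 64503,
}

# Longest prefixes first, so the first hit is the most specific match.
NETWORKS = sorted(
    ((ipaddress.ip_network(prefix), asn) for prefix, asn in PREFIX_TO_AS.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def ip_to_as(ip):
    """Return the AS number of the most specific covering prefix, or None."""
    addr = ipaddress.ip_address(ip)
    for network, asn in NETWORKS:
        if addr in network:
            return asn
    return None


def peers_per_as(peers):
    """Count peers per AS for an iterable of (ip, port) pairs."""
    return Counter(ip_to_as(ip) for ip, _port in peers)


# Made-up peer list in the <ip,port> form described above.
peers = [
    ("192.0.2.10", 6881),
    ("192.0.2.11", 51413),
    ("203.0.113.5", 6881),
    ("203.0.113.200", 6881),
    ("10.0.0.1", 6881),
]
counts = peers_per_as(peers)
print(counts.most_common())
```

Peers whose address matches no prefix are counted under `None`. The linear
scan is fine for a small table; a full routing table would call for a prefix
trie.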

Ahh, there is a problem with this one. With torrents you have a significant
long tail when it comes to swarm sizes and content. I'm not certain about the
distribution, but the long tail will probably outweigh the... let's say top
100 torrents. Torrents with only 10-20 peers spread across several ASNs
are much harder to optimize than the top 100.
This problem is aggravated by swarm fragmentation, both because of private
trackers and because BitTorrent does not aim to coalesce all torrents with
the same content, e.g. torrents that differ in piece size, file names etc.



Also, I find the geographic distribution of the peers surprising - the
majority were in the UK, not in the US (probably because the content is
available in US theaters). Places 3-5 are taken by Sweden, Poland and Canada
(in total, more peers than in the US).

This will probably be different if you sample torrents from regional
trackers or torrents aimed at other audiences. During some DHT tracing over
the weekend I saw a significant proportion of DHT traffic coming from Asian
countries, though I suspect that an inefficient DHT implementation in a
client that's popular in China plays some role in this distribution.



 

Certainly, the observed "heavy" neighboring of peers is a function of swarm
size. I intend to investigate a few medium and small swarms to get a
multi-point picture for any future discussions.

As I mentioned above, we should try to get the big picture, i.e. how relevant
the long tail is, measured in aggregate bandwidth. If the small torrents
actually make up the bulk of the traffic, then any solution will require a
high degree of cooperation between ISPs, e.g. caches that cooperate with
each other.

-- 
The 8472 
independent developer for the Azureus Vuze Bittorrent client 

_______________________________________________ p2pi mailing list
p2pi@ietf.org https://www.ietf.org/mailman/listinfo/p2pi 
