Re: [ppsp] [decade] Object naming in -req and -arch

Arno Bakker <> Fri, 13 July 2012 07:12 UTC


On 12/07/2012 22:28, Peng Zhang wrote:
> On Jul 12, 2012, at 2:23 AM, Arno Bakker wrote:
>> The gains of using MHT depend on the chunk size. For PPSP we prefer
>> chunks of 1K that fit in an UDP packet carried over Ethernet. In
>> that case, for a 4 GB file, there are 4 M chunks, resulting in 80
>> MB of leaf hashes when SHA1 is used. Transferring that beforehand
>> as in BitTorrent definitely increases latency ;o)
> Yes, if the chunk size is only 1KB, and each chunk is verified
> individually, we cannot afford to send all hashes beforehand. While
> in the worst case without optimization, almost 2*80M = 160M hashes
> needs to be sent to the receiver, will that be a large overhead
> compared to 4G? Do we really need such a small chunk size? Maybe I
> miss some previous discussion on this.


For PPSP we want to use UDP as we don't need the in-order delivery and
reliability features of TCP, and want the flexibility to use different
congestion control algorithms and handle NATs. With Ethernet as the
dominant MAC layer at present and an unreliable transport, we don't want
datagrams to exceed the Ethernet MTU, otherwise the chance of losing a
datagram increases (a UDP packet spanning N IP packets will not be
delivered if even one of those IP packets is lost). Hence, we use chunks of ~1K.
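
To put rough numbers on the fragmentation argument above, here is a
small Python sketch. It assumes IPv4 without options, a 1500-byte
Ethernet MTU, and independent per-packet loss; the function names and
the 1% loss rate are illustrative only and not part of any PPSP draft.

```python
ETHERNET_MTU = 1500  # bytes, standard Ethernet payload size
IPV4_HEADER = 20     # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8       # bytes


def fragments_needed(udp_payload: int, mtu: int = ETHERNET_MTU) -> int:
    """Number of IP packets needed to carry one UDP datagram."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    total = UDP_HEADER + udp_payload
    return -(-total // per_fragment)  # ceiling division


def delivery_probability(udp_payload: int, packet_loss: float) -> float:
    """Chance that all fragments arrive, assuming independent loss."""
    return (1.0 - packet_loss) ** fragments_needed(udp_payload)


# A 1 KiB chunk fits in a single IP packet; an 8 KiB chunk needs six,
# and losing any one of them loses the whole datagram.
for size in (1024, 8192):
    n = fragments_needed(size)
    p = delivery_probability(size, packet_loss=0.01)
    print(f"{size} bytes: {n} IP packet(s), delivered with p = {p:.4f}")
```

Under these assumptions the delivery chance falls off as (1 - p)^N, which
is the reason for keeping each datagram within one Ethernet frame.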

A good practice in P2P networks is to not forward data you have not 
verified. So to forward the 1K chunks directly we need to be able to 
verify their integrity at this granularity: enter Merkle Hash Trees.
We think the resulting overhead due to the size of the tree is 
acceptable, as it is easy to optimize the number of hashes transmitted
in our use cases.