Re: [ppsp] [decade] Object naming in -req and -arch

Arno Bakker <arno@cs.vu.nl> Tue, 10 July 2012 07:48 UTC


Hi all,

I'll try to clarify the rationale and practical overhead of the Merkle
Hash Trees in PPSP. For static content, MHTs enable content integrity
protection using self-certified naming. Using a hash tree instead of a
single hash is useful in all situations where the content is distributed
in parts (= a sequence of objects, as you call it) that are used
immediately. In particular, when the parts are delivered to a
higher-level app upon receipt, they must be integrity-checked
beforehand. This applies to streaming, but perhaps also to other P2P
apps using DECADE.

Even if parts are not used immediately, an integrity check on parts can
help to improve efficiency in a P2P context. An end-to-end integrity
check once the content is completely downloaded is sufficient for
correctness, but for efficiency it helps to know whether the individual
parts are correct as they arrive instead of finding out at the end,
especially for large content.

Note that Merkle Hash Trees support both partial and end-to-end
integrity checks. When a peer has a copy of the content and the name of
the object (= its root hash in the MHT), it can calculate the MHT from
the content and compare the calculated root hash to the name. It does
not need to receive any of the intermediate hashes from others if that
is not required.
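
As a rough illustration of this end-to-end check, here is a minimal Python
sketch. It is not the draft's exact algorithm: the 1 KiB chunk size, the use
of SHA-1, the padding of missing leaves with all-zero hashes, and the names
merkle_root and matches_name are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 1024   # assumed chunk size in bytes
HASH_LEN = 20       # SHA-1 digest length (assumed hash function)


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def merkle_root(content: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compute the root hash of a binary hash tree over the content."""
    chunks = [content[i:i + chunk_size]
              for i in range(0, len(content), chunk_size)] or [b""]
    level = [sha1(c) for c in chunks]

    # Pad the leaf layer to a power of two with all-zero hashes
    # (an assumption for this sketch).
    width = 1
    while width < len(level):
        width *= 2
    level += [bytes(HASH_LEN)] * (width - len(level))

    # Hash pairs of siblings until a single root remains.
    while len(level) > 1:
        level = [sha1(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def matches_name(content: bytes, name: bytes) -> bool:
    """End-to-end check: recompute the root and compare it to the name."""
    return merkle_root(content) == name
```

The peer needs nothing but the content and the name here; no intermediate
hashes are received from anyone.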

Which brings us to the topic of overhead. As discussed in Sec. 5.5 of
http://www.ietf.org/id/draft-ietf-ppsp-peer-protocol-02.txt
the size of the MHT depends on the number of chunks (objects) at the 
base of the tree. That number depends on the size of the chunks that are 
processed immediately in the P2P application. For PPSP over UDP over 
Ethernet these chunks are small. For other P2P apps the chunks may be 
bigger.
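
To get a feeling for the numbers, the following Python sketch computes the
size of the complete tree for a given content and chunk size. The 20-byte
hash length, the padding of the leaf layer to a power of two, and the helper
name tree_stats are assumptions for the example. The result is an upper
bound on storage, not the amount sent over the wire.

```python
def tree_stats(content_bytes: int, chunk_bytes: int, hash_bytes: int = 20):
    """Return (leaves, nodes, tree_size_in_bytes) for a full binary hash tree."""
    # Number of chunks at the base of the tree (ceiling division).
    leaves = max(1, (content_bytes + chunk_bytes - 1) // chunk_bytes)
    # Assumed: the leaf layer is padded to the next power of two.
    width = 1 << (leaves - 1).bit_length()
    nodes = 2 * width - 1
    return leaves, nodes, nodes * hash_bytes


# 1 GiB of content with 1 KiB chunks versus 1 MiB chunks.
small_chunks = tree_stats(2**30, 2**10)
large_chunks = tree_stats(2**30, 2**20)
```

With these assumptions, 1 KiB chunks give about 2 million nodes (roughly
40 MiB of hashes for 1 GiB of content), while 1 MiB chunks give 2047 nodes
(about 40 KB).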

How much of the MHT actually needs to be sent over the wire to a
receiving peer depends on the download policy used. For a linear
download only part of the tree needs to be transmitted, as the other
part of the tree is calculated by the receiving peer while downloading.
In the example in Sec. 5.5, only 7 of the 16 hashes in the tree are 
actually transmitted.
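
A simplified model of the linear case can be sketched in Python as follows.
It assumes a power-of-two number of chunks, a receiver that already knows
the root hash (the name) and that keeps every hash it has received or
computed. The function name hashes_sent_linear and the (level, index) node
labels are made up for the example.

```python
def hashes_sent_linear(num_chunks: int) -> int:
    """Count the hashes a sender must transmit for an in-order download."""
    if num_chunks <= 0 or num_chunks & (num_chunks - 1):
        raise ValueError("num_chunks must be a power of two")

    height = num_chunks.bit_length() - 1
    known = {(height, 0)}          # the root hash is the object's name
    sent = 0

    for chunk in range(num_chunks):
        level, index = 0, chunk
        path = []
        # Climb until we reach a node whose hash the receiver already holds.
        while (level, index) not in known:
            sibling = (level, index ^ 1)
            if sibling not in known:
                sent += 1          # sibling hash must come over the wire
                known.add(sibling)
            path.append((level, index))
            level, index = level + 1, index // 2
        # Hashes on the path were computed locally and are now verified.
        known.update(path)

    return sent


sent_for_8_chunks = hashes_sent_linear(8)
```

In this model an 8-chunk download needs 7 transmitted hashes; assuming the
draft's example uses 8 chunks, that agrees with the count of 7 quoted above.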

Note that swift, the protocol on which the PPSP peer protocol is based,
was actually designed as a generic transport protocol, unifying regular
downloads, VOD and live streaming. So it still supports efficient
non-streaming download policies like BitTorrent's rarest first. In other
words, its origins fit the general distribution nature of DECADE.

Regards,
      Arno


On 10/07/2012 07:23, Y. Richard Yang wrote:
> Hi Peng, Dirk,
>
> I am cc'ing the ppsp list as well. You raised a good point on the
> distinction between one object and a sequence of objects. To generalize,
> we can discuss even a set of objects (no ordering), and a set of
> equivalence objects (dynamic streaming that interleaves different
> resolutions). Your argument against MHT is higher overhead in the
> general case (end-to-end arguments). How much exactly is the overhead?
> DECADE may benefit from the analysis from PPSP. Since streaming is
> considered the main app, how much overhead is there if DECADE has to
> build on top?
>
> Thanks!
>
> Richard
>
> On Tuesday, July 10, 2012, Peng Zhang wrote:
>
>     Hi Dirk and all,
>
>     I agree that the NI specification meets the basic requirement of
>     DECADE (without optimization on "early name generation").
>
>     As for the Merkle Hash Tree, or MHT, it is an integrity-assurance
>     method for a sequence of objects. It is critical to the PPSP
>     protocol. But I wonder whether we should incorporate it in our design:
>
>     First, DECADE is targeted at general content distribution
>     applications, and for applications other than P2P streaming, there
>     is no great value in using a Merkle Hash Tree. It may cause high
>     overhead for these applications because "meta data", including
>     signatures and full hashes, has to be exchanged.
>
>     Still, we can discuss more on how to better incorporate PPSP based
>     on NI without hurting the general application of DECADE. Thanks.
>