Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

Christian Esteve Rothenberg <chesteve@dca.fee.unicamp.br> Tue, 30 September 2014 20:03 UTC

Return-Path: <chesteve@gmail.com>
X-Original-To: icnrg@ietfa.amsl.com
Delivered-To: icnrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id F025E1A87AC for <icnrg@ietfa.amsl.com>; Tue, 30 Sep 2014 13:03:11 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -0.978
X-Spam-Level:
X-Spam-Status: No, score=-0.978 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, FM_FORGED_GMAIL=0.622, FREEMAIL_FROM=0.001, MIME_8BIT_HEADER=0.3, SPF_PASS=-0.001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sJKN3b-QN6Mb for <icnrg@ietfa.amsl.com>; Tue, 30 Sep 2014 13:03:09 -0700 (PDT)
Received: from mail-wg0-x229.google.com (mail-wg0-x229.google.com [IPv6:2a00:1450:400c:c00::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 4DE9C1A87BC for <icnrg@irtf.org>; Tue, 30 Sep 2014 13:03:08 -0700 (PDT)
Received: by mail-wg0-f41.google.com with SMTP id k14so16000222wgh.0 for <icnrg@irtf.org>; Tue, 30 Sep 2014 13:03:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type:content-transfer-encoding; bh=NmqXDpW8Qem+s5nY4wMB/tzG0vXlb4C8sMeDv3BRcqg=; b=LWu2UIuUDIGkQlyOkhrcxcBy7UOsp8qJMW93UhVgEeRnevjgJln4+H29/La3eyWTDB mBZbQfzKZ9j8HCYgInCwAwVinAPXJoL2f5IM6h37H70r76AhMFAf95e71HMvrevIHs+o eG9XPGdqP7ppgaeyuICDaBVPfaLIVF4HbL+yTnbkl1cUQtu9MJO+ylZJGPI/AKhGZHhq XmaEdJQmV92RXUXAMpOGWZmedl3XtVgBww+rANY5sXN2PKk5c9O7pbQVbFQxgDUo19GG nGOyVbA6Tzvqu91x8glAJrwL51qICMssBiz28h92UD+snXm+uJlBqo8Mu2dVblpVYEN8 MaDA==
X-Received: by 10.194.83.67 with SMTP id o3mr710017wjy.31.1412107386883; Tue, 30 Sep 2014 13:03:06 -0700 (PDT)
MIME-Version: 1.0
Sender: chesteve@gmail.com
Received: by 10.27.219.143 with HTTP; Tue, 30 Sep 2014 13:02:45 -0700 (PDT)
In-Reply-To: <F8EFC212DF9A004DA18AA8FB011E42331D61143A@SMTP1.etri.info>
References: <20140917002758.31365.13695.idtracker@ietfa.amsl.com> <F8EFC212DF9A004DA18AA8FB011E42331D610CB8@SMTP1.etri.info> <8A63F6C6-4361-4105-B1AD-04B3DCE99F6E@ericsson.com> <F8EFC212DF9A004DA18AA8FB011E42331D611152@SMTP1.etri.info> <3F24F5B9-BEDB-4C25-AFAB-F4DA368789A0@ericsson.com> <CAC2HzvS92mQXrr2c_u7jNGhb9JOjgXmvYdHi1En6cL-s8bAhbw@mail.gmail.com> <869157B6-617A-4D31-929C-616CF1F7BE33@ericsson.com> <F8EFC212DF9A004DA18AA8FB011E42331D61143A@SMTP1.etri.info>
From: Christian Esteve Rothenberg <chesteve@dca.fee.unicamp.br>
Date: Tue, 30 Sep 2014 17:02:45 -0300
X-Google-Sender-Auth: QSgsj0wYRmQMIpit9ZAiUsQeSVA
Message-ID: <CAEj5p9T6joUD4Kk1tKmQyfouv5oAhfuOXpG-Ysr8AmHr1RXZ6Q@mail.gmail.com>
To: 홍정하 <jhong@etri.re.kr>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: http://mailarchive.ietf.org/arch/msg/icnrg/G6jc1U-BSV_U2Ghv9qzJUWgi0Rw
Cc: "woojikchun@gmail.com" <woojikchun@gmail.com>, Konstantinos Katsaros <ntinos@aueb.gr>, "icnrg@irtf.org" <icnrg@irtf.org>, Petri Jokela <petri.jokela@ericsson.com>
Subject: Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt
X-BeenThere: icnrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Information-Centric Networking research group discussion list <icnrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/icnrg>, <mailto:icnrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/icnrg/>
List-Post: <mailto:icnrg@irtf.org>
List-Help: <mailto:icnrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/icnrg>, <mailto:icnrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Sep 2014 20:03:12 -0000

Nice discussion!

Petri, good to hear from you :) and your love–hate relationship with
Bloom filters ;)

Just trying to a few cents to the conversation:
Traditional bloom filters as an array of bits is a convenient
representation (out of many possible along multiple variations [1])
for the data structure but probably not the best one for many
implementations (e.g., m >>> 1) and certainly not optimized for
different SW/HW implementations. For instance, fingerprint compressed
filter (FCF) as we played with some time ago in the early discussions
on ICN forwarding alternatives [2], yield a BF-like data structure
with very interesting properties on a per entry basis (e.g. deletion?)
and amenable to optimized HW/SW implementations (including a
hierarchical cache-like multi-memory design)

[1] S. Tarkoma, C. Esteve Rothenberg and E. Lagerspetz. "Theory and
Practice of Bloom Filters for Distributed Systems." In IEEE
Communications Surveys and Tutorials. Vol. 14, Number 1, 2012
http://dl.dropboxusercontent.com/u/15183439/pubs/bloom-filter-ieee-survey-preprint.pdf

[2]  C. Esteve Rothenberg, F. Verdi and M. Magalhães. "Towards a new
generation of information-oriented internetworking architectures" ACM
CoNext, First Workshop on Re-Architecting the Internet (Re-Arch08).
Dec. 2008, Madrid, Spain.
https://dl.dropboxusercontent.com/u/15183439/pubs/ccngi-rearch08-spswitch-esteve-camera-ready.pdf
Christian


On Tue, Sep 30, 2014 at 1:52 PM, 홍정하 <jhong@etri.re.kr> wrote:
> Hi,
>
> Thanks a lot for your interesting and good comments on B-NRS.
> I totally agree with both of you.
>
> No matter what we choose between a large BF size and multiple smaller size
> BFs, we believe that we can get helped to prove any our ideas by H/W
> assisted implementation  and so we are working on it now.
>
> I would like to share any progress that I make further than now and get
> comments on it.
> So, please keep paying attention to discussions by e-mail.
> Any comments are more than welcome any time.
>
> Thanks,
> Jungha Hong
>
> ________________________________
> 보낸 사람 : "Petri Jokela" <petri.jokela@ericsson.com>
> 보낸 날짜 : 2014-09-29 14:42:20 ( +09:00 )
> 받는 사람 : Konstantinos Katsaros <ntinos@aueb.gr>
> 참조 : icnrg@irtf.org <icnrg@irtf.org>
> 제목 : Re: [icnrg] New Version Notification for
> draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt
>
> Hi,
>
> Good comments, thanks. See few small comments inline:
>
> On 27 Sep 2014, at 16:12, Konstantinos Katsaros <ntinos@aueb.gr> wrote:
>
> I think the requirement here is rather to have a globally fixed BF size for
> merging to be possible. Then we could have additional BFs created when the
> capacity is exceeded: each node "fills" BFs until the FPR is reached; when
> merging, an object counter per BF can assist in finding the appropriate BFs
> to merge i.e., the ones that will not exceed the FPR.
>
>
> This is true, I was probably initially thinking a very simple solution with
> only one BF. Multi-BF solution brings, of course, more possibilities to
> design the NRS mechanism.
>
> This calls for a bin packing algorithm with its own processing overheads. In
> this context, we could illustrate a tradeoff by considering the two
> following extremes:
>
> (i) BFs of size enough to hold every object in the network. That would
> surely work for the topmost nodes. However, lower level nodes would
> propagate mostly very sparse BFs upwards, meaning that we unnecessarily
> spend too much space/bandwidth. Though admittedly a lookup operation on such
> large BFs could be an issue, note that in principle one BF should be checked
> per interface/neighbour node.
>
>
> In case of a bit string, we probably can compress sparse BFs efficiently for
> transmission and the final size can be quite small.
>
> (ii) BFs of a very small size that would easily get "full" necessitating the
> use of an additional BF. In this case we would have "dense" BFs that would
> not waste too much space/bandwidth, but would require multiple lookup
> operations per interface/neighbour. I do not know however, whether multiple
> lookups of smaller BFs are (computationally) preferable compared to a single
> lookup of a huge BF, or if it makes no substantial difference.
>
>
> Neither do I know which one is more efficient. From a large BF, you need to
> verify max k chunks each of being x-bits in size (what is the most efficient
> x?). This reduces the amount of logical operations. However, how much
> processing is needed for getting those chunks from a large filter, I have no
> idea. I feel that I’m on a very thin ice when discussing the implementation
> efficiency of big filters :-)
>
> Nevertheless, this is pretty interesting topic and I am interested to see
> how the B-NRS solution develops.
>
> BR, Petri
>
>
>
> Regards,
> Konstantinos (Dinos) V. Katsaros
>
> PS: Unfortunately I also do not participate the interim meeting. But,
> waiting for the minutes.
>
> On 27 September 2014 08:17, Petri Jokela <petri.jokela@ericsson.com> wrote:
>>
>> Hi, thanks for the clarifications. Unfortunately, I am not attending the
>> Saturday meeting. I hope you have fruitful discussions there. See my reply
>> in-line:
>>
>> On 27 Sep 2014, at 00:15, 홍정하 <jhong@etri.re.kr> wrote:
>>
>> Hi,
>>
>> We can take advantage of that bloom filter size can be adjusted by the
>> maximum number of names registered to a single NRS server to keep the FPP
>> less than a desirable value.
>>
>>
>> For example, the bloom filter with the size of 2MB for 10^6 entries can
>> keep the FPP less than 0.00046. If we assume that the number of such a NRS
>> server on the top level is 10^6 and they are fully peered, then the maximum
>> memory required at each server is 2TB. To deal with such a size, we are
>> considering of being supported by hardware using GPU or TCAM and we are
>> working on it.
>>
>>
>> If you want to merge BFs at higher level (by ORing them together), you
>> must use same size filters on each level. Filters with different sizes
>> cannot be merged. Just to clarify, are you assuming that you are using
>> different sizes of filters on different places in the network, or are you
>> using one-size filters which must be optimised for the worst case? If you
>> have different sizes of filters, you have to collect the lower layer filters
>> to the higher layer servers and when this higher level B-NRS receives a
>> request, it must match the item to each of the 10^6 BFs separately. If you
>> are
>>
>> On the other hand, different sizes of filters on different B-NRS nodes
>> would need a different set of hash functions for them. This is not a major
>> thing, but not efficient because you have to do more hash calculations over
>> the same item name to get correct values for different filters where we want
>> to test the existence of the data item on a single server with multiple BFs
>> from other B-NRSs.
>>
>> If you are using same size filters on every layer to make it possible to
>> OR them together, this requires huge filters because they must take into
>> account also the worst case (i.e. the maximum number of information items
>> that may be included in a single filter).
>>
>>
>>
>> As the draft [3] mentions, there are 10^9 nodes in the current Internet.
>> The number of addressable ICN objects will be several orders of magnitude
>> higher. This means that there will be, on average, tens of thousands of ICN
>> objects per node. How many of the objects will be in the NRS, is not known.
>> It is possible that some of the objects are only inside a single node and
>> some globally available. Also, depending on the role of a node, the number
>> of objects may be lower or higher.
>>
>> In B-NRS[3], it is not assumed that all names are globally visible. We are
>> considering that the maximum number of names which are globally visible is
>> 10^9 and the maximum number of B-NRS servers which are fully peered on the
>> top level with 10^6 names for each is 10^3 ~ 10^6, which is the upper bound.
>>
>>
>> Now I must as that are you really talking about information items, not the
>> node names. Considering the network today and what is available there, AND
>> taking into account the new developments, such as IoT with lots of
>> information, I guess that the amount of information items will be larger. If
>> we have now 10^9 nodes in the network, your assumption would mean one
>> information item published per one node (on average) globally, which is
>> pretty low. Of course, not everything is globally visible, but still the
>> amount looks low.
>>
>> There is also a scalability issue. What happens if the number of items
>> still grows in the future and the filter sizes must be adjusted accordingly?
>> If increasing the size is enough, it is probably not a big problem. But we
>> should prepare us also for something that we cannot envision now.
>>
>>
>>
>> On the third level, the BFs from the ten 2nd level NRS are merged. The
>> merged BF contains information of 2 000 000 000 objects. In theory, the 3rd
>> level NRS can use this merged BF to check if a certain ICN object can be
>> found in any of the 1st level NRSes in this tree.
>>
>> It has to be corrected like that the 3rd level NRS checks for the 2nd
>> level and the 2nd level does for the 1st level
>>
>>
>> Yes, true. What is assumed to be the highest level of NRSs, is three
>> levels enough?
>>
>>
>> - What is the acceptable rate for false positives?
>> We are considering that FPP is less than 0.001 (i.e., 0.1%, 1000 out of 1
>> M entries).
>>
>>
>> Reasonable. However, this changes my original calculations, where I
>> assumed slightly worse performance.
>>
>>
>> - How big BFs are feasible to send and receive in the network?
>> When bloom filters at each server are initially built, it will take a long
>> time. However, once they are established, the insertion is done by each new
>> name not the whole bloom filter.
>>
>> On bloom filter updates for deletions, we are considering of refresh by
>> shadow memory and the frequency can be from several hours to several days
>>
>>
>> Inserting items into a BF is not a problem, the problem comes when items
>> should be removed and the whole BF must be recalculated as you say and the
>> update frequency obviously depends on many things.
>>
>>
>> - The verification process should be fast, how can we efficiently match
>> publications to filters with gigabit size?
>> With H/W assisted implementation using GPU, chunks including the
>> corresponding bit set to 1 for hash value can be loaded to GPU memory and a
>> child can be selected by bit-wise AND.
>>
>>
>> Ok, thanks! Although this probably requires some processing to find these
>> chunks before matching can be done. But, as said, I don’t know much about
>> the optimisations that you can do on lower layers.
>>
>> BR, Petri
>>
>>
>>
>> [1] Bellovin S., “Using Bloom Filters for Authenticated Yes/No Answers in
>> the DNS”, draft-bellovin-dnsext-bloomfilt-00.txt, December 2001, Expired
>> [2] http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
>> [3] Hong, J., Chun, W., Jung, H., “Bloom Filter-based Flat Name Resolution
>> System for ICN”, draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt
>>
>>
>> --
>> Petri Jokela
>> Senior researcher
>> NomadicLab, Ericsson Research
>> Oy L M Ericsson Ab
>>
>> E-mail: petri.jokela@ericsson.com
>> Mobile: +358 44 299 2413
>>
>>
>> _______________________________________________
>> icnrg mailing list
>> icnrg@irtf.org
>> https://www.irtf.org/mailman/listinfo/icnrg
>>
>
> _______________________________________________
> icnrg mailing list
> icnrg@irtf.org
> https://www.irtf.org/mailman/listinfo/icnrg
>
>
> --
> Petri Jokela
> Senior researcher
> NomadicLab, Ericsson Research
> Oy L M Ericsson Ab
>
> E-mail: petri.jokela@ericsson.com
> Mobile: +358 44 299 2413
>
>
> _______________________________________________
> icnrg mailing list
> icnrg@irtf.org
> https://www.irtf.org/mailman/listinfo/icnrg
>