Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

Dear Christian,

Thanks for the references and comments.

We will take a look.

Jungha Hong

-----Original Message-----
From: chesteve@gmail.com [mailto:chesteve@gmail.com] On Behalf Of Christian Esteve Rothenberg
Sent: Wednesday, October 01, 2014 5:03 AM
To: 홍정하
Cc: Petri Jokela; Konstantinos Katsaros; woojikchun@gmail.com; icnrg@irtf.org
Subject: Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

Nice discussion!

Petri, good to hear from you :) and your love–hate relationship with Bloom filters ;)

Just trying to a few cents to the conversation:

Traditional bloom filters as an array of bits is a convenient representation (out of many possible along multiple variations [1]) for the data structure but probably not the best one for many implementations (e.g., m >>> 1) and certainly not optimized for different SW/HW implementations. For instance, fingerprint compressed filter (FCF) as we played with some time ago in the early discussions on ICN forwarding alternatives [2], yield a BF-like data structure with very interesting properties on a per entry basis (e.g. deletion?) and amenable to optimized HW/SW implementations (including a hierarchical cache-like multi-memory design)

[1] S. Tarkoma, C. Esteve Rothenberg and E. Lagerspetz. "Theory and Practice of Bloom Filters for Distributed Systems." In IEEE Communications Surveys and Tutorials. Vol. 14, Number 1, 2012 http://dl.dropboxusercontent.com/u/15183439/pubs/bloom-filter-ieee-survey-preprint.pdf

[2]  C. Esteve Rothenberg, F. Verdi and M. Magalhães. "Towards a new generation of information-oriented internetworking architectures" ACM CoNext, First Workshop on Re-Architecting the Internet (Re-Arch08).

Dec. 2008, Madrid, Spain.

https://dl.dropboxusercontent.com/u/15183439/pubs/ccngi-rearch08-spswitch-esteve-camera-ready.pdf

Christian

On Tue, Sep 30, 2014 at 1:52 PM, 홍정하 <jhong@etri.re.kr<mailto:jhong@etri.re.kr>> wrote:

> Hi,

>

> Thanks a lot for your interesting and good comments on B-NRS.

> I totally agree with both of you.

>

> No matter what we choose between a large BF size and multiple smaller

> size BFs, we believe that we can get helped to prove any our ideas by

> H/W assisted implementation  and so we are working on it now.

>

> I would like to share any progress that I make further than now and

> get comments on it.

> So, please keep paying attention to discussions by e-mail.

> Any comments are more than welcome any time.

>

> Thanks,

> Jungha Hong

>

> ________________________________

> 보낸 사람 : "Petri Jokela" <petri.jokela@ericsson.com<mailto:petri.jokela@ericsson.com>> 보낸 날짜 : 2014-09-29

> 14:42:20 ( +09:00 ) 받는 사람 : Konstantinos Katsaros <ntinos@aueb.gr<mailto:ntinos@aueb.gr>> 참조

> : icnrg@irtf.org<mailto:icnrg@irtf.org> <icnrg@irtf.org<mailto:icnrg@irtf.org>> 제목 : Re: [icnrg] New Version

> Notification for

> draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

>

> Hi,

>

> Good comments, thanks. See few small comments inline:

>

> On 27 Sep 2014, at 16:12, Konstantinos Katsaros <ntinos@aueb.gr<mailto:ntinos@aueb.gr>> wrote:

>

> I think the requirement here is rather to have a globally fixed BF

> size for merging to be possible. Then we could have additional BFs

> created when the capacity is exceeded: each node "fills" BFs until the

> FPR is reached; when merging, an object counter per BF can assist in

> finding the appropriate BFs to merge i.e., the ones that will not exceed the FPR.

>

>

> This is true, I was probably initially thinking a very simple solution

> with only one BF. Multi-BF solution brings, of course, more

> possibilities to design the NRS mechanism.

>

> This calls for a bin packing algorithm with its own processing

> overheads. In this context, we could illustrate a tradeoff by

> considering the two following extremes:

>

> (i) BFs of size enough to hold every object in the network. That would

> surely work for the topmost nodes. However, lower level nodes would

> propagate mostly very sparse BFs upwards, meaning that we

> unnecessarily spend too much space/bandwidth. Though admittedly a

> lookup operation on such large BFs could be an issue, note that in

> principle one BF should be checked per interface/neighbour node.

>

>

> In case of a bit string, we probably can compress sparse BFs

> efficiently for transmission and the final size can be quite small.

>

> (ii) BFs of a very small size that would easily get "full"

> necessitating the use of an additional BF. In this case we would have

> "dense" BFs that would not waste too much space/bandwidth, but would

> require multiple lookup operations per interface/neighbour. I do not

> know however, whether multiple lookups of smaller BFs are

> (computationally) preferable compared to a single lookup of a huge BF, or if it makes no substantial difference.

>

>

> Neither do I know which one is more efficient. From a large BF, you

> need to verify max k chunks each of being x-bits in size (what is the

> most efficient x?). This reduces the amount of logical operations.

> However, how much processing is needed for getting those chunks from a

> large filter, I have no idea. I feel that I’m on a very thin ice when

> discussing the implementation efficiency of big filters :-)

>

> Nevertheless, this is pretty interesting topic and I am interested to

> see how the B-NRS solution develops.

>

> BR, Petri

>

>

>

> Regards,

> Konstantinos (Dinos) V. Katsaros

>

> PS: Unfortunately I also do not participate the interim meeting. But,

> waiting for the minutes.

>

> On 27 September 2014 08:17, Petri Jokela <petri.jokela@ericsson.com<mailto:petri.jokela@ericsson.com>> wrote:

>>

>> Hi, thanks for the clarifications. Unfortunately, I am not attending

>> the Saturday meeting. I hope you have fruitful discussions there. See

>> my reply

>> in-line:

>>

>> On 27 Sep 2014, at 00:15, 홍정하 <jhong@etri.re.kr<mailto:jhong@etri.re.kr>> wrote:

>>

>> Hi,

>>

>> We can take advantage of that bloom filter size can be adjusted by

>> the maximum number of names registered to a single NRS server to keep

>> the FPP less than a desirable value.

>>

>>

>> For example, the bloom filter with the size of 2MB for 10^6 entries

>> can keep the FPP less than 0.00046. If we assume that the number of

>> such a NRS server on the top level is 10^6 and they are fully peered,

>> then the maximum memory required at each server is 2TB. To deal with

>> such a size, we are considering of being supported by hardware using

>> GPU or TCAM and we are working on it.

>>

>>

>> If you want to merge BFs at higher level (by ORing them together),

>> you must use same size filters on each level. Filters with different

>> sizes cannot be merged. Just to clarify, are you assuming that you

>> are using different sizes of filters on different places in the

>> network, or are you using one-size filters which must be optimised

>> for the worst case? If you have different sizes of filters, you have

>> to collect the lower layer filters to the higher layer servers and

>> when this higher level B-NRS receives a request, it must match the

>> item to each of the 10^6 BFs separately. If you are

>>

>> On the other hand, different sizes of filters on different B-NRS

>> nodes would need a different set of hash functions for them. This is

>> not a major thing, but not efficient because you have to do more hash

>> calculations over the same item name to get correct values for

>> different filters where we want to test the existence of the data

>> item on a single server with multiple BFs from other B-NRSs.

>>

>> If you are using same size filters on every layer to make it possible

>> to OR them together, this requires huge filters because they must

>> take into account also the worst case (i.e. the maximum number of

>> information items that may be included in a single filter).

>>

>>

>>

>> As the draft [3] mentions, there are 10^9 nodes in the current Internet.

>> The number of addressable ICN objects will be several orders of

>> magnitude higher. This means that there will be, on average, tens of

>> thousands of ICN objects per node. How many of the objects will be in the NRS, is not known.

>> It is possible that some of the objects are only inside a single node

>> and some globally available. Also, depending on the role of a node,

>> the number of objects may be lower or higher.

>>

>> In B-NRS[3], it is not assumed that all names are globally visible.

>> We are considering that the maximum number of names which are

>> globally visible is

>> 10^9 and the maximum number of B-NRS servers which are fully peered

>> on the top level with 10^6 names for each is 10^3 ~ 10^6, which is the upper bound.

>>

>>

>> Now I must as that are you really talking about information items,

>> not the node names. Considering the network today and what is

>> available there, AND taking into account the new developments, such

>> as IoT with lots of information, I guess that the amount of

>> information items will be larger. If we have now 10^9 nodes in the

>> network, your assumption would mean one information item published

>> per one node (on average) globally, which is pretty low. Of course,

>> not everything is globally visible, but still the amount looks low.

>>

>> There is also a scalability issue. What happens if the number of

>> items still grows in the future and the filter sizes must be adjusted accordingly?

>> If increasing the size is enough, it is probably not a big problem.

>> But we should prepare us also for something that we cannot envision now.

>>

>>

>>

>> On the third level, the BFs from the ten 2nd level NRS are merged.

>> The merged BF contains information of 2 000 000 000 objects. In

>> theory, the 3rd level NRS can use this merged BF to check if a

>> certain ICN object can be found in any of the 1st level NRSes in this tree.

>>

>> It has to be corrected like that the 3rd level NRS checks for the 2nd

>> level and the 2nd level does for the 1st level

>>

>>

>> Yes, true. What is assumed to be the highest level of NRSs, is three

>> levels enough?

>>

>>

>> - What is the acceptable rate for false positives?

>> We are considering that FPP is less than 0.001 (i.e., 0.1%, 1000 out

>> of 1 M entries).

>>

>>

>> Reasonable. However, this changes my original calculations, where I

>> assumed slightly worse performance.

>>

>>

>> - How big BFs are feasible to send and receive in the network?

>> When bloom filters at each server are initially built, it will take a

>> long time. However, once they are established, the insertion is done

>> by each new name not the whole bloom filter.

>>

>> On bloom filter updates for deletions, we are considering of refresh

>> by shadow memory and the frequency can be from several hours to

>> several days

>>

>>

>> Inserting items into a BF is not a problem, the problem comes when

>> items should be removed and the whole BF must be recalculated as you

>> say and the update frequency obviously depends on many things.

>>

>>

>> - The verification process should be fast, how can we efficiently

>> match publications to filters with gigabit size?

>> With H/W assisted implementation using GPU, chunks including the

>> corresponding bit set to 1 for hash value can be loaded to GPU memory

>> and a child can be selected by bit-wise AND.

>>

>>

>> Ok, thanks! Although this probably requires some processing to find

>> these chunks before matching can be done. But, as said, I don’t know

>> much about the optimisations that you can do on lower layers.

>>

>> BR, Petri

>>

>>

>>

>> [1] Bellovin S., “Using Bloom Filters for Authenticated Yes/No

>> Answers in the DNS”, draft-bellovin-dnsext-bloomfilt-00.txt, December

>> 2001, Expired [2]

>> http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html

>> [3] Hong, J., Chun, W., Jung, H., “Bloom Filter-based Flat Name

>> Resolution System for ICN”,

>> draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

>>

>>

>> --

>> Petri Jokela

>> Senior researcher

>> NomadicLab, Ericsson Research

>> Oy L M Ericsson Ab

>>

>> E-mail: petri.jokela@ericsson.com<mailto:petri.jokela@ericsson.com>

>> Mobile: +358 44 299 2413

>>

>>

>> _______________________________________________

>> icnrg mailing list

>> icnrg@irtf.org<mailto:icnrg@irtf.org>

>> https://www.irtf.org/mailman/listinfo/icnrg

>>

>

> _______________________________________________

> icnrg mailing list

> icnrg@irtf.org<mailto:icnrg@irtf.org>

> https://www.irtf.org/mailman/listinfo/icnrg

>

>

> --

> Petri Jokela

> Senior researcher

> NomadicLab, Ericsson Research

> Oy L M Ericsson Ab

>

> E-mail: petri.jokela@ericsson.com<mailto:petri.jokela@ericsson.com>

> Mobile: +358 44 299 2413

>

>

> _______________________________________________

> icnrg mailing list

> icnrg@irtf.org<mailto:icnrg@irtf.org>

> https://www.irtf.org/mailman/listinfo/icnrg

>