Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

홍정하 <jhong@etri.re.kr> Fri, 26 September 2014 21:16 UTC

From: 홍정하 <jhong@etri.re.kr>
To: Petri Jokela <petri.jokela@ericsson.com>, "icnrg@irtf.org" <icnrg@irtf.org>
Thread-Topic: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt
Thread-Index: AQHP0g4+gbwFbOdbOEG9cjdNBhruHJwGBxAwgAnUxgCABBho1w==
Date: Fri, 26 Sep 2014 21:15:56 +0000
Message-ID: <F8EFC212DF9A004DA18AA8FB011E42331D611152@SMTP1.etri.info>
References: <20140917002758.31365.13695.idtracker@ietfa.amsl.com> <F8EFC212DF9A004DA18AA8FB011E42331D610CB8@SMTP1.etri.info>, <8A63F6C6-4361-4105-B1AD-04B3DCE99F6E@ericsson.com>
In-Reply-To: <8A63F6C6-4361-4105-B1AD-04B3DCE99F6E@ericsson.com>
Accept-Language: ko-KR, en-US
Content-Language: ko-KR
Content-Type: multipart/alternative; boundary="_000_F8EFC212DF9A004DA18AA8FB011E42331D611152SMTP1etriinfo_"
MIME-Version: 1.0
Archived-At: http://mailarchive.ietf.org/arch/msg/icnrg/n1ZO9gbdFNgPbGe9I9JJCm16HFI
Subject: Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt
Precedence: list

Hi,

Here are my response.

We chose to utilize bloom filters for efficient lookup of flat names, where bloom filter size and false positive probability (FPP) are big challenges that have to be addressed.

We can take advantage of that bloom filter size can be adjusted by the maximum number of names registered to a single NRS server to keep the FPP less than a desirable value.

For example, the bloom filter with the size of 2MB for 10^6 entries can keep the FPP less than 0.00046. If we assume that the number of such a NRS server on the top level is 10^6 and they are fully peered, then the maximum memory required at each server is 2TB. To deal with such a size, we are considering of being supported by hardware using GPU or TCAM and we are working on it.

I put my response in the middle of your comments.
Please find them below.

Again, we are thankful for your comments.
We hope that it could be discussed more at the tomorrow meeting.

Jungha Hong

________________________________
보낸 사람 : "Petri Jokela" <petri.jokela@ericsson.com>
보낸 날짜 : 2014-09-25 00:23:00 ( +09:00 )
받는 사람 : 홍정하 <jhong@etri.re.kr>, icnrg@irtf.org <icnrg@irtf.org>
참조 :
제목 : Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

On 18 Sep 2014, at 03:31, 홍정하 <jhong@etri.re.kr<mailto:jhong@etri.re.kr>> wrote:

Hi all,

The draft, draft-hong-icnrg-bloomfilterbased-name-resolution-01, has been submitted to the ID system.
In this draft, we have tried to reflect few questions that we got from last meeting.

Please take a look and let us know if you have any questions or comments.
I hope this draft could be discussed more in the interim meeting.

Thanks,
Jungha Hong

Hi,

I have read the draft and I have some comments. I’m sorry, this e-mail is pretty long. If you think that I have miscalculated or made some erroneous assumptions, please correct me. So, here we go...

In this mail I try to figure out the feasibility of using Bloom filters as they are used in the original draft [3]. I would be happy to see that 1) my assumptions are too negative and / or 2) the filters do not have to be as large as I calculate here. It seems that the sizes of the filters will be huge in some environments, for example in a large corporate network.

Bloom filter matching provides an efficient way of verifying if an item belongs to a set or not, but false positive matches decrease the performance. Thus, I will concentrate on calculating the required size of the Bloom filter in the system with some assumptions of the number of nodes, number of ICN objects, and acceptable false positive rate.

As the draft [3] mentions, there are 10^9 nodes in the current Internet. The number of addressable ICN objects will be several orders of magnitude higher. This means that there will be, on average, tens of thousands of ICN objects per node. How many of the objects will be in the NRS, is not known. It is possible that some of the objects are only inside a single node and some globally available. Also, depending on the role of a node, the number of objects may be lower or higher.

In B-NRS[3], it is not assumed that all names are globally visible. We are considering that the maximum number of names which are globally visible is 10^9 and the maximum number of B-NRS servers which are fully peered on the top level with 10^6 names for each is 10^3 ~ 10^6, which is the upper bound.

The draft [3] envisions that the BF NRS is hierarchical and has multiple levels in a tree. It is not defined, how many nodes one NRS tree will serve. We can take an estimate, e.g. from a large corporate network where we would have, say, 200 000 nodes. Assuming that a single NRS serves e.g. 2 000 nodes, we need 100 NRS nodes. If we put them in a tree, we need more NRS nodes for higher levels. If each higher level NRS would serve e.g. 10 lower level NRS nodes, we would have 10 second level NRSs and one third level NRS. NOTE: I do not consider the peering possibility in this text, only the single NRS tree for this particular use case.

If each of the nodes in the network has 10 000 ICN objects that they publish, a single 1st level NRS would maintain location information for 20 000 000 objects. It creates a BF out of these and sends this BF to the upper level NRS. The 2nd level NRS receives BFs from ten of the 1st level NRSes and combines them using logical OR operation. This means that the resulting BF contains information of 200 000 000 objects. 2nd level NRS can use this combined BF to verify if any of the NRSes below it has that particular ICN object information, but it can also match the object to all ten received 1st level BFs which gives more accurate information. In the first case, if there is a match in the combined BF, the NRS must also try matching to all the 1st level BFs to find the actual NRS that maintains the object location information.

On the third level, the BFs from the ten 2nd level NRS are merged. The merged BF contains information of 2 000 000 000 objects. In theory, the 3rd level NRS can use this merged BF to check if a certain ICN object can be found in any of the 1st level NRSes in this tree.

It has to be corrected like that the 3rd level NRS checks for the 2nd level and the 2nd level does for the 1st level.

Now, the problem with BFs is the amount of false positives when the number of 1s in the filter increases. We can reduce the amount of false positives by selecting carefully the length of the filter m and the number of hash functions that are used for calculating the index values, namely k.

For each of the BFs in different levels, the amount of objects we have:

n1 = 2 * 10^7
n2 = 2 * 10^8
n3 = 2 * 10^9

If we aim at (roughly) 1% false positive rate, we get some possible values for m and k (from [2]):

m/n = 10, k=4:

level 1: m = n1 * 10 = 20 * 10^7 (~ 200 000 000 bits ~ 200 Mbits)
level 2: m = n2 * 10 = 20 * 10^8 (~ 2 000 000 000 ~ 2 Gbits)
level 3: m = n3 * 10 = … ~ 20 000 000 000 ~ 20 Gbits

Increasing k to 8, targeting to 1% false positive ratio, we get m/n = 9 (actually, it does not change much the false positive ratio if we select any value for k from 4 to 8, see the table in [2]):

level 1: m = n1 * 11 = 18 * 10^7 (~ 180 Mbits)
level 2: m ~ 1.8 Gbits
level 3: m ~ 18 Gbits

1% false positives would mean that e.g. 1 000 000 queries would generate 10 000 false queries to the lower level NRSs.

If we accept (roughly) 5% false positives, we can use m/n = 6 and k = 4. This would result in filter sizes:

level 1: m = n1 * 6 = 12 * 10^7 (~ 120 000 000 bits ~ 120 Mbits)
..
level 3: m = n3 * 6 = 12 * 10^9 (~ 12 000 000 000 bits ~ 12 Gbits)

In [1], BFs were suggested to be used in DNSSEC authentication. Not going into details of authentication, the author was suggesting to use

k = 16, m = 40
k = 16, m = 32
k = 8, m = 40
k = 8, m = 32

to get low enough false positive probability. They were aiming at false positive ratios max 0.005%. In the NRS and flat name case, this low false positive ratio is not necessarily required, but I took this draft as an example because they were using 800 Mbit filters with k = 8 or 16. And, hopefully, they had kept in mind the performance for the verification process. I must admit that I don’t know how the verification can be implemented efficiently with such large filters.

We have to note that with Bloom filters, the size of the filter is always the same at all the levels. I.e. we have to take big enough filter so that it works also on higher levels in hierarchy. So, just quickly looking at this, it seems that we are dealing with huge Bloom filters, that are gigabits in size.

Another issue is the transmission of the Bloom filters between NRSes. To keep the information as accurate as possible on different levels, we have to send updated Bloom filters pretty often to the upper level NRSes. How often this is done, depends naturally on the activity level of the nodes in the network, i.e. how often they publish new information items to the NRS or remove them.

So, we have some questions, e.g.

- What is the acceptable rate for false positives?
We are considering that FPP is less than 0.001 (i.e., 0.1%, 1000 out of 1 M entries).

- How big BFs are feasible to send and receive in the network?
When bloom filters at each server are initially built, it will take a long time. However, once they are established, the insertion is done by each new name not the whole bloom filter.
On bloom filter updates for deletions, we are considering of refresh by shadow memory and the frequency can be from several hours to several days.

- The verification process should be fast, how can we efficiently match publications to filters with gigabit size?
With H/W assisted implementation using GPU, chunks including the corresponding bit set to 1 for hash value can be loaded to GPU memory and a child can be selected by bit-wise AND.

[1] Bellovin S., “Using Bloom Filters for Authenticated Yes/No Answers in the DNS”, draft-bellovin-dnsext-bloomfilt-00.txt, December 2001, Expired
[2] http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
[3] Hong, J., Chun, W., Jung, H., “Bloom Filter-based Flat Name Resolution System for ICN”, draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

BR, Petri

--
Petri Jokela
Senior researcher
NomadicLab, Ericsson Research
Oy L M Ericsson Ab

E-mail: petri.jokela@ericsson.com<mailto:petri.jokela@ericsson.com>
Mobile: +358 44 299 2413

[icnrg] FW: New Version Notification for draft-ho… 홍정하
Re: [icnrg] New Version Notification for draft-ho… Petri Jokela
Re: [icnrg] New Version Notification for draft-ho… 정희영
Re: [icnrg] New Version Notification for draft-ho… 홍정하
Re: [icnrg] New Version Notification for draft-ho… Petri Jokela
Re: [icnrg] New Version Notification for draft-ho… Konstantinos V. Katsaros
Re: [icnrg] New Version Notification for draft-ho… Konstantinos Katsaros
Re: [icnrg] New Version Notification for draft-ho… Petri Jokela
Re: [icnrg] New Version Notification for draft-ho… 홍정하
Re: [icnrg] New Version Notification for draft-ho… 홍정하
Re: [icnrg] New Version Notification for draft-ho… Christian Esteve Rothenberg
Re: [icnrg] New Version Notification for draft-ho… 홍정하
Re: [icnrg] New Version Notification for draft-ho… 홍정하
Re: [icnrg] New Version Notification for draft-ho… Konstantinos Katsaros