Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

Petri Jokela <petri.jokela@ericsson.com> Wed, 24 September 2014 15:23 UTC

Return-Path: <petri.jokela@ericsson.com>
X-Original-To: icnrg@ietfa.amsl.com
Delivered-To: icnrg@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1F06F1A01CB for <icnrg@ietfa.amsl.com>; Wed, 24 Sep 2014 08:23:02 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.2
X-Spam-Level:
X-Spam-Status: No, score=-1.2 tagged_above=-999 required=5 tests=[BAYES_50=0.8, HTML_MESSAGE=0.001, MIME_8BIT_HEADER=0.3, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id IGM0Doo0YL_3 for <icnrg@ietfa.amsl.com>; Wed, 24 Sep 2014 08:22:59 -0700 (PDT)
Received: from sesbmg22.ericsson.net (sesbmg22.ericsson.net [193.180.251.48]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id F0D651A0191 for <icnrg@irtf.org>; Wed, 24 Sep 2014 08:22:57 -0700 (PDT)
X-AuditID: c1b4fb30-f79736d0000053b8-21-5422e1cf8eaa
Received: from ESESSHC022.ericsson.se (Unknown_Domain [153.88.253.124]) by sesbmg22.ericsson.net (Symantec Mail Security) with SMTP id 7C.75.21432.FC1E2245; Wed, 24 Sep 2014 17:22:55 +0200 (CEST)
Received: from mail.lmf.ericsson.se (153.88.183.153) by smtp.internal.ericsson.com (153.88.183.86) with Microsoft SMTP Server id 14.3.174.1; Wed, 24 Sep 2014 17:22:55 +0200
Received: from [127.0.0.1] (nomadiclab.lmf.ericsson.se [131.160.33.3]) by mail.lmf.ericsson.se (Postfix) with ESMTP id 047DE1102A3; Wed, 24 Sep 2014 18:22:54 +0300 (EEST)
Content-Type: multipart/signed; boundary="Apple-Mail=_0619F0F2-2576-4C79-83BB-0D72C0DDF3CD"; protocol="application/pkcs7-signature"; micalg="sha1"
MIME-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
From: Petri Jokela <petri.jokela@ericsson.com>
In-Reply-To: <F8EFC212DF9A004DA18AA8FB011E42331D610CB8@SMTP1.etri.info>
Date: Wed, 24 Sep 2014 18:22:54 +0300
Message-ID: <8A63F6C6-4361-4105-B1AD-04B3DCE99F6E@ericsson.com>
References: <20140917002758.31365.13695.idtracker@ietfa.amsl.com> <F8EFC212DF9A004DA18AA8FB011E42331D610CB8@SMTP1.etri.info>
To: 홍정하 <jhong@etri.re.kr>, "icnrg@irtf.org" <icnrg@irtf.org>
X-Mailer: Apple Mail (2.1878.6)
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrDLMWRmVeSWpSXmKPExsUyM+Jvje75h0ohBlunWVrsnL2TyeLYzodM Fh96b7I5MHts/fyQ0WPnrLvsHpM3HmYLYI7isklJzcksSy3St0vgylj1fyJrQfMpxortrZ3M DYxT1jJ2MXJySAiYSOzq3scKYYtJXLi3nq2LkYtDSOAoo8TcN+fYIZwNjBKvf85nhHDWMUrc W3sMzGEWmMIo8XDXNrBZvAIGEse//2QBsYUFsiRmn3gM1M7BwSagJ3FiXSJImFPAXeLTssdM IDaLgKrE34UHoFrtJeZcnQTWKiRQI9F/fg8biC0iECJxed4HqFPlJT58OM4OUaMmMeXDA/YJ jAKzkJ0xC8kZIDazQJLEg64rTBC2tsSyha+ZIWxNif3dy1kwxTUkOr9NZIWwTSVeH/3ICGFb S8z4dZANwlaUmNL9kH0BI9cqRtHi1OKk3HQjI73Uoszk4uL8PL281JJNjMDoOrjlt8EOxpfP HQ8xCnAwKvHwLlBVChFiTSwrrsw9xCjNwaIkzrvw3LxgIYH0xJLU7NTUgtSi+KLSnNTiQ4xM HJxSDYxZk00/zr53UMzv9Mub5/99X3ojKu2WpwHLo5+c9t+dOfit7+jumbRYKfy0XNLcxTEn NGSuXGI6rb5YZkedTKrzgmdNuQIdx5yD5QsKzjTf1T64QOe/+s9fpupWW/cv2CI/pVrK9U8V J8Nvk8e9Yc/aLjFYfbp+dlKweNyJmLOpq1dGnC+UV5+vxFKckWioxVxUnAgAqBAARY8CAAA=
Archived-At: http://mailarchive.ietf.org/arch/msg/icnrg/SlaZAa54aQdhxElIskBCMS--gjU
Subject: Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt
X-BeenThere: icnrg@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Information-Centric Networking research group discussion list <icnrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/icnrg>, <mailto:icnrg-request@irtf.org?subject=unsubscribe>
List-Archive: <http://www.irtf.org/mail-archive/web/icnrg/>
List-Post: <mailto:icnrg@irtf.org>
List-Help: <mailto:icnrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/icnrg>, <mailto:icnrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Wed, 24 Sep 2014 15:23:02 -0000

On 18 Sep 2014, at 03:31, 홍정하 <jhong@etri.re.kr> wrote:

> Hi all,
>  
> The draft, draft-hong-icnrg-bloomfilterbased-name-resolution-01, has been submitted to the ID system.
> In this draft, we have tried to reflect few questions that we got from last meeting.
>  
> Please take a look and let us know if you have any questions or comments.
> I hope this draft could be discussed more in the interim meeting.
>  
> Thanks,
> Jungha Hong
>  


Hi, 

I have read the draft and I have some comments. I’m sorry, this e-mail is pretty long. If you think that I have miscalculated or made some erroneous assumptions, please correct me. So, here we go...

In this mail I try to figure out the feasibility of using Bloom filters as they are used in the original draft [3]. I would be happy to see that 1) my assumptions are too negative and / or 2) the filters do not have to be as large as I calculate here. It seems that the sizes of the filters will be huge in some environments, for example in a large corporate network.

Bloom filter matching provides an efficient way of verifying if an item belongs to a set or not, but false positive matches decrease the performance. Thus, I will concentrate on calculating the required size of the Bloom filter in the system with some assumptions of the number of nodes, number of ICN objects, and acceptable false positive rate. 

As the draft [3] mentions, there are 10^9 nodes in the current Internet. The number of addressable ICN objects will be several orders of magnitude higher. This means that there will be, on average, tens of thousands of ICN objects per node. How many of the objects will be in the NRS, is not known. It is possible that some of the objects are only inside a single node and some globally available. Also, depending on the role of a node, the number of objects may be lower or higher. 

The draft [3] envisions that the BF NRS is hierarchical and has multiple levels in a tree. It is not defined, how many nodes one NRS tree will serve. We can take an estimate, e.g. from a large corporate network where we would have, say, 200 000 nodes. Assuming that a single NRS serves e.g. 2 000 nodes, we need 100 NRS nodes. If we put them in a tree, we need more NRS nodes for higher levels. If each higher level NRS would serve e.g. 10 lower level NRS nodes, we would have 10 second level NRSs and one third level NRS. NOTE: I do not consider the peering possibility in this text, only the single NRS tree for this particular use case. 

If each of the nodes in the network has 10 000 ICN objects that they publish, a single 1st level NRS would maintain location information for 20 000 000 objects. It creates a BF out of these and sends this BF to the upper level NRS. The 2nd level NRS receives BFs from ten of the 1st level NRSes and combines them using logical OR operation. This means that the resulting BF contains information of 200 000 000 objects. 2nd level NRS can use this combined BF to verify if any of the NRSes below it has that particular ICN object information, but it can also match the object to all ten received 1st level BFs which gives more accurate information. In the first case, if there is a match in the combined BF, the NRS must also try matching to all the 1st level BFs to find the actual NRS that maintains the object location information. 

On the third level, the BFs from the ten 2nd level NRS are merged. The merged BF contains information of 2 000 000 000 objects. In theory, the 3rd level NRS can use this merged BF to check if a certain ICN object can be found in any of the 1st level NRSes in this tree. 

Now, the problem with BFs is the amount of false positives when the number of 1s in the filter increases. We can reduce the amount of false positives by selecting carefully the length of the filter m and the number of hash functions that are used for calculating the index values, namely k.

For each of the BFs in different levels, the amount of objects we have:

n1 = 2 * 10^7
n2 = 2 * 10^8
n3 = 2 * 10^9

If we aim at (roughly) 1% false positive rate, we get some possible values for m and k (from [2]):

m/n = 10, k=4: 

level 1: m = n1 * 10 = 20 * 10^7 (~ 200 000 000 bits ~ 200 Mbits)
level 2: m = n2 * 10 = 20 * 10^8 (~ 2 000 000 000 ~ 2 Gbits)
level 3: m = n3 * 10 = … ~ 20 000 000 000 ~ 20 Gbits

Increasing k to 8, targeting to 1% false positive ratio, we get m/n = 9 (actually, it does not change much the false positive ratio if we select any value for k from 4 to 8, see the table in [2]):

level 1: m = n1 * 11 = 18 * 10^7 (~ 180 Mbits)
level 2: m ~ 1.8 Gbits
level 3: m ~ 18 Gbits

1% false positives would mean that e.g. 1 000 000 queries would generate 10 000 false queries to the lower level NRSs. 

If we accept (roughly) 5% false positives, we can use m/n = 6 and k = 4. This would result in filter sizes:

level 1: m = n1 * 6 = 12 * 10^7 (~ 120 000 000 bits ~ 120 Mbits)
..
level 3: m = n3 * 6 = 12 * 10^9 (~ 12 000 000 000 bits ~ 12 Gbits)

In [1], BFs were suggested to be used in DNSSEC authentication. Not going into details of authentication, the author was suggesting to use 

k = 16, m = 40
k = 16, m = 32
k = 8, m = 40
k = 8, m = 32

to get low enough false positive probability. They were aiming at false positive ratios max 0.005%. In the NRS and flat name case, this low false positive ratio is not necessarily required, but I took this draft as an example because they were using 800 Mbit filters with k = 8 or 16. And, hopefully, they had kept in mind the performance for the verification process. I must admit that I don’t know how the verification can be implemented efficiently with such large filters.

We have to note that with Bloom filters, the size of the filter is always the same at all the levels. I.e. we have to take big enough filter so that it works also on higher levels in hierarchy. So, just quickly looking at this, it seems that we are dealing with huge Bloom filters, that are gigabits in size. 

Another issue is the transmission of the Bloom filters between NRSes. To keep the information as accurate as possible on different levels, we have to send updated Bloom filters pretty often to the upper level NRSes. How often this is done, depends naturally on the activity level of the nodes in the network, i.e. how often they publish new information items to the NRS or remove them.

So, we have some questions, e.g. 

- What is the acceptable rate for false positives?
- How big BFs are feasible to send and receive in the network? 
- The verification process should be fast, how can we efficiently match publications to filters with gigabit size?

[1] Bellovin S., “Using Bloom Filters for Authenticated Yes/No Answers in the DNS”, draft-bellovin-dnsext-bloomfilt-00.txt, December 2001, Expired
[2] http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
[3] Hong, J., Chun, W., Jung, H., “Bloom Filter-based Flat Name Resolution System for ICN”, draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

BR, Petri

-- 
Petri Jokela
Senior researcher
NomadicLab, Ericsson Research
Oy L M Ericsson Ab                  

E-mail: petri.jokela@ericsson.com
Mobile: +358 44 299 2413