Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

Dear Konstantinos,

I have a question on the idea of having additional BFs created when the capacity for a certain FPP is exceeded.

I wonder if you have thought that the total FPP would increase since there are multiple BFs per a single interface and FPP for each of them beside one is the maximum that it can be. In other words, if any a single BF among multiple ones per an interface hits the test, then request message will be forwarded into that interface. Thus, I think that we need to set a very low FPP for each small size BFs so the total
FPP would not exceed a desirable value.

Jungha Hong

From: icnrg [mailto:icnrg-bounces@irtf.org] On Behalf Of Konstantinos Katsaros
Sent: Saturday, September 27, 2014 10:12 PM
To: icnrg@irtf.org
Subject: Re: [icnrg] New Version Notification for draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

Hi,

If I may add 2c in this interesting discussion, though I do share the concerns expressed on the size of the Bloom Filters (BFs), I think the following basic assumption(s) does not necessarily hold:

"We have to note that with Bloom filters, the size of the filter is always the same at all the levels. I.e. we have to take big enough filter so that it works also on higher levels in hierarchy."

"If you are using same size filters on every layer to make it possible to OR them together, this requires huge filters because they must take into account also the worst case (i.e. the maximum number of information items that may be included in a single filter). "

I think the requirement here is rather to have a globally fixed BF size for merging to be possible. Then we could have additional BFs created when the capacity is exceeded: each node "fills" BFs until the FPR is reached; when merging, an object counter per BF can assist in finding the appropriate BFs to merge i.e., the ones that will not exceed the FPR. This calls for a bin packing algorithm with its own processing overheads. In this context, we could illustrate a tradeoff by considering the two following extremes:

(i) BFs of size enough to hold every object in the network. That would surely work for the topmost nodes. However, lower level nodes would propagate mostly very sparse BFs upwards, meaning that we unnecessarily spend too much space/bandwidth. Though admittedly a lookup operation on such large BFs could be an issue, note that in principle one BF should be checked per interface/neighbour node.

(ii) BFs of a very small size that would easily get "full" necessitating the use of an additional BF. In this case we would have "dense" BFs that would not waste too much space/bandwidth, but would require multiple lookup operations per interface/neighbour. I do not know however, whether multiple lookups of smaller BFs are (computationally) preferable compared to a single lookup of a huge BF, or if it makes no substantial difference.

Regards,
Konstantinos (Dinos) V. Katsaros

PS: Unfortunately I also do not participate the interim meeting. But, waiting for the minutes.

On 27 September 2014 08:17, Petri Jokela <petri.jokela@ericsson.com<mailto:petri.jokela@ericsson.com>> wrote:
Hi, thanks for the clarifications. Unfortunately, I am not attending the Saturday meeting. I hope you have fruitful discussions there. See my reply in-line:

On 27 Sep 2014, at 00:15, 홍정하 <jhong@etri.re.kr<mailto:jhong@etri.re.kr>> wrote:

Hi,

We can take advantage of that bloom filter size can be adjusted by the maximum number of names registered to a single NRS server to keep the FPP less than a desirable value.

For example, the bloom filter with the size of 2MB for 10^6 entries can keep the FPP less than 0.00046. If we assume that the number of such a NRS server on the top level is 10^6 and they are fully peered, then the maximum memory required at each server is 2TB. To deal with such a size, we are considering of being supported by hardware using GPU or TCAM and we are working on it.

If you want to merge BFs at higher level (by ORing them together), you must use same size filters on each level. Filters with different sizes cannot be merged. Just to clarify, are you assuming that you are using different sizes of filters on different places in the network, or are you using one-size filters which must be optimised for the worst case? If you have different sizes of filters, you have to collect the lower layer filters to the higher layer servers and when this higher level B-NRS receives a request, it must match the item to each of the 10^6 BFs separately. If you are

On the other hand, different sizes of filters on different B-NRS nodes would need a different set of hash functions for them. This is not a major thing, but not efficient because you have to do more hash calculations over the same item name to get correct values for different filters where we want to test the existence of the data item on a single server with multiple BFs from other B-NRSs.

If you are using same size filters on every layer to make it possible to OR them together, this requires huge filters because they must take into account also the worst case (i.e. the maximum number of information items that may be included in a single filter).

As the draft [3] mentions, there are 10^9 nodes in the current Internet. The number of addressable ICN objects will be several orders of magnitude higher. This means that there will be, on average, tens of thousands of ICN objects per node. How many of the objects will be in the NRS, is not known. It is possible that some of the objects are only inside a single node and some globally available. Also, depending on the role of a node, the number of objects may be lower or higher.

In B-NRS[3], it is not assumed that all names are globally visible. We are considering that the maximum number of names which are globally visible is 10^9 and the maximum number of B-NRS servers which are fully peered on the top level with 10^6 names for each is 10^3 ~ 10^6, which is the upper bound.

Now I must as that are you really talking about information items, not the node names. Considering the network today and what is available there, AND taking into account the new developments, such as IoT with lots of information, I guess that the amount of information items will be larger. If we have now 10^9 nodes in the network, your assumption would mean one information item published per one node (on average) globally, which is pretty low. Of course, not everything is globally visible, but still the amount looks low.

There is also a scalability issue. What happens if the number of items still grows in the future and the filter sizes must be adjusted accordingly? If increasing the size is enough, it is probably not a big problem. But we should prepare us also for something that we cannot envision now.

On the third level, the BFs from the ten 2nd level NRS are merged. The merged BF contains information of 2 000 000 000 objects. In theory, the 3rd level NRS can use this merged BF to check if a certain ICN object can be found in any of the 1st level NRSes in this tree.

It has to be corrected like that the 3rd level NRS checks for the 2nd level and the 2nd level does for the 1st level

Yes, true. What is assumed to be the highest level of NRSs, is three levels enough?

- What is the acceptable rate for false positives?
We are considering that FPP is less than 0.001 (i.e., 0.1%, 1000 out of 1 M entries).

Reasonable. However, this changes my original calculations, where I assumed slightly worse performance.

- How big BFs are feasible to send and receive in the network?
When bloom filters at each server are initially built, it will take a long time. However, once they are established, the insertion is done by each new name not the whole bloom filter.
On bloom filter updates for deletions, we are considering of refresh by shadow memory and the frequency can be from several hours to several days

Inserting items into a BF is not a problem, the problem comes when items should be removed and the whole BF must be recalculated as you say and the update frequency obviously depends on many things.

- The verification process should be fast, how can we efficiently match publications to filters with gigabit size?
With H/W assisted implementation using GPU, chunks including the corresponding bit set to 1 for hash value can be loaded to GPU memory and a child can be selected by bit-wise AND.

Ok, thanks! Although this probably requires some processing to find these chunks before matching can be done. But, as said, I don’t know much about the optimisations that you can do on lower layers.

BR, Petri

[1] Bellovin S., “Using Bloom Filters for Authenticated Yes/No Answers in the DNS”, draft-bellovin-dnsext-bloomfilt-00.txt, December 2001, Expired
[2] http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
[3] Hong, J., Chun, W., Jung, H., “Bloom Filter-based Flat Name Resolution System for ICN”, draft-hong-icnrg-bloomfilterbased-name-resolution-01.txt

--
Petri Jokela
Senior researcher
NomadicLab, Ericsson Research
Oy L M Ericsson Ab

E-mail: petri.jokela@ericsson.com<mailto:petri.jokela@ericsson.com>
Mobile: +358 44 299 2413<tel:%2B358%2044%20299%202413>

_______________________________________________
icnrg mailing list
icnrg@irtf.org<mailto:icnrg@irtf.org>
https://www.irtf.org/mailman/listinfo/icnrg