Re: [Din] I-D: The R5N Distributed Hash Table

Hi, Regarding locality in DHTs, there is "CoralCDN" (regions/clusters grouped by latency, if I remember correctly).
You can find several papers here:  
https://www.cs.princeton.edu/~mfreed/publications/
It was implemented and tested on PlanetLab years ago.

Best, Leandro.

> On 16 Sep 2022, at 23:39, Christian Huitema <huitema@huitema.net> wrote:
> 
> On 8/31/2022 4:00 AM, Schanzenbach, Martin wrote:
> 
>> Hi Christian,
>> 
>> thanks for checking the draft out.
> 
> And thanks for the replies. Sorry for this late answer, I was very busy with other projects.
> 
>>> On 29. Aug 2022, at 19:18, Christian Huitema <huitema@huitema.net> wrote:
>>> 
>>> 
>>> On 8/29/2022 5:18 AM, Schanzenbach, Martin wrote:
>>>> Hi dinrg!
>>>> 
>>>> I hope this email finds you all in good health.
>>>> 
>>>> We wanted to touch base with you to see whether any of you would be interested in our specification
>>>> on R5N, a protocol for randomised recursive routing for restricted-route networks. [1]
>>>> It is a distributed hash table protocol and we currently have two independent implementations.
>>>> The document is not finished but in a parseable state and feedback is very welcome either way.
>>>> 
>>>> This DHT can serve as a basis for GNS, for which there is a draft already quite far into the ISE process [2].
>>>> In fact, it is what our GNS implementations are using at the moment.
>>>> 
>>>> Thank you and have a nice day
>>>> Martin
>>>> 
>>>> [1] https://datatracker.ietf.org/doc/draft-schanzen-r5n/
>>>> [2] https://datatracker.ietf.org/doc/draft-schanzen-gns/
>>> The R5N protocol is built on the classic DHT, Kademlia. Starting with a well-known algorithm makes a lot of sense, but I wonder whether some more work is needed regarding locality and security.
>>> 
>>> The connectedness of the DHT relies on each node learning about its closest neighbors, where closest means lowest distance between peer IDs, not lowest distance across the network. As the reach of the DHT increases, the network distance between the closest neighbors increases -- the "closest" neighbors will be at random locations across the Internet. Each node will generate the network traffic required to maintain its closest-neighbor list, and the aggregate may end up being quite noticeable.
>>> 
>> Do you know of any actual data on that? I know this seems to be true intuitively, but the actual impact would be of interest for a discussion.
> 
> The Windows team did computations of that nature as part of the PNRP development. The key input is the number of nodes and the frequency of the "keep-connected" traffic for the "closest neighbor" set. You have to assume that the closest neighbors are all over the Internet, model the keep-connected traffic, and multiply by the number of nodes. It can become large quickly.
> 
>> 
>>> The dependency on closest neighbors may also be a security issue. Each node depends on the good behavior of its closest neighbors. Attackers may single out a set of target peer IDs, and spend some time generating peer IDs that are at a close distance to these targets. Once the attackers have secured a position in the list of closest neighbors of their target, they can mount a set of attacks, such as monitoring traffic to the target or denying service to the target.
>> Yes, that is true. But it is partially addressed by the random routing. And the monitoring issue is extremely difficult, if not impossible, to address at all if you want global routing over public networks.
>> 
>>> The monitoring problem goes beyond just the list of closest neighbors. If attackers create a sufficient number of nodes, they will be able to monitor and possibly disturb a fraction of the traffic. The random routing steps in R5N ensure that resolution starts from random points in the network, which has both good and bad effects. On the good side, it means that repeated queries will follow different paths, which is more robust than always following a fixed path. On the other hand, the randomness almost ensures that a fraction of the queries will hit the attackers' nodes.
>> Yes, that is correct. We should discuss that in the document. The attacker model needs to be defined and discussed properly anyway.
> 
> Suppose an attacker wants to be among the closest neighbors of a target. It can do that by picking keys at random until it finds one that is sufficiently close. The number of trials necessary will scale linearly with the number of nodes in the network -- 2^32 would likely be more than enough in practice. That's not a very large number of trials; many attackers can afford that.
> 
> The attacker will be inserted in the closest neighbors set. At a minimum, it will receive the "keep-connected" messages, and be able to track the target as it becomes active and moves to different IP addresses. Opportunistically, the attacker will be asked to relay queries to the target, and accumulate data about the target's "social graph".
> 
> Of course, powerful adversaries can repeat this process many times and track multiple targets.
> 
>>> Such issues are inherent with the classic DHT technology, but they may be solved in alternative technologies, as demonstrated by Freenet. Of course, this is your project, your choices. But it would be very nice to have some higher emphasis on locality, so that for example resolution of local names does not generate traffic outside the local network -- that would be better for privacy, and also more robust.
>> I do not understand either how that would work or why you think this is not the case for R5N.
>> Local nodes (direct neighbours) cache messages (and possibly even store if the replication level is high enough). So ideally this data is available at close nodes, not only on nodes close to the key.
>> I am not really sure what you are proposing as an alternative. Keys cannot be stored locally and at the same time be efficiently available to peers outside the local network. IMO we can have efficient global routing with O(log n) complexity or local-first routing with significantly more overhead for global routing.
>> If you have any concrete ideas or papers that we could read in this regard that would be very helpful.
> 
> Yes, there are ways to implement some amount of locality with a DHT. The DHT requires a hierarchy of caches, with the number of "levels" scaling with the size N of the network as O(log N). The first hop of a query is typically from the first level of that hierarchy, and it is generally possible to populate that level with neighboring nodes. But as the query progresses, there are fewer local candidates. Indeed, the "closest neighbors" level is spread all over the Internet.
> 
> But it is not just about network proximity, it is also about trust. Suppose that you have a way to express some amount of trust between nodes -- maybe the nodes that have been given a local pet name. You could tweak the DHT routing, or the cache selection, to preferably route queries through these trusted nodes. You would route those queries over the "graph of trust" instead of over the whole Internet.
> 
> I don't have a specific reference in mind, but you should probably look at the white papers and specifications of the Freenet project, which did develop this separation between trusted and open graphs.
> 
> -- Christian Huitema
> 
> 
> 
> _______________________________________________
> Din mailing list
> Din@irtf.org
> https://www.irtf.org/mailman/listinfo/din
--
Leandro Navarro
http://people.ac.upc.edu/leandro	 http://dsg.ac.upc.edu