Re: [Roll] Review of draft-ietf-roll-rnfd-02

Konrad Iwanicki <iwanicki@mimuw.edu.pl> Wed, 20 March 2024 12:22 UTC


Dear Anand,

Thank you very much for your feedback! Please find my replies inline. 
Whenever I talk about changes made in the draft, I am referring to its 
newly submitted version, 03.

On 16/11/2023 10:52, S.V.R.Anand wrote:
> 1) After the reading document, I am tempted to ask this question as to how
> come such a serious problem which can disrupt the service for several minutes
> got unnoticed till today after so many years of large scale RPL deployments
> across the world and lot of RPL related research work that had gone in. Surely some of
> those real-world deployments might have implemented their own proprietary
> methods to minimize the service disruption without any changes to RPL protocol.
> Having multiple LBRs for the purpose of combating LBR failures and load
> balancing is not new.

Actually, the problem was noticed more than 8 years ago :-) (see 
reference [Iwanicki16], which also introduces the original RNFD 
algorithm). It just took a while to study the algorithm more thoroughly, 
describe it in the draft, and so on.

Regarding alternative solutions, in some cases LBR failures may not be a 
major problem. In particular, in our experience, they were mostly due to 
power outages. Therefore, equipping an LBR with a sufficiently large 
battery (and an independent uplink) can already alleviate the problem. 
Even a small battery could help. For example, upon detecting that its 
power supply has switched to the battery and thus anticipating an 
impending crash, an LBR can reset its Trickle timer to poison all 
routes just before it goes down.

In deployments without a battery-backed power supply, LBR redundancy 
can help in turn, again probably as long as the different LBRs are 
in different power domains.
topology, the process of switching all nodes in the DODAG of a failed 
LBR to a still active LBR may take some time due to the phenomena 
described in the draft. Depending on the application, this delay may or 
may not be problematic.

In any case, power outages are not the only cause of LBR crashes. For 
some of them, even LBR redundancy may not be enough (e.g., a software 
bug that manifests itself on all replicas of an LBR). The nature and 
frequency of such failures likely depend on the particular solutions 
employed, though, so it is hard to generalize.

All in all, the RNFD algorithm is certainly not the only way of 
addressing the problem of LBR crashes. However, it is broadly 
applicable, does not make additional assumptions (e.g., LBR replication, 
power supply backup), and works largely independently of RPL, thereby 
not requiring major modifications of the protocol. In addition, the 
draft highlights the problem itself and explains the emergent RPL 
behavior that is its root cause, which may also be of value.

> Since LBR is straddling between wired and wireless worlds, proper functioning
> of the IoT network is essential for wireless sensor data producer and the wired
> sensor data consumer/actuator. This means both these end points can potentially
> detect LBR failures. While the RNFD document tackles the problem from the LLN
> side, one can use plethora of schema for quickly detecting LBR failure from the
> wired side deterministically at negligible cost. In the presence of multiple LBRs
> and the use of virtual roots one can reroute the packets to the other LBRs.

This idea falls among the aforementioned alternative solutions to the 
LBR crash problem. There are a couple of issues, though. First, having 
only
the wired side detect the failure does not account for scenarios in 
which only the (wired) link to the LBR is down rather than the LBR 
itself. In effect, from the LLN perspective, the LBR may be operating 
normally, in particular, correctly routing packets within the LLN, for 
instance, from battery-powered sensors to actuators with a tethered 
power supply. But in some applications this scenario may not be 
problematic. Second, in practice, failure detectors are never perfect. 
Consequently, detecting LBR failures even over wired links is still not 
deterministic and entails some nonzero probability of errors. Third, 
detecting LBR crashes by some node(s) is just one aspect. An equally 
important one is having all affected nodes (in the LLN) learn that the 
LBR is down. In this context, what is important in the above solution is 
that the multiple LBRs form a single virtual root rather than 
independent roots: upon detecting that one of them is down, the others 
can bump the DODAG Version, thereby triggering the construction of a 
DODAG that does not involve the failed LBR as part of the virtual root.

In any case, RNFD can be an orthogonal solution. In particular, in the 
scenario with multiple LBRs, either independent or as a single virtual 
root, the LBRs can monitor each other over the wired links, thereby 
acting as each other's sentinels in RNFD.

> I came across an interesting work that has been carried out in Inria research
> "RPL cooperation experiments using FIT IoT-LAB testbed". In particular, the
> paper titled "Sharing is caring: a cooperation scheme for RPL network resilience
> and efficiency". Here coordination happens between LBRs over wired network to
> detect LBR failures.  The other LBRs pass on the LBR failure information into
> the LLN. (Source: https://inria.hal.science/hal-02095410/file/papier.pdf).

Yes, I think I might have come across that work too. It is essentially 
about transparently implementing virtual roots composed of physical 
LBRs belonging to possibly different network operators. It focuses 
mainly on load balancing and network address translation, so that the 
LBRs indeed appear as a single root. However, the paper does not 
explicitly consider LBR failure detection. I think it also uses the RPL 
Instance in a way not fully compliant with the protocol. In any case, 
as explained above, RNFD can complement such solutions, helping to 
detect LBR failures also from the LLN side.

> Giving a brief summary of strategies adopted by IoT deployments that bring
> resilience to LBR failures may not be out of place. Also, RNFD protocol can
> take benefit from other schemes that are available.

Certainly, that is a great idea. I have added Section 1.3 to this end.

> 2) Assuming the time gap between border router failures due to power outages is
> long, it is not apparent from the document the energy consumption and its
> impact on the node/network lifetime due to the consensus process which is
> always running in the background. I know, it depends :) But it is good to
> include experimental results that indicate RNFD performance under different
> network scenarios.

Some experimental results are present in reference [Iwanicki16], notably 
Section VI.C and, to some extent, Section VII.B. More results, obtained 
with an independent implementation, can be found in reference 
[Ciolkosz19]. In short, if a DIO with the RNFD option does not require
more Layer-2 fragments than a DIO without the option, then the energy 
overhead due to the extra bits for the option is negligible. In such a 
case, the overhead is mostly due to extra DIO transmissions triggered by 
Trickle timer resets caused by changing the contents of the CFRCs. This 
happens, for instance, in the presence of flapping links with the root 
and may vary with the heuristics employed to assess whether a link is 
down and whether a node should be a Sentinel, that is, the way they 
exploit the trade-off between the accuracy of detection and the latency 
of making a decision. Therefore, yes, it all depends :-) but, per the 
aforementioned references, the overhead can be small.

> 3) Refer to section 5.2.  Detecting and Verifying Problems with the DODAG Root
> 
> - Apart from NUD and Layer 2 triggers, I think "overhearing" technique as a means
>    to infer the neighbors' state can help reducing the message exchanges and
>    faster LBR failure detection. What are your thoughts on this ?

Good point. I added a sentence explicitly mentioning overhearing in 
Section 5.2.

> - How does the RNFD work in the presence of RDC ? How is the causality
>    of information is maintained in such a situation ?

I assume that "RDC" stands for "radio duty cycling". While RDC can 
indeed exacerbate the problem of information causality in some 
algorithms, the problem may also occur without RDC, so this is a valid 
concern irrespective of RDC.

RNFD addresses the problem by employing state-based eventual 
consistency. The main points are given below in a nutshell.

The relevant state of a node, denoted S, in principle comprises the 
DODAG Version (maintained by RPL and available to RNFD) and the two 
CFRCs (maintained by RNFD); LORS is local and is never transmitted, so 
we need not consider it here. Let us denote a so-defined node state as a 
tuple S = (v, (pc, nc)), where v is the DODAG Version, pc is the 
PositiveCFRC and nc is the NegativeCFRC from the draft. Moreover, as an 
invariant we have that compare(pc, nc) always yields greater or equal, 
where compare(_, _) is the function from Section 4.1, returning either 
smaller, greater, equal, or incomparable.

The node states are partially ordered. More specifically, given two 
states S1 = (v1, (pc1, nc1)) and S2 = (v2, (pc2, nc2)), we have (a 
small sketch in code follows the list):
- if v1 < v2 or v1 > v2, then respectively S1 < S2 or S1 > S2;
- otherwise (v1 == v2):
   * if compare(pc1, pc2) yields smaller or greater, then respectively 
S1 < S2 or S1 > S2;
   * if compare(pc1, pc2) yields equal, then:
     + if compare(nc1, nc2) yields smaller or greater, then respectively 
S1 < S2 or S1 > S2;
     + if compare(nc1, nc2) yields equal, then S1 = S2;
   * otherwise, S1 and S2 are incomparable.
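
To make this ordering concrete, below is a minimal sketch in Python. It 
models a CFRC as a grow-only set of Sentinel identifiers, which is only 
an illustrative stand-in for the draft's actual compact encoding, but 
it satisfies the properties used here: merge(_, _) is idempotent, 
commutative, and associative, and compare(_, _) may yield smaller, 
greater, equal, or incomparable.

from enum import Enum

class Cmp(Enum):
    SMALLER = "smaller"
    GREATER = "greater"
    EQUAL = "equal"
    INCOMPARABLE = "incomparable"

def zero():
    # An empty set plays the role of zero() in this model.
    return frozenset()

def merge(c1, c2):
    # Set union: idempotent, commutative, and associative.
    return c1 | c2

def compare(c1, c2):
    if c1 == c2:
        return Cmp.EQUAL
    if c1 < c2:      # proper subset
        return Cmp.SMALLER
    if c1 > c2:      # proper superset
        return Cmp.GREATER
    return Cmp.INCOMPARABLE

def compare_states(s1, s2):
    # The partial order on node states S = (v, (pc, nc)) described above.
    (v1, (pc1, nc1)), (v2, (pc2, nc2)) = s1, s2
    if v1 != v2:
        # Lollipop-counter comparison simplified to plain integers here.
        return Cmp.SMALLER if v1 < v2 else Cmp.GREATER
    p = compare(pc1, pc2)
    if p in (Cmp.SMALLER, Cmp.GREATER):
        return p
    if p is Cmp.EQUAL:
        # Incomparable NegativeCFRCs also make the states incomparable.
        return compare(nc1, nc2)
    return Cmp.INCOMPARABLE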

Given this partial order, RNFD ensures that the state of a node never 
goes backward, that is, if St1 is the state of the node at time t1 and 
St2 is the state of the node at time t2 >= t1, then St1 <= St2. This is 
because:
- DODAG Versions adopted by the node can only grow (in the lollipop 
counter order but still);
- c_new_local := merge(c_old_local, c_recv) satisfies c_new_local >= 
c_old_local from the definition of merge(_, _);
- c_new_local := zero() is invoked only upon changing the DODAG Version 
to a greater one, so the resulting new state is greater than the old one 
anyway;
- both PositiveCFRC and NegativeCFRC are updated at the same time and 
always satisfy the invariant;
- merge(_, _) is idempotent, commutative, and associative;
- these are the only operations in RNFD that affect the node's state as 
defined above.

The causality of information is thus preserved because the node states 
are monotonic: the new state of a node either contains all the 
information from the old state or belongs to a newer DODAG Version.
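
Continuing the sketch, the update step below illustrates why the state 
can only move forward in this order. It is a simplification of the 
draft's actual rules (in particular, adopting a new DODAG Version 
involves more steps than just zeroing the CFRCs), and the function name 
is mine, not the draft's.

def on_dio_with_rnfd_option(local_state, recv_version, recv_pc, recv_nc):
    # Simplified update: merge same-Version information, restart on a newer Version.
    v, (pc, nc) = local_state
    if recv_version > v:
        # A newer DODAG Version: restart with zeroed CFRCs (a greater state anyway).
        return (recv_version, (zero(), zero()))
    if recv_version < v:
        # Stale information: ignored, so the state is unchanged.
        return local_state
    # Same Version: merging can only grow the CFRCs.
    return (v, (merge(pc, recv_pc), merge(nc, recv_nc)))

# A tiny monotonicity check with made-up Sentinel identifiers:
s_old = (1, (frozenset({1, 2}), frozenset({1})))
s_new = on_dio_with_rnfd_option(s_old, 1, frozenset({2, 3}), frozenset({2}))
assert compare_states(s_old, s_new) in (Cmp.SMALLER, Cmp.EQUAL)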

> - What happens if certain conditions result in a state that keeps toggling ?  I
>    think RNFD needs to have additional algorithms in place on top of link
>    failure detection schemes such as NUD.

A toggling link with the DODAG root, as mentioned previously, causes a 
Sentinel to repeatedly update its CFRCs. This can quickly saturate them 
(in the sense of saturated(_) from the draft), thereby necessitating 
bumping the DODAG Version and hence rebuilding the DODAG. Therefore, 
Sentinels should rather have stable
links with the DODAG root (as mentioned in the last paragraph of Section 
6.1 of the reviewed version 02 of the draft). In short, if a node, 
acting as Sentinel, observes that its link with the root keeps toggling, 
then it should probably step down to become Acceptor. If all Sentinels 
experience this, for instance, because there is some heavy noise around 
the DODAG root, then it may make sense to disable RNFD altogether, 
thereby trading off the ability to quickly detect the root’s crash for 
the stability of the DODAG. When the noise diminishes, RNFD can always 
be reenabled.
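
For illustration, a Sentinel could base the step-down decision on a 
simple flap counter like the one sketched below. This is only one 
possible heuristic; the names, window, and threshold are hypothetical 
and not part of the draft.

import time
from collections import deque

FLAP_WINDOW_S = 600          # assumed observation window, in seconds
MAX_FLAPS_IN_WINDOW = 4      # assumed tolerated up<->down transitions

class RootLinkFlapMonitor:
    # Tracks how often the link with the DODAG root toggles within a sliding window.

    def __init__(self):
        self._transitions = deque()

    def record_transition(self):
        now = time.monotonic()
        self._transitions.append(now)
        # Drop transitions that have fallen out of the window.
        while self._transitions and now - self._transitions[0] > FLAP_WINDOW_S:
            self._transitions.popleft()

    def should_step_down_to_acceptor(self):
        return len(self._transitions) > MAX_FLAPS_IN_WINDOW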

> 4) For minimizing false positives, and false negatives, we would expect the
>     Sentinel nodes to be running some internal algorithm that decide on the state
>     changes between “SUSPECTED DOWN” and “LOCALLY DOWN”. I suppose this algorithm
>     is important for the proper functioning of RNFD protocol. In order to maintain
>     consistency and robustness among different implementations, a reference
>     algorithm will be useful to be given in a separate section or appendix.

Yes, false positives and false negatives should be minimized, but also 
the latency of the detection should be considered. However, apart from 
this requirement, there is no reference algorithm. This is because the 
problem is highly correlated with the problem of assessing the quality 
of a link, and different Layer-2 solutions may employ different 
mechanisms to this end. For instance, with asynchronous MAC layers based 
on low-power listening with repeated frame transmissions, it may be 
sufficient not to receive acknowledgments for the full listening 
interval. For an always-on CSMA MAC, in turn, multiple transmissions and 
backoffs will likely be necessary. Yet other mechanisms may be more 
appropriate for TDMA MACs or wired networks. In short, depending on the 
underlying protocols, the exact mechanisms for detecting that a link is 
down are beyond the scope of RNFD.

It is worth emphasizing, however, that on top of the regular link 
status detection, additional heuristics can be employed for better 
confidence, and that the thresholds should preferably be made 
configurable. I have added a paragraph explaining this at the end of 
Section 5.2.
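
Just to illustrate the kind of Layer-2-dependent heuristic meant here 
(there is intentionally no reference algorithm in the draft), with a 
low-power-listening MAC a node might declare its link with the root 
locally down only once unacknowledged transmission attempts have 
covered a full listening interval plus a small confirmation margin. All 
names and values below are hypothetical.

LPL_WAKEUP_INTERVAL_MS = 1000   # assumed low-power-listening wake-up interval
EXTRA_FAILED_ATTEMPTS = 3       # assumed additional attempts for confidence

def link_locally_down(unacked_span_ms: int, consecutive_failures: int) -> bool:
    # True once failures span a full LPL interval and a few extra attempts.
    return (unacked_span_ms >= LPL_WAKEUP_INTERVAL_MS
            and consecutive_failures >= EXTRA_FAILED_ATTEMPTS)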

> 5) Refer to 5.3.  Disseminating Observations and Reaching Agreement
> 
> - How does one set RNFD_CONSENSUS_THRESHOLD value ? Is it constant across the
>    network or it depends on factors like network topology, traffic pattern and so on ?
>    Also, does this number remain fixed for the entire lifetime of the network or needs
>    to be adapted over time ?

The value of RNFD_CONSENSUS_THRESHOLD was meant to be constant across 
the network. Nevertheless, the protocol would work even if the value 
were different for different nodes. Normally, such a setting would make 
little sense, though: it would simply mean that specific nodes (those 
with lower values) could make the global decision faster than others.

The value could depend on the network topology; especially in very 
sparse networks, one might like more precise control over how many 
Sentinels should report failures of their links before the root is 
considered down. However, it is mostly a trade-off between how 
aggressive the detection of a root failure should be and how much one 
is willing to risk rebuilding the DODAG upon a false-positive decision. 
The default value sits right in the middle.
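
As a rough numeric illustration (reusing the set-based CFRC model 
sketched earlier), assume the global decision fires once the Sentinels 
counted as having lost their link with the root (NegativeCFRC) reach 
the given fraction of all counted Sentinels (PositiveCFRC); the 
normative rule and the default value are those of Section 5.3 of the 
draft.

def root_globally_down(pc, nc, threshold):
    # Illustrative decision rule: the 'down' fraction reaches the threshold.
    counted = len(pc)
    return counted > 0 and len(nc) >= threshold * counted

pc = frozenset(range(10))                # 10 Sentinels counted overall
nc = frozenset(range(4))                 # 4 of them report their root link as down
print(root_globally_down(pc, nc, 0.5))   # False: a middle-ground threshold waits longer
print(root_globally_down(pc, nc, 0.3))   # True: a more aggressive setting decides already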

The value may remain fixed for the entire lifetime of the network or 
may be dynamically adapted if necessary (by mechanisms beyond RNFD). 
The adaptation is
possible thanks to the previously mentioned fact that the algorithm 
would work even when different nodes had different values (e.g., 
temporarily, when new values are being propagated).

> - A curious question. I would assume that even if just one node indicates that
>    LBR is UP at any time through PositiveCFRC, then LBR can be considered up at
>    that time. Isn't it ?  This precise time epoch is what should be considered in
>    deciding the liveliness of the LBR. In the absence of the notion of time, the
>    question of "latest" information does not arise. Is there a risk of a wrong
>    conclusion, especially if the number of good links are less ?

Nice question. This assumption would be correct if we discounted 
scenarios in which ordinary nodes, and specifically Sentinels in RNFD, 
may also fail. RNFD is designed to detect DODAG root crashes even in 
such scenarios: having RNFD_CONSENSUS_THRESHOLD below 1 allows ignoring 
a fraction of Sentinels that may be down together with the root yet 
reported earlier that the root was up (a report that will potentially 
never be updated).

> There is a small typo in 3.2. Counters and Communication. "acros" to be changed to "across"

Corrected.

Once again, thank you. The comments were really thought-provoking, and 
I am sure the resulting corrections have improved the quality of the 
draft.

Best,
-- 
- Konrad Iwanicki.