Re: [Roll] Border router failure detection

Konrad Iwanicki <iwanicki@mimuw.edu.pl> Tue, 20 April 2021 11:33 UTC

From: Konrad Iwanicki <iwanicki@mimuw.edu.pl>
To: Routing Over Low power and Lossy networks <roll@ietf.org>, Michael Richardson <mcr+ietf@sandelman.ca>
Date: Tue, 20 Apr 2021 13:34:49 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/roll/k2RK1zjaRPBEQqlBxsPEK6GS0yI>
Subject: Re: [Roll] Border router failure detection

Hi Michael,

Below, you can find my replies to your detailed comments, which I 
promised in my previous e-mail.

> It seems that you might want a term for the LBR's children.
> That is, the devices at rank "1", that hear the LBR's DIOs.

That would be useful. However, I am really bad at inventing names. Any 
ideas for a fitting one-word term?

Also, if you have better names for the two node roles, they may be worth 
considering because I am not perfectly satisfied with "sentinel" and 
"acceptor".

> I think that I would move some of section 3.2 further forward in the
> document.  I think that I need a gentler introduction to CFRCs here, and I
> don't really need to know the properties, rather I need a higher-level idea
> of things.    Since section 4 goes over the operations again, I would leave
> it for that spot, and make it a section 4.1.

True. Will do that.

> Having gone forward and back a bit, I'm still a bit uncertain how nodes
> assign themselves a bit... oh, self() in section 4 says "random".
> Why not make this a function (hash?) of the short-IPv6 address or something?

Hash could work. However, the choice of the random function allows a 
node to become sentinel, then give up, then become sentinel again, and 
so on, possibly multiple times. Having a hash based on the short-IPv6 
address would allow for only one such transition, unless for instance 
the node kept an additional counter that was hashed together with the 
address.
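A minimal sketch of the hash-plus-counter variant mentioned above, under 
stated assumptions: the function name, the use of SHA-256, the 4-byte 
counter encoding, and the 64-bit vector length are all illustrative and 
are not taken from the draft.

```python
import hashlib

def bit_position(short_addr: bytes, counter: int, num_bits: int) -> int:
    """Pick a bit index from the node's short address and a local counter.

    Hypothetical helper: incrementing `counter` each time the node becomes
    a sentinel again yields a fresh (pseudo-random) position, so the hash
    no longer restricts the node to a single transition.
    """
    data = short_addr + counter.to_bytes(4, "big")
    digest = hashlib.sha256(data).digest()
    # Reduce the first 8 bytes of the digest to an index in [0, num_bits).
    return int.from_bytes(digest[:8], "big") % num_bits

# Example: the same address maps to different positions as the counter grows.
addr = bytes.fromhex("a5a2")
positions = [bit_position(addr, c, 64) for c in range(4)]
```

The cost of this design is the extra counter that the node has to keep 
across transitions, which a plain random choice avoids.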

> Not every media has ACK frames at the L2 to establish that there are
> failures.  It might be worth putting the Detecting and the Verifying into
> separate sections.  Aside from the ANIMA case (which is usually pure ethernet),
> there are also situations where there is an ethernet backbone connecting a
> few 6LBRs (RFC8929), and your protocol would sensibly run on both the
> wireless and the wired side of the 6LBRs.

Indeed, RNFD can work with other link layers. L2 ACKs are only given as 
an example of an event that can be relevant to RNFD. In link layers 
without ACKs, results from probing or other relevant events (e.g., a 
medium disconnection event) can be used instead. I tried to keep the 
example list brief, but perhaps more examples could be given for other 
underlying technologies?

Likewise, I was actually considering separating detection from 
verification but they are related as the same mechanisms can be used for 
both purposes. Therefore, ultimately they ended up in a single section.

> I also wonder if the RNFD could be included in DAOs (particularly storing
> mode ones) sent to the DODAG root.
> I know that probably seems senseless: why tell the root that you are
> observing it to be dying....  But, it acts as interesting telemetry about
> what the nodes are seeing, and might serve as a useful indication of imminent
> failure, or some kind of systematic long-cycle pathology.

Actually, the RNFD Option can be embedded into any control/data messages 
for which the DODAG ID and DODAG Version can be inferred unambiguously. 
In particular, in multi-hop messages, the option could even be updated 
and acted upon at each hop, which would speed up information 
propagation. The draft focuses on DIOs and DISs as they are basic 
messages that are exchanged irrespective of the current DODAG condition 
and existence of particular paths. Do you think this should be mentioned 
somewhere?

BTW, the root, if alive, can tell from its local CFRCs that it is being 
observed and what the outcome of this observation is.

> Your IANA considerations are how the document will look after IANA has
> processed it.  Prior to that point, you need to write it as a request.
>
> Something like:
>
>    IANA is requested to allocate the value TBD1 from the "RPL Control Message
>    Options" sub-registry of the "Routing Protocol for Low Power and Lossy
>    Networks (RPL)" registry.
>
> I like to include the URL of the registry in my request to be really really
> clear, and to save everyone else the time to find it.

Thanks, I did not know how to formulate this. I will correct the section.

> Your security considerations will want to cite RFC7416.
> In particular, 7.2.4, and section 7.3.4 and 7.3.5 might be relevant.

Thanks for the pointer. I was not aware that such a RPL-oriented 
document exists. By gathering all major threats and possible 
countermeasures, which would otherwise have to be picked from various 
works in the literature, it seems extremely valuable for prospective 
RPL adopters.

As to the particular sections you mentioned, RNFD essentially relies on 
DIO and DIS messages. Since these messages are broadcast, sinkhole 
attacks (7.3.4) are of limited threat (to the RNFD itself): for any two 
nodes, if there exists any path in the network (comprising L2-induced 
communication links and non-compromised nodes), then eventually CFRCs 
from one of the nodes will reach (indirectly, through merging) the other 
node, irrespective of the sinkholes that may attract routed L3 traffic. 
Wormhole attacks (7.3.5) in turn would speed up the propagation of CFRC 
updates, so they actually can help RNFD.
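The propagation argument above can be illustrated with a toy simulation. 
It is only a sketch under assumptions: a CFRC is modeled here as a plain 
bit vector (an integer) whose merge is a bitwise OR, the topology and 
node names are invented, and each round stands for one exchange of 
broadcast DIOs over every L2 link.

```python
def propagate(links, state, rounds):
    """Merge bit vectors over undirected L2 links for a number of rounds.

    `links` is a list of (node, node) pairs; `state` maps node -> int.
    Merging is modeled as bitwise OR (an assumption of this sketch).
    """
    for _ in range(rounds):
        # Every node broadcasts the value it held at the start of the round.
        snapshot = dict(state)
        for a, b in links:
            state[a] |= snapshot[b]
            state[b] |= snapshot[a]
    return state

# A four-node line A - B - C - D; A and D each start with one bit set.
links = [("A", "B"), ("B", "C"), ("C", "D")]
state = propagate(links, {"A": 0b01, "B": 0, "C": 0, "D": 0b10}, rounds=3)
```

After three rounds (the hop distance between A and D) every node holds 
both bits. Routing state plays no part in the exchange, which is why a 
sinkhole that only attracts routed L3 traffic does not stop the updates.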

Therefore, in my view, the major threats are indeed those already 
mentioned in the draft, which are referred to as 
overclaiming/misclaiming in the RFC you pointed at (7.2.4). Looking 
again at the security section of the RNFD draft, I am leaning toward 
expanding it a bit. In any case, an advantage of RNFD is that its 
construction allows for detecting such attacks to some extent and 
disabling the algorithm if necessary (though under such attacks, RPL 
would likely have more problems on its own).

The preferred means of disabling and enabling the algorithm at runtime 
is actually one of the issues I would like to discuss with the WG but 
probably after I have addressed all other issues that you and Pascal 
have raised (I think Pascal's last e-mail is next).

Best regards,
-- 
- Konrad Iwanicki.