[Roll] Border router failure detection

Konrad Iwanicki <iwanicki@mimuw.edu.pl> Mon, 02 March 2020 11:58 UTC

To: Routing Over Low power and Lossy networks <roll@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/roll/sedwAP83QqiBrKUmt9C8rLBeZ7Y>

Dear all,

I am new to the list, so apologies if I fail to follow some conventions.

In a nutshell, I would like to propose an extension to RPL designed to
significantly improve the handling of border router crashes. Since I
have little experience writing RFC-like drafts, I would greatly
appreciate any help.

Following Dominique’s advice, I browsed the topics discussed on the
ROLL list over the last two years and did not find any major overlaps.
At the same time, the extension seems to match the current charter of
the WG, notably with respect to reliability and manageability. Below I
thus try to briefly motivate and summarize the extension, so that you
can judge how relevant it is and so that, hopefully, somebody becomes
interested in helping to write it up.

The extension was inspired by numerous commercial LLN deployments that
we carried out for a range of industries. What we noticed in virtually
all cases is that the major cause of sensor data unavailability was
border router crashes. The explanation is rather intuitive. Border
routers typically rely on a tethered power supply, and in practice it is
often not economical, or even impossible, to provide them with a battery
backup. Therefore, power outages are a major problem. Another issue is
that border routers are more intricate, in terms of both hardware and
software, than low-power nodes. As such, they are also more likely to
malfunction.

What we observed, however, is that RPL does not efficiently handle 
crashes of border routers [1][2]. Upon such a failure, tearing down 
nonexistent upward routes can take a lot of time (depending on the 
data-plane traffic) and generate considerable control traffic, which is 
problematic in many applications.

To address the problem, we developed an algorithm, called RNFD, in which
nodes collaborate to monitor the state of the border router of the DODAG
they belong to [1]. Experiments with a TinyOS implementation of the
algorithm on two testbeds (32 nodes at 2.4 GHz and 76 nodes at 868 MHz)
and in simulations show that it can outperform bare RPL: it can detect a
border router crash one or two orders of magnitude faster and with much
lower control traffic [1].

To verify these results and find the best way of integrating RNFD with
RPL, we also produced an independent implementation of RNFD (for which I
did not write a single line of code) and integrated it with RPL-Lite
from Contiki-NG [3]. Experiments with that implementation on a network
of ~150 nodes at 2.4 GHz and in a different simulator again confirmed
the performance gains of RNFD described above [3].

I look forward to your comments and questions (and, hopefully, to
volunteers to help me with the draft). The references cited above are
listed below. I would also be happy to provide more details.

References:

[1] K. Iwanicki: “RNFD: Routing-Layer Detection of DODAG (Root) Node 
Failures in Low-Power Wireless Networks,” in IPSN 2016: Proceedings of 
the 15th ACM/IEEE International Conference on Information Processing in 
Sensor Networks. IEEE. Vienna, Austria. April 2016. pp. 1–12. DOI: 
10.1109/IPSN.2016.7460720

[2] A. Paszkowska and K. Iwanicki: “Failure Handling in RPL 
Implementations: An Experimental Qualitative Study,” in Mission-Oriented 
Sensor Networks and Systems: Art and Science (Habib M. Ammari ed.). 
Springer International Publishing. Cham, Switzerland. September 2019. 
pp. 49–95. DOI: 10.1007/978-3-319-91146-5_3

[3] P. Ciolkosz: “Integration of the RNFD Algorithm for Border Router 
Failure Detection with the RPL Standard for Routing IPv6 Packets,” 
Master's Thesis, University of Warsaw. November 2019.

Best regards,
--
- Konrad Iwanicki.