Re: [Roll] Border router failure detection

Rahul Jadhav <rahul.ietf@gmail.com> Tue, 03 March 2020 10:45 UTC

From: Rahul Jadhav <rahul.ietf@gmail.com>
Date: Tue, 3 Mar 2020 16:15:18 +0530
Message-ID: <CAO0Djp3w4vWCOawQ+eegNTRzb_HRGYH6n=bdEH6iVf5ZO0AGFQ@mail.gmail.com>
To: Routing Over Low power and Lossy networks <roll@ietf.org>, iwanicki@mimuw.edu.pl
Content-Type: multipart/alternative; boundary="000000000000b1d080059ff100ca"
Archived-At: <https://mailarchive.ietf.org/arch/msg/roll/sqKMXXFZpBooT-2bSY5hzPKUiCk>
Subject: Re: [Roll] Border router failure detection

Welcome Konrad, and great to hear from you on ROLL,

I had access to your paper (through my office IEEE account) and have read the
problem statement and the proposed solution.

I have a few questions with regard to the problem statement before we dive
into the solution part.

Border Router (BR) crash/restart is an issue that any deployment needs to
tackle (IMO, in version 0.1). As I understand it, the aim of the paper is to
ensure that the attached nodes detect the BR failure as soon as possible. I
am curious how the nodes use this information in your deployments: what is
the use case for this detection?

In my deployments, my focus was to get the BR restarted as soon as possible
and then ensure that the nodes below it could rejoin the restarted BR. This
means that the BR needs to back up some state information to be used after
crash recovery.
In my scenario, by contrast, the necessary part was detecting 6LN/6LR
failure without depending on periodic pings from the nodes to a central
server. That is, if a non-BR node is stuck for some reason, the BR should
detect it as soon as possible (without depending on the route lifetime,
which could be very long) so that measures can be taken (e.g., sending
personnel to repair the smart meter). This is also non-trivial to handle.
For the BR, it is always possible to ping it on the external leg and check
availability. Also, BR unavailability means no traffic going out from any
node, and is thus easy to identify on the external monitoring system.
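
To make the distinction above concrete, here is a minimal sketch of how an
external monitoring system might tell the two cases apart from per-node
"last seen" timestamps. It is only an illustration, not taken from any
deployment: the function name classify_silence, the node_timeout parameter
and the node identifiers are all hypothetical, and a real system would also
have to account for nodes that legitimately send very rarely.

```python
from typing import Dict, List, Tuple


def classify_silence(
    last_seen: Dict[str, float], now: float, node_timeout: float
) -> Tuple[bool, List[str]]:
    """Return (br_suspected_down, silent_nodes).

    last_seen maps a (hypothetical) node id to the time its traffic was
    last observed on the external side of the border router.
    """
    # Nodes whose traffic has not been seen within the timeout.
    silent = sorted(
        node for node, seen in last_seen.items() if now - seen > node_timeout
    )
    # If every known node is silent, the common cause is most likely the BR.
    br_down = bool(last_seen) and len(silent) == len(last_seen)
    # When the BR is the suspect, individual nodes are not reported.
    return br_down, ([] if br_down else silent)
```

With this sketch, silence from all nodes at once points at the BR, whereas
silence from a single node points at a stuck 6LN/6LR, which is the harder
case because it cannot be confirmed by pinging the external leg.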

Also, as Michael mentioned, the working group has discussed some issues
with respect to reboot handling (on 6LN/6LR/BR), and they have been captured
in https://datatracker.ietf.org/doc/draft-ietf-roll-rpl-observations/
It would be immensely helpful to get your view on those points.

Best,
Rahul

On Tue, 3 Mar 2020 at 00:42, Michael Richardson <mcr+ietf@sandelman.ca>
wrote:

>
> Welcome!
>
> Konrad Iwanicki <iwanicki@mimuw.edu.pl> wrote:
>     > In a nutshell, I would like to propose an extension to RPL that had
>     > been invented to significantly improve handling crashes of border
>     > routers. Since I have little experience writing RFC-like drafts, I
>     > would greatly appreciate any help.
>
> Use the markdown method, and use someone's template github.
>
>     > What we observed, however, is that RPL does not efficiently handle
>     > crashes of border routers [1][2]. Upon such a failure, tearing down
>     > nonexistent upward routes can take a lot of time (depending on the
>     > data-plane traffic) and generate considerable control traffic, which
>     > is problematic in many applications.
>
> Rahul and Pascal (and others) have had a lot of conversation about how we
> deal with the various lollipop counters.  So I am interested in what your
> border router does when it boots: how does it announce the new DIOs?
>
>     > What we did to address the problem was developing an algorithm,
>     > called RNFD, in which nodes collaborate to monitor the state of a
>     > border router of the DODAG they belong to [1]. Experiments with a
>     > TinyOS implementation of the algorithm on two testbeds (32 nodes at
>     > 2.4GHz and 76 nodes at 868MHz) and in simulations show that it can
>     > outperform bare RPL: it can detect a border router crash one or two
>     > orders of magnitude faster and with much lower control traffic [1].
>
> okay.
>
>     > [1] K. Iwanicki: “RNFD: Routing-Layer Detection of DODAG (Root) Node
>     > Failures in Low-Power Wireless Networks,” in IPSN 2016: Proceedings
>     > of the 15th ACM/IEEE International Conference on Information
>     > Processing in Sensor Networks. IEEE. Vienna, Austria. April 2016.
>     > pp. 1–12. DOI: 10.1109/IPSN.2016.7460720
>
> Unfortunately, it's behind the IEEE paywall.
> I have given up on getting documents from the IEEE.
> I guess you have been working on this for at least five years now.
>
>     > [2] A. Paszkowska and K. Iwanicki: “Failure Handling in RPL
>     > Implementations: An Experimental Qualitative Study,” in
>     > Mission-Oriented Sensor Networks and Systems: Art and Science
>     > (Habib M. Ammari ed.). Springer International Publishing. Cham,
>     > Switzerland. September 2019. pp. 49–95. DOI:
>     > 10.1007/978-3-319-91146-5_3
>
>     > [3] P. Ciolkosz: “Integration of the RNFD Algorithm for Border
>     > Router Failure Detection with the RPL Standard for Routing IPv6
>     > Packets,” Master's Thesis, University of Warsaw. November 2019.
>
> --
> Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
>  -= IPv6 IoT consulting =-
>