Re: [Roll] Review request for draft-clausen-lln-rpl-experiences-11

"Pascal Thubert (pthubert)" <pthubert@cisco.com> Thu, 17 May 2018 14:20 UTC

From: "Pascal Thubert (pthubert)" <pthubert@cisco.com>
To: Routing Over Low power and Lossy networks <roll@ietf.org>
Date: Thu, 17 May 2018 14:20:17 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/roll/FtJHtCDHqhxzHzYExde17wAMwfk>

Hello Ines,

Some random comments on quoted passages; I did not go through a deep review. Note that the title does not seem to fit, since the text does not describe an experience but mostly a partial analysis of the protocol.


        In non-storing mode, the
   resource requirements on the DODAG Root are likely much higher than
   in storing mode, as the DODAG Root needs to store a network graph
   containing complete routes to all destinations in the RPL instance,

“Much” needs to be defined. Arguably, the root need not store more than what it receives.
In storing mode, for each address in the network, the root saves a state associated with the adjacency, pointing to the next hop. This can be represented as the address of the next hop.
In non-storing mode, for each address in the network, the root saves a state associated with the node’s access link, pointing to the node’s parent. This can be represented as the address of the parent. So there is not much difference: if the root performs a recursive route lookup on each packet, no additional memory is needed.
On the side, a root that can afford it may use some memory for caching source route paths, but then, when a parent-child relation changes, additional work is needed to invalidate the cached paths that use the modified hop.
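
As an illustration of the memory argument above, here is a minimal Python sketch. It is not taken from the draft or from any implementation. It assumes the root keeps exactly one parent per node, as learned from non-storing DAOs, and the names (parent_of, source_route, route_cache) and node labels are hypothetical. The source route is rebuilt by a recursive walk for each packet, and the optional cache shows the invalidation work that a parent-child change implies.

```python
# Hypothetical non-storing root state: one entry per node, pointing to its parent.
ROOT = "root"
parent_of = {"a": ROOT, "b": "a", "c": "b", "d": "a"}

def source_route(dest, parents, root=ROOT):
    """Walk child -> parent links up to the root, then reverse the path."""
    hops = []
    node = dest
    seen = set()
    while node != root:
        if node in seen or node not in parents:
            raise LookupError(f"no route to {dest}")
        seen.add(node)
        hops.append(node)
        node = parents[node]
    return list(reversed(hops))

# Optional cache, for a root that can afford the extra memory.
route_cache = {}

def cached_route(dest):
    if dest not in route_cache:
        route_cache[dest] = source_route(dest, parent_of)
    return route_cache[dest]

def update_parent(child, new_parent):
    """A parent-child change invalidates every cached route through the child."""
    parent_of[child] = new_parent
    for dest in [d for d, hops in route_cache.items() if child in hops]:
        del route_cache[dest]
```

Without the cache, the per-node state is a single parent address, which is comparable to the single next-hop address kept in storing mode.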




   As the memory requirements for the DODAG Root and for other routers
   are substantially different, unless all routers are provisioned with
   resources (memory, energy, ...) to act as DODAG Roots, effectively if
   the designated DODAG Root fails, the network fails and RPL is unable
   to operate.

Sensible deployments that care about reliability use multiple roots. RPL is designed for that. The way things are worded here conveys a wrong impression.



   o  Networks in which all traffic is bi-directional, e.g., in case
      sensor devices in the LLN are, in majority, "actively read": a
      request is issued by the DODAG Root to a specific sensor, and the
      sensor value is expected returned.

The paper hints that RPL is not optimized for bidirectional traffic. This is wrong. RFC 6550 optimizes traffic to and from the root, and stretches device-to-device routes.
We now have AODV-RPL and route projection to fill that latter gap, but the doc does not mention them.



   o  A considerable control traffic overhead [bidir], in particular at
      and near the DODAG Root.  Given the low data rate of LLNs (between
      20kbit/s and 250kbit/s for 802.15.4, between 5 kbit/s and 40
      kbit/s for G3-PLC, depending on the types of modulation), it has
      potential to congest the channels.


Hmm; this again throws out ideas without any measurement, and without any mention of the efforts in RPL to balance that.

RPL refrains from (or is lazy at) advertising changes that do not affect the data traffic. This is why we have all these data-plane checks with the RPI.

In storing mode:

- Storing-mode DAOs aggregate individual routes, which reduces the chattiness by an order of magnitude.
- Changes that are absorbed by a common parent do not need to be reported to the root.

So the root sees more addresses, but sees them as more stable.
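
The absorption of a change by a common parent can be pictured with a small Python model. This is a simplified sketch under stated assumptions, not the RFC 6550 procedure: each router keeps a hypothetical table mapping a target to the child it is reached through, an update stops climbing as soon as a router's next hop for the target is unchanged, and cleanup of stale state on the old path (No-Path DAO) is left out. The topology and names are invented for the example.

```python
# Hypothetical storing-mode model: each router keeps target -> next-hop child.
parent_of = {"a": "root", "b": "root", "c": "a", "d": "c", "e": "c"}
tables = {n: {} for n in ["root", "a", "b", "c", "d", "e"]}

def advertise(target, via_parent):
    """Propagate reachability of `target` upward from its new parent.

    Stops at the first router whose next hop for the target is unchanged.
    Returns the list of routers whose table changed.
    Simplification: stale entries on the old path are not removed.
    """
    changed = []
    next_hop, router = target, via_parent
    while router is not None:
        if tables[router].get(target) == next_hop:
            break  # change absorbed here; ancestors need no update
        tables[router][target] = next_hop
        changed.append(router)
        next_hop, router = router, parent_of.get(router)
    return changed
```

In this model, a node x first attaching under d updates d, c, a and the root; when x later moves from d to e, only e and their common parent c change, and neither a nor the root sees the move.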



In any case, the chattiness is controlled by the Objective Function, that is, by how quickly it decides to make a change.



Compared to the protocols in the art that ROLL studied to start with, RPL is actually quite an improvement. An isotropic IGP would incur, near every node in the network, the cost that we see near the root. The key benefit of route stretch is that only the root pays that price. None of the above appears in the text, so again the perception left by incomplete information is the wrong one.





   o  Energy drain from the routers near the DODAG root, because they
      need to forward more traffic for other parts of the network.



True; this is why we now have additional features, which the group initially considered a lesser priority given the use cases.







6.  Fragmentation Of RPL Control Messages And Data Packet



This section discusses the 6LoWPAN problem of unreliable fragments in general. This is being addressed at 6LoWPAN and is not a RPL problem (see https://tools.ietf.org/html/draft-thubert-6lo-fragment-recovery). RPL is an L3 routing protocol that can adapt to, but does not depend on, a particular link type.

Then the paper discusses the size of the RPL messages. This part is correct, and we (as the 6LoWPAN + ROLL community) have always been ready to define a 6LoWPAN compression for RPL control packets. As it goes, no one has complained so far, so the work did not happen. OTOH, we did RFC 8138, but the draft fails to mention it.





   In addition to possible fragmentation, as occurs when using
   potentially long source routing headers over a medium with a small
   MTU



Same as above, a 6lo problem being handled, and missing ref to RFC 8138.







   In short, the mechanisms in RPL force the choice between requiring
   all routers to have sufficient memory to store route entries for all
   destinations (storing mode) - or, suffer increased risk of
   fragmentation (thus loss of data packets), while consuming network
   capacity by way of source routing through the DODAG Root (non-storing
   mode).



Fails to mention the work on DAO projection.







Address Aggregation and Summarization



RPL is used in ad hoc networks, and most typically inside a subnet. These are not environments where aggregation takes place. The whole section has little reason to exist. It seems to indicate that RPL is lacking a border router function, which has no reason to exist in the use cases considered.




Preliminary results from a test-bed of
   AMI (Automated Metering Infrastructure) devices using 950MHz radio
   interfaces, and with a total of 22 links, show that 36% of these
   links are unidirectional.  If a router receives a DIO on such a
   unidirectional link, and selects the originator of the DIO as parent,
   which would be a bad choice: unicast traffic in the upward direction
   would be lost.  If the router had verified the bidirectionality of
   links, it might have selected a better parent, to which it has a
   bidirectional link.



This is really far from the truth. In fact, RPL stipulates that the bidirectionality of a link must be validated before the link is used (Section 1.1 of RFC 6550). Also, one test bed does not represent the Internet. Later the paper recognizes that.

There was an effort on asymmetrical links, https://datatracker.ietf.org/doc/html/draft-thubert-roll-asymlink-02, but then again, no one showed up saying the work was actually needed for their use cases, so the work did not progress. If this finally turns out to be a real-world problem, then we are ready to address it.

With AODV-RPL, though, we are considering support for unidirectional links, although the need is still largely undetermined.





   [RFC6550] discusses some mechanisms which can (if deemed needed) be
   used to verify that a link is bidirectional before choosing a router
   as a parent.  While requiring one mechanism for bidirectional
   verification to be used, the document does not specify which method
   to be used, and how to be used.



This is because practitioners of the art know that the most common radios used in LLNs have a layer-2 acknowledgement built in. So it is simply impossible to unicast at L2 without bidirectional connectivity. For that reason, RPL does not include a mechanism that would be mostly unused, and leaves it to the device to assert bidirectionality the way it likes.



   This has as consequence that such L2 acknowledgements
   can only be used to determine if a given link is bidirectional or
   unidirectional once the router already has selected parents AND
   actually has data traffic to forward by way of these parents - in
   contradiction with RPL's stated design principle that require that
   the reachability of a router be verified before choosing it as a
   parent ([RFC6550], Section 1.1).



NUD is mostly for maintenance purposes. At join time, things are different.

And in order to participate in RPL, a node must have a global address to advertise, and a neighbor cache entry installed in the router, before it can do anything. That step uses unicast 6LoWPAN ND exchanges, which require bidirectional connectivity, and it happens before attaching to a parent. The doc omits that particular aspect, maybe because some open source implementation took its own shortcuts.

Other exchanges may have happened if the router serves as a join proxy for authentication, etc.





   A router may detect that its preferred parent is lost by way of NUD,
   when trying to communicate to the DODAG Root.  If that router has no
   other parents in its parent set, all it can do is wait: RPL does not
   provide other mechanisms for a router to react to such an event.



Wrong, RPL has the DIS mechanism for actively polling for parents.




   It is worth noting that RPL is optimized for upward traffic
   (multipoint-to-point traffic).  This is exactly the type of traffic
   where NUD is not applicable as a mechanism for detecting and reacting
   to connectivity loss.


Wrong, RPL has the same path optimization upwards and downwards. Only node-to-node (P2P) traffic is stretched. Note that NUD is adapted to the lazy reactivity of RPL. Whether it is upwards or downwards, the traffic will trigger NUD. I heard of a buggy open source implementation that failed to trigger NUD on downward traffic, and thus did not discover the loss of a child. Sorry, not compliant.





   Also, absent all routers consistently advertising their reachability
   through DAO messages, a protocol requiring bidirectional flows
   between the communicating devices, such as TCP or CoAP confirmable-
   acknowledgement exchange, will be unable to operate.

What’s the point? Is there such a thing as magical routing without an advertisement? The reader sees things piling up, though these are mostly empty comments.



   Finally, upon having been notified by NUD that the "next hop" is
   unreachable, a router must discard the preferred parent and select
   another - hoping that this time, the preferred parent is actually
   reachable.

RPL is meant to use a DAG. Preferred is not exclusive, and the conceptual preferred parent may be a set of nodes anyway. So it is not at all as if all traffic were dead from the moment the preferred parent starts failing until the failure is detected. A node can and should use all good parents for upward traffic, while preferring the preferred one, and maintain the metrics for all of them so as to keep the preference ordering current as the state of the links evolves.
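
A minimal Python sketch of that idea follows. It is an illustration only, not an RPL implementation: the Parent and ParentSet names, the single path_cost metric (lower is better) and the reachable flag are assumptions made for the example. The node keeps metrics for every parent, prefers the best one, and falls back to an alternate as soon as the preferred parent is marked unreachable.

```python
from dataclasses import dataclass

@dataclass
class Parent:
    address: str
    path_cost: float        # hypothetical metric, lower is better
    reachable: bool = True

class ParentSet:
    def __init__(self, parents):
        self.parents = list(parents)

    def ordered(self):
        """Usable parents, best metric first; the first one is the preferred parent."""
        usable = [p for p in self.parents if p.reachable]
        return sorted(usable, key=lambda p: p.path_cost)

    def next_hop(self):
        """Preferred parent if usable, otherwise the next best alternate."""
        candidates = self.ordered()
        return candidates[0].address if candidates else None

    def update_metric(self, address, path_cost):
        """Keep the metric current for every parent, not only the preferred one."""
        for p in self.parents:
            if p.address == address:
                p.path_cost = path_cost

    def mark_unreachable(self, address):
        """Called, for example, when NUD or an L2 failure reports the parent as lost."""
        for p in self.parents:
            if p.address == address:
                p.reachable = False
```

With two parents p1 (cost 2.0) and p2 (cost 3.5), next_hop() returns p1; after mark_unreachable("p1") it returns p2 without waiting for a new parent selection.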



   In order to accommodate the verbose exchange format, route stretching
   and source routing for point-to-point traffic, several additional
   Internet-Drafts are being discussed for adoption in the ROLL Working
   Group - adding complexity to an already complex specification which,
   it is worth recalling, was intended to be of a protocol for low-
   capacity devices.

When we write the Internet Standard, we’ll try to be better at editing. I can buy that the text makes the spec look more complex than it is.
Now there are professional and interworking implementations of RPL, which prove the whole section moot.
On the side, does someone reading this need to be reminded that RPL is for LLNs?



   The reason for RPL to repair loops only when detected by a data
   traffic transmission is to reduce control traffic overhead.  However,
   there are two problems in repairing loops only when so triggered: (i)
   the triggered local repair mechanism delays forward progress of data
   packets, increasing end-to-end delays, and (ii) the data packet has
   to be buffered during repair.

Double wrong. RPL does not repair *only* upon data-path detection; the lazy approach is a tradeoff to conserve energy that would otherwise be wasted fixing paths that are not used and may flap for just a few seconds. Depending on the environment, RPL may be tuned to send DIOs more or less frequently, upon which a node may lazily repair. Data-path detection only accelerates the lazy repair when it turns out that the broken path is actively used.
And then the data packet does not need to be buffered. Nothing prevents the node from buffering it for a little while if it can, but soon the packet will be retried or obsolete, and the network cannot know which, so discarding is a good strategy.
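
For readers less familiar with the data-path check, here is a simplified Python sketch based on my reading of the loop-detection rules in RFC 6550, Section 11.2. It is a sketch under assumptions, not normative code: the class and function names are hypothetical, ranks are compared as plain integers (the RFC compares DAGRank values), and updating the sender rank before forwarding is left out. A first rank inconsistency only flags the packet; a second one discards it and resets the Trickle timer, so no buffering is involved.

```python
from dataclasses import dataclass

@dataclass
class RplPacketInfo:
    down: bool          # 'O' bit: True when the packet is heading down the DODAG
    rank_error: bool    # 'R' bit: an inconsistency was already seen on the path
    sender_rank: int    # rank of the node that forwarded the packet

def check_rpi(rpi, my_rank):
    """Return (action, reset_trickle) for a received packet.

    action is "forward" or "discard".
    """
    inconsistent = (
        (rpi.down and rpi.sender_rank > my_rank) or      # going down, but came from below
        (not rpi.down and rpi.sender_rank < my_rank)     # going up, but came from above
    )
    if not inconsistent:
        return "forward", False
    if rpi.rank_error:
        # Second inconsistency on the same path: drop and trigger repair.
        return "discard", True
    rpi.rank_error = True   # first inconsistency: flag it and let the packet continue
    return "forward", False
```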

All in all, I’d say not ready for publication…

Pascal



From: Roll <roll-bounces@ietf.org> On Behalf Of Ines Robles
Sent: Wednesday, 16 May 2018 23:31
To: roll <roll@ietf.org>
Subject: [Roll] Review request for draft-clausen-lln-rpl-experiences-11

Dear all,

This Individual Submission depicts observations of RPL.

https://tools.ietf.org/html/draft-clausen-lln-rpl-experiences-11

Please let us know your opinion.

Thanks,

Ines and Peter
