Re: [Roll] Semantics of DAO ACK

Joakim Eriksson <joakime@sics.se> Fri, 02 October 2015 15:39 UTC

From: Joakim Eriksson <joakime@sics.se>
In-Reply-To: <560EA05A.3080002@fbk.eu>
Date: Fri, 02 Oct 2015 17:39:37 +0200
To: Routing Over Low power and Lossy networks <roll@ietf.org>
X-Mailer: Apple Mail (2.2104)
Archived-At: <http://mailarchive.ietf.org/arch/msg/roll/fv9TcvyId08gBCOp0hRhcFs1N4w>
Subject: Re: [Roll] Semantics of DAO ACK

From my perspective we really need the end-to-end ACK - the single-hop ACK is
of no use at all if you aim at building scalable RPL networks that have bi-directional communication.

We have worked with this for quite a while at Yanzi Networks and I would say that if we get a deadlock
somewhere, that is probably an indication that the selected path/parent was a bad one and that the node
should look for alternatives. If we had instead gotten a single-hop DAO ACK, the node would
assume connectivity but be wrong, and some other mechanism (usually the application) would need to kick in
and trigger repairs. For me this is not the wanted behaviour, and I would rather take the risk of one or a few
(short-term) deadlocks if a node on the path towards the root reboots or is otherwise gone.
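
To make the difference concrete, here is a minimal sketch in C of when a node
would see a positive DAO ACK under the two interpretations. It is only a toy
reachability model, not Contiki code: `struct node`, `enum ack_mode` and
`dao_acked` are names made up for illustration, and a "dead" node stands for
one that has rebooted or is otherwise gone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each node knows only its preferred parent. */
enum ack_mode { ACK_ONE_HOP, ACK_END_TO_END };

struct node {
  int  parent; /* index of the preferred parent, -1 for the root */
  bool alive;  /* false if the node has rebooted or is gone */
};

/* Returns true if node `src` would receive a positive DAO ACK.
 * Assumes all parent indices are either -1 or valid indices into `net`. */
bool dao_acked(const struct node *net, size_t n, int src, enum ack_mode mode)
{
  int hop = net[src].parent;
  size_t steps = 0;

  if (hop < 0) {
    return true; /* the root has nobody to register with */
  }
  if (mode == ACK_ONE_HOP) {
    /* The parent ACKs as soon as it has stored the route itself. */
    return net[hop].alive;
  }
  /* End-to-end: the ACK only travels back down if every node on the
   * path up to the root accepted the DAO. The step counter guards
   * against a (mis)configured parent loop. */
  while (hop >= 0) {
    if (!net[hop].alive || ++steps > n) {
      return false; /* no ACK arrives; the sender times out */
    }
    hop = net[hop].parent;
  }
  return true;
}
```

With a chain 3 -> 2 -> 1 -> 0 (root) where node 1 is gone, node 3 gets a
one-hop ACK from node 2 but no end-to-end ACK, which is exactly the signal it
needs to start looking for another parent.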

Of course, combining DAO ACK (end-to-end) with infinite-lifetime routes is not an optimal configuration
if there are many reboots and deadlocks. That is why we try to keep the route lifetime reasonably short
(30 minutes) so that any routes that ended up being “stuck” somewhere will be garbage collected after
a while.
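
As a rough sketch of that garbage collection, assuming a simple fixed-size
route table and a monotonic clock in seconds (the names `struct route`,
`route_refresh` and `routes_purge` are hypothetical and not the Contiki API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROUTE_LIFETIME_S (30u * 60u) /* 30 minutes, as mentioned above */

struct route {
  bool     used;
  uint32_t expires_at; /* seconds on an assumed monotonic clock */
};

/* Called whenever a DAO (re)registers the route. */
void route_refresh(struct route *r, uint32_t now)
{
  r->used = true;
  r->expires_at = now + ROUTE_LIFETIME_S;
}

/* Drops every route whose lifetime has run out and returns how many
 * were removed. The signed difference keeps the comparison correct
 * even if the 32-bit clock wraps around. */
size_t routes_purge(struct route *table, size_t n, uint32_t now)
{
  size_t removed = 0;

  for (size_t i = 0; i < n; i++) {
    if (table[i].used && (int32_t)(now - table[i].expires_at) >= 0) {
      table[i].used = false;
      removed++;
    }
  }
  return removed;
}
```

A route that got "stuck" because its DAO ACK never made it back is simply not
refreshed, so it disappears at the next purge after 30 minutes.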

Best regards,
— Joakim Eriksson, SICS


> On 02 Oct 2015, at 17:18, Csaba Kiraly <kiraly@fbk.eu> wrote:
> 
> Hi Pascal,
> 
> The core observation behind the end-to-end, or delayed ACK (at least on my side, I don't know for Joakim) is that while the one-hop NACK contains important information, the one-hop ACK is not really what you are interested in. In fact, implementations I've checked either don't even ask for it, or if they do, they still don't do anything on a positive ACK. If instead you delay ACKs, or in other words wait for all the parents to ACK, this ACK or NACK brings in an important piece of information.
> 
> Notice that if your immediate parent NACKs, it does so immediately under both interpretations. The feedback differs only where things go well all the way, or go well up to some point on the recursive path but fail after it. In the first case, you get an ACK and you know you are reachable. In the second, you learn about the problem and can act on it. Convergence is surely something that needs to be studied carefully, but I think delayed ACK will not create more issues than what you already have with the one-hop ACK.
> 
> The point that I tried to raise when I brought this up, but maybe it wasn't clear, is that I think both are conformant to RFC6550. Which, if I'm correct, brings me to the point that clarifications might be needed about ACKs in an RFC to avoid interoperability problems, and I think it is needed independent of whether the end-to-end context is a good idea or not. You could even think of sending both types of ACKs, but that would bring the implementation out of RFC6550.
> 
> Cheers,
> Csaba
> 
> On 02/10/15 15:04, Pascal Thubert (pthubert) wrote:
>> Hello Joakim:
>> 
>> What I read here boils down to a limited diffusion algorithm. We tried to avoid that in RPL because any movement within the convergence would cause a deadlock. IOW if every node waits for (all of) the parents to ack, then this recursively takes you to the root, and if any node in the chain moves or dies, the diffusion will not be complete.
>> 
>> Cheers,
>> 
>> Pascal
>> 
>>> -----Original Message-----
>>> From: Roll [mailto:roll-bounces@ietf.org] On Behalf Of Joakim Eriksson
>>> Sent: jeudi 1 octobre 2015 22:11
>>> To: Routing Over Low power and Lossy networks <roll@ietf.org>
>>> Subject: Re: [Roll] Semantics of DAO ACK
>>> 
>>> 
>>>> On 01 Oct 2015, at 21:03, Csaba Kiraly <kiraly@fbk.eu> wrote:
>>>> 
>>>> 
>>>> 
>>>> On 30/09/15 17:55, Joakim Eriksson wrote:
>>>>>> On 30 Sep 2015, at 06:44, Csaba Kiraly <kiraly@fbk.eu> wrote:
>>>>>> 
>>>>>> Hello Joakim,
>>>>>> 
>>>>>> I have also worked on the Contiki DAO-ACK code, enabling ACKs,
>>>>>> implementing fixes to the DAOSequence handling, and looking into
>>>>>> multiple targets.
>>>>> Nice, I have a PR on Contiki now with what we have been doing to get
>>>>> Contiki RPL more scalable (but still only single targets / paths).
>>>>>> What Cenk is saying sounds a reasonable hack, but the standard itself
>>>>>> is in my opinion a bit underspecified for the semantics of DAO-ACK
>>>>>> messages in several ways.
>>>>> Yes, it is underspecified. There is need for clarifications and more
>>>>> details on the DAO / DAO ACK!
>>>>>> My preferred solution for ACKing DAO messages with multiple targets
>>>>>> would be to have support for the same semantics that you would have
>>>>>> had with one DAO message per Target, i.e., I would prefer an option
>>>>>> field that gives an individual status for each Target.
>>>>> Yes, but that would require more memory in the sending node to keep
>>>>> track of things or full specification of the target in the response.
>>>>> I guess it might be solved by allowing multiple DAO ACKs for the same
>>>>> DAO and to have the Target options included in the DAO ACK.
>>>> This is a compromise to consider for sure. If you look at my
>>>> implementation, you can see that I use the routing table entry to match
>>>> the DAO ACK, and I keep only DAOSequence (SEQ) as extra state. I'm not
>>>> sure it would work in all cases, but it did work for the case I
>>>> considered. I'm keeping only the SEQ, and I think this is a must if you
>>>> have no aggregation, and you want to match in case you forward multiple
>>>> DAOs before getting the ACK or timing out on it. In fact, simply
>>>> enabling ACKs in the original code was messing up the match between
>>>> DAOs and their ACKs because it wasn't even keeping track of the DAO
>>>> sequence numbers.
>>>> If you do aggregate ACKs, or you have more complex Target+Transit
>>>> scenarios, the game changes, and I don't know what would be the balance
>>>> between state you have to keep anyway and state you keep just to be
>>>> able to interpret a later ACK correctly. I think it is implementation
>>>> specific, so my suggestion would be a flag to indicate whether you
>>>> request full ACK (bumping back all T+T info for rejected entries in the
>>>> DAO) or a much smaller partial ACK with SEQ and similar IDs only. This
>>>> flag (let's call it F for now) could go right next to K. As the DAO
>>>> propagates, each node could set F based on whether it is able to store
>>>> and/or reproduce info (F not set) or chooses to remain stateless
>>>> (F set). This would exclude aggregation in the DAO-ACK sent down, but
>>>> still allow for aggregation in the DAO sent up.
>>>>>> To give another example of subtle problems with DAO-ACK, it was not
>>>>>> clear to me whether I want my implementation to ACK when the Target
>>>>>> is added to the routing table, or only when this node itself receives
>>>>>> an ACK for the same Target from its parent. Both make sense, with the
>>>>>> former giving a quick one-hop ACK, while the latter works as an
>>>>>> end-to-end ACK, ensuring that the path is actually built. Both look
>>>>>> conformant to the standard, but I suppose the original author was
>>>>>> thinking of the former, and I can easily see interoperability
>>>>>> problems arising between two implementations using different
>>>>>> semantics.
>>>>> We did go for the end-to-end ACK to achieve better scalability. There
>>>>> is to me no point having a route to the parent if it is not possible
>>>>> to get it all the way to root. But I totally agree - this is not
>>>>> obvious from the RPL RFC either.
>>>> Do you mean end-to-end ACK as in non-storing, or end-to-end ACK in the
>>>> sense that it is still addressed to the next hop but you delay ACK till
>>>> you know your parent (and all its parents) ACKed?
>>> I mean end-to-end as in waiting for all the parents to ACK so that before
>>> ACKing it is known that the route is properly installed where it needs to be.
>>> 
>>>>> Do you have your Contiki code somewhere in the open-source?
>>>> I did some cleanup and pushed the clean part of the code to github:
>>>> https://github.com/cskiraly/contiki/tree/DAO-NACK
>>>> It is not yet rebased to the latest master, but it should apply. I
>>>> suppose describing it would be too off-topic for the list.
>>>> There is also the more experimental part of the code in a PR, if interested.
>>> Ok - I’ll take a look!
>>> 
>>> BTW: We have done a few deployments with our implementation using
>>> Yanzi Networks products - take a look at this video where 1000 nodes are
>>> deployed with Contiki RPL + our fixes and 20 neighbors and routes per
>>> node. Without our fixes it would not work, since Contiki RPL did not
>>> scale well beyond the number of neighbors and routes per node before
>>> these fixes.
>>> 
>>> http://www.yanzi.se/video.jsp?id=7
>>> 
>>> Best regards,
>>> — Joakim Eriksson
>>> 
>>>> Best regards,
>>>> Csaba
>>>>> Best regards,
>>>>> — Joakim
>>>>> 
>>>>>> Best regards,
>>>>>> Csaba
>>>>>> 
>>>>>> On 29/09/15 23:08, Cenk Gündogan wrote:
>>>>>>> Hello Joakim,
>>>>>>> 
>>>>>>> This is an interesting question and I also couldn't find any answers
>>>>>>> in RFC 6550.
>>>>>>> However, my thoughts on this are as follows:
>>>>>>> Since a sub-set of the announced RPL targets could have been
>>>>>>> accepted before filling up the routing table (for example), I would
>>>>>>> choose a status code between 1 and 127.
>>>>>>> I would expect a node to choose another parent if a more aggressive
>>>>>>> status code is received ([128-255]).
>>>>>>> But a full routing table can have free space again by the time the
>>>>>>> next or any subsequent DAO arrives,
>>>>>>> therefore I prefer a "mild rejection" with a status code of [1-127].
>>>>>>> 
>>>>>>> To give some feedback to the originator of the DAO, it might be
>>>>>>> sensible to copy the rejected RPL Target options from the affected
>>>>>>> DAO to the DAO-ACK, so that the originator is fully aware of which
>>>>>>> Target prefixes got rejected (and which ones got accepted,
>>>>>>> implicitly).
>>>>>>> I would choose this method, because it doesn't require the
>>>>>>> originator of the DAO to save any extra state about the DAO and its
>>>>>>> contents.
>>>>>>> Nonetheless, everything I wrote is non-conformant and I am also
>>>>>>> interested in the RPL experts' opinions and solutions.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Cenk
>>>>>>> 
>>>>>>> On 29.09.2015 21:44, Joakim Eriksson wrote:
>>>>>>>> Hello All,
>>>>>>>> 
>>>>>>>> I have spent quite some time on getting a more stable implementation
>>>>>>>> of DAO handling for Contiki RPL and I am currently looking into
>>>>>>>> DAO aggregation. But I realised that it is for me not 100% clear
>>>>>>>> what a node should do when it receives a DAO with several prefixes
>>>>>>>> to be registered but can only accept a sub-set of them. Should it be
>>>>>>>> a DAO_NACK in this case or is there any other way to handle that case?
>>>>>>>> If each would have been sent separately it is obvious that the
>>>>>>>> receiving node can do a NACK when the routing table is full and
>>>>>>>> therefore it is possible to get fine-grained answers. But with
>>>>>>>> aggregation of DAOs this is not the case.
>>>>>>>> Any ideas?
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> — Joakim Eriksson, SICS
>>>>>>>> _______________________________________________
>>>>>>>> Roll mailing list
>>>>>>>> Roll@ietf.org
>>>>>>>> https://www.ietf.org/mailman/listinfo/roll