Re: [Roll] Benjamin Kaduk's No Objection on draft-ietf-roll-efficient-npdao-12: (with COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Tue, 02 July 2019 16:40 UTC

Return-Path: <kaduk@mit.edu>
X-Original-To: roll@ietfa.amsl.com
Delivered-To: roll@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 085321205FB; Tue, 2 Jul 2019 09:40:34 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.2
X-Spam-Level:
X-Spam-Status: No, score=-4.2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 4CW-9clMtECo; Tue, 2 Jul 2019 09:40:31 -0700 (PDT)
Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id DE28E1204DC; Tue, 2 Jul 2019 09:40:30 -0700 (PDT)
Received: from kduck.mit.edu ([24.16.140.251]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id x62GeF14001355 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 2 Jul 2019 12:40:18 -0400
Date: Tue, 02 Jul 2019 11:40:15 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: Rahul Arvind Jadhav <rahul.jadhav@huawei.com>
Cc: The IESG <iesg@ietf.org>, "draft-ietf-roll-efficient-npdao@ietf.org" <draft-ietf-roll-efficient-npdao@ietf.org>, Peter Van der Stok <consultancy@vanderstok.org>, "aretana.ietf@gmail.com" <aretana.ietf@gmail.com>, "roll-chairs@ietf.org" <roll-chairs@ietf.org>, "roll@ietf.org" <roll@ietf.org>
Message-ID: <20190702164014.GT13810@kduck.mit.edu>
References: <156150881090.31233.460341246895590440.idtracker@ietfa.amsl.com> <982B626E107E334DBE601D979F31785C5DF26D0C@BLREML503-MBX.china.huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <982B626E107E334DBE601D979F31785C5DF26D0C@BLREML503-MBX.china.huawei.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/roll/swv5iHlIwZTO4E5i9qUpiWkBIQg>
Subject: Re: [Roll] Benjamin Kaduk's No Objection on draft-ietf-roll-efficient-npdao-12: (with COMMENT)
X-BeenThere: roll@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Routing Over Low power and Lossy networks <roll.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/roll>, <mailto:roll-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/roll/>
List-Post: <mailto:roll@ietf.org>
List-Help: <mailto:roll-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/roll>, <mailto:roll-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 02 Jul 2019 16:40:34 -0000

On Mon, Jul 01, 2019 at 12:51:22AM +0000, Rahul Arvind Jadhav wrote:
> Thank you Benjamin for the detailed review and the comments.
> Please find my responses inline.
> 
> Thanks,
> Rahul
> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> I think that we need greater clarity about whether the DCOSequence number is just a series of monotonic (i.e., time-ordered) nonces (to be echoed back for matching request/response) or a full-on sequence counter that allows for loss detection as well as providing in-order delivery.
> It sounds like we just need the time-ordering and single-use properties, but I'm not entirely sure.  I wavered about making this a Discuss point but ended up not doing so since I'm not sure how much harm is being risked.  (I also mention this topic a couple times in the section-by-section comments below.)
> 
> [RJ] The DCOSequence is not expected to provide in-order delivery. The receiver
> is simply expected to echo back the DCOSequence whatever is received which
> informs the sender that the info is received. Even if received un-ordered the receiver is expected to just ack the rcvd DCO.

Okay, so it seems like the risk here would just be if a receiver
mis-implemented and expected sequential behavior.  I know Roman had also
mentioned this topic in his ballot, and I see you propose to make some
changes down in the Comment where I mentioned this; thanks!

> I agree with Barry that the Abstract is really hard to parse.
> 
> [RJ] Yes. We had updated the abstract as per Barry's suggestion.
> 
> Section 1.2
> 
>    RPL uses NPDAO messaging in the storing mode so that the node
>    changing it routing adjacencies can invalidate the previous route.
> 
> nit: "its routing adjacencies"
> 
> [RJ] Fixed
> 
>    This is needed so that nodes along the previous path can release any
>    resources (such as the routing entry) it maintains on behalf of
>    target node.
> 
> nit: singular/plural mismatch "nodes"/"it maintains"
> [RJ] changed "it maintains" to "they maintain"
> 
> Section 4.1
> 
>                                                             When node A
>    receives the regular DAO, it finds that it already has a routing
>    table entry on behalf of the target address of node D.  It finds
>    however that the next hop information for reaching node D has changed
>    i.e., node D has decided to change the paths.  In this case, Node A
>    which is the common ancestor node for node D along the two paths
>    (previous and new), should generate a DCO which traverses downwards
>    in the network.
> 
> I can't decide whether or not it helps readability to reiterate that in addition to creating the DCO, node A also does normal DAO processing (e.g., forwarding to the 6LBR).  I guess the example in A.1 does show this normal processing, so maybe it's overkill to also do so here.
> 
> [RJ] As you hinted, It would be better for the sake of completeness to add DAO processing point here as well. I have added a statement saying, "Node A handles normal DAO forwarding to
> 6LBR as required by the RFC6550."

Thanks!

> Section 4.2
> 
>                Transit Information Option should be carried in the DAO
>    message with I-flag set in case route invalidation is sought for the
>    corresponding target(s).
> 
> nit: this text as written implies thatthe I-flag is set in the DAO itself, not the TIO therein.
> 
> [RJ] Updated to say, "Transit Information Option with I-flag set should be carried in the DAO message when route invalidation is sought ..."
> 
> I'd also suggest to s/in case/when/ for clarity.
> 
> [RJ] Fixed as mentioned above.
> 
>    The common ancestor node SHOULD generate a DCO message in response to
>    this I-flag when it sees that the routing adjacencies have changed
>    for the target.  I-flag governs the ownership of the DCO message in a
>    way that the target node is still in control of its own route
>    invalidation.
> 
> nit: "The I-flag" (start of last sentence).
> 
> [RJ] Fixed
> 
> I'd further suggest rewording to something like "The I-flag is intended to give the target node control over its own route invalidation, serving as a signal to request DCO generation; in normal operation a DCO would not otherwise be generated"; the current text about "ownership" has some weird connotations/implications and this text also implicitly assumes that DAO/TIO/I-flag will never be maliciously generated.  It is also a little weaker about unsolicited DCO, per Section 4.5
> 
> [RJ] Have updated the text, " The I-flag is intended to give the target node control over its own route invalidation, serving as a signal to request DCO generation."
> I have avoided to say "in normal operation a DCO would not otherwise be generated" especially given the case of unsolicited DCO.

That's a good point (I hadn't quite gotten to unsolicited DCO when I  wrote
this comment).

> Section 4.3
> 
>    A new ICMPv6 RPL control message type is defined by this
>    specification called as "Destination Cleanup Object" (DCO), which is
> 
> nit: either "called" or "known as" or "referred to as" would be fine; "called as" is a grammatical mismatch.
> 
> [RJ] fixed. using "referred to as"
> 
>    DCOSequence: Incremented at each unique DCO message from a node and
>    echoed in the DCO-ACK message.  The initial DCOSequence can be chosen
>    randomly by the node.
> 
> What's the behavior if a sequence number is skipped?  (Why do we have a sequence number if we aren't going to detect and act on this condition?) Ah, I see Section 4.3.3, but perhaps a forward-reference is in order.
> 
> [RJ] Added a forward-reference.
> 
> Section 4.3.4
> 
> It seems that the "Reserved" field should be called "Flags", since a registry is being created for it.
> 
> [RJ] Yes, changed the "Reserved" field name to "Flags" in both the figure as well as the description below.
> 
> (I trust that the language about the D flag and DODAGID optionality from Barry's ballot thread is consistent between DCO and DCO-ACK.)
> 
> [RJ] Yes the DODAGID optionality statement is fixed in both the places as per Barry's suggestion. Thanks for reminding about the two places to fix.
> 
> Section 4.4
> 
>    1.  If a node sends a DCO message with newer or different information
>        than the prior DCO message transmission, it MUST increment the
>        DCOSequence field by at least one.  A DCO message transmission
>        that is identical to the prior DCO message transmission MAY
>        increment the DCOSequence field.
> 
> While reading up to this point I managed to confuse myself about Path Sequence (which must be consistent from DAO to DCO) and the separate DAOSequence and DCOSequence fields.  To check my (less confused) understanding, I guess if I could over-summarize, Path Sequence is like a generation counter for a given node's position in the routing topology, and the other two are for managing retransmission/ack of the respective update messages.  So if that mental model is correct, then there's not any value from trying to introduce a shared sequence number space for DCO and DAO, even though they are frequently going to be generated at the same time, especially since they have different recipients.  Right?
> 
> [RJ] The DAOSequence and DCOSequence are not dependent on each other at all.
> Yes, the DAO could be generated and DCO could be generated at the same time for
> different recipients.

Thanks for confirming.

> I do agree with the other discussion that we need clarity about whether the increment is exactly one or larger values are allowed (plus, presumably, whether the recipient should infer anything from a sequence number gap).  I do note that these are expected to be "lollipop sequence counters" per RFC 6550.
> 
> [RJ] This will be fixed i.e., we will make it explicit about the sequence
> counter operation and add a ref to the corresponding section in 6550.
> 
>    4.  A node receiving a unicast DCO message with the 'K' flag set
>        SHOULD respond with a DCO-ACK.  A node receiving a DCO message
>        without the 'K' flag set MAY respond with a DCO-ACK, especially
>        to report an error condition.
> 
> This seems redundant with Section 4.3's "A node receiving a DCO message without the 'K' flag set MAY respond with a DCO-ACK, especially to report an error condition."
> 
> [RJ] Yes it is redundant. But i feel it is better to have it in both places.
> The statements are consistent with each other.

Okay.  (I only mention it because we generally try to take note of such
redundancy; it's only a problem in practice when the two statements don't
quite agree with each other and there's no way to tell which one is
correct.)

> Section 4.4
> 
>    The scope of DCOSequence values is unique to each node.
> 
> recipient or originator?
> 
> [RJ] Changed this to, "The scope of DCOSequence values is unique to the node
> which generates it."
> 
> Section 4.5
> 
>    path on behalf of the target entry.  The 6LR has all the state
>    information namely, the Target address and the Path Sequence,
> 
> nit: comma before "namely".
> 
> [RJ] Fixed
> 
> Section 4.6.2
> 
>    Even with the changed semantics, the current NPDAO mechanism in
>    [RFC6550] can still be used, for example, when the route lifetime
>    expiry of the target happens or when the node simply decides to
>    gracefully terminate the RPL session on graceful node shutdown.
> 
> Er, what changed semantics?  This document does not have an Updates:
> relationship to any other document.
> 
> [RJ] I have reworded the section removing the statement, "Even with the changed
> semantics". Also (I believe) the section is more readable now.

Great :)

> Section 4.6.3
> 
>    Note that there is no requirement of synchronization between DCO and
>    DAOs.  The DelayDCO timer simply ensures that the DCO control
>    overhead can be reduced and is only needed when the network contains
>    nodes using multiple preferred parent.
> 
> This ("no requirement of synchronization") is because the benefit of DCO is in expiring routes faster than their normal expiration time to save local storage, rather than to provide synchronous route migration?  (It might be worth reiterating, if you want.)
> 
> [RJ] Sorry, but i didn't understand this. I didn't understand why the statement
> "the benefits of DCO is in expiring routes faster than their normal expiration
> time ...." should be mentioned in this context? NPDAO and DCO both are
> proactive route invalidation messaging techniques and they both help in expiring
> routes faster than their normal expiration times. DCO simply is more reliable
> due to its use on more reliable links.

Sorry for being unclear.  I was not proposing any text changes here, just
trying to check my understanding about "no requirement of synchronization".
But basically I think it's all just "both NPDAO and DCO are both ways to
expire cached information quickly when it becomes invalid, and if either
one works we have succeeded; we don't need both to reach any given node".

> Section 7
> 
>    This document introduces the ability for a common ancestor node to
>    invalidate a route on behalf of the target node.  The common ancestor
>    node is directed to do so by the target node using the 'I' flag in
>    DCO's Transit Information Option.  However, the common ancestor node
> 
> nit(?): there's perhaps some wordsmithing possible about "is directed to do so", given the next sentence and Section 4.5.
> 
> [RJ] I changed the statement, "The common ancestor node is directed to do so by
> the target node ..." to "The common ancestor node could be directed to do so by
> the target node ..."
> 
>    is also met.  Having said that a malicious 6LR may spoof a DAO on
>    behalf of the (sub) child with the I-flag set and can cause route
>    invalidation on behalf of the (sub) child node.
> 
> IIUC, such a malicious 6LR might also spoof a DAO even without this mechanism (to invalidate the "proper" Path Sequence) or otherwise cause denial of service by dropping traffic entirely, so perhaps we want to add another clause ", so this new mechanism does not present a substantially increased risk of disruption".
> 
> [RJ] Reworded and added new statements based on your suggestions to this para.
> 
>    This document assumes that the security mechanisms as defined in
>    [RFC6550] are followed, which means that the common ancestor node and
>    all the 6LRs are part of the RPL network because they have the
>    required credentials.  A non-secure RPL network needs to take into
>    consideration the risks highlighted in this section.
> 
> I'd consider adding "as well as those highlighted in [RFC6550]" to the end.
> 
> [RJ] Added the mentioned words to the end of the para.
> 
> Appendix A.1
> 
>    6.  Node G receives the DCO(tgt=D,pathseq=x+1).  It checks if the
>        received path sequence is latest as compared to the stored path
>        sequence.  If it is latest, Node G invalidates routing entry of
>        target D and forwards the (un)reachability information downstream
>        to B in DCO(tgt=D,pathseq=x+1).
> 
> This wording of "latest as compared to" feels unusual to me; I would have expected "is later than the stored path sequence" and "If it is later", but perhaps there is a convention here that I'm missing.
> 
> [RJ] I could not find a general convention here (after checking in 6550), so I
> changed it per your suggestion to use "later" in both the statements.
> 
> nit: "invalidates the routing entry"
> 
> [RJ] Fixed.
> 
>    9.  The propagation of the DCO will stop at any node where the node
>        does not have an routing information associated with the target.
>        If the routing information is present and its Path Sequence is
>        higher, then still the DCO is dropped.
> 
> nit: maybye reword to "If cached routing information is present and the cached Path Sequence is higher than the value in the DCO, then the DCO is dropped".
> 
> [RJ] This sounds lot better. Thanks for the text.
> 
> Appendix A.2
> 
> I feel like we should probably mention the DelayDAO timer as well as the DelayDCO one.
> 
> I think this is a side note, but it seems like the timer mechanism for DelayDAO (and by extension, DelayDCO) are a bit fragile, as one party has to wait for the full timeout before sending the message (e.g., N22 in this example) that the other party is waiting the timeout to receive (e.g., N11).  So it seems like we are still susceptible to transport delay/jitter and race conditions at some point in the network, even if it's not the next-hop of the target node.  But if that's a property of DelayDAO from RFC 6550, it doesn't really make sense to try to address it in this document (and it's also possible I misunderstand the situation).
> 
> [RJ] DelayDAO and DelayDCO are mechanisms to reduce the control overhead in the
> network. If an implementation chooses not to use it, it will still work ok,
> albeit with more control overhead. Yes the DelayDCO is modeled similar to
> DelayDAO and it is expected that a node simply waits for the timer to expire
> before it could start sending the corresponding message (DCO is this case, DAO
> in case of 6550). I am not fully sure if I completely understand what you mean
> by "susceptibility to transport delay/jitter" but i understand that there could
> be race conditions. If the race conditions happen the worst case problem would
> be slightly more control overhead. Race condition would not result in invalid
> states on any nodes.

I'm happy to trust that you've thought about the race conditions and they
don't cause any real problems.  For completeness, though, I was just
considering the case when the DAO from N22 to N11 arrives before/after the
timout fires on N11, whether N11 would then try to send a DCO down and what
would happen then.

Thanks for all the updates!

-Ben