Re: [Anima] An IOT DIR review of draft-ietf-anima-autonomic-control-plane

Toerless Eckert <tte@cs.fau.de> Thu, 10 May 2018 06:06 UTC

Date: Thu, 10 May 2018 08:06:36 +0200
From: Toerless Eckert <tte@cs.fau.de>
To: "Pascal Thubert (pthubert)" <pthubert@cisco.com>
Cc: iot-dir <iot-dir@ietf.org>, "ops-dir@ietf.org" <ops-dir@ietf.org>, "int-dir@ietf.org" <int-dir@ietf.org>, "anima@ietf.org" <anima@ietf.org>, "draft-ietf-anima-autonomic-control-plane@ietf.org" <draft-ietf-anima-autonomic-control-plane@ietf.org>
Message-ID: <20180510060636.gspxrd4d7duaksc7@faui48f.informatik.uni-erlangen.de>
References: <449b7e2f10094531b325919710696754@XCH-RCD-001.cisco.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <449b7e2f10094531b325919710696754@XCH-RCD-001.cisco.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima/eX5EEC-MfPbGigHu9hNqX6RRg48>
Subject: Re: [Anima] An IOT DIR review of draft-ietf-anima-autonomic-control-plane

Thanks, Pascal, sorry for the delay,

Comments inline.

Version:

https://raw.githubusercontent.com/anima-wg/autonomic-control-plane/2ae8f47399ae0d0811cb45209186d01f9e0d3077/draft-ietf-anima-autonomic-control-plane/draft-ietf-anima-autonomic-control-plane-14.txt

Diff to prior version (-14 for Joel Halpern)

http://tools.ietf.org/tools/rfcdiff/rfcdiff.pyht?url1=https://raw.githubusercontent.com/anima-wg/autonomic-control-plane/8b4436edaa720eadb5839120400fd1e89d3289b0/draft-ietf-anima-autonomic-control-plane/draft-ietf-anima-autonomic-control-plane-14.txt&url2=https://raw.githubusercontent.com/anima-wg/autonomic-control-plane/2ae8f47399ae0d0811cb45209186d01f9e0d3077/draft-ietf-anima-autonomic-control-plane/draft-ietf-anima-autonomic-control-plane-14.txt

Will commit as -14 when i am through with the other -13 feedback.

Cheers
    Toerless


On Mon, Feb 26, 2018 at 05:25:58PM +0000, Pascal Thubert (pthubert) wrote:
> Reviewer: Pascal Thubert on behalf of IOT-DIR;
> 
> I am an assigned IoT directorate reviewer for draft-ietf-anima-autonomic-control-plane-13.
> 
> These comments were written primarily for the benefit of the INT and OPS Areas Directors from the IoT perspective. Document editors and shepherd(s) should treat these comments just like they would treat comments from any other IETF contributors and resolve them along with any other Last Call comments that have been received. For more details on the IoT Directorate, please see https://datatracker.ietf.org/group/iotdir/about/ and for Directorates in general please see https://www.ietf.org/about/groups/directorates/.
> 
> 
> I'll be away for  the next 2 weeks and could not finish the review in time for this heavy document, but at least I made it through till the RPL section. In the interest of time, let me share what I already have.
> 
> 
> 
> Summary
> 
> -------------
> 
> The summary of the review is that the document is Ready for Publication, with comments.
> 
> 
> 
> 
> 
> Major comments
> 
> ------------------------
> 
> 
> 
> -          " in-band" and "out of band network "
> 
> should be defined since it is fundamental to understand that the ACP takes place in the same physical links as the data plane, as opposed to dedicated management ports (correct?);

Good point, done. (got a bit too long to paste here, so pls. check diff).

> 
> -          Section 3; the IOT certainly could use an ACP. It would be useful to scope the feature that is proposed in this document, whether it is compatible of not with constrained environments, whether it needs adaptations, point on Michael's enrollment draft. It would also be useful to indicate whether the ACP works between L3 bridges, IOW whether ACP operates the same (over IP) regardless of the packet forwarding layer in the data plane;

Not sure i understand the "point on Michael's enrollment draft".

I am happy to add pointers to variations of ACP design aspects for
informational purposes to show that/how it can be modified,
but Michael's drafts are, i think, all variations of the BRSKI design,
so the ACP would be completely unaffected by them, right?

Wrt. constrained devices and L2: i didn't want to touch section 3
for what you suggested because that is really a very formulaic
section going back to the charter 1 justifications of ANIMA and
matched with the three numbered use case explanations in the
introduction.

Instead i wrote text at the end of the introduction, now section 1.1.
This got longer than i hoped, but it was really a big missing piece
to pitch the ACP, give implementers context early on, and
remind them of the charter goal of reusing the best available existing
protocols/functions.

The text got longer specifically also because i did not want to fall
into the trap of making claims whether or not ACP is applicable to
specific IOT networks or (even worse) assuming IOT is always
constrained. Hence the mention of OT networks (IMHO a big part of
IOT, and often totally non-constrained), the explanation of why RPL,
and at the end a statement about constrained environments. There
is also a new paragraph in 10.8 about TCP/TLS vs. CoAP/DTLS as
a possible constrained environment variant in the future.

> -         " Inside the ACP VRF, each node sets up a loopback interface with its ULA IPv6 address"
> This is the first time, IPv6 is discussed; would have been nice to introduce that the Node has IPv6, that it needs a ULA and that ACP assigns it. This discussion could be done in conjunction with the comment above;

Actually, section 1. introduction already mentions that the ACP provides
IPv6 and that the stable-connectivity document describes how
to interoperate with IPv4 only OAM.

I added to the new 1.1 section (see above) the following paragraph,
which i hope is a useful context setting and pitch:

<t>The ACP uses only IPv6 to avoid complexity of dual-stack ACP operations (IPv6/IPv4). Nevertheless, it can without any changes be integrated into even otherwise IPv4-only network devices. The data-plane itself would not need to change, it could continue to be IPv4 only. For such IPv4 only devices, the IPv6 protocol itself would be additional implementation footprint only used for the ACP.</t>

> -         About  "The ttl
> parameter SHOULD be 3.5 times the period so that up to three
> consecutive messages can be dropped before considering an
> -           announcement expired. "
> 
> This is the only discussion on the ttl field of the M_FLOOD. Though its meaning is quite obvious, the behavior associated to it should be defined.

Added:

When a service announcer
using these parameters unexpectedly dies immediately after sending the M_FLOOD,
receivers would consider it expired 210 seconds later. When a receiver tries to
connect to this dead service earlier, it will experience a failing connection and
use that as an indication the service is dead and select another instance of the
same service instead.
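
A minimal sketch of the arithmetic behind the 210 seconds. It assumes a
60 second M_FLOOD period (inferred here from 210 / 3.5) and a ttl carried
in milliseconds; the function and constant names are illustrative only
and not taken from the draft.

```python
# Sketch only: names and the 60 s period are assumptions for illustration.
TTL_FACTOR = 3.5  # "The ttl parameter SHOULD be 3.5 times the period"


def flood_ttl_ms(period_s: float, factor: float = TTL_FACTOR) -> int:
    """Return the ttl of an M_FLOOD announcement in milliseconds."""
    return int(period_s * factor * 1000)


def expiry_time_s(last_flood_s: float, ttl_ms: int) -> float:
    """Time at which a receiver treats the announcement as expired,
    given the time the last M_FLOOD was sent."""
    return last_flood_s + ttl_ms / 1000


ttl = flood_ttl_ms(60)           # 210000 ms
expires = expiry_time_s(0, ttl)  # 210.0 s after the last M_FLOOD
```

A receiver that tries the service before that point simply sees the
connection fail and moves on to another instance.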

> -         "In the above (recommended) example the period of sending of the"
> Is this RECOMMENDED IOW normative??

Hmmm... Sure, why not. Less guessing/experimentation for this.
Modified to RECOMMENDED.

> -         Text P 25 says "At this time in the lifecycle of ACP nodes, it is unclear whether it
> is feasible to even decide on a single MTI (mandatory to implement)
> security association protocol across all ACP nodes"
> but then P27 "It MUST support ESP
> with AES256 for encryption and SHA256 hash and MUST NOT permit weaker
> crypto options."
> 
> and then "   A baseline ACP node MUST support IPsec natively and MAY support IPsec
> via GRE. A constrained ACP node MUST support dTLS.  ACP nodes
> connecting constrained areas with baseline areas MUST therefore
> 
> support IPsec and dTLS."
> 
> Seems that text P25 should go?

P27: An ACP node supporting native IPsec MUST use IPsec security setup
via IKEv2, tunnel mode, local and peer link-local IPv6 addresses used for
encapsulation. It MUST support ESP... (parameters).

clarified to:

P27: An ACP node that is supporting native IPsec MUST use IPsec security setup
via IKEv2, tunnel mode, local and peer link-local IPv6 addresses used for
encapsulation. It MUST then support ESP... (parameters).

Also:

ACP nodes supporting ACP via GRE/IPsec MUST support IPsec security setup..

clarified to:

An ACP node that is supporting ACP via GRE/IPsec MUST then support IPsec security setup..

Aka: P25 is correct, there is no single MTI. 6.7.3 defines the actual
requirement for two different profiles: "baseline ACP node" and "constrained ACP node".
The two requirements in P27 are only conditional MUSTs defining the details of
the IPsec profiles assuming a node does support IPsec or IPsec/GRE.

Let me know if this is still unclear, and if so, how you would suggest
to make it more readable.
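
One way to picture why there is no single MTI: a sketch of the profiles
as sets of secure channel protocols. The requirement levels are the ones
quoted above; the dictionary layout, the profile name "border" (for a
node connecting constrained and baseline areas) and the function name
are hypothetical.

```python
# Sketch only: data layout and names are assumptions for illustration.
MUST_SUPPORT = {
    "baseline": {"IPsec"},
    "constrained": {"dTLS"},
    "border": {"IPsec", "dTLS"},  # connects constrained and baseline areas
}
MAY_SUPPORT = {
    "baseline": {"IPsec/GRE"},
    "constrained": set(),
    "border": set(),
}


def guaranteed_common(profile_a: str, profile_b: str) -> set:
    """Protocols that both profiles are required to support."""
    return MUST_SUPPORT[profile_a] & MUST_SUPPORT[profile_b]


# A baseline and a constrained node share no mandatory protocol,
# so neither IPsec nor dTLS alone is an MTI for all ACP nodes.
no_overlap = guaranteed_common("baseline", "constrained")
```

Only a node that carries both sets can talk to either side, which is
what the last quoted MUST expresses.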

> -           "Use-ULA: For loopback interfaces of ACP nodes, we use Unique Local
> 
> Addresses (ULA), specifically ULA-Random, as defined in [[RFC4193]
> 
> with L=1]."
> 
> This needs to be more crisp. ULA is defined in RFC 4193 but the term ULA-Random is not. I think you mean that 3.2.2.  of RFC 4193 is the way addresses are formed, if so please say so.  The best practice RFC 8064 recommends use of RFC 7217. I understand that privacy is not a concern but does it hurt? Anyway please point at section 6.10 and 6.11.1.11

The term "ULA-Random" was used in the discussions on ANIMA mailing list re.
ULA, so admittedly i didn't check that the term as it stands is actually
not defined/used in rfc4193 - but it's just meant to imply ULA with L=1,
nothing more. I've removed the use of ULA-Random from the text and just
refer now to 4193 with L=1 (also pointing to 3.1. of rfc4193 defining L).

I have also added a note that the hash uses our own ACP definition instead
of rfc4193 3.2.2.

Paragraph is now:

<t>Use-ULA: For loopback interfaces of ACP nodes, we use Unique Local Addresses (ULA), as defined in <xref target="RFC4193"/> with L=1 (as defined in section 3.1 of <xref target="RFC4193"/>). Note that the random hash for ACP loopback addresses uses the definition in <xref target="scheme"/> and not the one of <xref target="RFC4193"/> section 3.2.2.</t>

8064/7217 are irrelevant here if i understand it correctly because
we define the addressing scheme for anything following the ULA prefix
ourselves for ACP addresses. Let me know if i misunderstand what
you were trying to suggest re. 8064/7217.

> -           "
> RPL Mode of Operations (MOP): mode 3 "Storing Mode of Operations with multicast support".  Implementations should support also other modes.
> 
> Note: Root indicates mode in DIO flow"
> 
> Why "should" ? there is no much point supporting the other modes is there? Section 6.11.1.13 says that SRH is not used so this is inconsistent. You only need MOP 2 or 3, 3 is you do multicast which at the moment does not appear to be the case. SO I would MUST a MOP of 2 and MAY a MOP of 3 which is a superset of MOP 2, and that's it (see 6.3.1 of RFC 6550).

Probably a transcription error on my side when i took your
RPL summary and wrote it down. Fixed according to above.
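
For reference, a small sketch of the MOP values from RFC 6550 and the
support levels suggested above (MUST for MOP 2, MAY for MOP 3). The
class, constant and function names are illustrative and not part of the
draft.

```python
from enum import IntEnum


class Mop(IntEnum):
    """RPL Mode of Operation values as listed in RFC 6550."""
    NO_DOWNWARD_ROUTES = 0
    NON_STORING = 1
    STORING_NO_MULTICAST = 2
    STORING_WITH_MULTICAST = 3


# Support levels as suggested in the review (names are hypothetical).
MUST_SUPPORT = {Mop.STORING_NO_MULTICAST}
MAY_SUPPORT = {Mop.STORING_WITH_MULTICAST}


def accepts_mop(dio_mop: int, supported: set) -> bool:
    """True if the MOP advertised by the root in a DIO is one this node supports."""
    try:
        return Mop(dio_mop) in supported
    except ValueError:
        return False  # value outside the range known to this sketch


minimal_node = set(MUST_SUPPORT)
full_node = MUST_SUPPORT | MAY_SUPPORT
```

A node implementing only the MUST set would accept a root advertising
MOP 2 and reject one advertising MOP 3 in this sketch.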


> -           "The lack of a RPI (the header defined by [RFC6553]), means that the
> data-plane will have no rank value that can be used to detect loops.
> As a result, traffic may loop until the TTL of the packet reaches
> zero. "
> 
> Since we have reliable links and no stretch (section 6.11.1.7), loops should be exceedingly rare. It could be recommended to send the DIOs 2-3 times to inform children when losing the last parent. Note that the technique in section "8.2.2.6.  Detaching" of RFC 6550 should be favored over that in section "8.2.2.5.  Poisoning" because it allows local connectivity. Also, It should be said that a node should select more than one parent, at least 3 if possible, and send DAOs to all of then in parallel. This provides multi

Not sure why your paragraph ends abruptly, but it's also in the archive,
so it's not a mistake on my email end. Hopefully nothing significant is
missing.

I have replaced the suggestive text that followed your above quoted
text in the draft:

<t>There are a variety of heuristics that can be used to signal from the
  data-plane to the RPL control plane that a new route is needed.

With a hopefully correct transcription of your suggestion:

<t>
  Since links in the ACP are assumed to be mostly reliable (or have link
  layer protection against loss) and because there is no stretch
  according to <xref target="rpl-dodag-repair"/>, loops should be 
  exceedingly rare though.</t>
<t>
  There are a variety of mechanisms possible in RPL to further
  avoid temporary loops: DIOs SHOULD be sent 2...3 times to inform children 
  when losing the last parent. The technique in <xref target="RFC6550"/>
  section 8.2.2.6. (Detaching) SHOULD be favored over that in section 8.2.2.5. 
  (Poisoning) because it allows local connectivity. Also, nodes SHOULD select
  more than one parent, at least 3 if possible, and send DAOs to all
  of them in parallel.</t>

> -           "ACP nodes MUST perform standard IPv6 operations across ACP virtual
> interfaces including SLAAC (Stateless Address Auto-Configuration -
> RFC4862])"
> 
> They may actually prefer Optimistic DAD RFC 4429 since address duplication is highly improbable as long as you .

Added:

        <t>"Optimistic Duplicate Address Detection (DAD)" according to
        <xref target="RFC4429"/> is RECOMMENDED because
        duplicates between ACP nodes are highly improbable as long as
        the address can be formed from a globally unique local assigned identifier
        (e.g.: EUI-48/EUI-64, see below).</t>
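
To illustrate why duplicates are so unlikely, here is a sketch of forming
a link-local address from an EUI-48. It assumes the classic modified
EUI-64 interface identifier (RFC 4291, Appendix A); whether an ACP node
forms its addresses exactly this way is an assumption of the example,
and the function names are illustrative.

```python
import ipaddress


def eui48_to_modified_eui64(mac: str) -> bytes:
    """Build a modified EUI-64 interface identifier from an EUI-48."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit identifier")
    eui64 = raw[:3] + b"\xff\xfe" + raw[3:]       # insert ff:fe in the middle
    return bytes([eui64[0] ^ 0x02]) + eui64[1:]   # flip the universal/local bit


def link_local_from_eui48(mac: str) -> ipaddress.IPv6Address:
    """fe80::/64 prefix followed by the interface identifier."""
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + eui48_to_modified_eui64(mac))


addr = link_local_from_eui48("00:11:22:33:44:55")
print(addr)  # fe80::211:22ff:fe33:4455
```

Two nodes only collide if their hardware identifiers collide, which is
the case Optimistic DAD is willing to bet against.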

> Minor comments
> 
> ------------------------
> 
> 
> 
> 
> 
> -         About "[RFC7575] defines the fundamental ...  "
> 
> for readability, it may be nicer to indicate the title of an RFC when it is referenced first; e.g. the text above would become
> 
> " Autonomic Networking: Definitions and Design Goals" [RFC7575] defines the fundamental ...

Ok, i am punting this one for now to RFC editor after playing around a bit
with the XML options - and not being satisfied... and having references for 50 RFCs in the doc:

<t>[RFC Editor: Question: Is it possible to change the first occurrences of [RFCxxxx] references to "<rfcxxx title>" [RFCxxxx]? The XML2RFC format does not seem to offer such a format, and i did not want to duplicate the 50 first references - one reference for the title mention and one for the RFC number.]</t>

If RFC editor comes back and can't do this more easily than i can in XML,
i'll try to go through the chores when doc is in RFC editor queue.

> -         about "or network plane (there is no well-established name  for this)"
> 
> The term network plane is not used again in the document. This text may go away.

Done.

> You may consider using "security and transport substrate" instead, since it is used elsewhere in the document.

"autonomic communications fabric" = ACP including ACP GRASP
(ACP GRASP provides discovery etc..). "security and transport
substrate" == the parts of ACP used by "ACP GRASP" (aka: The
secure IPv6 forwarding of ACP).

Subtle difference.

> Also, please be consistent on whether you use hyphen or not and use that globally, e.g. for the above, and pane like in "forwarding plane" or "out of band network ";

Fixed up "out of band" (no hyphens) and "in-band" (with hyphen).
Not sure why, but this is what the MS Word spelling checker suggested
to me. RFC editor will override if these are not the best choices.



> -         "data-plen"
> Typo?

Done


> -         "OAM applications ("Operations Administration and Management)"
> 
> Consider using "Operations Administration and Management (OAM) applications " instead; same goes for SDN, ASA, VRF, etc...

Ok. Tried to fix up all the instances i could find.

> -          "   MIC:  "Manufacturer Installed Certificate".  Another word not used in this document to describe an IDevID."
> 
> MIC is not used in the document, maybe inform of this equivalence in the IDevID definition instead; same goes for SUDI. Note that UDI is use just once and may not need an entry here.

The definitions of those non-necessary terms are there to
help others who, like me, start out not being security
experts and are confused about those equivalent or
related terms.

I specifically didn't want to include discussions about these
terms in the definitions that are relevant (eg: IDevID) so
as not to clutter up that text. Instead, readers would just
look up those redundant terms when they are, like me, initially
confused about them.

> -               "RPL:  \"IPv6 Routing Protocol for Low-Power and Lossy Networks\".  The routing protocol used in the ACP."
> 
> Maybe point on [RFC6550]?

Done

> -         "Connecting over non-ACP Layer-3 clouds initially requires a tunnel between ACP nodes."
> 
> I understands that it is one tunnel between each pair of adjacent ACP nodes, correct? I read "a tunnel" as an end-to-end tunnel, which sounds different

Changed to:

<t>Connecting over non-ACP Layer-3 clouds requires explicit configuration. See <xref target="remote-acp-neighbors"/>. This may be automated in the future through autodiscovery mechanisms across L3.</t>


> -          "ACP relies on group security"
> 
> Add "The"


Done

> -          "An ACP node MUST have keying
>    material consisting of a certificate (LDevID), with which it can
>      cryptographically assert its membership in the ACP domain and trust
>      anchor(s) associated with that certificate with which it can verify
>      the membership of other nodes (see Section 6.1.2)."
> 
> This is convoluted. Could you make it 2 sentences?

Fixed to:

<t>The ACP relies on group security.  An ACP domain is a group of nodes that trust 
each other to participate in ACP operations.  To establish trust, each ACP member 
requires keying material: An ACP node MUST have a certificate (LDevID)
and a trust anchor (TA) consisting of a certificate (chain) used to sign the
LDevID of all domain members. The LDevID is used to cryptographically assert 
membership in the ACP domain, the TA to verify the membership of other nodes 
in the ACP domain (see <xref target="certcheck"/>).</t>

> -         "  Note: LDevID ("Local Device IDentification") is the term used to
>      indicate a certificate that was provisioned by the owner of a node as
> opposed to IDevID ("Initial Device IDentifier") that may has been
> loaded on the node during manufacturing time.  Those IDevID do not
> include owner and deployment specific information to allows autonomic
> establishment of trust for the operations of an ACP domain (e.g.:
> between two ACP nodes without relying on any third party)."

> LDevID was already defined in the terminology. This text may move there or go away.

Gone. I added the note that IDevID can not be used directly for ACP to the
terminology definition of IDevID.

> -          "   This document uses the term ACP in many places where its reference
> document use the word autonomic."
> 
> Add [RFC7575] after "reference document"

Done.

> -          "   "routing-subdomain" is the autonomic subdomain that is used to
> calculate the hash for the ULA prefix of the ACP address of the node."
> 
> Do you mean ULA suffix?

No, the Global ID. Fixed.

Actually, i also stumbled across this:

RFC4193 uses "Prefix" for the first 7 bits of a ULA address, but
we have used the term ULA prefix to refer to the first 48 bits of a ULA address,
so i've clarified this more in the terminology.
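
A sketch of how the 48 bits break down: the 7-bit RFC4193 "Prefix"
(fc00::/7), the L bit set to 1, and the 40-bit Global ID. It assumes the
Global ID is the first 40 bits of a SHA-256 hash over the
routing-subdomain string; that hash choice, the function name and the
example domain are assumptions of the example, not statements from this
thread.

```python
import hashlib
import ipaddress


def acp_ula_prefix(routing_subdomain: str) -> ipaddress.IPv6Network:
    """Return the 48-bit "ULA prefix" derived from a routing-subdomain."""
    # Assumption: Global ID = first 40 bits (5 bytes) of SHA-256(routing-subdomain).
    global_id = hashlib.sha256(routing_subdomain.encode("ascii")).digest()[:5]
    # 7-bit Prefix 1111110 followed by L=1 yields the first byte 0xfd.
    prefix48 = b"\xfd" + global_id
    # Pad the remaining 80 bits with zeros to get a 16-byte network address.
    return ipaddress.IPv6Network((prefix48 + bytes(10), 48))


net = acp_ula_prefix("acp.example.com")  # hypothetical routing-subdomain
print(net)  # an fdxx:xxxx:xxxx::/48 network
```

All ACP addresses of one routing-subdomain then share this /48, and the
ACP addressing scheme decides what follows it.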

> -          "   o  If the node certificates indicate a CDP (or OCSP) then the peer's
> certificate must be valid according to those criteria. e.g.: OCSP
> check across the ACP or not listed in the CRL retrieved from the
> CDP."
> 
> Please define CDP and OSCP, and/or reference a RFC is possible.

Done. Using RFC5280. Hope that's correct, otherwise IESG SEC review should help.

> -          "enrolment"
> Typo

Fixed.

> -          "This can
> use a single GRASP M_FLOOD message as shown in above example."
> Actually the example is now below. Please reference the figure.

Done. Actually it's still "above", or i am heavily confused.

> 
> -           "The protocol could for example could have been"
> 
> Typo

Fixed.

> -           "if the IPsec connecting"
> 
> Typo?

More like sentence structure was strange.

Fixed to:

 "Even if the IPsec connection from Bob succeeded, Alice might prefer another secure protocol over IPsec"

> -           "ACP wide service discovery"
> ACP-wide

Fixed.

> -           "if the IPsec connecting In most other solution
> designs such distributed discovery does not exist at all or was added
> as an afterthought and relied upon inconsistently"
> 
> Consider removing or rephrasing : )

Removed:

The sentence wasn't that bad, but removing it is easier than trying to justify
negative observations about reality without being called out to provide
even more proof.

> 
> -         Maybe consider moving the discussion on multicast P29 -30 to annex? Why Multicast is not used is an interesting discussion but not critical for the protocol operation.

Done.

> -           "it is not quite clear yet what exactly the implications are
> to make GRASP flooding depend on RPL DODAG convergence and how
> difficult it would be to let GRASP flooding access the DODAG
> information"
> 
> Let's chat then. There's work on reliable multicast for RPL using BIER.

Yeah, if i would just get around to that ;-)

For now moved into the informative section together with the
multicast discussion.

> -         "In the terminology of GRASP ([I-D.ietf-anima-grasp]), the ACP is the
> security and transport substrate for the GRASP instance run inside
> the ACP ("ACP GRASP"). "
> 
> "running" inside the ACP? Maybe rephrase more globally?
> 

This instance of GRASP runs across the ACP secure channels..

> -           "OAM protocols no not require IPv4: The ACP may carry OAM "
> Typo no->do

Fixed.

> -           "Consider a network that has multiple NOCs in different locations.
> Only one NOC will become the DODAG root.  Other NOCs will have to
> send traffic through the DODAG (tree) rooted in the primary NOC."

> A figure would help. I remember all the discussions we had about setting the prf bits in remote NOCs

Sorry. A bit low on cycles right now. Hopefully i'll get around to it later.
Created a wish list entry in changelog.

> -           "RPPL."
> Typo

Fixed.

> -           "Administrative Preference ([RFC6552], 3.2.6  "
> 
> The section is correct but that is RFC 6550.

Done

> 
> -           "This is a standard issue
> with tunneling, not specific to running the ACP across it."
> Do you really mean Standard or would Classical work better?

This is an issue of tunnels, not an issue of running the ACP across a tunnel.

> -           "Even though loopback interfaces where originally d"
> Typo Where -> were

Done.

> -         Section 3, 4, 9 and 10 may move to Annex (by moving the section after the </references> tag) since they are not normative and do not contribute to the understanding of the protocol. This way there should not be a need to indicate normative in other sections.

Sections 3, 4 and 9 are fairly short and the flow of the document
depends on them being in their particular location.

Section 10 could go into an appendix. It would not make a lot of
difference, but past experience has shown that Annex text is
a lot less likely to be read given how the RFCs are structured.
We had 10 in Annex and moved it up for exactly that reason.

> -         Well Noted that Section 14 Will be removed/.

> Sorry for being interrupted here,


Thank you so much!

Toerless