Re: [Anima] Roman Danyliw's Discuss on draft-ietf-anima-autonomic-control-plane-27: (with DISCUSS and COMMENT)

Toerless Eckert <tte@cs.fau.de> Tue, 28 July 2020 15:37 UTC

Date: Tue, 28 Jul 2020 17:37:39 +0200
From: Toerless Eckert <tte@cs.fau.de>
To: Roman Danyliw <rdd@cert.org>
Cc: The IESG <iesg@ietf.org>, draft-ietf-anima-autonomic-control-plane@ietf.org, anima-chairs@ietf.org, anima@ietf.org, Sheng Jiang <jiangsheng@huawei.com>
Message-ID: <20200728153739.GI1772@faui48f.informatik.uni-erlangen.de>
In-Reply-To: <159478218035.5567.5331512017107084574@ietfa.amsl.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima/YBKI4SlXvkCcv3scs9stRipUfn8>
Subject: Re: [Anima] Roman Danyliw's Discuss on draft-ietf-anima-autonomic-control-plane-27: (with DISCUSS and COMMENT)

Thanks a lot, Roman, very helpful review.

Diff:

http://tools.ietf.org/tools/rfcdiff/rfcdiff.pyht?url1=https://tools.ietf.org/id/draft-ietf-anima-autonomic-control-plane-27.txt&url2=https://tools.ietf.org/id/draft-ietf-anima-autonomic-control-plane-28.txt

Inline

On Tue, Jul 14, 2020 at 08:03:00PM -0700, Roman Danyliw via Datatracker wrote:
> Roman Danyliw has entered the following ballot position for
> draft-ietf-anima-autonomic-control-plane-27: Discuss
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.

Btw: WG members asked me how to determine whether reviews like yours rise
to the level of a DISCUSS or just to the level of very good editorial feedback.

As an author, i am not interested in that meta-discussion, but in the root
cause for the concern, which is the timely closure of the DISCUSS.

To that end, the authors think all DISCUSS worthy points raised have been resolved or
satisfactorily answered and would therefore hope for a timely closure of the DISCUSS ;-))

> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-anima-autonomic-control-plane/
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> ** As normative behavior is specific for BRSKI (e.g., Section 6.1.5 and
> 6.1.5.5), please make it a normative reference

Hmmm... Right. At an earlier stage, we intentionally downgraded BRSKI
from normative to informational, when we started to rewrite ACP so it's
clear how to implement it with non-BRSKI registrars and BRSKI became
fully optional.

Resolution:

I have changed BRSKI to normative.

Let me know if informative would still be correct given the optional nature of BRSKI for ACP.


> ** Figure 2's definition of acp-address is 'acp-address = 32HEXLC | "0"'.  The
> following text references a 32HEXDIG but that isn't in the definition of
> acp-address.
> 
> -- Section 6.1.2.  "Nodes complying with this specification MUST be able to
> receive their ACP address through the domain certificate, in which case their
> own ACP domain certificate MUST have the 32HEXDIG "acp-address" field."
> 
> -- Section 6.1.3.  "The candidate peer certificate's acp-node-name has a
> non-empty acp-address field (either 32HEXDIG or 0, according to Figure 2)."

Resolution:

Thank you so much. Embarrassing inconsistent edit in -23 to move from
32HEXDIG to lower case 32HEXLC. Fixed.
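
To make the corrected field definition concrete, here is a minimal Python
sketch of a check for 'acp-address = 32HEXLC | "0"'. It is not taken from
the draft: the function name is hypothetical, and it assumes 32HEXLC means
exactly 32 lower-case hexadecimal digits.

```python
import re

# Assumption: 32HEXLC = exactly 32 lower-case hex digits; "0" is the only
# other permitted value of the acp-address field.
_ACP_ADDRESS_RE = re.compile(r"(?:[0-9a-f]{32}|0)")


def is_valid_acp_address_field(field: str) -> bool:
    """Return True if field is "0" or 32 lower-case hex digits (hypothetical helper)."""
    return _ACP_ADDRESS_RE.fullmatch(field) is not None
```

An upper-case value that would have passed a 32HEXDIG check is rejected
by this sketch, which is the difference the fix is about.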

> ** Precision in bounding the cipher selection.
> 
> -- Section 6.7.2.  Per "Symmetric encryption for the transmission of secure
> channel data MUST use encryption schemes considered to be security wise equal
> to or better than AES256", which property of AES-256 is being considered for
> this assessment?

Key strength.

Resolution:

... considered to be security wise equal to or better than 256 bit key strength, such as AES256

> -- Section 6.8.2.  Per "TLS for GRASP MUST offer
> TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 and
> TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 and MUST NOT offer options with less
> than 256bit AES or less than SHA384", please state this more precisely.

Also key strength.

Resolution:

MUST NOT offer options with less than 256 bit symmetric key strength or hash strength of less than SHA384
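
One possible reading of this rule, sketched in Python. The table of suites
is hand-written for illustration only (it is not a registry and not from the
draft), and interpreting "hash strength" as the digest size of the suite's
hash is an assumption.

```python
from typing import List, NamedTuple


class Suite(NamedTuple):
    name: str
    key_bits: int   # symmetric key length in bits
    hash_bits: int  # digest size of the suite's hash in bits (assumed meaning)


# Illustrative, hand-written candidates; not an authoritative list.
CANDIDATES: List[Suite] = [
    Suite("TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384", 256, 384),
    Suite("TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384", 256, 384),
    Suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", 128, 256),
    Suite("TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256", 256, 256),
]


def acceptable(suite: Suite, min_key_bits: int = 256, min_hash_bits: int = 384) -> bool:
    """Apply both thresholds from the proposed sentence."""
    return suite.key_bits >= min_key_bits and suite.hash_bits >= min_hash_bits


offered = [s.name for s in CANDIDATES if acceptable(s)]
```

Under this reading only the two AES-256/SHA384 suites remain; the
ChaCha20 suite fails on the hash threshold, not on key strength.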

> -- Is it that AES-128 shouldn't be used or that AES-256 has a certain key
> strength to which to adhere to?
> 
> -- Is it that SHA-224 or SHA256 shouldn't be used (staying in the SHA-2 family)
> or is it a certain number of bits of security ?

Note:

If my proposed text does not fix it according to what a security expert would
write, could you be so kind and propose text ? This polishing of security sentences
has been going on a lot, mostly because i do not get proposed better text but
just rejections of the text i attempt to write after reading a lot of
security RFCs, and probably wasting a lot of time in the process.

> ** The text specifies the need for physical controls.  Please be more specific
> on the appropriate degree of that physical control or how that decision should
> be made; and explicitly explain threat of concern.
> 
> -- Section 8.1.1.  "Thus, the ACP connect interface and NOC systems connected
> to it needs to be physically controlled/secured."

Resolution: 

<t>Physically controlled/secured means that attackers can gain no access to the physical device hosting the ACP Edge Node, the physical interfaces and links providing the ACP connect link, nor the physical equipment constituting the NOC Device. In a simple case, ACP Edge node and NOC Device are co-located in a physically access-controlled room, such as a NOC, to which attackers can not gain physical access.</t>

> -- Section 8.1.5.  "... the ACP connect link and the nodes connecting to it must
> be in a contiguous secure environment, hence assuming there can be no
> physical attack against the devices."

Resolution:

Text replaced with:

See <xref target="acp-connect"/>.

Aka: removed text,
replaced with pointer to above enhanced text so as not to duplicate the explanatory text.

> ** ("discuss discuss") Section 8.1.2.  What is the normative behavior being
> specified in this section?  Specifically, what is the additional or more
> restrictive behavior for the circumstance where an ACP node is virtualized.

Resolution: 

No text change, but following explanation:

The difference is that in the case of two physical devices connected with
a non-encrypted link, the effort of "physical security"/access-control is
a problem/requirement that makes ACP-connect a "workaround", because
the overhead of physical security is higher than that of a cryptographically
secured channel between the ACP nodes.

When ACP-Edge-Router and NOC-Device are just e.g.: two VMs running on the
same device, and the ACP connect link is a virtual linux bridge link,
then this is not a workaround anymore but a good solution because:

a) There is no need for physical security anymore (locked room, authenticate
   people who get access, etc. pp)

b) it does not have to be set up manually (physically create access protected
   room, wire up two devices), but can be orchestrated automatically by
   software.

c) The remaining security problem of how to ensure that all the software
   components within the systems are protected against attack, including
   the virtual link between the NOC software component and the ACP Edge node
   software component is no different than securing the software on any
   ACP node.

I was hoping the text was making this clear. If there is additional text
you want to propose for the document to support this explanation, pls. do so,
but i already felt the existing text was a good, more compact representation
of what i elaborated above. If that is not the case, then it is easier
for someone like you (reader) than me to suggest text that could be better
understood.

> ** Section 8.2.1.  (I'm no ABNF expert ...) Per the ABNF in Figure 17, the "//="
> notation isn't valid.  I think you want:
> 
> OLD
>      method //= [ "DTLS",    port ]
> 
> NEW
>      method =/ [ "DTLS",    port ]

Resolution: fixed text

Thanks a lot. No idea how this happened. Maybe i was confused with CDDL... Too many
formal languages all slightly different *sigh*

> ** Section 10.2.1.  Per "An attacker will not be able to join the ACP unless
> having a valid domain certificate, also packet injection and sniffing traffic
> will not be possible due to the security provided by the encryption protocol.",
> please be clearer:
> 
> -- on path attacker = no packet injection
> -- on path attacker = only traffic analysis when sniffing

Resolution:

Proposed fixed text:

<t>Attackers will not be able to join the ACP unless they have a valid ACP domain certificate. On-path attackers without a valid ACP domain certificate can not inject packets into the ACP due to ACP secure channels. They can also not decrypt ACP traffic except if they can crack the encryption. They can attempt behavioral traffic analysis on the encrypted ACP traffic.</t>

> -- compromised node = can inject traffic

Actually, we aim higher.

Resolution:

<t>The degree to which compromised ACP nodes can impact the ACP depends on the implementation of the ACP nodes and their impairment. When an attacker has only gained administrative privileges to configure ACP nodes remotely, the attacker can disrupt the ACP only through one of the few configuration options to disable it, see <xref target="enabling-acp"/>, or by configuring non-autonomic ACP options if those are supported on the impaired ACP nodes, see <xref target="workarounds"/>. Injecting or extracting traffic into/from an impaired ACP node is only possible when an impaired ACP node supports ACP connect (see <xref target="ACPconnect"/>) and the attacker can control traffic into/from one of the ACP node's interfaces, such as by having physical access to the ACP node.</t>

This should be a fair assessment. Of course this is not inclusive of attacks
against aspects of the node which are outside the ACP spec.

[ Note that there are of course ideas for extension work to overcome these issues. ]

> ** Section 11.  Per "an ACP is self-protecting and there is no need to apply
> configuration to make it secure", if this assertion is going to be made:

Proposed rewrite of this paragraph:

<t>A set of ACP nodes with ACP certificates for the same ACP domain and with ACP functionality enabled is automatically "self-building": The ACP is automatically established between neighboring ACP nodes. It is also "self-protecting": The ACP secure channels are authenticated and encrypted.</t>

Aka: removed reference to registrar etc... (see explanation below), and "no configuration required".
Primarily because "no configuration required" is somewhat duplicate from "automatically".

> 
> -- please specify the security services/properties in a normative section (not
> in the informative text in Section 10).

Statements of high level security properties do IMHO not have to be normative
text because they are an assessment, not a definition. I think this is quite common
for security consideration sections from what i read in other RFCs. The definitions
of protocol machinery and properties of data objects like certificates to achieve 
those claimed properties on the other hand have to be normative, and they are -
in section 6.

> 
> -- please also be clear on what configuration is being referenced.
> 

Much easier to remove as done above. I was thinking of the typical config
that would be required to create an ACP on a non-ACP device, e.g.:
VRF (lite), virtual/loopback interfaces, IPv6 addressing, IPSec config, RPL routing config,
but not enough value IMHO to elaborate about this here.

Of course, not 100% the same, non-autonomic nodes would not have the new
GRASP protocol for discovery. But for the rest, you could get pretty close
on an existing non-ACP capable router through config. After all that's the
idea of ANIMA being in OPS - reuse what we have experience with.

> ** Section 11.  Per the list of factors on which ACP depends, it seems like the
> following are missing:
> 
> -- the security properties of the enrollment protocol
> 
> -- that the security considerations of EST and BRSKI apply (or if not, why not)

Resolution: 

No change, but the following explanations:

The ACP is explicitly defined to be a set of nodes with ACP domain certificates,
enrollment/BRSKI is really out of scope. Normative ACP nodes start their existence
with an ACP cert. How they got it is part of a prior life. BRSKI security properties
are covered in BRSKI draft.

I struggled long how to well define registrars given that ACP does not mandate/
specify any specific protocol.  The solution in the document is that there is only
a very abstract definition of the normative requirements against registrars in 6.10.7,
pretty much simple requirements against the resulting certificates such as registrar MUST
NOT assign same addresses to multiple nodes for example. 

Think of a registrar abstractly as this:

https://datatracker.ietf.org/meeting/102/materials/slides-102-dinrg-dinrg-anima-toerless-eckert-00
Slide 10

While funny, it's not far away from a possible reality of a network operator
being a registrar, provisioning ACP certificates manually into ACP nodes,
and performing all the backend operations (CA, MASA, addressing database).

BRSKI is of course the ANIMA preferred enrollment protocol, and if it is used, an ACP
node is called an ANI node (ACP+BRSKI). Section 3.2 makes the security property
of the ACP for any such bootstrap protocol. For example in communities outside
of ANIMA, NetConf ZeroTouch might be preferred over BRSKI.

The only mandatory ACP part of your above list is EST ONLY for renewal of certificates,
and that of course is specified in the normative section.

The BRSKI draft itself defines how it integrates with ACP (GRASP objective etc.).

Hope this satisfactorily answers the concern.

> 
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> (Preliminary ballot.  Need to double check that all of ekr's discusses were
> cleared)
> 
> The style of explaining the design choice after describing an element of the
> protocol was informative and helpful.  Thanks.
> 
> This document has undergone a significant amount of security review.  Thank
> you for incorporating all of this feedback.

Thanks. If the feedback from the security reviewers had included more
proposed text it would have been faster. Maybe i wouldn't have learned as much
about security as i did this way, but it was kinda painful for the WG because
it was lengthy.

> ** Section 6.  It doesn't seem appropriate to call a protocol "indestructible"
> unless you are going to enumerate the resiliency properties more precisely -
> "many inadvertent changes" is vague.

ACP is not a protocol. It's a system and node-wide design with the goal to
be "indestructible". Obviously, this is a name-calling to summarize a wide range
of benefits, hence it is in parenthesis.

Proposed text:

<t>This section describes the components and steps to set up an ACP and highlights the key properties which make it "indestructible" against most changes to the Data-Plane, including misconfigurations of routing, addressing, NAT, firewall or any other traffic policy filters that inadvertently or otherwise unavoidably would also impact the management plane traffic, such as the actual operator CLI session or controller NetConf session through which the configuration changes to the data-plane are executed. Physical misconfiguration of wiring between ACP nodes will also not break the ACP: as long as there is a transitive path between ACP nodes, the ACP should be able to recover, given that it automatically operates across all interfaces of an ACP node. Attacks against the network via incorrect routing or addressing information for the data-plane will not impact the ACP. Even impaired ACP nodes will have a significantly reduced attack surface against malicious misconfiguration because only very limited ACP or interface up/down configuration can affect the ACP, and depending on their specific designs these types of attacks could also be eliminated. See more in <xref target="enabling-acp"/> and <xref target="security"/>.</t>

> ** Section 6.  Per "An ACP node can be ... or any other IP a capable node",
> should this be "IPv6 capable node"?

Fixed.

I was coming from practical experience, where devices are sold as IPv4 only
for price/market differentiation. Those devices could typically easily add the
IPv6-only ACP even though the Data-Plane was IPv4 only. I hope by now those
sales/marketing models are not possible anymore ;-)

> ** Section 6.1.1.  Per "... it is beneficial to copy the device identifying
> fields of the node's IDevID certificate into the ACP domain certificate ...", is
> there an ACP-recommended approach for that?

No, because it's part of what the registrar does, and even though ANIMA
defines BRSKI to be the preferred registrar protocol, ACP itself, to be
widely reusable, describes registrar behavior as abstractly as possible.

For example, the IETF also has NetConf zerotouch, which could be used,
or any proprietary mechanism. It's IMHO easy in all cases given how the expectation
is that the registrar needs to know the pledge's IDevID to securely identify
the pledge, and in all protocols i have seen, the registrar is in
the position to make sure the pledge's certificate has the components in it that
are needed. For BRSKI this is specified in BRSKI.

> ** Section 6.1.3.1. Per "In the absence of implementing a secured mechanism,
> such an ACP node MAY use a current time learned in an insecured fashion in the
> ACP domain membership check.", please be clearer on how this current time is
> learned in the domain membership check.

Proposed text:

<t>Current time MAY for example be learned via NTP (<xref target="RFC5905"/>)
over the same link-local IPv6 addresses used for the ACP from neighboring ACP nodes.
ACP nodes that do provide insecure NTP over their link-local addresses SHOULD
primarily run NTP across the ACP and provide NTP time across the ACP only
when they have a secure time source. Details for such NTP procedures are beyond the
scope of this specification.</t>

There is obviously a chicken&egg problem: If you have a bunch of cheap no-RTC
routers waking up from total power failure and trying to get together again,
the best outcome to have is to build ACP like it is Jan 1 1970. If at least
one of those devices does have an RTC, it would start NTP across the ACP
and the others would sync to it. 

The details of all of this are another extension RFC to ACP, although
i guess it can become quite convoluted to describe the autoconfig and all
corner cases. Hence i tried to avoid going down this path in this doc too much.

The text above is roughly what i've used in practice with enterprise IPsec
VPN setups where alas a lot of spoke routers didn't have RTC.

> ** Section 6.1.5.  Per "ACP nodes SHOULD be able to remember the IPv6 locator
> parameters ...", what happens if they don't remember?

ACP is meant to support many, even uncoordinated registrars. That is why
the addressing scheme has a registrar-ID in the client addresses so that
registrars can independently of each other assign ACP addresses to clients via
their certs without conflicting with each other.

If you only do renewal via EST then it is purely a simplification of diagnostic that
you are doing renewal always with the registrar you originally got your cert
from. The renewal would easily be the same going to any other registrar, but
the logs on the registrar would then not nicely have only the info about
the clients that they initially enrolled.

When clients expire their certs and need to re-enroll, it gets harder to
get the same ACP address again if you do not go to the same original registrar.
There is text for that later in the doc explaining how registrars can still
attempt to honor the original cert ACP address, but this functionality
is probably something we need another extension document for with LAMPS
involved (registrars honoring expired certs for the purpose of renewal).

Changing ACP address under such circumstances is fine for ACP itself, but
more work for a backend system having to track such change of addresses
(having to keep track of e.g.: IDevID and changing LDevID/ACP-address).

> ** Section 6.1.5.3.  Per "The connecting ACP node SHOULD verify that the CRLDP
> certificate used during the HTTPs connection has the same ACP address as
> indicated in the CRLDP URL ...", why is this not a MUST?

Unlike non-CRL setups, i have no practical deployment experience with CRL.
To the best of my understanding, the content of a CRL itself is protected
such that you can have it relayed through e.g.: caches or the like.
Hence the additional address is just another layer of security whose
added value vs the additional limitations for deployability it may bring are
unclear. Hence a SHOULD.

E.g.: In one example, i had to build an ACP connect through IPv6/IPv4 NAT
because some backend in the customer's NOC did not support IPv6.
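
For readers who want to see what the SHOULD-level check amounts to, here is
a rough Python sketch of just the address comparison. The function name and
the idea of passing the certificate's acp-address as a 32-digit hex string
are assumptions for illustration; extracting that field from the CRLDP's
certificate is not shown.

```python
import ipaddress
from urllib.parse import urlsplit


def crldp_address_matches(crldp_url: str, cert_acp_address_hex: str) -> bool:
    """Compare the IPv6 literal in the CRLDP URL with the certificate's ACP address.

    Hypothetical helper: cert_acp_address_hex is assumed to be the 32 hex
    digit acp-address taken from the certificate presented by the CRLDP.
    """
    host = urlsplit(crldp_url).hostname  # brackets around IPv6 literals are removed
    if host is None:
        return False
    try:
        url_addr = ipaddress.IPv6Address(host)
    except ValueError:
        return False  # host is a DNS name or IPv4 literal, nothing to compare
    cert_addr = ipaddress.IPv6Address(int(cert_acp_address_hex, 16))
    return url_addr == cert_addr
```

A NAT in the path, as in the example above, is exactly the kind of case
where the address in the URL and the address in the certificate can differ
even though the CRL content itself is still protected.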

> ** Section 6.1.5.5.  Per "An ACP node may determine that its ACP domain
> certificate has expired ... [i]n this case, the ACP node SHOULD convert to a role
> of a re-enrolling candidate ACP node", what is the alternative if it wants to
> connect back to the network?  Shouldn't this be a MUST?

This goes back to the above mentioned, IMHO more desirable, but not yet
documented or well enough discussed option of simply permitting
renewal instead of re-enrollment with an expired cert. Logically the
argument for that is that the lifetime in a cert does not necessarily have to
be a limiter to the registrar+CA renewal operations because that is
not a normal "use" of the certificate, and that is what's to be limited by
the lifetime.

When i was operating an enterprise VPN headend for a while i recurringly
had the experience that long-time switched off VPN spokes were reactivated,
then the user asked in email why his VPN router does not work, and then
in the absence of better options i had to disable lifetime checks for
this cert on the headend temporarily to get renewal working. (Obviously, the phone
call was the additional attestation for the node with the expired cert to
be assumed to be still "trusted").

Re-enrollment can vary widely in complexity based on the registrar
used - even with BRSKI, there are a lot of MASA options. So it can be
ACP career limiting for specific types of registrars if this was a MUST.

There is one specifically preferred option which would have to go into a
followup document, which is to use BRSKI and BRSKI proxy together with
the expired cert to renew. This exactly would only involve the registrar
having to honor the expired cert, but not the ACP. So it's not really
re-enrolling (which would use the IDevID and MASA), but it would use the BRSKI
enrollment infra (BRSKI-proxy).

> ** Section 6.5.  It seems misplaced to describe MacSec as an option for channel
> security even when it is not profiled in this document.

I disagree. It is IMHO important to provide fundamental explanations of the
extension points of the ACP architecture, and that is most easily done through examples.

MacSec is a highly desirable extension option because it is the most widely cross-platform
HW-accelerated encryption option, so it is a good example to use in a sentence.

Remember, this is an OPS area system specification where operational
and architecture aspects are important. Also we were asked not to have a separate
architecture document by the initial AD to the WG.

> ** Section 6.7.2. Per "Signaling of TA certificates may not be appropriate when
> the deployment is relying on a security model when the TA certificate content
> is considered confidential", where is the requirement to signal TA certificates
> discussed.  How would this selection of signaling a TA work?  The entire
> paragraph prior seemed to explicitly discuss that the TA doesn't need to be
> shared.

The previous paragraph to the one you are citing explicitly says:

| Nevertheless, for use with ACP secure channel setup,
| there SHOULD be the option to include the TA certificate in the signaling
| to aid troubleshooting, see <xref target="ta-troubleshoot"/>.

I had exhaustive discussions about this on the ipsec mailing list to arrive at
the acceptable option, whose details are in the IPsec section of the ACP
(aka: including the TA in the signalled IKEv2 messages without having to
 extend IKEv2, but just relying on permissible existing IKEv2 options).

> ** Section 6.7.2.  Per "When introducing the profile for security association
> protocol ...", I recommend being clearer to whom you are providing this advice.
> This seems to be for operators of ACP infrastructure technology (not
> implementers/vendors of ACP technology)

Actually it is definition of extension point requirements for implementers,
so i am sad to learn it is misreadable as an operator related text.

Proposed fix:

<t>When specifying additional security association protocols for ACP secure channel use beyond those covered in this document, protocol options SHOULD be eliminated that are not necessary to support devices that are expected to be able to support the ACP.</t>

> ** Section 6.7.3.  Per "The ACP usage of IPsec and IKEv2 mandates a profile
> with a narrow set of options of the current standards-track usage guidance for
> IPsec [RFC8221] and IKEv2 [RFC8247]", should there be normative wording use
> (MUST) instead of a "mandates"?

The actual normative (rfc2119) requirements of the IPsec profile are later in the section.
This initial text gives just the overview.

Saying something like "you MUST comply with the normative requirements
described later on in this text" would be only confusing and not helpful.

> ** Section 6.7.3.1.1.  Per ENCR_CHACHA20_POLY1305, "[t]herefore this algorithm
> is only recommended", shouldn't it read as RECOMMENDED?

Again avoiding IMHO unhelpful duplication of rfc2119 language. The normative requirement was
earlier in the section:

| ENCR_CHACHA20_POLY1305 SHOULD be supported at equal or higher performance
| than ENCR_AES_GCM_16. If that performance is not feasible, it MAY be supported.

The paragraph you mention is explanation of that prior normative requirement.

> ** Section 6.7.3.1.2.  Per "[RFC8247] provides a baseline recommendation for
> mandatory to implement ciphers, integrity checks, pseudo-random-functions and
> Diffie-Hellman mechanisms.  Those recommendations, and the recommendations of
> subsequent documents apply well to the ACP.", it seems like normative language
> should be used to adhered to.

Oops. With the rewrite of this section in the last revisions, the MUST 8247
actually vanished. Great catch.

The text above is introduction, the proposed fix is after that paragraph:

| ACP Nodes supporting IKEv2 MUST comply with <xref target="RFC8247"/> amended by
| the following requirements which constitute a policy statement as permitted by
| <xref target="RFC8247"/>.

> ** Section 6.10.7.3.  The paragraph/sentence starting with "ACP registrars that
> are aware of can use ..." doesn't parse.  The guidance isn't clear as a result.

Apologies. Trash left over from fixing the sentence. Fixed text and added explanation:

ACP registrars that are aware of the IDevID certificate of a candidate ACP device...

... The PID for example could identify type of
devices allowing for specialized ASA requiring multiple addresses or non-autonomic VMs for
services and those nodes could receive Vlong sub-address scheme ACP addresses.</t>

> ** Section 6.10.7.3.  Per "In a simple allocation scheme, an ACP registrar
> remembers persistently across reboots ...", what's the recovery step if it loses
> that state?

Proposed added text:

<t>If allocated addresses can not be remembered by registrars, then
it is necessary to either use a new value for the Registrar-ID field
in the ACP addresses, or determine allocated ACP addresses from polling the
ACP network nodes.  Non-tracked ACP addresses can be reclaimed by revoking
or not renewing their certificates and instead handing out new certificates
with new addresses (for example with a new Registrar-ID value). Note that
such strategies may require coordination amongst registrars.</t>
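
As a rough illustration of the first recovery option, here is a toy Python
model. It is a sketch under loud assumptions: an ACP address is reduced to a
(Registrar-ID, node number) pair, the class and method names are
hypothetical, and none of the real IPv6 bit layout or certificate handling
is modeled.

```python
from typing import Tuple


class Registrar:
    """Toy registrar: hands out (registrar_id, node_number) pairs."""

    def __init__(self, registrar_id: int) -> None:
        self.registrar_id = registrar_id
        self._next_node_number = 1  # the state that must survive reboots

    def allocate(self) -> Tuple[int, int]:
        addr = (self.registrar_id, self._next_node_number)
        self._next_node_number += 1
        return addr

    def recover_from_state_loss(self, new_registrar_id: int) -> None:
        # With the allocation state gone, reusing the old Registrar-ID could
        # hand out duplicates, so the counter restarts under a fresh ID.
        if new_registrar_id == self.registrar_id:
            raise ValueError("a new Registrar-ID value is needed after state loss")
        self.registrar_id = new_registrar_id
        self._next_node_number = 1
```

The same pair structure is why uncoordinated registrars with different
Registrar-ID values do not collide; choosing the new value may still need
coordination amongst registrars, as the text above notes.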

> ** Section 6.11.1.1.2.  Not clear what "DODAG Information Objects (DIOs) SHOULD
> be sent 2 .. 3 times" means - can "2 .. 3" please be clarified.

proposed fix:

SHOULD be sent 2 or 3 times

(actual value 2 or 3 left for experimentation, this seems to be a necessary and sufficient
 range from similar protocol experiences).

> ** Section 6.11.1.1.2.  A mechanism for failed ACP detected using a secure
> channel protocol is noted for IPSec (with IKEv2 Dead Peer Detection).  What is
> the equivalent for DTLS?

Good question. If you know someone who could suggest an equivalent, please
bring her in. Given how this is a performance optimization, i don't think
we need to bother too much. I hope we can learn from implementation/deployment
experience (i only have that for IPsec) and then write update text later with
such refinements.

> ** Section 9.  The section notes that Section 9.1 is "derived from diagnostic
> of a commercially available ACP implementation".  The shepherd report from
> 03/2019 notes that there are no implementations of ACP.  If this is documented
> somewhere, it would be very compelling to cite it.

I have deleted that sentence again.  This was not appropriate for the RFC,
we can discuss this on the list or offline.

> ** Section 9.1. Per "The basic diagnostics [sic] is support of (yang) data
> models representing the complete (auto-)configuration and operational state of
> all components ...", are these YANG models defined?  Are there references?

Changed to:

Basic standardized diagnostics would require support for (yang) models ...

Wrt. the question: The existing components used by ACP should have YANG models
given how long they have been in use, but they don't necessarily seem to have them.
E.g. there are some IPSECME targeted drafts like draft-tran-ipsecme-yang-01.txt, not
sure why that died, one could maybe use rfc8049 to represent the VRF for ACP (not sure),
there is an initial draft for RPL (draft-ietf-roll-mpl-yang-02). From talking
to Kent Watsen, the situation for certificates looks better, e.g.:
draft-ietf-netconf-trust-anchors.

Once we have these ACP/GRASP/BRSKI specification RFCs out, we'll be looking for interested
YANG geeks to help with ACP/GRASP YANG models. The existing spec intends
to be an informal starting point towards such future YANG work, but there
will be no more detailing of what is or is not available in the IETF YANG world.
That's a job by itself, best done for an ACP YANG RFC. I have some work i
started 3 years ago lying around...

> ** Section 9.3.1.  Per "Whenever this document refers to enabling an interface
> for ACP ... it only requires to permit [sic] the interface ...", this seems like
> normative behavior in an informative section.

What is the purpose of the [sic] notation ?

The IETF does not allow raising normative requirements against an informal operator
interface description, which is what this section is. I guess that would
require a formal CLI model which we don't have.  Hence the normative
requirements could only come from a YANG model.

Hence this section is informational. It is also good to not formalize a normative
operator interface without first having enough experience collected IMHO.

> ** Section 9.3.2.  That this is an information section is noted.  It would
> benefit from describing what precisely can and cannot be done in the three
> states proposed - up, down and admin down.

Hmm... i think it says everything it can say, except... see your following point.

> ** Section 9.3.2.1.  What is the proposed threat that using admin down is
> intended to mitigate?  Under what circumstance should it be invoked?

Great. I remember we went several times through this text especially with routing
directorate, but nobody noticed that the simple core example was not explicitly
written down. Instead we had good discussion about the proposed how
(separation of down into admin-down/physical down), which is a good reason
too not to have this section normative yet. But i digress.

I added the following paragraph to the section. The configuration summary section
actually already summarizes this core reason as well, but it really needs to be in this section.

<t>One of the common problems of remote management is for the operator or SDN controller to cut its own connectivity to the remote node by a configuration impacting its own management connection into the node. The ACP itself should have no dedicated configuration other than the aforementioned enablement of the ACP on brownfield ACP nodes. This leaves configuration that cannot distinguish between ACP and Data-Plane as a source of configuration mistakes, as these commands will impact the ACP even though they should only impact the Data-Plane. The one ubiquitous type of command that does this on many types of routers is the interface "down" command/configuration. When such a command is applied to the interface through which the ACP provides access for remote management, it would cut the remote management connection through the ACP because (as outlined above) the "down" commands typically impact the physical layer too and not only the Data-Plane services.</t>

<t>To provide ACP/ANI resilience against such operator misconfiguration, ...

> ** Section 9.3.2.1.  Per '"Admin down" state as described above provides also a
> high level of security because it only permits ACP/ANI operations which are
> both well secured', to what is the "both" referring? I suspect this is
> editorial (but just in case, noting here).

Added a sentence before it to supply the missing explanation:

<t>An example of non-ACP but ANI traffic that should be permitted to pass even in "admin-down" state is BRSKI enrollment traffic between BRSKI pledge and a BRSKI proxy.</t>

Now the answer to your question is that the "both" in the sentence you ask
about refers to ACP/ANI (both ACP and ANI).  If this is not good english, pls. let me know.
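
The intended behaviour of the interface states discussed above can be sketched
as a small permit function. This is only an illustration under assumptions:
the enum names, the three traffic classes and the permitted() helper are
hypothetical, and the draft section is informational and does not define any
such API.

```python
from enum import Enum


class IfState(Enum):
    UP = "up"
    ADMIN_DOWN = "admin down"        # Data-Plane shut, ACP/ANI still permitted
    PHYSICAL_DOWN = "physical down"  # physical layer shut, nothing passes


class Traffic(Enum):
    DATA_PLANE = "data-plane"
    ACP = "acp"                            # ACP secure channel traffic
    BRSKI_ENROLLMENT = "brski-enrollment"  # ANI but non-ACP (pledge <-> proxy)


# Traffic classes that stay permitted in "admin down" (assumed grouping).
ANI_TRAFFIC = {Traffic.ACP, Traffic.BRSKI_ENROLLMENT}


def permitted(state: IfState, traffic: Traffic) -> bool:
    """Return True if this traffic class may pass in the given interface state."""
    if state is IfState.PHYSICAL_DOWN:
        return False
    if state is IfState.ADMIN_DOWN:
        return traffic in ANI_TRAFFIC
    return True
```

With this model an operator "admin down" on the interface carrying the
management session shuts the Data-Plane but leaves the ACP path into the node
alive, which is the misconfiguration resilience the added paragraph is about.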

> ** Section 10.2.2.  Per "For example, management plane functions (transport
> ports) should only be reachable from the ACP but not the Data-Plane", this
> seems like good guidance.  Is there a reason not to upgrade this informative
> statement and put it the Security Considerations as normative guidance?

Give it time. We're in round 1:

Existing router management plane host stacks may have severe issues separating
out access by VRF context. I remember some of the per-VRF SMI work.

Enterprise operators may have severe issues with this short-term. For example,
the typical solution would be for network admins to have to VPN back into some
headend from where the ACP is accessible. That is a lot of pain especially when
you are trying to troubleshoot locally and the headend is remote.

To not make future network admin staff work more convoluted because of ACP,
but actually more secure AND easier, we need some easily usable ACP-access method
for e.g. network admin notebooks. Given how i at times had 4 VPNs on my notebook
and at most 2 VPNs would work at the same time, i am not sure if this document's
IPsec spec would be good for operator notebooks. But maybe an ACP access method
via 802.1x... TBD.

> ** Section 10.2.2.  Per "Protection across all potential attack vectors is
> typically easier to do in devices whose software is designed from the ground up
> with security in mind than with legacy software based systems where the ACP is
> added on as another feature", no argument on the general principle.  However,
> as it relates to ACP:

Changed text "security in mind" -> "ACP in mind".

I think there are initiatives looking at secure devices, but that's not the
same as stricter isolation between ACP and the rest of the system.

> -- what's an example of the legacy software?

I had a presentation about designs for ACP a few years back in an IEEE conference
touching those aspects.

To me, legacy is probably most router software infra today, where you
would need to implement ACP as one of many VRFs that the software may
already support as part of e.g.: L3VPN or similar services.

Non-legacy is something where the router is actually a VM or container 
running on a hypervisor. And ACP would be part of the hypervisor (e.g.: linux). Or
where ACP does not even run on the same CPU/FPE as the router, but 
on e.g.: BMC HW/software (OpenBoot etc..). Lots of interesting design
choices possible.

> -- as noted in the shepherd report from 03/2019, there are no implementations,
> so is there reason to believe that this is going to put on "legacy" platforms?

Let's use different mail threads to discuss this.

> ** Section 10.2.2.  Per "As explained above, traffic across the ACP SHOULD
> still ...", is RFC2119 language really intended in this informative section?

Fixed to lower case. Thanks.

> ** Section 11.  Per "Security can be compromised by implementation errors
> (bugs), as in all products", given the generic nature of this statement,
> couldn't it also be a configuration error in the product too?

;-) Not really because the claim of ACP is that all the core parts of ACP
have no configuration. See section 9.5. 

Given how more than 50% of the whole document to me feels like talking about
security, i do have a hard time figuring out a mandatory logic of what belongs into
section 11. My solution was that it is a mixture of important summaries and
then tidbits that didn't have a better place to be explained.

Given how the topic of configuration and misconfiguration of ACP was exhaustively
discussed in prior sections, i didn't feel the need to add it here again.

But always happy to consider proposed text.

> ** Section 11.  Per "Higher layer service built using ACP domain certificates
> should not rely on undifferentiated group security ..." is there a reason not to
> make this a normative SHOULD?

Section 11 is not normative, and actual normative requirements for new stuff
across the ACP is something that was not in charter when we defined this document,
aka: i wouldn't have a logical place in this doc for this besides as a general
security consideration.

Writing normative about ASA IMHO just became possible after recharter of WG last
year, so i would want to put normative requirements about this probably into ASA 
docs we are starting to write.

> ** Editorial
> 
> -- Recommend being consistent on either "ACP domain certificates" or "ACP
> certificates"

ACP certificates.

> -- Section 1.  Editorially.  The two sentences "Section 7 defines normative how
> to ..." and "Section 8 explain normative how ..." don't parse as the adjective
> normative needs a noun to modify.

Fixed by putting (normative) at the end of the sentences.

> -- Section 6.1.4.  Editorial.  s/These requirements can be achieved by using TA
> private/These requirements can be achieved by using a TA private/

fixed

> -- Section 6.2.  Editorial. s/does intentionally not/intentionally does not/

fixed

> -- Section 6.5.  Editorial.  Per "Note that MacSec is not required by any
> profiles of the ACP in this specification but just mentioned as a likely next
> interesting secure channel protocol.", does not parse.

Fixed by separating sentences: Instead, MacSec is mentioned as ...
> 
> -- Section 6.10.7.  Per "ACP registrars are responsible to enroll candidate ACP
> nodes with ACP domain certificates and associated trust point(s)", is a trust
> point the same thing as a trust anchor?    If so, I recommend being consistent.
>  If not, then please define it.

Oops, how did that slip through? Thought i had fixed all points to be anchors.
Fixed now.

> -- Section 6.11.1.  Please make draft-ietf-roll-applicability-template a
> reference.

Already in -27.

> -- Section 6.11.1.5.  Editorially.  It might be worth framing the path metric
> in the form of sentence.

Fixed to:
Use Hopcount according to xref target="RFC6551"
> 
> -- Section 11. s/enemy plegdes/rogue pledges/

Was already fixed in -27 to "malicious registrar"
(on prior reviewer suggestion i think)

> -- Section 11. Per "Fundamentally, security depends on avoid operator and
> network automation mistakes ...", this paragraph is not actionable.  Recommend
> removal.

Fixed with proposed replacement paragraph:

<t>Operators and provisioning software developers need to be aware of how the provisioning/configuration of network devices impacts the ability of the operator / provisioning software to remotely access the network nodes. By using the ACP, most of the issues of configuration/provisioning-caused loss of connectivity for remote provisioning/configuration will be eliminated, see <xref target="self-creation"/>. Only a few exceptions such as explicit physical interface down configuration will be left, see <xref target="admin-down"/>.</t>

First sentence should now be actionable to operators/developers.
Second summarizes benefits of ACP. Third gives example of limitations.

This is the core value proposition of the ACP, hence it is useful to keep it in the
security section rather than just eliminate it (assuming certain readers will
primarily read the security section).

This answers the actionable point. Whether or not misconfiguration
that makes a network become unmanageable is a security issue, i can't judge,
because i am not sure i know a precise definition of "security".


> ** Typos
> Section 1.  Typo. s/parth/path/
> Section 1.  Typo. s/seperately/separately/
> Section 1.  Typo. s/automaticically/ automatically/
> Section 1.  Typo. s/managemenet/ management/
> Section 1.  Typo. s/absene/absence/
> Section 1.1.  Typo. s/solution:/solution/
> Section 2.  Typo. s/netork/network/
> Section 2.  Typo. s/physcially/physically/
> Section 5.  Typo. s/(see (see/(see/
> Section 5. Typo. s/loopack/loopback/
> Section 6.1.1.  Typo. s/e.g.:signing/e.g., signing/
> Section 6.1.1.  Typo. s/signalled/signaled/g
> Section 6.1.1.  Typo. s/bei/by/
> Section 6.1.2.  Typo. s/simpy/simply/
> Section 6.1.2.  Typo. s/readible/readable/
> Section 6.1.2.  Typo. s/Adresses/Addresses/
> Section 6.1.2.  Typo. s/manadatory/mandatory/
> Section 6.1.2. Typo. s/inapproprite/inappropriate/
> Section 6.1.3.1.  Typo. s/as as/as/
> Section 6.1.3.1.  Typo. /insecured/insecure/
> Section 6.1.3.1.  Typo. s/likley/likely/
> Section 6.2. Typo. s/IKEv2 has am/IKEv2 has an/
> Section 6.7.2.  Typo. s/successfull/successful/
> Section 6.7.3.1.1.  Typo.  s/superceed/superseded/
> (I stopped documenting spelling errors at Section 6.7.3.1.1.  Please run a
> spell checker before handing this off to the RFC Editor)

Sorry for this editorial trouble. I was hoping that you would not
spend time on this. Yes, of course i will do a more thorough check
before RFC editor. My excuse for not doing it on every rev is
quite lame, but i have not found a spell checker that remembers
all the non-standard words correctly, so a few changes in a 160 page document
is terrible each time...

Thanks so much for the excellent review! Please consider removing the
discuss.

Cheers
    Toerless