[Anima-bootstrap] Detailed BRSKI review, part 1

"Michael Behringer (mbehring)" <mbehring@cisco.com> Mon, 17 October 2016 15:38 UTC

From: "Michael Behringer (mbehring)" <mbehring@cisco.com>
To: "anima-bootstrap@ietf.org" <anima-bootstrap@ietf.org>
Date: Mon, 17 Oct 2016 15:38:46 +0000
Message-ID: <9ffa17925cdd4a43a0aeca04e06c906d@XCH-RCD-006.cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima-bootstrap/HLlYsSPpsJ_uSLVIwC2uF6_iMp0>
Subject: [Anima-bootstrap] Detailed BRSKI review, part 1

Looking at version -03. There is a lot of detail in this document, and overall I think it's very good! Tons of work went into this doc, and it's pretty apparent!

The biggest question to me is the one from my email from Friday: In the original document we assumed the audit method, and wrote the entire document around it. While I was originally in favour of writing the doc for a single method and explaining exceptions afterwards, I now think the reality is that we'll see the three methods I mentioned in my mail in parallel:

1) join any domain (first come first join)
--> No MASA required
2) require audit token
--> MASA required, audit mode
3) require authentication token
--> MASA required, ownership tracking mode

My main comment, thus, for discussion is: Given that we're likely to see the three variants in parallel in real life, we should probably explain in each step how the different variants affect this particular step. This would require quite some work, but my feeling is it's needed for clarity.

The other thing I would like to discuss: We have now kept ANIMA pretty much out of scope. This seems wrong to me. Yes, we should not REQUIRE ANIMA, but we should still explain how BRSKI works in an ANIMA context. This would add some small notes in various places (see detailed comments). For example, we don't explain that the proxy to Registrar connection is through the ACP, and that the Registrar is found through GRASP. That feels wrong.

Detailed review comments, mostly editorial, but sometimes important (I think :-) ), are below.

Michael

- section 1: " A complexity that
this protocol deals with are dealing with devices from a variety of
vendors, and a network infrastructure (the domain) that is operated
by parties that do not have any priviledged relationship with the
device vendors."

I don't understand "priviledged relationship". I guess we want to say that any domain can claim a device, and that domain doesn't have to be authenticated by the vendor, right? It might be clearer to say "prior relationship". The word "priviledged" is not entirely clear, I find.

- if we use "pledge", we should use it throughout. I suggest to do a global search/replace. Like, the intro uses "new entity" many times.

- definition of "pledge": I don't think we should link to "definition 6" on a web page, since that could well change.
That definition sounds very weird to me. "Neither the device nor the network knows if the
device yet knows if this device belongs with this network." (he?) Surely, the device knows if it knows the domain?!?!
Do we need this sentence !?!?
Here we say the identity is coming from a "factory". Above it was "third party". We should use the same term, consistently.

What about:
Pledge: the new device seeking to join a domain, with an identity provided by a third party (e.g., vendor, integrator).

- I don't understand capitalization of the definitions. When uppercase, when lowercase? Should probably be uppercase for all?

- "Optimal security is achieved with IEEE 802.1AR certificates on each
new entity, accompanied by a third-party Internet based service for
verification."

I suggest to add after "third party" something like "(e.g., manufacturer, integrator)" (that's what we mean, right?)

- should we not define MASA? I think we need to. (I see it's defined later on page 7, but it should be defined up front.)

- Page 5: "imprint: the process where a device obtains the cryptographic key
material to identity and trust future interactions with a network."
s/identity/identify/

- Page 5: definition of "DomainID": I suggest to add "see section 4.2.1.2 of RFC5280" (I found it, but would have found it faster with this ref ;-)

- reference I-D.irtf-nmrg-autonomic-network-definitions should be replaced with RFC7575. (globally)

- Page 5: audit token / authorization token / ownership voucher: I think we really have TWO things only, the audit token and the ownership voucher.
Are we still using "authorization token" at all? If not, let's take it out. In any case, I don't think we ever used "authorization token" as equivalent to "audit token", which this section implies.
If we use "third party" elsewhere, we should also use it here.
Both get issued by the same entity, but are used for different things. This should be reflected here.
Right now, one is from a "manufacturer" the other from a "vendor", etc. All inconsistent. I suggest let's define "third party" above (see comments above) and then only use that term.

What about:
- Audit Token: A signed token from the MASA of a third party (e.g., manufacturer, integrator) indicating that the bootstrapping has been successfully logged, including historic logging information from this device.

- Ownership Voucher: A signed token from the MASA of a third party (e.g., manufacturer, integrator) indicating that a specific domain "owns" the pledge as defined in [netconf draft]

Page 6

Section 1.2: IOT suitability: "In general the answer is no" - I would prefer to rephrase to: "This depends on the capabilities of the devices in question. The terminology of ..."
(because capabilities may well change in the next years, potentially making the general answer a "yes")

- "delays for privacy reasons" - can you expand?

Section 1.3: " between the domain Registrar and the new device". Replace "the" with "a" - there can be more than one Registrar. Alternatively, "between domain trust anchor and new device". (do a global search and replace "the Registrar" with "a Registrar")

We have comments about constrained devices a bit all over the intro. I suggest we collect them under a separate heading.

Section 2

Under Figure 1 we define terms, and in the intro we did as well. There are overlapping ones (pledge and new entity), similar ones (domain and domainID), it's confusing. I suggest we take all the definitions into the intro section. Figure 1 doesn't actually introduce new terms.

MASA versus Ownership tracker: The document makes it sound like two different entities ("MASA or Ownership Tracker") In my mind, there is a single entity which I thought we called MASA. It has two functions, one is auditing, the other is issuing ownership vouchers. In the discussion from last week this seemed to be the consensus as well. In that case, we need to adapt the drawing and the explanations. "MASA service" provides two services "ownership attestation" and "audit logging".

(This is a bit of a repeat of my last email: I think we should define the three general models up front (ownership validation, audit log, and no MASA), explain the terms of the tokens up front, and what goes where.)

3.1: " A New Entity MUST NOT automatically initiate bootstrapping if it has already been configured." Add: "or is in the process of being configured."

Figure 3:
Bullet 1: I would remove "closest" Registrar. I think there will be many criteria. Say here "to a Registrar". (And, we should capitalise "Registrar" consistently)

Bullet 2: replace " (Although the Registrar is also authenticated these credentials are only provisionally accepted at this time)" (confusing) with " (The Registrar credentials are only provisionally accepted at this time)"

I think bullet 2 and 3 are actually the same operation. By presenting itself, it implicitly requests to join. We should not make it sound like these are two distinct operations. Make it one box and call it "Request join (presenting ID)". If we do this, then we should also merge 3.1.2 and 3.1.3.

Bullet 4 / Imprint operation:

A specific device may require a MASA token to bootstrap, another one may NOT. This is really a feature of the pledge. And this behaviour MUST NOT be changeable (ie it's hard coded). (somewhere we should state that, I think we don't so far).
In the "Imprint" step three errors can happen: 1) the device receives a bad MASA token, or doesn't receive one; 2) the domain Registrar receives a bad or no MASA token; or 3) the audit log makes the Registrar reject the device. For troubleshooting, I think it is imperative that in 1) the pledge informs the Registrar of the error, and in 2) and 3) the Registrar informs the pledge (e.g., to turn on a red LED, such that the installer knows that an error condition has arisen). I think we don't cover those cases yet?
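
To make the reporting direction concrete, here is a minimal sketch in Python. All names (ImprintError, who_reports) are hypothetical and not from the draft; it only encodes who would need to tell whom in each of the three error cases above.

```python
from enum import Enum


class ImprintError(Enum):
    """The three imprint error cases listed above (names are illustrative)."""
    PLEDGE_BAD_OR_MISSING_TOKEN = 1     # case 1: detected by the pledge
    REGISTRAR_BAD_OR_MISSING_TOKEN = 2  # case 2: detected by the Registrar
    REGISTRAR_AUDIT_LOG_REJECT = 3      # case 3: Registrar rejects based on audit log


def who_reports(error):
    """Return (reporter, recipient) for a given imprint error."""
    if error is ImprintError.PLEDGE_BAD_OR_MISSING_TOKEN:
        # The pledge saw the problem, so it informs the Registrar.
        return ("pledge", "registrar")
    # In cases 2 and 3 the Registrar saw the problem and informs the pledge,
    # which could then signal the installer (e.g., a red LED).
    return ("registrar", "pledge")
```

The point of the sketch is only that each error has exactly one side that knows about it, so that side has to report it.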

3.1.1
" The result of discovery is logically (should be "logical") communication with a Proxy instead ... " I would have said it the other way round, and reduced that paragraph to: " The result of discovery is a logical communication with a Registrar, through a Proxy."

" To discover the Domain Bootstrap Server" you mean " To discover a Registrar" - right? I suggest to remove the term "bootstrap server" completely (globally) to avoid confusion.

a): We exclude a case with normal DHCP for IPv4. Do we really want to do this? Also, if option d) is the only one working, we require DNS to work. So a) should probably be expanded to include these options?
b): Do we need an IANA registration for the "_bootstrapks._tcp.local" service? We have no IANA considerations section!!
c) We're using both "example.com" and "example.net". Only use .com (http://www.iana.org/domains/reserved)
d) "Vendors that leverage this method SHOULD provision appropriately." Explain? I don't understand what that means?

Not sure, just verifying: Our proxy methods would work if the pledge is IPv4 and the Registrar IPv6?

"to avoid overloading that discovery methods network infrastructure." Does that make sense? I think "to avoid overloading the network infrastructure with discovery".

In the reference model we state that if a pledge has been rejected by a domain, it should preferably use other domains that are seen. We may want to add something at the end of 3.1.1. This is also the reason why the pledge needs to know if the Registrar has rejected it based on MASA input.

s/Therefore or clarity/Therefore for clarity/

3.1.2 suggest to merge with 3.1.3. The "request join" includes the "identity", really. These are NOT two separate steps.
s/ bootstrapping protocol server/Registrar/g
s/bootstrapping server/Registrar/g
s/Bootstrapping server/Registrar/g

3.1.4
The non-autonomic methods are confusing here. I wonder whether we should exclude them? Are they really in scope?

The pledge must support three modes:
1 - (no MASA): doesn't require an ownership voucher or audit token
2 - (MASA with audit only): requires an audit token
3 - (MASA with ownership tracking): requires an ownership voucher.
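
As a rough illustration of the three modes, here is a minimal sketch in Python. The names (PledgeMode, REQUIRED_TOKEN, may_imprint) and the token type strings are my assumptions, not terms from the draft; the mode is meant to be fixed in the pledge, not negotiated.

```python
from enum import Enum


class PledgeMode(Enum):
    NO_MASA = 1    # join any domain, first come first join
    AUDIT = 2      # MASA required, audit mode
    OWNERSHIP = 3  # MASA required, ownership tracking mode


# Token a pledge in each mode needs before it accepts a domain.
# None means no token (and therefore no MASA) is required.
REQUIRED_TOKEN = {
    PledgeMode.NO_MASA: None,
    PledgeMode.AUDIT: "audit-token",
    PledgeMode.OWNERSHIP: "ownership-voucher",
}


def masa_required(mode):
    """True if a pledge in this mode depends on a MASA."""
    return REQUIRED_TOKEN[mode] is not None


def may_imprint(mode, token_type=None):
    """True if a pledge in `mode` may imprint, given the token type it received."""
    required = REQUIRED_TOKEN[mode]
    if required is None:
        return True  # nothing to check in the no-MASA case
    return token_type == required
```

For example, may_imprint(PledgeMode.AUDIT, "audit-token") holds, while a pledge in ownership tracking mode would refuse an audit token.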

3.1.5
" o In accordance with IEEE 802.1AR and RFC5280 all manufacturing
installed certificates and trust anchors are assumed to have
infinite lifetimes. All such certificates "SHOULD be assigned the
GeneralizedTime value of 99991231235959Z" [RFC5280]. The New
Entity, Registrar and MASA server MUST ignore any other validity
period information in these credentials and treat the effective
lifetime as 99991231235959Z. This ensures that client
authentication (see Section 3.3.1) and the audit token signature
(see Section 5.3) can always be verified during RFC5280 path
validation."

The MUST statement implies that a MASA etc actually knows whether a certificate is 802.1AR or another type of cert, right? Is that true? When I look at a device certificate, how do I know it's an IDevID?

Assuming you *can* distinguish IDevID from a "normal" cert, we may run into cases where "normal" certs are used in the function of an IDevID, right? I.e. a device type doesn't really support IDevID, but a manufacturer has pre-loaded certs at manufacturing time.

This "All such certificates "SHOULD be assigned the GeneralizedTime value of 99991231235959Z" [RFC5280]. " in combination with "MUST ignore" makes me nervous...

We're referring to an audit token in this section, but not to the other 2 methods (Ownership voucher and no MASA). This isn't complete...

Specifically, in a case without MASA, I think we need to simply state that we cannot validate time during enrolment. I think this is what the statement "When accepting an enrollment certificate the validity period
within the new end entity certificate is assumed to be valid by
the New Entity." wants to say?

Actually, we only look at the domain validating time from the pledge, shouldn't we also describe the other direction? -->
Wouldn't it be correct to say "A pledge without real-time clock cannot securely bootstrap time. During the bootstrap process it accepts all certificates without validating time. Once bootstrapped such devices MUST be provided with the current correct time for other PKI operations to succeed."
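
A minimal sketch of what I mean, in Python. The function names are hypothetical, and modelling "no real-time clock" as now=None is my assumption; the only fixed value is 99991231235959Z, which RFC5280 uses for certificates with no well-defined expiration date.

```python
from datetime import datetime, timezone

# RFC5280 GeneralizedTime value for "no well-defined expiration date".
NO_EXPIRY = "99991231235959Z"


def parse_generalized_time(value):
    """Parse a GeneralizedTime string of the form YYYYMMDDHHMMSSZ as UTC."""
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def validity_ok(not_before, not_after, now=None):
    """Check a certificate validity period against the local clock.

    now=None models a pledge without a trusted clock: it cannot validate
    time, so it accepts the certificate during bootstrap.
    """
    if now is None:
        return True
    start = parse_generalized_time(not_before)
    end = parse_generalized_time(not_after)
    return start <= now <= end
```

Once the device has been given the correct time, the same check with a real value for now would apply to all later PKI operations.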

This whole section 3.1.5 makes me a bit nervous...

3.1.6
"The New Entity contacts the Registrar" add "via a proxy". We always assume a proxy.

In this section we don't foresee a case without MASA server. (Bullet list)

" o The EST server is authenticated by using the Ownership Voucher
indicated fully qualified domain name to build the EST URI such
that EST section 4.1.1 bootstrapping using the New Entity implicit
Trust Anchor database can be used."

Read this several times, still don't parse it. Can we make this sentence simpler? Not even sure this is grammatically correct?!?

Also this section, I think we should distinguish the three cases of MASA. Last paragraph starts with "once the audit token is received". What if there is none or an ownership voucher?

3.1.7
As mentioned in my other mail, I would prefer to call the final state here "enrolled". We could explain here that in the case of ANIMA, the next step is the establishment of the ACP, see draft ... and in the non-ANIMA case we expect normal management to take place, e.g., via NETCONF, ... But I suggest to have a reference to the ACP draft.

3.2
We should re-state here that architecturally, a Pledge ALWAYS interfaces a Proxy; if the directly adjacent device happens to be a Registrar, it has to present itself to the pledge in the same way a normal Proxy would.

"the chosen mechanism SHOULD... " - This is the mechanism we specify later in the doc, right? (Sounds like this is a requirement outside this doc). Then I would re-phrase "the chosen mechanism was designed to ..."

I disagree with the *general* goal "SHOULD use the minimum amount of state on the proxy device." This is a good goal for constrained devices, but in a normal network we always try to handle DoS for example as far "out" as possible. (We had that discussion a while back).

What are we planning to do with draft-richardson-anima-state-for-joinrouter? It contains valuable background. Wouldn't it be nice to have that as an appendix in brski? (However, then the naming would need to be adapted to the brski terminology).

Add: "If this bootstrap mechanism is used in an ANIMA context, the proxy device will discover Registrar(s) through GRASP based discovery, inside the ACP. The connection from the Pledge will also be forwarded inside the ACP." A proxy will only be enabled when a device sees a Registrar; if it loses connections to all Registrars, it withdraws the proxy service announcements.
Or did we decide to leave ANIMA completely out of the draft? (I thought we wanted it independent, but ANIMA is still the main use case for now).
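
The enable/withdraw behaviour described above could be sketched as follows, in Python. ProxyAnnouncer and its methods are hypothetical names; how Registrars are actually discovered (GRASP inside the ACP) is not modelled here, only the rule that the proxy service is announced while at least one Registrar is reachable.

```python
class ProxyAnnouncer:
    """Tracks reachable Registrars and whether the proxy service is announced."""

    def __init__(self):
        self.registrars = set()

    @property
    def announcing(self):
        # The proxy is only enabled while at least one Registrar is seen.
        return bool(self.registrars)

    def registrar_seen(self, addr):
        """Called when discovery reports a Registrar."""
        self.registrars.add(addr)

    def registrar_lost(self, addr):
        """Called when the connection to a Registrar is lost."""
        self.registrars.discard(addr)
```

Losing one of several Registrars leaves the announcement in place; losing the last one withdraws it.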

3.3
I think we need to take a step back here. First, explain that the registrar is typically configured. Then, we need to give a bit more context: On one side, it expects connections from pledges, on the other we have a CA connection and (optionally) a MASA.
Then, in an ANIMA context, the Registrar(s) announce their service inside the ACP, and they expect to be contacted by proxies through the ACP.

3.3.2
The whole document is focused on the audit method; if this is the main method, then we MUST explain the white list here, because none of the 3 bullets in this section is sufficient for authorizing exactly "my" devices. (I realise white lists appear later on).

Paragraph "In order to validate the IEEE 802.1AR device identity..." belongs into 3.3.1.

s/it is expected request/it is expected to request/

"these certificates can subsequently be used to determine the boundaries of the homenet..." - remove the homenet references here. I suggest to rephrase: "These certificates can be used for other methods, for example boundary detection, auto-securing protocols, etc.".

"The authorization performed during this phase MAY be
cached for the TLS session and applied to subsequent EST enrollment
requests so long as the session lasts." - not clear?!? Each request is for a single device. Why cache?

I stop the detailed review here for a moment, since my comments would depend too much on how we resolve the question asked above about the 3 methods. Will resume here once we have settled on this...
