Re: New version of certspec (01); request review and URN assignment

Sean Leonard <dev+ietf@seantek.com> Fri, 08 November 2013 18:12 UTC

Content-Type: multipart/signed; boundary="Apple-Mail=_0A5E82C6-8554-488D-B32A-6D238A9F339F"; protocol="application/pkcs7-signature"; micalg="sha1"
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
Subject: Re: New version of certspec (01); request review and URN assignment
From: Sean Leonard <dev+ietf@seantek.com>
In-Reply-To: <201311042214.rA4MEeNj4209694@shell01.TheWorld.com>
Date: Fri, 08 Nov 2013 10:12:15 -0800
Message-Id: <F889CBF5-F5DC-4A49-9C89-A49B0B92F87D@seantek.com>
References: <201311042214.rA4MEeNj4209694@shell01.TheWorld.com>
To: "Dale R. Worley" <worley@ariadne.com>
Cc: urn-nid@ietf.org, Alexey Melnikov <alexey.melnikov@isode.com>
Precedence: list

Hi Dale,

Once again, thank you for the comments. My feedback is below.

On Nov 4, 2013, at 2:14 PM, Dale R. Worley <worley@ariadne.com> wrote:

> This draft looks interesting but needs a lot of cleaning up.
> 
> In general, there are a lot of things that aren't spelled out clearly
> or are forward-referenced (the import of a paragraph is only clear
> when text much further down is read).
> 
> (Added after the following paragraphs were written:)  Have you
> considered the possibility that "cert" is a URI, not a URN (and thus
> "cert" should be a scheme rather than a NID)?  From practical
> necessity, certificates will have multiple certspecs, which means that
> certspecs aren't *names*.  And a certspec doesn't show how to obtain
> the certificate, so it isn't a URL.  But it is clearly a URI…

I understand the debate around this. Overall, I have written the draft as a URN specification rather than a URI specification, for a few reasons:

- The main "name" requirement (as I understand it) is that a particular URN must refer to only one item, and that binding must be permanent. For example, urn:ietf has multiple ways of referring to the same document (urn:ietf:rfc, urn:ietf:bcp, urn:ietf:std); the point is that once an RFC/BCP/STD number is assigned, the binding is permanent.
- For all intents and purposes, URNs are not protocols and are not intended to be protocols: URNs are "resolved" but not "dereferenced". URIs do not have to represent protocols, but most of the popular ones do. Even the non-protocol launching schemes (mailto:, callto:, tel:, and the myriad non-registered app schemes like yelp:) are basically mechanisms where dereferencing the URI results in accessing a resource in a particular, well-defined way (create an e-mail, call someone with Skype or another telephony app, look at reviews in the Yelp app, etc.). In this case, the application using the cert URN may engage in any number of behaviors, but generally not in launching an activity.

There are other reasons but I hope that the latest draft (-02) mostly spells these out. Happy to debate, however.

> 
> There is inconsistency regarding whether a URN may retrieve multiple
> certificates.  In section 2:
> 
>   When a resolver resolves certspec, the resolver's output is either
>   a single certificate or nothing.
> 
> In section 4:
> 
>   A certificate specification (certspec) unambiguously identifies a
>   single certificate.
> 
> In section 8:
> 
>   When using specs that depend on certificate data, the implementation
>   MUST be prepared to deal with multiple found certificates that
>   contain the same certificate data, but are not the same certificate.
> 
> Perhaps I am not understanding that there is a difference between "the
> same certificate" and "certificates that contain the same certificate
> data", but if there is such a difference, the text in the other
> sections needs to take that into account.  Or is it that "the
> implementation MUST be prepared to deal with the error condition of
> multiple found certificates"?

Cleared up in -02. It is supposed to be just one certificate.

> 
> There is a lot of use of the terms "spec" and "certspec" but I can't
> find definitions for them.

I tried to eliminate the term "spec" and am now using "certspec" exclusively to refer to the cert URN, specifically, the specification type + value combination (i.e., not including the query or fragment components).

>  I suspect that they are terms that have
> arisen in a specific discussion context, but since they are not widely
> known, they should be replaced with phrases that are more explanatory
> or given a clear definition upon first use.  There also seem to be
> some ambiguities, e.g., "spec" is sometimes used to mean the spec-type
> string itself, and sometimes the subset of "cert" URNs that have that
> spec-type value.
> 
> The syntax is given in section 3 as:
> 
>   NSS       = spec-type ":" spec-value ( '?' certattrs)
>   spec-type = scheme
>   certattrs = <URN chars>
>   hexOctet  = hexDigit hexDigit
>   hexDigit  =
>          "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" /
>          "a" / "b" / "c" / "d" / "e" / "f" /
>          "A" / "B" / "C" / "D" / "E" / "F"
> 
> It appears that "scheme" is intended to be "<NID>" from RFC 2141.  Or
> is it the similar but different "scheme" from RFC 2396/3986?  "<URN
> chars>" appears to be "<URN chars>" from RFC 2141.  "hexDigit" could
> be replaced with a similar reference to "<hex>" from RFC 2141.

Thank you. I completely rewrote the ABNF so that there is one large ABNF block that covers all "standard" certspecs. I preferred RFC 5234 definitions where available, such as HEXDIG.

> 
> The external references could be made clearer, and should be made
> explicit.  And the fact that the defined spec-types and the
> corresponding syntaxes for spec-values are given below is not stated
> here.  It would help if this BNF was explicitly labeled as the
> "generic syntax" for "cert" URNs.

Yes. See -02.

> 
> Section 3.2 says:
> 
>   A certspec can include attributes that are associated with the
>   identified certificate.  These attributes do NOT affect certificate
>   identification; the syntax is intended primarily to convey
>   certificate metadata such as attributes found in PKCS #9, PKCS #11,
>   PKCS #12, and particular implementations of cryptographic libraries.
> 
> It is unclear to me how this is intended to work.  A URN is intended
> to denote a *thing*, but these attributes seem to be intended to allow
> the specification of the *thing* to be decorated with metadata (which
> is effectively "carried in-line").  To be exact, you want to define
> the *thing* that such a URN names to be the combination of the
> certificate and the metadata about the certificate.

Cleared up in -02. A lot was made clearer by using [URNBIS] (draft-ietf-urnbis-rfc2141bis-urn) as normative. In RFC 2141, it was ambiguous whether and to what extent URNs had query and fragment components. Now it is clear that they do.

> 
> How the use of %-escapes affects lexical equality should be made
> clear, and ideally it should be consistent for all spec-types.  (In
> situations that I'm aware of (SIP), the general convention is that a
> %-escape should be considered lexically (and functionally) equivalent
> to the character it represents.)

Clarified in -02. Mostly, percent-encoding is prohibited, but this is spelled out. Comments welcome.

> 
> You also need to be explicit about how attributes affect lexical
> equality.  That could get messy.  (Is order of attributes
> significant?)

Attributes do not affect lexical equality. Clarified in -02.
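
As an illustration only (not text from the draft), the rule can be sketched in Python: the certspec is the part of the URN before any query or fragment, so two URNs that differ only in attributes compare equal. The function names and the attribute string in the usage note are hypothetical, and the exact comparison rules are the ones in -02, not this sketch.

```python
def certspec_of(urn: str) -> str:
    """Return the certspec portion of a cert URN.

    Assumption: the query starts at the first "?" and the fragment
    at the first "#"; everything before them is the certspec.
    """
    for separator in ("?", "#"):
        urn = urn.split(separator, 1)[0]
    return urn


def same_certspec(urn_a: str, urn_b: str) -> bool:
    """Compare two cert URNs while ignoring attributes (query/fragment)."""
    return certspec_of(urn_a) == certspec_of(urn_b)
```

For example, same_certspec("urn:cert:sha-1:abcd?x=1", "urn:cert:sha-1:abcd") is true under this sketch, whatever the order or content of the attributes.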

> 
> The policy regarding internal non-URN characters (especially
> whitespace) isn't consistent between the various spec-types.  It would
> be simpler, IMHO, if the definition of the URN *itself* excluded
> extraneous characters.  If there are contexts where, e.g., line-breaks
> must be allowed for practical reasons, the context should define that
> the text is "a URN, possibly with embedded whitespace (which is
> ignored)".

I tried to address this in -02. Most certspecs prohibit whitespace but a few have custom rules (namely, base64 and issuersn).

> 
> Section 4.1 says:
> 
>   In all certspecs in this specification *or* derived
>   from this specification, the hash is computed over the octets of the
>   DER encoding of the certificate, namely, the Certificate type of
>   [RFC5280 sec. 4.1].
> 
> This cannot be correct as written, because "this specification" shows
> a number of spec-types that do not contain hashes.  Do you mean "In
> all certspecs that contain hashes, the hash is computed over the
> octets…"?

Yes. Clarified in -02.
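
A minimal Python sketch of the computation, assuming the caller already holds the DER encoding of the Certificate as bytes. The function name and the lower-case spelling of the spec-types are my assumptions for illustration; the "urn:cert:" prefix follows the example quoted further down.

```python
import hashlib

# Map from spec-type (assumed lower-case spelling) to hashlib algorithm name.
_ALGORITHMS = {
    "sha-1": "sha1",
    "sha-256": "sha256",
    "sha-384": "sha384",
    "sha-512": "sha512",
}


def hash_certspec(cert_der: bytes, spec_type: str = "sha-256") -> str:
    """Build a hash-based cert URN from DER-encoded Certificate octets.

    The hash covers the whole DER encoding of the certificate,
    not just a part of it such as the TBSCertificate.
    """
    digest = hashlib.new(_ALGORITHMS[spec_type], cert_der).hexdigest()
    return f"urn:cert:{spec_type}:{digest}"
```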

> 
> This BNF is given:
> 
>   spec-value-sha-1   = 20hexOctet
>   spec-value-sha-256 = 32hexOctet
>   spec-value-SHA-384 = 48hexOctet
>   spec-value-SHA-512 = 64hexOctet
> 
> The definition of the SHA-1 spec-type is simply:
> 
>   The spec-type is "SHA-1".  The hash is computed using SHA-1 [SHS].
> 
> But there is no explicit statement that the spec-value of a URN with
> spec-type "SHA-1" must conform to "spec-value-sha-1".  Similarly for
> some other spec-types.

Hopefully clarified in -02. Maybe it's not super-clarified, but there were big ABNF changes. Again, comments welcome.
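
To make the intended constraint concrete, here is a hedged Python sketch that checks a spec-value against the octet counts in the quoted BNF (20, 32, 48, and 64 hexOctets). The function name is hypothetical, and treating the spec-type as case-insensitive is an assumption of this sketch.

```python
import re

# Number of hexOctets required for each hash spec-type, per the quoted BNF.
HASH_OCTETS = {"sha-1": 20, "sha-256": 32, "sha-384": 48, "sha-512": 64}


def is_valid_hash_value(spec_type: str, spec_value: str) -> bool:
    """Return True if spec_value is exactly the right number of hex digits."""
    octets = HASH_OCTETS.get(spec_type.lower())
    if octets is None:
        return False  # not one of the hash spec-types
    pattern = r"[0-9A-Fa-f]{%d}" % (2 * octets)
    return re.fullmatch(pattern, spec_value) is not None
```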

> 
> In regard to spec-type base64, you might want to mention that though
> "/" is reserved by RFC 2141, it is used by the "data" URL (RFC 2397),
> so you feel it is safe to allow un-escaped "/" in "cert" URNs.
> However, this might be a point of philosophical dispute in the URN
> community.

Discussed extensively in -02.

> 
> In section 4.3.1 is the BNF:
> 
>   spec-value-issuersn = distinguishedName SEMI serialNumber
>   serialNumber        = 1*hexOctet
> 
> This is not strictly correct, as the first component of
> spec-value-issuersn is "a distinguishedName with non-URN characters
> replaced with their %-escape representations".  So the BNF needs to be
> corrected.  It would also help if the text specified explicitly the
> set of characters that were permitted to appear un-escaped in the
> field, to minimize the chances of erroneous implementations.

Fixed in -02. I will complete the explicit table of characters in -03.

> 
> Section 4.3.1 says:
> 
>   <serialNumber> is the hexadecimal encoding of the certificate's
>   serial number, with the exact same (DER encoded) contents octets of a
>   CertificateSerialNumber ::= INTEGER as specified in [RFC5280] sec.
>   4.1.
> 
> I think you could more clearly state this as
> 
> . <serialNumber> is the hexadecimal encoding of the octets of the DER
> . encoding of the CertificateSerialNumber ::= INTEGER as specified in
> . [RFC5280] sec. 4.1.
> 
>   If the serial number hex octets are malformed, the certspec is
>   invalid.

Done.

> 
> This seems to be a special case of a general rule that if a part of a
> "cert" URN is the hex or base64 encoding of a DER-encoded part of a
> certificate, if the encodings are incorrect, the URN is invalid.  If
> so, it would be useful to state this at the top as a general rule.

Done.

> 
> Section 4.3.2 says:
> 
>   The spec-type is "ski".  The spec-value is given by the following
>   ABNF:
> 
>   spec-value-ski = keyIdentifier
>   keyIdentifier  = 1*hexOctet
> 
>   <keyIdentifier> is the hexadecimal encoding of the certificate's
>   subject key identifier, which is recorded in the certificate's
>   Subject Key Identifier extension [RFC5280] sec. 4.2.1.2.
> 
> My guess is that keyIdentifier is the hexadecimal encoding of the
> DER-encoding of the SubjectKeyIdentifier field, but that is not made
> explicit.

I see the ambiguity here. Clarified in -02. The SKI (value) is the contents octets of the DER-encoded SubjectKeyIdentifier ::= OCTET STRING. So if the extension value is: 04 03 01 02 03, the SKI is "010203" (not the tag or length octets).
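
The example above can be reproduced with a short Python sketch. It is illustrative only: the function name is hypothetical, and it handles only short-form DER lengths (under 128 octets), which covers typical key identifiers.

```python
def ski_from_extension_value(ext_value: bytes) -> str:
    """Return the hex SKI from a DER-encoded SubjectKeyIdentifier.

    Strips the OCTET STRING tag (0x04) and the length octet, and
    returns only the contents octets as lower-case hex.
    """
    if len(ext_value) < 2 or ext_value[0] != 0x04:
        raise ValueError("not a DER OCTET STRING")
    length = ext_value[1]
    if length & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    contents = ext_value[2:2 + length]
    if len(contents) != length or len(ext_value) != 2 + length:
        raise ValueError("length octet does not match the data")
    return contents.hex()
```

With the octets from the example, ski_from_extension_value(bytes.fromhex("0403010203")) returns "010203".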

> 
> In regard to section 4.3.3:
> 
> The "identifier uniqueness considerations" do not discuss the fact
> that different URNs (considered as text strings) can be lexically
> equivalent via (at least):
> 
> - case of the spec-type and hex encodings
> - use of %-encoding of characters for which it is not mandatory
> - hex and base64 spec-type encoding of the same certificate
> - when embedded non-base64 characters are ignored
> 
> Since these are "lexically obvious", they should not cause practical
> problems, but need to be made explicit.

Clarified in -02.

> 
> However there are many situations where two different URNs designate
> the same certificate and that fact can only be discovered by resolving
> them.  That is probably unavoidable for practical reasons, but it
> should be mentioned and the consequences discussed.
> 
>      For specs that identify certificates by certificate data,
>      the resolver's database of certificates and implementation
>      of certification path validation [RFC5280 sec. 6] ensure
>      uniqueness.
> 
> I think what you really mean to say is that as long as the CAs issue
> certificates correctly, no "cert" URN can identify two different
> certificates.

I used this text. Clarified in -02.

> 
> In regard to "identifier persistence considerations", I think what you
> have written is more in regard to the persistence of *certificates*.
> What you need to state is that once a URN identifies a particular
> certificate, that fact will never change.  (That the URN can identify
> only one certificate is a matter of uniqueness, which is the previous
> section.)
> 
> .  Identifier persistence considerations:
> .     A certificate is a permanent digital artifact, irrespective of
> .     its origin.  As the URN records only information that is
> .     derivable from the certificate itself, such as one of its
> .     cryptographic hashes, the binding between the URN and the
> .     certificate is permanent.

I used (most of) this text. Clarified in -02.

> 
> In regard to "Conformance with URN Syntax", some discussion of the use
> of "/" needs to be included, as that is nonconformant with RFC 2141.

Clarified in -02.

> 
> In two places is the example
> 
>   urn:cert:base64:MIICAS...
> 
> Since this does not conform to the BNF, something needs to be revised.

I generated a very small certificate, which hopefully provides a real-world example. It's in -02.

> 
> In regard to "IANA Considerations", some discussion regarding
> assignment of spec-types needs to be done.  Should a registry be
> created?

This has been brought up before.

I am on the fence about whether a registry should be created. I understand the desire. On the other hand, there are countervailing considerations (which I put in -02). Basically, I want to make it difficult to register new types of certspecs, because every time a new certspec type is created, it will impose a lot of burden on existing implementations to support the new spec.

The exception here is hashes: hash algorithms are expected to weaken over time given the progress of cryptographic research, so implementations need to implement new certspec processing for new algorithms as old ones fail. However, because URNs are supposed to be permanent identifiers, it makes sense to me to limit the algorithms to the most well-known and well-implemented ones, as they apply to fingerprinting data. As of today this is still SHA-1: most tools will readily cough up the SHA-1 hash of a certificate. However, I believe that there is an IETF BCP or other document that says that we should move away from SHA-1 for new applications. There is NIST guidance on the same topic: SP 800-131A. Here is the text from the website <http://csrc.nist.gov/groups/ST/hash/policy.html>:

    SHA-1: Federal agencies should stop using SHA-1 for generating digital signatures, generating time stamps and for other applications that require collision resistance. Federal agencies may use SHA-1 for the following applications: verifying old digital signatures and time stamps, generating and verifying hash-based message authentication codes (HMACs), key derivation functions (KDFs), and random bit/number generation. Further guidance on the use of SHA-1 is provided in SP 800-131A.

    SHA-2 (i.e., SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224 and SHA-512/256): Federal agencies may use these hash functions for all applications that employ secure hash algorithms. NIST encourages application and protocol designers to implement SHA-256 at a minimum for any applications of hash functions requiring interoperability. Further guidance on the use of SHA-2 is provided in SP 800-57 Part 1, section 5.6.2 and SP 800-131A.

    SHA-3: When the SHA-3 hash algorithm becomes available, it may also be used for all applications that employ secure hash algorithms. At this time, there is no need or plan to transition applications from SHA-2 to SHA-3.

Basically for these reasons, draft-seantek-certspec specifies SHA-1 and SHA-256; SHA-384 and SHA-512 are thrown in "for convenience". SHA-512/224 and SHA-512/256 were omitted because the increased performance of truncated SHA-512 does not seem to justify proliferating the certspec standard (i.e., the certificate lookup computation will almost certainly dwarf the hash computation). SHA-3 is simply too new; including it would not serve any purpose that the SHA-2 functions don't already meet.

Thanks again!

Cheers,

Sean

Attachment: smime.p7s
