Re: New version of certspec (01); request review and URN assignment

worley@ariadne.com (Dale R. Worley) Mon, 04 November 2013 22:16 UTC

Date: Mon, 4 Nov 2013 17:14:40 -0500 (EST)
Message-Id: <201311042214.rA4MEeNj4209694@shell01.TheWorld.com>
From: worley@ariadne.com (Dale R. Worley)
Sender: worley@ariadne.com (Dale R. Worley)
To: urn-nid@ietf.org
Subject: Re: New version of certspec (01); request review and URN assignment

This draft looks interesting but needs a lot of cleaning up.

In general, there are a lot of things that aren't spelled out clearly
or are forward-referenced (the import of a paragraph is only clear
when text much further down is read).

(Added after the following paragraphs were written:)  Have you
considered the possibility that "cert" is a URI, not a URN (and thus
"cert" should be a scheme rather than a NID)?  From practical
necessity, certificates will have multiple certspecs, which means that
certspecs aren't *names*.  And a certspec doesn't show how to obtain
the certificate, so it isn't a URL.  But it is clearly a URI...

There is inconsistency regarding whether a URN may retrieve multiple
certificates.  In section 2:

   When a resolver resolves certspec, the resolver's output is either
   a single certificate or nothing.

In section 4:

   A certificate specification (certspec) unambiguously identifies a
   single certificate.

In section 8:

   When using specs that depend on certificate data, the implementation
   MUST be prepared to deal with multiple found certificates that
   contain the same certificate data, but are not the same certificate.

Perhaps I am not understanding that there is a difference between "the
same certificate" and "certificates that contain the same certificate
data", but if there is such a difference, the text in the other
sections needs to take that into account.  Or is it that "the
implementation MUST be prepared to deal with the error condition of
multiple found certificates"?

There is a lot of use of the terms "spec" and "certspec" but I can't
find definitions for them.  I suspect that they are terms that have
arisen in a specific discussion context, but since they are not widely
known, they should be replaced with phrases that are more explanatory
or given a clear definition upon first use.  There also seem to be
some ambiguities, e.g., "spec" is sometimes used to mean the spec-type
string itself, and sometimes the subset of "cert" URNs that have that
spec-type value.

The syntax is given in section 3 as:

      NSS = spec-type ":" spec-value ( '?' certattrs)
        spec-type = scheme
        certattrs = <URN chars>
        hexOctet  = hexDigit hexDigit
      hexDigit  =
             "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" /
             "a" / "b" / "c" / "d" / "e" / "f" /
             "A" / "B" / "C" / "D" / "E" / "F"

It appears that "scheme" is intended to be "<NID>" from RFC 2141.  Or
is it the similar but different "scheme" from RFC 2396/3986?  "<URN
chars>" appears to be "<URN chars>" from RFC 2141.  "hexDigit" could
be replaced with a similar reference to "<hex>" from RFC 2141.

The external references could be made clearer, and should be made
explicit.  And the fact that the defined spec-types and the
corresponding syntaxes for spec-values are given below is not stated
here.  It would help if this BNF were explicitly labeled as the
"generic syntax" for "cert" URNs.
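
To illustrate what a "generic syntax" label would cover, here is a
minimal Python sketch that splits a "cert" URN into the three parts
named in the BNF.  It assumes that the "urn" and "cert" prefixes are
matched case-insensitively and that the first "?" begins certattrs;
the function name and the attribute string in the usage line are
hypothetical, not taken from the draft.

```python
def split_cert_urn(urn):
    """Split a "cert" URN into (spec_type, spec_value, certattrs)."""
    parts = urn.split(":", 3)
    if len(parts) != 4:
        raise ValueError("expected urn:cert:<spec-type>:<spec-value>")
    prefix, nid, spec_type, rest = parts
    if prefix.lower() != "urn" or nid.lower() != "cert":
        raise ValueError("not a cert URN")
    if not spec_type:
        raise ValueError("empty spec-type")
    # Assumption: everything after the first "?" is certattrs.
    spec_value, sep, certattrs = rest.partition("?")
    return spec_type, spec_value, (certattrs if sep else None)


# Hypothetical usage; "x=y" is a made-up attribute string.
print(split_cert_urn("urn:cert:SHA-1:abcd?x=y"))
```

Spec-type-specific syntax checks would then be applied to the
returned spec_value, which is why the draft should say where those
per-type rules are defined.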

Section 3.2 says:

   A certspec can include attributes that are associated with the
   identified certificate.  These attributes do NOT affect certificate
   identification; the syntax is intended primarily to convey
   certificate metadata such as attributes found in PKCS #9, PKCS #11,
   PKCS #12, and particular implementations of cryptographic libraries.

It is unclear to me how this is intended to work.  A URN is intended
to denote a *thing*, but these attributes seem to be intended to allow
the specification of the *thing* to be decorated with metadata (which
is effectively "carried in-line").  To be exact, you want to define
the *thing* that such a URN names to be the combination of the
certificate and the metadata about the certificate.

How the use of %-escapes affects lexical equality should be made
clear, and ideally it should be consistent for all spec-types.  (In
situations that I'm aware of (SIP), the general convention is that a
%-escape should be considered lexically (and functionally) equivalent
to the character it represents.)
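
As a sketch of that convention (not something the draft specifies),
the following Python compares two strings after normalizing their
%-escapes.  The set of characters allowed to appear un-escaped is an
illustrative assumption, and both function names are hypothetical.

```python
import re
import string

# Assumption: only these characters may appear un-escaped.
UNESCAPED_OK = set(string.ascii_letters + string.digits + "-._")


def normalize_escapes(text):
    """Decode %-escapes of always-safe characters; upper-case the rest."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNESCAPED_OK:
            return ch
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def escapes_equivalent(a, b):
    """True if a and b differ only in how they use %-escapes."""
    return normalize_escapes(a) == normalize_escapes(b)


print(escapes_equivalent("a%2Db", "a-b"))
print(normalize_escapes("CN%3dfoo%41"))
```

Whatever set of characters is chosen, stating it once for all
spec-types would let every implementation share one comparison
routine.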

You also need to be explicit about how attributes affect lexical
equality.  That could get messy.  (Is order of attributes
significant?)

The policy regarding internal non-URN characters (especially
whitespace) isn't consistent between the various spec-types.  It would
be simpler, IMHO, if the definition of the URN *itself* excluded
extraneous characters.  If there are contexts where, e.g., line-breaks
must be allowed for practical reasons, the context should define that
the text is "a URN, possibly with embedded whitespace (which is
ignored)".

Section 4.1 says:

   In all certspecs in this specification *or* derived
   from this specification, the hash is computed over the octets of the
   DER encoding of the certificate, namely, the Certificate type of
   [RFC5280 sec. 4.1].

This cannot be correct as written, because "this specification" shows
a number of spec-types that do not contain hashes.  Do you mean "In
all certspecs that contain hashes, the hash is computed over the
octets..."?
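
For concreteness, a hash-based "cert" URN under that reading could be
computed roughly as follows.  This is a sketch only: the spec-type
spellings other than "SHA-1" are my assumption, the function name is
hypothetical, and the input bytes in the usage line are a placeholder,
not a real DER-encoded certificate.

```python
import hashlib

# Assumed mapping from spec-type to hashlib algorithm name.
HASHLIB_NAMES = {
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def hash_certspec(der_bytes, spec_type="SHA-1"):
    """Build urn:cert:<spec-type>:<hex hash of the DER octets>."""
    digest = hashlib.new(HASHLIB_NAMES[spec_type], der_bytes).hexdigest()
    return "urn:cert:" + spec_type + ":" + digest


# Placeholder input; a real caller would pass the DER-encoded Certificate.
print(hash_certspec(b"placeholder, not a certificate", "SHA-256"))
```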

This BNF is given:

   spec-value-sha-1   = 20hexOctet
   spec-value-sha-256 = 32hexOctet
   spec-value-SHA-384 = 48hexOctet
   spec-value-SHA-512 = 64hexOctet

The definition of the SHA-1 spec-type is simply:

   The spec-type is "SHA-1".  The hash is computed using SHA-1 [SHS].

But there is no explicit statement that the spec-value of a URN with
spec-type "SHA-1" must conform to "spec-value-sha-1".  Similarly for
some other spec-types.
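
The missing rule amounts to a check like the one sketched below in
Python.  The octet counts come from the BNF quoted above; treating the
spec-type as case-insensitive and the function name itself are my
assumptions.

```python
import re

# Octet counts taken from the quoted BNF (20, 32, 48, 64 hexOctets).
EXPECTED_OCTETS = {"sha-1": 20, "sha-256": 32, "sha-384": 48, "sha-512": 64}


def hash_spec_value_ok(spec_type, spec_value):
    """True if spec_value has the hex length required for spec_type."""
    octets = EXPECTED_OCTETS.get(spec_type.lower())
    if octets is None:
        return False  # not one of the hash spec-types
    pattern = r"[0-9A-Fa-f]{%d}" % (2 * octets)
    return re.fullmatch(pattern, spec_value) is not None


print(hash_spec_value_ok("SHA-1", "da39a3ee5e6b4b0d3255bfef95601890afd80709"))
print(hash_spec_value_ok("SHA-1", "abcd"))
```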

In regard to spec-type base64, you might want to mention that though
"/" is reserved by RFC 2141, it is used by the "data" URL (RFC 2397),
so you feel it is safe to allow un-escaped "/" in "cert" URNs.
However, this might be a point of philosophical dispute in the URN
community.

In section 4.3.1 is the BNF:

   spec-value-issuersn = distinguishedName SEMI serialNumber
   serialNumber        = 1*hexOctet

This is not strictly correct, as the first component of
spec-value-issuersn is "a distinguishedName with non-URN characters
replaced with their %-escape representations".  So the BNF needs to be
corrected.  It would also help if the text specified explicitly the
set of characters that were permitted to appear un-escaped in the
field, to minimize the chances of erroneous implementations.

Section 4.3.1 says:

   <serialNumber> is the hexadecimal encoding of the certificate's
   serial number, with the exact same (DER encoded) contents octets of a
   CertificateSerialNumber ::= INTEGER as specified in [RFC5280] sec.
   4.1.

I think you could more clearly state this as

. <serialNumber> is the hexadecimal encoding of the octets of the DER
. encoding of the CertificateSerialNumber ::= INTEGER as specified in
. [RFC5280] sec. 4.1.

   If the serial number hex octets are malformed, the certspec is
   invalid.

This seems to be a special case of a general rule: if a part of a
"cert" URN is the hex or base64 encoding of a DER-encoded part of a
certificate and that encoding is incorrect, the URN is invalid.  If
so, it would be useful to state this at the top as a general rule.
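
To make the serial-number case concrete, here is a Python sketch that
follows the draft's "contents octets" wording (the value octets of the
DER INTEGER, without tag and length).  It handles only non-negative
serial numbers when encoding, and its notion of "malformed" (bad hex
or a non-minimal encoding) is my assumption; the function names are
hypothetical.

```python
import re


def serial_to_hex(serial):
    """Hex of the DER INTEGER contents octets of a non-negative serial."""
    if serial < 0:
        raise ValueError("sketch handles non-negative serials only")
    # One extra bit is needed for the sign, hence // 8 + 1.
    length = serial.bit_length() // 8 + 1
    return serial.to_bytes(length, "big", signed=True).hex()


def hex_to_serial(text):
    """Parse serialNumber hex; raise ValueError if it is malformed."""
    if re.fullmatch(r"(?:[0-9A-Fa-f]{2})+", text) is None:
        raise ValueError("not a sequence of hexOctets")
    octets = bytes.fromhex(text)
    if len(octets) > 1:
        # DER requires the shortest possible two's-complement encoding.
        if (octets[0] == 0x00 and octets[1] < 0x80) or \
           (octets[0] == 0xFF and octets[1] >= 0x80):
            raise ValueError("non-minimal INTEGER encoding")
    return int.from_bytes(octets, "big", signed=True)


print(serial_to_hex(128))
print(hex_to_serial("0080"))
```

Note the leading "00" in the encoding of 128: it is part of the
contents octets, which is exactly the sort of detail the text should
spell out.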

Section 4.3.2 says:

   The spec-type is "ski".  The spec-value is given by the following
   ABNF:

   spec-value-ski = keyIdentifier
   keyIdentifier  = 1*hexOctet

   <keyIdentifier> is the hexadecimal encoding of the certificate's
   subject key identifier, which is recorded in the certificate's
   Subject Key Identifier extension [RFC5280] sec. 4.2.1.2.

My guess is that keyIdentifier is the hexadecimal encoding of the
DER-encoding of the SubjectKeyIdentifier field, but that is not made
explicit.

In regard to section 4.3.3:

The "identifier uniqueness considerations" do not discuss the fact
that different URNs (considered as text strings) can be lexically
equivalent via (at least):

- case of the spec-type and hex encodings
- use of %-encoding of characters for which it is not mandatory
- hex and base64 spec-type encoding of the same certificate
- presence of embedded non-base64 characters that are ignored

Since these are "lexically obvious", they should not cause practical
problems, but need to be made explicit.

However, there are many situations where two different URNs designate
the same certificate and that fact can only be discovered by resolving
them.  That is probably unavoidable for practical reasons, but it
should be mentioned and the consequences discussed.

      For specs that identify certificates by certificate data,
      the resolver's database of certificates and implementation
      of certification path validation [RFC5280 sec. 6] ensure
      uniqueness.

I think what you really mean to say is that as long as the CAs issue
certificates correctly, no "cert" URN can identify two different
certificates.

In regard to "identifier persistence considerations", I think what you
have written is more in regard to the persistence of *certificates*.
What you need to state is that once a URN identifies a particular
certificate, that fact will never change.  (That the URN can identify
only one certificate is a matter of uniqueness, which is the previous
section.)

.  Identifier persistence considerations:
.     A certificate is a permanent digital artifact, irrespective of
.     its origin.  As the URN records only information that is
.     derivable from the certificate itself, such as one of its
.     cryptographic hashes, the binding between the URN and the
.     certificate is permanent.

In regard to "Conformance with URN Syntax", some discussion of the use
of "/" needs to be included, as that is nonconformant with RFC 2141.

In two places is the example

   urn:cert:base64:MIICAS...

Since this does not conform to the BNF, something needs to be revised.

In regard to "IANA Considerations", the assignment of spec-types
needs to be discussed.  Should a registry be created?

Dale