APPSDIR review of draft-farrell-decade-ni-07

"Martin J. Dürst" <> Tue, 05 June 2012 09:43 UTC

Date: Tue, 05 Jun 2012 18:42:49 +0900
From: "\"Martin J. Dürst\"" <>
Organization: Aoyama Gakuin University
Subject: APPSDIR review of draft-farrell-decade-ni-07

Hello everybody,

[For replies, please trim the cc list, thanks!]

I have been selected as the Applications Area Directorate reviewer for 
this draft (for background on appsdir, please see ).

Please resolve these comments along with any other Last Call comments 
you may receive. Please wait for direction from your document shepherd 
or AD before posting a new version of the draft.

Document: draft-farrell-decade-ni-07
Title: Naming Things with Hashes
Reviewer: Martin Dürst
Review Date: 2012-06-03 (written up 2012-06-04/05)
IETF Last Call Date: started 2012-06-04, ends 2012-07-02

Summary: This draft addresses a real generic need, but the current form 
of the draft is the result of adding more and more special cases without 
a clear overall view and a firm hand to separate the wheat from the 
chaff. This shows both in the technical issues as well as in many of the 
editorial issues below. This draft is not ready for publication without 
some serious additional work, but that work is mostly straightforward 
and should be easy to complete quickly.

Major design issue:

The draft defines two schemes, which differ only slightly, and mostly 
just gratuitously (see also editorial issues).
These are the ni: and the nih: scheme. As far as I understand, they 
differ as follows:
                                     ni:                nih:
authority:                          optional           disallowed
ascii-compatible encoding:          base64url          base16
check digit:                        disallowed         optional
query part:                         optional           disallowed
decimal presentation of algorithm:  disallowed         possible

The usability of URIs is strongly influenced by the number of different 
schemes: the smaller the number, the better. As a somewhat made-up 
example, if the original URIs had been separated into httph: for HTML 
pages and httpi: for images, or any other arbitrary subdivision that one 
can envision, that would have hurt the growth and extensibility of the 
Web. Creating new URI schemes is occasionally necessary, and the ideas 
that led to this draft definitely seem to warrant a new scheme (*), but 
there's no reason for two schemes.
[(*) I know people who would claim that the .well-known http/https thing 
is completely sufficient, no new scheme needed at all.]

More specifically, if the original URIs had been separated into httpm: 
(for machines) and httph: (for humans), the Web for sure wouldn't have 
grown at the speed it did (and does) grow. In practice, there are huge 
differences in human 'speakability' for URIs (and IRIs, for that 
matter); compare e.g. with
(which I have significantly shortened to hopefully eliminate potential 
privacy issues), or compare the average mailto: URI with the average 
data: URI. However, what's important is that there never has been a 
strong dividing line between machine-only and human-only URIs or 
schemes; the division has always been very gradual. Short and mainly 
human-oriented URIs have of course been handled by machines, and on the 
other hand, very long URIs have been spoken when really necessary. 
"Speakability" has been maintained to some extent by scheme designers, 
and to some extent by "survival of the fittest" (URIs that weren't very 
speakable (or spellable/memorizable/guessable/...), and their Web sites, 
might just die out slowly).

It should also be noted that the resistance against multiple URI schemes 
may have been low because there are so many different ways to express 
hashes in the draft anyway, and one more (the nih: section is the last 
one before the examples section) didn't seem like much of a deal 
anymore. But when it comes to URIs, one less is a lot better than one more.

In the above ni:/nih: distinction, nih: seems to have been added as an 
afterthought after realizing that reading an ni: URI aloud over the 
phone may be somewhat suboptimal because there is a need for repeated 
"upper case" - "lower case" (surely very quickly shortened to "upper" - 
"lower" and then to "up" - "low" or something similar). It is not a bad 
idea to try to make sure that IETF technology, and URIs in particular, 
are accessible to people with certain kinds of dyslexia. (There are 
indeed people who have tremendous difficulties with distinguishing 
upper- and lower-case letters, and this may or may not be connected with 
other aspects of dyslexia.) It is however totally unclear to this 
reviewer why this has to lead to two different URI schemes with other 
gratuitous differences.

Finding a solution is rather easy (of course, other solutions may also 
be possible): Merge the schemes, so that authority, check digit, and 
query part are all optional (an authority part and/or a query part may 
very well be very useful in human communication, and a check digit won't 
hurt when transmitted electronically) and the decimal presentation of 
the algorithm is always allowed, and use base32 as the encoding. This 
leads to a 
16.6% less efficient encoding of the value part of the ni: URI, but 
given that other URI-related encodings, e.g. the %-encoding resulting 
when converting an IRI to an URI, are much less efficient, and that URI 
infrastructure these days can handle URIs with more than 1000 bytes, 
this should not be a serious problem. Also, there's a separate binary 
format (section 6) that is more compact already.
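
To put numbers on the size comparison above, here is a minimal Python
sketch (not taken from the draft) that encodes the same 32-byte SHA-256
digest in base16, base32, and base64url and reports the resulting
lengths. Stripping the "=" padding and using the RFC 4648 base32
alphabet are assumptions made for illustration only; the draft itself
does not define a base32 form.

```python
import base64
import hashlib

# Any 32-byte digest would do; the input string is just an example.
digest = hashlib.sha256(b"Hello World!").digest()


def unpadded(encoded: bytes) -> str:
    """Decode to text and drop trailing '=' padding (an assumption here)."""
    return encoded.decode("ascii").rstrip("=")


encodings = {
    "base16": unpadded(base64.b16encode(digest)),
    "base32": unpadded(base64.b32encode(digest)),
    "base64url": unpadded(base64.urlsafe_b64encode(digest)),
}

for name, text in encodings.items():
    print(f"{name:10} {len(text):3} characters")
```

base32 carries 5 bits per character against 6 for base64url, so each
character is 5/6 as efficient, and the value part of a SHA-256 name
grows from 43 to 52 characters.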

(relatively) Minor technical issues:

Section 2, "When the input to the hash algorithm is a public key value": 
Is it absolutely clear that this will work for any and all public key 
values, existing and future, and not only for what's currently around? 
After all, as far as I understand, the concept of a public key is a 
fairly general one.

"Other than in the above special case where public keys are used, we do 
not specify the hash function input here.  Other specifications are 
expected to define this.": Do you really expect that to happen? Wouldn't 
it be better to limit variability here as much as possible, and to use 
media types to identify different kinds of data? This would also work 
for public keys: If there's a MIME media type for a 
SubjectPublicKeyInfo, then the fact that this media type is the 
preferred way to transfer a public key becomes an application convention 
rather than a special case in the spec. If a better way (or just another 
way) to encode/transfer public keys became popular at a later date, 
there would be no need to change the spec.

Related, in Section 3:
    The "val" field MUST contain the output of base64url encoding the
    result of applying the hash function ("alg") to its defined input,
    which defaults to the object bytes that are expected to be returned
    when the URI is dereferenced.
How do I know whether the default applies or not? The URI doesn't tell 
you. Deducing from context is a bad idea.
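
For concreteness, the default case quoted above (hash over the object
bytes) might be sketched in Python as follows. The helper name make_ni
and the empty-authority default are illustrative assumptions, and
stripping the "=" padding is assumed to match the unpadded values in
the draft's examples.

```python
import base64
import hashlib


def make_ni(data: bytes, authority: str = "") -> str:
    """Build an ni URI from raw object bytes (hypothetical helper)."""
    digest = hashlib.sha256(data).digest()
    # base64url without trailing '=' padding (assumed, see lead-in)
    val = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"ni://{authority}/sha-256;{val}"


uri = make_ni(b"Hello World!")
print(uri)
```

Nothing in the resulting string records what the hash input was, which
is exactly the problem raised above.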

Section 3: "Thus to ensure interoperability, implementations SHOULD NOT 
generate URIs that employ URI character escaping": This is wrong and 
needs to be fixed. Characters such as "&", "=", "#", and "%", as well as 
ASCII characters not allowed in URIs and non-ASCII characters MUST be 
%-encoded if they appear in query parameter values in URIs (or in query 
parameter tags, which is however less likely). It would be better if the 
spec here deferred to the URI spec rather than trying to come up with 
its own rules.
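
As an illustration of why %-encoding cannot simply be avoided, the
following Python sketch builds a query part with the standard library.
The parameter names "note" and "name" and the helper encode_query are
made up for this example and are not defined by the draft.

```python
from urllib.parse import quote, unquote


def encode_query(params: dict) -> str:
    """Join tag=value pairs, %-encoding every tag and value."""
    return "&".join(
        f"{quote(tag, safe='')}={quote(value, safe='')}"
        for tag, value in params.items()
    )


# Values containing "&", "=", "#", "%" and a non-ASCII character.
query = encode_query({"note": "a&b=c#d%", "name": "Dürst"})
print(query)

# Decoding each value recovers the original text.
decoded = [unquote(pair.split("=", 1)[1]) for pair in query.split("&")]
print(decoded)
```

Without the escaping, the "&", "=" and "#" inside the first value would
be read as delimiters and the query would be parsed incorrectly.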

Section 3: "The Named Information URI adapts the URI definition from the 
URI Generic Syntax [RFC3986].": This sounds as if this were a voluntary 
decision (and the text should be changed to avoid such an impression), 
but if you don't conform to RFC 3986 syntax, you're not an URI. This is 
the first time I have seen an URI scheme definition starting explicitly 
with the top ABNF rule from RFC 3986. This is completely 
unnecessary. Just make sure your production conforms to the generic URI 
syntax, and mention all the ABNF rules from RFC3986 that you use.

Also, using the "URI" production from RFC 3986, and then silently 
dropping the #fragment part, is technically wrong. Scheme definitions 
have nothing to do with the fragment (including the question of whether 
there's a fragment or not; the semantics of fragments are defined by the 
MIME media type that you get when you resolve). This may not be 
completely clear in RFC 4395, but the IRI WG is working on an update of 
RFC 4395 where this will be made clearer (see also; thanks for giving me 
a chance to remember that I had to create a new issue in the tracker for 
this :-).

Section 3, ABNF:
             ni-hier-part   = "//" authority path-algval
                              / path-algval
This gives you ni://;f4OxZX_x_FO5... (//authority/) 
and ni:/sha-256;f4OxZX_x_FO5... (one slash only), but the examples show 
ni:///sha-256;f4OxZX_x_FO5... (three slashes). It looks like the ABNF 
you want is:
             ni-hier-part   = "//" authority path-algval
                            / "//" path-algval
(aligning "=" and "/" helps!)
or more simply:
             ni-hier-part   = "//" [authority] path-algval
or even more simply:
             ni-hier-part   = "//" authority path-algval
because authority can be empty; let's show this:
    authority     = [ userinfo "@" ] host [ ":" port ]
If we can show that host can be empty, we're done:
    host          = IP-literal / IPv4address / reg-name
If we can show that any one of these can be empty, we're done, let's 
pick reg-name:
    reg-name      = *( unreserved / pct-encoded / sub-delims )
* means "zero or more", thus reg-name can be empty. QED.
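
The derivation above can also be checked mechanically. The following
Python sketch translates the simplified rule
ni-hier-part = "//" authority path-algval into a regular expression.
The character classes only loosely approximate the RFC 3986
productions, and treating alg and val as non-empty runs of unreserved
characters is an assumption, not a quotation from the draft. The host
name example.com is a placeholder.

```python
import re

UNRESERVED = r"A-Za-z0-9\-._~"

# "//" authority "/" alg ";" val [ "?" query ], with authority possibly empty
NI_URI = re.compile(
    r"^ni://(?P<authority>[^/?#]*)"
    rf"/(?P<alg>[{UNRESERVED}]+);(?P<val>[{UNRESERVED}]+)"
    r"(?:\?(?P<query>[^#]*))?$"
)


def parse_ni(uri: str):
    """Return the named parts of an ni URI, or None if it does not match."""
    match = NI_URI.match(uri)
    return match.groupdict() if match else None


print(parse_ni("ni:///sha-256;f4OxZX_x_FO5"))             # empty authority
print(parse_ni("ni://example.com/sha-256;f4OxZX_x_FO5"))  # with authority
print(parse_ni("ni:/sha-256;f4OxZX_x_FO5"))               # one slash only
```

With this rule the three-slash form from the examples is accepted with
an empty authority, and the one-slash form is rejected.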

Section 4:
    The HTTP(S) mapping MAY be used in any context where clients without
    support for ni URIs are needed without loss of interoperability or
    functionality.
What is meant by "support for ni"? There's nowhere in the spec where 
this is explained clearly. If I were a browser maker, or writing an URI 
library,..., what would I do to support the ni scheme? The only thing I 
have come up with is to convert ni to the .well-known format, then use 
HTTP(S). In that case, the above text seems wrong, as it says that 
.well-known is used when there's no support for ni, not in order to 
support ni.
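
One possible reading of "support for ni" is exactly the conversion
described above. The Python sketch below assumes that the mapping takes
ni://authority/alg;val to http(s)://authority/.well-known/ni/alg/val;
that shape is an assumption here and should be checked against Section 4
of the draft. The function name and the host example.com are
hypothetical.

```python
def ni_to_well_known(uri: str, scheme: str = "http") -> str:
    """Map an ni URI with an authority to an HTTP(S) URL (assumed mapping)."""
    prefix = "ni://"
    if not uri.startswith(prefix):
        raise ValueError("not an ni URI with an authority part")
    # Split off an optional query part, then authority and alg;val.
    rest, _, query = uri[len(prefix):].partition("?")
    authority, _, alg_val = rest.partition("/")
    alg, sep, val = alg_val.partition(";")
    if not authority or not sep or not alg or not val:
        raise ValueError("need a non-empty authority, alg and val")
    url = f"{scheme}://{authority}/.well-known/ni/{alg}/{val}"
    return f"{url}?{query}" if query else url


print(ni_to_well_known("ni://example.com/sha-256;f4OxZX_x_FO5"))
```

An ni URI with an empty authority has no host to contact, so this
sketch rejects it rather than guessing one.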

Section 5: This defines an "URL segment format". It seems to be limited 
to path components in HTTP URIs. What if I want to use this in a query 
part, or maybe even as a fragment identifier? What if I want to use this 
as a path component in an FTP URI? Or in some other scheme? It would be 
better to define the alg-val (see next point) part as such (before the 
other things), with an explanation along the following lines: "This is 
defined here both for use in other sections of this document as well as 
for use in other places where it may be helpful, such as HTTP URI path 

Section 5 (and Section 3): "To do this one simply uses the "alg;val" 
production": There is no "alg;val" production. Please change to "To do 
this one simply uses the <alg-val> production" and fix the ABNF in 
section 3 to
             path-algval = "/" alg-val
             alg-val     = alg ";" val
It's probably even better to fold this in with the changes to 
ni-hier-part, resulting e.g. in:
             ni-hier-part   = "//" authority "/" alg-val
             alg-val     = alg ";" val

Section 9.4: Status can be 'empty' or 'deprecated'. I suggest replacing 
'empty' with something positive, such as 'valid' or 'active'. This will 
help people who go to the IANA page and start to ask "well, it doesn't 
have a status, what does that mean". Also, I strongly suggest adding an 
additional status 'reserved', and removing the current "Reserved" hash 
name string from the entries with IDs 0 and 32.

Section 9.4: "The Suite ID value 32 is reserved for compatibility with 
ORCHIDs [RFC4843].": How will compatibility be kept for future 
changes/additions in ORCHID?

Major editorial issues:

Title and abstract (and the spec itself) use the wording "Naming 
Things". While in a security context, it may be that there is an 
implicit assumption that there are only digital things, in a wider 
context, this is of course not true. Research on the Internet of Things 
and efforts such as the Semantic Web/Linked Data try to deal with things 
in the real world. People in these areas will be confused by the title, 
abstract, and text, unless you can show (me and) them an ni: hash for a 
person, an apple, a building, or an elephant. Therefore, while it may be 
possible to keep the catchy title, the abstract has to be fixed to avoid 
such misunderstandings, e.g. by changing "to identify a thing" to "to 
identify a digital object" or some such in the abstract, and likewise in 
the main text of the spec.

"Human-speakable" (e.g. ), "human-readable" (e.g. section title of 
section 7), and "for humans" (e.g. section title of section 9.2): These 
terms are used throughout the spec, but are imprecise and confusing. 
First, there's the problem of interpreting "for humans" in the sense of 
the previous paragraph, which of course has to be fixed. But the main 
problem is that none of the "ni:" URIs are "non-human-readable" or 
"non-human-speakable". Reading them aloud is only somewhat more tedious, 
but not at all impossible. And because the value part of the nih: form 
is 50% longer, and people quickly develop conventions for shortening 
things such as "upper case" and "lower case", it's not even clear that 
reading aloud the nih: form will necessarily take that much time. 
Therefore, I strongly recommend changing all occurrences of 
"Human-speakable", "human-readable", "for humans", and the like, to the 
more precise "more easily read out aloud by humans" or something equivalent.

Abstract and further on: "specifying URI, URL": By all URx theories, 
URLs are a subset of 
URIs, and therefore saying that the spec specifies an URI and an URL is 
somewhat confusing. I'd propose using wording along the following lines: 
"specifying an URI scheme and a way to map these URIs to http".

Section 2, "When the input to the hash algorithm is a public key value", 
and example section: It took me a while to understand that the "public 
key" stuff was not yet another way to present a hash, and also not a way 
to mix in a public key to the hash in order to obtain some specific 
security property (I wasn't able to figure out how that would work, but 
draft-hallambaker-decade-ni-params contains something similar involving 
digital signatures and a public key). The document would be much easier 
to understand if there was a section e.g. entitled "Forms of input to 
hash", with subsections e.g. "general data", "public keys", "other stuff 
(not defined in this document)". As it is written, the relevant 
paragraphs in section 2 look like an afterthought, and it's not clear to 
Also, the example section should be fixed as follows: 1) say upfront 
that there will be two examples, one for a short string and another for 
a public key. 2) Make sure both examples exercise all forms (the public 
key example seems to be pretty complete, but the "Hello World!" example 
seems to be incomplete). 3) Use the same form of presentation (either a 
table in both cases or short paragraphs in both cases).
The caption on Figure 7 is also way too unspecific.

Section 9.4: "Hash Name Algorithm Registry", and later "a new registry 
for hash algorithms as used in the name formats specified here": IANA 
will be helped tremendously if your draft comes with an 
easy-to-understand and unambiguous name for the new registry. "Hash Name 
Algorithm Registry" may be okay, but is probably not specific enough. 
The circumscription at the start of the section is definitely not good 
enough because you're not registering hash algorithms, but names of hash 
algorithms and their truncations.

Minor editorial issues:

Introduction: It would be good to have a general reference to hashing 
(for security purposes) for people not utterly familiar with the subject.

Intro: After reading the whole document, the structure of the Intro 
seems to make some sense, but it didn't on first reading (where it's 
actually more important). The main problem I was able to identify was 
that after a general outlook in paragraph 1, the Intro drops into a list 
of examples without saying what they are good for. I suggest adding, after 
the sentence "This document specifies standard ways to do that to aid 
interoperability.", a sentence along the lines: "The next few 
paragraphs give usage examples for the various ways to include a hash in 
a name or identifier as they are defined later in this document.". It 
may also make sense to further streamline the following paragraphs, so 
that it is clearer which pieces of text refer each to one of the 
"standard ways".

There are two instances of the term "binary presentation". Looking 
around, it seems that they are supposed to mean the same as "binary 
format". Please replace all instances of "binary presentation" with 
"binary format" to avoid misunderstandings and useless search time.

Section 3: "A Named Information (ni) URI consists of the following 
components:": It would be good to know exactly where the list ended. One 
way to do this would be to say "consists of the following nine components".

Section 3: "Note that while the ni names with and without an authority 
differ syntactically, both names refer to the same object if the digest 
algorithm and value are the same.": What about cases with different 
authority? The text seems to apply by transitivity, but this may be easy 
to miss for an implementer. I suggest changing to: "Note that while ni 
names with and without an authority, and ni names with different 
authorities, differ syntactically, they all refer to the same object if 
the digest algorithm and value are the same.".

Section 3: "Consequently no special escaping mechanism is required for 
the query parameter portion of ni URIs.": Does this mean "no escaping 
mechanism at all"? Or "nothing besides %-encoding"? Or something else? 
Please clarify.

Figure 3: the "=" characters of the various rules should be aligned as 
much as possible to make it easier to scan the productions (see for an example).

Section 3:
             unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
                 ;  directly from RFC 3986, section 2.3
                 ; "authority" and "pct-encoded" are also from RFC 3986
Please don't copy productions. Please don't copy half (or one-third, 
actually) of the productions you use, and reference the rest. Please 
don't say what productions you copy from where in a comment, and even 
less in a comment for an unrelated production. Please say, before the 
ABNF, which productions are used from another spec.

Section 4:
    The HTTP(S) mapping MAY be used in any context where clients without
    support for ni URIs are needed without loss of interoperability or
    functionality.
This is difficult to understand. If some new functionality is proposed, 
it's usually a client *with* the new functionality that's needed, not 
one without. Also, the "without loss of interoperability or 
functionality" is unclear: Surely, if ni isn't supported, there's a loss in 
interoperability. So I suggest to rewrite this as:
    The HTTP(S) mapping MAY be used in any context where clients with
    support for ni URIs are not available.
(but see also the comment in minor technical issues)

Section 6: "binary format name": Why 'name'? Why not just "binary 
format"? The latter is completely clear in the context of the document or 
together with an indication of the document; for something that can be 
used independently, even "binary format name" isn't enough.

Section 6: "suite ID": The word "suite" seems out of place here. In the 
general use of the term, it refers to "a group of things forming a unit 
or constituting a collection". A good definition that 
works for the uses I'm familiar with in digital security would be "An 
algorithm suite is a coherent collection of cryptographic algorithms for 
performing operations such as signing, encryption, generating message 
digests, and so on." (disclaimer: I'm in no way a SOAP fan). The use 
here is not for a 
collection, but for a single truncated-length variant of a single hash 
algorithm. I seriously hope you can find a better name.

Section 6: "Note that a hash value that is truncated to 120 bits will 
result in the overall name being a 128-bit value which may be useful 
with certain use-cases.": This left me really wondering: Is there 
something magic to 128 bits in computer/internet security? What are the 
"certain use cases"? Or is this just an example to make sure the reader 
got the relationships, and it could have been as well "Note that a hash 
value that is truncated to 64 bits will result in the overall name being 
a 72-bit value which may be useful with certain use-cases." (or whatever 
other value that's registered in section 9)?
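
The arithmetic behind the quoted note can be illustrated with a short
Python sketch: 120 bits of hash (15 bytes) plus a one-byte header give
16 bytes, i.e. 128 bits. The one-byte header is inferred from the
120/128 figures in the quoted text, the function name is made up, and
the suite ID value used below is a placeholder rather than a registered
value.

```python
import hashlib

PLACEHOLDER_SUITE_ID = 3  # hypothetical value, for illustration only


def binary_name(data: bytes, truncate_bits: int, suite_id: int) -> bytes:
    """One header byte followed by the left-most truncate_bits of SHA-256."""
    if truncate_bits % 8 != 0 or not 0 < truncate_bits <= 256:
        raise ValueError("this sketch only handles whole-byte truncation")
    if not 0 <= suite_id <= 0xFF:
        raise ValueError("suite ID must fit in one byte")
    digest = hashlib.sha256(data).digest()
    return bytes([suite_id]) + digest[: truncate_bits // 8]


name = binary_name(b"Hello World!", 120, PLACEHOLDER_SUITE_ID)
print(len(name) * 8, "bits")

short = binary_name(b"Hello World!", 64, PLACEHOLDER_SUITE_ID)
print(len(short) * 8, "bits")
```

The same relationship holds for any truncation length: the overall name
is always 8 bits longer than the truncated hash.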

Section 7: Just for the highly unfortunate case that this doesn't 
disappear, it would be very helpful if the presentation of this section 
paralleled section 3.

Section 7: "contain the ID value as a UTF-8 encoded decimal number": I'm 
an internationalization expert with a strong affection for UTF-8, but 
even for me, this should be "contain the ID value as an ASCII encoded 
decimal number".

Section 9: The registration templates refer to sections. This is fine 
for readers of the draft, but not if the template is standalone. I 
suggest using a format such as that at, which in draft stage may 
look e.g. like

Section 9.3: "Assignment of Well Known URI prefix ni" and later (and 
elsewhere in the draft) "URI suffix": Are we dealing with a prefix or a 
suffix here?

Section 9.4: "This registry has five fields, the binary suite ID,...":
Better to remove the word "binary", because the actual number is decimal.

Section 9.4: "The expert SHOULD seek IETF review before approving a 
request to mark an entry as "deprecated."  Such requests may simply take 
the form of a mail to the designated expert (an RFC is not required). 
IETF review can be achieved if the designated expert sends a mail to the 
IETF discussion list.  At least two weeks for comments MUST be allowed 
thereafter before the request is approved and actioned.": I'm at a loss 
to see why asking the IETF at large is a SHOULD, but if it's done, then 
the two-week period is a MUST.

Section 9.4: The registry initialization in Fig. 8 refers to RFC4055 
many times. But RFC 4055 in no way defines SHA-256. It looks like 
the actual spec is National Institute of Standards and Technology (NIST), 
FIPS 180-2: Secure Hash Standard, 1 August 2002. I think this should be 
cited, in particular because there is a "Specification Required" 
requirement, and this surely should mean that there is a Specification 
for the actual algorithm, and not just a specification that mentions 
some labels. So using RFC4055 as a reference could be taken as creating 
a bad precedent.

Section 9.4: "The designated expert is responsible for ensuring that the 
document referenced for the hash algorithm is such that it would be 
acceptable were the "specification required" rule applied.": Why all 
this circumscription? Why not just say something like: "The designated 
expert is responsible for ensuring that the document referenced for the 
hash algorithm meets the "specification required" rule."


Author's list: Last time I heard about this, there was a general limit 
of 5 authors per RFC. I'm not sure whether this still exists, and what'd 
be needed to get around it, but I just wanted to point out that this may 
be a potential problem or additional work (hoops to get through).

Intro: "Since, there is no standard" -> "Since there is no standard"

Intro: "for these various purposes" -> "for these purposes" or "for 
various purposes" (the indefinite 'various' is incompatible with the 
definite 'these').

"2.  Hashes are what Count" -> "2.  Hashes are what Counts" (the former 
may look logically correct, but 'what' requires a singular verb form).

Section 2: "the left-most or most significant in network byte order N 
bits from the binary representation of the hash value" -> "the left-most 
(or most significant in network byte order) N bits from the binary 
representation of the hash value" or "the left-most N bits, or the N 
most significant bits in network byte order, from the binary 
representation of the hash value" (the current text is virtually 

Figure 1: The 0x notation is never explained. A short clause or phrase 
is all that would be needed, but it would be better if this were spelled 

Section 3, Query Parameter separator: "The query parameter separator 
acts a separator between" -> "The query parameter separator acts *as* a 
separator between".

Section 3, Query Parameters: "A tag=value list of optional query 
parameters as are used with HTTP URLs" ->  "A tag=value list of optional 
query parameters as used with HTTP URLs" (or "A tag=value list of 
optional query parameters as they are used with HTTP URLs").

Section 4: "the object named by the ni URI will be available at the 
corresponding HTTP(S) URL" -> "the object named by the ni URI will be 
available via the corresponding HTTP(S) URL" (via stresses the point 
that this should be done via (sic) redirection)

Section 4: "so there may still be reasons to use" -> "so there can still 
be reasons to use" (better to use can because non-normative; the 
document otherwise does a good job on this)

Section 10: "Note that fact that" -> "Note the fact that", or much 
better: "Note that".

Regards,     Martin.