Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)

Peter Saint-Andre <stpeter@mozilla.com> Thu, 07 June 2018 20:02 UTC

To: Benjamin Kaduk <kaduk@mit.edu>, The IESG <iesg@ietf.org>
Cc: draft-hakala-urn-nbn-rfc3188bis@ietf.org, "urn@ietf.org" <urn@ietf.org>
References: <152837409539.30768.4568779645299135020.idtracker@ietfa.amsl.com>
From: Peter Saint-Andre <stpeter@mozilla.com>
Openpgp: preference=signencrypt
Message-ID: <6a1a100c-3bc0-76d3-3ae4-047d37906bfc@mozilla.com>
Date: Thu, 07 Jun 2018 14:02:23 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <152837409539.30768.4568779645299135020.idtracker@ietfa.amsl.com>
Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="prnAVb0UtUcfWLw8tHWYtPhc0f2bp1iJ1"
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/HNgmUsEJwHlL4qhhKGU9vuld0QI>
Subject: Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)
Precedence: list

[ + cc urn@ietf.org for broader discussion ]

Document shepherd here. I expect the document author (and perhaps my
co-author on RFC 8141) to provide further thoughts.

On 6/7/18 6:21 AM, Benjamin Kaduk wrote:
> Benjamin Kaduk has entered the following ballot position for
> draft-hakala-urn-nbn-rfc3188bis-01: Discuss
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-hakala-urn-nbn-rfc3188bis/
> 
> 
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> I think this document may benefit from an Internationalization
> Considerations section, but am not entirely sure how needed it is.
> So let's discuss it...
> 
> In particular, the URN:NBN lexical equivalence rules include several
> case-insensitive comparisons, for the prefix and for the case of the
> hex digits in any percent-encoded values, but do not specify any
> operation on the decoded percent-encoded values/characters. 

As a reminder, RFC 8141 does state:

   In particular, with regard to characters outside the ASCII range,
   URNs that appear in protocols or that are passed between systems MUST
   use only Unicode characters encoded in UTF-8 and further encoded as
   required by RFC 3986.  To the extent feasible and consistent with the
   requirements of names defined and standardized elsewhere, as well as
   the principles discussed in Section 1.2, the characters used to
   represent names SHOULD be restricted to either ASCII letters and
   digits or to the characters and syntax of some widely used models
   such as those of Internationalizing Domain Names in Applications
   (IDNA) [RFC5890], Preparation, Enforcement, and Comparison of
   Internationalized Strings (PRECIS) [RFC7613], or the Unicode
   Identifier and Pattern Syntax specification [UAX31].

   In order to make URNs as stable and persistent as possible when
   protocols evolve and the environment around them changes, URN
   namespaces SHOULD NOT allow characters outside the ASCII range
   [RFC20] unless the nature of the particular URN namespace makes such
   characters necessary.

By my reading of draft-hakala-urn-nbn-rfc3188bis and RFC 8141, the
allowable case-sensitivity for nbn_string constructs generated by a
national library applies to the percent-encoded string because that is
where any comparison or equivalence-matching would occur for these
identifiers. Venturing into case matching of percent-decoded strings
would (IMHO) unnecessarily open up an ugly can of worms.
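
To make that reading concrete, here is a minimal Python sketch of
comparison on the percent-encoded form only. It is an illustration, not
text from the draft. Assumptions: the prefix ends at the first hyphen
in the NSS, the nbn_string is treated as case-sensitive (the
conservative choice when a national library's policy is unknown), and
the function name nbn_equivalence_key is hypothetical.

```python
import re

# A percent-encoded triplet such as %c3 or %C3.
_PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def nbn_equivalence_key(urn: str) -> str:
    """Return a comparison key for an NBN URN without percent-decoding it.

    Assumption: the NSS has the shape <prefix>-<nbn_string>, with the
    prefix ending at the first hyphen.
    """
    # Drop any r-, q-, or f-component; they play no part in equivalence.
    assigned_name = re.split(r"[?#]", urn, maxsplit=1)[0]
    scheme, nid, nss = assigned_name.split(":", 2)
    if scheme.lower() != "urn" or nid.lower() != "nbn":
        raise ValueError(f"not an NBN URN: {urn!r}")
    prefix, hyphen, nbn_string = nss.partition("-")
    if not hyphen:
        raise ValueError(f"no prefix/nbn_string delimiter in {urn!r}")
    # Case-fold only the hex digits of percent-encoded triplets.
    # The octets they encode are never decoded, folded, or normalized.
    nbn_string = _PCT_TRIPLET.sub(lambda m: m.group(0).upper(), nbn_string)
    # The prefix (country code and sub-namespaces) is case-insensitive.
    return f"urn:nbn:{prefix.lower()}-{nbn_string}"
```

Two URNs would then be treated as equivalent when their keys are equal.
Note that %C3%A4 and %C3%84 (a lowercase and an uppercase letter once
decoded) stay distinct, which is the point: no case matching happens on
the decoded characters.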

> In many
> (perhaps even most?) cases, ignoring such encoded characters for
> purposes of case-insensitive comparison is the wrong thing to do,
> but if I understand correctly, it actually is the correct thing to
> do in this case.  Namely, a NBN (or URN:NBN), once assigned, is
> essentially static data and consumers of it should not attempt to
> perform modification, Unicode normalization, etc. on it -- that
> would potentially change what is being identified (or render the
> identifier invalid). 

Well, Unicode normalization would be used as part of equivalence
operations (as in IDNA or PRECIS), but in general you are right about
modification. These are identifiers or even numbers, not malleable strings.

> On the other hand, a national library or
> delegated institution that is assigning NBNs may wish to take into
> account Unicode normalization rules and other similar considerations
> while assigning NBNs (in particular, the nbn_string component), as
> part of their allocation policy. 

It could, but as far as I know none of the national libraries have yet
gone down that path or seen the need to. Juha can tell us if I'm wrong.

> Because these can be subtle, it
> may be worth explicitly pointing out the potential issues for
> registration authorities. 

"There be dragons and don't go there" seems like fine advice.

> That, plus the directive to consumers to
> not normalize, seems like it would be appropriate content for an
> Internationalization Considerations section.

By "normalize" you mean perform equivalence matching of percent-decoded
strings (of which Unicode normalization might be one step), right? Here
again I think the answer is "don't do that" because equivalence
matching is done on the percent-encoded strings.

> Separately, in Section 4.2.1 where we cover r-components, I noted
> that RFC 8141 rather discourages actually using r-components until
> their semantics are standardized.  The text here seems to be giving
> free rein for national libraries to assign their own semantics
> without any coordination with a broader community. 

Juha and perhaps John can clarify, but as I understand it the scope of a
URN resolver for NBNs would likely be within a particular national
library system, not even necessarily across all national libraries (this
is how things are deployed now in the absence of URN resolution, in any
case).

> Do we really
> want to advocate for this, as opposed to attempting to get broadly
> unified semantics for r-components Internet-wide?  (Perhaps we
> already have and I just missed it; if so, a reference here would be
> appropriate.)

The semantics of r-components are yet to be defined. I would venture
that the IETF is probably not the right place to do that work, given how
little energy remained in the URN WG at the end (and we probably didn't
have the right people in the room in the first place).

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> I'm a little confused on some of the places in the text that talk
> about URN:NBNs being "generated from" NBNs (and non-reuse
> thereafter) or restrictions on URN:NBN assignment (e.g.,
> uniqueness).  The procedure seems to be basically deterministic for
> creating a URN:NBN once an NBN is assigned, and potentially
> something that could be done by any party in possession of the NBN
> (i.e., not necessarily the registration authority that created the
> NBN).  So I'm not sure why the act of generating the URN:NBN has any
> significance, if anyone could do it -- the restrictions would need
> to apply at NBN assignment time in order to be useful.  (This kind
> of gets into Ben's DISCUSS point, too, in the sense that we can only
> say what prerequisites there are for national library NBN allocation
> policies in order for them to be useful with URN:NBN, but they can
> in principle do whatever they like and choose to not use URN:NBN.)

Yes, the process of creating a URN from an NBN is trivial (modulo
potentially interesting encoding of non-ASCII characters). I think the
point of the text is that an NBN URN is not exactly the same as an NBN.
Perhaps that could be worded more clearly.

> Section 3.2
> 
>    From the library community point of view it is important that the
>    f-component is not a part of the NSS and therefore f-component
>    attachment does not mean that the relevant component part is
>    identified.  Moreover, the resolution process still retrieves the
>    entire resource even if there is an f-component.  The fragment
>    selection is applied by the resolution client (e.g., browser) to the
>    media returned by the resolution process.  In other words, in this
>    latter case the fragments are logical and physical components of the
>    identified resource whereas in the former cases these "fragments" are
>    actually complete, independently named entities.
> 
> I'm not sure I'm understanding this correctly -- is the "former
> case" the thing that libraries should not do, namely, including the
> f-component in the NSS?

Now that you point it out, I'm not sure what the former case is.
Formally speaking, the f-component simply is not part of the NSS; see
the ABNF in RFC 8141.
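
As a rough illustration of that separation, here is a short Python
sketch that splits the f-component away from the NSS. It relies only on
the general URN shape in RFC 8141 (the f-component follows "#", and any
r- or q-component follows "?"). The names UrnParts and
split_f_component, and the example fragment "chapter2", are
hypothetical.

```python
from typing import NamedTuple


class UrnParts(NamedTuple):
    nid: str
    nss: str
    f_component: str


def split_f_component(urn: str) -> UrnParts:
    """Separate the f-component from the NSS of a URN."""
    # Everything after the first "#" is the f-component.
    body, _, fragment = urn.partition("#")
    # Anything from the first "?" onward is an r- or q-component,
    # which is likewise outside the NSS.
    assigned_name = body.split("?", 1)[0]
    scheme, nid, nss = assigned_name.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return UrnParts(nid.lower(), nss, fragment)
```

With or without "#chapter2" appended, the NSS that comes back is the
same, so the name being resolved does not change; only the client-side
fragment selection does.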

>    If an NBN identifies a work, descriptive metadata about the work
>    SHOULD be supplied.  The metadata record MAY contain links to
>    Internet-accessible digital manifestations of the work.
> 
> This left me confused.  Is it only intended to apply in the case
> described in the previous paragraph, where the resource identified
> by the NBN is not available in the Internet?  Or does it always
> apply, forcing the metadata to take precedence over delivering the
> actual work?  (Or maybe I'm just confused, and there's an easy way
> to deliver both metadata and the actual work alongside each other
> with no ambiguity.)

Juha can clarify this.

> Section 4.1
> 
>    National Bibliography Number (NBN) is a generic term referring to a
>    group of identifier systems administered by the national libraries
>    and institutions authorized by them.
> 
> "the national libraries" implies a specific set -- which ones?  It
> may be better to hedge with "some national libraries".

Or remove "the" ... "by national libraries".

> Section 4.2.2
> 
> Do we need to say anything about a URN-to-URI step before talking
> about URI-to-resource services?
> 
> I'm also wondering about any relationship between "component
> resource" NBNs and f-components of the containing work.  If there
> are NBNs assigned to both an image within a work and that containing
> work, and an NBN with f-resource is used to refer to the image
> within the containing work, is there any relationship between the
> f-resource and the image-specific NBN?
> 
> Section 4.3
> 
>    Expressing NBNs as URNs is usually straightforward, as only ASCII
>    characters are allowed in NBN strings.  If necessary, NBNs MUST be
>    translated into canonical form as specified in RFC 8141.
> 
> When is it necessary?

It seems that in theory an NBN itself could contain non-ASCII
characters, whereas an NBN URN and its nbn_string construct can contain
only ASCII characters. At least that is my understanding.
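
A minimal Python sketch of that translation step, under stated
assumptions: the NBN is encoded as UTF-8 and every octet outside the
unreserved set (letters, digits, "-", ".", "_", "~") is percent-encoded,
which is stricter than RFC 8141 requires; the prefix and nbn_string are
joined with a hyphen; and the function name nbn_to_urn and the
non-ASCII sample value are hypothetical, not taken from the draft.

```python
from urllib.parse import quote


def nbn_to_urn(prefix: str, nbn: str) -> str:
    """Express an NBN as an NBN URN, percent-encoding where needed."""
    # quote() encodes the string as UTF-8 and emits uppercase hex.
    # safe="" means only unreserved characters are left as-is.
    # No Unicode normalization is applied; the NBN is taken as assigned.
    encoded = quote(nbn, safe="")
    return f"urn:nbn:{prefix}-{encoded}"


# An all-ASCII NBN passes through unchanged.
print(nbn_to_urn("fi", "fe201003181510"))
# A hypothetical NBN containing a non-ASCII character gets encoded.
print(nbn_to_urn("fi", "kirj\u00e4"))
```

For an all-ASCII NBN made of letters and digits the step is a no-op,
which matches the "usually straightforward" wording in Section 4.3.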

>    Being part of the prefix, sub-namespace identifier strings are case-
>    insensitive.  They MUST NOT contain any hyphens.
> 
> This MUST seems to just duplicate a syntactic requirement from the
> ABNF; is RFC 2119 language really necessary?

/me shrugs

> Section 8
> 
>    John Klensin provided significant editorial and advisory support for
>    late versions of the draft.
> 
> Presumably that's "later versions"?

Yes.

Peter

