Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 08 June 2018 20:32 UTC

Date: Fri, 08 Jun 2018 15:32:30 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: Peter Saint-Andre <stpeter@mozilla.com>
Cc: The IESG <iesg@ietf.org>, draft-hakala-urn-nbn-rfc3188bis@ietf.org, "urn@ietf.org" <urn@ietf.org>
Message-ID: <20180608203227.GD16349@kduck.kaduk.org>
References: <152837409539.30768.4568779645299135020.idtracker@ietfa.amsl.com> <6a1a100c-3bc0-76d3-3ae4-047d37906bfc@mozilla.com>
In-Reply-To: <6a1a100c-3bc0-76d3-3ae4-047d37906bfc@mozilla.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/ZEKRmVH0PNEtshWAcGzmXMSQ-Pg>
Subject: Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)
Precedence: list

I'm happy to see the main point of discussion progressing with input
from people who know more about the subject than me ... that said, I
can comment on some of the other points, inline.

On Thu, Jun 07, 2018 at 02:02:23PM -0600, Peter Saint-Andre wrote:
> [ + cc urn@ietf.org for broader discussion ]
> 
> Document shepherd here. I expect the document author (and perhaps my
> co-author on RFC 8141) to provide further thoughts.
> 
> On 6/7/18 6:21 AM, Benjamin Kaduk wrote:
> > Benjamin Kaduk has entered the following ballot position for
> > draft-hakala-urn-nbn-rfc3188bis-01: Discuss
> > 
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> > 
> > 
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> > 
> > 
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-hakala-urn-nbn-rfc3188bis/
> > 
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > I think this document may benefit from an Internationalization
> > Considerations section, but I am not entirely sure how necessary it is.
> > So let's discuss it...
> > 
> > In particular, the URN:NBN lexical equivalence rules include several
> > case-insensitive comparisons, for the prefix and for the case of the
> > hex digits in any percent-encoded values, but do not specify any
> > operation on the decoded percent-encoded values/characters. 
> 
> As a reminder, RFC 8141 does state:
> 
>    In particular, with regard to characters outside the ASCII range,
>    URNs that appear in protocols or that are passed between systems MUST
>    use only Unicode characters encoded in UTF-8 and further encoded as
>    required by RFC 3986.  To the extent feasible and consistent with the
>    requirements of names defined and standardized elsewhere, as well as
>    the principles discussed in Section 1.2, the characters used to
>    represent names SHOULD be restricted to either ASCII letters and
>    digits or to the characters and syntax of some widely used models
>    such as those of Internationalizing Domain Names in Applications
>    (IDNA) [RFC5890], Preparation, Enforcement, and Comparison of
>    Internationalized Strings (PRECIS) [RFC7613], or the Unicode
>    Identifier and Pattern Syntax specification [UAX31].
> 
>    In order to make URNs as stable and persistent as possible when
>    protocols evolve and the environment around them changes, URN
>    namespaces SHOULD NOT allow characters outside the ASCII range
>    [RFC20] unless the nature of the particular URN namespace makes such
>    characters necessary.
> 
> By my reading of draft-hakala-urn-nbn-rfc3188bis and RFC 8141, the
> allowable case-sensitivity for nbn_string constructs generated by a
> national library applies to the percent-encoded string because that is
> where any comparison or equivalence-matching would occur for these
> identifiers. Venturing into case matching of percent-decoded strings
> would (IMHO) unnecessarily open up an ugly can of worms.
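A minimal sketch of comparison done only on the percent-encoded form, as Peter reads the draft. It assumes, from this thread rather than from normative text, that the case-insensitive prefix is everything up to the first hyphen ("urn:nbn:", country code, optional sub-namespace), that the nbn_string after the hyphen is compared octet by octet except for the case of percent-encoding hex digits, and that r-, q-, and f-components are dropped as RFC 8141 specifies. The function names are hypothetical.

```python
import re

# Percent-encoding triplets such as "%c3" or "%C3".
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def _strip_components(urn: str) -> str:
    """Drop any f-component ("#...") and r-/q-components ("?+..."/"?=...").

    RFC 8141 excludes these from URN equivalence; "?" and "#" cannot
    appear unencoded in the NSS, so the first one found starts a component."""
    urn = urn.split("#", 1)[0]
    return urn.split("?", 1)[0]


def _normalize(urn: str) -> str:
    """Assumed preprocessing before an octet-by-octet comparison."""
    urn = _strip_components(urn)
    # Assumption: the case-insensitive prefix ("urn:nbn:", country code,
    # optional sub-namespace) ends at the first hyphen.
    prefix, sep, nbn_string = urn.partition("-")
    # Only the hex digits of percent-encodings are case-normalized; the
    # encoded octets are never decoded or Unicode-normalized.
    nbn_string = _PCT.sub(lambda m: m.group(0).upper(), nbn_string)
    return prefix.lower() + sep + nbn_string


def lexically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two URN:NBNs in percent-encoded form."""
    return _normalize(a) == _normalize(b)
```

Under these assumptions "URN:NBN:fi-fe19981001" and "urn:nbn:FI-fe19981001" are equivalent, while "urn:nbn:fi-a%C3%A9" and "urn:nbn:fi-ae%CC%81" (NFC and NFD spellings of the same visible text) are not, because nothing is ever decoded.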
> 
> > In many
> > (perhaps even most?) cases, ignoring such encoded characters for
> > purposes of case-insensitive comparison is the wrong thing to do,
> > but if I understand correctly, it actually is the correct thing to
> > do in this case.  Namely, an NBN (or URN:NBN), once assigned, is
> > essentially static data and consumers of it should not attempt to
> > perform modification, Unicode normalization, etc. on it -- that
> > would potentially change what is being identified (or render the
> > identifier invalid). 
> 
> Well, Unicode normalization would be used as part of equivalence
> operations (as in IDNA or PRECIS), but in general you are right about
> modification. These are identifiers or even numbers, not malleable strings.
> 
> > On the other hand, a national library or
> > delegated institution that is assigning NBNs may wish to take into
> > account Unicode normalization rules and other similar considerations
> > while assigning NBNs (in particular, the nbn_string component), as
> > part of their allocation policy. 
> 
> It could, but as far as I know none of the national libraries have yet
> gone down that path or seen the need to. Juha can tell us if I'm wrong.
> 
> > Because these can be subtle, it
> > may be worth explicitly pointing out the potential issues for
> > registration authorities. 
> 
> "There be dragons and don't go there" seems like fine advice.
> 
> > That, plus the directive to consumers to
> > not normalize, seems like it would be appropriate content for an
> > Internationalization Considerations section.
> 
> By "normalize" you mean perform equivalence matching of percent-decoded
> strings (of which Unicode normalization might be one step), right? Here
> again I think the answer is "don't do that" because equivalence
> matching is done on the percent-encoded strings.
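To make the split of responsibilities above concrete: normalization, if any, belongs to the assigning authority, not to consumers. The sketch below is one possible allocation policy, entirely hypothetical (the class name and both rules are illustrative assumptions, not anything a national library is known to use): only accept NBN strings already in Unicode NFC, and refuse a new string whose NFKC form collides with one already assigned.

```python
import unicodedata


class NbnAllocator:
    """Hypothetical allocation policy for NBN strings (a sketch only).

    Normalization checks run once, at assignment time; consumers of the
    resulting identifiers compare them without normalizing."""

    def __init__(self) -> None:
        self._compat_keys = set()  # NFKC forms of every assigned string

    def assign(self, candidate: str) -> str:
        # Policy choice 1: only assign strings already in NFC, so the
        # stored identifier is the canonical composed spelling.
        if unicodedata.normalize("NFC", candidate) != candidate:
            raise ValueError("candidate is not in Unicode NFC form")
        # Policy choice 2: refuse strings that collide with an existing
        # assignment under NFKC (e.g. full-width vs. ASCII digits).
        key = unicodedata.normalize("NFKC", candidate)
        if key in self._compat_keys:
            raise ValueError("candidate collides with an existing NBN")
        self._compat_keys.add(key)
        return candidate
```

With this policy, after "abc1" is assigned, both "abc\uff11" (full-width digit one) and "e\u0301" (decomposed "é") would be rejected.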

I did not have a terribly concrete scenario in mind when I wrote
this; I think the one Adam described is probably enough to get us
thinking about the right things.

> > Separately, in Section 4.2.1 where we cover r-components, I noted
> > that RFC 8141 rather discourages actually using r-components until
> > their semantics are standardized.  The text here seems to give
> > national libraries free rein to assign their own semantics
> > without any coordination with a broader community. 
> 
> Juha and perhaps John can clarify, but as I understand it the scope of a
> URN resolver for NBNs would likely be within a particular national
> library system, not even necessarily across all national libraries (this
> is how things are deployed now in the absence of URN resolution, in any
> case).
> 
> > Do we really
> > want to advocate for this, as opposed to attempting to get broadly
> > unified semantics for r-components Internet-wide?  (Perhaps we
> > already have and I just missed it; if so, a reference here would be
> > appropriate.)
> 
> The semantics of r-components are yet to be defined. I would venture
> that the IETF is probably not the right place to do that work, given how
> little energy remained in the URN WG at the end (and we probably didn't
> have the right people in the room in the first place).

I won't argue with that.  Does it make sense to say something like
"There are no broadly accepted semantics for r-components at the time
of this writing, which may be grounds for caution in their use" in this
document?

> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > I'm a little confused on some of the places in the text that talk
> > about URN:NBNs being "generated from" NBNs (and non-reuse
> > thereafter) or restrictions on URN:NBN assignment (e.g.,
> > uniqueness).  The procedure seems to be basically deterministic for
> > creating a URN:NBN once an NBN is assigned, and potentially
> > something that could be done by any party in possession of the NBN
> > (i.e., not necessarily the registration authority that created the
> > NBN).  So I'm not sure why the act of generating the URN:NBN has any
> > significance, if anyone could do it -- the restrictions would need
> > to apply at NBN assignment time in order to be useful.  (This kind
> > of gets into Ben's DISCUSS point, too, in the sense that we can only
> > say what prerequisites there are for national library NBN allocation
> > policies in order for them to be useful with URN:NBN, but they can
> > in principle do whatever they like and choose to not use URN:NBN.)
> 
> Yes, the process of creating a URN from an NBN is trivial (modulo
> potentially interesting encoding of non-ASCII characters). I think the
> point of the text is that an NBN URN is not exactly the same as an NBN.
> Perhaps that could be worded more clearly.
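As a sketch of why generation is deterministic (and could be done by anyone holding the NBN), here is a hypothetical helper that builds a URN:NBN from an NBN's prefix and string. It assumes the prefix is ASCII with no hyphens and handles the "interesting encoding" case by encoding non-ASCII characters as UTF-8 and percent-encoding them, as the RFC 8141 text quoted earlier requires. The helper name and the exact set of characters left unencoded are assumptions.

```python
from urllib.parse import quote

# Sub-delims plus ":" and "@" (RFC 3986 pchar) and "/", which RFC 8141
# allows unencoded in the NSS. Letters, digits, and "-._~" are always kept.
_NSS_SAFE = "!$&'()*+,;=:@/"


def nbn_to_urn(prefix: str, nbn_string: str) -> str:
    """Hypothetical helper: build a URN:NBN from an already-assigned NBN.

    `prefix` is the country code plus optional sub-namespace (ASCII, no
    hyphens); `nbn_string` may contain non-ASCII characters, which are
    encoded as UTF-8 and then percent-encoded. "?", "#", and "%" are
    encoded too, so they cannot be read as component delimiters or as
    pre-existing percent-encodings."""
    if "-" in prefix or not prefix.isascii():
        raise ValueError("prefix must be ASCII and contain no hyphens")
    return "urn:nbn:" + prefix + "-" + quote(nbn_string, safe=_NSS_SAFE)
```

For example, nbn_to_urn("de:bvb:19", "146642") yields "urn:nbn:de:bvb:19-146642", and an nbn_string "é1" becomes "%C3%A91" in the URN.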

Okay.  (I don't think I have any suggestions for different text.)

> > Section 3.2
> > 
> >    From the library community point of view it is important that the
> >    f-component is not a part of the NSS and therefore f-component
> >    attachment does not mean that the relevant component part is
> >    identified.  Moreover, the resolution process still retrieves the
> >    entire resource even if there is an f-component.  The fragment
> >    selection is applied by the resolution client (e.g., browser) to the
> >    media returned by the resolution process.  In other words, in this
> >    latter case the fragments are logical and physical components of the
> >    identified resource whereas in the former cases these "fragments" are
> >    actually complete, independently named entities.
> > 
> > I'm not sure I'm understanding this correctly -- is the "former
> > case" the thing that libraries should not do, namely, including the
> > f-component in the NSS?
> 
> Now that you point it out, I'm not sure what the former case is.
> Formally speaking the f-component simply is not part of the NSS, see the
> ABNF in RFC 8141.
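To illustrate the ABNF point: RFC 8141 defines a namestring as the assigned name followed by optional "?+" r-component, "?=" q-component, and "#" f-component, so splitting on those delimiters leaves the NSS untouched by the fragment. The sketch below is a hypothetical splitter; it assumes the r-component does not itself contain the sequence "?=".

```python
from typing import NamedTuple, Optional


class UrnParts(NamedTuple):
    assigned_name: str           # "urn:" NID ":" NSS
    r_component: Optional[str]
    q_component: Optional[str]
    f_component: Optional[str]   # applied by the client, never part of the NSS


def split_urn(urn: str) -> UrnParts:
    """Hypothetical splitter for name ["?+" r] ["?=" q] ["#" f]."""
    rest, hash_sep, f = urn.partition("#")
    rest, q_sep, q = rest.partition("?=")
    # "?" cannot appear unencoded in the NSS, so the first "?+" is the
    # r-component delimiter.
    name, r_sep, r = rest.partition("?+")
    return UrnParts(
        name,
        r if r_sep else None,
        q if q_sep else None,
        f if hash_sep else None,
    )
```

A resolver would look up only assigned_name; the f_component would be handed to the client (e.g., a browser) to apply to whatever is returned.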

I guess we should wait for Juha to clarify.

> >    If an NBN identifies a work, descriptive metadata about the work
> >    SHOULD be supplied.  The metadata record MAY contain links to
> >    Internet-accessible digital manifestations of the work.
> > 
> > This left me confused.  Is it only intended to apply in the case
> > described in the previous paragraph, where the resource identified
> > by the NBN is not available in the Internet?  Or does it always
> > apply, forcing the metadata to take precedence over delivering the
> > actual work?  (Or maybe I'm just confused, and there's an easy way
> > to deliver both metadata and the actual work alongside each other
> > with no ambiguity.)
> 
> Juha can clarify this.
> 
> > Section 4.1
> > 
> >    National Bibliography Number (NBN) is a generic term referring to a
> >    group of identifier systems administered by the national libraries
> >    and institutions authorized by them.
> > 
> > "the national libraries" implies a specific set -- which ones?  It
> > may be better to hedge with "some national libraries".
> 
> Or remove "the" ... "by national libraries".

That's probably better :)

Thanks,

Benjamin

> > Section 4.2.2
> > 
> > Do we need to say anything about a URN-to-URI step before talking
> > about URI-to-resource services?
> > 
> > I'm also wondering about any relationship between "component
> > resource" NBNs and f-components of the containing work.  If there
> > are NBNs assigned to both an image within a work and that containing
> > work, and an NBN with an f-component is used to refer to the image
> > within the containing work, is there any relationship between the
> > f-component and the image-specific NBN?
> > 
> > Section 4.3
> > 
> >    Expressing NBNs as URNs is usually straightforward, as only ASCII
> >    characters are allowed in NBN strings.  If necessary, NBNs MUST be
> >    translated into canonical form as specified in RFC 8141.
> > 
> > When is it necessary?
> 
> It seems that in theory an NBN itself could contain non-ASCII
> characters, whereas an NBN URN and its nbn_string construct can contain
> only ASCII characters. At least that is my understanding.
> 
> >    Being part of the prefix, sub-namespace identifier strings are case-
> >    insensitive.  They MUST NOT contain any hyphens.
> > 
> > This MUST seems to just duplicate a syntactic requirement from the
> > ABNF; is RFC 2119 language really necessary?
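A sketch of why the syntax alone already rules out hyphens in sub-namespace identifiers: if the NSS is parsed as country code, optional ":" sub-namespace, "-", nbn_string, then the first hyphen always ends the prefix. The parser name and the two-letter country-code check are assumptions based on the examples in RFC 3188 ("fi-fe19981001", "de:bvb:19-146642").

```python
from typing import NamedTuple, Optional


class NbnNss(NamedTuple):
    country_code: str
    sub_namespace: Optional[str]
    nbn_string: str


def parse_nbn_nss(nss: str) -> NbnNss:
    """Hypothetical parser for a URN:NBN NSS such as "de:bvb:19-146642".

    The prefix ends at the first hyphen, so a hyphen can never be part of
    a sub-namespace identifier; it would just end the prefix early."""
    prefix, sep, nbn_string = nss.partition("-")
    if not sep or not nbn_string:
        raise ValueError("missing '-' separator or empty nbn_string")
    country, colon, sub = prefix.partition(":")
    if len(country) != 2 or not country.isascii() or not country.isalpha():
        raise ValueError("expected a two-letter country code")
    # Prefix parts are case-insensitive, so fold them to lowercase.
    return NbnNss(country.lower(), sub.lower() if colon else None, nbn_string)
```

For instance, "de:my-lib:1-x" would parse with sub-namespace "my" and nbn_string "lib:1-x", which is why an explicit MUST NOT adds little beyond the ABNF.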
> 
> /me shrugs
> 
> > Section 8
> > 
> >    John Klensin provided significant editorial and advisory support for
> >    late versions of the draft.
> > 
> > Presumably that's "later versions"?
> 
> Yes.
> 
> Peter
> 
>

Attachment: signature.asc
