Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 08 June 2018 20:32 UTC

Date: Fri, 08 Jun 2018 15:32:30 -0500
From: Benjamin Kaduk <kaduk@mit.edu>
To: Peter Saint-Andre <stpeter@mozilla.com>
Cc: The IESG <iesg@ietf.org>, draft-hakala-urn-nbn-rfc3188bis@ietf.org, "urn@ietf.org" <urn@ietf.org>
Message-ID: <20180608203227.GD16349@kduck.kaduk.org>
References: <152837409539.30768.4568779645299135020.idtracker@ietfa.amsl.com> <6a1a100c-3bc0-76d3-3ae4-047d37906bfc@mozilla.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/ZEKRmVH0PNEtshWAcGzmXMSQ-Pg>

I'm happy to see the main point of discussion progressing with input
from people who know more about the subject than me ... that said, I
can comment on some of the other points, inline.

On Thu, Jun 07, 2018 at 02:02:23PM -0600, Peter Saint-Andre wrote:
> [ + cc urn@ietf.org for broader discussion ]
> 
> Document shepherd here. I expect the document author (and perhaps my
> co-author on RFC 8141) to provide further thoughts.
> 
> On 6/7/18 6:21 AM, Benjamin Kaduk wrote:
> > Benjamin Kaduk has entered the following ballot position for
> > draft-hakala-urn-nbn-rfc3188bis-01: Discuss
> > 
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> > 
> > 
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> > 
> > 
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-hakala-urn-nbn-rfc3188bis/
> > 
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > I think this document may benefit from an Internationalization
> > Considerations section, but am not entirely sure how needed it is.
> > So let's discuss it...
> > 
> > In particular, the URN:NBN lexical equivalence rules include several
> > case-insensitive comparisons, for the prefix and for the case of the
> > hex digits in any percent-encoded values, but do not specify any
> > operation on the decoded percent-encoded values/characters. 
> 
> As a reminder, RFC 8141 does state:
> 
>    In particular, with regard to characters outside the ASCII range,
>    URNs that appear in protocols or that are passed between systems MUST
>    use only Unicode characters encoded in UTF-8 and further encoded as
>    required by RFC 3986.  To the extent feasible and consistent with the
>    requirements of names defined and standardized elsewhere, as well as
>    the principles discussed in Section 1.2, the characters used to
>    represent names SHOULD be restricted to either ASCII letters and
>    digits or to the characters and syntax of some widely used models
>    such as those of Internationalizing Domain Names in Applications
>    (IDNA) [RFC5890], Preparation, Enforcement, and Comparison of
>    Internationalized Strings (PRECIS) [RFC7613], or the Unicode
>    Identifier and Pattern Syntax specification [UAX31].
> 
>    In order to make URNs as stable and persistent as possible when
>    protocols evolve and the environment around them changes, URN
>    namespaces SHOULD NOT allow characters outside the ASCII range
>    [RFC20] unless the nature of the particular URN namespace makes such
>    characters necessary.
> 
> By my reading of draft-hakala-urn-nbn-rfc3188bis and RFC 8141, the
> allowable case-sensitivity for nbn_string constructs generated by a
> national library applies to the percent-encoded string because that is
> where any comparison or equivalence-matching would occur for these
> identifiers. Venturing into case matching of percent-decoded strings
> would (IMHO) unnecessarily open up an ugly can of worms.
> 
> > In many
> > (perhaps even most?) cases, ignoring such encoded characters for
> > purposes of case-insensitive comparison is the wrong thing to do,
> > but if I understand correctly, it actually is the correct thing to
> > do in this case.  Namely, a NBN (or URN:NBN), once assigned, is
> > essentially static data and consumers of it should not attempt to
> > perform modification, Unicode normalization, etc. on it -- that
> > would potentially change what is being identified (or render the
> > identifier invalid). 
> 
> Well, Unicode normalization would be used as part of equivalence
> operations (as in IDNA or PRECIS), but in general you are right about
> modification. These are identifiers or even numbers, not malleable strings.
> 
> > On the other hand, a national library or
> > delegated institution that is assigning NBNs may wish to take into
> > account Unicode normalization rules and other similar considerations
> > while assigning NBNs (in particular, the nbn_string component), as
> > part of their allocation policy. 
> 
> It could, but as far as I know none of the national libraries have yet
> gone down that path or seen the need to. Juha can tell us if I'm wrong.
> 
> > Because these can be subtle, it
> > may be worth explicitly pointing out the potential issues for
> > registration authorities. 
> 
> "There be dragons and don't go there" seems like fine advice.
> 
> > That, plus the directive to consumers to
> > not normalize, seems like it would be appropriate content for an
> > Internationalization Considerations section.
> 
> By "normalize" you mean perform equivalence matching of percent-decoded
> strings (of which Unicode normalization might be one step), right? Here
> again I think the answer is "don't do that" because equivalence
> matching is done on the percent-encoded strings.

I did not have a terribly concrete scenario in mind when I wrote
this; I think the one Adam described is probably enough to get us
thinking about the right things.
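
To make the comparison rule under discussion concrete, here is a rough
sketch of how I read it: the prefix and the hex digits of any
percent-encoded octets compare case-insensitively, and nothing is ever
percent-decoded or Unicode-normalized.  The function names are
hypothetical, and treating the rest of the nbn_string as case-sensitive
is my assumption, not something the draft text quoted here settles.  The
sketch also assumes any r-, q-, or f-components have already been
stripped.

```python
import re

# Matches one percent-encoded octet, e.g. "%c3" or "%C3".
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def nbn_comparison_key(urn: str) -> str:
    """Build a comparison key for a URN:NBN without decoding anything.

    Hypothetical helper: lowercases "urn", "nbn" and the prefix, and
    uppercases only the hex digits of percent-encoded octets.
    """
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn" or nid.lower() != "nbn":
        raise ValueError("not a URN:NBN")
    # The prefix may not contain hyphens, so the first "-" ends it.
    prefix, sep, nbn_string = nss.partition("-")
    if not sep:
        raise ValueError("missing '-' between prefix and nbn_string")
    # Assumption: the remainder of nbn_string is compared as-is.
    nbn_string = _PCT.sub(lambda m: m.group(0).upper(), nbn_string)
    return f"urn:nbn:{prefix.lower()}-{nbn_string}"


def nbn_equivalent(a: str, b: str) -> bool:
    """True if two URN:NBNs are equivalent under the sketch above."""
    return nbn_comparison_key(a) == nbn_comparison_key(b)
```

Under this sketch "urn:nbn:fi-%c3%a9" and "URN:NBN:FI-%C3%A9" compare
equal, while "%C3%A9" and "%C3%89" (lowercase and uppercase e-acute)
do not, because the decoded characters are never examined.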

> > Separately, in Section 4.2.1 where we cover r-components, I noted
> > that RFC 8141 rather discourages actually using r-components until
> > their semantics are standardized.  The text here seems to be giving
> > free rein for national libraries to assign their own semantics
> > without any coordination with a broader community. 
> 
> Juha and perhaps John can clarify, but as I understand it the scope of a
> URN resolver for NBNs would likely be within a particular national
> library system, not even necessarily across all national libraries (this
> is how things are deployed now in the absence of URN resolution, in any
> case).
> 
> > Do we really
> > want to advocate for this, as opposed to attempting to get broadly
> > unified semantics for r-components Internet-wide?  (Perhaps we
> > already have and I just missed it; if so, a reference here would be
> > appropriate.)
> 
> The semantics of r-components are yet to be defined. I would venture
> that the IETF is probably not the right place to do that work, given how
> little energy remained in the URN WG at the end (and we probably didn't
> have the right people in the room in the first place).

I won't argue with that.  Does it make sense to say something like
"At the time of this writing there are no broadly accepted semantics
for r-components, which may be grounds for caution in their use" in
this document?

> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > I'm a little confused on some of the places in the text that talk
> > about URN:NBNs being "generated from" NBNs (and non-reuse
> > thereafter) or restrictions on URN:NBN assignment (e.g.,
> > uniqueness).  The procedure seems to be basically deterministic for
> > creating a URN:NBN once an NBN is assigned, and potentially
> > something that could be done by any party in possession of the NBN
> > (i.e., not necessarily the registration authority that created the
> > NBN).  So I'm not sure why the act of generating the URN:NBN has any
> > significance, if anyone could do it -- the restrictions would need
> > to apply at NBN assignment time in order to be useful.  (This kind
> > of gets into Ben's DISCUSS point, too, in the sense that we can only
> > say what prerequisites there are for national library NBN allocation
> > policies in order for them to be useful with URN:NBN, but they can
> > in principle do whatever they like and choose to not use URN:NBN.)
> 
> Yes, the process of creating a URN from an NBN is trivial (modulo
> potentially interesting encoding of non-ASCII characters). I think the
> point of the text is that an NBN URN is not exactly the same as an NBN.
> Perhaps that could be worded more clearly.

Okay.  (I don't think I have any suggestions for different text.)
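
For illustration only, the "trivial modulo encoding" step could look
roughly like the sketch below.  The function name is hypothetical, the
"prefix-nbn_string" layout is taken from the syntax discussed in this
thread, and encoding everything outside the unreserved set is a
conservative assumption on my part (RFC 8141 permits some additional
characters to appear literally).  No Unicode normalization is applied.

```python
from urllib.parse import quote


def nbn_to_urn(prefix: str, nbn: str) -> str:
    """Form a URN:NBN from a prefix and an already-assigned NBN string.

    Hypothetical helper: non-ASCII characters are encoded as UTF-8 and
    then percent-encoded; the NBN is otherwise left untouched.
    """
    # safe="" percent-encodes everything except unreserved characters
    # (ASCII letters, digits, "-", ".", "_", "~").
    encoded = quote(nbn, safe="", encoding="utf-8")
    return f"urn:nbn:{prefix}-{encoded}"
```

An all-ASCII NBN such as "fe19981001" under prefix "fi" passes through
unchanged as "urn:nbn:fi-fe19981001"; only characters outside the
unreserved set are rewritten.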

> > Section 3.2
> > 
> >    From the library community point of view it is important that the
> >    f-component is not a part of the NSS and therefore f-component
> >    attachment does not mean that the relevant component part is
> >    identified.  Moreover, the resolution process still retrieves the
> >    entire resource even if there is an f-component.  The fragment
> >    selection is applied by the resolution client (e.g., browser) to the
> >    media returned by the resolution process.  In other words, in this
> >    latter case the fragments are logical and physical components of the
> >    identified resource whereas in the former cases these "fragments" are
> >    actually complete, independently named entities.
> > 
> > I'm not sure I'm understanding this correctly -- is the "former
> > case" the thing that libraries should not do, namely, including the
> > f-component in the NSS?
> 
> Now that you point it out, I'm not sure what the former case is.
> Formally speaking the f-component simply is not part of the NSS, see the
> ABNF in RFC 8141.

I guess we should wait for Juha to clarify.

> >    If an NBN identifies a work, descriptive metadata about the work
> >    SHOULD be supplied.  The metadata record MAY contain links to
> >    Internet-accessible digital manifestations of the work.
> > 
> > This left me confused.  Is it only intended to apply in the case
> > described in the previous paragraph, where the resource identified
> > by the NBN is not available in the Internet?  Or does it always
> > apply, forcing the metadata to take precedence over delivering the
> > actual work?  (Or maybe I'm just confused, and there's an easy way
> > to deliver both metadata and the actual work alongside each other
> > with no ambiguity.)
> 
> Juha can clarify this.
> 
> > Section 4.1
> > 
> >    National Bibliography Number (NBN) is a generic term referring to a
> >    group of identifier systems administered by the national libraries
> >    and institutions authorized by them.
> > 
> > "the national libraries" implies a specific set -- which ones?  It
> > may be better to hedge with "some national libraries".
> 
> Or remove "the" ... "by national libraries".

That's probably better :)

Thanks,

Benjamin

> > Section 4.2.2
> > 
> > Do we need to say anything about a URN-to-URI step before talking
> > about URI-to-resource services?
> > 
> > I'm also wondering about any relationship between "component
> > resource" NBNs and f-components of the containing work.  If there
> > are NBNs assigned to both an image within a work and that containing
> > work, and an NBN with f-component is used to refer to the image
> > within the containing work, is there any relationship between the
> > f-component and the image-specific NBN?
> > 
> > Section 4.3
> > 
> >    Expressing NBNs as URNs is usually straightforward, as only ASCII
> >    characters are allowed in NBN strings.  If necessary, NBNs MUST be
> >    translated into canonical form as specified in RFC 8141.
> > 
> > When is it necessary?
> 
> It seems that in theory an NBN itself could contain non-ASCII
> characters, whereas an NBN URN and its nbn_string construct can contain
> only ASCII characters. At least that is my understanding.
> 
> >    Being part of the prefix, sub-namespace identifier strings are case-
> >    insensitive.  They MUST NOT contain any hyphens.
> > 
> > This MUST seems to just duplicate a syntactic requirement from the
> > ABNF; is RFC 2119 language really necessary?
> 
> /me shrugs
> 
> > Section 8
> > 
> >    John Klensin provided significant editorial and advisory support for
> >    late versions of the draft.
> > 
> > Presumably that's "later versions"?
> 
> Yes.
> 
> Peter
> 
>