Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)

Adam Roach <adam@nostrum.com> Thu, 07 June 2018 20:40 UTC

To: Peter Saint-Andre <stpeter@mozilla.com>, Benjamin Kaduk <kaduk@mit.edu>, The IESG <iesg@ietf.org>
Cc: "urn@ietf.org" <urn@ietf.org>, draft-hakala-urn-nbn-rfc3188bis@ietf.org
References: <152837409539.30768.4568779645299135020.idtracker@ietfa.amsl.com> <6a1a100c-3bc0-76d3-3ae4-047d37906bfc@mozilla.com>
From: Adam Roach <adam@nostrum.com>
Message-ID: <7161340e-014b-3740-83ed-39f4db3a30c0@nostrum.com>
Date: Thu, 07 Jun 2018 15:40:44 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <6a1a100c-3bc0-76d3-3ae4-047d37906bfc@mozilla.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/nNVUQRk3-DYo-luRzd3q7nwcHCw>
Subject: Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)
Precedence: list

On 6/7/18 3:02 PM, Peter Saint-Andre wrote:
> By "normalize" you mean perform equivalence matching of percent-decoded
> strings (of which Unicode normalization might be one step), right? Here
> again I think the answer is "don't do that" because equivalence
> matching is done on the percent-encoded strings.

I think the concern here is the translation between percent-encoded URNs 
on the wire and the display form that users enter into lookup forms. For 
example, imagine some alt-history version of a country that decided that 
its NBNs would take the form <author-last-name>-<serial>. Encoded as a 
URN, this might look like urn:nbn:dd:roach-157.

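The issue, of course, is that a name like Müller can be entered either 
with a precomposed "ü" (U+00FC) or as "u" followed by a combining 
diaeresis (U+0308), and "urn:nbn:dd:m%c3%bcller-127" doesn't match 
"urn:nbn:dd:mu%cc%88ller-127".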
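
To make the mismatch concrete, here is a rough Python sketch. The "dd" 
namespace and the <author-last-name>-<serial> scheme are the made-up 
example from above, and nss_for() is a hypothetical helper, not anything 
defined by the draft; lowercasing the name and the hex digits is my 
assumption so that the output lines up with the strings quoted above.

```python
import unicodedata
from urllib.parse import quote

def nss_for(name: str, serial: int) -> str:
    # Hypothetical helper: lowercase the name, then percent-encode its
    # UTF-8 bytes. quote() emits uppercase hex, so lowercase afterwards.
    return quote(f"{name.lower()}-{serial}", safe="-").lower()

# The same visible name in two Unicode normalization forms.
nfc_name = unicodedata.normalize("NFC", "M\u00fcller")  # precomposed U+00FC
nfd_name = unicodedata.normalize("NFD", "M\u00fcller")  # "u" + U+0308

urn_nfc = "urn:nbn:dd:" + nss_for(nfc_name, 127)
urn_nfd = "urn:nbn:dd:" + nss_for(nfd_name, 127)

print(urn_nfc)
print(urn_nfd)
print(urn_nfc == urn_nfd)
```

Both inputs render identically on screen, but a byte-wise comparison of 
the percent-encoded strings treats them as different URNs.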

So the problem doesn't occur at the level you mention; it happens 
somewhere between the keyboard and the network card of a querying user. 
That's not really this document's problem per se, but it is definitely a 
dragon that needs flagging. I'm not sure whether "don't do that then" is 
the correct advice, unless we have reason to believe that national 
libraries are hyper-aware of URN considerations when designing their NBN 
schemes. What we probably need to say is "if your national library 
defines an NBN that can contain characters higher than U+007F (which 
have to be percent-encoded in a URN), then that same body needs to 
carefully define the canonical transformation from NBNs into URNs, 
including normalization forms."
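
As a rough illustration of what such a canonical transformation might 
look like, here is a minimal Python sketch. The function name 
nbn_to_urn(), the choice of NFC, and the lowercasing are all assumptions 
on my part for the made-up "dd" scheme; a real national library would 
have to pick and document its own rules.

```python
import unicodedata
from urllib.parse import quote

def nbn_to_urn(country: str, nbn: str) -> str:
    # Assumption: the (hypothetical) assigning body chose NFC plus
    # lowercasing as its canonical form before percent-encoding.
    canonical = unicodedata.normalize("NFC", nbn).lower()
    # quote() percent-encodes the UTF-8 bytes with uppercase hex;
    # lowercase the result so every producer emits the same string.
    return f"urn:nbn:{country}:" + quote(canonical, safe="-").lower()

# Two keyboard-level spellings of the same display form.
precomposed = nbn_to_urn("dd", "M\u00fcller-127")
decomposed = nbn_to_urn("dd", "Mu\u0308ller-127")

print(precomposed)
print(precomposed == decomposed)
```

With the normalization step applied before encoding, both spellings 
collapse to one URN, so equivalence matching on the percent-encoded 
strings works as intended.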

/a
