Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)

Adam Roach <adam@nostrum.com> Thu, 07 June 2018 20:40 UTC

To: Peter Saint-Andre <stpeter@mozilla.com>, Benjamin Kaduk <kaduk@mit.edu>, The IESG <iesg@ietf.org>
Cc: "urn@ietf.org" <urn@ietf.org>, draft-hakala-urn-nbn-rfc3188bis@ietf.org
References: <152837409539.30768.4568779645299135020.idtracker@ietfa.amsl.com> <6a1a100c-3bc0-76d3-3ae4-047d37906bfc@mozilla.com>
From: Adam Roach <adam@nostrum.com>
Message-ID: <7161340e-014b-3740-83ed-39f4db3a30c0@nostrum.com>
Date: Thu, 07 Jun 2018 15:40:44 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <6a1a100c-3bc0-76d3-3ae4-047d37906bfc@mozilla.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/nNVUQRk3-DYo-luRzd3q7nwcHCw>
Subject: Re: [urn] Benjamin Kaduk's Discuss on draft-hakala-urn-nbn-rfc3188bis-01: (with DISCUSS and COMMENT)

On 6/7/18 3:02 PM, Peter Saint-Andre wrote:
> By "normalize" you mean perform equivalence matching of percent-decoded
> strings (of which Unicode normalization might be one step), right? Here
> again I think the answer is "don't do that" because equivalence
> matching is done on the percent-encoded strings.


I think the concern here is the translation between percent-encoded URNs 
on the wire and the display form that users enter into lookup forms. For 
example, imagine some alt-history version of a country that decided that 
its NBNs would take the form <author-last-name>-<serial>. Encoded as a 
URN, this might look like urn:nbn:dd:roach-157.

The issue, of course, is that "urn:nbn:dd:m%c3%bcller-127" (precomposed 
U+00FC) doesn't match "urn:nbn:dd:mu%cc%88ller-127" ("u" followed by 
combining U+0308), even though both display as "müller-127".
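
To make the mismatch concrete, here is a rough sketch using only the Python standard library. It assumes the NBN is carried as percent-encoded UTF-8 and that the only difference between the two inputs is the Unicode normalization form (NFC versus NFD). The "dd" country code and the name are the hypothetical ones from the example above. Note that urllib emits uppercase hex digits, where the strings above use lowercase.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical NBN from the example above, typed with a precomposed u-umlaut.
name = "m\u00fcller-127"

# The same visible string in two Unicode normalization forms.
nfc = unicodedata.normalize("NFC", name)  # U+00FC (one code point)
nfd = unicodedata.normalize("NFD", name)  # "u" + U+0308 (two code points)

# Percent-encode the UTF-8 bytes; safe="" leaves only unreserved characters as-is.
urn_nfc = "urn:nbn:dd:" + quote(nfc, safe="")
urn_nfd = "urn:nbn:dd:" + quote(nfd, safe="")

print(urn_nfc)             # urn:nbn:dd:m%C3%BCller-127
print(urn_nfd)             # urn:nbn:dd:mu%CC%88ller-127
print(urn_nfc == urn_nfd)  # False
```

Which of the two forms a user ends up sending depends on their keyboard, input method, and operating system, none of which they are likely to be aware of.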

So the problem doesn't occur at the level you mention; it happens 
somewhere between the keyboard and the network card of a querying user. 
That's not really this document's problem per se, but it is definitely a 
dragon that needs flagging. I'm not sure whether "don't do that then" is 
the correct advice, unless we have reason to believe that national 
libraries are hyper-aware of URN considerations when designing their NBN 
schemes. What we probably need to say is "if your national library 
defines an NBN that can contain percent-encoded characters higher than 
U+007F, then that same body needs to carefully define the canonical 
transformation from NBNs into URNs, including normalization forms."
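
As a sketch of what such a canonical transformation might look like (not something the draft specifies), the function below normalizes the display form before percent-encoding it. The name nbn_to_urn, the choice of NFC as the default, and lowercasing the country code are all assumptions made for illustration; a national library would need to pick and document its own rules.

```python
import unicodedata
from urllib.parse import quote


def nbn_to_urn(country_code: str, nbn: str, form: str = "NFC") -> str:
    """Hypothetical canonical mapping from a display-form NBN to a URN.

    Assumes the defining body has fixed one normalization form (NFC here)
    and that the NBN is carried as percent-encoded UTF-8.
    """
    normalized = unicodedata.normalize(form, nbn)
    encoded = quote(normalized, safe="")  # unreserved characters stay literal
    return f"urn:nbn:{country_code.lower()}:{encoded}"


# Two inputs that look identical on screen but differ in code points.
typed_precomposed = "m\u00fcller-127"   # U+00FC
typed_decomposed = "mu\u0308ller-127"   # "u" + U+0308

# After the canonical transformation both yield the same URN.
assert nbn_to_urn("dd", typed_precomposed) == nbn_to_urn("dd", typed_decomposed)
print(nbn_to_urn("dd", typed_decomposed))  # urn:nbn:dd:m%C3%BCller-127
```

The point is only that the normalization step has to happen before percent-encoding, on the querying side, since equivalence matching on the percent-encoded strings cannot undo the difference afterwards.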

/a