Re: [urn] I-D Action: draft-ietf-urnbis-rfc2141bis-urn-01.txt

Peter Saint-Andre <stpeter@stpeter.im> Thu, 09 February 2012 21:23 UTC

<hat type='individual'/>

On 1/19/12 10:01 AM, Alfred Hönes wrote:
> 
> On 2012-01-19, Lars Svenson wrote:
> 
>> And a final thought about Lexical Equivalence (out of scope for
>> 2141bis, but relevant to 3406bis):
>> If I have a namespace that allows characters which I must
>> percent-encode in a URN (like '������'), how can I specify that the
>> NSS '���' is lexically equivalent to '���'?
> 
> A namespace that needs/wants to incorporate non-ASCII characters
> (which will be first UTF-8 encoded and then percent-encoded for
> inclusion in URNs) needs to determine its appropriate rules.
> In case of roman characters with accents, the case mapping
> properties and normalization rules/forms specified by the
> Unicode Standard might be good candidates to draw from.

The WG might also consider mapping rules such as those in RFC 5895, for
example:

   1.  Uppercase characters are mapped to their lowercase equivalents by
       using the algorithm for mapping case in Unicode characters.

and:

   3.  All characters are mapped using Unicode Normalization Form C
       (NFC).
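
As a rough illustration only, those two RFC 5895 steps might be
sketched in Python as follows. The function name map_nss is
hypothetical, and str.lower() is used here as a stand-in for Unicode
case mapping; a namespace could reasonably pick case folding or some
other rule instead.

```python
import unicodedata


def map_nss(nss: str) -> str:
    """Hypothetical helper: lowercase the NSS, then normalize it to NFC."""
    lowered = nss.lower()  # step 1: Unicode-aware lowercase mapping
    return unicodedata.normalize("NFC", lowered)  # step 3: NFC


# Precomposed U+00C4 and "A" + combining diaeresis (U+0308) differ as
# code point sequences, but both map to the same lowercase NFC string.
assert map_nss("\u00C4BC") == map_nss("A\u0308bc") == "\u00E4bc"
```

Lowercasing first and normalizing second matches the order of the
quoted steps; whether that order matters for a given namespace's
repertoire is something its registration document would need to check.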

> In general and ultimately, any equivalence not easily expressible
> in syntax rules will need to be instantiated by the registration /
> resolution systems for a specific namespace, and the methods for
> implementation are strictly a "local" matter for the maintainers
> of the registration system(s).
> 
> A similar problem is well known for domain names: in particular
> for non-roman scripts, "human-friendly" equivalence for
> identifiers cannot be specified on an "absolute base" and
> achieved in a distributed
> manner because human-perceived equivalence is frequently culture
> and context dependent; hence, in the context of the DNS, only
> the name registration system and the authoritative servers for a
> domain can implement and enforce the non-ASCII equivalence rules
> intended for that particular domain.
> 
> So, in your example above, the respective namespace document could
> specify a "normal form" of the NSS for that namespace and the rules
> to achieve it (e.g., Unicode NFxx normalization and case mapping to
> lower-case), or it could simply state that any appropriate
> equivalence will be delivered by the resolution system.

If we say that all equivalence is the responsibility of the resolution
system, then how are URN implementations supposed to process or compare
URNs in a consistent fashion? It seems more helpful to specify some
mapping rules that are applied before percent-encoding occurs.
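
To make the ordering concrete, here is a minimal sketch (not a
proposal for spec text) of mapping first and percent-encoding second.
The names encode_nss and same_nss are hypothetical, the mapping is the
lowercase-plus-NFC assumption discussed above, and the set of
characters left unencoded is my reading of the <other> production in
RFC 2141.

```python
import unicodedata
from urllib.parse import quote

# Assumption: punctuation that RFC 2141 allows unencoded in an NSS.
NSS_SAFE = "()+,-.:=@;$_!*'"


def encode_nss(nss: str) -> str:
    """Hypothetical: map (lowercase + NFC), then UTF-8 percent-encode."""
    mapped = unicodedata.normalize("NFC", nss.lower())
    # quote() encodes to UTF-8 by default and emits uppercase hex digits.
    return quote(mapped, safe=NSS_SAFE)


def same_nss(a: str, b: str) -> bool:
    """Compare two unencoded NSS strings after mapping and encoding."""
    return encode_nss(a) == encode_nss(b)


# Without any mapping, the two spellings encode differently ...
assert quote("\u00C4", safe=NSS_SAFE) != quote("A\u0308", safe=NSS_SAFE)
# ... but with the mapping applied first they compare equal.
assert encode_nss("\u00C4") == encode_nss("A\u0308") == "%C3%A4"
assert same_nss("\u00C4", "\u00E4")
```

With something like this, a client can compare two URNs consistently
without ever contacting a resolution service.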

> Therefore, and in fact, for some namespaces, _lexical_ equivalence
> of URNs (i.e., what a client system can determine) might become
> less important than _semantical_ equivalence (implemented in the
> registration / resolution services).

I doubt that we'll ever agree on semantic equivalence, and in any case I
think that's out of scope for our work here (RFC 2141 speaks of lexical
equivalence and IMHO the best we can hope for is to clarify exactly what
we mean by that).
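
For reference, my understanding of the comparison RFC 2141 already
defines (normalize the case of the leading "urn:" token, the NID, and
any %-escapes, then compare) could be sketched roughly like this. The
function names are hypothetical and the sketch does no syntax
validation beyond checking the scheme.

```python
import re


def rfc2141_normalize(urn: str) -> str:
    """Lowercase the "urn:" token, the NID, and hex digits in %-escapes."""
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: %r" % urn)
    # Only the hex digits of %-escapes are case-normalized; the rest of
    # the NSS stays case-sensitive.
    nss = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), nss)
    return "urn:%s:%s" % (nid.lower(), nss)


def lexically_equivalent(a: str, b: str) -> bool:
    return rfc2141_normalize(a) == rfc2141_normalize(b)


# These follow the examples given in RFC 2141 itself.
assert lexically_equivalent("URN:foo:a123,456", "urn:FOO:a123,456")
assert not lexically_equivalent("urn:foo:a123,456", "urn:foo:A123,456")
assert lexically_equivalent("urn:foo:a123%2C456", "URN:FOO:a123%2c456")
assert not lexically_equivalent("urn:foo:a123,456", "urn:foo:a123%2C456")
```

Anything beyond this (case mapping or normalization of the NSS itself)
is exactly the part that needs clarifying.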

Peter

-- 
Peter Saint-Andre
https://stpeter.im/