Re: URN UUID question

worley@ariadne.com (Dale R. Worley) Tue, 25 March 2014 14:25 UTC

Date: Tue, 25 Mar 2014 10:21:44 -0400
Message-Id: <201403251421.s2PELiuo031119@hobgoblin.ariadne.com>
From: worley@ariadne.com (Dale R. Worley)
Sender: worley@ariadne.com (Dale R. Worley)
To: =?utf-8?Q?Martin_J=2E_D=C3=BCrst?= <duerst@it.aoyama.ac.jp>
In-reply-to: <532FFF6A.6080206@it.aoyama.ac.jp> (duerst@it.aoyama.ac.jp)
Subject: Re: URN UUID question
References: <CALPpAZ_fLwK80dcM5ty5pp2pLiafpW36uvK2WoJdKpuaWX6PQw@mail.gmail.com> <201403192139.s2JLdchi012675@hobgoblin.ariadne.com> <CALPpAZ8oqUMg6Q+HZjhkyCDPGOnFYYrrrXY=Jxr8OJH2YU39FQ@mail.gmail.com> <532FFF6A.6080206@it.aoyama.ac.jp>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn-nid/zOzvCbIprKQPUNjeZtIKUpDcJkg
Cc: urn-nid@ietf.org, timothy@hpl.hp.com, sandro@w3.org

I don't have much opinion on what the "best" type of URI to use is.
And I know nothing about IRIs, though I'm fairly versed in URIs.

> From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>

>  >>>>
>     In the interests of tractability to humans, tags SHOULD NOT be minted
>     with percent-encoded parts.  However, the tag syntax does allow
>     percent-encoded characters in the "pchar" elements (defined in RFC
>     3986 [1]).
>  >>>>
> 
> It should allow percent-encoded parts also in the authorityName part, 
> and specify that in all cases, such percent-encoded parts must be 
> created and interpreted using UTF-8. After all, that's what RFC 3986 
> (which is heavily cited) says for authority names.

> From: Sandro Hawke <sandro@w3.org>

> On the authorityName, if it's a DNSName, presumably you'd use punycode, 
> not percent encoding, right?   If it's an emailAddress, presumably you'd 
> use punycode for the DNSname part of it.  I don't know what one's 
> supposed to use for the part before the @ in an email address?      I 
> haven't kept up on the email standards.    Is there consensus about that?

If the tag URI authorityName is a DNSname, then we can use the current
IDNA conversion (RFC 3490, which encodes labels with Punycode, RFC
3492), and anyone who cares can recognize and convert for display
purposes any DNS label that is punycoded.  That output is within the
syntax for tag URIs.

The same holds for the DNSname part of an emailAddress authorityName.
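
As a rough illustration of the conversion in both directions, here is a
minimal Python sketch using the standard library's "idna" codec, which
follows the RFC 3490 rules.  The helper names and the example domain
are made up for the illustration.

```python
def to_ascii_dnsname(name: str) -> str:
    """Convert an internationalized DNS name to its ASCII ("xn--") form."""
    return name.encode("idna").decode("ascii")


def to_display_dnsname(name: str) -> str:
    """Convert any punycoded labels of an ASCII DNS name back for display."""
    return name.encode("ascii").decode("idna")


ascii_name = to_ascii_dnsname("bücher.example")
print(ascii_name)                      # xn--bcher-kva.example
print(to_display_dnsname(ascii_name))  # bücher.example
```

Only the ASCII form would appear inside the tag URI; the display form
is purely a presentation matter.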

The local-part (per RFC 822) of an emailAddress is more difficult
to handle.  To start with, the syntax specified in RFC 4151 (tag URIs)
is neither a subset nor a superset of the syntax specified in RFC
822.  (E.g., RFC 4151 permits two adjacent ".", which RFC 822 does
not, while RFC 822 permits "~", which RFC 4151 does not.)  So RFC 4151
already has the problem that it does not admit all e-mail addresses
that RFC 822 admits.  OTOH, there are tag URI emailAddresses that are
syntactically forbidden by RFC 822, which may allow some space for
extensions.

The internationalization of e-mail address local-parts is in RFC 5335,
essentially by permitting non-ASCII UTF-8 characters much as if they
were ASCII letters.  Since URIs cannot contain non-ASCII characters,
that doesn't help us directly.

Perhaps we could define an extension, where the local-part in the
emailAddress is ".." followed by the punycode encoding of the actual
internationalized e-mail local-part?  That is similar to what is done
with domain names and could be automatically recognized and translated
for display.  This convention would also allow representation of
local-parts that conform to RFC 822 but not to RFC 4151.  And it
remains within the syntax of RFC 4151.  (Though it looks like Punycode
would have to be modified to understand that some ASCII characters
aren't in the base character set.)
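
To make the idea concrete, here is a minimal Python sketch of the
".." convention, assuming unmodified Punycode as provided by the
standard library's "punycode" codec.  The function names and the
PREFIX constant are hypothetical; nothing here is defined by RFC 4151.
Because unmodified Punycode copies ASCII characters through unchanged,
the sketch rejects local-parts containing characters such as "~",
which is exactly the case that would need a modified encoding.

```python
import re

# Characters RFC 4151 allows before the "@" in an emailAddress authorityName.
TAG_LOCAL_OK = re.compile(r"[A-Za-z0-9._-]+")
PREFIX = ".."  # hypothetical marker from the proposal above


def encode_local_part(local_part: str) -> str:
    """Return ".." + Punycode(local_part); raise if it leaves RFC 4151 syntax."""
    encoded = PREFIX + local_part.encode("punycode").decode("ascii")
    if not TAG_LOCAL_OK.fullmatch(encoded):
        # Unmodified Punycode passes ASCII characters such as "~" through as-is.
        raise ValueError(f"needs a modified Punycode: {local_part!r}")
    return encoded


def decode_local_part(encoded: str) -> str:
    """Undo encode_local_part(); plain local-parts are returned unchanged."""
    if not encoded.startswith(PREFIX):
        return encoded
    return encoded[len(PREFIX):].encode("ascii").decode("punycode")


print(encode_local_part("bücher"))       # ..bcher-kva
print(decode_local_part("..bcher-kva"))  # bücher
```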

In regard to the "specific" part of the URI, the arbitrary part which
comes after taggingEntity: though the narrative of RFC 4151 says
percent-encoding shouldn't be used, the syntax explicitly permits it.
The narrative forbids percent-encoding only at "SHOULD" strength,
which officially (RFC 2119) means "there may exist valid reasons in
particular circumstances to ignore a particular item, but the full
implications must be understood and carefully weighed before choosing
a different course."  And internationalization seems to be a valid
reason to me.

Similarly, percent-encoding can be automatically recognized and
parsed for display purposes.
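
For instance, a minimal Python sketch of minting and displaying such a
tag, assuming UTF-8 as the encoding behind the percent-escapes.  The
authority "joe@example.com" and both function names are made up for
the example.

```python
from urllib.parse import quote, unquote

TAG_PREFIX = "tag:joe@example.com,2014:"  # example authority, not a real tag


def mint_tag(specific: str) -> str:
    """Percent-encode the non-ASCII characters of the specific part as UTF-8."""
    # Leave unescaped the punctuation RFC 4151 allows in "specific"
    # (pchar, "/" and "?").
    return TAG_PREFIX + quote(specific, safe="!$&'()*+,;=:@/?", encoding="utf-8")


def display_tag(tag: str) -> str:
    """Decode percent-escapes for human display only, never for comparison."""
    return unquote(tag, encoding="utf-8")


tag = mint_tag("Dürst")
print(tag)               # tag:joe@example.com,2014:D%C3%BCrst
print(display_tag(tag))  # tag:joe@example.com,2014:Dürst
```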

> Certainly unnecessary percent encoding is a problem because it causes 
> confusion about whether two URIs are the same.   (If you have to ask 
> that, they are not.   But people may not realize that.   Some people 
> might think "tag:sandro@hawke.org,2014:A" and 
> "tag:sandro@hawke.org,2014:%41" are the same, but they are not.)

You have to be careful with that, because the equivalence rule depends
on the URI scheme.  In the case of tag URIs, the rule is given in RFC
4151 section 2.4, and it requires that different strings are never
equivalent:

   2.4.  Equality of Tags

   Tags are simply strings of characters and are considered equal if and
   only if they are completely indistinguishable in their machine
   representations when using the same character encoding.  That is, one
   can compare tags for equality by comparing the numeric codes of their
   characters, in sequence, for numeric equality.  This criterion for
   equality allows for simplification of tag-handling software, which
   does not have to transform tags in any way to compare them.
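
In code, that rule amounts to a plain string comparison.  A small
Python sketch (the function name is mine) using the two tags quoted
above:

```python
from urllib.parse import unquote

a = "tag:sandro@hawke.org,2014:A"
b = "tag:sandro@hawke.org,2014:%41"


def tags_equal(x: str, y: str) -> bool:
    """RFC 4151 section 2.4: compare characters in sequence, no normalization."""
    return x == y


print(tags_equal(a, b))          # False: different strings, different tags
print(unquote(a) == unquote(b))  # True, but decoding is not part of the rule
```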

> From: Joel Kalvesmaki <kalvesmaki@gmail.com>
> 
> I would like people who do not own or have access to a domain name to be
> able to mint names. 

The current tag scheme doesn't allow that, but it's fairly simple to
get around that problem.  (Although it's difficult to imagine anyone
who would use this system but not have an e-mail address.)  We need
some person or enterprise that has a legitimate authorityName to
delegate parts of its URI space to owners who do not have
authorityNames.  So we could separate:

    tag:joe@example.com,2014:delegations:person-A:[person A's string]
    tag:joe@example.com,2014:delegations:person-B:[person B's string]
    tag:joe@example.com,2014:delegations:person-C:[person C's string]
    tag:joe@example.com,2014:delegations:person-D:[person D's string]

We could probably set up an automatic delegation system, much
resembling how tag URIs themselves are delegated, by allowing each
possessor of a telephone number a set of strings:

    tag:joe@example.com,2014:delegations:17816479199,2014:[my string]

would be automatically delegated to me because I possessed the use of
that E.164 number on 2014-01-01.
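
A minimal Python sketch of how such delegated tags could be built and
recognized.  The ROOT prefix is the example authority used above, and
both function names are hypothetical; this is one possible layout, not
anything specified by RFC 4151.

```python
ROOT = "tag:joe@example.com,2014:delegations:"  # example delegating authority


def mint_delegated(e164: str, date: str, specific: str) -> str:
    """Build a tag in the sub-space delegated to the holder of e164 on date."""
    if not (e164.isascii() and e164.isdigit()):
        raise ValueError("expected E.164 digits without '+' or separators")
    return f"{ROOT}{e164},{date}:{specific}"


def delegate_of(tag: str):
    """Return (e164, date) for a delegated tag, or None if it is not one."""
    if not tag.startswith(ROOT):
        return None
    holder, colon, _specific = tag[len(ROOT):].partition(":")
    e164, comma, date = holder.partition(",")
    if not (colon and comma and e164.isascii() and e164.isdigit()):
        return None
    return e164, date


t = mint_delegated("17816479199", "2014", "my-string")
print(t)               # tag:joe@example.com,2014:delegations:17816479199,2014:my-string
print(delegate_of(t))  # ('17816479199', '2014')
```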

> Further, it's important that names be valid for centuries to come.

Fortunately, that's fairly easy to arrange as long as people don't
mis-construct the URIs.

> My concerns are akin to those that have motivated the architects of
> Canonical Text Services[1][2][3] to develop the convention
> "urn:cts:..." to provide names for ancient literary works, their
> fragments, and their versions. Their naming scheme stands
> independent of server performance, etc. But it also can be easily
> incorporated into registries that facilitate Semantic Web
> applications.

Good Lord, haven't they thought to properly register that URN
namespace?  I don't see "cts" listed in
http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml .

Dale