Re: [URN] Agenda for Washington Meeting -- Questions on URN

Martin J. Dürst <mduerst@ifi.unizh.ch> Fri, 05 December 1997 11:21 UTC

Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id GAA05001 for urn-ietf-out; Fri, 5 Dec 1997 06:21:15 -0500 (EST)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with ESMTP id GAA04996 for <urn-ietf@services.bunyip.com>; Fri, 5 Dec 1997 06:21:12 -0500 (EST)
Received: from josef.ifi.unizh.ch (josef.ifi.unizh.ch [130.60.48.10]) by mocha.bunyip.com (8.8.5/8.8.5) with SMTP id GAA20734; Fri, 5 Dec 1997 06:20:59 -0500 (EST)
Received: from ifi.unizh.ch by josef.ifi.unizh.ch with SMTP (PP) id <13712-0@josef.ifi.unizh.ch>; Fri, 5 Dec 1997 12:20:11 +0100
Date: Fri, 05 Dec 1997 12:19:49 +0100
From: "Martin J. Dürst" <mduerst@ifi.unizh.ch>
To: "Sam X. Sun" <ssun@CNRI.Reston.VA.US>
cc: Leslie Daigle <leslie@bunyip.com>, urn-ietf <urn-ietf@bunyip.com>
Subject: Re: [URN] Agenda for Washington Meeting -- Questions on URN
In-Reply-To: <199712030824.DAA01249@newcnri.CNRI.Reston.Va.US>
Message-ID: <Pine.SUN.3.96.971205120143.3739J-100000@enoshima.ifi.unizh.ch>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-urn-ietf@Bunyip.Com
Precedence: bulk
Reply-To: "Martin J. Dürst" <mduerst@ifi.unizh.ch>
Errors-To: owner-urn-ietf@Bunyip.Com

On Wed, 3 Dec 1997, Sam X. Sun wrote:

> In light of the handle system draft(draft-sun-handle-system-00.txt), and
> continuing on the earlier discussions with Roy, Martin, and Keith, I would
> like to raise the following questions before we wrap up the working group:

I'm sorry that I won't be able to be in Washington; I am waiting for
my visa for Japan, as I will start working there very soon.


> 2.	Most of the reserved/excluded characters of current URN are really
> restrictions from some current URL schemes, mostly from "http URL". For URN
> name space defined independently from URL, like Handle System, they are not
> required, and should not be put into the URN specification in the broader
> sense. For example, RFC1738 (url syntax), and the new URI draft
> (draft-fielding-uri-syntax-01.txt) allows reserved/excluded character sets
> to be defined on the individual URL scheme basis. Again, we are talking
> about URN in the broader sense here. How it's to be done under "urn:"
> implementation is another issue.

In particular with respect to the '#', which in the terms of
draft-fielding-uri-syntax-01.txt separates a fragment identifier
from the URI proper in a URI reference, you are well advised
not to reuse this character in your own URI scheme. Many
implementations won't like it. For other reserved characters,
the problems are smaller, but they exist as well.
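
To illustrate the '#' problem, here is a minimal sketch in Python,
assuming a generic URI parser (the standard urllib.parse module) sits
somewhere between the user and your resolver. The "hdl:" scheme name
and the identifiers are hypothetical and only serve as an example.

```python
from urllib.parse import quote, urlsplit


def split_reference(reference):
    """Split a URI reference the way a generic, scheme-agnostic parser does.

    Returns (scheme, path, fragment). The parser knows nothing about the
    hypothetical "hdl:" scheme; it cuts at the first '#' regardless.
    """
    parts = urlsplit(reference)
    return parts.scheme, parts.path, parts.fragment


def escape_name(name):
    """Percent-escape a scheme-specific name, keeping only '/' literal."""
    return quote(name, safe="/")


# A name that tries to use '#' as ordinary data is cut in two:
# everything after '#' is treated as a fragment identifier.
unescaped = split_reference("hdl:example/name#part")

# Escaping the '#' as %23 keeps the whole name in the path.
escaped = split_reference("hdl:" + escape_name("example/name#part"))

print(unescaped)
print(escaped)
```

So a scheme that wants '#' as part of its names has to require %23
everywhere, which is hardly different from not using the character.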

Also, please note that for reasons of backwards compatibility,
you cannot have a non-ASCII character as scheme-reserved unless
you completely disallow that character from the non-reserved
syntax of your scheme. The distinction between %-escaped
and non-escaped octets/characters above 0x7F cannot be used
to distinguish between reserved and non-reserved instances
of the same character.
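
A small sketch of why this is so, in Python. It assumes that some
software on the way (a gateway, an ASCII-only transport, an older
browser) %-escapes every octet above 0x7F as UTF-8; the function name
and the "hdl:" examples are hypothetical.

```python
def escape_non_ascii(uri):
    """Percent-escape every non-ASCII character as UTF-8 octets.

    ASCII characters, including existing %XX escapes, pass through
    untouched. This models what ASCII-only software does to a URI.
    """
    out = []
    for ch in uri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


# Raw and escaped forms of a non-ASCII character collapse into one,
# so "raw means reserved, escaped means data" cannot survive.
raw_form = escape_non_ascii("hdl:d\u00fcrst")
escaped_form = escape_non_ascii("hdl:d%C3%BCrst")
print(raw_form == escaped_form)

# For an ASCII reserved character the two forms stay distinct.
print(escape_non_ascii("hdl:a/b") == escape_non_ascii("hdl:a%2Fb"))
```

For ASCII reserved characters such as '/', the raw and the escaped
form remain different strings, which is exactly what makes them
usable as reserved characters.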


> 3.	Some URN specifications specifically exclude the support of user
> friendly names. While user friendly name may not be appropriate in some
> case, there are situations where friendly names are desired or even
> required. The underlying technology should not limit the usage of user
> friendly names. 

Fortunately, I currently don't see anything that limits the use
of user-friendly names. It would be difficult anyway, as human
beings are extremely good at giving meaning to even the most
obscure number sequences.


> 	On top of this, for any global naming scheme to support friendly names, it
> should not be limited to ASCII only, but should allow any native characters
> to be used directly without hex encoding. Otherwise, it can only support
> friendly names in English, but not in other languages like Greek, Russian,
> Chinese, Japanese, Korean, etc. For example, how could anyone tell that
> %C2%B7 is a friendly name in Greek? (Note: %C2%B7 is the hex encoded  UTF-8
> encoding of a Greek symbol, code point B7 defined in ISO-8859-7.)

There are two ways to view the current limitation to hex-escaped
UTF-8. One is to say that this limitation is necessary, and should
always be kept. The other is to say that it is temporary, in
preparation for directly encoded or directly displayed native-script
URLs. Larry has published a draft on native-script UTF-8 URLs/URIs,
and I'm working on expanding this draft and filling in more details.
I'm quite confident that we are moving in the right direction
(but we could of course move faster), and that in due time, whether
or not URLs/URNs are formally defined as ASCII-only will not
be of interest to users, because the user interface will make the
necessary conversions.
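
As a rough sketch of such a conversion (not taken from any of the
drafts), the following Python assumes that escaped octets above 0x7F
are UTF-8, and leaves ASCII escapes such as %2F alone so that reserved
characters keep their meaning. The function names and the
"urn:example:" namespace are hypothetical.

```python
import re

# One or more consecutive %XX escapes whose value is 0x80 or above.
_NON_ASCII_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def to_display(uri):
    """Turn hex-escaped UTF-8 into native characters for display."""
    def decode(match):
        octets = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return octets.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: show the escapes unchanged.
            return match.group(0)
    return _NON_ASCII_RUN.sub(decode, uri)


def to_wire(text):
    """Turn native characters back into hex-escaped UTF-8."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


wire = "urn:example:%CE%95%2Fx"
shown = to_display(wire)   # Greek capital epsilon, %2F kept as is
print(shown)
print(to_wire(shown) == wire)
```

With something like this in the user interface, the user types and
sees native script, while the protocol still carries ASCII only.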


Regards,	Martin.