Re: UTF-8 and URLs

"Martin J. Duerst" <mduerst@ifi.unizh.ch> Sun, 27 April 1997 16:31 UTC

Date: Sun, 27 Apr 1997 18:16:02 +0200
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Francois Yergeau <yergeau@alis.com>
Cc: uri@bunyip.com
Subject: Re: UTF-8 and URLs
In-Reply-To: <3.0.1.32.19970425102234.00d53550@genstar.alis.ca>
Message-Id: <Pine.SUN.3.96.970427175721.245P-100000@enoshima>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="ISO-8859-1"
Sender: owner-uri@bunyip.com
Precedence: bulk
Content-Transfer-Encoding: quoted-printable

On Fri, 25 Apr 1997, Francois Yergeau wrote:

> At 00:25 25-04-97 -0500, Dan Connolly wrote:
> >> Let's see: we would have an i18n RFC that would allow URLs to contain most
> >> any characters, and a (possibly Draft) standard that would say "All URLs
> >> consist of a restricted set of characters..." (we know which): clear
> >> contradiction.
> >
> >Please don't cite out of context or paraphrase wildly. The _existing_
> >RFC limits the characters in URLs. In fact, the UTF-8-in-%XX encoding
> >proposal doesn't even change that: it just adds semantics to the syntax.
> 
> I'm sorry, but I see it differently: the UTF-8-in-%XX proposal doesn't add
> octet values on-the-wire, but it adds, and correctly maps, thousands of
> characters.

It can be seen either way. For some of the issues discussed
in the syntax draft, in particular everything about relative URL
processing, it is indeed just semantics and doesn't interfere. On the
other hand, the current draft contains many explanations about the
relation between represented characters, octets, and URL characters.
Somebody studying it will greatly benefit from being told about the
limitations of the model that the current draft assumes, and about the
direction that is being taken to change the model and eliminate its
deficiencies.
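To make the relation between represented characters, octets, and URL characters concrete, here is a minimal Python sketch of the UTF-8-in-%XX idea. It assumes a simplified "safe" set (ASCII letters, digits and "-_.~"), which is not the exact set in the draft, and the function names are hypothetical. It illustrates both views quoted above: the encoded form introduces no new characters on the wire, yet every character maps back unambiguously.

```python
# Assumed "safe" set for this sketch: ASCII letters, digits and "-_.~".
# (Illustrative only; the draft's exact unreserved set differs slightly.)
SAFE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~"
)


def utf8_percent_encode(text: str) -> str:
    """Write every character outside SAFE as its UTF-8 octets in %XX form."""
    parts = []
    for ch in text:
        if ch in SAFE:
            parts.append(ch)
        else:
            # character -> UTF-8 octets -> one %XX escape per octet
            parts.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)


def utf8_percent_decode(encoded: str) -> str:
    """Collect octets (literal ASCII or %XX) and interpret them as UTF-8."""
    octets = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "%":
            # "%" followed by two hex digits denotes a single octet
            octets.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            octets += encoded[i].encode("ascii")
            i += 1
    return octets.decode("utf-8")
```

For example, utf8_percent_encode("Dürst") gives "D%C3%BCrst", which uses only safe characters and "%", and utf8_percent_decode turns it back into "Dürst".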

Also, the UTF-8-in-%XX proposal, as long as it strictly requires %XX,
is indeed just an addition of semantics. However, once it is clear how
these semantics are added, the next step, namely removing the %XX
requirement and extending the URL character set to most of the
Universal Character Set (excluding compatibility characters and the
like), is obvious. If URLs were more like MIME headers, we could say
that this is a transparent user-interface issue, but because URLs
include the form on paper, where we agree that transcribing long %XX
sequences is a great pain for those who know the actual characters,
the situation is different.
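The transcription burden, and what the "next step" could look like on the display side, can be illustrated with a hypothetical helper. This is a sketch under stated assumptions, not part of any draft: it decodes only runs of %XX escapes in the range 80-FF that form valid UTF-8, and leaves ASCII escapes such as %2F (which may stand for reserved characters) and non-UTF-8 runs (e.g. a legacy Latin-1 octet) untouched.

```python
import re

# A run of one or more %XX escapes whose octet value is 0x80-0xFF.
_HIGH_OCTET_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def to_display_form(encoded: str) -> str:
    """Hypothetical helper: show UTF-8 %XX runs as the characters they encode."""

    def decode_run(match):
        run = match.group(0)
        # Each escape is exactly three characters: "%", hex digit, hex digit.
        octets = bytes(int(run[i + 1:i + 3], 16) for i in range(0, len(run), 3))
        try:
            return octets.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8 as a whole run: keep the escapes as they are.
            return run

    return _HIGH_OCTET_RUN.sub(decode_run, encoded)
```

For example, to_display_form("/%E6%97%A5%E6%9C%AC/a%2Fb") gives "/日本/a%2Fb": the 18-character escape sequence a reader would otherwise have to copy from paper collapses to two characters, while "%2F" keeps its escaped meaning.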

I originally proposed the addition of UTF-8-in-%XX to the current
draft as an important first step towards fully international URLs,
based on experience with the URN compromise. But UTF-8-in-%XX is
only the first step, and because we already know the next steps,
we definitely should tell this to the reader of the syntax draft,
whether in the form of a fully reworked draft or (probably
preferable) in the form of a note discussing future developments.

Regards,	Martin.
