UTF-8 and URLs
Larry Masinter <masinter@parc.xerox.com> Thu, 24 April 1997 17:43 UTC
Message-Id: <335F90D8.6EDB@parc.xerox.com>
Date: Thu, 24 Apr 1997 09:56:56 -0700
From: Larry Masinter <masinter@parc.xerox.com>
Organization: Xerox PARC
X-Mailer: Mozilla 3.01Gold (Win95; I)
Mime-Version: 1.0
To: John C Klensin <klensin@mci.net>
Cc: uri@bunyip.com
Subject: UTF-8 and URLs
References: <SIMEON.9704240851.W@tp7.Jck.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-uri@bunyip.com
Precedence: bulk
John,

Your clarification didn't help me. And the sticking point for me is that "as a sequence of glyphs" is an important part of the transport of URLs, whether those glyphs are on paper or on the screen, and that the octet->glyph and glyph->octet route is really error-prone.

I think to actually solve the problem of internationalization of URLs we need three recommendations:

a) If you're writing software that displays URLs to users, then

   1) Any 'forbidden' octets should be displayed as if they were UTF-8 encoded characters. That is, those octets are currently disallowed in URLs, but if you see them, display them in a standard way.

   2) Any sequences of %HH-encoded octets should be displayed EITHER as <%><H><H>, i.e., just show the encoding in ASCII, OR by assuming that they're hex-encoded UTF-8. The latter assumption is likely to be wrong for now, but might change later.

b) If you're writing software that lets users type in URLs, then if the user types in any character that isn't legal in a URL, encode the character as hex-encoded UTF-8. For Japanese, avoid using double-wide characters. For RTL scripts such as Hebrew or Arabic, leave out any direction changes and encode the characters in logical, not presentation, order. Since there haven't been any standards for non-ASCII character representations, this is as good a choice as any.

c) If you're writing software that generates URLs to be interpreted later, then use hex-encoded UTF-8 for the encoding to generate, and accept either the raw UTF-8 or the hex-encoded version as identifying the same resource. This is a recommendation for HTTP servers and FTP servers and a variety of other implementations.

These three recommendations affect software from a large number of different producers. To make progress in the community, those software implementors will need to agree that this is the best solution to interoperability of URLs internationally.

I think that, given their likely controversial nature, we should clearly make these recommendations in a separate RFC, and perhaps with a new working group. I'm willing to put this all down in a separate internet draft, if it will help focus the process on actually making progress. Some of the examples that have been sent out to the mailing list will be useful to guide the recommendations in the RFC.

Regards,

Larry
--
http://www.parc.xerox.com/masinter
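
As a rough illustration of recommendations (a)(2), (b), and (c) in the message above, here is a minimal Python sketch built on the standard urllib.parse module. The function names, the SAFE character set, and the example URLs are assumptions made for this sketch and do not come from the message. The comparison in same_resource is deliberately simplified: it decodes every escape, so it also treats %2F and "/" as equal, which a real server might not want.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# ASCII characters passed through unchanged. This set is an assumption for
# the sketch (URL delimiters plus "%"), not a list taken from the message.
SAFE = "/:?#[]@!$&'()*+,;=%"

# A run of %HH escapes whose octets are all >= 0x80, i.e. non-ASCII.
NON_ASCII_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def encode_typed_url(text: str) -> str:
    """Recommendation (b): hex-encode characters outside SAFE as UTF-8."""
    return quote(text, safe=SAFE, encoding="utf-8")


def display_url(url: str) -> str:
    """Recommendation (a)(2): show non-ASCII %HH runs as text when they
    form valid UTF-8; otherwise leave the escapes exactly as written."""
    def show(match):
        run = match.group(0)
        try:
            return unquote_to_bytes(run).decode("utf-8")
        except UnicodeDecodeError:
            # The UTF-8 assumption was wrong: fall back to <%><H><H>.
            return run
    return NON_ASCII_ESCAPES.sub(show, url)


def same_resource(a: str, b: str) -> bool:
    """Recommendation (c): raw UTF-8 and its hex-encoded form compare equal.
    Simplification: all escapes are decoded before comparing."""
    return unquote_to_bytes(a) == unquote_to_bytes(b)


if __name__ == "__main__":
    typed = "http://example.com/r\u00e9sum\u00e9"
    encoded = encode_typed_url(typed)
    print(encoded)
    print(display_url(encoded))
    print(same_resource(typed, encoded))
```

Note that display_url leaves ASCII escapes such as %2F untouched, since showing them as literal characters could change how the URL is read.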