Re: Using UTF-8 for non-ASCII Characters in URLs
"Martin J. Duerst" <mduerst@ifi.unizh.ch> Thu, 01 May 1997 13:23 UTC
Date: Thu, 01 May 1997 14:50:11 +0200
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Larry Masinter <masinter@parc.xerox.com>
cc: Francois Yergeau <yergeau@alis.com>, uri@bunyip.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
In-Reply-To: <3367BA32.6588@parc.xerox.com>
Message-ID: <Pine.SUN.3.96.970501143843.245M-100000@enoshima>

On Wed, 30 Apr 1997, Larry Masinter wrote:

From Francois' web page:

> "This shows the path to be followed with non-ASCII URLs embedded in a
> text file: simply encode the characters of the URL in the same way as
> the other characters of the document, i.e. using the CCS of the
> document. If a character in the URL is not part of the repertoire of
> this CCS, use URL-encoding of the UTF-8 representation to preserve that
> character's identity."

Larry's comment:

> You would require a different transcoding mechanism for the URL and for
> the rest of the document. Normally, transcoding a Unicode document in
> HTML into ISO-8859-1 requires converting characters outside of 0-255
> into numeric character references; however, you are suggesting turning
> URLs into hex-encoded UTF-8 instead. Right?

Not exactly. Francois' wording above ("is not part of the repertoire
of this CCS") should probably be a little different, saying something
like "cannot be represented in the document". HTML documents conforming
to RFC2070/Cougar/... can represent the whole repertoire of
ISO 10646/Unicode, so there is no need to translate to %HH.

For automatic transcoding of HTML documents, using &#nnn; is definitely
possible, and easier, because it does not require parsing the document.
On the other hand, a more sophisticated tool definitely could, and
probably should, use %HH: the fact that the characters don't fit into
the underlying CCS is a strong indication that the target readers may
not be able to use the original form in further transcription.

Regards, Martin.
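
To make the two options concrete, here is a minimal Python sketch of both transcoding strategies for a target charset of ISO-8859-1. The function names `to_ncr` and `url_to_percent_utf8` and the sample string are hypothetical, chosen only for illustration; this is one possible reading of the discussion above, not an implementation taken from any of the drafts.

```python
from urllib.parse import quote


def to_ncr(text, charset="iso-8859-1"):
    """Simple transcoder: replace every character that the target
    charset cannot represent with a numeric character reference
    (&#nnn;). No parsing of the document is needed."""
    return text.encode(charset, errors="xmlcharrefreplace").decode(charset)


def url_to_percent_utf8(url_part, charset="iso-8859-1"):
    """More sophisticated option for text known to be a URL: keep
    characters the target charset can represent, and replace the
    others with %HH escapes of their UTF-8 bytes."""
    out = []
    for ch in url_part:
        try:
            ch.encode(charset)          # representable: keep as-is
            out.append(ch)
        except UnicodeEncodeError:      # not representable: %HH of UTF-8
            out.append(quote(ch, safe=""))
    return "".join(out)


sample = "r\u00e9sum\u00e9/\u65e5\u672c"   # hypothetical path segment
print(to_ncr(sample))                 # résumé/&#26085;&#26412;
print(url_to_percent_utf8(sample))    # résumé/%E6%97%A5%E6%9C%AC
```

The first function can be applied blindly to a whole document; the second has to know which parts of the document are URLs, which is why it requires parsing.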