Re: Using UTF-8 for non-ASCII Characters in URLs
Larry Masinter <masinter@parc.xerox.com> Wed, 30 April 1997 21:51 UTC
Message-ID: <3367BA32.6588@parc.xerox.com>
Date: Wed, 30 Apr 1997 14:31:30 -0700
From: Larry Masinter <masinter@parc.xerox.com>
Organization: Xerox PARC
X-Mailer: Mozilla 3.01Gold (Win95; I)
MIME-Version: 1.0
To: Francois Yergeau <yergeau@alis.com>
CC: uri@bunyip.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
References: <199704300652.IAA09984@valinor.malmo.trab.se> <3.0.1.32.19970430110018.00e3aee0@genstar.alis.ca>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-uri@bunyip.com
Precedence: bulk
Francois, I suggested:

><A HREF="this-is-the-URL">this-is-what-the-user-sees</A>
>
>The URL in the 'this-is-the-URL' part should use hex-encoded-UTF8,
>no matter what the user sees.

and you responded:

"That would break with current practice. Please see
<http://www.alis.com/~yergeau/url-00.html>, section 4 for a discussion
of this issue."

However, I'm not aware of any current practice that does what
section 4 suggests, namely:

"This shows the path to be followed with non-ASCII URLs embedded in a
text file: simply encode the characters of the URL in the same way as
the other characters of the document, i.e. using the CCS of the
document. If a character in the URL is not part of the repertoire of
this CCS, use URL-encoding of the UTF-8 representation to preserve
that character's identity."

You would require a different transcoding mechanism for the URL and
for the rest of the document. Normally, transcoding a Unicode document
in HTML into ISO-8859-1 requires converting characters outside of
0-255 into numeric character references; however, you are suggesting
turning URLs into hex-encoded UTF-8 instead. Right?

Could you clarify what current practice would "break"?
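
To make the difference between the three treatments concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the function names and the example.com URL are hypothetical, ISO-8859-1 is assumed as the document's CCS, and the section 4 function is one reading of the quoted rule, not text from the draft.

```python
from urllib.parse import quote

# ASCII characters left untouched by the percent-encoder (URL delimiters
# and existing escapes). This set is an assumption made for the sketch.
ASCII_KEPT = "/:?#[]@!$&'()*+,;=%"

def hex_encoded_utf8(url_text: str) -> str:
    """The suggestion above: every non-ASCII character in the URL becomes
    the %XX-encoded bytes of its UTF-8 form, whatever the document charset."""
    return quote(url_text, safe=ASCII_KEPT, encoding="utf-8")

def section4_url(url_text: str, charset: str = "iso-8859-1") -> str:
    """One reading of the quoted section 4 rule: keep characters the
    document's CCS can represent, hex-encode UTF-8 only for the rest."""
    out = []
    for ch in url_text:
        try:
            ch.encode(charset)          # representable in the document CCS
            out.append(ch)
        except UnicodeEncodeError:      # outside the CCS repertoire
            out.append(quote(ch, safe="", encoding="utf-8"))
    return "".join(out)

def transcode_text(text: str, charset: str = "iso-8859-1") -> str:
    """Ordinary transcoding of document text: characters outside the
    target charset become numeric character references."""
    return text.encode(charset, errors="xmlcharrefreplace").decode(charset)

url = "http://example.com/caf\u00e9/\u2603"   # e-acute, then U+2603 SNOWMAN
print(hex_encoded_utf8(url))   # http://example.com/caf%C3%A9/%E2%98%83
print(section4_url(url))       # e-acute kept as-is, snowman as %E2%98%83
print(transcode_text(url))     # e-acute kept as-is, snowman as &#9731;
```

The last two lines differ only in how U+2603 is handled, which is the point at issue: under the section 4 reading, a transcoder has to know which parts of the document are URLs and switch mechanisms for them.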