Re: Using UTF-8 for non-ASCII Characters in URLs
Larry Masinter <masinter@parc.xerox.com> Wed, 30 April 1997 08:23 UTC
Message-ID: <3366FC1B.EA8@parc.xerox.com>
Date: Wed, 30 Apr 1997 01:00:27 -0700
From: Larry Masinter <masinter@parc.xerox.com>
Organization: Xerox PARC
X-Mailer: Mozilla 3.01Gold (Win95; I)
MIME-Version: 1.0
To: Dan Oscarsson <Dan.Oscarsson@trab.se>
CC: uri@bunyip.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
References: <199704300652.IAA09984@valinor.malmo.trab.se>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-uri@bunyip.com
Precedence: bulk
Dan,

> This is not right. A directory listing service generates a html document
> that is sent back to the web browser. All URLs within a html document
> should use the same character set as the document uses. That is,
> if the document uses iso 8859-1, the URLs will be in iso 8859-1, and
> if the document is in UTF-8, the URLs will be in UTF-8.

Dan, for each item in a directory listing, there are two entries.

  <A HREF="this-is-the-URL">this-is-what-the-user-sees</A>

The URL in the 'this-is-the-URL' part should use hex-encoded UTF-8,
no matter what the user sees. I'll try to make clear that the
recommendation for how URLs should be processed really applies only
to the URLs and not to anything else that isn't a URL.

> If the browser knows how to handle the character set of the html document,
> it also should know how to translate the embedded URLs into UTF-8 when
> the user follows a link.

I think you've missed the whole point. A browser that knows only
ISO-8859-1 and KOI-8 can still process directory listings
from servers whose file names are in Japanese.

> In general, URLs used without a context that defines the characters used,
> should be encoded using UTF-8. URLs used within a context where the
> meaning of the characters is defined should use the character encoding
> of the context.

I suppose you're entitled to the opinion that that's how they
"should" be encoded, but this is a different recommendation from
those being promoted by others on this mailing list. If you want
to make a counter-proposal, you're free to do so, but I don't think
you have described anything that is actually workable.

Larry
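
The two-part directory entry described in the message above can be sketched as follows. This is only an illustration, not code from the thread: the function name `listing_entry` is hypothetical, and Python's standard `urllib.parse.quote` is assumed as the way to produce hex-encoded UTF-8. The HREF part is always pure ASCII; the visible part carries the file name as text, and how that text is encoded in the HTML document is a separate matter.

```python
from html import escape
from urllib.parse import quote, unquote


def listing_entry(file_name: str) -> str:
    """Build one directory-listing entry (hypothetical helper).

    The HREF is the file name encoded as UTF-8 with every byte outside
    the unreserved ASCII set written as %XX. The link text is the
    file name itself, escaped for HTML.
    """
    # safe="" so that reserved characters such as "/" are escaped too.
    href = quote(file_name, safe="", encoding="utf-8")
    return f'<A HREF="{href}">{escape(file_name)}</A>'


if __name__ == "__main__":
    name = "日本語.txt"
    print(listing_entry(name))
    # The escaped form uses only ASCII, so a client needs no knowledge
    # of Japanese character sets to follow the link.
    href = quote(name, safe="", encoding="utf-8")
    print(href.isascii(), unquote(href, encoding="utf-8") == name)
```

Because the HREF contains only ASCII characters, it survives unchanged whether the surrounding document is sent as ISO-8859-1, KOI-8, or UTF-8, and the server can recover the original name by decoding the %XX bytes as UTF-8.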