Re: Using UTF-8 for non-ASCII Characters in URLs
Dan Oscarsson <Dan.Oscarsson@trab.se> Wed, 30 April 1997 09:20 UTC
Received: from cnri by ietf.org id aa07317; 30 Apr 97 5:20 EDT
Received: from services.Bunyip.Com by CNRI.Reston.VA.US id aa05941; 30 Apr 97 5:20 EDT
Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id EAA11053 for uri-out; Wed, 30 Apr 1997 04:46:03 -0400 (EDT)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with ESMTP id EAA11042 for <uri@services.bunyip.com>; Wed, 30 Apr 1997 04:46:00 -0400 (EDT)
Received: from malmo.trab.se (malmo.trab.se [131.115.48.10]) by mocha.bunyip.com (8.8.5/8.8.5) with ESMTP id EAA29924 for <uri@bunyip.com>; Wed, 30 Apr 1997 04:45:56 -0400 (EDT)
Received: from valinor.malmo.trab.se (valinor.malmo.trab.se [131.115.48.20]) by malmo.trab.se (8.7.5/TRAB-primary-2) with ESMTP id KAA20700; Wed, 30 Apr 1997 10:45:20 +0200 (MET DST)
Received: by valinor.malmo.trab.se (8.7.5/TRM-1-KLIENT); Wed, 30 Apr 1997 10:45:20 +0200 (MET DST) (MET)
Date: Wed, 30 Apr 1997 10:45:20 +0200
From: Dan Oscarsson <Dan.Oscarsson@trab.se>
Message-Id: <199704300845.KAA10131@valinor.malmo.trab.se>
To: masinter@parc.xerox.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
Cc: uri@bunyip.com
Mime-Version: 1.0
Content-MD5: XfECtRru3cxFKc+MfKOQxQ==
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-uri@bunyip.com
Precedence: bulk
> > This is not right. A directory listing service generates a html document
> > that is sent back to the web browser. All URLs within a html document
> > should use the same character set as the document uses. That is,
> > if the document uses iso 8859-1, the URLs will be in iso 8859-1, and
> > if the document is in UTF-8, the URLs will be in UTF-8.
>
> Dan, for each item in a directory listing, there are two entries.
>
> <A HREF="this-is-the-URL">this-is-what-the-user-sees</A>
>
> The URL in the 'this-is-the-URL' part should use hex-encoded-UTF8,
> no matter what the user sees.
>

If you use hex-encoding, yes. But NOT if you use the native character set
of the document. In that case, the 'this-is-the-URL' part must use the
same character set as the rest of the HTML document. Raw UTF-8 may only be
used in a UTF-8 encoded HTML document, not in an ISO 8859-1 encoded document.

A large number of HTML documents are handwritten in a text editor. A user
cannot be expected to use a different encoding when typing the URLs in
a document.

But I agree that if hex-encoded characters are found in a URL, they should
be UTF-8; otherwise it would be unclear what encoding is used for hex-encoded
URLs in an ASCII-only HTML document. But an ASCII-only document may not
contain any 8-bit characters in a URL, as there is no defined character set
for them.

To use native encoding in URLs in a known context, and hex-encoded UTF-8 in
other places (and, if you want, in a known context too), is what I understand
others on the list also want.

If we cannot use native encoding when typing URLs into our HTML documents,
very little is won.

   Dan
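To make the two representations discussed above concrete, here is a minimal Python sketch. It assumes a hypothetical file name, "räksmörgås.html", which does not appear in the thread, and the helper names hex_encoded_utf8 and native_bytes are illustrative only. It shows that hex-encoded UTF-8 produces an ASCII-only URL that reads the same in any document, whereas a natively typed URL becomes different bytes depending on the character set of the enclosing document.

```python
from urllib.parse import quote, unquote

# Hypothetical file name with non-ASCII characters (an assumption for
# illustration; it is not taken from the thread).
name = "räksmörgås.html"


def hex_encoded_utf8(segment: str) -> str:
    """Percent-encode a path segment as UTF-8 bytes; the result is pure ASCII."""
    return quote(segment, safe="", encoding="utf-8")


def native_bytes(segment: str, document_charset: str) -> bytes:
    """Bytes stored in the document when the URL is typed in the document's own charset."""
    return segment.encode(document_charset)


encoded = hex_encoded_utf8(name)
print(encoded)                            # ASCII only, same in every document
print(native_bytes(name, "iso-8859-1"))   # one byte per accented letter
print(native_bytes(name, "utf-8"))        # two bytes per accented letter

# Decoding the hex-encoded form as UTF-8 recovers the original name.
assert unquote(encoded, encoding="utf-8") == name
```

A reader of the native form needs to know the document's character set before the bytes can be interpreted, which is why the message restricts native encoding to a known context.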