Re: URL internationalization!
"Martin J. Duerst" <mduerst@ifi.unizh.ch> Mon, 24 February 1997 16:32 UTC
Date: Mon, 24 Feb 1997 17:09:06 +0100
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Alain LaBont/e'/ <alb@sct.gouv.qc.ca>
Cc: Francois Yergeau <yergeau@alis.com>, "Roy T. Fielding" <fielding@kiwi.ics.uci.edu>, URI mailing list <uri@bunyip.com>
Subject: Re: URL internationalization!
In-Reply-To: <9702211454.AA12501@socrate.riq.qc.ca>
Message-Id: <Pine.SUN.3.95q.970224164714.245O-100000@enoshima>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-uri@bunyip.com
Precedence: bulk
On Fri, 21 Feb 1997, Alain LaBont/e'/ wrote:

> @ 23:11 97-02-20 -0500, Francois Yergeau écrit :
>
> [given 8 bit per byte encoding]
>
> >Right. In fact, not only the system MUST NOT crash, but it SHOULD behave
> >the same as if it had received the corresponding %XX.
>
> Génial!

Sorry, but it's not exactly as genial as it looks. As an example,
let's take a resource name with a G with breve (U+011E). Let's
assume that on the server, resource names are encoded in iso-8859-3.
Then the G with breve appears as %AB in a well-formed URL.

Now suppose somebody puts that URL into an HTML document that is
encoded in iso-8859-3, in 8-bit form (i.e. the URL contains the
octet 0xAB for the G with breve character), and that the document
is correctly tagged as iso-8859-3.

Now assume a browser sends a request with

	Accept-Charset: iso-8859-9

The server (or a proxy) translates the whole document from iso-8859-3
to iso-8859-9 to honor the request of the browser. The G with breve
gets changed to 0xD0. The client receives the 0xD0. If it "behaves
the same as if it had received the corresponding %XX", i.e. %D0,
the URL will not work at all.

This is difficult to fix in the short term, but in the long term,
once the convention that URLs use UTF-8 becomes popular, the client
shouldn't "behave the same", but should take the character (namely
the G with breve), encode it as UTF-8 and then with %HH, and then
send it to the server.

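
The transcoding scenario above can be sketched in Python. This is only an illustration under stated assumptions, not a proposal for how a client must be written: the helper names escape_octets and escape_as_utf8 are hypothetical, and the charset requested by the browser is assumed to be iso-8859-9 (Latin-5), the ISO 8859 part in which G with breve is the octet 0xD0.

```python
from urllib.parse import quote

# Charsets assumed for this sketch, following the scenario in the text.
SERVER_CHARSET = "iso-8859-3"    # how the server stores resource names
DOCUMENT_CHARSET = "iso-8859-9"  # what the client receives after transcoding


def escape_octets(raw: bytes) -> str:
    """Hypothetical helper: take each octet directly and write it as %HH."""
    return quote(raw, safe="")


def escape_as_utf8(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode to characters, encode as UTF-8, then %HH."""
    return quote(raw.decode(charset), safe="", encoding="utf-8")


g_breve = "\u011e"                          # G with breve
on_server = g_breve.encode(SERVER_CHARSET)  # the single octet 0xAB
# A server or proxy transcodes the document, and the URL inside it, as text.
received = on_server.decode(SERVER_CHARSET).encode(DOCUMENT_CHARSET)

well_formed = escape_octets(on_server)      # what the server expects
naive = escape_octets(received)             # octet escaped after transcoding
via_utf8 = escape_as_utf8(received, DOCUMENT_CHARSET)

print(well_formed, naive, via_utf8)
```

Escaping the received octet directly no longer matches the name the server knows, while the UTF-8 form depends only on the character and so comes out the same whichever charset the document arrived in. It still resolves only if the server itself uses UTF-8 for its resource names.
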
If we make recommendations as to what to do with an 8-bit encoded URL,
we should definitely mention both possibilities, namely:

- Interpret as octet directly and convert it to %HH
- Interpret as character and convert to UTF-8 and then to %HH

With this, we cover two cases:

- The URL wasn't transcoded (not guaranteed, but quite frequent)
- The server uses UTF-8 to encode characters (will become more and
  more frequent)

The third case, namely that the URL gets transcoded, but the server
doesn't support UTF-8, would be very difficult to cover, and is
unrelated to the proposal of introducing UTF-8 as a recommended
character encoding for URLs.

Regards,	Martin.