Re: URL internationalization!

Jonathan Rosenne <Jonathan_Rosenne@compuserve.com> Tue, 25 February 1997 18:55 UTC

Received: from cnri by ietf.org id aa23242; 25 Feb 97 13:55 EST
Received: from services.Bunyip.Com by CNRI.Reston.VA.US id aa17900; 25 Feb 97 13:54 EST
Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id NAA05118 for uri-out; Tue, 25 Feb 1997 13:07:27 -0500 (EST)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with SMTP id NAA05113 for <uri@services.bunyip.com>; Tue, 25 Feb 1997 13:07:24 -0500 (EST)
Received: from arl-img-4.compuserve.com by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA29534 (mail destined for uri@services.bunyip.com); Tue, 25 Feb 97 13:07:23 -0500
Received: by arl-img-4.compuserve.com (8.6.10/5.950515) id NAA07042; Tue, 25 Feb 1997 13:07:18 -0500
Date: Tue, 25 Feb 1997 13:05:54 -0500
From: Jonathan Rosenne <Jonathan_Rosenne@compuserve.com>
Subject: Re: URL internationalization!
To: URI List <uri@bunyip.com>
Message-Id: <199702251306_MC2-11B1-87E5@compuserve.com>
Sender: owner-uri@bunyip.com
Precedence: bulk

> As an example,
> let's take a resource name with a G with breve (U+011E). Let's
> assume that on the server, resource names are encoded in iso-8859-3.
> Then the G with breve appears as %AB in a well-formed
> URL. Now suppose somebody put that URL into an HTML document
> that is encoded in iso-8859-3, in 8-bit form (i.e. the URL contains
> the octet 0xAB for the G with breve character), and that that
> document is correctly tagged as iso-8859-3.
>
> Now assume a browser sends a request with
>       Accept-Charset: iso-8859-5
> The server (or a proxy) translates the whole document from
> iso-8859-3 to iso-8859-5 to honor the request of the browser.
> The G with breve gets changed to 0xD0. The client receives
> the 0xD0. If it "behaves the same as if it had received the
> corresponding %XX", i.e. %D0, the URL will not work at all.

I don't understand. What if the user uses iso-8859-8, which has no G-breve?
That is, what if the request says Accept-Charset: iso-8859-8?
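
To make the question concrete, here is a minimal Python sketch of the
transcoding step described in the quoted example. It is only an
illustration under my own assumptions, not code from any server or proxy:
the helper name octet_in is hypothetical, and it relies on Python's
built-in codec tables. Note that those tables place G with breve at 0xD0
in iso-8859-9 and do not contain it in iso-8859-5, so the sketch uses
iso-8859-9 for the 0xD0 case; that substitution is my assumption about
what the example intended.

```python
from urllib.parse import quote

G_BREVE = "\u011e"  # LATIN CAPITAL LETTER G WITH BREVE


def octet_in(charset):
    """Return the bytes for G-breve in `charset`, or None if it has none.

    Hypothetical helper for illustration only.
    """
    try:
        return G_BREVE.encode(charset)
    except UnicodeEncodeError:
        return None


# On the server, resource names are assumed to be iso-8859-3,
# so the well-formed URL carries %AB.
server_octet = octet_in("iso-8859-3")
server_escape = quote(server_octet)

# A document transcoded to iso-8859-9 carries a different octet, so a
# client that treats the raw octet like its %XX form asks for %D0.
transcoded_octet = octet_in("iso-8859-9")
client_escape = quote(transcoded_octet)

# iso-8859-8 (and, per Python's tables, iso-8859-5) has no G-breve at
# all, so the transcoding step has nothing to map the character to.
hebrew_octet = octet_in("iso-8859-8")
cyrillic_octet = octet_in("iso-8859-5")

print(server_escape, client_escape, hebrew_octet, cyrillic_octet)
```

In the first case the URL changes silently from %AB to %D0 and stops
matching the server's name; in the iso-8859-8 case there is no octet to
send at all, which is the situation I am asking about.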

Jonathan