Re: Using UTF-8 for non-ASCII Characters in URLs

Larry Masinter <masinter@parc.xerox.com> Wed, 30 April 1997 04:24 UTC

Received: from cnri by ietf.org id aa03170; 30 Apr 97 0:24 EDT
Received: from services.Bunyip.Com by CNRI.Reston.VA.US id aa01589; 30 Apr 97 0:24 EDT
Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id AAA07155 for uri-out; Wed, 30 Apr 1997 00:10:45 -0400 (EDT)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with ESMTP id AAA07149 for <uri@services.bunyip.com>; Wed, 30 Apr 1997 00:10:42 -0400 (EDT)
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by mocha.bunyip.com (8.8.5/8.8.5) with SMTP id AAA28442 for <uri@bunyip.com>; Wed, 30 Apr 1997 00:10:28 -0400 (EDT)
Received: from casablanca.parc.xerox.com ([13.2.16.111]) by alpha.xerox.com with SMTP id <17028(2)>; Tue, 29 Apr 1997 21:09:54 PDT
Received: from bronze-208.parc.xerox.com ([13.0.209.122]) by casablanca.parc.xerox.com with SMTP id <71888>; Tue, 29 Apr 1997 21:09:48 PDT
Message-ID: <3366C606.786A@parc.xerox.com>
Date: Tue, 29 Apr 1997 21:09:42 -0700
From: Larry Masinter <masinter@parc.xerox.com>
Organization: Xerox PARC
X-Mailer: Mozilla 3.01Gold (Win95; I)
MIME-Version: 1.0
To: "Michael Kung <MKUNG.US.ORACLE.COM>" <MKUNG@us.oracle.com>
CC: uri@bunyip.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
References: <199704292148.OAA04694@mailsun2.us.oracle.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-uri@bunyip.com
Precedence: bulk

This isn't just a "small point", it's essential:

The only way to guarantee a "round trip" is to stick to the smallest
repertoire of characters. Clearly you shouldn't enter "http" in
wide characters, and if you have wide characters that need
to be distinguished from ASCII characters, you should always encode
them as hex-encoded UTF-8.
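
As an illustration of the hex-encoded UTF-8 round trip described above, here is a minimal sketch in Python. It is not from the original message: the helper names, the sample strings, and the example.com host are assumptions chosen only for illustration, and it relies on the standard library's urllib.parse.

```python
from urllib.parse import quote, unquote


def encode_segment(segment: str) -> str:
    """Hypothetical helper: hex-encode (%HH) the UTF-8 bytes of a segment.

    quote() converts the text to UTF-8 and escapes every byte outside
    the unreserved ASCII set; safe="" means "/" is escaped as well.
    """
    return quote(segment, safe="", encoding="utf-8")


def decode_segment(escaped: str) -> str:
    """Hypothetical helper: reverse encode_segment(), rejecting bad UTF-8."""
    return unquote(escaped, encoding="utf-8", errors="strict")


# A non-ASCII segment becomes pure ASCII on the wire.
segment = "résumé"
escaped = encode_segment(segment)      # 'r%C3%A9sum%C3%A9'
url = "http://example.com/" + escaped  # scheme and host stay plain ASCII

# Decoding recovers the original characters, so the round trip holds.
restored = decode_segment(escaped)

# A "wide" (fullwidth) letter is a different character from ASCII "h",
# so its encoded form keeps the two distinguishable.
wide_h = "\uff48"                      # FULLWIDTH LATIN SMALL LETTER H
escaped_wide_h = encode_segment(wide_h)  # '%EF%BD%88'
```

Because the escaped form uses only ASCII, it survives any transport or display that handles ASCII, which is the "smallest repertoire" point made above.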