Re: UTF-8 URL for testing

Larry Masinter <masinter@parc.xerox.com> Fri, 11 April 1997 08:45 UTC

Message-Id: <334DADDC.5CBC@parc.xerox.com>
Date: Thu, 10 Apr 1997 20:19:56 -0700
From: Larry Masinter <masinter@parc.xerox.com>
Reply-To: masinter@parc.xerox.com
Organization: PARC
X-Mailer: Mozilla 3.01Gold (Win95; I)
Mime-Version: 1.0
To: Francois Yergeau <yergeau@alis.com>
Cc: uri@bunyip.com
Subject: Re: UTF-8 URL for testing
References: <3.0.1.32.19970410171259.0075760c@genstar.alis.com>

Thank you, Francois, for providing some actual data.

Martin's ad hominem attacks are infuriating. It may
seem like it should "go without saying", but I appreciate
your civility and willingness to deal with the actual
facts.

Does Alis provide its documentation online? Can you
point us to the place where the use of %-hex encoded
UTF-8 encoded Unicode in URLs is documented?

The URLs you point us to are all in your personal
area (~yergeau) on alis.com. If this encoding is
compatible with current browsers, why aren't any of
the other URLs on the alis site internationalized?

You say:

> Click on the links
> within to see whether *your* browser handles those crazy,
> out-in-left-field, non-ASCII constructs.  I've found three that work fine,
> and two of them together represent almost the whole browser installed base.

Did you try any browsers that didn't work? Do any of the browsers
display the URLs as anything other than %xx%xx%xx in the 'location' box?

Is there any software anywhere in the world that actually generates
URLs like these? All of the examples seem to be carefully
hand-constructed.
Since these URLs are compatible with existing browsers, as you say,
there should not be any difficulty in people running their web servers
this way. Do any web servers in Japan use hex-encoded UTF-8-encoded
Unicode for URLs?

The problem with recommending this method for "Draft Standard" is
not the "six month delay" in getting to Draft Standard;
it's that we should not recommend something that people aren't
actually going to do. This is not some kind of nit-picky technical
objection; it's fundamental to the process of Internet standards.

I am eager to actually support internationalization.
Martin's remarks insinuating otherwise were insulting. However,
I think it is counter-productive to foist hex-encoded UTF-8-encoded
URLs (9 bytes to represent one 16-bit Kanji) on the rest of the
world merely because a western European and a Canadian like
the idea. Surely we can find a site in Japan, China, Israel, or
Russia that would support exporting its URLs as hex-encoded
UTF-8 before we believe that this isn't yet
another form of Unicode imperialism. Otherwise, we would just
have a pretend solution to a real problem.
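
For concreteness, here is a minimal sketch of the size arithmetic for
one such character. It assumes Python's standard urllib.parse module;
the helper name percent_encoded_utf8 and the sample Kanji U+6F22 are
illustrative choices of mine, not anything taken from the pages under
discussion.

```python
from urllib.parse import quote, unquote


def percent_encoded_utf8(text: str) -> str:
    """Encode text as UTF-8, then write every non-unreserved byte as %XX.

    Hypothetical helper for illustration; safe="" means "/" is escaped too.
    """
    return quote(text, safe="", encoding="utf-8")


# U+6F22 is an arbitrary 16-bit (BMP) Kanji picked as a sample.
kanji = "\u6f22"

utf8_len = len(kanji.encode("utf-8"))   # bytes before hex-encoding
encoded = percent_encoded_utf8(kanji)   # what would appear in the URL

print(f"UTF-8 bytes: {utf8_len}")
print(f"URL form:    {encoded} ({len(encoded)} ASCII characters)")

# Decoding the %XX sequences as UTF-8 recovers the original character.
assert unquote(encoded, encoding="utf-8") == kanji
```

A character in the 16-bit range needs up to three UTF-8 bytes, and
each byte is written as three ASCII characters (%XX), so this sample
comes to nine characters in the URL.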

Regards,

Larry
--
http://www.parc.xerox.com/masinter