Re: revised "generic syntax" internet draft

"Martin J. Duerst" <mduerst@ifi.unizh.ch> Mon, 21 April 1997 11:08 UTC

Date: Mon, 21 Apr 1997 12:31:48 +0200
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: Larry Masinter <masinter@parc.xerox.com>
Cc: Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@east.sun.com>, uri@bunyip.com, fielding@kiwi.ics.uci.edu, Harald.T.Alvestrand@uninett.no
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <3354813C.7078@parc.xerox.com>
Message-Id: <Pine.SUN.3.96.970421120702.245F-100000@enoshima>

On Wed, 16 Apr 1997, Larry Masinter wrote:


> Gary,
> 
> Thanks for going through my questions and giving general answers,
> but they were still pretty generic and hand-waving. In lieu of
> an actual implementation, could you please go through a couple
> of real examples, e.g., for any one of Chinese, Japanese,
> Greek, Hebrew.

[I'll try to work through an example for Japanese, but first I'll
answer the question at the end of Larry's mail.]

> How is this supposed to work, and how does hex-encoded UTF-8 encoded
> actually help make it work?

The question of how *hex-encoded* UTF-8 actually helps is a
good one. It is very clear that an advertisement with a URL
full of %HH escapes won't be any better than the same URL with
a few English letters in it, even if we know that the %HH are
the UTF-8 of characters that make a lot of sense.
*Hex-encoded* UTF-8, however, is an important step toward
really using beyond-ASCII letters in URLs. Without a defined
conversion between characters and octets, using beyond-ASCII
letters will never work. Once UTF-8 is nailed down as that
conversion, it will work rather smoothly.
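The two steps above (characters to octets via UTF-8, then octets to %HH) can be sketched in Python with the standard urllib.parse module. This is a minimal illustration, not part of any draft: the path segment 食品 ("food products") is a hypothetical example, and Shift_JIS appears only as one legacy Japanese encoding, to show that without an agreed mapping the same characters turn into different, mutually unreadable octets.

```python
from urllib.parse import quote, unquote


def to_uri_chars(text: str) -> str:
    """Characters -> UTF-8 octets -> %HH escapes.

    quote() leaves ASCII letters, digits, '_.-~' and '/' alone and
    escapes every other octet as %HH.
    """
    octets = text.encode("utf-8")    # the character<->octet step that UTF-8 pins down
    return quote(octets, safe="/")   # the octet<->%HH step that URLs already define


def from_uri_chars(escaped: str) -> str:
    """%HH escapes -> octets -> characters, strictly assuming the octets are UTF-8."""
    return unquote(escaped, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    segment = "食品"  # hypothetical path segment, not taken from the mail
    escaped = to_uri_chars(segment)
    print(escaped)  # %E9%A3%9F%E5%93%81
    assert from_uri_chars(escaped) == segment

    # Without an agreed character<->octet mapping, the same characters give
    # other octets, e.g. in Shift_JIS, and cannot be read back as UTF-8.
    legacy = quote(segment, encoding="shift_jis")
    assert legacy != escaped
    try:
        from_uri_chars(legacy)
    except UnicodeDecodeError:
        print("Shift_JIS escapes cannot be decoded as UTF-8")
```

Plain ASCII input such as "foodshop" passes through unchanged, so a UTF-8 rule costs nothing for existing URLs.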


> What about Sanyoo depaato? What do they print as the URL
> for their food shop? How would someone enter that into a browser?

Well, they print something like http://WEB.SANYO.CO.JP/FOODSHOP,
where the upper-case parts stand for Japanese characters. Of course,
for this we have to assume that DNS works with characters beyond
ASCII, but that's a separate problem that can be solved (see
draft-duerst-dns-i18n-00.txt). The user enters this into the
browser exactly as printed. We assume that the users targeted by
the Sanyoo depaato food shop page can read Japanese and have
equipment that allows them to input Japanese. I won't go into the
details of entering the corresponding characters; it's a process
Japanese computer users are very familiar with.
The browser would then convert the Japanese characters into UTF-8,
apply %HH encoding, and pass the URL to the resolver machinery,
where the host part would be resolved with DNS. The machine at the
resulting IP address would then be contacted with HTTP; that machine
would of course have been set up so that the correct page is
returned.
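The browser step just described might look roughly like this in Python, again using only urllib.parse. It is a sketch under stated assumptions: the URL http://デパート.example/食品 is hypothetical, standing in for WEB.SANYO.CO.JP/FOODSHOP; the typed path and query are assumed to contain no existing %HH escapes; and the host is passed on unchanged, because resolving non-ASCII host names depends on the DNS proposal cited above, which is not modelled here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def prepare_for_resolver(typed_url: str) -> str:
    """Turn a URL typed with non-ASCII characters into UTF-8 + %HH form.

    Assumptions (illustrative only): path and query carry no %HH escapes
    yet, and the host is left as typed for a DNS that accepts non-ASCII
    names, which this sketch does not implement.
    """
    parts = urlsplit(typed_url)
    path = quote(parts.path, safe="/")       # str input: quote() encodes as UTF-8 by default
    query = quote(parts.query, safe="=&")    # keep the usual key=value&... delimiters
    fragment = quote(parts.fragment, safe="")
    # The result is what would be handed to the resolver machinery.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    typed = "http://デパート.example/食品"  # hypothetical, as typed by the user
    print(prepare_for_resolver(typed))
    # http://デパート.example/%E9%A3%9F%E5%93%81
```

Encoding each component separately keeps the URL delimiters ("/", "?", "=", "&") intact while escaping only the characters inside them.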

I hope this explanation is detailed enough. If you don't understand
some part of it, please tell us.

Regards,	Martin.