Re: revised "generic syntax" internet draft

"Martin J. Duerst" <> Fri, 25 April 1997 18:02 UTC

Date: Fri, 25 Apr 1997 19:20:07 +0200 (MET DST)
From: "Martin J. Duerst" <>
To: "Karen R. Sollins" <>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <>
Message-Id: <Pine.SUN.3.96.970425190044.245w-100000@enoshima>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk

On Thu, 24 Apr 1997, Karen R. Sollins wrote:

> I have tried VERY hard to stay out of this discussion, but I now have
> to ask a question as suggested by the extraction above.  Must one
> conclude from a position of supporting encoding of character sets in
> UTF-8 that the server at the site of the resource MUST be of a certain
> flavor supporting that character set, and furthermore that perhaps the
> general practice will be that each server will only support one or a
> small number?  With no general solution implemented globally, those
> with less popular character sets (this often goes hand in hand with
> less technology and less economic strength) are much more likely to be
> left out in the cold.  So much for general internationalization,
> unless this means only internationalization for the larger, richer
> communities.


Your concerns are very understandable, but I don't think they are
warranted. Most current servers don't support any character sets
at all; they just handle octets transparently. The easiest way to
set up UTF-8 URLs is to use an editor or terminal emulator that
understands UTF-8. Knowledge of character sets and transcoding
is only necessary if you want your server to accept URLs in
two (or more) different encodings, for example in UTF-8 and
some legacy encoding for backwards compatibility.
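
To make the two-encoding case concrete, here is a minimal sketch in
Python of how a server might accept a URL path in either UTF-8 or one
legacy encoding. The function name decode_url_path and the choice of
ISO-8859-5 as the legacy encoding are assumptions made for
illustration, not something any particular server does. It relies on
the fact that most non-ASCII legacy byte sequences are not valid UTF-8,
so this is a heuristic: a legacy sequence that happens to be valid
UTF-8 would be misread.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumption for illustration: the site's old URLs used ISO-8859-5.
LEGACY_ENCODING = "iso-8859-5"


def decode_url_path(path, legacy=LEGACY_ENCODING):
    """Return (text, encoding_used) for a percent-encoded URL path.

    The octets are first tried as UTF-8; only if that fails are they
    interpreted in the legacy encoding.
    """
    raw = unquote_to_bytes(path)  # undo %XX escapes, giving raw octets
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(legacy), legacy


if __name__ == "__main__":
    # The same name, escaped once as UTF-8 and once as the legacy encoding.
    for enc in ("utf-8", LEGACY_ENCODING):
        url = quote("/файл", encoding=enc)
        print(url, "->", decode_url_path(url))
```

A server that only passes octets through needs none of this; the
fallback matters only when old links in the legacy encoding have to
keep working.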

It is also worth noting that full support of Unicode
does need memory for fonts and many other things.
But in any given setting, the main interest is in the local
script and maybe the Latin script, so the resources needed can
be reduced dramatically. People in less wealthy places also
don't mind having to use only a script or two if that does the job.

Also, for many scripts Unicode is the main source of
reference, and stands out clearly above a multitude of
ad-hoc character encodings and font layouts. In some cases,
small communities with large neighbors also need to
represent more than just their native script. As an example,
take Georgian. They have about 20 encodings currently in use,
and they would like to use Georgian, Cyrillic, and Latin in
the same text, which can only be done in 8 bits with great
sacrifices. If they use some native editor, writing a program
to convert to and from UTF-8 takes a day or less.
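
As a rough illustration of how small such a converter can be, here is
a sketch in Python. The 8-bit layout is entirely hypothetical: it
assumes the 33 modern Georgian letters sit at bytes 0xC0 to 0xE0, which
does not correspond to any particular Georgian encoding in use. A real
converter would substitute the actual table of the editor or font in
question; the function names are likewise made up for this example.

```python
# Hypothetical 8-bit layout (an assumption, not a real encoding):
# bytes 0xC0..0xE0 hold the Georgian letters U+10D0..U+10F0.
LEGACY_TO_UNICODE = {0xC0 + i: chr(0x10D0 + i) for i in range(33)}
UNICODE_TO_LEGACY = {ch: b for b, ch in LEGACY_TO_UNICODE.items()}


def legacy_to_utf8(data: bytes) -> bytes:
    """Convert text in the hypothetical 8-bit layout to UTF-8."""
    chars = []
    for b in data:
        if b < 0x80:
            chars.append(chr(b))  # ASCII passes through unchanged
        elif b in LEGACY_TO_UNICODE:
            chars.append(LEGACY_TO_UNICODE[b])
        else:
            raise ValueError(f"byte 0x{b:02X} is not in the table")
    return "".join(chars).encode("utf-8")


def utf8_to_legacy(data: bytes) -> bytes:
    """Convert UTF-8 text back to the hypothetical 8-bit layout."""
    out = bytearray()
    for ch in data.decode("utf-8"):
        if ord(ch) < 0x80:
            out.append(ord(ch))
        elif ch in UNICODE_TO_LEGACY:
            out.append(UNICODE_TO_LEGACY[ch])
        else:
            raise ValueError(f"U+{ord(ch):04X} has no 8-bit equivalent")
    return bytes(out)


if __name__ == "__main__":
    legacy = b"\xc0\xc1\xc2 abc"
    utf8 = legacy_to_utf8(legacy)
    print(utf8.decode("utf-8"))
    print(utf8_to_legacy(utf8) == legacy)
```

Nearly all the effort goes into getting the table right; the
conversion logic itself stays this short whichever encoding is used.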

Another important aspect is the communication with the
community abroad. A Georgian in the US might not want to
set up some old OS with (for him) archaic tools to communicate
in his mother tongue. His best chance of getting
Georgian support in his usual software is through Unicode.

Regards,	Martin.