Re: revised "generic syntax" internet draft

John C Klensin <klensin@mci.net> Tue, 15 April 1997 16:33 UTC

Date: Tue, 15 Apr 1997 11:55:43 -0400
From: John C Klensin <klensin@mci.net>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <199704151350.PAA20358@valinor.malmo.trab.se>
To: Dan Oscarsson <Dan.Oscarsson@trab.se>
Cc: Harald.T.Alvestrand@uninett.no, uri@bunyip.com, fielding@kiwi.ics.uci.edu
Reply-To: John C Klensin <klensin@mci.net>
Message-Id: <SIMEON.9704151143.E@tp7.Jck.com>

On Tue, 15 Apr 1997 15:50:11 +0200 (MET DST) Dan Oscarsson 
<Dan.Oscarsson@trab.se> wrote:
>...
> Well, Swedish letters like åäö are normally called Latin, but I assume you
> mean ascii.

I can't speak for Roy, but, in my earlier note on the 
subject, I meant *Latin*.  The reality is that UTF-8 is 
"user friendly" (and will get through a lot of systems 
without either advance planning or difficulties) if the 
character set that is actually in use is ISO 8859-1, not 
just ASCII.  It isn't too bad for the other Latin 
alphabets.  But for the character collections that are 
distinctly not Latin-based, the display resulting from the 
use of UTF-8, in the absence of the sort of aggressive, 
front-end, "everyone needs to apply it" translations that 
Roy suggested, is not only not user-friendly, but closely 
approximates a secret code (worse than %-notation or the 
notorious Q-P).
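
To make the comparison concrete, here is a rough sketch (not anything 
from the draft under discussion) that takes a string and shows its 
UTF-8 bytes, what a display that assumes ISO 8859-1 would make of those 
bytes, and the same string in %-notation and quoted-printable.  The 
function name `views` and the sample strings are assumptions chosen 
only for illustration.

```python
import quopri
from urllib.parse import quote


def views(text: str) -> dict:
    """Return several on-the-wire / on-screen renderings of `text`.

    `views` is a hypothetical helper for this illustration only.
    """
    raw = text.encode("utf-8")
    return {
        # The UTF-8 octets, as space-separated hex.
        "utf8_bytes": raw.hex(" "),
        # What a system that assumes ISO 8859-1 would display for those octets.
        "shown_as_latin1": raw.decode("latin-1"),
        # URL %-notation applied to the UTF-8 octets.
        "percent": quote(text, safe=""),
        # MIME quoted-printable ("Q-P") applied to the UTF-8 octets.
        "quoted_printable": quopri.encodestring(raw).decode("ascii"),
    }


if __name__ == "__main__":
    # Sample inputs (assumed): three Swedish letters and two ideographs.
    for sample in ("\u00e5\u00e4\u00f6", "\u65e5\u672c"):
        for name, value in views(sample).items():
            # repr() keeps any control characters visible instead of
            # sending them raw to the terminal.
            print(f"{name:>17}: {value!r}")
        print()
```

Each Swedish letter becomes two octets and each ideograph three, so 
the ideographic sample is longer in every rendering; in the ISO 8859-1 
view, some of its octets fall in the 0x80-0x9F control range and have 
no printable form at all.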

If one looks ahead more than a year or so and assumes 
worldwide use of the Internet, there are more of "them" 
than there are of "us" and the marginal fraction of the 
population that considers 8859-1 (and hence UTF-8) to be 
user-friendly as compared to ASCII is, unfortunately, 
barely worth the trouble.

It would have been better had URLs been carefully and 
thoughtfully internationalized from the very beginning.  
For whatever reasons, they weren't.  A conversion now is 
going to be painful.  But, if the pain is worth it, and I 
suspect it might be, then let's look to a balanced, 
equitable, *international* solution, not using UTF-8 
encoding in the hope that no one who uses ideographic 
characters will be bothered about what happens to them.

> If we cannot find a way to send URLs containing any character in a way so
> that the characters can be understood and displayed in a user friendly
> manner, the web and URLs are not the future.

I completely agree with this.  However, I think we need to 
adopt a very broad understanding of "user friendly" as well 
as keeping in mind that, for intersystem protocol purposes, 
ASCII (or even the stable subset of ISO 646 / T.50) 
has a much more successful track record (in both the 
IETF and ISO/ITU arenas) than any of the many attempts at 
"national", "localized", "international", or "universal" 
character sets.

   john