Re: revised "generic syntax" internet draft
John C Klensin <klensin@mci.net> Fri, 25 April 1997 09:07 UTC
Received: from cnri by ietf.org id aa10592; 25 Apr 97 5:07 EDT
Received: from services.Bunyip.Com by CNRI.Reston.VA.US id aa05536; 25 Apr 97 5:07 EDT
Received: (from daemon@localhost) by services.bunyip.com (8.8.5/8.8.5) id EAA10776 for uri-out; Fri, 25 Apr 1997 04:41:02 -0400 (EDT)
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.8.5/8.8.5) with SMTP id EAA10771 for <uri@services.bunyip.com>; Fri, 25 Apr 1997 04:40:59 -0400 (EDT)
Received: from ns.jck.com by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA26898 (mail destined for uri@services.bunyip.com); Fri, 25 Apr 97 04:40:58 -0400
Received: from tp7.Jck.com ("port 1755"@tp7.jck.com) by a4.jck.com (PMDF V5.1-8 #21705) with SMTP id <0E96QS7660015I@a4.jck.com> for uri@bunyip.com; Fri, 25 Apr 1997 04:40:55 -0400 (EDT)
Date: Fri, 25 Apr 1997 04:40:55 -0400
From: John C Klensin <klensin@mci.net>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <v03007834af85fb0914ed@[206.245.192.47]>
To: Edward Cherlin <cherlin@newbie.net>
Cc: uri@bunyip.com
Reply-To: John C Klensin <klensin@mci.net>
Message-Id: <SIMEON.9704250455.U@tp7.Jck.com>
Mime-Version: 1.0
X-Mailer: Simeon for Win32 Version 4.1.1 Build (14)
Content-Type: TEXT/PLAIN; CHARSET="US-ASCII"
Priority: NORMAL
X-Authentication: none
Sender: owner-uri@bunyip.com
Precedence: bulk
On Thu, 24 Apr 1997 23:25:10 -0700 Edward Cherlin
<cherlin@newbie.net> wrote:

>...
> Those with less popular character sets are out in the cold today. Unicode
> will bring them in from the cold, since it is a general solution that has
> fairly wide implementation (in Windows NT, Macintosh OS, several flavors of
> UNIX, Java, and so on, and in applications such as Microsoft Office 97 and
> Alis Tango Web browser).
>...
> There is no hope of getting every legacy character encoding incorporated
> into modern software by any means other than Unicode.

Edward,

This is not true, and these discussions seem to be difficult
enough without engaging in hyperbole. In particular:

(i) However widely Unicode is implemented, the actual use
patterns are, especially outside of areas that use Latin-based
alphabets, much larger for systems based on character set (or
code page) switching (mostly, but not entirely, utilizing ISO
2022 designators) than they are for Unicode.

(ii) In many cases (including with some applications that end
up on the system you have mentioned), Unicode (or something
like it) is used as an internal representation, but what goes
over the wire is a character set switching (or shifting)
system. There is some small (but not zero) risk that Unicode,
like X.400, will end up being more of a common conversion and
representation format than a format that end-user applications
actually use natively.

(iii) Even if "Unicode" is the right solution, it doesn't
automatically follow that the best representation is UTF-8. A
case can be made that, if one is going to have to resort to hex
encoding anyway, simply hex-encoding the UCS-2 string will give
better behavior more of the time (when "more" is considered by
weighting the use of different ranges of the Unicode coding set
by the number of people in the world who use those characters).
(iv) It is not hard to demonstrate that, in the medium to long
term, there are some requirements for character set encoding
for which Unicode will not suffice and it will be necessary to
go to multi-plane 10646 (which is one of several reasons why
IETF recommendation documents have fairly consistently pointed
to 10646 and not Unicode). The two are not the same. In
particular, while the comment in (iii) can easily and correctly
be rewritten as a UCS-4 statement, UTF-8 becomes, IMO,
pathological (and its own excuse for compression) when one
starts dealing with plane 3 or 4 much less, should we be
unlucky enough to get there, plane 200 or so.

    john

p.s. I haven't changed my mind -- I still don't like 2022 as a
"character set" or as a data representation, largely because I
don't like stateful character encodings. But I think we need to
make decisions based on reality rather than wishful thinking,
evangelism, or pure optimism.
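
To make the size argument in (iii) and (iv) concrete, here is a rough sketch that counts how many URL characters one character costs under each scheme. It rests on several assumptions that are not in the message above: UTF-8 byte counts follow the original scheme of up to six bytes (as in RFC 2044), every byte of a multi-byte sequence is written as a three-character %XX escape, ASCII characters pass through unescaped, and the "hex-encoded UCS-2/UCS-4" forms cost just their 4 or 8 hex digits with no prefix or delimiter. The function names are hypothetical.

```python
# Sketch only: per-character URL cost under the assumptions stated above.

def utf8_len(cp: int) -> int:
    """Bytes needed for code point cp under the original 1-6 byte UTF-8."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    # (exclusive upper limit, byte count) for each UTF-8 length class
    for limit, nbytes in ((0x80, 1), (0x800, 2), (0x10000, 3),
                          (0x200000, 4), (0x4000000, 5), (0x80000000, 6)):
        if cp < limit:
            return nbytes
    raise ValueError("code point outside the 31-bit UCS-4 range")

def escaped_utf8_chars(cp: int) -> int:
    """URL characters for cp as %XX-escaped UTF-8 (ASCII assumed unescaped)."""
    nbytes = utf8_len(cp)
    return 1 if nbytes == 1 else 3 * nbytes

def hex_ucs2_chars(cp: int) -> int:
    """URL characters for cp as four hex digits; UCS-2 covers plane 0 only."""
    if cp > 0xFFFF:
        raise ValueError("not representable in UCS-2")
    return 4

def hex_ucs4_chars(cp: int) -> int:
    """URL characters for cp as eight hex digits (UCS-4)."""
    return 8

if __name__ == "__main__":
    samples = {
        "ASCII 'A'": 0x41,
        "Latin-1 e-acute": 0xE9,
        "CJK ideograph U+4E2D": 0x4E2D,
        "first code of plane 3": 0x30000,
        "first code of plane 200": 200 * 0x10000,
    }
    for name, cp in samples.items():
        ucs2 = hex_ucs2_chars(cp) if cp <= 0xFFFF else "n/a"
        print(f"{name}: escaped UTF-8={escaped_utf8_chars(cp)}, "
              f"hex UCS-2={ucs2}, hex UCS-4={hex_ucs4_chars(cp)}")
```

Under these assumptions, escaped UTF-8 is shorter only for ASCII text. A plane-0 ideograph costs 9 characters against 4 for hex UCS-2, and a plane-200 code point costs 15 against 8 for hex UCS-4. A real comparison would also have to weight each range by how often it is used, which this sketch does not attempt.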