Re: revised "generic syntax" internet draft
"Martin J. Duerst" <mduerst@ifi.unizh.ch> Sat, 19 April 1997 18:21 UTC
Date: Sat, 19 Apr 1997 19:06:27 +0200
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: John C Klensin <klensin@mci.net>
Cc: Harald.T.Alvestrand@uninett.no, fielding@kiwi.ics.uci.edu, uri@bunyip.com, Dan Oscarsson <Dan.Oscarsson@trab.se>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <SIMEON.9704161008.G@tp7.Jck.com>
Message-Id: <Pine.SUN.3.96.970419185030.708e-100000@enoshima>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-uri@bunyip.com
Precedence: bulk
On Wed, 16 Apr 1997, John C Klensin wrote:

> Harald.T.Alvestrand@uninett.no wrote:
>
> > Factoid:
> >
> > UTF-8 is not user-friendly in 8859-1; the standard coding octets for
> > putting the 8859-1 charset into UTF-8 insert one character in front of
> > each character, and also change the last character for the 4 uppermost
> > columns of the 8859-1 character table.
>
> My apologies. I should have said something more like "more
> user-friendly for Latin-1 than it is for upper-end
> ideographic characters, where it deteriorates even more
> severely :-(

You might end up having to view UTF-8 with a terminal emulator or
editor that is not set up for it, in which case the above effects
occur, but this should actually be rare. And it wouldn't be any better
if you looked at ideographic characters with an 8859-1 editor.

First, we don't want to have UTF-8 and 8859-1 (or any other legacy
coding) mixed in the same document. Once everything is working as
envisioned, if you transport a Western European URL in 8859-1, you
transport the characters, as 8859-1. It's only when this is changed
to %HH, or to binary 8-bit URLs as such, which lack any information
on character encoding, that you change to UTF-8.

So you would edit a list of 8-bit URLs with a UTF-8 editor, and you
would edit a Japanese HTML document containing some URLs with, e.g.,
an EUC editor (the two editors may be the same and use autodetection).
If you cut and paste between the two editors (or the two windows), the
characters should stay the same, while the underlying representation
will change. That is what is expected of all other kinds of text
processing.

> Given the bad behavior *even* for 8859-1, could someone
> please remind me why we are pushing the thing rather than a
> straight 16 or 32-bit encoding with compression if needed?
Given that for URLs intended for global exchangeability, pure ASCII
is still the best choice, and that enormous amounts of energy can be
saved if we don't reinvent everything, given that the bad behaviour
described above can happen by accident, but is not part of what
should happen, and given that designing a compression scheme for
short strings such as URLs is not exactly easy, I think using UTF-8,
which is supported by a lot of software and used in many other places,
is not the worst thing to do.

Regards, Martin.
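
To illustrate the factoid quoted in the message above, here is a minimal Python sketch. It is not part of the original mail. The helper names utf8_seen_as_latin1 and percent_encode_utf8 and the sample path "/café" are assumptions made up for this illustration. The first helper shows what UTF-8 octets look like in a viewer set to 8859-1; the second shows the %HH form of the same UTF-8 octets.

```python
from urllib.parse import quote


def utf8_seen_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then misread the octets as ISO 8859-1.

    Hypothetical helper: it mimics a terminal or editor that is set to
    8859-1 while the data is actually UTF-8.
    """
    return text.encode("utf-8").decode("latin-1")


def percent_encode_utf8(text: str) -> str:
    """Return the %HH form of the UTF-8 octets of text, leaving '/' as is."""
    return quote(text, safe="/", encoding="utf-8")


if __name__ == "__main__":
    # U+00A1 lies in the lower columns (0xA0-0xBF) of 8859-1:
    # UTF-8 only inserts the octet 0xC2 in front of it.
    print(utf8_seen_as_latin1("\u00a1"))

    # U+00E9 lies in the 4 uppermost columns (0xC0-0xFF):
    # UTF-8 inserts 0xC3 and also changes the octet itself (0xE9 -> 0xA9).
    print(utf8_seen_as_latin1("\u00e9"))

    # Pure ASCII passes through unchanged.
    print(utf8_seen_as_latin1("abc"))

    # The same character in a URL path, as %HH of its UTF-8 octets.
    print(percent_encode_utf8("/caf\u00e9"))
```

Pure ASCII is unaffected by either helper, which is consistent with the remark that ASCII remains the safest choice for URLs meant for global exchange.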