Re: revised "generic syntax" internet draft

"Martin J. Duerst" <mduerst@ifi.unizh.ch> Sat, 19 April 1997 18:21 UTC

Date: Sat, 19 Apr 1997 19:06:27 +0200
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: John C Klensin <klensin@mci.net>
Cc: Harald.T.Alvestrand@uninett.no, fielding@kiwi.ics.uci.edu, uri@bunyip.com, Dan Oscarsson <Dan.Oscarsson@trab.se>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <SIMEON.9704161008.G@tp7.Jck.com>
Message-Id: <Pine.SUN.3.96.970419185030.708e-100000@enoshima>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-uri@bunyip.com
Precedence: bulk

On Wed, 16 Apr 1997, John C Klensin wrote:

> Harald.T.Alvestrand@uninett.no wrote:
> 
> > Factoid:
> > 
> > UTF-8 is not user-friendly in 8859-1; the standard coding octets for
> > putting the 8859-1 charset into UTF-8 insert one character in front of
> > each character, and also change the last character for the 4 uppermost
> > columns of the 8859-1 character table.
> 
> My apologies.  I should have said something more like "more 
> user-friendly for Latin-1 than it is for upper-end 
> ideographic characters, where it deteriorates even more 
> severely :-(

You may end up viewing UTF-8 with a terminal emulator or editor
that is not set up for it, and then the above effects do occur,
but this should actually be rare. And it would be no better if
you looked at ideographic characters with an 8859-1 editor.
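
A minimal sketch in Python of the effect described in the factoid,
assuming a viewer that simply treats every octet as an 8859-1
character. The function name and the sample characters are made up
for illustration.

```python
def view_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then misread each octet as ISO 8859-1."""
    return text.encode("utf-8").decode("latin-1")

# Sample characters, written as escapes so this file stays pure ASCII.
samples = {
    "COPYRIGHT SIGN (8859-1 octet A9)": "\u00a9",
    "SMALL E WITH ACUTE (8859-1 octet E9)": "\u00e9",
    "CJK ideograph U+65E5": "\u65e5",
}

for label, ch in samples.items():
    octets = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    shown = view_utf8_as_latin1(ch)
    print(f"{label}: UTF-8 octets {octets}, shown as {shown!r}")
```

For octet A9 the viewer shows two characters and the second is still
the original one (C2 A9). For E9, which lies in the four uppermost
columns, the second octet changes as well (C3 A9). The ideograph
turns into three octets.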

First, we don't want to have UTF-8 and 8859-1 (or any other legacy
coding) mixed in the same document. Once everything is working as
envisioned, if you transport a Western European URL in an 8859-1
document, you transport the characters, encoded as 8859-1. Only
when the URL is changed to %HH, or to a binary 8-bit URL that
carries no information on character encoding, do you change to UTF-8.
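
A rough sketch of that distinction in Python, assuming the standard
library's urllib.parse.quote as a stand-in for the %HH step. The
path is hypothetical.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing U+00E9 (e with acute accent).
path = "/caf\u00e9/menu.html"

# Inside an 8859-1 document the URL travels as 8859-1 characters.
in_latin1_document = path.encode("latin-1")

# When it is changed to %HH, the escapes are built from UTF-8 octets.
escaped = quote(path, safe="/", encoding="utf-8")

print(in_latin1_document)  # b'/caf\xe9/menu.html'
print(escaped)             # /caf%C3%A9/menu.html

# Decoding the escapes as UTF-8 gives back the same characters.
assert unquote(escaped, encoding="utf-8") == path
```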

So you would edit a list of 8-bit URLs with a UTF-8 editor,
and you would edit a Japanese HTML document containing some
URLs with, e.g., an EUC editor (the two editors may be the same
and use autodetection). If you cut and paste between the
two editors (or the two windows), the characters should
stay the same, while the underlying representation will
change. That is what is expected of all other kinds
of text processing.
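
The cut-and-paste case can be sketched as follows, assuming Python's
"euc_jp" codec for the EUC side. The helper name and the sample text
are made up for illustration.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Keep the characters, change only the underlying octets."""
    return data.decode(source).encode(target)

# Two kanji (U+65E5 U+672C), written as escapes to keep this ASCII.
text = "\u65e5\u672c"

in_euc_window = text.encode("euc_jp")                    # octets in the EUC editor
in_utf8_window = transcode(in_euc_window, "euc_jp", "utf-8")  # octets after pasting

print(len(in_euc_window), len(in_utf8_window))  # 4 6

# Different octets, same characters.
assert in_euc_window != in_utf8_window
assert in_utf8_window.decode("utf-8") == in_euc_window.decode("euc_jp")
```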

> Given the bad behavior *even* for 8859-1, could someone 
> please remind me why we are pushing the thing rather than a 
> straight 16 or 32-bit encoding with compression if needed?  

Given that for URLs intended for global exchangeability,
pure ASCII is still the best choice, and that enormous
amounts of energy can be saved if we don't reinvent
everything, given that the bad behaviour described above can
happen as an accident, but is not part of what should happen,
and given that designing a compression scheme for short
strings such as URLs is not exactly easy, I think
using UTF-8, which is supported by a lot of software
and used in many other places, is not the worst thing to do.
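
One property that bears on this comparison: a pure-ASCII URL is
octet-for-octet identical in ASCII and UTF-8, whereas a straight
16- or 32-bit encoding doubles or quadruples its length. A small
Python check, using a made-up example URL:

```python
# Hypothetical pure-ASCII URL, for illustration only.
url = "http://www.example.org/index.html"

# The "-be" variants are used so that no byte order mark is added.
sizes = {
    codec: len(url.encode(codec))
    for codec in ("ascii", "utf-8", "utf-16-be", "utf-32-be")
}

for codec, size in sizes.items():
    print(f"{codec}: {size} octets")

# ASCII text is unchanged by UTF-8.
assert url.encode("utf-8") == url.encode("ascii")
```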

Regards,	Martin.
