Re: revised "generic syntax" internet draft
"Martin J. Duerst" <mduerst@ifi.unizh.ch> Mon, 21 April 1997 10:37 UTC
Date: Mon, 21 Apr 1997 12:02:53 +0200
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: "Roy T. Fielding" <fielding@kiwi.ics.uci.edu>
Cc: uri@bunyip.com
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <9704180819.aa08758@paris.ics.uci.edu>
Message-Id: <Pine.SUN.3.96.970421113701.245E-100000@enoshima>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Sender: owner-uri@bunyip.com
Precedence: bulk
Hello Roy,

On Fri, 18 Apr 1997, you wrote:

> Martin, I haven't forgotten about your very detailed problem statement
> at <http://www.imc.org/ietf-url/mail-archive/0052.html>. My question was
> whether all the other people advocating non-ASCII URLs agree to that
> problem statement,

I guess that even though many people have expressed the problems (and
solutions) in different words, there is wide agreement on it. The
summary you have written also expresses the same problems, but the
solutions you give are not satisfactory to many of us.

> and in particular to the course of action for the
> current draft revision.

There have been various opinions, from "leave it as it is, deal with
internationalization separately" to "take the chance to recycle and
deal with i18n completely". I personally tend towards the latter, but
you and Larry have worked hard on the current draft, and many aspects
of i18n URLs (such as BIDI) need long and detailed specs, so I think
there should be some middle solution.

The middle solution would be to include, in the current draft, a clear
indication of where we are heading (UTF-8), so that people stay tuned
and can take the necessary steps. For example, if they have to decide
how to set up their server site, and whether to use some legacy
encoding or UTF-8 for filenames, they can choose UTF-8 because they
will know that this will make things easier in the long run. Other
documents can then be written to describe more advanced things.

> >and looks into the way configuration information can be
> >setup for Apache to inform it about special needs of scripts
> >and stuff, before he again claims things to be impossible.
>
> It is impossible for Apache to correctly transcode incoming URLs for the
> same reason that it is impossible for current browsers to decode and display
> the encoded octets of received URLs -- a program cannot transcode bytes to
> a different charset unless it knows how the bytes are currently encoded.
> There is nothing you can do in the Apache configuration to change that
> fact, since it is a property of how the URL is generated (either by some
> other part of the server or some part of the user agent or some author
> of any page in the Web).

My comment about Apache configuration was about letting the server
know which encoding the target of the URL (filename, CGI parameter,
...) is in. To convert correctly, you need to know the "charset" of
both the source and the target.

As for the source (I have explained this already), suppose we expand
the current heuristic "same as target" to "same as target or UTF-8",
and not to "whatever it might be". Then in sparse namespaces we have
something like a 99.999% hit rate, and because of the properties of
UTF-8, we only occasionally need a second file system access. For
dense namespaces, we need some information from the browser to
distinguish "same as up to now" from "UTF-8", and I have already
described the "FORM-UTF8: Yes" that does this job.

> I think there is a way to define UTF-8 preference for URL encoding
> such that it won't break existing services, by forbidding transcoding
> of already-encoded octets.

By "already-encoded", do you mean already encoded with %HH? Of course,
things that are encoded in %HH should be treated as binary and not
messed around with. Once UTF-8 is firmly established, there might be
software that has a look at the %HH sequences, finds that they look
like UTF-8 (very rare for arbitrary sequences, unless they are only
ASCII), and converts them to real characters. On converting back from
real characters, UTF-8 would also be used, and so the same octets
would be reproduced even if they were not originally UTF-8. However,
apart from cases that concern only the user interface, this won't be
frequent.

> However, I won't bother to explain that
> until there is broad agreement on what needs to be solved.

Please go on and explain your ideas! Maybe they are even closer to
mine than you think :-).

Regards, Martin.
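
[Editorial illustration, not part of the original message.] The "same as target or UTF-8" heuristic described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the function name `candidate_names` and the default target charset ISO-8859-1 are hypothetical choices made for illustration, and the message itself does not prescribe any particular implementation. The %HH octets are first taken as binary; if they are not well-formed UTF-8, only the "same as target" reading remains, so a single lookup suffices. If they are well-formed UTF-8 and not pure ASCII, the UTF-8 reading is tried first and the legacy reading is kept as the occasional second lookup.

```python
from urllib.parse import unquote_to_bytes


def candidate_names(url_path, target_charset="iso-8859-1"):
    """Return the names a server might look up, most likely first.

    Hypothetical helper: `target_charset` stands for the charset the
    server uses for its filenames (an assumption made for this sketch).
    """
    # Treat %HH escapes as raw octets; do not guess their charset yet.
    raw = unquote_to_bytes(url_path)

    # Reading 1: "same as target", the octets are already in the server's charset.
    legacy = raw.decode(target_charset, errors="replace")

    # Reading 2: the octets are UTF-8. Arbitrary non-ASCII octet strings
    # are rarely well-formed UTF-8, so a failed decode rules this reading out.
    try:
        utf8 = raw.decode("utf-8")
    except UnicodeDecodeError:
        return [legacy]

    # Pure ASCII decodes identically under both readings: one lookup.
    if utf8 == legacy:
        return [legacy]

    # Well-formed, non-ASCII UTF-8: try that first, keep the legacy
    # reading as the second file system access.
    return [utf8, legacy]
```

With the assumed ISO-8859-1 target, `candidate_names("/caf%E9")` yields a single candidate because the octet E9 alone is not valid UTF-8, while `candidate_names("/caf%C3%A9")` yields the UTF-8 reading first and the legacy reading second.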