Re: revised "generic syntax" internet draft
Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@east.sun.com> Wed, 16 April 1997 23:55 UTC
Date: Wed, 16 Apr 1997 10:44:21 -0400
From: Gary Adams - Sun Microsystems Labs BOS <Gary.Adams@east.sun.com>
Message-Id: <199704161444.KAA03396@zeppo.East.Sun.COM>
To: Gary.Adams@east.sun.com, fielding@kiwi.ics.uci.edu
Cc: uri@bunyip.com
Subject: Re: revised "generic syntax" internet draft
Sender: owner-uri@bunyip.com
Precedence: bulk
> From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
>
> >If the encoding is labeled (or known to be UTF8), then the magazine
> >could publish either native character representation or a %HH escaped
> >URL. Similarly the browser could support input of native characters
> >or a %HH escaped URL. Finally, the %HH escaped UTF8 URL is transmitted
> >to the server and converted for use in accessing the local resource.
>
> The magazine could also just publish the native character representation
> and assume that the reader's browser is set up to use the same charset
> encoding as the server. OTOH, the standard could say that when a URL
> is entered from a source that has no charset, use UTF-8. The question is
> really about what is the most likely charset used by the server.
>

This is the crux of the problem. The problem with native character
representations is that they are often platform specific, e.g. EUC-JP
on the Unix http server, SJIS on the PC clients, JIS through the mail
system, and soon UTF8 on all the Java components and NFS v4 servers
(wishful thinking).

The only places where a safe exchange takes place today are within
homogeneous networks: all-Unix, all-Windows, or all-Mac networks, or
places where a single national character encoding has been prescribed
by law.

I do agree with you that the crux of the problem today has to do with
what the server can grok and what it expects its underlying services
to grok. Since URLs are opaque, they are safe to pass around, and only
the URL generator can be certain about what the contents really mean.

> If a browser assumes that the server is using UTF-8 and transcodes the
> non-ASCII octets before submission to the server, then bad things happen
> if the server is not using UTF-8. The nature of the "bad things" range
> from disallowed access to invalid form data entry. Since it is not
> possible for us to require all servers to be upgraded, it is not safe
> for browsers to perform transcoding of URLs, and therefore it is impossible
> to deploy a solution that requires UTF-8 transcoding UNLESS that decision
> is based on the URL scheme.
>
> Likewise, a server often acts as a gateway for some parts of its namespace,
> as is the case for CGI scripts and API modules like mod_php, and other
> parts of its namespace are derived from filesystem names. On a server
> like Apache, the filesystem-based URLs are generated by url-encoding all
> non-urlc bytes without concern for the filesystem charset. While it is
> theoretically possible for the server to edit all served content such
> that URLs are identified and transcoded to UTF-8, that would assume that
> the server knows what charset is used to generate those URLs in the
> first place. It can't use a single configuration table for all transcoding,
> since the URLs may be generated from sources with varying charsets.
> The bottom line is that a server cannot enforce UTF-8 encoding unless
> it knows that all of its URLs and gateways use a common charset, and if
> that were the case we wouldn't need a UTF-8 solution.
>
> I listed out the solution space in the hope that people would see the
> trade-offs. We know that all-ASCII URLs *interoperate* well on the
> Internet, but we also know that they can be ugly. We know that existing
> systems will accept non-ASCII URLs if the charset matches that used by
> the URL generator/interpreter on the server. We also know that most
> existing, deployed servers are not restricted to generating UTF-8
> encoded URLs.

So, since I'm looking for a solution to the end-to-end problem, here's
a proposal that I think you might see as a viable solution. ...

Without changing the definition of URLs, we simply define the next
version of a particular URL scheme (or a new scheme) which includes
the constraint or feature that the %HH escaped characters were
generated by a UTF8-aware service. Clients could then take advantage
of this updated information in determining how to present the URL or
in the ways they would accept URL input.

e.g.
    httpu://...
or
    GET /%HH%HH HTTP/1.2

In other words, an NFS v4 filesystem would commit to Unicode for
externally visible character strings. A Java-based web server would
also support an httpu scheme URL or an http version 1.2 transaction
for Unicode-based pathnames. The syntax is the same, but the semantics
are more clearly specified.

> In a perfect world, requiring UTF-8 would be a valid solution. But this
> is not a perfect world! The purpose of an Internet standard is to define
> the requirements for interoperability between implementations of the
> applicable protocol. A solution that requires UTF-8 will fail to interoperate
> with systems that do not require UTF-8, and the latter is the case for
> most URL-based systems on the Internet today.

As far as the versioning problem is concerned, a server can always
speak a lower version of the protocol, and a client can always rely on
proxy services to perform non-local protocol requests.

    Client      Server
    http 1.1    http 1.1   (status quo)
    http 1.1    http 1.2   (server provides %hh utf8 URLs, but the
                            client doesn't know how to exploit that
                            information)
    http 1.2    http 1.1   (client knows utf-8 url input methods, but
                            must deliver raw %hh urls)
    http 1.2    http 1.2   (client and server have a contract about
                            the utf8 url contents)

Or alternatively,

    Client         Proxy    Server
    httpu_proxy    httpu    httpu 1.0   (a unicode http url scheme,
                                         with client designated proxy
                                         agent)

> ...Roy T. Fielding
> Department of Information & Computer Science (fielding@ics.uci.edu)
> University of California, Irvine, CA 92697-3425 fax:+1(714)824-1715
> http://www.ics.uci.edu/~fielding/

(Sorry if this message is a bit cryptic: one eye on the screen and one
eye on my 2yr old. Scotch tape and cats really don't go together :-)

\
/gra
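
The charset mismatch discussed in this message can be illustrated with
a minimal sketch. It assumes Python's standard urllib.parse module; the
sample name and the helper functions escape() and server_decode() are
hypothetical and are not part of any proposal on this list. The same
two-character name yields three different %HH strings depending on the
charset of the generator, and a server that guesses the wrong charset
cannot recover the name.

```python
from urllib.parse import quote, unquote

# Hypothetical resource name, used only for illustration.
NAME = "日本"


def escape(name: str, charset: str) -> str:
    """Percent-escape the octets of `name` as encoded in `charset`."""
    return quote(name, safe="", encoding=charset)


def server_decode(escaped: str, server_charset: str) -> str:
    """Undo the %HH escapes and interpret the octets in `server_charset`.

    Raises UnicodeDecodeError when the octets are not valid in that charset.
    """
    return unquote(escaped, encoding=server_charset, errors="strict")


# One name, three platform-specific escaped forms.
utf8_url = escape(NAME, "utf-8")       # "%E6%97%A5%E6%9C%AC"
eucjp_url = escape(NAME, "euc_jp")     # e.g. a Unix http server
sjis_url = escape(NAME, "shift_jis")   # e.g. a PC client

if __name__ == "__main__":
    for label, url in [("UTF-8", utf8_url), ("EUC-JP", eucjp_url), ("SJIS", sjis_url)]:
        print(f"{label:7} {url}")

    # A server that shares the generator's charset recovers the name.
    print(server_decode(eucjp_url, "euc_jp") == NAME)

    # A server that assumes UTF-8 for an EUC-JP escaped URL does not.
    try:
        server_decode(eucjp_url, "utf-8")
    except UnicodeDecodeError as exc:
        print("cannot interpret as UTF-8:", exc.reason)
```

The escaped strings are all plain ASCII and safe to pass around, but
nothing in them says which charset produced the octets. A label such
as a new scheme name or protocol version would supply that missing
piece of information.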