Transcribing non-ascii URLs [was: revised "generic syntax" internet draft]
Dan Connolly <connolly@w3.org> Mon, 14 April 1997 18:35 UTC
Message-Id: <33526F61.622A4B12@w3.org>
Date: Mon, 14 Apr 1997 12:54:41 -0500
From: Dan Connolly <connolly@w3.org>
Organization: World Wide Web Consortium
X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.27 i586)
Mime-Version: 1.0
To: Francois Yergeau <yergeau@alis.com>
Cc: uri@bunyip.com, bert@w3.org
Subject: Transcribing non-ascii URLs [was: revised "generic syntax" internet draft]
References: <Your message of "Sun, 13 Apr 1997 23:54:47 EDT." <3.0.1.32.19970413235447.006e2e48@genstar.alis.com> <3.0.1.32.19970414121551.00cc2d70@genstar.alis.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-uri@bunyip.com
Precedence: bulk
Francois Yergeau wrote:
>
> > Any application that transmits a URL in
> >non-ASCII characters is declared non-compliant.
>
> You are confusing characters and bytes. While you may want to restrict the
> transmitted bytes to 7 bits (but again, why?), you cannot restrict the
> range of characters. Hence a full mapping is required, not ASCII-only.
> The current spec omits that mapping.

I have been shooting from the hip on this I18N/URL stuff for a while, but some folks at WWW6 wanted the full weight of W3C behind it, so I've been trying to think more carefully. And this issue of transcribing non-ASCII URLs particularly concerns me.

On the one hand, it makes a lot of sense that if a user creates a file and gives it a Hebrew or Arabic or CJK name, and then exports the file via an HTTP server, the Address: field in a web browser should show the Hebrew or Arabic or ... characters faithfully.

On the other hand, suppose that address is to be printed and put in an advertisement or a magazine article. Should it print the Hebrew/Arabic/CJK characters using those glyphs? Or should it print ASCII glyphs corresponding to the characters of the %xx encoding of the original characters?

If the former, then reliability suffers: the odds that a random person on the globe can faithfully key in a Hebrew/Arabic/CJK name seem considerably lower than the odds that they can key in an ASCII name. (Though the odds of correctly transcribing a long sequence of %xx codes are vanishingly small too...)

(I'm not saying that everybody knows English, but rather that a person using a computer connected to the internet has a fairly high probability of being able to match the 'a' character on a piece of paper to the 'a' character on the keyboard.)

If the latter, then the system is very much biased to the *American* Standard Code for Information Interchange.
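To make the two printed forms concrete, here is a minimal sketch in Python of what the "%xx encoding" of a non-ASCII name might look like. It assumes UTF-8 as the character-to-byte mapping, which is only one of the candidates under discussion and not something this message settles. The helper names to_percent_encoded and from_percent_encoded and the sample Hebrew name are made up for illustration.

```python
from urllib.parse import quote, unquote


def to_percent_encoded(name: str) -> str:
    """Return the ASCII-only %xx form of a name.

    Assumption: characters are mapped to bytes with UTF-8 before
    each non-ASCII byte is written as %XX.
    """
    # safe="" means reserved characters such as "/" are escaped too,
    # so this is meant for a single path segment, not a whole URL.
    return quote(name, safe="", encoding="utf-8")


def from_percent_encoded(text: str) -> str:
    """Recover the original characters from the %xx form (UTF-8 assumed)."""
    return unquote(text, encoding="utf-8", errors="strict")


# Hypothetical file name: the Hebrew word "shalom" (four characters).
hebrew_name = "\u05e9\u05dc\u05d5\u05dd"
ascii_form = to_percent_encoded(hebrew_name)

print(ascii_form)        # what a reader would have to key in from print
print(len(hebrew_name))  # characters in the original name
print(len(ascii_form))   # ASCII characters in the %xx form
```

Under this assumption, four Hebrew characters become 24 ASCII characters, which is the transcription burden weighed above. A plain ASCII name such as index.html passes through unchanged.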
It seems to me that the minimally constraining thing to do is to specify both and allow folks to choose: specify how Unicode strings fit into URLs, and then advise folks to use a small subset of Unicode if their audience is international (and at the same time, add a few more notes: perhaps advise folks that mixing upper and lowercase increases the risk of transcription errors).

What's the conventional wisdom among the DNS folks? Surely they face the same issue.

Regarding process, it seems clear (based on Larry M and John K's input) that specifying how Unicode strings fit into URLs is not the sort of thing one adds to a proposed standard to make it a draft standard. But I'm not terribly interested in a draft standard that doesn't address this issue -- even if only to say "we thought about encoding Unicode in URLs, but decided against it for the following reasons... ."

In either case, a separate internet draft on the subject seems like a perfectly good idea. I don't think the risk of "incompatible standards" is unmanageable.

Larry has asked for implementation experience. Such experience seems to be growing. None of the implementors has reported any problems (as far as I can see).

Regarding Jigsaw and Amaya...

Support in Jigsaw should be easy. I'll look into it. Anybody want to do it for me? Should be a quick hack.

Support in Amaya would be more work. I don't think we've crossed the hurdle of getting non-western fonts working in Amaya, not to mention internationalized input.

--
Dan Connolly, W3C Architecture Domain Lead
<connolly@w3.org> +1 512 310-2971
http://www.w3.org/People/Connolly/
PGP:EDF8 A8E4 F3BB 0F3C FD1B 7BE0 716C FF21