Re: Using UTF-8 for non-ASCII Characters in URLs
Edward Cherlin <cherlin@newbie.net> Fri, 02 May 1997 18:02 UTC
X-Sender: cherlin@snowcrest.net
Message-Id: <v0300783faf8f314b10e6@[206.245.192.60]>
In-Reply-To: <Pine.SUN.3.96.970501211303.245P-100000@enoshima>
References: <199705010017.RAA27111@mailsun3-fddi.us.oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 01 May 1997 23:32:37 -0700
To: uri@bunyip.com
From: Edward Cherlin <cherlin@newbie.net>
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
Sender: owner-uri@bunyip.com
Precedence: bulk
"Martin J. Duerst" <mduerst@ifi.unizh.ch> wrote: [snip] > >Internet Draft M. Duerst ><draft-duerst-i18n-norm-00?.txt> University of Zurich >Expires in six months May 1997 > > [snip] > >1.? Notation > > > Codepoints from the UCS are denoted as U+XXXX, where XXXX is their > hexadecimal representation, according to [Unicode, p.???]. The Unicode Standard Version 2.0, p. 1-5. > > Stretches of characters? "A range of Unicode values is expressed as U+xxxx-->U+yyyy or U+xxxx--U+yyyy..." p. 1-5. > Official character names and components all > upper case. "...uppercase Latin letters A through Z, space, and hyphen-minus;..." p. 1-5 > > > > > Expires in six months [Page 3] > >Internet DrafNormalization of Internationalized Identifiers May 1997 > > >2. Categories of Ambiguity and Problems > > > Comparing two sequences of codepoints from the UCS, various degrees > of ambiguity can arise: > > Category A: The two sequences are expected to be rendered exactly the > same, considered identical by the user, and cannot be disambiguated > by context. > > Category B: The two sequences are "semantically" different but diffi- > cult or impossible to distinguish in rendering. > > Category C: ????? > > ???? > > There are also a number of codepoints in the UCS that should not be > used for various reasons, mainly that they are not available on usual > keyboards. These go into Category X. That could be taken to apply to math and APL characters, which would be unfortunate. There are strong reasons for allowing math and APL expressions in identifiers for math and APL pages. I published a book, "The Encyclopedia of APL" which was indexed in APL as well as in English names of APL symbols, functions, and operators. It would have been a useful Web site. All codepoints can be entered from standard keyboards. There are keyboards and other entry methods for almost all Unicode characters implemented in some software, and all can be used in keyboard layouts of standard form. 
We must expect keyboard layout utilities to appear in future multilingual software. I think we need some other distinction. When we start listing the Category X characters, we can discuss their characteristics more meaningfully. [snip] >Bibliography > > [HTML] T. Berners-Lee and D. Connolly, "Hypertext Markup Lan- > guage - 2.0" (RFC1866), MIT/W3C, November 1995. > > [Unicode2] Unicode????, Version 2, Addisson-Wesley, Reading, MA, > 1996. The Unicode Standard, Version 2, Addison-Wesley... > [HTML-I18N] F. Yergeau, G. Nicol, G. Adams, and M. Duerst, "Inter- > nationalization of the Hypertext Markup Language", > Work in progress (draft-ietf-html-i18n-05.txt), August > 1996. > > > > > > > Expires in six months [Page 7] > >Internet DrafNormalization of Internationalized Identifiers May 1997 > > >Author's Address > > Martin J. Duerst > Multimedia-Laboratory > Department of Computer Science > University of Zurich > Winterthurerstrasse 190 > CH-8057 Zurich > Switzerland > > Tel: +41 1 257 43 16 > Fax: +41 1 363 00 35 > E-mail: mduerst@ifi.unizh.ch > > > NOTE -- Please write the author's name with u-Umlaut wherever > possible, e.g. in HTML as Dürst. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Expires in six months [Page 8] > -- Edward Cherlin cherlin@newbie.net Everything should be made Vice President Ask. Someone knows. as simple as possible, NewbieNet, Inc. __but no simpler__. http://www.newbie.net/ Attributed to Albert Einstein