Re: Using UTF-8 for non-ASCII Characters in URLs
Larry Masinter <masinter@parc.xerox.com> Fri, 02 May 1997 09:45 UTC
Message-ID: <3369AC9E.281F@parc.xerox.com>
Date: Fri, 02 May 1997 01:58:06 -0700
From: Larry Masinter <masinter@parc.xerox.com>
Organization: Xerox PARC
X-Mailer: Mozilla 3.01Gold (Win95; I)
MIME-Version: 1.0
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
CC: URI mailing list <uri@bunyip.com>
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs
References: <Pine.SUN.3.96.970501211303.245P-100000@enoshima>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: owner-uri@bunyip.com
Precedence: bulk
This is a great start at dealing with the issues that would otherwise cause great confusion.

Other issues:

The bidi issues for RTL languages in conjunction with the normal punctuation used in and around identifiers. (Will the identifiers present themselves 'correctly' without these characters in all cases?)

Using UCS in identifiers that are normally "case insensitive" in ASCII, and the issues that raises, e.g., similar upper-case forms, and the role of accents and equivalence.

I think "white space" or spacing characters in general need to be addressed.

You need to decide whether you're doing canonicalization/normalization or just equivalence. Equivalence is probably easier to define, and less politically sensitive, even though not as useful.
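
To make the canonicalization-versus-equivalence distinction concrete, here is a minimal sketch in Python. The rule used (case folding followed by Unicode NFC) is only an assumption chosen for illustration, not something proposed in this thread, and the names canonicalize and equivalent are hypothetical. It shows two spellings of the same word producing different UTF-8 escapes unless some rule is applied first.

```python
import unicodedata
from urllib.parse import quote


def canonicalize(identifier: str) -> str:
    """Canonicalization: map an identifier to ONE preferred spelling.

    Assumed rule for this sketch only: case folding, then NFC.
    """
    return unicodedata.normalize("NFC", identifier.casefold())


def equivalent(a: str, b: str) -> bool:
    """Equivalence: only answer whether two identifiers match.

    The caller never sees (or has to agree on) a preferred spelling.
    """
    return canonicalize(a) == canonicalize(b)


precomposed = "R\u00e9sum\u00e9"    # e-acute as a single code point
decomposed = "RE\u0301SUME\u0301"   # upper case, combining acute accents

# Escaped as UTF-8 with no rule applied, the two spellings differ.
print(quote(precomposed))                    # R%C3%A9sum%C3%A9
print(quote(decomposed))                     # RE%CC%81SUME%CC%81

# Equivalence gives a yes/no answer and leaves both spellings alone.
print(equivalent(precomposed, decomposed))   # True

# Canonicalization rewrites the identifier into the chosen form.
print(quote(canonicalize(decomposed)))       # r%C3%A9sum%C3%A9
```

In this sketch the equivalence test happens to reuse the canonical form internally, but it only exposes the comparison result. That is roughly why equivalence is the less sensitive choice: nobody's spelling is declared the "right" one, at the cost of not giving a single form that can be stored or published.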