Character set issues again

dank@blacks.jpl.nasa.gov Fri, 15 October 1993 17:00 UTC

Message-Id: <9310151544.AA27195@blacks.jpl.nasa.gov>
To: ietf-wnils@ucdavis.edu
Cc: dank@alumni.caltech.edu
Subject: Character set issues again
Date: Fri, 15 Oct 1993 08:44:25 -0600
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: dank@blacks.jpl.nasa.gov

In the Architecture document, under "Format of a Search String",
there is some mention of the need for using non-ASCII character sets, but
no guidelines.

In the time since the draft was written, there has been a lot of progress
towards adoption of Unicode as an all-inclusive international character set.
Major companies, including Microsoft and Apple, have committed to switching
to Unicode.  Unicode is currently supported by Microsoft Windows NT
and Plan 9 (the research successor to Unix from AT&T Bell Labs), and a
version of Xterm supporting Unicode was recently posted to Usenet.

Plan 9's encoding was designed to look like vanilla ASCII when viewed
by naive programs, including the Unix filesystem's filename code.
This has the benefit that you can use Unicode filenames even on operating
systems that don't support Unicode.
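
To make the property above concrete, here is a minimal sketch in Python.
It assumes that Python's built-in "utf-8" codec is a fair stand-in for
Plan 9's encoding (which was later standardized as UTF-8); the helper
names encode_name and safe_for_byte_oriented_code are hypothetical and
exist only for this illustration.

```python
# Sketch only: Python's "utf-8" codec stands in for Plan 9's encoding.
# Helper names are hypothetical, chosen for this illustration.

def encode_name(text: str) -> bytes:
    """Encode a string as UTF-8 bytes."""
    return text.encode("utf-8")


def safe_for_byte_oriented_code(text: str) -> bool:
    """Check that every non-ASCII character encodes only to bytes >= 0x80,
    so no stray NUL (0x00) or '/' (0x2F) byte can appear inside it."""
    for ch in text:
        if ord(ch) > 0x7F and any(b < 0x80 for b in ch.encode("utf-8")):
            return False
    return True


ascii_name = "readme.txt"
mixed_name = "r\u00e9sum\u00e9.txt"  # "resume" with two accented e's

print(encode_name(ascii_name))   # b'readme.txt' (identical to plain ASCII)
print(encode_name(mixed_name))   # b'r\xc3\xa9sum\xc3\xa9.txt'
print(safe_for_byte_oriented_code(mixed_name))  # True
```

Pure ASCII text comes out byte-for-byte unchanged, and the accented
characters become multi-byte sequences made only of bytes with the high
bit set. Code that merely scans for NUL or '/' therefore passes such
names through untouched.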

I propose that wnils officially endorse Unicode as the non-ASCII character
set of choice to be supported someday.
I also propose that wnils separately consider the issue of
how to encode Unicode so that plain ASCII and Unicode can be intermixed,
and suggest that Plan 9's encoding might be a solution.  
Finally, wnils should look at how to deal with the difficulties of comparing
Unicode text strings.  (For an idea of how one company tried to deal with
Unicode, look at the Microsoft Windows Win32s programmer's reference manuals.)
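
One example of the comparison difficulty: the same visible text can be
stored as different code point sequences, so a naive equality test fails.
The sketch below uses Python's standard unicodedata module and Unicode
normalization (form NFC) plus case folding. This is a present-day
technique shown only for illustration, not something the drafts or the
Win32s manuals prescribe, and the helper name same_text is hypothetical.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization and case folding.

    A rough sketch: real search software may also need locale-aware
    collation, which this does not attempt.
    """
    def canon(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()

    return canon(a) == canon(b)


precomposed = "caf\u00e9"   # 'e with acute' as one code point (U+00E9)
decomposed = "cafe\u0301"   # plain 'e' followed by combining acute (U+0301)

print(precomposed == decomposed)              # False: code points differ
print(same_text(precomposed, decomposed))     # True
print(same_text("CAF\u00c9", decomposed))     # True: case is folded too
```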

Whew.  Hope you guys didn't settle this issue while I wasn't watching.
I'd love to hear whether you think my suggestion is germane...
- Dan Kegel (dank@alumni.caltech.edu)

P.S. Excerpt from the wnils docs:

August 19, 1992
               Architecture of the WHOIS++ service
...
Format of a Search String
--------------------------
 
[* The actual format of a search string is not yet specified, as there is
   a discussion to be had concerning the use of non-ASCII (esp. but not
   limited to other European languages) in search strings and even
   attribute names, etc. We must allow for this, but this document is
   merely flagging this need for now. This sounds like fruitful grounds
   for Working Group discussions... *]
