Re: [Ltru] Re: Solving the UTF-8 problem

Chris Newman <Chris.Newman@Sun.COM> Tue, 10 July 2007 06:39 UTC

Date: Mon, 09 Jul 2007 23:38:10 -0700
From: Chris Newman <Chris.Newman@Sun.COM>
Subject: Re: [Ltru] Re: Solving the UTF-8 problem
In-reply-to: <000701c7bd37$65947eb0$6401a8c0@DGBP7M81>
To: Doug Ewell, LTRU Working Group
Message-id: <490E048F8E5A21C40359DB47@[]>
MIME-version: 1.0
X-Mailer: Mulberry/3.1.6 (Mac OS X)
Content-type: text/plain; format="flowed"; charset="us-ascii"
Content-transfer-encoding: 7bit
Content-disposition: inline
References: <006501c7bc33$637b08b0$6401a8c0@DGBP7M81> <> <000701c7bd37$65947eb0$6401a8c0@DGBP7M81>
List-Id: Language Tag Registry Update working group discussion list <>

Doug Ewell wrote on 7/2/07 23:00 -0700:

> Stephane Bortzmeyer <bortzmeyer at nic dot fr> wrote:
>>> 3.  UTF-8 can't be read on some, especially older, computer systems (Frank
>>> Ellermann, months ago, and CE Whitehead).
>> So, I basically agree that UTF-8 for the registry is better but I do not
>> want to see bold sentences like "Anyone but Frank Ellermann can run a full
>> UTF-8 environment by now". This is not true.
> You're correct.  I restated three objections to converting the Registry to
> UTF-8, and tried to show why they don't outweigh the advantages of
> converting.  All three are, in fact, true:
> 1.  UTF-8 doesn't play well with e-mail.
> 2.  Converting will break processors that expect only ASCII.
> 3.  Some computers can't display UTF-8.
> But we can work out the e-mail problem, and the breakage to processors is no
> worse than adding new fields (nor are there that many fully-conformant
> processors to be fixed).  And the display problem is really not as much of a
> showstopper as it is being portrayed.  People are saying that the hex escapes
> are a display problem too, and adding "Arua" and "Aru&#xE1; (Arua)" to the
> Registry is going to confuse a LOT of people, no matter how many comments we
> add.

UTF-8 has been the recommended charset for Internet interchange since RFC 2277.
Our past experience with ASCII encodings of non-ASCII text in the IETF has been 
questionable.  RFC 2047, 2231, IMAP modified-UTF-7, and quoted-printable have 
all had mixed results.  Meanwhile, UTF-8 based IETF protocols have been less 
problematic from an interoperability viewpoint.  The EAI WG is putting together 
an experiment to try UTF-8 in email headers and addresses and that will 
increase the pressure to update email infrastructure.
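
To make the comparison concrete, here is a rough sketch (not anything from
the Registry tooling) of how the one name quoted above looks under a few of
these ASCII encodings versus plain UTF-8.  It uses only the Python standard
library; the variable names are illustrative assumptions.

```python
import html
import quopri
from email.header import Header, decode_header

# Registry-style hex escape, as quoted above; unescaping gives the real name.
escaped = "Aru&#xE1;"
name = html.unescape(escaped)

# Plain UTF-8: the non-ASCII letter becomes a two-byte sequence.
utf8_bytes = name.encode("utf-8")

# Quoted-printable: an ASCII-only wrapping of those same UTF-8 bytes.
qp_bytes = quopri.encodestring(utf8_bytes)

# RFC 2047 encoded-word, as used in mail headers today.
encoded_word = Header(name, "utf-8").encode()

# Every ASCII form has to be decoded before a reader sees the name.
raw, charset = decode_header(encoded_word)[0]
round_trip = raw.decode(charset)

for label, value in [
    ("hex escape", escaped),
    ("UTF-8 bytes", utf8_bytes),
    ("quoted-printable", qp_bytes),
    ("RFC 2047", encoded_word),
]:
    print(f"{label:18} {value!r}")
```

Each ASCII form needs its own decoder before the text is readable, whereas
the UTF-8 bytes are the text itself to any UTF-8 aware tool.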

Rough edges are inevitable during the adoption of new technology, but where do 
we want to be 5-10 years from now?  What's the least painful path to get there?

                - Chris
