RE: [Ltru] Re: Solving the UTF-8 problem

Martin Duerst <> Wed, 04 July 2007 07:26 UTC

Date: Wed, 04 Jul 2007 15:43:59 +0900
To: Peter Constable <>, LTRU Working Group <>, "" <>
From: Martin Duerst <>
Subject: RE: [Ltru] Re: Solving the UTF-8 problem
In-Reply-To: <DDB6DE6E9D27DD478AE6D1BBBB83579560F3AAE4EF@NA-EXMSG-C117.r>
References: <006501c7bc33$637b08b0$6401a8c0@DGBP7M81> <> <000701c7bd37$65947eb0$6401a8c0@DGBP7M81> <>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

At 00:24 07/07/04, Peter Constable wrote:
>From: Doug Ewell []
>> I restated three objections to converting the Registry to
>> UTF-8, and tried to show why they don't outweigh the
>> advantages of converting.

Doug's argument that most newcomers are confused by the
numeric character references is a very strong one.
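
To illustrate what a reader or a tool has to do with those references today, here is a minimal sketch, assuming the escapes take the hexadecimal `&#x...;` form. The function name `decode_ncrs` and the sample line are made up for illustration and are not taken from any existing registry processor.

```python
import re

# Matches hexadecimal numeric character references such as "&#xE5;".
# Assumption: only the hex form is handled in this sketch.
_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_ncrs(text: str) -> str:
    """Replace each hex numeric character reference with the character it names."""
    return _NCR.sub(lambda m: chr(int(m.group(1), 16)), text)


# Hypothetical sample line in the escaped, ASCII-only form.
escaped = "Description: Norwegian Bokm&#xE5;l"
decoded = decode_ncrs(escaped)

# The same line as it would be stored in a UTF-8 file.
utf8_bytes = decoded.encode("utf-8")
```

With a UTF-8 file this step disappears: the bytes in `utf8_bytes` are what would be stored, and any UTF-8 capable viewer shows the name directly.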

>>  All three are, in fact, true:
>> 1.  UTF-8 doesn't play well with e-mail.
>> 2.  Converting will break processors that expect only ASCII.
>> 3.  Some computers can't display UTF-8.
>> But we can work out the e-mail problem

I'm confident this can be done. I'm one of the people who
cannot view UTF-8 in email, but I consider that my problem,
not a problem of the WG or the subtag registration mailing list.

One thing we should try to get solved (if it's not already
done) is to make sure that the mailing list archive serves
emails with the correct charset setting. This may or may not
already be the case.

>> And the display problem is really not as much of a
>> showstopper as it is being portrayed.  People are saying
>> that the hex escapes are a display problem too...
>+1. I don't see the display issue as being a show-stopper at all. Anybody 
>that has a need to view this registry has access to means of viewing UTF-8.

I strongly agree with this.

>> the breakage to processors is no worse than adding new
>> fields (nor are there that many fully-conformant
>> processors to be fixed).
>I'm inclined to agree, but am waiting to see if anyone makes a strong 

I agree here too. There are not too many implementations that
read in the registry, and of these, some are known and can be fixed,
some are known to be 8-bit tolerant, and some are run only in batch
mode in a central place and can be fixed when an update occurs.

For the implementations where this really matters, i.e. stuff that
is field-deployed with a software upgrade mechanism and polls the
registry: first, such implementations should be rather rare, and
second, they should have been implemented in a robust way, because
with the network, there are no guarantees at all. Put another
way, if the implementation throws up because it sees the eighth bit
set on a byte, and becomes completely useless (e.g. it clears its
internal language information cache or just blows up), then that's
a very bad implementation. Even if we keep all our stability guarantees,
there is no guarantee that the network will never flip any bits.
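
The kind of robustness meant here could look roughly like the following sketch. It assumes the registry is fetched as raw bytes in record-jar form with `%%` separating records. The names `parse_registry` and `refresh_cache` are hypothetical, and the parser is deliberately simplified: it ignores folded continuation lines and keeps only the last value of a repeated field.

```python
from typing import Dict, List

Record = Dict[str, str]


def parse_registry(raw: bytes) -> List[Record]:
    """Parse registry bytes into records; raises UnicodeDecodeError on bad input."""
    text = raw.decode("utf-8")  # strict decoding: invalid bytes raise an error
    records: List[Record] = []
    current: Record = {}
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: close the record collected so far.
            if current:
                records.append(current)
                current = {}
        elif ":" in line:
            field, _, value = line.partition(":")
            current[field.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def refresh_cache(cache: List[Record], raw: bytes) -> List[Record]:
    """Return the new records, or the old cache if the download is unusable."""
    try:
        records = parse_registry(raw)
    except UnicodeDecodeError:
        # Corrupted or unexpectedly encoded data: keep what we already have.
        return cache
    # An empty result is treated as a failed download, not as an empty registry.
    return records or cache
```

The point of the design is only that the existing cache is replaced after the new data has been parsed successfully, so a damaged or unexpected download leaves the implementation in its previous working state.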

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
