[Ltru] Re: Solving the UTF-8 problem

Stephane Bortzmeyer <bortzmeyer@nic.fr> Mon, 02 July 2007 20:17 UTC

Date: Mon, 02 Jul 2007 22:15:55 +0200
From: Stephane Bortzmeyer <bortzmeyer@nic.fr>
To: Doug Ewell <dewell@roadrunner.com>
Cc: ietf-languages@iana.org, LTRU Working Group <ltru@ietf.org>
Subject: [Ltru] Re: Solving the UTF-8 problem

On Sun, Jul 01, 2007 at 03:58:48PM -0700,
 Doug Ewell <dewell@roadrunner.com> wrote 
 a message of 161 lines which said:

> 3.  UTF-8 can't be read on some, especially older, computer systems
> (Frank Ellermann, months ago, and CE Whitehead).
> With the continuing adoption of Unicode by OS and software vendors,
> I really can't get behind this argument.

Sorry, but UTF-8 adoption is far from ubiquitous. Many tools still
have problems with UTF-8. I discovered today that ht://Dig, one of the
two most common free search engines, has no UTF-8 support at all (see
http://www.htdig.org/FAQ.html#q4.27 and
http://www.htdig.org/FAQ.html#q4.10), which is quite sad for a Web
search engine (and, yes, the explanations they give are wrong, too).

Another common example is the PostScript tool a2ps.
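
To make the failure mode concrete, here is a minimal sketch in Python
(chosen only for brevity). It assumes a tool that treats every byte as
one ISO 8859-1 character; that is an assumption for illustration, not a
description of how ht://Dig or a2ps actually behave. The sample string
is a language name containing one non-ASCII letter.

```python
# Sample text with one non-ASCII letter (U+00E5), written with an
# escape so that this snippet itself stays pure ASCII.
description = "Norwegian Bokm\u00e5l"

# The bytes as they would be stored in a UTF-8 encoded file.
utf8_bytes = description.encode("utf-8")

# A UTF-8 aware tool recovers the original text.
decoded_ok = utf8_bytes.decode("utf-8")

# A tool assuming one byte per character (ISO 8859-1 here, as an
# assumption) turns the two-byte sequence into two separate characters.
decoded_wrong = utf8_bytes.decode("latin-1")

print(ascii(decoded_ok))
print(ascii(decoded_wrong))
print(len(description), len(utf8_bytes))
```

The text has 16 characters but 17 bytes in UTF-8, so any tool that
counts, truncates or indexes by bytes will also disagree with one that
works on characters.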

> It simply isn't appropriate to "dumb down" all computerized text to
> match the least capable systems that might be running somewhere.

I understand the reasoning and, yes, switching the registry to UTF-8
might be one more signal sent to software developers, to tell them
they really should upgrade. But do not claim that everything is done.

So, I basically agree that UTF-8 for the registry is better but I do
not want to see bold sentences like "Anyone but Frank Ellermann can
run a full UTF-8 environment by now". This is not true.

> This is especially true considering the language names listed above.
> We don't restrict text to uppercase to maintain compatibility with
> BCDIC and Sinclair ZX81 systems.

I'm not talking about dead systems but about programs which are live,
used and maintained.

Note from the trenches: as an implementor, I promise to follow
whatever LTRU decides and to improve my UTF-8 parsing abilities in
Haskell, should we decide to use it.
