[Ltru] Re: scripts on langtag.net

Stephane Bortzmeyer <bortzmeyer@nic.fr> Fri, 22 June 2007 07:46 UTC

Date: Fri, 22 Jun 2007 09:46:54 +0200
From: Stephane Bortzmeyer <bortzmeyer@nic.fr>
To: GerardM <gerard.meijssen@gmail.com>
Message-ID: <20070622074654.GB18927@nic.fr>
References: <41a006820706212358r4aa63497qbae1402c2b456489@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <41a006820706212358r4aa63497qbae1402c2b456489@mail.gmail.com>
X-Operating-System: Debian GNU/Linux 4.0
X-Kernel: Linux 2.6.18-4-686 i686
Organization: NIC France
X-URL: http://www.nic.fr/
User-Agent: Mutt/1.5.13 (2006-08-11)
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 5a9a1bd6c2d06a21d748b7d0070ddcb8
Cc: ltru@ietf.org
Subject: [Ltru] Re: scripts on langtag.net

[I assume you refer to

On Fri, Jun 22, 2007 at 08:58:26AM +0200,
 GerardM <gerard.meijssen@gmail.com> wrote 
 a message of 48 lines which said:

> When it is argued that the content should be in ASCII in order to be
> readable, then please make it so that the content of the website
> parses well. I get for instance "Ethiopic (Ge&#x2BB;ez) / Ethiopic
> (Ge'ez)". This is not humanly readable.

Do you prefer:

Ethiopic (Ge'ez)


If so, I can create an ASCII file in which all descriptions containing
numeric entities are omitted. There is no general way to translate
these entities to ASCII (that's why Unicode was invented, after all).
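
A minimal sketch of such a filter, in Python, assuming the record has
already been split into lines and that every non-ASCII character is
written as a numeric character reference (the function name
ascii_only_descriptions is hypothetical, not part of any existing
langtag.net tool):

```python
import re

# Matches numeric character references such as &#x2BB; or &#699;
NUMERIC_ENTITY = re.compile(r"&#(?:[xX][0-9A-Fa-f]+|[0-9]+);")


def ascii_only_descriptions(record_lines):
    """Drop every Description line that contains a numeric entity.

    All other lines (Subtag, etc.) are kept unchanged.
    """
    return [
        line
        for line in record_lines
        if not (line.startswith("Description:") and NUMERIC_ENTITY.search(line))
    ]


record = [
    "Subtag: Ethi",
    "Description: Ethiopic (Ge&#x2BB;ez)",
    "Description: Ethiopic (Ge'ez)",
]
print("\n".join(ascii_only_descriptions(record)))
```

For the Ethi record this keeps only the "Ethiopic (Ge'ez)" description.
A record whose descriptions all contain entities would be left with no
description at all.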

And note that, in scripts only, four subtags have no pure-ASCII
description (Hano, Nkoo, Lepc and Hang), so this would mean not only
selecting some descriptions but also parsing words inside the
descriptions.
[Side note: the registry seems a bit inconsistent. Why:

Subtag: Ethi
Description: Ethiopic (Ge&#x2BB;ez)
Description: Ethiopic (Ge'ez)


Subtag: Hang
Description: Hangul (Hang&#x16D;l, Hangeul)

Why not:

Subtag: Hang
Description: Hangul (Hang&#x16D;l)
Description: Hangul (Hangeul)


End of side note]

> To me this notion that UTF-8 cannot be used because it might break
> things is rather odd when I cannot properly read all the names of
> scripts on langtag.net. To me it seems that things are broken
> anyway.

I would be glad to provide plain-text UTF-8 versions of the registry as
soon as I have time to write the program. Since langtag.net is a
cooperative effort, any cooperation is welcome (probably better off-list).
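
As a starting point, a sketch of the conversion in Python, using the
standard library's html.unescape. The function name to_utf8 is
hypothetical, and note the assumption that decoding named entities such
as &amp; as well is acceptable, since html.unescape handles those too:

```python
import html


def to_utf8(registry_text):
    """Decode character references, then encode the text as UTF-8 bytes."""
    return html.unescape(registry_text).encode("utf-8")


sample = "Description: Hangul (Hang&#x16D;l, Hangeul)"
print(to_utf8(sample).decode("utf-8"))
```

The result is bytes, so it can be written directly to a file opened in
binary mode.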
