Re: [Ltru] updated demo

Mark Davis ⌛ <mark@macchiato.com> Sun, 28 June 2009 21:51 UTC

In-Reply-To: <ba4134970906281347o63cb306g5df5ed06651b75e7@mail.gmail.com>
References: <30b660a20906271138o186f82a5xd2531f70806ab3be@mail.gmail.com> <ba4134970906280207td8dbdd4l8a4860f7ee4de28@mail.gmail.com> <30b660a20906281307p7324a2a4uf1a29a41d6271378@mail.gmail.com> <ba4134970906281347o63cb306g5df5ed06651b75e7@mail.gmail.com>
Date: Sun, 28 Jun 2009 14:51:32 -0700
Message-ID: <30b660a20906281451x687c2e22n614c9f6fc783a30f@mail.gmail.com>
From: Mark Davis ⌛ <mark@macchiato.com>
To: Felix Sasaki <felix.sasaki@fh-potsdam.de>
Content-Type: multipart/alternative; boundary="0016e645b9449cd5f1046d6f936f"
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] updated demo

Some comments on one point you raise:

On Sun, Jun 28, 2009 at 13:47, Felix Sasaki <felix.sasaki@fh-potsdam.de> wrote:
...

>
> Yes. I guess you are using CLDR data for the localized subtag names? For
> including such data easily, a common result format for language tag analysis
> would be good.
>

For localization, CLDR uses the structure you can see in
http://unicode.org/cldr/data/common/main/de.xml

Language subtags illustrate this:

<ldml>
	<localeDisplayNames>
		<languages>
			<language type="aa">Afar</language>
...


However, the data can also be used to translate compounds, such as

			<language type="de_AT">Österreichisches Deutsch</language>
			<language type="de_CH">Schweizer Hochdeutsch</language>

That means that when getting the localized name for a tag, you first have to
try lang+script+region, then lang+script, then lang+region to see whether any
of them match, and then remove the fields that were matched before looking up
the rest.
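
A minimal sketch of that lookup order in Python, to make it concrete. The
function name and the dictionary are hypothetical, not part of CLDR or any
library; the "aa", "de_AT" and "de_CH" names are the ones quoted above, and
"de" is added only for illustration.

```python
# Sample name table. The "aa", "de_AT" and "de_CH" entries are from de.xml as
# quoted above; "de" is added only for illustration.
LANGUAGE_NAMES = {
    "aa": "Afar",
    "de": "Deutsch",
    "de_AT": "Österreichisches Deutsch",
    "de_CH": "Schweizer Hochdeutsch",
}


def lookup_language_name(lang, script=None, region=None, names=LANGUAGE_NAMES):
    """Hypothetical helper: find the longest compound name for a tag.

    Returns (name, leftover), where leftover holds the script/region fields
    that the match did not consume and that still need their own lookup.
    """
    fields = {"script": script, "region": region}
    # Candidate keys in the order described: lang+script+region,
    # lang+script, lang+region, and finally the bare language.
    candidates = []
    if script and region:
        candidates.append((f"{lang}_{script}_{region}", ("script", "region")))
    if script:
        candidates.append((f"{lang}_{script}", ("script",)))
    if region:
        candidates.append((f"{lang}_{region}", ("region",)))
    candidates.append((lang, ()))

    for key, consumed in candidates:
        if key in names:
            leftover = {k: v for k, v in fields.items() if v and k not in consumed}
            return names[key], leftover
    # Nothing matched, not even the bare language subtag.
    return None, {k: v for k, v in fields.items() if v}
```

With this table, a tag like de-Latn-CH matches on "de_CH" and leaves the
script behind to be looked up separately in the script names.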

Scripts are similar, but don't have the compounds:

		<scripts>
			<script type="Arab">Arabisch</script>
...

Regions use the name 'territory', having predated BCP 47:

		<territories>
			<territory type="001">Welt</territory>

Variants are simple:

		<variants>
...
			<variant type="1994">Standardisierte Resianische Rechtschreibung</variant>
...

Ideally, however, variants should allow for compounds too, since for goofy
compound variant tags like sl-SI-rozaj-njiva-1994 you don't want a term like:

Slovenian (Slovenia, standardized resian orthography, resian, gniva/njiva
dialect)

but rather something a bit more readable and less repetitious like:

Slovenian (Slovenia, gniva/njiva dialect with standardized resian
orthography)

And we don't have support for extensions yet.

To put components together, there are localizable patterns:

		<localeDisplayPattern>
			<localePattern>{0} ({1})</localePattern>
			<localeSeparator>, </localeSeparator>
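
As a rough sketch of how those two values combine, assuming the pattern and
separator are exactly the strings quoted above and that the pattern contains
nothing beyond the {0} and {1} placeholders (the function name is
hypothetical):

```python
# Values taken from the localeDisplayPattern snippet quoted above.
LOCALE_PATTERN = "{0} ({1})"
LOCALE_SEPARATOR = ", "


def format_locale_name(main, qualifiers,
                       pattern=LOCALE_PATTERN, separator=LOCALE_SEPARATOR):
    """Hypothetical helper: join the qualifier names and wrap them around main.

    main is the localized language (or compound) name; qualifiers are the
    already-localized names of the remaining fields, in display order.
    """
    if not qualifiers:
        # Nothing to put in parentheses, so the pattern is not applied.
        return main
    joined = separator.join(qualifiers)
    # {0} receives the main name, {1} the joined qualifier list.
    return pattern.format(main, joined)
```

For instance, format_locale_name("Afar", ["Arabisch", "Welt"]) gives
"Afar (Arabisch, Welt)", and with an empty qualifier list just "Afar".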

It might seem more natural to have structure that generates a phrase like
the following:

Chinese written in traditional script as used in Hong Kong

That, however, is a real challenge for generative localization of language
tags, because composition requires grammatical changes in more complex
languages.

There is, however, the ability to have abbreviations, like:

			<territory type="HK">Sonderverwaltungszone Hongkong</territory>
			<territory type="HK" alt="short">Hongkong</territory>
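
A small sketch of picking between the two forms, using Python's standard
xml.etree.ElementTree. The sample wraps the territory elements quoted in this
message in a bare <territories> element; the function name and the fallback
behaviour (use the full name when no short one exists) are assumptions made
for illustration.

```python
import xml.etree.ElementTree as ET

# Territory entries as quoted in this message, wrapped for parsing.
SAMPLE = """<territories>
  <territory type="001">Welt</territory>
  <territory type="HK">Sonderverwaltungszone Hongkong</territory>
  <territory type="HK" alt="short">Hongkong</territory>
</territories>"""


def territory_name(root, code, prefer_short=False):
    """Hypothetical helper: return the full or alt="short" name for a code."""
    full = short = None
    for el in root.iter("territory"):
        if el.get("type") != code:
            continue
        if el.get("alt") == "short":
            short = el.text
        elif el.get("alt") is None:
            full = el.text
    # Assumed fallback: if no short form is present, use the full name.
    if prefer_short and short is not None:
        return short
    return full


root = ET.fromstring(SAMPLE)
```

Here territory_name(root, "HK", prefer_short=True) picks up the abbreviation,
while a code with no short form, such as "001", falls back to its only name.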

What we don't have yet is the ability to allow different forms of names for
different target environments. In flowing text, you might want to say
"traditional Chinese" for "zh-Hant", but in an alphabetized menu, something
like "Chinese, Traditional". That is, order, casing, and wording might
change between those two environments.