Re: [Ltru] updated demo

Felix Sasaki <felix.sasaki@fh-potsdam.de> Sun, 28 June 2009 23:38 UTC

From: Felix Sasaki <felix.sasaki@fh-potsdam.de>
To: Mark Davis ⌛ <mark@macchiato.com>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] updated demo

Hello Mark,
thank you for your helpful explanations. I have used this part of the CLDR
data before in a different context (localization of XML Schema). My point,
"For including such data easily, a common result format for language tag
analysis would be good", was that I would not need to implement the CLDR
analysis again if we had a single result format for language tag analysis. I
would just fetch parts of your results (e.g. the localization bit), and you
might fetch parts of mine (e.g. the subtag proposals).

Felix


2009/6/28 Mark Davis ⌛ <mark@macchiato.com>

> Some comments on one point you raise:
>
> On Sun, Jun 28, 2009 at 13:47, Felix Sasaki <felix.sasaki@fh-potsdam.de> wrote:
> ...
>
>>
>> Yes. I guess you are using CLDR data for the localized subtag names? For
>> including such data easily, a common result format for language tag analysis
>> would be good.
>>
>
> For localization, CLDR uses the structure as you see in
> http://unicode.org/cldr/data/common/main/de.xml
>
> Language subtags illustrate this:
>
> <ldml>
> 	<localeDisplayNames>
> 		<languages>
> 			<language type="aa">Afar</language>
>
> ...
>
>
> However, the data can also be used to translate compounds, such as
>
> 			<language type="de_AT">Österreichisches Deutsch</language>
>
> 			<language type="de_CH">Schweizer Hochdeutsch</language>
>
> That means that when getting the localized name for a tag, you have to
> first try lang+script+region, then lang+script, then lang+region to see
> whether any of them match, then drop the fields that matched and look up
> the rest separately.
>
> Scripts are similar, but don't have the compounds:
>
> 		<scripts>
> 			<script type="Arab">Arabisch</script>
>
> ...
>
> Regions use the name 'territory', having predated BCP 47:
>
> 		<territories>
> 			<territory type="001">Welt</territory>
>
> Variants are simple.
>
> 		<variants>
> ...
> 			<variant type="1994">Standardisierte Resianische Rechtschreibung</variant>
>
> ...
>
> Ideally, however, they should allow for compounds, since for goofy compound
> variant tags like sl-SI-rozaj-njiva-1994, you don't want a term like:
>
> Slovenian (Slovenia, standardized Resian orthography, Resian, Gniva/Njiva
> dialect)
>
> but rather something a bit more readable and less repetitious like:
>
> Slovenian (Slovenia, Gniva/Njiva dialect with standardized Resian
> orthography)
>
> And we don't have support for extensions yet.
>
> To put components together, there are localizable patterns:
>
> 		<localeDisplayPattern>
> 			<localePattern>{0} ({1})</localePattern>
>
> 			<localeSeparator>, </localeSeparator>
>
> While it might be more natural to have structure for something like the
> following, that is a real challenge for generative localization of language
> tags, because of grammatical changes required by composition in more complex
> languages.
>
> Chinese written in traditional script as used in Hong Kong
>
> There is, however, the ability to have abbreviations, like:
>
> 			<territory type="HK">Sonderverwaltungszone Hongkong</territory>
>
> 			<territory type="HK" alt="short">Hongkong</territory>
>
> What we don't have yet is the ability to allow different forms of names for
> different target environments. In flowing text, you might want to say
> "traditional Chinese" for "zh-Hant", but in an alphabetized menu, something
> like "Chinese, Traditional". That is, order, casing, and wording might
> change between those two environments.
>
>
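
For readers who want to try the lookup order Mark describes above (compound
language entries first, then the remaining fields, joined with the
localizable pattern), here is a minimal sketch in Python. It is only an
illustration under stated assumptions, not CLDR's or ICU's actual
implementation. The helper names `load_names` and `display_name` are
hypothetical. The inline LDML fragment contains only entries quoted in this
thread, which is why the second example uses the unlikely combination
aa + Arab + HK. Skipping alt="short" entries and falling back to the raw
subtag when no name is found are assumptions of the sketch.

```python
import xml.etree.ElementTree as ET

# Inline LDML fragment built only from entries quoted in this thread.
LDML = """\
<ldml>
  <localeDisplayNames>
    <localeDisplayPattern>
      <localePattern>{0} ({1})</localePattern>
      <localeSeparator>, </localeSeparator>
    </localeDisplayPattern>
    <languages>
      <language type="aa">Afar</language>
      <language type="de_AT">Österreichisches Deutsch</language>
      <language type="de_CH">Schweizer Hochdeutsch</language>
    </languages>
    <scripts>
      <script type="Arab">Arabisch</script>
    </scripts>
    <territories>
      <territory type="001">Welt</territory>
      <territory type="HK">Sonderverwaltungszone Hongkong</territory>
      <territory type="HK" alt="short">Hongkong</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def load_names(parent, path):
    """Map type -> display name, ignoring alternate forms (assumption)."""
    return {e.get("type"): e.text
            for e in parent.iterfind(path)
            if e.get("alt") is None}


def display_name(root, lang, script=None, region=None):
    """Hypothetical helper: localized name for lang[-script][-region]."""
    names = root.find("localeDisplayNames")
    languages = load_names(names, "languages/language")
    scripts = load_names(names, "scripts/script")
    territories = load_names(names, "territories/territory")
    pattern = names.findtext("localeDisplayPattern/localePattern")
    separator = names.findtext("localeDisplayPattern/localeSeparator")

    # Compound keys in the order given in the message, each paired with
    # the fields that a match on that key accounts for.
    candidates = []
    if script and region:
        candidates.append((f"{lang}_{script}_{region}", {"script", "region"}))
    if script:
        candidates.append((f"{lang}_{script}", {"script"}))
    if region:
        candidates.append((f"{lang}_{region}", {"region"}))
    candidates.append((lang, set()))

    for key, consumed in candidates:
        if key in languages:
            base = languages[key]
            break
    else:
        # Assumption: with no localized name, show the raw language subtag.
        base, consumed = lang, set()

    # Look up whatever the compound match did not already cover.
    rest = []
    if script and "script" not in consumed:
        rest.append(scripts.get(script, script))
    if region and "region" not in consumed:
        rest.append(territories.get(region, region))
    if not rest:
        return base
    return pattern.replace("{0}", base).replace("{1}", separator.join(rest))


root = ET.fromstring(LDML)
print(display_name(root, "de", region="AT"))
print(display_name(root, "aa", "Arab", "HK"))
```

The first call is answered entirely by the compound entry de_AT, so no
pattern is applied. The second finds no compound, so the language, script
and territory names are looked up individually and combined through
localePattern and localeSeparator. Variants, extensions and the alt="short"
forms are deliberately left out.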