Re: [Ltru] updated demo

Hello Mark,
thank you for your helpful explanations. I have used this part of the CLDR
data before in a different context (localization of XML Schema). My point, "For
including such data easily, a common result format for language tag analysis
would be good", was that I would not need to implement the CLDR analysis
again if we had one result format for language tag analysis: I
would just fetch parts of your results (e.g. the localization bit), and you
might fetch parts of mine (e.g. the subtag proposals).

Felix

2009/6/28 Mark Davis ⌛ <mark@macchiato.com>

> Some comments on one point you raise:
>
> On Sun, Jun 28, 2009 at 13:47, Felix Sasaki <felix.sasaki@fh-potsdam.de> wrote:
> ...
>
>>
>> Yes. I guess you are using CLDR data for the localized subtag names? For
>> including such data easily, a common result format for language tag analysis
>> would be good.
>>
>
> For localization, CLDR uses the structure as you see in
> http://unicode.org/cldr/data/common/main/de.xml
>
> Language subtags illustrate this:
>
> <ldml>
> 	<localeDisplayNames>
> 		<languages>
> 			<language type="aa">Afar</language>
>
> ...
>
>
> However, the data can also be used to translate compounds, such as
>
> 			<language type="de_AT">Österreichisches Deutsch</language>
>
> 			<language type="de_CH">Schweizer Hochdeutsch</language>
>
> That means that when getting the localized name for a tag, you have to
> first try lang+script+region, then lang+script, then lang+region to see
> if there is a match, then remove the fields that matched and look up
> the rest separately.
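
A minimal sketch of the lookup order described above, in Python. The function name `lookup_language_name` and the table `LANGUAGE_NAMES` are hypothetical; the table holds only the two compound entries quoted in this mail plus an assumed plain "de" entry, and is not the real CLDR data set or any CLDR API.

```python
# Hypothetical excerpt of CLDR-style German display names (illustration only).
LANGUAGE_NAMES = {
    "de": "Deutsch",
    "de_AT": "Österreichisches Deutsch",
    "de_CH": "Schweizer Hochdeutsch",
}


def lookup_language_name(lang, script=None, region=None, names=LANGUAGE_NAMES):
    """Try the most specific compound key first.

    Returns (name, leftover), where leftover maps the fields that were
    NOT covered by the match and still need their own lookup.
    """
    fields = {"script": script, "region": region}
    # Order from the mail: lang+script+region, lang+script, lang+region, lang.
    candidates = [
        (("script", "region"), [lang, script, region]),
        (("script",), [lang, script]),
        (("region",), [lang, region]),
        ((), [lang]),
    ]
    for used, parts in candidates:
        if any(part is None for part in parts):
            continue  # this combination is not present in the tag
        key = "_".join(parts)
        if key in names:
            leftover = {k: v for k, v in fields.items()
                        if v is not None and k not in used}
            return names[key], leftover
    # No match at all: every supplied field is still unresolved.
    return None, {k: v for k, v in fields.items() if v is not None}
```

With this sample table, a tag such as de-Latn-AT would match on lang+region and leave the script to be looked up on its own.
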
>
> Scripts are similar, but don't have the compounds:
>
> 		<scripts>
> 			<script type="Arab">Arabisch</script>
>
> ...
>
> Regions use the name 'territory', having predated BCP 47:
>
> 		<territories>
> 			<territory type="001">Welt</territory>
>
> Variants are simple.
>
> 		<variants>
> ...
> 			<variant type="1994">Standardisierte Resianische Rechtschreibung</variant>
>
> ...
>
> Ideally, however, they should allow for compounds, since for goofy compound
> variant tags like sl-SI-rozaj-njiva-1994, you don't want a term like:
>
> Slovenian (Slovenia, standardized Resian orthography, Resian, gniva/njiva
> dialect)
>
> but rather something a bit more readable and less repetitious like:
>
> Slovenian (Slovenia, gniva/njiva dialect with standardized Resian
> orthography)
>
> And we don't have support for extensions yet.
>
> To put components together, there are localizable patterns:
>
> 		<localeDisplayPattern>
> 			<localePattern>{0} ({1})</localePattern>
>
> 			<localeSeparator>, </localeSeparator>
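
As a rough illustration of how the two patterns above could be applied, here is a small Python sketch. The function name `format_display_name` is hypothetical, and the pattern strings are copied from the fragment quoted here; real code would read them from the locale data.

```python
# Patterns copied from the quoted <localeDisplayPattern> fragment.
LOCALE_PATTERN = "{0} ({1})"
LOCALE_SEPARATOR = ", "


def format_display_name(language_name, qualifiers):
    """Combine a language name with already-localized qualifier names.

    {0} receives the language name; {1} receives the remaining names
    (script, region, variants) joined with the locale separator.
    """
    if not qualifiers:
        return language_name
    joined = LOCALE_SEPARATOR.join(qualifiers)
    return LOCALE_PATTERN.format(language_name, joined)
```

For example, `format_display_name("Slovenian", ["Slovenia", "gniva/njiva dialect"])` gives "Slovenian (Slovenia, gniva/njiva dialect)".
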
>
> While it might be more natural to have structure for something like the
> following, that is a real challenge for generative localization of language
> tags, because of grammatical changes required by composition in more complex
> languages.
>
> Chinese written in traditional script as used in Hong Kong
>
> There is, however, the ability to have abbreviations, like:
>
> 			<territory type="HK">Sonderverwaltungszone Hongkong</territory>
>
> 			<territory type="HK" alt="short">Hongkong</territory>
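
The following Python sketch shows one way a consumer might choose between the full and the short form. It parses a small fragment built from the territory entries quoted in this mail; the helper `territory_name` and its `prefer_short` parameter are hypothetical and not part of CLDR.

```python
import xml.etree.ElementTree as ET

# Fragment assembled from the entries quoted in this mail (illustration only).
FRAGMENT = """
<territories>
  <territory type="001">Welt</territory>
  <territory type="HK">Sonderverwaltungszone Hongkong</territory>
  <territory type="HK" alt="short">Hongkong</territory>
</territories>
"""

ROOT = ET.fromstring(FRAGMENT)


def territory_name(root, code, prefer_short=False):
    """Return the display name for a territory code, or None if absent.

    When prefer_short is true and an alt="short" entry exists, that entry
    wins; otherwise the entry without an alt attribute is used.
    """
    full = short = None
    for element in root.iter("territory"):
        if element.get("type") != code:
            continue
        if element.get("alt") == "short":
            short = element.text
        elif element.get("alt") is None:
            full = element.text
    if prefer_short and short is not None:
        return short
    return full if full is not None else short
```

A menu with limited space could call `territory_name(ROOT, "HK", prefer_short=True)`, while running text could use the default full form.
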
>
> What we don't have yet is the ability to allow different forms of names for
> different target environments. In flowing text, you might want to say
> "traditional Chinese" for "zh-Hant", but in an alphabetized menu, something
> like "Chinese, Traditional". That is, order, casing, and wording might
> change between those two environments.