Re: [Ltru] updated demo

It does look similar. After looking them over, I think sometimes one is
better and sometimes the other is. The differences I can see (other than UI)
are:

*Feedback on ill-formed, invalid, or non-preferred values.*

de-x

   - http://unicode.org/cldr/utility/languageid.jsp?a=de-x shows the tag
   where the problem lies, but not potential fixes. (I use a regex based on the
   ABNF for well-formedness, and if it fails I just show the tag where the
   problem is.)
   - http://www.w3.org/2008/05/lta/language-tags/q?input=de-x shows what the
   potential subtags at that point might be.

iw-su

   - http://unicode.org/cldr/utility/languageid.jsp?a=iw-su&l=en shows the
   replacement values for iw and su.
   - http://www.w3.org/2008/05/lta/language-tags/q?input=iw-su&output=html&hl=en
   just says they are valid. It does show all the registry information, like
   when the code was added.

eng-840

   - http://unicode.org/cldr/utility/languageid.jsp?a=eng-840&l=en shows the
   replacements for the wrong choice of source code: a 3-letter language code
   when a 2-letter one exists (common in the field), and a 3-digit region code
   when a 2-letter one exists.
   - http://www.w3.org/2008/05/lta/language-tags/q?input=eng-840&output=html&hl=en
   just says they are invalid.

*Localization:*

sl-Cyrl-YU - Arabic, German

   - http://unicode.org/cldr/utility/languageid.jsp?a=sl-Cyrl-YU&l=ar and
   http://unicode.org/cldr/utility/languageid.jsp?a=sl-Cyrl-YU&l=de show
   localized subtag names.
   - http://www.w3.org/2008/05/lta/language-tags/q?input=sl-Cyrl-YU&output=html&hl=ar
   omits text;
   http://www.w3.org/2008/05/lta/language-tags/q?input=sl-Cyrl-YU&output=html&hl=de
   has a localized UI, but not localized subtag names.

*Prefix Warnings*

en-cmn-rozaj

   - http://unicode.org/cldr/utility/languageid.jsp?a=en-cmn&l=ar doesn't
   give a warning (it just applies strict validity).
   - http://www.w3.org/2008/05/lta/language-tags/q?input=en-cmn-rozaj&output=html&hl=en
   does supply warnings for missing variant prefixes.
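
To make the prefix check concrete, here is a rough Python sketch of what such a warning could look like. It is not how either tool is implemented. The function name and the two-entry table are hypothetical; the table holds only the Prefix fields for rozaj (sl) and 1901 (de), copied by hand, where a real check would read the whole subtag registry.

```python
# Hypothetical, hand-copied excerpt of variant Prefix fields from the registry.
VARIANT_PREFIXES = {
    "rozaj": ["sl"],
    "1901": ["de"],
}


def prefix_warnings(tag):
    """Return one warning per known variant whose registered prefix is missing."""
    subtags = tag.lower().replace("_", "-").split("-")
    warnings = []
    for i, subtag in enumerate(subtags):
        prefixes = VARIANT_PREFIXES.get(subtag)
        if i == 0 or prefixes is None:
            continue  # not a variant this sketch knows about
        before = "-".join(subtags[:i])
        # The text before the variant must equal or start with a registered prefix.
        if not any(before == p or before.startswith(p + "-") for p in prefixes):
            warnings.append(
                "variant '%s' expects prefix %s" % (subtag, " or ".join(prefixes))
            )
    return warnings


print(prefix_warnings("en-cmn-rozaj"))
print(prefix_warnings("sl-rozaj"))
```

Returning warnings rather than rejecting the tag keeps this separate from strict validity.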

*Canonical Form*

sl-cyrl-Yu-rozaj-Solba-1994-b-1234-a-Foobar-x-b-1234-a-Foobar

   - http://unicode.org/cldr/utility/languageid.jsp?a=sl-cyrl-Yu-rozaj-Solba-1994-b-1234-a-Foobar-x-b-1234-a-Foobar&l=en
   puts the results in canonical casing and order (and shows canonical
   replacements). It does not validate extensions, like "b-1234". (It follows
   LDML canonical order for variants - alphabetical.)
   - http://www.w3.org/2008/05/lta/language-tags/q?input=sl-Cyrl-YU-rozaj-solba-1994-b-1234-a-Foobar-x-b-1234-a-Foobar&output=html&hl=en
   doesn't. It also gives a validation error on extensions.
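
As a rough illustration of "canonical casing and order", here is a small Python sketch. It covers only the language, script, region and variant subtags: lowercase language, titlecase script, uppercase region, variants in alphabetical order. Lowercasing the variants is my assumption (the usual RFC 4646 convention), and the function name is hypothetical. It does no replacements of deprecated subtags and ignores extensions and private use, so it is not what the demo actually runs.

```python
def canonical_case(language, script=None, region=None, variants=()):
    """Recase the main subtags and sort the variants alphabetically."""
    parts = [language.lower()]
    if script:
        parts.append(script.title())  # e.g. "cyrl" -> "Cyrl"
    if region:
        parts.append(region.upper())  # e.g. "Yu" -> "YU"
    # Assumption: variants are lowercased before sorting.
    parts.extend(sorted(v.lower() for v in variants))
    return "-".join(parts)


print(canonical_case("sl", "cyrl", "Yu", ["rozaj", "Solba", "1994"]))
# prints: sl-Cyrl-YU-1994-rozaj-solba
```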

Validating extensions is debatable - the validity of these is established
outside of the spec and the IANA subtag registry. Probably best would be
neither of the above: a warning, not an error.

Note that http://unicode.org/cldr/utility/languageid.jsp says
"suggested canonical form", since in the case of multiple replacements
it doesn't try to pick the best one. E.g., the best guess for ru-SU is ru-RU,
but the best guess for az-SU would be az-AZ. It also doesn't try to find
missing prefix values for variants; that's probably of such low frequency
that it doesn't pay.

*Completeness*

   - http://unicode.org/cldr/utility/languageid.jsp?a=i-default&l=en doesn't
   allow grandfathered codes. (Following LDML.)
   - http://www.w3.org/2008/05/lta/language-tags/q?input=i-default does.

FYI, the regex the Unicode demo uses is:

      (?: ( [a-z A-Z]{2,8} | [a-z A-Z]{2,3} [-_] [a-z A-Z]{3} )
      (?: [-_] ( [a-z A-Z]{4} ) )?
      (?: [-_] ( [a-z A-Z]{2} | [0-9]{3} ) )?
      (?: [-_] ( (?: [0-9 a-z A-Z]{5,8} | [0-9] [0-9 a-z A-Z]{3} ) (?: [-_] (?: [0-9 a-z A-Z]{5,8} | [0-9] [0-9 a-z A-Z]{3} ) )* ) )?
      (?: [-_] ( [a-w y-z A-W Y-Z] (?: [-_] [0-9 a-z A-Z]{2,8} )+ (?: [-_] [a-w y-z A-W Y-Z] (?: [-_] [0-9 a-z A-Z]{2,8} )+ )* ) )?
      (?: [-_] ( [xX] (?: [-_] [0-9 a-z A-Z]{1,8} )+ ) )? )
    | ( [xX] (?: [-_] [0-9 a-z A-Z]{1,8} )+ )
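
For anyone who wants to try the pattern outside the demo, here is a minimal Python sketch. One caveat: Python's re.VERBOSE keeps whitespace inside character classes, so the sketch strips all whitespace from the pattern before compiling (safe here, since none of the spaces are meant literally). The function name and the group labels are my own reading of the seven capture groups, not names used by the demo.

```python
import re

# The pattern from the message; the whitespace is for readability only.
SPACED_PATTERN = r"""
  (?: ( [a-z A-Z]{2,8} | [a-z A-Z]{2,3} [-_] [a-z A-Z]{3} )
  (?: [-_] ( [a-z A-Z]{4} ) )?
  (?: [-_] ( [a-z A-Z]{2} | [0-9]{3} ) )?
  (?: [-_] ( (?: [0-9 a-z A-Z]{5,8} | [0-9] [0-9 a-z A-Z]{3} )
      (?: [-_] (?: [0-9 a-z A-Z]{5,8} | [0-9] [0-9 a-z A-Z]{3} ) )* ) )?
  (?: [-_] ( [a-w y-z A-W Y-Z] (?: [-_] [0-9 a-z A-Z]{2,8} )+
      (?: [-_] [a-w y-z A-W Y-Z] (?: [-_] [0-9 a-z A-Z]{2,8} )+ )* ) )?
  (?: [-_] ( [xX] (?: [-_] [0-9 a-z A-Z]{1,8} )+ ) )? )
| ( [xX] (?: [-_] [0-9 a-z A-Z]{1,8} )+ )
"""

# Remove every whitespace character, then compile the compact pattern.
WELL_FORMED = re.compile("".join(SPACED_PATTERN.split()))

# Hypothetical labels for capture groups 1..7, in order.
GROUP_NAMES = ("language", "script", "region", "variants",
               "extensions", "private_use", "private_only")


def parse_tag(tag):
    """Return the matched pieces as a dict, or None if the tag is ill-formed."""
    match = WELL_FORMED.fullmatch(tag)
    if match is None:
        return None
    return {name: value
            for name, value in zip(GROUP_NAMES, match.groups())
            if value is not None}


for sample in ("de-x", "eng-840", "en-cmn-rozaj", "i-default"):
    print(sample, "->", parse_tag(sample))
```

This only tests well-formedness: eng-840 matches even though its subtags are not the preferred ones, while de-x and i-default do not match at all.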

Mark

On Sun, Jun 28, 2009 at 02:07, Felix Sasaki <felix.sasaki@fh-potsdam.de> wrote:

> Hello Mark,
>
> this looks similar to
> http://www.w3.org/2008/05/lta/
> (my language tag parser, currently based on draft 21 of rfc4646bis). lta also
> contains some error checking mechanisms; see examples like
>
> http://www.w3.org/2008/05/lta/language-tags/q?input=de-x
> http://www.w3.org/2008/05/lta/language-tags/q?input=xa
> http://www.w3.org/2008/05/lta/language-tags/q?input=en-latn
> http://www.w3.org/2008/05/lta/language-tags/q?input=ja-1901
> http://www.w3.org/2008/05/lta/language-tags/q?input=fr-cmn
> http://www.w3.org/2008/05/lta/language-tags/q?input=zh-cmn-cmn
> http://www.w3.org/2008/05/lta/language-tags/q?input=zh-cmn-a-bbb-a-ccc
> http://www.w3.org/2008/05/lta/language-tags/q?input=de-de-1901-1901
>
> Output is available in HTML with a German or English UI, and in an XML
> format, see e.g.
>
> http://www.w3.org/2008/05/lta/language-tags/q?input=de-de-1901-1901&output=xml
>
> My comment on your tool is that to co-ordinate such efforts it would be
> great to have a common machine-readable output format for language tag
> parsing, also e.g. to deal with error descriptions like
>
>    <lta:variant>
>       <lta:subtag>1901</lta:subtag>
>       <lta:registryInfo>
>          <lta:var ty="variant" su="1901" ad="2005-10-16">
>             <lta:ds>Traditional German orthography</lta:ds>
>             <lta:pref>de</lta:pref>
>          </lta:var>
>       </lta:registryInfo>
>       <lta:matchedPrefix>de</lta:matchedPrefix>
>       <lta:error type="e007">
>          <lta:errorText>Variant repetition</lta:errorText>
>          <lta:errorAddInfo>
>             <lta:subtag>1901</lta:subtag>
>          </lta:errorAddInfo>
>       </lta:error>
>    </lta:variant>
>
>
> Felix
>
> 2009/6/27 Mark Davis ⌛ <mark@macchiato.com>
>
>> I updated the demo at http://unicode.org/cldr/utility/languageid.jsp to
>> parse extlangs. The samples include official languages and the scripts they
>> use (based on CLDR data), and the names have localizations where available.
>>
>> Comments welcome.
>>
>> Mark