Re: [Ltru] updated demo
Mark Davis ⌛ <mark@macchiato.com> Sun, 28 June 2009 20:07 UTC
In-Reply-To: <ba4134970906280207td8dbdd4l8a4860f7ee4de28@mail.gmail.com>
References: <30b660a20906271138o186f82a5xd2531f70806ab3be@mail.gmail.com> <ba4134970906280207td8dbdd4l8a4860f7ee4de28@mail.gmail.com>
Date: Sun, 28 Jun 2009 13:07:24 -0700
Message-ID: <30b660a20906281307p7324a2a4uf1a29a41d6271378@mail.gmail.com>
From: Mark Davis ⌛ <mark@macchiato.com>
To: Felix Sasaki <felix.sasaki@fh-potsdam.de>
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] updated demo
It does look similar. After looking them over, I think sometimes one is better and sometimes the other is. The differences I can see (other than UI) are:

*Feedback on ill-formed, invalid, or non-preferred values*

de-x
- http://unicode.org/cldr/utility/languageid.jsp?a=de-x shows the tag where the problem lies, but not potential fixes. (I use a regex based on the ABNF for well-formedness, and if it fails I just show the tag where the problem is.)
- http://www.w3.org/2008/05/lta/language-tags/q?input=de-x shows what the potential subtags at that point might be.

iw-su
- http://unicode.org/cldr/utility/languageid.jsp?a=iw-su&l=en shows the replacement values for iw and su.
- http://www.w3.org/2008/05/lta/language-tags/q?input=iw-su&output=html&hl=en just says they are valid. It does show all the registry information, such as when the code was added.

eng-840
- http://unicode.org/cldr/utility/languageid.jsp?a=eng-840&l=en shows the replacements for the wrong choice of source code: a 3-letter language code where a 2-letter code exists (common in the field), and a 3-digit region code where a 2-letter code exists.
- http://www.w3.org/2008/05/lta/language-tags/q?input=eng-840&output=html&hl=en just says they are invalid.

*Localization*

sl-Cyrl-YU (Arabic, German)
- http://unicode.org/cldr/utility/languageid.jsp?a=sl-Cyrl-YU&l=ar and http://unicode.org/cldr/utility/languageid.jsp?a=sl-Cyrl-YU&l=de show localized subtag names.
- http://www.w3.org/2008/05/lta/language-tags/q?input=sl-Cyrl-YU&output=html&hl=ar omits text; http://www.w3.org/2008/05/lta/language-tags/q?input=sl-Cyrl-YU&output=html&hl=de has a localized UI, but not localized subtag names.

*Prefix Warnings*

en-cmn-rozaj
- http://unicode.org/cldr/utility/languageid.jsp?a=en-cmn&l=ar doesn't give a warning (it just applies strict validity).
- http://www.w3.org/2008/05/lta/language-tags/q?input=en-cmn-rozaj&output=html&hl=en does supply warnings for missing variant prefixes.
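The replacement suggestions and prefix warnings compared above can be approximated with a small Python sketch. It is not the code behind either tool: check_tag and its three lookup tables are hypothetical, and the tables hold only the examples from this message (iw to he, eng to en, 840 to US, plus the Prefix values for rozaj and 1901). A real checker would load them from the IANA Language Subtag Registry and CLDR alias data. The sketch also assumes the tag is already well-formed and uses "-" as the separator.

```python
from __future__ import annotations

# Hypothetical excerpts for illustration only; a real checker would load these
# from the IANA Language Subtag Registry and CLDR alias data.
LANGUAGE_ALIASES = {
    "iw": "he",   # deprecated language subtag and its Preferred-Value
    "eng": "en",  # 3-letter code used where a 2-letter code exists
}
REGION_ALIASES = {
    "840": "US",  # 3-digit UN M.49 code used where a 2-letter code exists
}
VARIANT_PREFIXES = {
    "rozaj": ["sl"],  # Resian
    "1901": ["de"],   # Traditional German orthography
}


def check_tag(tag: str) -> list[str]:
    """Return replacement suggestions and prefix warnings for a tag.

    Assumes the tag is already well-formed and '-'-separated.
    """
    subtags = tag.split("-")
    messages = []

    # Only the first subtag is treated as the language subtag.
    lang = subtags[0].lower()
    if lang in LANGUAGE_ALIASES:
        messages.append(
            f"replace language '{subtags[0]}' with '{LANGUAGE_ALIASES[lang]}'")

    for sub in subtags[1:]:
        low = sub.lower()
        if len(low) == 1:
            break  # extensions and private use are not checked here
        if low in REGION_ALIASES:
            messages.append(
                f"replace region '{sub}' with '{REGION_ALIASES[low]}'")
        if low in VARIANT_PREFIXES:
            prefixes = VARIANT_PREFIXES[low]
            # Simplified: the Prefix counts as satisfied if the tag starts with it.
            if not any(tag.lower().startswith(p + "-") for p in prefixes):
                messages.append(
                    f"variant '{sub}' expects prefix {' or '.join(prefixes)}")
    return messages


if __name__ == "__main__":
    for t in ["iw-su", "eng-840", "en-cmn-rozaj", "de-1901"]:
        print(t, check_tag(t))
```

With these tables, check_tag("eng-840") suggests one replacement per subtag, check_tag("en-cmn-rozaj") warns that rozaj expects sl, and check_tag("de-1901") returns an empty list. The deprecated region SU from the iw-su example is deliberately left out, since (as noted for ru-SU and az-SU) it has more than one plausible replacement.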
*Canonical Form*

sl-cyrl-Yu-rozaj-Solba-1994-b-1234-a-Foobar-x-b-1234-a-Foobar
- http://unicode.org/cldr/utility/languageid.jsp?a=sl-cyrl-Yu-rozaj-Solba-1994-b-1234-a-Foobar-x-b-1234-a-Foobar&l=en puts the results in canonical casing and order (and shows canonical replacements). It does not validate extensions such as "b-1234". (It follows LDML canonical order for variants, which is alphabetical.)
- http://www.w3.org/2008/05/lta/language-tags/q?input=sl-Cyrl-YU-rozaj-solba-1994-b-1234-a-Foobar-x-b-1234-a-Foobar&output=html&hl=en doesn't. It also gives a validation error on extensions.

Validating extensions is debatable: their validity is established outside of the spec and the IANA Language Subtag Registry. Probably best would be neither of the above: a warning, not an error.

Note that http://unicode.org/cldr/utility/languageid.jsp says "suggested canonical form", since in the case of multiple replacements it doesn't try to pick the best one. For example, the best guess for ru-SU is ru-RU, but the best guess for az-SU would be az-AZ. It also doesn't try to find missing prefix values for variants; that's probably of such low frequency that it doesn't pay.

*Completeness*

- http://unicode.org/cldr/utility/languageid.jsp?a=i-default&l=en doesn't allow grandfathered codes (following LDML).
- http://www.w3.org/2008/05/lta/language-tags/q?input=i-default does.

FYI, the regex my demo uses is:

    (?: ( [a-z A-Z]{2,8} | [a-z A-Z]{2,3} [-_] [a-z A-Z]{3} )
        (?: [-_] ( [a-z A-Z]{4} ) )?
        (?: [-_] ( [a-z A-Z]{2} | [0-9]{3} ) )?
        (?: [-_] ( (?: [0-9 a-z A-Z]{5,8} | [0-9] [0-9 a-z A-Z]{3} )
                   (?: [-_] (?: [0-9 a-z A-Z]{5,8} | [0-9] [0-9 a-z A-Z]{3} ) )* ) )?
        (?: [-_] ( [a-w y-z A-W Y-Z] (?: [-_] [0-9 a-z A-Z]{2,8} )+
                   (?: [-_] [a-w y-z A-W Y-Z] (?: [-_] [0-9 a-z A-Z]{2,8} )+ )* ) )?
        (?: [-_] ( [xX] (?: [-_] [0-9 a-z A-Z]{1,8} )+ ) )?
    ) | ( [xX] (?: [-_] [0-9 a-z A-Z]{1,8} )+ )

Mark

On Sun, Jun 28, 2009 at 02:07, Felix Sasaki <felix.sasaki@fh-potsdam.de> wrote:

> Hello Mark,
>
> this looks similar to
> http://www.w3.org/2008/05/lta/
> my language tag parser currently based on draft 21 of rfc4646bis. lta also
> contains some error checking mechanisms, see examples like
>
> http://www.w3.org/2008/05/lta/language-tags/q?input=de-x
> http://www.w3.org/2008/05/lta/language-tags/q?input=xa
> http://www.w3.org/2008/05/lta/language-tags/q?input=en-latn
> http://www.w3.org/2008/05/lta/language-tags/q?input=ja-1901
> http://www.w3.org/2008/05/lta/language-tags/q?input=fr-cmn
> http://www.w3.org/2008/05/lta/language-tags/q?input=zh-cmn-cmn
> http://www.w3.org/2008/05/lta/language-tags/q?input=zh-cmn-a-bbb-a-ccc
> http://www.w3.org/2008/05/lta/language-tags/q?input=de-de-1901-1901
>
> Output is available in HTML with German UI and English, and in an XML
> format, see e.g.
>
> http://www.w3.org/2008/05/lta/language-tags/q?input=de-de-1901-1901&output=xml
>
> My comment on your tool is that to co-ordinate such efforts it would be
> great to have a common machine-readable output format for language tag
> parsing, also e.g. to deal with error descriptions like
>
> <lta:variant>
>   <lta:subtag>1901</lta:subtag>
>   <lta:registryInfo>
>     <lta:var ty="variant" su="1901" ad="2005-10-16">
>       <lta:ds>Traditional German orthography</lta:ds>
>       <lta:pref>de</lta:pref>
>     </lta:var>
>   </lta:registryInfo>
>   <lta:matchedPrefix>de</lta:matchedPrefix>
>   <lta:error type="e007">
>     <lta:errorText>Variant repetition</lta:errorText>
>     <lta:errorAddInfo>
>       <lta:subtag>1901</lta:subtag>
>     </lta:errorAddInfo>
>   </lta:error>
> </lta:variant>
>
> Felix
>
> 2009/6/27 Mark Davis ⌛ <mark@macchiato.com>
>
>> I updated the demo at http://unicode.org/cldr/utility/languageid.jsp to
>> parse extlangs.
>> The samples include official languages and the scripts they
>> use (based on CLDR data), and the names have localizations where available.
>>
>> Comments welcome.
>>
>> Mark
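The well-formedness check and the canonical casing and ordering described in the reply above can be tried with a short Python sketch. This is not the demo's code: canonicalize is a hypothetical helper; the pattern is the one quoted in the message with all whitespace stripped, on the assumption that it was written for a free-spacing regex mode (Python's re.VERBOSE would still treat spaces inside [...] as literal); variants are sorted alphabetically, as the message says LDML does; extension sequences are ordered by singleton, following RFC 4646 canonicalization; and no registry replacements (such as for YU) are applied.

```python
import re
from typing import Optional

# Pattern as quoted in the message. Assumption: it targets a free-spacing
# regex mode, so all whitespace is stripped before compiling (Python's
# re.VERBOSE would keep spaces inside [...] as literal characters).
_QUOTED = r"""
(?: ( [a-z A-Z]{2,8} | [a-z A-Z]{2,3} [-_] [a-z A-Z]{3} )
    (?: [-_] ( [a-z A-Z]{4} ) )?
    (?: [-_] ( [a-z A-Z]{2} | [0-9]{3} ) )?
    (?: [-_] ( (?: [0-9 a-z A-Z]{5,8} | [0-9] [0-9 a-z A-Z]{3} )
               (?: [-_] (?: [0-9 a-z A-Z]{5,8} | [0-9] [0-9 a-z A-Z]{3} ) )* ) )?
    (?: [-_] ( [a-w y-z A-W Y-Z] (?: [-_] [0-9 a-z A-Z]{2,8} )+
               (?: [-_] [a-w y-z A-W Y-Z] (?: [-_] [0-9 a-z A-Z]{2,8} )+ )* ) )?
    (?: [-_] ( [xX] (?: [-_] [0-9 a-z A-Z]{1,8} )+ ) )?
) | ( [xX] (?: [-_] [0-9 a-z A-Z]{1,8} )+ )
"""
LANGUAGE_TAG = re.compile(re.sub(r"\s+", "", _QUOTED))


def _subtags(s: str) -> list:
    """Split on either separator the pattern accepts."""
    return re.split(r"[-_]", s)


def canonicalize(tag: str) -> Optional[str]:
    """Hypothetical helper: canonical casing and ordering, or None if the
    quoted pattern rejects the tag. No registry replacements are applied."""
    m = LANGUAGE_TAG.fullmatch(tag)
    if m is None:
        return None
    lang, script, region, variants, exts, private, private_only = m.groups()
    if private_only is not None:          # e.g. "x-foo"
        return "-".join(_subtags(private_only.lower()))

    parts = _subtags(lang.lower())        # language (plus extlang)
    if script:
        parts.append(script.title())      # "cyrl" -> "Cyrl"
    if region:
        parts.append(region.upper())      # "Yu" -> "YU"
    if variants:
        # Alphabetical variant order, as the message says LDML uses.
        parts.extend(sorted(_subtags(variants.lower())))
    if exts:
        sequences = []
        for sub in _subtags(exts.lower()):
            if len(sub) == 1:
                sequences.append([sub])   # a singleton starts a new sequence
            else:
                sequences[-1].append(sub)
        # Order extension sequences by their singleton.
        for seq in sorted(sequences, key=lambda s: s[0]):
            parts.extend(seq)
    if private:
        parts.extend(_subtags(private.lower()))  # order is kept as-is
    return "-".join(parts)
```

For the Canonical Form example, canonicalize("sl-cyrl-Yu-rozaj-Solba-1994-b-1234-a-Foobar-x-b-1234-a-Foobar") returns "sl-Cyrl-YU-1994-rozaj-solba-a-foobar-b-1234-x-b-1234-a-foobar", while canonicalize("de-x") and canonicalize("i-default") return None, since the pattern rejects both. Replacing deprecated subtags such as YU would need registry data on top of this.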