Re: [Ltru] Re: Test suite for language tags?

"Mark Davis" <mark.davis@icu-project.org> Sun, 17 September 2006 21:49 UTC

Message-ID: <30b660a20609171449u1ee4b3b9n9c715666aa369226@mail.gmail.com>
Date: Sun, 17 Sep 2006 14:49:01 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Martin Duerst <duerst@it.aoyama.ac.jp>
Subject: Re: [Ltru] Re: Test suite for language tags?
In-Reply-To: <6.0.0.20.2.20060917115535.0899d670@localhost>
MIME-Version: 1.0
References: <20060801203351.GA8854@sources.org> <20060804165720.GA24037@sources.org> <44D4AC42.79E0@xyzzy.claranet.de> <20060830093000.GA31895@nic.fr> <44F6313D.2070000@yahoo-inc.com> <6.0.0.20.2.20060831201004.101ab8d0@localhost> <44F6EF0E.20602@yahoo-inc.com> <6.0.0.20.2.20060901024806.109a6d90@localhost> <30b660a20609161628t22ab3c4flc81ea92f40800a09@mail.gmail.com> <6.0.0.20.2.20060917115535.0899d670@localhost>
Cc: Frank Ellermann <nobody@xyzzy.claranet.de>, ltru@lists.ietf.org

Fixed, thanks!

Mark

On 9/16/06, Martin Duerst <duerst@it.aoyama.ac.jp> wrote:
>
> There's a mistake in the regex, from the underlying
> langtagRegex.txt, that allows zh-zh-cmn-Hans (the grandfathered
> tag is zh-cmn-Hans).
>
> Regards,    Martin.
>
>
> At 08:28 06/09/17, Mark Davis wrote:
> >BTW, I have updated my regex to the final spec for RFC 4646. Here is a
> >single Perl or Java regex that does most of the parse:
> >
> >Regex:
> >((?: [a-z A-Z]{2,3} (?: [-] [a-z A-Z]{3} ){0,3} | [a-z A-Z]{4,8} ))
> >(?: [-] ((?: [a-z A-Z]{4} )) )?
> >(?: [-] ((?: [a-z A-Z]{2} | [0-9]{3} )) )?
> >(?: [-] ((?: (?: [0-9] [a-z A-Z 0-9]{3} | [a-z A-Z 0-9]{5,8} )
> >  (?: [-] (?: [0-9] [a-z A-Z 0-9]{3} | [a-z A-Z 0-9]{5,8} ) )* )) )?
> >(?: [-] ((?: (?: [a-w y-z A-W Y-Z] (?: [-] [a-z A-Z 0-9]{2,8} )+ )
> >  (?: [-] (?: [a-w y-z A-W Y-Z] (?: [-] [a-z A-Z 0-9]{2,8} )+ ) )* )) )?
> >(?: [-] ((?: [xX] (?: [-] [a-z A-Z 0-9]{1,8} )+ )) )?
> >| ( (?i) art [-] lojban
> >| cel [-] gaulish
> >| en [-] (?: boont | GB [-] oed | scouse )
> >| i [-] (?: ami | bnn | default | enochian | hak | klingon | lux | mingo
> >  | navajo | pwn | tao | tay | tsu )
> >| no [-] (?: bok | nyn)
> >| sgn [-] (?: BE [-] fr | BE [-] nl | CH [-] de)
> >| zh [-] (?: cmn | zh [-] cmn [-] Hans | cmn [-] Hant | gan | guoyu
> >  | hakka | min | min [-] nan | wuu | xiang | yue))
> >| ((?: [xX] (?: [-] [a-z A-Z 0-9]{1,8} )+ ))
> >
> >It checks for the grandfathered tags, since otherwise too much cruft
> >sneaks in. You can't check in a regex that there is only a single
> >instance of each singleton extension. (In retrospect we could have
> >allowed multiple singletons: we could have accepted
> >en-a-bcdef-ghijk-b-123-a-lmnop as equivalent to the canonical form
> >en-a-bcdef-ghijk-lmnop-b-123, but that's water under the bridge at this
> >point.) Of course, I didn't put this together by hand. The table used to
> >build it is much more readable, at
> >
> ><http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagRegex.txt>
> >
> >and a test file that includes strings mentioned on this list is at:
> >
> ><http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagTest.txt>
> >Mark
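
The quoted message notes that the regex does not check that each singleton extension appears only once. A minimal sketch of how that check might be done as a separate pass, written in Python rather than the Perl or Java mentioned in the message. The function name `duplicate_singletons` is hypothetical, and the sketch assumes the tag has already passed a well-formedness check, so any one-character subtag before a private-use `x` is treated as a singleton.

```python
def duplicate_singletons(tag):
    """Return the singleton subtags that occur more than once in `tag`.

    Assumes `tag` is already well-formed, so a one-character subtag is
    either a singleton or the private-use marker "x".
    """
    seen = set()
    duplicates = []
    for subtag in tag.lower().split("-"):
        if subtag == "x":
            # Everything after the private-use marker is opaque; stop here.
            break
        if len(subtag) != 1:
            continue
        if subtag in seen:
            if subtag not in duplicates:
                duplicates.append(subtag)
        else:
            seen.add(subtag)
    return duplicates
```

Applied to the two example tags from the message, the function would flag the repeated `a` in `en-a-bcdef-ghijk-b-123-a-lmnop` and report nothing for the canonical form `en-a-bcdef-ghijk-lmnop-b-123`.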
>
>
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
>
>
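
The reported mistake can be reproduced by taking the `zh` branch of the grandfathered alternation in isolation. The sketch below is in Python, not the Perl or Java the regex was written for. The pattern is laid out with whitespace between tokens, so it needs free-spacing mode: `re.VERBOSE` here, with `/x` in Perl and `Pattern.COMMENTS` in Java as the analogous options. Python keeps whitespace inside character classes even in that mode, which is one reason the sketch uses only this branch, whose only class is `[-]`. The names `ZH_AS_POSTED`, `ZH_CORRECTED`, `compile_branch` and `accepts` are hypothetical, and the correction shown is an assumption about the fix, not the actual change made to langtagRegex.txt.

```python
import re

# The zh branch of the grandfathered alternation, as it appears in the
# quoted regex. Whitespace between tokens is layout only.
ZH_AS_POSTED = r"""
    zh [-] (?: cmn | zh [-] cmn [-] Hans | cmn [-] Hant | gan | guoyu
             | hakka | min | min [-] nan | wuu | xiang | yue )
"""

# Assumed correction: drop the stray "zh [-]" inside the group so that the
# alternative yields zh-cmn-Hans instead of zh-zh-cmn-Hans.
ZH_CORRECTED = ZH_AS_POSTED.replace("zh [-] cmn [-] Hans", "cmn [-] Hans")


def compile_branch(source):
    """Compile a free-spacing pattern; the original branch sits under (?i)."""
    return re.compile(source, re.VERBOSE | re.IGNORECASE)


def accepts(pattern, tag):
    """True if the whole tag matches the pattern."""
    return pattern.fullmatch(tag) is not None


if __name__ == "__main__":
    for label, source in (("posted", ZH_AS_POSTED), ("corrected", ZH_CORRECTED)):
        pattern = compile_branch(source)
        for tag in ("zh-cmn-Hans", "zh-zh-cmn-Hans"):
            print(label, tag, accepts(pattern, tag))
```

This only exercises the one branch. In the full regex the general language-script part would also accept zh-cmn-Hans, so the visible symptom there is that zh-zh-cmn-Hans is accepted as well.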