[Ltru] Re: Test suite for language tags?

"Mark Davis" <mark.davis@icu-project.org> Sun, 17 September 2006 21:54 UTC

Message-ID: <30b660a20609171454k3f80374p646d156948c13535@mail.gmail.com>
Date: Sun, 17 Sep 2006 14:54:36 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Stephane Bortzmeyer <bortzmeyer@nic.fr>
In-Reply-To: <20060917165050.GA29413@sources.org>
MIME-Version: 1.0
References: <20060802072709.GA17404@nic.fr> <20060804165720.GA24037@sources.org> <44D4AC42.79E0@xyzzy.claranet.de> <20060830093000.GA31895@nic.fr> <44F6313D.2070000@yahoo-inc.com> <6.0.0.20.2.20060831201004.101ab8d0@localhost> <44F6EF0E.20602@yahoo-inc.com> <6.0.0.20.2.20060901024806.109a6d90@localhost> <30b660a20609161628t22ab3c4flc81ea92f40800a09@mail.gmail.com> <20060917165050.GA29413@sources.org>
Cc: ltru@lists.ietf.org
Subject: [Ltru] Re: Test suite for language tags?

That's odd. Testing in Java on the file
http://unicode.org/cldr/data/tools/java/org/unicode/cldr/util/data/langtagTest.txt
(which includes those cases), the only failures I get are the ones that
repeat extensions (like en-a-bbb-a-ccc), which can't be tested for with a
regex. I don't think I'm using any regex syntax that differs between Java
and Perl (there are very few such instances).
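
For the repeated-extension cases, one option is a small separate pass
after the regex match. Here is a minimal sketch in Java, assuming the tag
has already matched the well-formedness regex. The class and method names
are hypothetical and are not part of the CLDR tools; the sketch only looks
for a one-character subtag that occurs twice before any private-use "x".

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not taken from the CLDR sources.
class SingletonCheck {
    /**
     * Returns true if an extension singleton (a one-character subtag)
     * appears more than once before any private-use "x" section.
     * Assumes the tag was already accepted by the well-formedness regex.
     */
    static boolean hasRepeatedSingleton(String tag) {
        Set<String> seen = new HashSet<>();
        // Singletons are case-insensitive, so compare in lower case.
        for (String subtag : tag.toLowerCase(Locale.ROOT).split("-")) {
            if (subtag.equals("x")) {
                break; // everything after "x" is private use; stop checking
            }
            // Set.add returns false when the singleton was already seen.
            if (subtag.length() == 1 && !seen.add(subtag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = { "en-a-bbb-a-ccc", "ab-c-abc-r-toto-c-abc", "en-a-bbb-b-ccc" };
        for (String tag : samples) {
            System.out.println(tag + " repeats a singleton: " + hasRepeatedSingleton(tag));
        }
    }
}
```

A tag would then count as well-formed only if it matches the regex and
hasRepeatedSingleton returns false.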

Note: I added a number of mechanically generated items to the test file.

Mark

On 9/17/06, Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote:
>
> On Sat, Sep 16, 2006 at 04:28:01PM -0700,
> Mark Davis <mark.davis@icu-project.org> wrote
> a message of 98 lines which said:
>
> > BTW, I had updated my regex to the final spec for 4646. Here is a
> > single Perl or Java regex that does most of the parse:
>
> Isn't it too lax? When testing it in a Perl script, I find it accepts
> all my well-formed tags (OK) but also wrongly accepts:
>
> fr-Latn-F is well-formed
> en-a-bbb-a-ccc is well-formed
> tlh-a-b-foo is well-formed
> abcdefghi-012345678 is well-formed
> ab-abc-abc-abc-abc is well-formed
> ab-abcd-abc is well-formed
> ab-ab-abc is well-formed
> ab-123-abc is well-formed
> ab-abcde-abc is well-formed
> ab-1abc-abc is well-formed
> ab-ab-abcd is well-formed
> ab-123-abcd is well-formed
> ab-abcde-abcd is well-formed
> ab-1abc-abcd is well-formed
> ab-a-b is well-formed
> ab-a-x is well-formed
> ab--ab is well-formed
> ab-abc- is well-formed
> ab-c-abc-r-toto-c-abc is well-formed
> abcd-efg is well-formed
> aabbccddE is well-formed
>