Re: [Ltru] Test suite for language tags?

Addison Phillips <addison@yahoo-inc.com> Fri, 04 August 2006 15:20 UTC

Message-ID: <44D36598.1020304@yahoo-inc.com>
Date: Fri, 04 Aug 2006 08:19:52 -0700
From: Addison Phillips <addison@yahoo-inc.com>
To: Doug Ewell <dewell@adelphia.net>
Subject: Re: [Ltru] Test suite for language tags?
References: <E1G8H4g-00035u-Ej@megatron.ietf.org> <001d01c6b78a$07b37890$040aa8c0@DGBP7M81>
In-Reply-To: <001d01c6b78a$07b37890$040aa8c0@DGBP7M81>
Cc: LTRU Working Group <ltru@ietf.org>

Doug Ewell wrote:
> 
> Just a few minor nits here.  This is slightly more complex than meets 
> the eye, unfortunately.

Yep.
> 
>> - singletons in the first position (except for 'x' and the 
>> grandfathered list)
> 
> Sadly, the existence of the grandfathered list means that all 
> well-formed processors must also do a limited amount of validity 
> checking.  I don't question the importance of maintaining support for 
> the grandfathered tags, but this is a side effect.

Agreed: all RFC 3066bis processors have to have the list of 
grandfathered tags baked in.
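
To illustrate, here is a minimal Python sketch of the first-position
singleton check. The names GRANDFATHERED_SAMPLE and first_subtag_ok are
hypothetical, and the set holds only a few of the grandfathered tags for
illustration; a real processor would need the complete list from the
registry.

```python
# Illustrative subset only: the full grandfathered list comes from the
# language subtag registry, not from this sketch.
GRANDFATHERED_SAMPLE = {"i-default", "i-klingon", "i-navajo"}


def first_subtag_ok(tag: str) -> bool:
    """Check only the rule about singletons in the first position.

    A one-character first subtag is accepted when it is 'x' (private
    use) or when the whole tag is a known grandfathered tag.
    """
    lowered = tag.lower()
    if lowered in GRANDFATHERED_SAMPLE:
        return True
    first = lowered.split("-", 1)[0]
    if len(first) == 1:
        return first == "x"
    return True
```

The whole-tag lookup has to come before the length test, which is why
even a processor that only checks well-formedness ends up carrying the
list.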
> 
>> - missing subtag ("--")
>> - a dangling hyphen ("foo-bar-baz-") or initial hyphen ("-foo-bar-baz")
> 
> The second is really just a special case of the first: a missing subtag 
> at the end or beginning, respectively.  One thing I found useful, when 
> building my validator, was to parse out the subtags first and check them 
> for validity afterward, so the hyphens never become part of the validity 
> checking per se.

I did the same thing. However, one must still check the hyphens: some 
tokenizers do not return "empty" tokens, so a doubled, leading, or 
trailing hyphen can slip through unnoticed.
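
A small Python sketch of the difference, assuming nothing beyond the
standard library (the function names are made up for illustration).
Python's str.split with an explicit separator keeps empty strings, while
a regex-based tokenizer that collects runs of non-hyphen characters
drops them:

```python
import re


def split_keep_empty(tag: str) -> list:
    """Split on hyphens, keeping empty strings for missing subtags."""
    return tag.split("-")


def split_drop_empty(tag: str) -> list:
    """Mimic a tokenizer that silently skips empty tokens."""
    return re.findall(r"[^-]+", tag)


def has_missing_subtag(tag: str) -> bool:
    """True for a doubled, leading, or trailing hyphen (or an empty tag)."""
    return "" in split_keep_empty(tag)
```

With the second tokenizer, "foo--bar" and "foo-bar" yield the same
subtags, so the missing-subtag error has to be caught before or during
tokenizing, not after.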
> 
>> "ab-x-abc-x-abc" // anything goes after x
> 
> Not quite anything, of course:  1*("-" (1*8alphanum))

Yes. At least one alphanumeric subtag must follow 'x', and no such 
subtag can exceed eight characters in length.
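
As a sketch only, the private-use production quoted above could be
checked in Python roughly like this. PRIVATE_USE and
is_private_use_sequence are hypothetical names, and the function looks
only at the portion of a tag that starts at the 'x', not at the whole
tag:

```python
import re

# "x" followed by one or more "-" subtags of 1 to 8 ASCII alphanumerics,
# mirroring 1*("-" (1*8alphanum)) after the singleton.
PRIVATE_USE = re.compile(r"x(-[A-Za-z0-9]{1,8})+", re.IGNORECASE)


def is_private_use_sequence(text: str) -> bool:
    """Return True when text is 'x' plus at least one valid subtag."""
    return PRIVATE_USE.fullmatch(text) is not None
```

Under this pattern "x-abc-x-abc" is accepted (the second 'x' is just
another subtag), while a bare "x", "x-", and "x-abcdefghi" are rejected.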

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.
