Re: [Ietf-languages] Language subtag registration form

Doug Ewell <doug@ewellic.org> Fri, 27 November 2020 20:51 UTC

From: Doug Ewell <doug@ewellic.org>
To: 'Peter Constable' <pgcon6@msn.com>, 'Mark Davis ☕️' <mark@macchiato.com>
Cc: ietf-languages@iana.org, 'Sebastian Drude' <drude@xs4all.nl>
Date: Fri, 27 Nov 2020 13:51:31 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/rGn484a75Ck69uS3zNdzzmxgVuY>

I agree with Peter that while the RFCs that define the 'u' and 't' extensions may be used as good examples for future extensions, the code lists themselves should not be. They are wrapped in large zip files along with hundreds of other files, most of which have nothing to do with the extensions, but a few of which do (e.g. you might want the DTDs in order to validate the XML files where the code lists live).

A better example of how to present the code lists might be ISO 15924:

https://www.unicode.org/iso15924/codelists.html

There are a few HTML tables, nicely formatted for human browsing, but the normative code list is in a single, semicolon-delimited text file. It's zipped to prevent mangling by download protocols ("Oh, good, a text file! Let's change line endings and convert everything to Windows-1252!"), but it's the only file contained within that zip file. Its name includes the release date, and individual records show the date each was added to the list.
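To make the appeal of that layout concrete, here is a minimal Python sketch of consuming such a file: open the zip, insist that it holds exactly one member, and split each record on semicolons. The column names in FIELDS, the '#' comment convention, the UTF-8 encoding, and the member name are my assumptions for illustration and should be checked against the actual file. The two sample rows were typed in by hand and are not authoritative.

```python
import csv
import io
import zipfile

# Assumed column order; verify against the header comments of the real file.
FIELDS = ("code", "number", "english_name", "french_name",
          "pva", "unicode_version", "date")


def read_single_member(zip_bytes):
    """Return (name, text) for the only file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        if len(names) != 1:
            raise ValueError(f"expected exactly one member, found {len(names)}")
        # Assumption: the text is UTF-8; utf-8-sig also tolerates a leading BOM.
        return names[0], archive.read(names[0]).decode("utf-8-sig")


def parse_code_list(text, fields=FIELDS):
    """Parse semicolon-delimited records, skipping blank and '#' lines."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter=";", quoting=csv.QUOTE_NONE)
    return [dict(zip(fields, row)) for row in reader]


# Hand-typed illustrative data standing in for a downloaded zip file.
SAMPLE = (
    "# illustrative sample, not the real file\n"
    "Latn;215;Latin;latin;Latin;1.1;2004-05-01\n"
    "Adlm;166;Adlam;adlam;Adlam;9.0;2016-12-05\n"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample-code-list.txt", SAMPLE)  # hypothetical name

name, text = read_single_member(buffer.getvalue())
records = parse_code_list(text)
for record in records:
    print(record["code"], record["date"])
```

Nothing beyond the standard library is needed, and the per-record date comes out as an ordinary field, which is much of what makes this format convenient.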

At one point I had hoped that the CLDR data items pertaining to the 'u' and 't' extensions would be split out into their own Registry-like files, and I still have examples of how that would work, but in the end, doing this was not a priority of the CLDR project.

To the CLDR project's credit, its files are still machine-readable text files, as opposed to HTML that one must scrape (like UN M.49), or any of several native spreadsheet, word-processor, or database formats (as ISO 4217 used to be; thankfully CLDR distills that for us for the 'u' extension). And, of course, they follow all the other requirements of BCP 47: freely available, versioned, stable, etc.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org