Re: [Ietf-languages] LANGUAGE SUBTAG REGISTRATION FORM

Doug Ewell <doug@ewellic.org> Fri, 04 September 2020 03:13 UTC

From: Doug Ewell <doug@ewellic.org>
To: 'Lenny Soshinskiy' <soshial@gmail.com>, 'John Cowan' <cowan@ccil.org>
Cc: 'IETF Languages Discussion' <ietf-languages@iana.org>
Date: Thu, 03 Sep 2020 21:13:04 -0600
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/J_7VPjOiDyCKvw8L_Nq2NAMI3UE>

Lenny Soshinsk(i)y wrote:

> Description: Latvian language in the old orthography used before 1910s
> ("vecā druka" in Latvian)

> Comments:    The subtag represents the old orthography of Latvian
> language used during c. 1600s–1920s. It was first described in 1863 by
> August Bielenstein in his book "Die lettische Sprache, nach ihren
> Lauten und Formen". The orthography has been been official for Latvian
> language till new orthography was approved in 1908 and fully adopted
> in 1930.

I'm not the Reviewer, but I would prefer to see these two values shortened substantially.

The Description field should be suitable for use in a list of variant subtags presented in a UI. The Comments field, if present, should provide additional information that is necessary to distinguish the variant from others. Neither is intended to be encyclopedic; that's what item 6 on the registration form, "Any other relevant information" (omitted from this form), is for. The registration forms are archived at IANA (see URL below) for those who need this information.

Despite the exhaustive detail proposed for these Description and Comments fields, I'm still left wondering whether the old orthography was superseded in 1908, 1910, "1920s", or 1930. That is a strong hint that none of those years should be part of the subtag value, as Lenny and John Cowan agreed later.

John replied:

> The applicability of the date seems a little vague, as if the
> transition happened in 1910.  How about "vecaa"?  The "aa" represents
> the long "a" of the proper "ā" letter, which cannot be used in
> subtags.  It also makes the subtag 5 letters long: "veca" would be
> incorrect because it looks like a script subtag (though there is
> currently no such subtag).

Specifically, a variant subtag must be 5 to 8 characters long, or 4 if it begins with a digit (but those normally represent a year, and that's already been discussed). I know John knows this; this information is for Lenny.
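
For anyone who wants to check candidate names mechanically, here is a minimal Python sketch of the relevant RFC 5646 productions: variant = 5*8alphanum / (DIGIT 3alphanum), and script = 4ALPHA, which is why a 4-letter "veca" is ambiguous. The function name and messages are my own invention, not part of any IANA tooling, and the check is syntax only; a year such as "1910" passes here even though, as discussed, it is probably not what we want.

```python
import re

# RFC 5646 ABNF: variant = 5*8alphanum / (DIGIT 3alphanum)
VARIANT_RE = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")
# RFC 5646 ABNF: script = 4ALPHA
SCRIPT_RE = re.compile(r"[A-Za-z]{4}")


def check_variant(candidate: str) -> str:
    """Report whether candidate is syntactically usable as a variant subtag.

    Hypothetical helper: checks syntax only, not whether the Reviewer
    would approve the subtag or whether it is already registered.
    """
    if not candidate.isascii():
        return "invalid: subtags are limited to ASCII letters and digits"
    if VARIANT_RE.fullmatch(candidate):
        return "ok"
    if SCRIPT_RE.fullmatch(candidate):
        return "invalid: four letters would be read as a script subtag"
    if len(candidate) > 8:
        return "invalid: longer than 8 characters"
    return "invalid: not a variant subtag"


# 'vecā', 'veca', and 'vecaadruka' are rejected;
# 'vecaa', 'vecdruka', 'fonkirsh', and '1910' pass the syntax check.
for name in ["vecā", "veca", "vecaa", "vecaadruka", "vecdruka", "fonkirsh", "1910"]:
    print(f"{name}: {check_variant(name)}")
```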

Lenny responded:

> It was not actually an exact date, since newspapers were very
> reluctant to change the orthography to a new one. And some newspapers
> switched only in 20s.

No orthographic reform takes effect everywhere at once. As John pointed out, where existing variant subtags reference a year, that year is usually the publication date of a formal decree, a normative dictionary, or at least a representative work of literature. '1910' here feels more like a guess.

> I wouldn't prefer the "vecaa" tag, because it just makes it "the old"
> (if translated) deconstructs the idiom "vecā druka". I think, the tag
> "vecaadruka" would be 100% recognizable by any Latvian.

But, unfortunately, not syntactically allowable: at 10 letters, "vecaadruka" exceeds the 8-character maximum for a subtag.

> Another question:
> * should a separate variant be created for the contemporary
> orthography of the Latvian language?
> * also, on 5 June 1946, some letters were removed from the alphabet
> by the orthographic reform (CH replaced with H, Ō with O, Ŗ with R).
> Should the types before and after reform to be registered as well?

We should wait for the Reviewer to decide, but as John said, these should probably not be registered unless there is a real need to tag the data with this information.

> Well, if 8 characters is completely unavoidable, then we might try a
> makeshift compound word "vecdruka" (similar to those that are used in
> German), although "vecdruka" is almost never used in Latvian. But it
> would be recognizable and understandable and I think, an acceptable
> way to shorten it in these circumstances.

It doesn't have to be an actual word. "fonkirsh" is not an English word; it is a Newspeak-like identifier meant to represent "Kirshenbaum Phonetic Alphabet." Such is life when identifiers have a tight length constraint. 

> Do I understand correctly, that new subtags are added in batches once
> every 2 months or so?

No, that implies a certain regularity of process and volume of requests, neither of which exists.

Most subtags are requested, considered, and approved (or not) one by one. They are only batched together if there is some relationship that requires them to be considered together, such as with the Occitan variants, or for convenience if multiple requests are submitted almost simultaneously.

> Where may I see the subtags sorted by newest?

The IANA page showing the registration forms is located here:

https://www.iana.org/assignments/lang-subtags-templates/lang-subtags-templates.xhtml

and the list can be sorted by date.

Note that this list contains nearly 1,200 registration forms, but the majority of these are pro-forma registrations that reflect changes in one of the core (ISO or UN) standards. Some registration forms are for modifications or deletions of existing subtags. Where you see dozens of language subtags (usually three letters) processed on the same day, it's because of the annual cycle of ISO 639-3 changes.
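
The templates page lists registration forms rather than subtags. If what you want is the subtags themselves ordered by date, the IANA Language Subtag Registry file uses the record-jar format described in RFC 5646, section 3.1: records separated by "%%" lines, "Name: value" fields, and indented continuation lines. Below is a minimal Python sketch, assuming you have already downloaded the registry text into a string; the helper names and the sample records are made up for illustration and are not real registry entries.

```python
from datetime import date


def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Parse record-jar text (RFC 5646, section 3.1) into a list of records.

    Each field maps to a list of values, because fields such as
    Description, Prefix, and Comments may repeat within one record.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_name = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: close out the record built so far.
            if current:
                records.append(current)
            current, last_name = {}, None
        elif line[:1] in (" ", "\t") and line.strip() and last_name is not None:
            # Indented continuation line: extend the most recent field value.
            current[last_name][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last_name = name.strip()
            current.setdefault(last_name, []).append(value.strip())
    if current:
        records.append(current)
    return records


def newest_subtags(records, subtag_type=None, n=10):
    """Return (subtag, added) pairs sorted by Added date, newest first.

    Records without a Subtag field (the File-Date header, and the
    grandfathered/redundant records, which use Tag instead) are skipped.
    """
    candidates = [
        r for r in records
        if "Subtag" in r and "Added" in r
        and (subtag_type is None or r.get("Type", [""])[0] == subtag_type)
    ]
    candidates.sort(key=lambda r: date.fromisoformat(r["Added"][0]), reverse=True)
    return [(r["Subtag"][0], r["Added"][0]) for r in candidates[:n]]


# Made-up sample records, for illustration only.
SAMPLE = """File-Date: 2020-09-01
%%
Type: variant
Subtag: samplea
Description: Made-up variant A
Added: 2019-03-15
%%
Type: variant
Subtag: sampleb
Description: Made-up variant B, with a
  folded continuation line
Added: 2020-07-30
%%
Type: language
Subtag: qaa
Description: Made-up language record
Added: 2020-08-20
"""

print(newest_subtags(parse_registry(SAMPLE)))
# [('qaa', '2020-08-20'), ('sampleb', '2020-07-30'), ('samplea', '2019-03-15')]
```

Passing subtag_type="variant" narrows the list to variants, which filters out the batches of three-letter language subtags that arrive with each ISO 639-3 cycle.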

--

Doug Ewell | Thornton, CO, US | ewellic.org