[Ietf-languages] First cut at a BCP 47 extension structure for ISO TR 21636 (was: Language subtag registration form)

Doug Ewell <doug@ewellic.org> Fri, 27 November 2020 23:01 UTC

From: Doug Ewell <doug@ewellic.org>
To: 'Sebastian Drude' <drude@xs4all.nl>, 'Mark Davis ☕' <mark@macchiato.com>
Cc: ietf-languages@iana.org
Date: Fri, 27 Nov 2020 16:01:23 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/t6zYkJmV4rlsmTRY3eDgV2ANAkc>

Sebastian Drude wrote:

> I am here participating in this group exactly to touch base and see
> how we can implement this in a way that it is usable and compatible
> with BCP 47, BEFORE a list is created which may not be compatible in
> some way to IETF/Languages' approach.

Here's one possible way the structure of ISO TR 21636 could be made to fit into a BCP 47 extension. It shows the kind of groundwork that should, and really must, be done up front if fitting 21636 into BCP 47 is a major goal. This groundwork can be done without knowing all, or even most, of the values or code elements.

As a partial reference, I'm using a document called "NP-21636," provided by Sebastian and dated 2019-09-30. It's up to Sebastian whether I may redistribute that document to anyone.

First, an extension singleton must be chosen. Here I suggest 'v' for "varieties," as the title of the TR is "Identification and description of language varieties." The alphabetically consecutive nature of the three extensions, 't' and 'u' and now 'v', may or may not be unfortunate.

Next, the TR identifies eight "dimensions" which are independent of each other to a greater or lesser extent. It is clearly desirable to be able to specify more than one dimension in a single tag, up to all eight, without being required to do so. For this I will suggest a key-value mechanism like that devised for the 'u' extension, in which each dimension is indicated by a two-character key, derived from the names Sebastian provided on the 24th:

sp	Space
ti	Time
so	Social group
me	Medium
si	Situation
pe	Person
pr	Proficiency
co	Communicative functioning

(The NP-21636 document used the term "performance" instead of "communicative functioning," which would have caused a minor inconvenience by overloading 'pe'.)

Then there would be "value" identifiers of 3 to 8 characters each, providing values for each of the dimensions. The length of each of these subtags (1 for the extension singleton, 2 for the dimension keys, 3 to 8 for the values) is a critical part of BCP 47 syntax, permitting humans and processes to parse the tag and figure out what modifies what.

So you might have "pt-BR-v-so-business-pr-fullpro" as one example.
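
To illustrate how the subtag lengths alone let a process work out what modifies what, here is a rough Python sketch. Everything in it is an assumption for the sake of illustration: the 'v' singleton and the eight keys are only the suggestions made above, the function name parse_v_extension is made up, and allowing more than one value after a key is my guess (borrowed from how 'u' behaves), not something the TR or BCP 47 specifies.

```python
# Hypothetical key table: the two-letter dimension keys suggested above.
DIMENSIONS = {
    "sp": "Space",
    "ti": "Time",
    "so": "Social group",
    "me": "Medium",
    "si": "Situation",
    "pe": "Person",
    "pr": "Proficiency",
    "co": "Communicative functioning",
}


def parse_v_extension(tag):
    """Return {key: [values]} for a hypothetical 'v' extension, or {} if absent."""
    subtags = tag.lower().split("-")

    # Find the 'v' singleton. Skip the primary language subtag, and stop at
    # 'x', because everything after 'x' is private use, not an extension.
    start = None
    for i, sub in enumerate(subtags[1:], start=1):
        if sub == "x":
            break
        if sub == "v":
            start = i + 1
            break
    if start is None:
        return {}

    result = {}
    key = None
    for sub in subtags[start:]:
        if len(sub) == 1:
            break  # another singleton: the 'v' extension has ended
        if len(sub) == 2:
            # Two characters: a dimension key.
            if sub not in DIMENSIONS:
                raise ValueError("unknown dimension key: " + sub)
            key = sub
            result.setdefault(key, [])
        elif 3 <= len(sub) <= 8 and sub.isascii() and sub.isalnum():
            # Three to eight characters: a value for the preceding key.
            if key is None:
                raise ValueError("value without a dimension key: " + sub)
            result[key].append(sub)
        else:
            raise ValueError("malformed subtag: " + sub)
    return result


print(parse_v_extension("pt-BR-v-so-business-pr-fullpro"))
```

Note that the parser never consults a list of values; it relies only on the 1, 2, and 3-to-8 length classes, which is the point of fixing those lengths up front.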

A finite number of values for each dimension would need to be identified so they could be encoded, keeping uniqueness and syntactical constraints in mind. This is the process of building a code list, or registry. The set of values would be extensible; no one and no group could be expected to get this perfect the first time. But stability rules would also come into play here.

Note that the values might not have to be unique across dimensions; you could have "en-v-so-foo-pr-foo" if the string 'foo' represented a social group value and coincidentally also a proficiency value. Too much of this could lead to human confusion, though.
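
As a small sketch of why cross-dimension uniqueness would not be strictly needed: if the registry is keyed first by dimension and then by value, the same string can appear under two dimensions without colliding. The REGISTRY contents and the describe function below are invented placeholders, not proposed values.

```python
# Hypothetical registry: one namespace of values per dimension key.
REGISTRY = {
    "so": {"foo": "placeholder social group value"},
    "pr": {"foo": "placeholder proficiency value"},
}


def describe(key, value):
    """Look up a value within its own dimension's namespace."""
    return REGISTRY[key][value]


# The same string 'foo' resolves differently depending on the key before it,
# as in "en-v-so-foo-pr-foo".
print(describe("so", "foo"))
print(describe("pr", "foo"))
```

A machine has no trouble with this; the risk of confusion is on the human side.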

I have no idea what to do about things like "certainty status," but some syntax along these lines could be devised. If this is an "adjective" modifying dimensional values, rather than a value itself, that should be syntactically clear.

The dimension that troubles me the most, of course, is "person." The document makes it clear in several passages that Sebastian has his own personal variety (times the number of languages in which he can communicate), I have my own, Michael and John and Leon and Peter have their own, and so forth. Obviously neither we at ietf-languages, nor anyone else, are going to encode identifiers for each of 7.7 billion living people, plus those deceased, plus those not yet born. It may be that the only possible way to encode this dimension will be with a private-use subtag at (by definition) the end of the tag.
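
For what the private-use approach might look like in practice, here is a minimal sketch that separates the private-use tail from the rest of a tag. The function name split_private_use and the example subtag "person1" are made up; the only BCP 47 fact relied on is that everything after the 'x' singleton is private use and comes last.

```python
def split_private_use(tag):
    """Split a tag into (portion before 'x', list of private-use subtags)."""
    subtags = tag.split("-")
    for i, sub in enumerate(subtags):
        if sub.lower() == "x":
            # Everything after the first 'x' singleton is private use.
            return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []


# 'person1' stands in for some privately agreed identifier of an individual.
print(split_private_use("en-v-pr-fullpro-x-person1"))
```

The meaning of whatever follows 'x' would be a matter of private agreement between the parties exchanging the tag, so nothing here would need to be registered.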

This list may or may not be the proper venue to continue this discussion.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org