Re: [Ietf-languages] First cut at a BCP 47 extension structure for ISO TR 21636

Sebastian Drude <drude@xs4all.nl> Sun, 29 November 2020 21:17 UTC

To: Doug Ewell <doug@ewellic.org>, 'Mark Davis ☕' <mark@macchiato.com>
Cc: ietf-languages@iana.org, iso639-3@sil.org
References: <20201127232932.665a7a7059d7ee80bb4d670165c8327d.20171979ac.wbe@email15.godaddy.com> <7903ae59-951e-9f46-0af8-b2a3f6657513@xs4all.nl> <000301d6c603$3cf4b540$b6de1fc0$@ewellic.org>
From: Sebastian Drude <drude@xs4all.nl>
Message-ID: <c8887ad0-2c11-6af2-2c34-27e1b7946e34@xs4all.nl>
Date: Sun, 29 Nov 2020 18:16:58 -0300
In-Reply-To: <000301d6c603$3cf4b540$b6de1fc0$@ewellic.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/YYvSWeeap67VepuK9TNB1s_bGvY>
Subject: Re: [Ietf-languages] First cut at a BCP 47 extension structure for ISO TR 21636

Thanks again, dear Doug,

and again my comments below.

Have a nice (rest of your) first Advent Sunday,

Sebastian

-- 

Museu P.E. Goeldi, CCH, Linguistica ▪ Av. Perimetral, 1901
Terra Firme, CEP: 66077-530 ▪ Belém do Pará – PA ▪ Brazil
drude@xs4all.nl ▪ +55 (91) 3217 6024 ▪ +55 (91) 983733319
Priv: Tv. Juvenal Cordeiro, 184, Apt 104 ▪ 66070-300 Belém

On 29/11/2020 00:53, Doug Ewell wrote:
> That shouldn’t be necessary. Many folks have access to Mailman and can 
> spin up an ad-hoc mailing list. I know Michael Everson has done so on 
> several occasions, including one for a Unicode encoding proposal I’ve 
> been involved with that really couldn’t have succeeded without it. But 
> Michael doesn’t run a mailing-list business, and I hesitate to pester 
> him for this. 
Right. If things were normal, I could ask my department, but people are
working from home, etc.

> I need to read Mark’s response about this, and the issues surrounding it, a bit more thoroughly. I was under the impression that a given extension could quite happily allow you to string together “fr-v-xx-abcdef-ghijk”, or not, depending on how the spec is written. Mark seems to imply some inherent limitations on this in BCP 47. Fortunately we have plenty of time to figure out what can and cannot, or should not, be done here.
Yes, let's proceed with care. I understood that a key may take several
values, and that this would mostly depend on how we set things up. I am
confident we will figure that out.
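To make the "several values per key" idea concrete, here is a minimal Python sketch that reads a tag like "fr-v-xx-abcdef-ghijk" into keys and values. It assumes, hypothetically, that a "v" extension would reuse the key/value layout of the registered 'u' extension: two-character keys, each followed by zero or more values of three to eight characters. The singleton "v" and the key "xx" are placeholders, not registered values, and attributes before the first key (which 'u' permits) are not handled here.

```python
import re

# Assumption: the hypothetical "v" extension reuses the 'u' extension layout:
# 2-character keys, each followed by zero or more 3-8 character values.
KEY = re.compile(r"[a-z0-9]{2}")
VALUE = re.compile(r"[a-z0-9]{3,8}")


def parse_extension(tag: str, singleton: str = "v") -> dict[str, list[str]]:
    """Return {key: [values, ...]} for one extension of a BCP 47 tag."""
    subtags = tag.lower().split("-")
    start = None
    for i, sub in enumerate(subtags[1:], start=1):
        if sub == "x":  # private use: nothing after this is an extension
            break
        if sub == singleton:
            start = i
            break
    if start is None:
        return {}

    result: dict[str, list[str]] = {}
    current = None
    for sub in subtags[start + 1:]:
        if len(sub) == 1:  # the next singleton ends this extension
            break
        if KEY.fullmatch(sub):
            current = sub
            result.setdefault(current, [])
        elif VALUE.fullmatch(sub) and current is not None:
            result[current].append(sub)
        else:
            raise ValueError(f"unexpected subtag {sub!r} in -{singleton}- extension")
    return result


print(parse_extension("fr-v-xx-abcdef-ghijk"))  # {'xx': ['abcdef', 'ghijk']}
```

Whether multiple values per key are allowed at all would be a decision for the extension's specification; this sketch simply permits them.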

>> For the first case, several ever more specific indications would only
>> be admissible if each is relevant for some application, and one cannot
>> presume that the language-tag-consumer knows of logical implications
>> (South Tirol German implies Bavarian which implies High German, for
>> instance).
> Nothing about BCP 47 tagging is ever intended to presume that the user (at either end) knows anything about language family hierarchies, about which there is much disagreement anyway.

This is what I would expect.


> I suspect I am not reading this comment carefully enough, and taking it out of context.
No need to rush.

>>>> -- the 'certainty' and similar "adjectives" (yes, that is how I
>>>> would see them; -- e.g. primary vs. secondary modality, genuine vs.
>>>> imitated, ...)
>>> I knew I had missed some in reading quickly through the NP document.
>>> If there are many, this would require some thought.
>> At this point, I do not foresee this to be heavily used, but what do I
>> know about possible needs in 20 years?  I would need to compile a list
>> of such "adjectives"; perhaps there is one more case besides the three
>> we have identified here.  Again (see below, next comment), there are
>> default values, and only exceptions would need coding.
> No, I literally meant I did not even know there were three at present. I thought only "certainty" fell into this category. That's why I need to read through that part of the NP document again.
>
> But yes, the mechanism does need to allow for future modifiers of this type, just like the 'u' extension allows for additional keys, as well as additional values within each key.

Looking at the latest version of the document, there are indeed a number 
of such 'modifying' statements.

Certainty (values: documented, confirmed, certain, inferred, assumed, 
possible, unclear, ...).

Influence of another language on a dialect (example: Urdu as spoken in
London, influenced by English). This also covers dialectal varieties that
emerge when the father speaks one dialect and the mother another, or when
the parents speak one dialect but the person in question grows up
speaking another with peers; sometimes what emerges is not dialectal
bi-'lingualism' but mixed forms.

Primary modality and secondary modality.  This may actually be relevant 
in many situations.

Proficiency: rating framework, or theory of language acquisition applied

Communicative functioning: degree of (informally speaking) anomaly

I think all of this could easily be handled by adding further two-letter
keys whose values carry these indications, as if they were additional
dimensions. The rules would then have to state that such a key is only
allowed if the major dimension it modifies is present.
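A rule of that kind is easy to check mechanically. The sketch below uses made-up two-letter keys ("mo" for a modality dimension, "m2" for a secondary-modality modifier, "pr"/"pf" for proficiency and its rating framework); none of these keys is defined anywhere, and the input is assumed to be an already-parsed mapping from keys to values.

```python
# Hypothetical keys, for illustration only; none of them is defined anywhere.
# Each modifier key maps to the major-dimension key it qualifies.
MODIFIER_OF = {
    "m2": "mo",  # secondary modality -> modality dimension
    "pf": "pr",  # rating framework   -> proficiency dimension
}


def check_modifiers(ext: dict[str, list[str]]) -> list[str]:
    """Return one error message per modifier key whose major dimension is absent."""
    errors = []
    for key in ext:
        major = MODIFIER_OF.get(key)
        if major is not None and major not in ext:
            errors.append(f"modifier '{key}' requires dimension '{major}'")
    return errors


print(check_modifiers({"mo": ["sign"], "m2": ["written"]}))  # []
print(check_modifiers({"m2": ["written"]}))  # ["modifier 'm2' requires dimension 'mo'"]
```

New modifiers would only need a new entry in the table, which fits the wish to leave room for future keys.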

> I have to confess that it never occurred to me that users of this extension, or indeed users of the TR in any form, would always be expected to provide values for all eight (or seven) dimensions, and that a defaulting mechanism would be necessary to permit eliding some of them.

Indeed, that was never the intention. But as soon as people understand
that, in principle, every speech event or resource inevitably has some
value on each of the eight dimensions, they may feel obliged, for the
sake of coherence, completeness or whatever, to indicate all eight of
them every time. That is absolutely not feasible, and not what anybody
would want, let alone require. Still, it came up as an argument for
rejecting the whole framework.

Of course, if one indicates a value for only one dimension (which is
perfectly fine), or no variation value at all, as will be the case for
the vast majority of language tags, the status of the other dimensions
can usually be inferred by assuming something like default values, and
those could be made explicit somewhere. That is what I meant. It does
not necessarily imply that default values are a formal part of the
extension.
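One way to keep such defaults out of the extension itself is to document them on the consumer side. In this minimal sketch the dimension names and default values are hypothetical placeholders; a tag's explicit values override them, and the result records which values were actually stated.

```python
# Hypothetical dimension names and default readings. In this sketch they
# live in consumer-side documentation, not in the tag or its extension.
ASSUMED_DEFAULTS = {
    "modality": "spoken",
    "certainty": "certain",
}


def interpret(explicit: dict[str, str]) -> dict[str, tuple[str, bool]]:
    """Map each dimension to (value, stated_explicitly)."""
    reading = {dim: (value, False) for dim, value in ASSUMED_DEFAULTS.items()}
    for dim, value in explicit.items():
        reading[dim] = (value, True)  # an explicit value always wins
    return reading


print(interpret({"certainty": "inferred"}))
# {'modality': ('spoken', False), 'certainty': ('inferred', True)}
```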

>> I agree.  I was not at all proposing to complement, let alone replace,
>> the ISO 639 identifiers in the main language subtags or any other
>> crucial area of BCP 47 by glottocodes.
> Thank you very much. I am glad in this case that I simply misread this and panicked for no reason.
No, I guess I expressed myself badly here. Sorry about that.

> Well, I mean, we also know that Glottocodes exist. For that matter, we also know the Linguasphere coding system exists. Whether we consider these to be “standards,” whether on a par with ISO 639 or not, can be debated.

Sure. This depends a lot on the community, of course. In diversity
linguistics, people accept Glottocodes as a valid alternative to
ISO 639-3, if not a superior one. And I have to agree that their scholarly
soundness is impressive, certainly very different from the idiosyncratic
Linguasphere (despite "a resourceful conception, many valid insights and
much knowledge", as I wrote elsewhere <http://hdl.handle.net/10125/24814>).

> https://www.iana.org/assignments/language-tag-extensions-registry
I see, only two, "t" and "u", are registered.  Plenty of room to 
register "g" and "v".
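For reference, RFC 5646 allows any single ASCII letter or digit as an extension singleton except "x", which introduces private use. A small sketch, taking the registered set to be just "t" and "u" as observed in this thread (the registry may of course change):

```python
import string

# RFC 5646: a singleton is any single ASCII letter or digit except "x",
# which introduces private use. Registered set as noted in this thread (Nov 2020).
REGISTERED = {"t", "u"}


def singleton_status(c: str) -> str:
    c = c.lower()
    if len(c) != 1 or c not in string.ascii_lowercase + string.digits:
        return "invalid"
    if c == "x":
        return "private use"
    return "registered" if c in REGISTERED else "available"


for c in "gvuX":
    print(c, singleton_status(c))
```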

> The reviewers may simply have not seen the documentation for Adhari (Old Azeri) that they were looking for.

This was just a fictitious case: I took one of the first languages with
a Glottocode that apparently has no ISO 639 entry. There are no real
"reviewers" (reviewing what?).

> Submitting a proposal to add a language like this to 639-3, and thus to the Registry, might be more productive than building a mechanism to swap in another coding system, ISO 2022-like. This is especially true if the Adhari example isn’t intended to represent hundreds or thousands of languages missing from 639-3.

Not many languages in Glottolog are missing from Ethnologue: certainly
not thousands, perhaps a couple of hundred. Still, some communities may
be content simply to use the existing Glottocode instead of taking on
the burden of starting a process to add the language to ISO 639. (There
may be little overlap with the communities using BCP 47 language tags,
but that may change.)

In addition to languages not (yet) in ISO 639, there are certainly
several thousand varieties (mainly, but not only, dialects) of languages
that already have a Glottocode, and it *may* be interesting to allow
these as subtags, or at least in an extension. I don't understand this
deeply enough at this point to advocate for or against anything; I
mainly want to draw your attention to it.
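Glottocodes would at least fit syntactically: they are eight characters long, and BCP 47 extension subtags may be two to eight alphanumeric characters. The sketch below assumes a hypothetical "g" singleton and the usual Glottocode shape (four lowercase letters or digits followed by four digits); "abcd1234" is a placeholder, not a real Glottocode.

```python
import re

# Assumed Glottocode shape: four lowercase letters/digits plus four digits.
# Eight characters fits BCP 47's limit of 2-8 characters per extension subtag.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def with_glottocode(base_tag: str, glottocode: str, singleton: str = "g") -> str:
    """Add a hypothetical '-g-<glottocode>' extension to a BCP 47 tag."""
    code = glottocode.lower()
    if not GLOTTOCODE.fullmatch(code):
        raise ValueError(f"not shaped like a Glottocode: {glottocode!r}")
    # Extensions must precede the private-use part introduced by "-x-".
    i = base_tag.lower().find("-x-")
    if i == -1:
        return f"{base_tag}-{singleton}-{code}"
    return f"{base_tag[:i]}-{singleton}-{code}{base_tag[i:]}"


print(with_glottocode("de", "abcd1234"))           # de-g-abcd1234
print(with_glottocode("de-CH-x-foo", "abcd1234"))  # de-CH-g-abcd1234-x-foo
```

The sketch keeps any private-use part at the end, but it does not reorder existing extensions into canonical (alphabetical) order.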


>
> --
> Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org