Re: [Ietf-languages] First cut at a BCP 47 extension structure for ISO TR 21636 (was: Language subtag registration form)

Mark Davis ☕️ <mark@macchiato.com> Sun, 29 November 2020 15:59 UTC

> I was under the impression that a given extension could quite happily
> allow you to string together “fr-v-xx-abcdef-ghijk”, or not, depending on
> how the spec is written. Mark seems to imply some inherent limitations on
> this in BCP 47. Fortunately we have plenty of time to figure out what can
> and cannot, or should not, be done here.

Your impression is right; sorry that I was unclear. BCP 47 does not impose
structure on extensions; those were just examples of how we ended up adding
structure so as to provide for clear semantics when parsing multiple values.
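
To make the distinction concrete, here is a rough Python sketch (not a
full BCP 47 parser). It assumes a regular, well-formed tag rather than a
grandfathered or wholly private-use one. The first function applies only
the generic rule from BCP 47 itself: a singleton followed by one or more
subtags of 2 to 8 alphanumerics. The second applies a 'u'-style convention
in which a two-character subtag starts a new key. Applying that convention
to "v" is purely hypothetical, since no "v" extension is registered, and
the function names are made up for illustration.

```python
import re

# Generic extension subtag shape in BCP 47: 2 to 8 alphanumerics.
EXT_SUBTAG = re.compile(r"[a-z0-9]{2,8}")
SINGLETON = re.compile(r"[a-z0-9]")


def parse_extensions(tag):
    """Return {singleton: [subtags]} using only the generic BCP 47 rule."""
    subtags = tag.lower().split("-")
    extensions = {}
    current = None
    # Skip the primary language subtag; after it, any one-character
    # subtag in a well-formed tag is a singleton.
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            if subtag == "x":
                break  # private use runs to the end of the tag
            if not SINGLETON.fullmatch(subtag):
                raise ValueError(f"bad singleton: {subtag!r}")
            if subtag in extensions:
                raise ValueError(f"duplicate singleton: {subtag!r}")
            extensions[subtag] = []
            current = subtag
        elif current is not None:
            if not EXT_SUBTAG.fullmatch(subtag):
                raise ValueError(f"bad extension subtag: {subtag!r}")
            extensions[current].append(subtag)
    for singleton, values in extensions.items():
        if not values:
            raise ValueError(f"extension {singleton!r} has no subtags")
    return extensions


def group_keywords(values):
    """Group extension subtags 'u'-style: a 2-character subtag is a key.

    Simplification: any subtag of length 2 is treated as a key; subtags
    seen before the first key are returned separately as attributes.
    """
    attributes = []
    keywords = {}
    key = None
    for value in values:
        if len(value) == 2:
            key = value
            keywords.setdefault(key, [])
        elif key is None:
            attributes.append(value)
        else:
            keywords[key].append(value)
    return attributes, keywords


if __name__ == "__main__":
    ext = parse_extensions("fr-v-xx-abcdef-ghijk")
    print(ext)                       # flat list, no structure imposed
    print(group_keywords(ext["v"]))  # same subtags, 'u'-style grouping
```

Under the generic rule the three subtags after "v" are just a flat
sequence; whether "xx" is a key with two values, or all three are
independent values, is entirely up to the extension's own spec.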


BTW, for a new email group there are far easier and more functional
mechanisms available in this millennium ...

On Sat, Nov 28, 2020, 19:53 Doug Ewell <doug@ewellic.org> wrote:

> Sebastian Drude wrote:
>
> >> Maybe a custom mailing list could be formed. Everyone here would need
> >> to be eligible to join the discussion if they choose to.
> >
> > Yes, that would be great.  I am not connected any more to any service
> > where I could do so; whom could we ask?  The LinguistList?
>
> That shouldn’t be necessary. Many folks have access to Mailman and can
> spin up an ad-hoc mailing list. I know Michael Everson has done so on
> several occasions, including one for a Unicode encoding proposal I’ve been
> involved with that really couldn’t have succeeded without it. But Michael
> doesn’t run a mailing-list business, and I hesitate to pester him for this.
>
> >>> -- can more than one value in the SAME dimension be indicated? (I
> >>> would argue yes, if that is syntactically okay)
> >
> > That is fine.  One could need it for at least two cases:
> >
> > - more general and more specific indications (dialects, sociolects...)
> >
> > - idiolects belonging to intersections of varieties (e.g., showing all
> > the defining criteria for more than one dialect, in border areas)
>
> I need to read Mark’s response about this, and the issues surrounding it,
> a bit more thoroughly. I was under the impression that a given extension
> could quite happily allow you to string together “fr-v-xx-abcdef-ghijk”, or
> not, depending on how the spec is written. Mark seems to imply some
> inherent limitations on this in BCP 47. Fortunately we have plenty of time
> to figure out what can and cannot, or should not, be done here.
>
> > For the first case, several ever more specific indications would only
> > be admissible if each is relevant for some application, and one cannot
> > presume that the language-tag-consumer knows of logical implications
> > (South Tirol German implies Bavarian which implies High German, for
> > instance).
>
> Nothing about BCP 47 tagging is ever intended to presume that the user (at
> either end) knows anything about language family hierarchies, about which
> there is much disagreement anyway.
>
> I suspect I am not reading this comment carefully enough, and am taking
> it out of context.
>
> >>> -- the 'certainty' and similar "adjectives" (yes, that is how I
> >>> would see them; -- e.g. primary vs. secondary modality, genuine vs.
> >>> imitated, ...)
> >>
> >> I knew I had missed some in reading quickly through the NP document.
> >> If there are many, this would require some thought.
> >
> > At this point, I do not foresee this to be heavily used, but what do I
> > know about possible needs in 20 years?  I would need to compile a list
> > of such "adjectives", perhaps there is one more case besides the three
> > we have identified here.  Again (see below, next comment), there are
> > default values, and only exceptions would need coding.
>
> No, I literally meant I did not even know there were three at present. I
> thought only "certainty" fell into this category. That's why I need to read
> through that part of the NP document again.
>
> But yes, the mechanism does need to allow for future modifiers of this
> type, just like the 'u' extension allows for additional keys, as well as
> additional values within each key.
>
> >>> -- default values
> >
> > I understand.  A common objection to the 8-dimension-approach is that
> > it is very cumbersome to indicate values for each dimension for each
> > resource, and I could not agree more.
> >
> > One obvious solution is to imply default values and code only
> > 'deviant' varieties.  The default values could be: (1) the respective
> > "standard varieties" in the case of the space dimension, (2) "current
> > period" for time, (3) "middle-class" or "socially neutral" for social
> > group, (5) "neutral" for register, (7) "full" proficiency, and (8)
> > "regular functioning" (no 'anomaly') for the commun. funct. dimension.
> > (We leave (6), the person dimension, out, as discussed.)
> >
> > For (4) the medium dimension, the default value would perhaps depend
> > on (a) the media carrier ("oral" for an audio recording, "written" for
> > a text document or PDF, although one may want to indicate the specific
> > writing system etc. -- that is currently also the case), and (b) the
> > language (sign languages would have the signed modality as default for
> > videos, other languages the multimodal modality; Latin would have the
> > written modality as default independent of the media carrier, and so
> > forth).
>
> I have to confess that it never occurred to me that users of this
> extension, or indeed users of the TR in any form, would always be expected
> to provide values for all eight (or seven) dimensions, and that a
> defaulting mechanism would be necessary to permit eliding some of them.
>
> > I agree.  I was not at all proposing to complement, let alone replace,
> > the ISO 639 identifiers in the main language subtags or any other
> > crucial area of BCP 47 by glottocodes.
>
> Thank you very much. I am glad in this case that I simply misread this and
> panicked for no reason.
>
> > Still, that is what I mean: while not changing the current setting for
> > the URLs, Wikipedia at some time recognized that the Glottocodes
> > exist, and are important enough to be now a standard information given
> > for each language.
>
> Well, I mean, we also know that Glottocodes exist. For that matter, we
> also know the Linguasphere coding system exists. Whether we consider these
> to be “standards,” whether on a par with ISO 639 or not, can be debated.
>
> > Instead, one could think of introducing an extension, say, with subtag
> > "g" (if that is not already assigned -- where can I find the list of
> > existing extensions?),
>
> https://www.iana.org/assignments/language-tag-extensions-registry
>
> > to be followed by a glottocode, in order to allow for a user to
> > indicate some variety that is well defined by its Glottolog entry.
> > Especially in cases where ISO does not have an appropriate language
> > code (yet), a user could use, for instance, mis-g-adha1238, where the
> > g-extension subtag adha1238 is the Glottocode for the extinct Adhari
> > language, which is not (yet) in ISO 639.  Perhaps even a better
> > solution can be imagined than using the special purpose ISO 639
> > identifier "mis" -- miscellaneous language (no ISO 639 code element is
> > assigned for this language or included in the part of ISO 639 used by
> > a given application).
>
> ISO 639-3/RA does have a well-established process to add (and modify, and
> delete) code elements to reflect linguistic realities. The reviewers may
> simply have not seen the documentation for Adhari (Old Azeri) that they
> were looking for. Submitting a proposal to add a language like this to
> 639-3, and thus to the Registry, might be more productive than building a
> mechanism to swap in another coding system, ISO 2022-like. This is
> especially true if the Adhari example isn’t intended to represent hundreds
> or thousands of languages missing from 639-3.
>
> --
> Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org