Re: [Ietf-languages] Latin Sub tags

Mark Davis ☕️ <mark@macchiato.com> Mon, 04 December 2023 01:46 UTC

From: Mark Davis ☕️ <mark@macchiato.com>
Date: Sun, 03 Dec 2023 17:45:47 -0800
Message-ID: <CAJ2xs_FSBeryEr=zwj85Ac7_xqfwyE=_rKRd25+fCvCHjLY9Gg@mail.gmail.com>
To: drude@xs4all.nl
Cc: Hugh Paterson III <sil.linguist@gmail.com>, Doug Ewell <doug@ewellic.org>, IETF Languages Discussion <ietf-languages@iana.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/KoPQmtFIF5OqmwpqWouDTDny8Ic>
Subject: Re: [Ietf-languages] Latin Sub tags
List-Id: "Review of requests for language tag registration according to BCP 47 \(RFC 4646\)" <ietf-languages.ietf.org>

> In January 2010, ISO 639-3/RA reclassified the individual language [lav]
> Latvian as a macrolanguage, encompassing the two new individual languages
> [lvs] Standard Latvian and [ltg] Latgalian.
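The lav change quoted above is recorded in ISO 639-3's tab-delimited macrolanguage mapping table. As a minimal sketch, assuming that file's M_Id / I_Id / I_Status column layout (the two-row sample and the function name `load_macrolanguages` are illustrative, not an official API), a data consumer could rebuild macrolanguage membership like this:

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Sample assumed to follow the layout of ISO 639-3's macrolanguage mapping
# file: tab-separated columns M_Id (macrolanguage), I_Id (individual
# language), I_Status (A = active, R = retired).
SAMPLE = "M_Id\tI_Id\tI_Status\nlav\tltg\tA\nlav\tlvs\tA\n"


def load_macrolanguages(text: str) -> Dict[str, List[str]]:
    """Map each macrolanguage code to its active encompassed languages."""
    members: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if row["I_Status"] == "A":  # skip retired members
            members[row["M_Id"]].append(row["I_Id"])
    return dict(members)


print(load_macrolanguages(SAMPLE))  # {'lav': ['ltg', 'lvs']}
```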

If there is no code for a given language X, people will often end up
tagging it with an existing language code Y when common names for X
reference the name of Y (even where there is little mutual
comprehensibility between X and Y). Later on, a code is assigned for X.
The question is: what should be done about the relationship between X and Y?

(Below I’m following BCP 47 in using the ‘shortest form’: thus ‘de’ (ISO
639-1) rather than ‘deu’ (ISO 639-3), ‘ar’ rather than ‘ara’, etc.)
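The ‘shortest form’ rule amounts to a lookup: when a language has both an ISO 639-1 two-letter code and a three-letter code, BCP 47 registers only the two-letter one. A minimal sketch with a hand-picked table (the table and the function name are hypothetical; a real implementation would load the ISO 639 code tables, and bibliographic ISO 639-2/B codes such as ‘ger’ are not handled here):

```python
# Illustrative subset of ISO 639-3 codes that also have an ISO 639-1 code.
ISO639_3_TO_1 = {"deu": "de", "ara": "ar", "fra": "fr", "lav": "lv"}


def shortest_language_subtag(code: str) -> str:
    """Return the two-letter ISO 639-1 code when one exists, else the code itself."""
    code = code.lower()  # language subtags are case-insensitive
    return ISO639_3_TO_1.get(code, code)


for code in ["deu", "ARA", "gsw", "de"]:
    print(code, "->", shortest_language_subtag(code))
```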

   1. *No macrolanguage.* This model is the one used for German, French, and
      many other languages. For German, the original code ‘de’ is left as the
      “standard” form Y, and the code for X just denotes a related language.
      The ‘de’ code is not reclassified as a macrolanguage.
   2. *Reclassifying as macrolanguage.* Another model is to reclassify the
      original code so that it no longer refers to ‘a’ language, but rather
      encompasses many different languages, perhaps with little mutual
      comprehensibility among them. Under that model, ‘de’ becomes a
      macrolanguage that encompasses {gct, gsw, swg, wae, bar, vmf, ksh, ltz,
      pfl, sxu, nds, …} *and* a separate code (e.g., /dea/) is added that
      means standard German specifically. This is the model that ISO used
      with ‘ar’, for example: ‘ar’ became a macrolanguage code, and ‘arb’ was
      added as the standard form.
   3. *New code for macrolanguage.* There is a third possible model: ‘de’
      remains an individual language, but a new macrolanguage (e.g., /dey/)
      is added that encompasses {de, gct, gsw, swg, wae, bar, vmf, ksh, ltz,
      pfl, sxu, nds, …}. (I don’t know whether this has ever happened.)
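To make the difference between the three models concrete, here is a minimal sketch that represents each one as a table from macrolanguage code to encompassed codes. The codes ‘dea’ and ‘dey’ are the hypothetical examples from the list, the member set is abbreviated, and the helper names are illustrative only:

```python
from typing import Dict, Optional, Set

# Abbreviated set of German varieties from the list above.
VARIETIES: Set[str] = {"gct", "gsw", "swg", "wae", "bar", "vmf",
                       "ksh", "ltz", "pfl", "sxu", "nds"}

# Each model maps macrolanguage codes to the codes they encompass.
# 'dea' and 'dey' are the hypothetical codes used in models 2 and 3.
MODELS: Dict[str, Dict[str, Set[str]]] = {
    "no_macrolanguage": {},
    "reclassify": {"de": VARIETIES | {"dea"}},
    "new_macrolanguage": {"dey": VARIETIES | {"de"}},
}


def macrolanguage_of(code: str, model: Dict[str, Set[str]]) -> Optional[str]:
    """Return the macrolanguage that encompasses `code`, or None."""
    for macro, members in model.items():
        if code in members:
            return macro
    return None


def old_de_data_keeps_meaning(model: Dict[str, Set[str]]) -> bool:
    """Data already tagged 'de' meant standard German; that still holds
    only if 'de' was not turned into a macrolanguage."""
    return "de" not in model


for name, model in MODELS.items():
    print(name, old_de_data_keeps_meaning(model), macrolanguage_of("gsw", model))
```

Only the reclassification model changes what existing ‘de’-tagged data means; the other two leave it intact.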

Although well-intentioned, reclassifying an existing code as a
macrolanguage introduces compatibility complications for a large number of
implementations, if not most, because in practice implementations need to
interact with data that uses the original code. (The reclassifications are
not consistent, either: why is de treated differently from ar?)

For example, to combat this ambiguity, in CLDR we treat the macrolanguage
and the predominant encompassed language as identical, and favor the
macrolanguage form for backwards compatibility. Why? Because when an
implementation asks for a set of locale data for ‘ar’, it doesn’t expect to
get a hodgepodge of possible encompassed languages (some Algerian Arabic,
some Egyptian Arabic, some Gulf Arabic, and some Modern Standard Arabic) in
return; it expects Modern Standard Arabic.
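The CLDR approach can be pictured as an alias step applied before locale-data lookup. This is a minimal sketch, not CLDR’s actual code: the three-entry table is a hand-picked subset assumed to match CLDR’s macrolanguage aliases, and `canonicalize_tag` is a hypothetical name:

```python
# Hand-picked subset, assumed to mirror CLDR's macrolanguage aliases:
# the predominant encompassed language is replaced by the macrolanguage code.
MACROLANGUAGE_ALIASES = {
    "arb": "ar",  # Standard Arabic -> Arabic
    "lvs": "lv",  # Standard Latvian -> Latvian
    "cmn": "zh",  # Mandarin Chinese -> Chinese
}


def canonicalize_tag(tag: str) -> str:
    """Replace the primary language subtag with its macrolanguage alias, if any."""
    language, sep, rest = tag.partition("-")
    replacement = MACROLANGUAGE_ALIASES.get(language.lower())
    if replacement is None:
        return tag  # e.g. 'arz' (Egyptian Arabic) stays distinct
    return replacement + sep + rest


for tag in ["arb", "arb-EG", "arz-EG", "lvs-LV", "ltg"]:
    print(tag, "->", canonicalize_tag(tag))
```

Because ‘arb’ folds into ‘ar’ while ‘arz’ does not, a request for ‘ar’ resolves to Modern Standard Arabic data rather than a mixture of varieties.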

While it is too late to change the macrolanguage structure, I would advise
against further “reclassifications”, since they just make it harder for
people, not easier.
Mark


On Fri, Dec 1, 2023 at 3:50 PM <drude@xs4all.nl> wrote:

> In our very first meeting of the MA (the ISO 639 Maintenance Agency) a
> couple of weeks ago, we started to discuss these matters, so please do not
> expect any solutions, let alone changes, in the coming weeks or months.
>
> The goal is to have ONE database whose data can be obtained in various
> formats: visually on web pages and as data files. CSV should certainly
> continue to be supported; I agree.
>
> As I said, the discussions have just started, but as for versioning, a
> system such as Git (as used by GitHub) was mentioned as an option, which
> includes the option that the database *is* a CSV file.
>
> Best, Sebastian
>
>
>
> *From:* Ietf-languages <ietf-languages-bounces@ietf.org> *On Behalf Of* Hugh
> Paterson III
> *Sent:* Friday, 1 December, 2023 19:03
> *To:* Doug Ewell <doug@ewellic.org>
> *Cc:* drude@xs4all.nl; IETF Languages Discussion <ietf-languages@iana.org>
> *Subject:* Re: [Ietf-languages] Latin Sub tags
>
>
>
> As a matter of practicality, it would be helpful for my processes if the
> data files remained in tab-delimited or CSV format. Sebastian mentions a
> "database format". I can imagine that as an SQL dump or JSON file, which
> are valid and in some cases useful formats. However, as a current data
> consumer of these standards, I do prefer the tab and CSV formats that have
> existed for some time. It would be useful if there were a single URI to
> download from. My classic example has been WordPress: with
> https://wordpress.org/latest, the "latest" path lets curl- or wget-type
> tools always fetch the latest version of WordPress. This URL strategy is
> really helpful in automation contexts where content is fetched. Note that
> even though WordPress releases use semantic versioning, "latest" in the URL
> is a generic pointer to the most recent version.
>
>
>
> I am curious how the MA will approach the matter of data versioning. This
> is something that some of my colleagues and I have often wanted in data
> downloaded (from various sources) in the BCP 47 space. Versions allow us
> to quickly check the state of our systems and determine whether we need to
> update data.
>
>
>
> - Hugh
>
>
>
> On Fri, Dec 1, 2023 at 1:04 PM Doug Ewell <doug@ewellic.org> wrote:
>
> drude@xs4all.nl wrote:
>
> > I am sorry to read this; I was under the impression that core members
> > of this list were aware of the revision of ISO 639 and the setting up
> > of a Maintenance Agency; this process has been ongoing for many
> > years...
>
> I had heard in very broad, general terms that administration of the parts
> of 639 was to be consolidated, at some point in the future.
>
> I had not heard any details about how the process of updating the code
> lists will change, how frequently the changes will occur, how they will be
> announced, whether and how the format of official data files will change,
> etc.
>
> None of this is meant to suggest that these changes are a bad thing. I
> think they will greatly reduce the confusion people have about ISO 639.
>
> > I am not so sure that the new setup for ISO 639 really means profound
> > changes for BCP 47. There will still be new versions of the ISO 639
> > database (file) at regular points in time (I believe this will certainly
> > not become a continuous process, but will rather be bundled into one to a
> > few releases per year), which will have to be formally adopted by the
> > BCP 47 registry.
>
> Please do keep this list informed as details become available.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages@ietf.org
> https://www.ietf.org/mailman/listinfo/ietf-languages
>
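Hugh’s request for a single stable download URL plus a machine-readable version can be sketched against the existing IANA Language Subtag Registry, whose first record is a File-Date field (see RFC 5646, section 3.1). This is a minimal sketch under that assumption; the function names are hypothetical, and nothing here reflects how the new ISO 639 Maintenance Agency will actually publish its data:

```python
import re
import urllib.request
from datetime import date
from typing import Optional

# Location of the IANA Language Subtag Registry, used here as an example of a
# single stable "latest" URL.
REGISTRY_URL = ("https://www.iana.org/assignments/"
                "language-subtag-registry/language-subtag-registry")


def parse_file_date(text: str) -> date:
    """Return the date from the registry's opening 'File-Date: YYYY-MM-DD' record."""
    match = re.search(r"^File-Date:\s*(\d{4})-(\d{2})-(\d{2})", text, re.MULTILINE)
    if match is None:
        raise ValueError("no File-Date record found")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def needs_update(local_date: Optional[date], remote_text: str) -> bool:
    """True if there is no local copy yet or the downloaded file is newer."""
    return local_date is None or parse_file_date(remote_text) > local_date


def fetch_registry(url: str = REGISTRY_URL) -> str:
    """Download the registry text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

In an automation job, `needs_update(stored_date, fetch_registry())` decides whether to refresh local data; keeping the date check separate from the download makes it testable offline.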