Re: [Ietf-languages] Latin Sub tags

Sebastian Drude <drude@xs4all.nl> Mon, 04 December 2023 13:24 UTC

From: Sebastian Drude <drude@xs4all.nl>
To: 'Mark Davis ☕️' <mark@macchiato.com>
CC: 'Hugh Paterson III' <sil.linguist@gmail.com>, 'Doug Ewell' <doug@ewellic.org>, 'IETF Languages Discussion' <ietf-languages@iana.org>
Date: Mon, 04 Dec 2023 10:25:29 -0300
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/baWSuWL6N4K2_X_ZGPR4GdzksLI>

Thanks a lot, Mark, 

for sharing this.  
It is certainly highly relevant for our discussions in the MA.

Sebastian

-- 

Museu P.E. Goeldi, CCH, Linguistica ▪ Av. Perimetral, 1901
Terra Firme, CEP: 66077-530 ▪ Belém do Pará – PA ▪ Brazil
drude@xs4all.nl ▪ +55 (91) 3217 6024 ▪ +55 (91) 983733319

From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Mark Davis ☕️
Sent: Sunday, December 3, 2023 10:46 PM
To: drude@xs4all.nl
Cc: Hugh Paterson III <sil.linguist@gmail.com>; Doug Ewell <doug@ewellic.org>; IETF Languages Discussion <ietf-languages@iana.org>
Subject: Re: [Ietf-languages] Latin Sub tags


> In January 2010, ISO 639-3/RA reclassified the individual language [lav] Latvian as a macrolanguage, encompassing the two new individual languages [lvs] Standard Latvian and [ltg] Latgalian.

If there is no code for a given language X, people will often end up tagging it with a language code Y that does exist where common names for X reference the name for Y (even where there is little mutual comprehensibility between X and Y). But then later on, a code is assigned for X. The question is, what to do about the relationship between X and Y?

(Below I’m following BCP47 in using the ‘shortest form’, thus ‘de’ (ISO 639-1) for ‘deu’ (ISO 639-3), ‘ar’ = ‘ara’, etc.)

1.	No macrolanguage. This model is the one used for German, French, and many other languages. For German, the original code ‘de’ is left as the “standard” form Y, and the X code is just a related language. The ‘de’ code is not reclassified as a macrolanguage.
2.	Reclassifying as macrolanguage. Another model is to reclassify the original code so that it doesn’t really refer to ‘a’ language, but rather encompasses many different languages, with perhaps little mutual comprehensibility among them. Under that model, ‘de’ changes to a macrolanguage that encompasses {gct, gsw, swg, wae, bar, vmf, ksh, ltz, pfl, sxu, nds, …} and a separate code (e.g., /dea/) is added that would mean standard German specifically. This is the model that ISO used with ‘ar’, for example: ‘ar’ became a macrolanguage code, and ‘arb’ was added as the standard form.
3.	New code for macrolanguage. There is a third possible model: ‘de’ remains a non-macrolanguage, but a new macrolanguage (e.g., /dey/) gets added that encompasses {de, gct, gsw, swg, wae, bar, vmf, ksh, ltz, pfl, sxu, nds, …}. (I don’t know whether this has ever happened.)
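
The three models above can be compared with a small sketch. This is only an illustration under stated assumptions: the `Entry` class, the registry dictionaries, and the helper function are hypothetical names, /dea/ and /dey/ are the made-up codes from the list, and the set of related varieties is the open-ended one quoted there. The sketch checks one thing: whether data already tagged 'de' changes meaning under each model.

```python
from dataclasses import dataclass

# Related varieties named in the message (the original list is open-ended).
RELATED = ("gct", "gsw", "swg", "wae", "bar", "vmf", "ksh", "ltz", "pfl", "sxu", "nds")


@dataclass(frozen=True)
class Entry:
    """Hypothetical registry entry for one language code."""
    scope: str               # "individual" or "macrolanguage"
    encompasses: tuple = ()  # codes covered when scope is "macrolanguage"


def _related_entries():
    return {code: Entry("individual") for code in RELATED}


def model_1():
    # No macrolanguage: 'de' stays the standard form, the others are just related.
    reg = _related_entries()
    reg["de"] = Entry("individual")
    return reg


def model_2():
    # Reclassification: 'de' becomes the macrolanguage and the
    # hypothetical code 'dea' is added for standard German.
    reg = _related_entries()
    reg["dea"] = Entry("individual")
    reg["de"] = Entry("macrolanguage", ("dea",) + RELATED)
    return reg


def model_3():
    # New macrolanguage code: 'de' is untouched and the
    # hypothetical code 'dey' encompasses it.
    reg = _related_entries()
    reg["de"] = Entry("individual")
    reg["dey"] = Entry("macrolanguage", ("de",) + RELATED)
    return reg


def existing_tag_changes_meaning(before, after, code):
    """True if the entry for an already-deployed code differs after the change."""
    return before[code] != after[code]


baseline = model_1()  # state before any change: 'de' is an individual language
for name, model in (("model 1", model_1), ("model 2", model_2), ("model 3", model_3)):
    print(name, existing_tag_changes_meaning(baseline, model(), "de"))
```

In this sketch only model 2 alters the entry for 'de', which is the case where existing data and implementations are affected.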

Although well-intentioned, reclassification of macrolanguages has the by-product of introducing compatibility complications for a large number of — if not most — implementations, because in practice implementations need to interact with data using the original code. (There is little consistency in the reclassifications, as well: why is de different from ar?)

For example, to combat the ambiguity, in CLDR we treat the macrolanguage and the predominant encompassed language as identical, and favor the macrolanguage form for backwards compatibility. Why do this? Because when an implementation asks for a set of locale data for ‘ar’, it really doesn’t expect to get a hodgepodge of possible encompassed languages (some Algerian Arabic, some Egyptian Arabic, some Gulf Arabic, and some Modern Standard Arabic) in return; it expects Modern Standard Arabic.
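
A minimal sketch of that treatment, assuming a hand-written two-entry table rather than the actual CLDR alias data: the function name `canonical_language` and the `PREFERRED` mapping are hypothetical, and only the language subtag is rewritten.

```python
# Illustrative subset only: predominant encompassed code -> macrolanguage code.
# A real implementation would load this from CLDR's alias data instead.
PREFERRED = {
    "arb": "ar",  # Modern Standard Arabic -> Arabic
    "cmn": "zh",  # Mandarin Chinese -> Chinese
}


def canonical_language(tag: str) -> str:
    """Replace the language subtag with the favored macrolanguage form, if any."""
    parts = tag.replace("_", "-").split("-")
    lang = parts[0].lower()
    parts[0] = PREFERRED.get(lang, lang)
    return "-".join(parts)


print(canonical_language("arb-EG"))  # ar-EG
print(canonical_language("arz"))     # arz (Egyptian Arabic keeps its own code)
```

With this mapping, a request for 'arb' and a request for 'ar' resolve to the same locale data, while other encompassed languages such as 'arz' stay distinct.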

While it is too late to change the macrolanguage structure, I would advise against further “reclassifications”, since they just make it harder for people, not easier.

Mark


On Fri, Dec 1, 2023 at 3:50 PM <drude@xs4all.nl> wrote:

In our very first meeting of the MA a couple of weeks ago, we started to discuss these matters, so please do not expect any solution, let alone changes, in the coming weeks or months.

The goal is to have ONE database, the data of which can be obtained in various formats, visually on webpages and as data files. CSV should certainly continue to be supported; I agree.

As I said, the discussions have just started, but as for versioning, a system such as Git (as in GitHub) was mentioned as an option, which includes the option that the database is a CSV file.

Best, Sebastian


From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Hugh Paterson III
Sent: Friday, 1 December, 2023 19:03
To: Doug Ewell <doug@ewellic.org>
Cc: drude@xs4all.nl; IETF Languages Discussion <ietf-languages@iana.org>
Subject: Re: [Ietf-languages] Latin Sub tags

As a matter of practicality, it would help my processes if the data files remained tab-delimited or CSV. Sebastian mentions a "database format". I can imagine that as an SQL dump or a JSON file, both of which are valid and in some cases useful formats. However, as a current consumer of the data from these standards, I prefer the tab-delimited and CSV formats that have existed for some time.

It would also be useful to have a single URI to download from. My classic example is WordPress: https://wordpress.org/latest. Because "latest" in the URL is a generic pointer, tools such as curl or wget that follow the link always get the latest version of WordPress, even though WordPress releases are versioned with a semantic versioning system. This URL strategy is really helpful in automation contexts where content is fetched.
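
A rough sketch of the kind of automation described here, under loudly stated assumptions: the URL below is a placeholder and not a real ISO 639 endpoint, and the column names `Id`, `Scope`, and `Ref_Name` are assumed for illustration. The sample rows reuse the Latvian codes quoted earlier in the thread.

```python
import csv
import io
import urllib.request

# Hypothetical stable pointer; no such endpoint is known to exist.
LATEST_URL = "https://example.org/iso-639/latest.tab"


def download_latest(url: str = LATEST_URL) -> str:
    """Fetch the current data file from a fixed 'latest' URL (not called below)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_code_table(text: str) -> dict:
    """Parse tab-delimited text with a header row into {code: row}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Id"]: row for row in reader}


# Inline sample so the sketch runs without network access.
# Assumed scope values: M = macrolanguage, I = individual language.
SAMPLE = (
    "Id\tScope\tRef_Name\n"
    "lav\tM\tLatvian\n"
    "lvs\tI\tStandard Latvian\n"
    "ltg\tI\tLatgalian\n"
)

table = parse_code_table(SAMPLE)
print(table["lav"]["Scope"], table["lvs"]["Ref_Name"])
```

In a scheduled job, `parse_code_table(download_latest())` would replace the inline sample, and the script would never need to know the current release number.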

I am curious how the MA will approach the matter of data versioning. This is something that some of my colleagues and I have often desired from downloaded data (from various sources) in the BCP-47 space. Versions allow us to quickly look at the state of our systems and determine if we need to update data.

- Hugh


On Fri, Dec 1, 2023 at 1:04 PM Doug Ewell <doug@ewellic.org> wrote:

drude@xs4all.nl wrote:

> I am sorry to read this; I was under the impression that core members
> of this list were aware of the revision of ISO 639 and the setting up
> of a Maintenance Agency; this process has been ongoing for many
> years...

I had heard in very broad, general terms that administration of the parts of 639 was to be consolidated, at some point in the future.

I had not heard any details about how the process of updating the code lists will change, how frequently the changes will occur, how they will be announced, whether and how the format of official data files will change, etc.

None of this is meant to suggest that these changes are a bad thing. I think they will greatly reduce the confusion people have about ISO 639.

> I am not so sure about that the new setting for ISO 639 really means
> profound changes for BCP 47. There will still be a new version of the
> ISO 639 database (file) in regular points in time (I believe this will
> certainly not become a continuous process, but rather be bundled to
> one to a few times per year) which will have to formally be adopted by
> the BCP 47 registry.

Please do keep this list informed as details become available.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
