Re: [Ietf-languages] Forms for subtag kmpre20c

<drude@xs4all.nl> Tue, 03 December 2019 13:48 UTC

Return-Path: <drude@xs4all.nl>
X-Original-To: ietf-languages@ietfa.amsl.com
Delivered-To: ietf-languages@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 98B091208D5 for <ietf-languages@ietfa.amsl.com>; Tue, 3 Dec 2019 05:48:28 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -3.623
X-Spam-Level:
X-Spam-Status: No, score=-3.623 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_SOFTFAIL=0.665, T_KAM_HTML_FONT_INVALID=0.01] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=xs4all.nl
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2MHI7kO1tT36 for <ietf-languages@ietfa.amsl.com>; Tue, 3 Dec 2019 05:48:23 -0800 (PST)
Received: from mork.alvestrand.no (mork.alvestrand.no [158.38.152.117]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 2ED6E120874 for <ietf-languages@ietf.org>; Tue, 3 Dec 2019 05:48:22 -0800 (PST)
Received: by mork.alvestrand.no (Postfix) id A40E07C0AA4; Tue, 3 Dec 2019 14:48:19 +0100 (CET)
Delivered-To: ietf-languages@alvestrand.no
Received: from localhost (localhost [127.0.0.1]) by mork.alvestrand.no (Postfix) with ESMTP id 860EF7C0ABF for <ietf-languages@alvestrand.no>; Tue, 3 Dec 2019 14:48:19 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at alvestrand.no
Authentication-Results: mork.alvestrand.no (amavisd-new); dkim=pass (2048-bit key) header.d=xs4all.nl
Received: from mork.alvestrand.no ([127.0.0.1]) by localhost (mork.alvestrand.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lHcF6sJvkimX for <ietf-languages@alvestrand.no>; Tue, 3 Dec 2019 14:48:13 +0100 (CET)
X-Greylist: from auto-whitelisted by SQLgrey-1.8.0
X-Greylist: from auto-whitelisted by SQLgrey-1.8.0
X-Comment: SPF skipped for whitelisted relay - client-ip=192.0.46.74; helo=pechora8.dc.icann.org; envelope-from=drude@xs4all.nl; receiver=ietf-languages@alvestrand.no
Received: from pechora8.dc.icann.org (pechora8.icann.org [192.0.46.74]) by mork.alvestrand.no (Postfix) with ESMTPS id 3D84D7C0AA4 for <ietf-languages@alvestrand.no>; Tue, 3 Dec 2019 14:48:13 +0100 (CET)
Received: from lb3-smtp-cloud9.xs4all.net (lb3-smtp-cloud9.xs4all.net [194.109.24.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by pechora8.dc.icann.org (Postfix) with ESMTPS id AC77FC02D1 for <ietf-languages@iana.org>; Tue, 3 Dec 2019 13:48:10 +0000 (UTC)
Received: from COCHSPAT023364 ([200.129.128.7]) by smtp-cloud9.xs4all.net with ESMTPA id c8WjiacHYFc4Fc8WoiTabF; Tue, 03 Dec 2019 14:47:49 +0100
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=xs4all.nl; s=s1; t=1575380869; bh=YeZazm4aq4FD44DGBjObB2UiIhhPjP7ljrR7La4Ru3Y=; h=From:To:Subject:Date:Message-ID:MIME-Version:Content-Type:From: Subject; b=kqY2axnvIMBzpa1fnlvUS9duw505Esi5E5N+ES6vUeuXjilw/j9ntn7grDcLclB87 QVBohnK3TGVO7dXz2n3Uz1Fy7tnDNcfRy++PTcj7r2/vlVOnaRm5UZUi8TyA7Fqwu+ ePN9H0OSICU6jLlEhApPbCE0VZVKJZnJQeTNId77S0zu6cEFyn+Uob94c1r1HzFoCP CxSy1QRU+xNOB8YFKfjIfQYkg9k6tJYHJh6nm+Fgib/MYpXb9ERRAnOzqujNQQpX8M 3PMqm4AswHTsBgjKbTTPm866z++3X9LmA0U/BQ02NZ/O4HfJQiPU5TL7XIq3cwYWNX 4W/uE7xL3/6Mg==
From: drude@xs4all.nl
To: 'Élie Roux' <elie.roux@telecom-bretagne.eu>, 'IETF Languages Discussion' <ietf-languages@iana.org>
Cc: gary_simons@sil.org, Melinda_Lyons@sil.org
References: <20191202165611.665a7a7059d7ee80bb4d670165c8327d.dba149222d.wbe@email03.godaddy.com> <CANfi1JgSUn3YG6M5HH1s3gqHCdKV8Y0CgiQUDi_c-YLNa3fi7A@mail.gmail.com>
In-Reply-To: <CANfi1JgSUn3YG6M5HH1s3gqHCdKV8Y0CgiQUDi_c-YLNa3fi7A@mail.gmail.com>
Date: Tue, 03 Dec 2019 10:47:42 -0300
Message-ID: <010101d5a9e0$40e733c0$c2b59b40$@xs4all.nl>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0102_01D5A9C7.1B9C6CC0"
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQMtDK7c9YLBGVzOuaK41qbHPbpAtgD2ZJ3qpPHui4A=
Content-Language: en-gb
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.2 (pechora8.dc.icann.org [192.0.46.74]); Tue, 03 Dec 2019 13:48:10 +0000 (UTC)
X-CMAE-Envelope: MS4wfJRSdFZZ9UhyCmZw/pqZBHykKr3vr/cl8lB7Q5sG9B0APUYf1B1GvCmM6kX7g4DaU/dXS013onwC6cZt7aNhb3v2mj7AJkVPLHbkKUI1YpDy3XCumTHq 7Ss4irErBGIyCkaQGjkqv6NAjABA7aDLniPmp+syCJFVnMQBCiN2ugGVqsEGlD8j4EJXU/CAngiCDX63z/3GYDUbJg9dspIyTf1nKl8+m8YLnXIMpaO5tdLQ Ujp1OWCbXmG0TElX58vhiiCWjFeiCyvr7YrQrF7RljA=
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-languages/UDbzTMeMMml96LhOlOjas72mUkg>
Subject: Re: [Ietf-languages] Forms for subtag kmpre20c
X-BeenThere: ietf-languages@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <ietf-languages.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ietf-languages>, <mailto:ietf-languages-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/ietf-languages/>
List-Post: <mailto:ietf-languages@ietf.org>
List-Help: <mailto:ietf-languages-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ietf-languages>, <mailto:ietf-languages-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Dec 2019 13:48:29 -0000

Hi Élie,


The concept of macrolanguages was introduced in the early 2000s as a means to harmonize ISO 639 Parts 1 and 2 with the new Part 3. In some cases the Ethnologue (the basis for ISO 639-3) had several entries where ISO 639-1/2 had only one, as in the case of zh. SIL International, the editor of the Ethnologue and the maintenance agency for Part 3, does not intend to introduce further macrolanguage codes, but your examples of Tibetan and Khmer may be good cases for rethinking that policy. If there are indeed languages that are mutually unintelligible in speech but whose written texts are virtually indistinguishable, that would be exactly the kind of case that justifies introducing a new macrolanguage, as you suggest. I would submit such a request, well substantiated. Gary and Melinda are in CC; they are the people who would oversee the decision process. Perhaps they have an opinion right away.


Below I copy the section of the upcoming revised ISO 639-4 that deals with macrolanguages. It is not yet official, but it shows how we have discussed this issue.


Sebastian


Macrolanguages

Parts 1 and 2 of ISO 639 include language identifiers that correspond in a one-to-many manner to language identifiers for individual language varieties in Part 3 of ISO 639.

For instance, Part 3 of ISO 639 contains two language identifiers designated as individual language identifiers for distinct language varieties of Azerbaijani that have separate literatures (North Azerbaijani [azj] and South Azerbaijani [azb]), while Parts 1 and 2 each contain only one language identifier for Azerbaijani, [az] and [aze] respectively. The single language identifiers for Azerbaijani in Parts 1 and 2 of ISO 639 correspond to the multiple language identifiers for distinct language varieties of Azerbaijani in Part 3 of ISO 639.
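The one-to-many correspondence described above can be pictured as a simple lookup table. The sketch below is illustrative only: the Azerbaijani members come from the text, the Chinese list is a deliberately partial subset (ISO 639-3 lists more members of zho), and the helper names `encompassed` and `macrolanguage_of` are hypothetical, not part of any standard API.

```python
from __future__ import annotations

# Illustrative subset of ISO 639-3 macrolanguage -> individual-language mappings.
# The Chinese list is intentionally incomplete.
MACROLANGUAGES: dict[str, list[str]] = {
    "aze": ["azj", "azb"],          # North and South Azerbaijani
    "zho": ["cmn", "yue", "wuu"],   # Mandarin, Cantonese, Wu (partial)
}

# Part 1 identifiers mapped to the corresponding three-letter macrolanguage codes.
PART1_TO_PART3 = {"az": "aze", "zh": "zho"}


def encompassed(code: str) -> list[str]:
    """Return the individual languages for a Part 1 or three-letter macrolanguage code."""
    code = PART1_TO_PART3.get(code, code)
    return MACROLANGUAGES.get(code, [])


def macrolanguage_of(code: str) -> str | None:
    """Reverse lookup: the macrolanguage an individual language belongs to, if any."""
    for macro, members in MACROLANGUAGES.items():
        if code in members:
            return macro
    return None


print(encompassed("az"))        # ['azj', 'azb']
print(macrolanguage_of("yue"))  # zho
print(macrolanguage_of("bod"))  # None: Tibetan is not part of a macrolanguage
```

The reverse lookup is what an application needs when it holds a precise tag (e.g. yue) but wants to match content tagged only with the broader identifier.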

From a language-coding perspective, a somewhat similar situation (though a different one from a cultural and socio-political perspective) exists for:

*	Multiple closely related Chinese languages which share a common written form.
*	The individual languages Bosnian, Croatian, Serbian, etc. where in some contexts it is necessary to make a distinction, while there are other contexts in which these distinctions are not discernible in ‘Serbo-Croatian’ language resources that are in use.

Where situations like the above exist, a language identifier in Parts 1 and 2 for the single, common language identity is considered a macrolanguage identifier.

Macrolanguages are distinguished from language groups in that the individual languages that correspond to a macrolanguage must be very closely related, and there must be some application for which only a single language identity is recognized.
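In the IANA Language Subtag Registry used by BCP 47 tags, this distinction shows up in the `Scope` field (`macrolanguage` versus `collection`, with no `Scope` field meaning an individual language), and encompassed subtags carry a `Macrolanguage` field. The sketch below parses an abridged, hand-written excerpt in the registry's record-jar layout (fields trimmed, not copied verbatim from the registry) and classifies each subtag; `parse_records` and `classify` are hypothetical helper names.

```python
# Abridged, hand-written excerpt in the registry's record-jar layout.
# Real entries carry more fields (e.g. Added), and the real file starts with a File-Date line.
REGISTRY_EXCERPT = """\
%%
Type: language
Subtag: zh
Description: Chinese
Scope: macrolanguage
%%
Type: language
Subtag: cmn
Description: Mandarin Chinese
Macrolanguage: zh
%%
Type: language
Subtag: inc
Description: Indic languages
Scope: collection
%%
Type: language
Subtag: bo
Description: Tibetan
"""


def parse_records(text: str) -> list[dict[str, str]]:
    """Split records on '%%' lines and read 'Field: value' pairs.

    Simplification: folded continuation lines are not handled.
    """
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current = {}
        elif ":" in line:
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def classify(record: dict[str, str]) -> str:
    """A record without a Scope field describes an individual language."""
    return record.get("Scope", "individual language")


for rec in parse_records(REGISTRY_EXCERPT):
    note = f" (encompassed by {rec['Macrolanguage']})" if "Macrolanguage" in rec else ""
    print(f"{rec['Subtag']}: {classify(rec)}{note}")
```

With the excerpt above this prints zh as a macrolanguage, cmn as an individual language encompassed by zh, inc as a collection, and bo as an individual language, which mirrors the Tibetan situation discussed in the quoted message.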


-- 

Museu P.E. Goeldi, CCH, Linguistica ▪ Av. Perimetral, 1901

Terra Firme, CEP: 66077-530 ▪ Belém do Pará – PA ▪ Brazil

drude@xs4all.nl ▪ +55 (91) 3217 6024


-----Original Message-----
From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Élie Roux
Sent: Tuesday, December 3, 2019 5:03 AM
To: IETF Languages Discussion <ietf-languages@iana.org>
Subject: Re: [Ietf-languages] Forms for subtag kmpre20c


> Can you elaborate on what you mean by this? On the surface, I couldn't disagree more, but I assume I'm missing something.


I think it comes from a few different angles:


1. my experience with databases in the field that I'm working in (Buddhist studies) is that they use zh for Buddhist texts in Chinese (translated roughly between the 4th and 11th centuries), and I'm quite happy to do that too, as nobody in the field requires the distinction between the different flavors of Chinese, so zh perfectly fits the purpose.


2. my experience with the same databases is that they all use the bo lang tag for Tibetan. Unfortunately, bo is not a macrolanguage; it's supposed to be the language spoken in some areas today (https://iso639-3.sil.org/code/bod). This language is very different from most of the literature in our database, which is Classical Tibetan and has its own tag (xct). I also struggle a bit to make sense of the "bo" lang tag: someone from Amdo (thus speaking not "bo" but "adx") and someone from Lhasa (speaking "bo") can't understand each other in speech, but they write in an almost identical way. So how do you tag a blog article? If you don't know the origin of the article, you can say it's "Literary Tibetan", which has no tag, but you can't say for sure what "language" it is. And for short sentences (such as the titles we have in our database), there's a great deal of overlap between Modern Literary Tibetan and Classical Tibetan. In our applications we don't care about this distinction, and we don't want to have to choose. If we don't want to choose, the only option is "und", which, to be honest, I find perfectly ridiculous. So we're sticking with bo even though it's not accurate...

But if there was a macrolanguage, we would definitely use it.


3. I suspect the situation with Khmer is actually the same, as well as probably for Tham, Khom, etc.


And I don't know why some umbrella languages exist (such as zh, but also inc or pra, which I find useful) and why others don't...


Anyways, this is none of IETF's concern, I should bring that to SIL.


Best,

--

Elie


_______________________________________________

Ietf-languages mailing list

Ietf-languages@ietf.org

https://www.ietf.org/mailman/listinfo/ietf-languages