Re: [Ietf-languages] Forms for subtag kmpre20c

Hi Élie,

The concept of macrolanguages was introduced in the early 2000s as a means to harmonize ISO 639 Parts 1 and 2 with the new Part 3, because in some cases the Ethnologue (the basis for ISO 639-3) had several entries where ISO 639-1/2 had only one, as in the case of zh. SIL International, the editor of Ethnologue and maintenance agency for Part 3, does not intend to introduce further macrolanguage codes, but your examples with Tibetan and Khmer may be good cases for rethinking that policy. If there are indeed languages that are mutually unintelligible in speech but whose written texts are virtually indistinguishable, that would be exactly a good case for introducing a new macrolanguage, as you suggest. I would submit such a request, well substantiated. Gary and Melinda are in CC; they would be the people who would oversee the decision process. Perhaps they have an opinion right away.

Below I copy the section in the upcoming revised ISO 639-4 that deals with macrolanguages.  It is not yet official, but it shows how we have discussed this issue.

Sebastian

Macrolanguages

Parts 1 and 2 of ISO 639 include language identifiers that correspond in a one-to-many manner to language identifiers for individual language varieties in Part 3 of ISO 639.

For instance, Part 3 of ISO 639 contains 2 language identifiers designated as individual language identifiers for distinct language varieties of Azerbaijani which have separate literatures (North Azerbaijani [azj] and South Azerbaijani [azb]), while Parts 1 and 2 each contain only one language identifier for Azerbaijani, [az] and [aze]. The single language identifiers for Azerbaijani in Parts 1 and 2 of ISO 639 correspond to the multiple language identifiers for distinct language varieties of Azerbaijani in Part 3 of ISO 639.

Under a language coding perspective, a somehow similar – however different from a cultural and socio-political perspective – situation exists for:

*	Multiple closely related Chinese languages which share a common written form.
*	The individual languages Bosnian, Croatian, Serbian, etc. where in some contexts it is necessary to make a distinction, while there are other contexts in which these distinctions are not discernible in ‘Serbo-Croatian’ language resources that are in use.

Where situations like the above exist, a language identifier in parts 1 and 2 for the single, common language identity is considered as a macrolanguage identifier.

Macrolanguages are distinguished from language groups in that the individual languages that correspond to a macrolanguage must be very closely related, and there must be some application for which only a single language identity is recognized.
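To make the one-to-many correspondence concrete, here is a minimal sketch in Python. The table and function names are hypothetical, chosen only for illustration. The Azerbaijani entry is taken from the text above; the Chinese and Serbo-Croatian entries are a partial sample and should be checked against the ISO 639-3 macrolanguage tables before being relied on.

```python
# Illustrative, incomplete table: ISO 639-3 macrolanguage code -> some member codes.
# "aze" comes from the quoted section; "zho" and "hbs" list only a few members
# (assumption: verify against the official ISO 639-3 macrolanguage mappings).
MACROLANGUAGE_MEMBERS = {
    "aze": {"azj", "azb"},          # North and South Azerbaijani
    "zho": {"cmn", "yue", "wuu"},   # partial sample of Chinese languages
    "hbs": {"bos", "hrv", "srp"},   # partial sample for Serbo-Croatian
}


def macrolanguage_of(code):
    """Return the macrolanguage listing `code` as a member, or None."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None


def same_macrolanguage(a, b):
    """True only if both codes are members of the same macrolanguage."""
    macro = macrolanguage_of(a)
    return macro is not None and macro == macrolanguage_of(b)
```

With this table, `macrolanguage_of("azj")` yields `"aze"`, while a lookup for `"bod"` or `"adx"` yields `None`, since the table has no entry covering the Tibetan codes discussed in this thread.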

-- 

Museu P.E. Goeldi, CCH, Linguistica ▪ Av. Perimetral, 1901

Terra Firme, CEP: 66077-530 ▪ Belém do Pará – PA ▪ Brazil

drude@xs4all.nl ▪ +55 (91) 3217 6024

-----Original Message-----
From: Ietf-languages <ietf-languages-bounces@ietf.org> On Behalf Of Élie Roux
Sent: Tuesday, December 3, 2019 5:03 AM
To: IETF Languages Discussion <ietf-languages@iana.org>
Subject: Re: [Ietf-languages] Forms for subtag kmpre20c

> Can you elaborate on what you mean by this? On the surface, I couldn't disagree more, but I assume I'm missing something.

I think it comes from a few different angles:

1. my experience with databases in the field that I'm working in (Buddhist studies) is that they use zh for Buddhist texts in Chinese (translated between the 4th and 11th c. give or take) and I'm quite happy to do that too as nobody in the field requires the distinction between the different flavors of Chinese, so zh perfectly fits the purpose.

2. my experience with the same databases is that they all use the bo lang tag for Tibetan. Unfortunately bo is not a macrolanguage, it's supposed to be the language spoken in some areas today (https://iso639-3.sil.org/code/bod). This language is very different from most of the literature we have in our database, which is Classical Tibetan and has its own tag (xct). Also, I struggle a bit to make sense of the "bo" lang tag, as someone from Amdo (thus not speaking "bo" but "adx") and someone from Lhasa (speaking "bo") can't understand each other in speech, but the way they write is quasi-identical. So how do you tag a blog article? If you don't know the origin of the article, you can say it's "Literary Tibetan", which has no tag, but you can't say for sure what "language" it is. And for short sentences (such as titles like what we have in our database), there's a great deal of overlap between Modern Literary Tibetan and Classical Tibetan. And (in our applications) we don't care about this distinction, we don't want to have to choose. And if we don't want to choose, the only option is "und", which, to be honest, I find perfectly ridiculous. So, we're sticking with bo even though it's not true...

But if there was a macrolanguage, we would definitely use it.

3. I suspect the situation with Khmer is actually the same, as well as probably for Tham, Khom, etc.

And I don't know why some umbrella languages exist (such as zh, but also inc or pra that I find useful), and why others don't...

Anyways, this is none of IETF's concern, I should bring that to SIL.

Best,

--

Elie

_______________________________________________

Ietf-languages mailing list

Ietf-languages@ietf.org

https://www.ietf.org/mailman/listinfo/ietf-languages