RE: [Ltru] "X" vs. 'X (macrolanguage)"

Peter Constable, Sat, 08 December 2007 19:34 UTC

From: Peter Constable
To: LTRU Working Group
Date: Sat, 08 Dec 2007 11:34:13 -0800
Subject: RE: [Ltru] "X" vs. 'X (macrolanguage)"

There is a deeper problem we should be considering: when does it make sense for people to use the macrolanguage ID versus the individual-language ID in cases like Swahili, Malay, etc.? More generally, which should be used for any macrolanguage ID that is in widespread use and has a dominant encompassed language? The issue arises in these cases:

ar/ara vs. arb (Arabic)
kok vs. knn (Konkani)
ms/msa vs. mly (Malay)
sw/swa vs. swh (Swahili)
uz/uzb vs. uzn (Uzbek)
zh/zho vs. cmn (Chinese)

These are all the cases in category 2 of ‘Macrolanguage analysis.txt’ which I sent as an attachment on 11/29. I didn’t include Dogri as I had no indication that doi is widely used, but we can certainly include it for consideration of this issue.


From: Mark Davis
Sent: Saturday, December 08, 2007 10:45 AM
To: Doug Ewell
Cc: LTRU Working Group
Subject: Re: [Ltru] "X" vs. 'X (macrolanguage)"

Good points. My primary concern is with the presentation of language names in UIs, and a listing of Swahili like the following is not going to be comprehensible to anyone except for the 0.0000005% of people who are familiar with these standards. It's a bit better with the names for Chinese, as you point out.

Swahili (macrolanguage)
Swahili (individual language)

although even with Chinese, someone is going to be confused as to which they should pick.

Now before someone says it, I realize that the names in a UI don't have to be the same as the names in the registry. But the closer we get them to being understandable by mortals, the more likely it is that we will have non-confusing names show up in UIs. We might even leave the above in the registry, but have a note in 4646bis about the names. So I think it's worth our taking at least a little bit of time to discuss this.

The best I could think of off-hand was something like:
Swahili (general)

and maybe some UI device, such as a link on "general" that opens a box with more information.

As it turns out, there are just 4 ambiguous names after removing " (macrolanguage)" and " (individual language)":

In all other cases, the names differ, often because an adjective modifies the name of the individual language. It appears that SIL lists a number of alternate names for these languages; perhaps we can use one of those alternate names (or add a note pointing to the possible use of alternate names).

doi: Dhogaryali, Dogari, Dogri Jammu, Dogri-Kangri, Dogri Pahari, Dongari, Hindi Dogri, Tokkaru
kok: Konkan Standard, Bankoti, Kunabi, North Konkan, Central Konkan, Concorinum, Cugani, Konkanese
msa: Bahasa Malaysia, Bahasa Malayu, Malayu, Melaju, Melayu, Standard Malay
swa: Kiswahili, Kisuaheli
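
The count of ambiguous names comes from a simple check: strip " (macrolanguage)" and " (individual language)" from each description, then see which bare names are shared by more than one code. A minimal Python sketch of that check follows. The function names are hypothetical, and the sample is illustrative only (the Swahili entries mirror the listing above; the Chinese and Arabic entries are just there to show names that do not collide); it is not an extract of the registry.

```python
from collections import defaultdict

# Suffixes used to tell a macrolanguage apart from an encompassed
# individual language that has the same base name.
SUFFIXES = (" (macrolanguage)", " (individual language)")


def strip_scope_suffix(name: str) -> str:
    """Return the name with a trailing scope suffix removed, if present."""
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def find_ambiguous_names(descriptions: dict[str, str]) -> dict[str, list[str]]:
    """Map each bare name shared by two or more codes to those codes.

    `descriptions` maps a language code to its description string.
    """
    by_bare_name: dict[str, list[str]] = defaultdict(list)
    for code, name in descriptions.items():
        by_bare_name[strip_scope_suffix(name)].append(code)
    # Keep only bare names that more than one code ends up sharing.
    return {
        name: sorted(codes)
        for name, codes in by_bare_name.items()
        if len(codes) > 1
    }


if __name__ == "__main__":
    # Illustrative sample only.
    sample = {
        "swa": "Swahili (macrolanguage)",
        "swh": "Swahili (individual language)",
        "zho": "Chinese",
        "cmn": "Mandarin Chinese",
        "ara": "Arabic",
        "arb": "Standard Arabic",
    }
    print(find_ambiguous_names(sample))  # {'Swahili': ['swa', 'swh']}
```

Names like "Mandarin Chinese" survive the stripping unchanged, which is why only the bare-name pairs show up as collisions.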

On Dec 8, 2007 9:56 AM, Doug Ewell wrote:
Mark Davis wrote:

> I'd favor:
> Description: Swahili
> A vanishingly small number of people will know what "macrolanguage"
> means or how it should be translated, so having it be part of the name
> would be clumsy.
It's going to be confusing no matter what we do, because most people
think 'sw' refers to Swahili, the individual language, and now we are
telling them that language is really 'swh'.  This is worse than the
Chinese case, where at least none of the encompassed languages is called
simply "Chinese."

> I also thought we were going to have macrolanguage as a field -- that
> would be a clean way to do it, and if so, then including "
> (macrolanguage)" would also be redundant.
But the macrolanguage 'sw' won't be marked in any special way; only the
presence of other language subtags with "Macrolanguage: sw" will provide
any clue.
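
To make that concrete: if macrolanguage membership is recorded only on the encompassed languages, a tool has to infer that 'sw' is a macrolanguage by scanning for other records that point at it. The Python sketch below illustrates that inference. It assumes RFC 4646-style registry records (fields written as "Name: value", records separated by "%%" lines, folded continuation lines starting with whitespace) plus the proposed Macrolanguage field; the function names and sample records are hypothetical, for illustration only.

```python
def parse_records(text: str) -> list[dict[str, list[str]]]:
    """Split registry-style text into records mapping field names to values.

    Records are separated by "%%" lines; a line starting with whitespace
    continues the previous field. Fields may repeat, so values are lists.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_field = None
    for line in text.splitlines():
        if line.strip() == "%%":
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line[:1].isspace() and line.strip() and last_field is not None:
            # Folded line: extend the most recent value of the last field.
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, value = line.split(":", 1)
            last_field = name.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def inferred_macrolanguages(records: list[dict[str, list[str]]]) -> set[str]:
    """Collect every subtag named in some record's Macrolanguage field."""
    return {
        value
        for record in records
        for value in record.get("Macrolanguage", [])
    }


# Hypothetical sample: nothing in the 'sw' record itself marks it as a
# macrolanguage; only the 'swh' record's Macrolanguage field does.
SAMPLE = """\
Type: language
Subtag: sw
Description: Swahili
%%
Type: language
Subtag: swh
Description: Swahili (individual language)
Macrolanguage: sw
%%
"""

if __name__ == "__main__":
    print(sorted(inferred_macrolanguages(parse_records(SAMPLE))))  # ['sw']
```

An explicit field on the macrolanguage's own record would remove the need for this kind of whole-registry scan.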

I can see advantages and disadvantages to both sides here.  I also don't
like exposing the term "macrolanguage" in user interfaces (c'mon, you
know the Description fields will end up there) and forcing civilians to
understand what a macrolanguage is.  But then what does it mean to have
a choice between plain "Swahili" and "Swahili (individual language)"?
And what about the people who said we MUST keep all the ISO 639 names
intact, without changing so much as a hyphen or apostrophe, so the
subtags could be related back to the standard?   I hope more people
contribute their thoughts on this.

Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
