Re: [Ltru] Macrolanguage usage

"Don Osborn" <> Wed, 21 May 2008 03:30 UTC

From: "Don Osborn" <>
To: "'LTRU Working Group'" <>
Date: Tue, 20 May 2008 10:24:27 -0400
Message-ID: <001001c8ba85$35ef6fc0$a1ce4f40$@net>

Is anyone on this list following this discussion and the evolution of the standard from the point of view of African languages? I haven't been able to stay with it for several reasons, so apologies in advance if the following is redundant to what has already been taken into account.

Much of my concern re the form and use of language tags has been voiced previously on this list and IETF-languages. I do think that as we get more people working on African language web content, software localization, etc., there will be more questions about available ISO 639 code elements and how those are (and can be) used.

Just yesterday I learned that a small group has translated Opera into Songhai and has plans to localize some other open source software. The tag used in the .lng file is "soŋ", which fills their need and appears to correspond most closely to the ISO 639-2 code "son" - the latter representing a cluster, not even a macrolanguage. There is no locale for any variety of Songhai (which would be a topic for another list of course). I'm in communication with them for more information on the intent of their current projects, but my impression is that they are expressly not intending to limit themselves to any single currently encoded entity (khq, ses, dje, hmb, ddn, or whatever).

For some purposes, localizers and content developers may prefer a broader tag than what is available in ISO 639-3, while in other cases they may find that the latter fits the need. In some cases perhaps the optimal tags do not exist yet. What this says to me is that there is a need both for flexibility and for various ways to define and refine relationships among tags.

The reality of language is of course quite fluid, and we see that especially in cases where there is no single standard language form among several very closely related tongues. Even where linguists or diverse country authorities have established rules of orthography and so on, you may have situations where the optimal localization is a series of compromises (or a reflection of either dialect leveling or speakers' awareness of how things are expressed in different varieties of their language). That in turn may require feedback from users to know how well it works for them, but the end result in any case may be new information about tags and relationships among them.

The needs of information technology of course tend to be for clear, mutually exclusive categories (a bit cannot be somewhere between 0 and 1, nor sometimes both 0 and 1, and so on, on up). The only way to accommodate the fluidity of language in this environment seems to be the option of resorting to wider or narrower definitions.
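
To make the idea of resorting to wider or narrower definitions concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the grouping of khq, ses, dje, hmb and ddn under "son" is taken from the codes mentioned in this message rather than looked up in any registry, and the function names (widen, narrow, matches) are hypothetical, not part of any existing library or standard.

```python
# Hypothetical illustration only: this grouping is copied from the codes
# named in the message above, not read from a language subtag registry.
SONGHAI_COLLECTION = "son"
SONGHAI_MEMBERS = {"khq", "ses", "dje", "hmb", "ddn"}


def widen(tag):
    """Return the broader Songhai tag for a specific code, else the tag itself."""
    return SONGHAI_COLLECTION if tag in SONGHAI_MEMBERS else tag


def narrow(tag):
    """Return the specific codes covered by a broad tag, else just the tag."""
    if tag == SONGHAI_COLLECTION:
        return sorted(SONGHAI_MEMBERS)
    return [tag]


def matches(content_tag, requested_tag):
    """True if the tags are equal or one is the broader form of the other.

    Two different specific codes (for example khq and ses) do not match;
    whether they should is a policy question this sketch leaves open.
    """
    if content_tag == requested_tag:
        return True
    return widen(content_tag) == requested_tag or content_tag == widen(requested_tag)
```

With this sketch, content tagged broadly as "son" would satisfy a request for "dje", and content tagged "khq" would satisfy a request for "son", while the choice of how closely related specific varieties are treated stays with whoever defines the mapping.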

Hopefully what I'm suggesting has already been taken into account, and if so, apologies for the redundancy. But whatever the case, I do think that as more languages enter "cyberspace" more fully in their various forms, there will be more questions by more people about how existing tags, and the systems for using them to identify content, write locales, etc., respond to their needs.


Don Osborn 
