[Ltru] Macrolanguage and extlang

"Mark Davis" <mark.davis@icu-project.org> Sat, 14 July 2007 01:06 UTC

Date: Fri, 13 Jul 2007 18:06:03 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: LTRU Working Group <ltru@ietf.org>
Subject: [Ltru] Macrolanguage and extlang

Addison and I have discussed the issue of extlang and Macrolanguages, and
we propose the following text to replace the use of extlang.

*[A new section called Macrolanguages: ]*

The Macrolanguage field contains a primary language subtag that *encompasses*
this subtag. That is, this language is a dialect or sub-language of the
Macrolanguage, and is called an *encompassed* subtag. The Macrolanguage
value is defined by ISO 639-3. The field can be useful to applications or
users when selecting language tags or as additional metadata useful in
matching. The Macrolanguage field can only occur in records of type
'language'. Only values assigned by ISO 639-3 will be considered for
inclusion. Macrolanguage fields MAY be added via the normal registration
process whenever ISO 639-3 defines new values. Macrolanguages are
informational, and MAY be removed or changed if ISO 639-3 changes the values.
For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn'
(Norwegian Nynorsk) have a Macrolanguage entry of 'no' (Norwegian). For more
information see [Choice].
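
As a rough illustration of how an application might read such a field, here
is a minimal sketch in Python. It assumes records in the registry's
"Field: value" style separated by "%%" lines. The sample records, the
function names, and the Macrolanguage lines themselves are hypothetical
illustrations of the proposal, not the contents of the live registry. The
sketch ignores folded lines and repeated fields.

```python
# Hypothetical sample records; the Macrolanguage lines follow the proposal
# in this message and are assumptions, not live registry data.
SAMPLE = """\
Type: language
Subtag: nb
Description: Norwegian Bokmal
Macrolanguage: no
%%
Type: language
Subtag: nn
Description: Norwegian Nynorsk
Macrolanguage: no
%%
Type: language
Subtag: no
Description: Norwegian
%%
Type: region
Subtag: NO
Description: Norway
"""


def parse_records(text):
    """Split '%%'-separated text into a list of {field: value} dicts."""
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.strip().splitlines():
            name, _, value = line.partition(":")
            record[name.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def macrolanguage_map(records):
    """Map each encompassed language subtag to its Macrolanguage.

    Only records of type 'language' are considered, as the text requires.
    """
    return {
        r["Subtag"]: r["Macrolanguage"]
        for r in records
        if r.get("Type") == "language" and "Macrolanguage" in r
    }


MACRO = macrolanguage_map(parse_records(SAMPLE))
print(MACRO)
```

With the sample above, only 'nb' and 'nn' end up in the map. 'no' itself and
the region record carry no Macrolanguage field and are skipped.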

*[A new section in tag choice (section 4.1), referenced from the above]*

Languages with a Macrolanguage field in the registry sometimes can be
usefully referenced using their Macrolanguage. However, the Macrolanguage
field doesn't define what the relationship is between the language subtag
whose record it appears in and the Macrolanguage that encompasses it. Nor
does it define how the encompassed languages are related to one another. In
some cases, the Macrolanguage has a standard form as well as a variety of
less-common dialects. For example, the Macrolanguage 'ar' (Arabic) and the
subtag 'arb' (Standard Arabic) generally describe the same language, with
other subtags describing less-common local variations. In other cases there
is no particular standard form and the encompassed subtags describe specific
variations within the parent language.

Applications MAY use Macrolanguage information to improve matching or
language negotiation. For example, the information that 'sr' and 'hr' share
a Macrolanguage expresses a closer relation between those languages than
between, say, 'sr' and 'mk' (Macedonian). It is valid to use either the
encompassed language or its Macrolanguage to form language tags. However,
many matching applications will not be aware of the relationship between the
languages. Care in selecting which subtags are used is crucial to
interoperability. In general, use the most specific tag. However, where the
standard written form of an encompassed language is captured by the
Macrolanguage, the Macrolanguage should still be used for written material.
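
One way an application might use a shared Macrolanguage as a matching signal
is sketched below. The table holds only the examples mentioned in this
message. The function name and the 0/1/2 scoring are assumptions made for
illustration, not part of the proposed text.

```python
# Illustrative values taken from the examples in this message; a real
# application would load them from the registry instead.
MACROLANGUAGE = {"sr": "sh", "hr": "sh", "nb": "no", "nn": "no", "yue": "zh"}


def relatedness(a, b, macro=MACROLANGUAGE):
    """Coarse score: 2 = same subtag, 1 = related via Macrolanguage, 0 = neither."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 2
    # A subtag with no Macrolanguage stands for itself, so an encompassed
    # language and its own Macrolanguage ('yue' and 'zh') also count as related.
    if macro.get(a, a) == macro.get(b, b):
        return 1
    return 0


print(relatedness("sr", "hr"))   # share Macrolanguage 'sh'
print(relatedness("sr", "mk"))   # no shared Macrolanguage
print(relatedness("yue", "zh"))  # encompassed language vs. its Macrolanguage
```

The score is a hint only. As noted above, the field says nothing about how
close two encompassed languages actually are.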

In particular, Chinese language(s) and dialects call for special
consideration. Because the written form is very similar for most languages
having 'zh' as a Macrolanguage (and because historically subtags for the
various sub-languages and dialects were not available), languages such as
'yue' (Cantonese) have usually used tags beginning with the subtag 'zh'.
This past practice of tagging means that the use of Macrolanguage
information is encouraged when searching for content or when providing
fallbacks in language negotiation. For example, the information that 'yue'
has a Macrolanguage of 'zh' could be used in the Lookup algorithm to fall back from
a request for "yue-Hans-CN" to "zh-Hans-CN" without losing the script and
region information (even though the user did not specify "zh-Hans-CN" in
their language priority list).
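
The fallback described above could be sketched roughly as follows. This is a
simplified take on the truncation-based Lookup of RFC 4647, extended with a
Macrolanguage substitution. The function names and the order in which
candidates are tried are assumptions: here every truncation of the original
range is tried before the Macrolanguage form, and an application could
reasonably interleave them instead.

```python
# Illustrative mapping from the example in the text.
MACROLANGUAGE = {"yue": "zh"}


def truncations(range_):
    """Yield the range and its progressively shorter prefixes."""
    subtags = range_.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # A single-character subtag left at the end is dropped as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()


def lookup(priority_list, available, macro=MACROLANGUAGE):
    """Return the first available tag matching the priority list, else None."""
    by_key = {tag.lower(): tag for tag in available}
    for range_ in priority_list:
        candidates = [range_]
        primary, _, rest = range_.partition("-")
        parent = macro.get(primary.lower())
        if parent:
            # Swap only the primary subtag; script and region are preserved.
            candidates.append(parent + ("-" + rest if rest else ""))
        for candidate in candidates:
            for prefix in truncations(candidate):
                hit = by_key.get(prefix.lower())
                if hit:
                    return hit
    return None


print(lookup(["yue-Hans-CN"], ["en", "zh-Hans-CN", "zh-Hant-TW"]))
```

With the tags shown, the request for "yue-Hans-CN" resolves to "zh-Hans-CN",
keeping the script and region subtags rather than falling through to a
default.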

However, the Macrolanguage is only one of many additional pieces of
information that can be used in matching languages. There are many other
circumstances where the "best fit" information is not contained in the
language registry. For example, the languages "ro" (Romanian) and "mo"
(Moldavian) are very closely related, and so for searching it is often best
to treat them as being the same. In other cases, the best fallback for a
requested language may be a completely unrelated language, but one that a
majority of speakers of the requested language can understand. For example,
in a given application the best fallback for "br" (Breton) may be "fr"
(French) -- rather than the more closely related "cy" (Welsh) -- because
Breton readers are far more likely to be able to read French than Welsh.
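
Since this kind of "best fit" knowledge lives outside the registry, an
application would have to supply it itself. A minimal sketch, assuming a
hypothetical application-maintained table (EXTRA_FALLBACKS) consulted after
the Macrolanguage:

```python
# Hypothetical application data; none of this comes from the registry.
EXTRA_FALLBACKS = {"br": ["fr"], "ro": ["mo"], "mo": ["ro"]}


def fallback_chain(lang, macro=None, extra=EXTRA_FALLBACKS):
    """Return the languages to try, in order: the language itself, then its
    Macrolanguage (if any), then application-specific alternatives."""
    macro = macro or {}
    chain = [lang]
    if lang in macro:
        chain.append(macro[lang])
    for alt in extra.get(lang, []):
        if alt not in chain:
            chain.append(alt)
    return chain


print(fallback_chain("br"))
print(fallback_chain("yue", macro={"yue": "zh"}))
```

Placing the Macrolanguage ahead of the application's own table is a design
choice of this sketch. Some applications may want the reverse order.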

For more information on matching, see [RFC 4647].

*[In the section talking about updates]*

The Macrolanguage field is added whenever a language has a corresponding
Macrolanguage in [ISO 639-3]. For example, 'sr' (Serbian) will have the
Macrolanguage value 'sh' (Serbo-Croatian).

*[Other changes]*

[Search for instances of "Suppress-Script" (just as a place to find where
field descriptions are) and make an addition of "Macrolanguage" if
