[Ltru] Macrolanguage and extlang
"Mark Davis" <mark.davis@icu-project.org> Sat, 14 July 2007 01:06 UTC
Message-ID: <30b660a20707131806o19919cc7v97cc82f3eada43ff@mail.gmail.com>
Date: Fri, 13 Jul 2007 18:06:03 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: LTRU Working Group <ltru@ietf.org>
Addison and I have discussed the issue of extlang and Macrolanguages and are proposing the following text to replace the use of extlang.

*[A new section called Macrolanguages:]*

The Macrolanguage field contains a primary language subtag that *encompasses* this subtag. That is, this language is a dialect or sub-language of the Macrolanguage, and is called an *encompassed* subtag. The Macrolanguage value is defined by ISO 639-3. The field can be useful to applications or users when selecting language tags or as additional metadata useful in matching.

The Macrolanguage field can only occur in records of type 'language'. Only values assigned by ISO 639-3 will be considered for inclusion. Macrolanguage fields MAY be added via the normal registration process whenever ISO 639-3 defines new values. Macrolanguages are informational, and MAY be removed or changed if ISO 639-3 changes the values.

For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn' (Norwegian Nynorsk) each have a Macrolanguage entry of 'no' (Norwegian). For more information see [Choice].

*[A new section in tag choice (section 4.1), referenced from the above]*

Languages with a Macrolanguage field in the registry sometimes can be usefully referenced using their Macrolanguage. However, the Macrolanguage field doesn't define what the relationship is between the language subtag whose record it appears in and the Macrolanguage that encompasses it. Nor does it define how the encompassed languages are related to one another.

In some cases, the Macrolanguage has a standard form as well as a variety of less-common dialects. For example, the Macrolanguage 'ar' (Arabic) and the subtag 'arb' (Standard Arabic) generally describe the same language, with other subtags describing less-common local variations. In other cases there is no particular standard form and the encompassed subtags describe specific variations within the parent language.
Applications MAY use Macrolanguage information to improve matching or language negotiation. For example, the information that 'sr' and 'hr' share a Macrolanguage expresses a closer relation between those languages than between, say, 'sr' and 'mk' (Macedonian).

It is valid to use either the encompassed language or its Macrolanguage to form language tags. However, many matching applications will not be aware of the relationship between the languages. Care in selecting which subtags are used is crucial to interoperability. In general, use the most specific tag. However, where the standard written form of an encompassed language is captured by the Macrolanguage, the Macrolanguage should still be used for written material.

In particular, Chinese language(s) and dialects call for special consideration. Because the written form is very similar for most languages having 'zh' as a Macrolanguage (and because historically subtags for the various sub-languages and dialects were not available), languages such as 'yue' (Cantonese) have usually used tags beginning with the subtag 'zh'. Because of this past tagging practice, the use of Macrolanguage information is encouraged when searching for content or when providing fallbacks in language negotiation. For example, the information that 'yue' has a Macrolanguage of 'zh' could be used in the Lookup algorithm to fall back from a request for "yue-Hans-CN" to "zh-Hans-CN" without losing the script and region information (even though the user did not specify "zh-Hans-CN" in their language priority list).

However, the Macrolanguage is only one of many additional pieces of information that can be used in matching languages. There are many other circumstances where the "best fit" information is not contained in the language registry. For example, the languages "ro" (Romanian) and "mo" (Moldavian) are very closely related, and so for searching it is often best to treat them as being the same.
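
As an illustration only (not part of the proposed text), the "yue-Hans-CN" to "zh-Hans-CN" fallback described above could be sketched in Python roughly as follows. The MACROLANGUAGE table, the function names, and the choice to try the Macrolanguage only after the ordinary truncations of a range have failed are all assumptions made for this sketch. The table lists only pairs mentioned in this message, and wildcard ranges ("*") are not handled.

```python
# Hypothetical excerpt of registry data: encompassed subtag -> Macrolanguage.
# Only pairs mentioned in this message are listed.
MACROLANGUAGE = {
    "yue": "zh",
    "nb": "no",
    "nn": "no",
    "sr": "sh",
    "hr": "sh",
    "arb": "ar",
}


def truncations(tag):
    """Yield the tag, then progressively shorter prefixes (Lookup-style)."""
    subtags = tag.split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        # A single-character subtag (such as "x") is dropped together with
        # the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()


def lookup(priority_list, available, default=None):
    """Return the best available tag for the priority list, or the default."""
    by_lower = {tag.lower(): tag for tag in available}
    for requested in priority_list:
        # 1. Ordinary Lookup: the range itself, then its truncations.
        for candidate in truncations(requested):
            if candidate.lower() in by_lower:
                return by_lower[candidate.lower()]
        # 2. Assumed extension: swap the primary subtag for its
        #    Macrolanguage, keeping script and region, and try again.
        primary, _, rest = requested.partition("-")
        macro = MACROLANGUAGE.get(primary.lower())
        if macro:
            swapped = macro + ("-" + rest if rest else "")
            for candidate in truncations(swapped):
                if candidate.lower() in by_lower:
                    return by_lower[candidate.lower()]
    return default


print(lookup(["yue-Hans-CN"], ["en", "zh-Hans-CN", "zh"]))
```

With this ordering, an available "yue" would win over "zh-Hans-CN", because truncation is exhausted before the Macrolanguage is consulted. An application that cares more about script and region than about the exact language might reasonably order the steps differently.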
In other cases, the best fallback for a requested language may be a completely unrelated language, but one that a majority of speakers of the requested language may understand. For example, in a given application the best fallback for "br" (Breton) may be "fr" (French) -- rather than the more closely related "cy" (Welsh) -- because Breton readers are far more likely to be able to read French than Welsh. For more information on matching, see [RFC 4647].

*[In the section talking about updates]*

The Macrolanguage field is added whenever a language has a corresponding Macrolanguage in [ISO 639-3]. For example, 'sr' (Serbian) will have the Macrolanguage value 'sh' (Serbo-Croatian).

*[Other changes]*

[Search for instances of "Suppress-Script" (just as a place to find where field descriptions are) and add "Macrolanguage" where appropriate, e.g. in the "LANGUAGE SUBTAG REGISTRATION FORM"]

-- Mark