[Ltru] Macrolanguage usage

"Mark Davis" <mark.davis@icu-project.org> Thu, 15 May 2008 15:29 UTC

Date: Thu, 15 May 2008 08:29:03 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: LTRU Working Group <ltru@ietf.org>

Peter Constable and I were at the Unicode Technical Committee meeting, and
had a chance to talk about macrolanguages. One of the key points is to
provide implementers with enough guidance as to what they should do, while
not precluding reasonable alternatives. Here's what we were thinking about
(not in formal language, but the points we wanted to make).

===

Formally, a macrolanguage identifier could be used to tag or look up content
in any encompassed language; alternatively, specific, individual-language
identifiers could be used to tag or look up content in those languages. In
consideration of these alternatives, the following provides guidance for how
to maintain backwards compatibility while giving the ability to clearly tag
and look up content.

1. Implementations generally should tag and/or look up content with the
specific language where possible (this is the general recommendation with
language tags). In the case of macrolanguages, this means that Cantonese
should be tagged and looked up with "yue", Hakka with "hak", Tajiki Arabic
with "abh", Plains Cree with "crk", and so on.
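As a rough illustration of point 1, the sketch below rewrites a tag whose
primary subtag is a macrolanguage so that it carries the specific language
instead. The function name `specialize` and the `ENCOMPASSED` table are
made up for this example; the table holds only the subtags mentioned in this
message, not the full registry, and the tag is assumed to be a simple
hyphen-separated string.

```python
# Encompassed-language subtags named in this message, keyed by macrolanguage.
# Illustrative subset only; a real implementation would read the registry.
ENCOMPASSED = {
    "zh": {"cmn", "yue", "hak"},
    "ar": {"arb", "abh"},
    "cr": {"crk"},
}


def specialize(tag: str, language: str) -> str:
    """Swap a macrolanguage primary subtag for a specific language subtag.

    The tag is returned unchanged when `language` is not listed as
    encompassed by the tag's primary subtag.
    """
    primary, sep, rest = tag.partition("-")
    if language in ENCOMPASSED.get(primary.lower(), set()):
        return language + sep + rest
    return tag


print(specialize("zh-HK", "yue"))  # Cantonese content tagged specifically
print(specialize("cr", "crk"))     # Plains Cree
```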

2. An exception to this general recommendation may apply in the case of
macrolanguages with predominant forms, listed in Table 8. For backwards
compatibility in those cases:

   - an implementation could tag and/or look up content in the predominant
   language either with the macrolanguage or the encompassed language (e.g.,
   either "ar" or "arb" for Standard Arabic).
   - an implementation could make a distinction between these in lookup, or
   could return the same content. That is, lookup for "zh-SG" and "cmn-SG" may
   return the same content, or may return different content if the
   implementation needs to make a distinction for some purpose.
   - where content written in an encompassed language is also understandable
   in the predominant language (that being a distinct language encompassed by
   the same macrolanguage), the content could also be tagged with the
   macrolanguage identifier. Thus, if a Cantonese passage is understandable
   when read as Mandarin, it could also be tagged with "zh"; likewise, where a
   Tajiki Arabic passage is also understandable in Standard Arabic, it could
   be tagged with "ar".
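One possible reading of the lookup alternatives in point 2, sketched in
Python. The names `PREDOMINANT`, `canonical_key`, and `lookup` are
hypothetical, the table contains only the two pairs mentioned in this message
(not the full Table 8), and the `distinguish` flag is just one assumed way for
an implementation to opt into treating the two tags differently.

```python
# Macrolanguage -> predominant encompassed language (pairs from this message).
PREDOMINANT = {"zh": "cmn", "ar": "arb"}
_TO_MACRO = {specific: macro for macro, specific in PREDOMINANT.items()}


def canonical_key(tag: str) -> str:
    """Fold a predominant-language primary subtag onto its macrolanguage."""
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    return _TO_MACRO.get(primary, primary) + sep + rest


def lookup(content: dict, tag: str, distinguish: bool = False):
    """Return content for `tag`, or None if nothing matches.

    By default "zh-SG" and "cmn-SG" are treated as equivalent; with
    distinguish=True only an exact tag match is returned.
    """
    if distinguish:
        return content.get(tag)
    wanted = canonical_key(tag)
    for stored_tag, value in content.items():
        if canonical_key(stored_tag) == wanted:
            return value
    return None


content = {"zh-SG": "Singapore Chinese resources"}
print(lookup(content, "cmn-SG"))                    # same content as "zh-SG"
print(lookup(content, "cmn-SG", distinguish=True))  # no exact match
```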

3. Another exception to this general recommendation applies in the case of
applications that have limitations that exclude the identifiers for
encompassed, individual languages of a macrolanguage. For example, some
content cataloguing systems limit language identifiers to those in ISO
639-2; as a result, they may support a macrolanguage identifier but not the
identifiers for the encompassed languages of that macrolanguage.
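For point 3, a system with a restricted code list might fall back from the
encompassed language to its macrolanguage. The sketch below assumes a
hypothetical `supported` set written with BCP 47 subtags ("zh", "ar", "cr")
rather than the three-letter ISO 639-2 codes, and a hypothetical `MACRO_OF`
table limited to the languages named in this message.

```python
# Encompassed language -> macrolanguage, for the subtags in this message only.
MACRO_OF = {
    "cmn": "zh", "yue": "zh", "hak": "zh",
    "arb": "ar", "abh": "ar",
    "crk": "cr",
}


def tag_for_limited_system(tag: str, supported: set):
    """Pick a tag that a system with a restricted code list can accept.

    Keeps the tag if its primary subtag is supported, otherwise falls back
    to the macrolanguage; returns None when neither is supported.
    """
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    if primary in supported:
        return tag
    macro = MACRO_OF.get(primary)
    if macro in supported:
        return macro + sep + rest
    return None


supported = {"zh", "ar", "cr"}  # assumed code list of the cataloguing system
print(tag_for_limited_system("yue-HK", supported))  # falls back to "zh-HK"
```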

Mark and Peter

--
Mark