Re: [Ltru] Re: extlang

Addison Phillips <addison@yahoo-inc.com> Mon, 19 March 2007 15:09 UTC

Message-ID: <45FEA785.2080003@yahoo-inc.com>
Date: Mon, 19 Mar 2007 08:08:53 -0700
From: Addison Phillips <addison@yahoo-inc.com>
User-Agent: Thunderbird 1.5.0.10 (Windows/20070221)
MIME-Version: 1.0
To: Don Osborn <dzo@bisharat.net>
Subject: Re: [Ltru] Re: extlang
References: <E1HRsNL-0001ob-5h@megatron.ietf.org> <20070316210509.GF17950@mercury.ccil.org> <30b660a20703161537q77fcf86y9c6488e0eb0603b@mail.gmail.com> <45FB2259.7050202@yahoo-inc.com> <30b660a20703161617u85dbfe1r44ddc29fcfcf1a6d@mail.gmail.com> <45FB2C4E.9090303@yahoo-inc.com> <006e01c7682b$f0687b10$d1397130$@net> <004501c768bb$3bc185e0$6401a8c0@DGBP7M81> <00fd01c76914$18377ae0$48a670a0$@net> <45FD1A0A.2EED@xyzzy.claranet.de> <30b660a20703181137y6448508exb3e75f8e21a80a64@mail.gmail.com> <01b801c76990$e3e9b5a0$abbd20e0$@net>
In-Reply-To: <01b801c76990$e3e9b5a0$abbd20e0$@net>
Cc: 'Frank Ellermann' <nobody@xyzzy.claranet.de>, ltru@lists.ietf.org

The idea of macro-languages is that they are not, themselves, languages. 
Rather, they are groupings of languages that can be usefully referred to 
collectively. That is, strictly speaking, "Chinese" (meaning "zh") isn't 
a language---and neither is "Arabic" (by which I mean the code "ar").

Unfortunately, that isn't the common usage or understanding of the
situation. To most people, "Arabic" is a language. The idea that it has
regional or other variations seems natural enough; the idea that it is
a set of somewhat related languages sharing a historical and/or written
tradition, perhaps, does not.

Allowing both the macro- and plain-language codes on the same level is
a recipe for confusion: does "ar-EG" == "arb-EG" == "arz" == "arz-EG"?
What is supposed to match? Which tag should be used for a given 
document? How should one distinguish these?

We have a tradition, furthermore, of not having "secret information" in 
language tags requiring extra mapping tables to make sense of or process 
the tags. This tradition is punctuated (and punctured) by an equal 
tradition of assigning "secret" meaning to language tags. Thus, for the 
longest time, "zh-TW" meant "Traditional Chinese".

Extlangs help a little here by removing the need to choose between the
macrolanguage code and the individual language code for the initial
subtag. Thus: "ar-EG" is related to "ar-arz-EG" and "ar-arb-EG" in
distinct and somewhat logical ways.

Mark's concern is that this tagging system doesn't play nicely with
basic filtering. He hasn't mentioned, though, that we have two filtering
schemes, and extended filtering works more reasonably with extlangs.
"zh-yue-Hans-CN", "zh-cmn-Hans", and "zh-Hans-CN" all match the range
"zh-Hans" under extended filtering, for example.
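
As a rough illustration of the difference, here is a small sketch that
follows my reading of the basic and extended filtering steps in RFC 4647.
The function names basic_filter and extended_filter are made up for this
example, and nothing here validates that the tags are well-formed.

```python
def basic_filter(lang_range: str, tag: str) -> bool:
    """Basic filtering: the range must equal the tag, or be a prefix of it
    that ends exactly on a subtag boundary. Comparison ignores case."""
    r, t = lang_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def extended_filter(lang_range: str, tag: str) -> bool:
    """Extended filtering: subtags of the range must appear in the tag in
    order, but non-singleton subtags in the tag may be skipped over."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match, unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1                  # wildcard: nothing to consume in the tag
        elif ti >= len(t):
            return False             # range wants more than the tag has
        elif r[ri] == t[ti]:
            ri += 1                  # matched: advance both
            ti += 1
        elif len(t[ti]) == 1:
            return False             # never skip past a singleton such as "x"
        else:
            ti += 1                  # skip an intervening subtag (e.g. an extlang)
    return True


tags = ["zh-yue-Hans-CN", "zh-cmn-Hans", "zh-Hans-CN"]
print([basic_filter("zh-Hans", t) for t in tags])
print([extended_filter("zh-Hans", t) for t in tags])
```

With the range "zh-Hans", the basic version accepts only "zh-Hans-CN",
because the extlang sits between "zh" and "Hans" in the other two tags.
The extended version accepts all three, since it is allowed to step over
the extlang.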

The problem with lookup (and I use lookup extensively, so it concerns me
deeply) might suggest that some extra smarts related to extlangs are
going to be needed. On the other hand, some of this comes back to
Maxim #1: "Tag Content Wisely". Choosing to avoid extlangs where they
add no distinguishing information (not uncommon in resource file lookup
for Arabic or Chinese, say) or *consistently* including them when they
do (Lahnda???) will make things better.
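
To make the lookup concern concrete, here is a minimal sketch of
truncation-based lookup as I read it in RFC 4647. The function name
lookup, its parameters, and the resource lists are hypothetical, chosen
only for illustration.

```python
def lookup(lang_range, available, default=None):
    """Return the first available tag reached by progressively truncating
    the range from the right, or `default` if nothing matches."""
    by_lower = {tag.lower(): tag for tag in available}
    subtags = lang_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # A singleton (such as "x") left dangling at the end is removed too.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


resources = ["zh-Hans", "zh-Hant", "en"]      # hypothetical resource bundles
print(lookup("zh-Hans-CN", resources))        # reaches "zh-Hans" by truncation
print(lookup("zh-cmn-Hans-CN", resources))    # truncation never yields "zh-Hans"
print(lookup("zh-Hans-CN", ["zh-cmn-Hans"]))  # and the mismatch cuts both ways
```

The request "zh-cmn-Hans-CN" truncates to "zh-cmn-Hans", "zh-cmn", and
"zh", none of which is "zh-Hans", so the lookup falls through to the
default. That is why mixing tags with and without the extlang on the two
sides of a lookup hurts, and why tagging consistently matters.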

I admit to a good bit of trepidation writing the above, though.

Addison

Don Osborn wrote:
> Thanks Frank for the summary list and Mark for the pointer. Actually I 
> was going to reference that page and ask if it is exactly what is in 
> question.
>
> Next (trying my luck here), is there any kind of way(s) that these can 
> be subgrouped? For instance ar ends up referring to standard Arabic, if 
> I understand correctly, and zh has been discussed already a lot; but 
> some other macrolanguages do not necessarily have a single standard 
> form. Some (macro) languages have a lot more in writing than others. 
> What I’m getting at is are there different sets of (possible) 
> complications that can be identified for the macrolanguages, languages 
> and extlang relationships, such that the list of 54 can be disaggregated 
> (perhaps in more than one way)?
>
> Not as if folks don’t have enough else to think about, but it seems like 
> such an analysis, if it hasn’t been done, might raise other productive 
> questions.
>
> Don
>
> *From:* Mark Davis [mailto:mark.davis@icu-project.org]
> *Sent:* Sunday, March 18, 2007 2:37 PM
> *To:* Frank Ellermann
> *Cc:* ltru@lists.ietf.org
> *Subject:* Re: [Ltru] Re: extlang
>
> Another way to look at it is on 
> http://www.sil.org/iso639-3/macrolanguages.asp, which provides the 
> language names and breakdowns.
> 
> Mark
> 
> On 3/18/07, Frank Ellermann <nobody@xyzzy.claranet.de> wrote:
> 
> Don Osborn wrote:
> 
>  > what is the actual number of (macro)languages where extlang issues arise?
> 
> I try to determine this "manually" for Doug's latest 4645bis:
> There are 480 "Prefix:" lines in the extlang part.
> 
> ar      30              ay       2              az       2
> bal      3              bik      5              bua      3
> chm      2              cr       6              del      2
> den      2              din      5              doi      2
> fa       2              ff       9              gba      5
> gn       5              gon      2              grb      5
> hai      2              hmn     21              ik       2
> iu       2              jrb      5              kg       3
> kok      2              kpe      2              kr       3
> ku       3              kv       2              lah      8
> man      7              mg      10              mn       2
> ms      13              mwr      6              oc       5
> oj       7              om       4              ps       3
> qu      44              raj      6              rom      7
> sc       4              sgn    124              sq       4
> sw       2              syr      2              tnh      4
> uz       2              yi       2              za       2
> zap     58              zh      13              zza      2
> 
> That's 54 at the moment.
> 
> Frank
>
> _______________________________________________
> Ltru mailing list
> Ltru@ietf.org
> https://www1.ietf.org/mailman/listinfo/ltru

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.
