Re: [Ltru] Re: extlang
Addison Phillips <addison@yahoo-inc.com> Mon, 19 March 2007 15:09 UTC
Message-ID: <45FEA785.2080003@yahoo-inc.com>
Date: Mon, 19 Mar 2007 08:08:53 -0700
From: Addison Phillips <addison@yahoo-inc.com>
User-Agent: Thunderbird 1.5.0.10 (Windows/20070221)
MIME-Version: 1.0
To: Don Osborn <dzo@bisharat.net>
Subject: Re: [Ltru] Re: extlang
References: <E1HRsNL-0001ob-5h@megatron.ietf.org> <20070316210509.GF17950@mercury.ccil.org> <30b660a20703161537q77fcf86y9c6488e0eb0603b@mail.gmail.com> <45FB2259.7050202@yahoo-inc.com> <30b660a20703161617u85dbfe1r44ddc29fcfcf1a6d@mail.gmail.com> <45FB2C4E.9090303@yahoo-inc.com> <006e01c7682b$f0687b10$d1397130$@net> <004501c768bb$3bc185e0$6401a8c0@DGBP7M81> <00fd01c76914$18377ae0$48a670a0$@net> <45FD1A0A.2EED@xyzzy.claranet.de> <30b660a20703181137y6448508exb3e75f8e21a80a64@mail.gmail.com> <01b801c76990$e3e9b5a0$abbd20e0$@net>
In-Reply-To: <01b801c76990$e3e9b5a0$abbd20e0$@net>
Cc: 'Frank Ellermann' <nobody@xyzzy.claranet.de>, ltru@lists.ietf.org
List-Id: Language Tag Registry Update working group discussion list <ltru.ietf.org>

The idea of macro-languages is that they are not, themselves, languages. Rather, they are groupings of languages that can be usefully referred to collectively. That is, strictly speaking, "Chinese" (meaning "zh") isn't a language, and neither is "Arabic" (by which I mean the code "ar").

Unfortunately, that isn't the common usage or understanding of the situation. To most people, "Arabic" is a language. The idea that it has regional or other variations seems natural enough, but not, perhaps, that it is a set of somewhat related languages that share a historical and/or written tradition.

Allowing both the macro- and plain-language codes on the same level is a recipe for confusion: does "ar-EG" == "arb-EG" == "arz" == "arz-EG"? What is supposed to match? Which tag should be used for a given document? How should one distinguish these?

We have a tradition, furthermore, of not having "secret information" in language tags that requires extra mapping tables to make sense of or process the tags. This tradition is punctuated (and punctured) by an equal tradition of assigning "secret" meaning to language tags. Thus, for the longest time, "zh-TW" meant "Traditional Chinese".

Extlangs help this a little by avoiding the choice on the initial subtag. Thus: "ar-EG" is related to "ar-arz-EG" and "ar-arb-EG" in distinct and somewhat logical ways.

Mark's concern is that this tagging system doesn't play nicely with basic filtering. He hasn't cited the fact that we have two filtering schemes: extended filtering works more reasonably with extlangs. "zh-yue-Hans-CN", "zh-cmn-Hans", and "zh-Hans-CN" all match the range "zh-Hans", for example.

The problem with lookup (and I use lookup extensively, so it concerns me deeply) might suggest that some extra smarts related to extlangs are going to be needed. On the other hand, some of this is going to have to be related to Maxim #1: "Tag Content Wisely".
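
To make the filtering and lookup points above concrete, here is a minimal sketch of basic filtering, extended filtering, and the lookup fallback chain as I understand them from RFC 4647. The function names (basic_filter, extended_filter, lookup_fallbacks) are made up for illustration and do not come from any library. The tags use the macrolanguage-plus-extlang form under discussion in this thread.

```python
def basic_filter(language_range, tag):
    """Basic filtering: the range must equal the tag or be a prefix of it
    that ends on a subtag boundary. "*" matches everything."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def extended_filter(language_range, tag):
    """Extended filtering: subtags in the tag that the range does not
    mention may be skipped, as long as they are not singletons."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")
    # The first subtags must agree unless the range starts with a wildcard.
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    i = 1  # position of the next unexamined subtag in the tag
    for wanted in range_subtags[1:]:
        if wanted == "*":
            continue  # a wildcard inside the range matches nothing or anything
        while True:
            if i >= len(tag_subtags):
                return False  # tag ran out before the range was satisfied
            current = tag_subtags[i]
            i += 1
            if current == wanted:
                break  # matched; move on to the next range subtag
            if len(current) == 1:
                return False  # never skip past a singleton such as "x"
    return True


def lookup_fallbacks(language_range):
    """Lookup: the sequence of progressively truncated ranges to try,
    dropping any singleton left dangling at the end after a truncation."""
    subtags = language_range.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


if __name__ == "__main__":
    for tag in ("zh-yue-Hans-CN", "zh-cmn-Hans", "zh-Hans-CN"):
        print(tag, basic_filter("zh-Hans", tag), extended_filter("zh-Hans", tag))
    print(lookup_fallbacks("zh-cmn-Hans-CN"))
```

With the range "zh-Hans", extended filtering accepts all three tags, while basic filtering accepts only "zh-Hans-CN" because the extlang subtag breaks the prefix. The fallback chain for "zh-cmn-Hans-CN" never produces "zh-Hans", which is the sort of case where extra extlang awareness would be needed in lookup.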
Choosing to avoid extlangs where they add no distinguishing information (not uncommon in resource file lookup for Arabic or Chinese, say) or *consistently* including them when they do (Lahnda???) will make things better.

I admit to a good bit of trepidation writing the above, though.

Addison

Don Osborn wrote:
> Thanks Frank for the summary list and Mark for the pointer. Actually I
> was going to reference that page and ask if it is exactly what is in
> question.
>
> Next (trying my luck here), is there any kind of way(s) that these can
> be subgrouped? For instance ar ends up referring to standard Arabic, if
> I understand correctly, and zh has been discussed already a lot; but
> some other macrolanguages do not necessarily have a single standard
> form. Some (macro) languages have a lot more in writing than others.
> What I’m getting at is are there different sets of (possible)
> complications that can be identified for the macrolanguages, languages
> and extlang relationships, such that the list of 54 can be disaggregated
> (perhaps in more than one way)?
>
> Not as if folks don’t have enough else to think about, but it seems like
> such an analysis, if it hasn’t been done, might raise other productive
> questions.
>
> Don
>
> *From:* Mark Davis [mailto:mark.davis@icu-project.org]
> *Sent:* Sunday, March 18, 2007 2:37 PM
> *To:* Frank Ellermann
> *Cc:* ltru@lists.ietf.org
> *Subject:* Re: [Ltru] Re: extlang
>
> Another way to look at it is on
> http://www.sil.org/iso639-3/macrolanguages.asp, which provides the
> language names and breakdowns.
>
> Mark
>
> On 3/18/07, *Frank Ellermann* <nobody@xyzzy.claranet.de
> <mailto:nobody@xyzzy.claranet.de>> wrote:
>
> Don Osborn wrote:
>
> > what is the actual number of (macro)languages where extlang issues arise?
>
> I try to determine this "manually" for Doug's latest 4645bis:
> There are 480 "Prefix:" lines in the extlang part.
>
> ar   30    ay    2    az    2
> bal   3    bik   5    bua   3
> chm   2    cr    6    del   2
> den   2    din   5    doi   2
> fa    2    ff    9    gba   5
> gn    5    gon   2    grb   5
> hai   2    hmn  21    ik    2
> iu    2    jrb   5    kg    3
> kok   2    kpe   2    kr    3
> ku    3    kv    2    lah   8
> man   7    mg   10    mn    2
> ms   13    mwr   6    oc    5
> oj    7    om    4    ps    3
> qu   44    raj   6    rom   7
> sc    4    sgn 124    sq    4
> sw    2    syr   2    tnh   4
> uz    2    yi    2    za    2
> zap  58    zh   13    zza   2
>
> That's 54 at the moment.
>
> Frank
>
> _______________________________________________
> Ltru mailing list
> Ltru@ietf.org <mailto:Ltru@ietf.org>
> https://www1.ietf.org/mailman/listinfo/ltru
>
> --
> Mark
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ltru mailing list
> Ltru@ietf.org
> https://www1.ietf.org/mailman/listinfo/ltru

--
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www1.ietf.org/mailman/listinfo/ltru