Re: [Ltru] Re: extlang
"Mark Davis" <mark.davis@icu-project.org> Mon, 19 March 2007 16:10 UTC
Date: Mon, 19 Mar 2007 09:10:23 -0700
From: Mark Davis <mark.davis@icu-project.org>
To: Addison Phillips <addison@yahoo-inc.com>
Subject: Re: [Ltru] Re: extlang
Cc: Frank Ellermann <nobody@xyzzy.claranet.de>, ltru@lists.ietf.org
I see it in a somewhat different way. The fact that there is a macrolanguage (zh) should not skew the way we encode an individual language (yue). You say:

"Extlangs help this a little by avoiding the choice on the initial subtag. Thus: "ar-EG" is related to "ar-arz-EG" and "ar-arb-EG" in distinct and somewhat logical ways."

However, functionally, we have to see an advantage in subordinating some languages as extlangs. There is no value in them if they just add complication, as per your:

"The problem with lookup (and I use lookup extensively, so it concerns me deeply) might suggest that some extra smarts related to extlangs is going to be needed."

In order to have a good case for the extlang model, we would need to see concrete scenarios where we can demonstrate that "zh-cmn" and "zh-yue" work better than "cmn" and "yue" respectively, and a demonstration that those scenarios are more important than the ones where extlangs cause problems.

Note that ISO 639-3 does not at all force the use of the extlang model; the extlang model is just one possible way of expressing the information in ISO 639-3. We already have macrolanguages and "subordinate" languages in BCP 47, but we put them at the same level. For example, ISO 639-3 already categorizes no and sh as macrolanguages, with nb and nn subordinate to no, and sr, hr, and bs subordinate to sh. We are not going to (and cannot) force users to encode nb as no-nb, nor sr as sh-sr. We shouldn't be making Cantonese a subordinate language either.

At one point, I did think that having the extlang structure would be better, but the more I get into actually implementing extlangs, the more I find that they are just a complication for no good result.

What I think we should instead be doing is adding a field to the registry that says that X is a macrolanguage for Y, and adding information to RFC 4647 that indicates how one can make use of this information in matching. That would also extend to the current situation with no and sh in a uniform manner.
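To make the registry-field idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the table MACROLANGUAGE_OF stands in for the proposed registry field (no such field is assumed to exist yet), the function names are made up for this example, and the fallback rule (retry the match with the macrolanguage substituted for the primary subtag) is just one possible way the matching guidance could be written.

```python
# Hypothetical data: what a "X is a macrolanguage for Y" registry field
# might yield, keyed by the individual language. Illustrative subset only.
MACROLANGUAGE_OF = {
    "cmn": "zh", "yue": "zh",
    "arb": "ar", "arz": "ar",
    "nb": "no", "nn": "no",
    "sr": "sh", "hr": "sh", "bs": "sh",
}


def basic_filter(lang_range: str, tag: str) -> bool:
    """Basic filtering: the range equals the tag or is a prefix of it
    that ends on a subtag boundary (case-insensitive)."""
    lang_range, tag = lang_range.lower(), tag.lower()
    return (
        lang_range == "*"
        or tag == lang_range
        or tag.startswith(lang_range + "-")
    )


def macro_aware_filter(lang_range: str, tag: str) -> bool:
    """Assumed extension: if the plain match fails, retry with the tag's
    primary subtag replaced by its macrolanguage, if it has one."""
    if basic_filter(lang_range, tag):
        return True
    primary, sep, rest = tag.lower().partition("-")
    macro = MACROLANGUAGE_OF.get(primary)
    if macro is None:
        return False
    return basic_filter(lang_range, macro + sep + rest)


print(macro_aware_filter("zh", "yue-Hans-CN"))  # content tagged "yue", user asks for "zh"
print(macro_aware_filter("no", "nb"))           # same rule covers no/nb
print(macro_aware_filter("yue", "zh-CN"))       # a macrolanguage tag does not imply a member
```

The relation is deliberately one-directional in this sketch: a request for "zh" accepts "yue" content, but a request for "yue" does not accept content tagged only "zh". Whether that is the right policy is exactly the kind of thing the matching guidance would need to spell out.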
It is also a much less fragile mechanism: one can add more such X,Y relations over time without gumming up everything.

Mark

On 3/19/07, Addison Phillips <addison@yahoo-inc.com> wrote:
>
> The idea of macro-languages is that they are not, themselves, languages.
> Rather, they are groupings of languages that can be usefully referred to
> collectively. That is, strictly speaking, "Chinese" (meaning "zh") isn't
> a language---and neither is "Arabic" (by which I mean the code "ar").
>
> Unfortunately, that isn't common usage or understanding of the
> situation. To most people, "Arabic" is a language. The idea that it has
> regional or other variations seems natural enough, but not, perhaps,
> that it is a set of somewhat related languages that share a historical
> and/or written tradition.
>
> Allowing both the macro- and plain-language codes on the same level is
> recipe for confusion: does "ar-EG" == "arb-EG" == "arz" == "arz-EG"?
> What is supposed to match? Which tag should be used for a given
> document? How should one distinguish these?
>
> We have a tradition, furthermore, of not having "secret information" in
> language tags requiring extra mapping tables to make sense of or process
> the tags. This tradition is punctuated (and punctured) by an equal
> tradition of assigning "secret" meaning to language tags. Thus, for the
> longest time, "zh-TW" meant "Traditional Chinese".
>
> Extlangs help this a little by avoiding the choice on the initial
> subtag. Thus: "ar-EG" is related to "ar-arz-EG" and "ar-arb-EG" in
> distinct and somewhat logical ways.
>
> Mark's concern is that this tagging system doesn't play nicely with
> basic filtering. He's not cited the fact that we have two filtering
> schemes: extended filtering works more reasonably with extlangs.
> "zh-yue-Hans-CN", "zh-cmn-Hans", and "zh-Hans-CN" all match the range
> "zh-Hans", for example.
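For readers who want to check the extended-filtering claim quoted above, the following is a small Python sketch of the extended filtering steps as described in RFC 4647, section 3.3.2. The function name is an assumption made for this example, and the code is a reading of the algorithm, not a reference implementation.

```python
def extended_filter(lang_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 extended filtering (case-insensitive)."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")

    # The first subtags must match, unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False

    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1                # wildcard: consume it and keep going
        elif ti >= len(t):
            return False           # range wants more, tag is exhausted
        elif r[ri] == t[ti]:
            ri += 1                # subtags agree: advance both
            ti += 1
        elif len(t[ti]) == 1:
            return False           # never skip past a singleton
        else:
            ti += 1                # skip an intervening subtag in the tag
    return True


for tag in ("zh-yue-Hans-CN", "zh-cmn-Hans", "zh-Hans-CN", "yue-Hans-CN"):
    print(tag, extended_filter("zh-Hans", tag))
```

The first three tags match the range "zh-Hans" because extended filtering may skip the intervening extlang subtag. The last tag, with "yue" as the primary subtag, does not match, since the first subtag of the range must match the first subtag of the tag.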
>
> The problem with lookup (and I use lookup extensively, so it concerns me
> deeply) might suggest that some extra smarts related to extlangs is
> going to be needed. On the other hand, some of this is going to have to
> be related to Maxim #1: "Tag Content Wisely". Choosing to avoid extlangs
> where they add no distinguishing information (not uncommon in resource
> file lookup for Arabic or Chinese, say) or *consistently* including them
> when they do (Lahnda???) will make things better.
>
> I admit to a good bit of trepidation writing the above, though.
>
> Addison
>
> Don Osborn wrote:
> > Thanks Frank for the summary list and Mark for the pointer. Actually I
> > was going to reference that page and ask if it is exactly what is in
> > question.
> >
> > Next (trying my luck here), is there any kind of way(s) that these can
> > be subgrouped? For instance ar ends up referring to standard Arabic, if
> > I understand correctly, and zh has been discussed already a lot; but
> > some other macrolanguages do not necessarily have a single standard
> > form. Some (macro) languages have a lot more in writing than others.
> > What I'm getting at is are there different sets of (possible)
> > complications that can be identified for the macrolanguages, languages
> > and extlang relationships, such that the list of 54 can be disaggregated
> > (perhaps in more than one way)?
> >
> > Not as if folks don't have enough else to think about, but it seems like
> > such an analysis, if it hasn't been done, might raise other productive
> > questions.
> >
> > Don
> >
> > *From:* Mark Davis [mailto:mark.davis@icu-project.org]
> > *Sent:* Sunday, March 18, 2007 2:37 PM
> > *To:* Frank Ellermann
> > *Cc:* ltru@lists.ietf.org
> > *Subject:* Re: [Ltru] Re: extlang
> >
> > Another way to look at it is on
> > http://www.sil.org/iso639-3/macrolanguages.asp, which provides the
> > language names and breakdowns.
> >
> > Mark
> >
> > On 3/18/07, *Frank Ellermann* <nobody@xyzzy.claranet.de> wrote:
> >
> > Don Osborn wrote:
> >
> > > what is the actual number of (macro)languages where extlang issues
> > > arise?
> >
> > I try to determine this "manually" for Doug's latest 4645bis:
> > There are 480 "Prefix:" lines in the extlang part.
> >
> > ar    30    ay     2    az     2
> > bal    3    bik    5    bua    3
> > chm    2    cr     6    del    2
> > den    2    din    5    doi    2
> > fa     2    ff     9    gba    5
> > gn     5    gon    2    grb    5
> > hai    2    hmn   21    ik     2
> > iu     2    jrb    5    kg     3
> > kok    2    kpe    2    kr     3
> > ku     3    kv     2    lah    8
> > man    7    mg    10    mn     2
> > ms    13    mwr    6    oc     5
> > oj     7    om     4    ps     3
> > qu    44    raj    6    rom    7
> > sc     4    sgn  124    sq     4
> > sw     2    syr    2    tnh    4
> > uz     2    yi     2    za     2
> > zap   58    zh    13    zza    2
> >
> > That's 54 at the moment.
> >
> > Frank
> >
> > _______________________________________________
> > Ltru mailing list
> > Ltru@ietf.org
> > https://www1.ietf.org/mailman/listinfo/ltru
> >
> > --
> > Mark
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Ltru mailing list
> > Ltru@ietf.org
> > https://www1.ietf.org/mailman/listinfo/ltru
>
> --
> Addison Phillips
> Globalization Architect -- Yahoo! Inc.
>
> Internationalization is an architecture.
> It is not a feature.

--
Mark
_______________________________________________ Ltru mailing list Ltru@ietf.org https://www1.ietf.org/mailman/listinfo/ltru