RE: [Ltru] Extended language tags (long reply)
"Debbie Garside" <debbie@ictmarketing.co.uk> Wed, 10 October 2007 16:25 UTC
From: Debbie Garside <debbie@ictmarketing.co.uk>
To: mark.davis@icu-project.org, 'Addison Phillips' <addison@yahoo-inc.com>
Subject: RE: [Ltru] Extended language tags (long reply)
Date: Wed, 10 Oct 2007 17:24:09 +0100
Cc: ltru@ietf.org

Hi Mark

I haven't time at the moment to put together the fallback for this (as it would take me ages), but I think fas-prs and fas-pes are good cases for macrolanguages to be included as extended language tags. These languages are mutually intelligible, and being able to choose prs and have a system create a fallback based on fas would be a huge bonus.

Implementations should have the extended language tags option, and end users should be presented with a question asking if they want macrolanguage-related languages matched, IMHO. It is not just about whether they understand the macrolanguage; it is whether a user wants to return the related languages within the macrolanguage. I do understand that the end user would not want this in many cases, but where is the harm in the provision? If I were querying using prs, I would certainly want pes- and fas-tagged items returned as a secondary match.

Best regards

Debbie

_____

From: Mark Davis [mailto:mark.davis@icu-project.org]
Sent: 10 October 2007 16:55
To: Addison Phillips
Cc: ltru@ietf.org
Subject: Re: [Ltru] Extended language tags (long reply)

Some detailed comments on these scenarios.

On 10/7/07, Addison Phillips <addison@yahoo-inc.com> wrote:

> I would say:
>
> 1. With Extlangs.
> - change to filtering: none, but you probably want to use extended filtering instead of basic filtering (i.e. "zh-Hant-HK" matches "zh-yue-Hant-HK" and "zh-cmn-Hant-HK")

Actually, I think the change is the reverse: you have to disable extended filtering. In the filtering/search business, it is just as bad (if not worse) to give too many responses as it is to give too few; otherwise your user is swamped in irrelevant documents. Except in very few cases, notably Arabic and Chinese (according to what I have heard Peter say), we have no reason to believe that a speaker of one microlanguage speaks any other particular macrolanguage.
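To make the filtering disagreement concrete, here is a minimal Python sketch of the two matching schemes being compared. It follows one reading of RFC 4647, sections 3.3.1 (basic filtering) and 3.3.2 (extended filtering); the function names are hypothetical, and it assumes well-formed tags with no error handling.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """Basic filtering: the range must equal the tag, or be a prefix of it
    that ends at a subtag boundary. Comparison is case-insensitive."""
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


def extended_filter_match(language_range: str, tag: str) -> bool:
    """Extended filtering: the range's subtags must appear in the tag in order;
    extra tag subtags may be skipped, but never past a singleton."""
    r = language_range.lower().split("-")
    t = tag.lower().split("-")
    # The primary language subtag must match exactly (or be a wildcard).
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1                      # a non-initial wildcard is simply skipped
        elif ti >= len(t):
            return False                 # the range has subtags the tag lacks
        elif r[ri] == t[ti]:
            ri += 1                      # subtags match: advance both
            ti += 1
        elif len(t[ti]) == 1:
            return False                 # cannot skip past a singleton such as "x"
        else:
            ti += 1                      # skip an extra tag subtag (e.g. an extlang)
    return True


if __name__ == "__main__":
    for tag in ("zh-yue-Hant-HK", "zh-cmn-Hant-HK", "yue-Hant-HK"):
        print(tag, basic_filter_match("zh-Hant-HK", tag),
              extended_filter_match("zh-Hant-HK", tag))
    # prints:
    # zh-yue-Hant-HK False True
    # zh-cmn-Hant-HK False True
    # yue-Hant-HK False False
```

Under extended filtering, the single range "zh-Hant-HK" pulls in every zh-prefixed extlang tag, which is what Addison proposes and what Mark warns will swamp users; neither scheme links "zh-Hant-HK" to an unprefixed "yue-Hant-HK".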
If my user searches for Fulah, does s/he want to get Maasina Fulfulde, Adamawa Fulfulde, Pulaar, Central-Eastern Niger Fulfulde, and so on, which I have no reason to believe the original person speaks? And even for Arabic and Chinese, unmodified extended filtering with extlang might make sense only because ar-arb is equivalent to arb and zh-cmn is equivalent to cmn for all practical purposes, and only where all speakers of the microlanguages understand the macrolanguages. (That is unclear: I've asked on the list whether the latter is true, and it is a crucial fact for extlang, but I have had no answer.)

It is better, if someone really wants both Standard Arabic and Tajiki Arabic, for the input to be "ar, abh", just as now with Norwegian or Serbo-Croatian, where someone can simply list "nn, nb, no" or "sr, hr, bs, sh".

So with extlang we have a backwards-compatibility issue with extended filtering: we need to disable macrolanguage lookup in all but a few cases, and even there it is better for the user to just supply the list of items they want.

http://www.sil.org/iso639-3/macrolanguages.asp

> - change to lookup: treat extlang as atomic with the primary language subtag; potentially loop back through the subtags. That is, given the range "zh-yue-Hant-HK", the fallback pattern is this:
>
> zh-yue-Hant-HK
> zh-yue-Hant
> zh-yue
> zh-Hant-HK
> zh-Hant
> zh
> (default)

And again, we disable this for all but a few specific cases. It only makes sense to do this as a heuristic when the speaker of the microlanguage is extremely likely to speak the macrolanguage. If the query is Maasina Fulfulde as used in Ghana, (ful-)ffm-GH, we don't want to fall back to some arbitrary Fulah; instead we probably want to fall back to Hausa.

And this would need to be handled even more carefully where the input is multiple tags, as in Accept-Language. Suppose the user supplies "zh-yue; en", meaning that they prefer Cantonese, then English, and may not know Mandarin at all! Then the stated algorithm, the one that everyone uses now, would give completely incorrect results.

(1) Incorrect

zh-yue-Hant-HK
zh-yue-Hant
zh-yue
zh
en
(default)

Even the modified version doesn't work:

(2) Still incorrect

zh-yue-Hant-HK
zh-yue-Hant
zh-yue
zh-Hant-HK
zh-Hant
zh
en
(default)

What we need to do is more complicated:

(3) What the user asked for: Cantonese, then English, then maybe something else.

zh-yue-Hant-HK
zh-yue-Hant
zh-yue
en
zh-Hant-HK
zh-Hant
zh
(default)

Number 3 is especially needed whenever the macrolanguage is not overwhelmingly likely to be understood by the microlanguage users. So with extlang we also have a backwards-compatibility issue with lookup: code that used to work fine will fail.

Or this:

zh-yue-Hant-HK
zh-yue-Hant
zh-yue
(default)

This would be better, but it still requires a change to algorithms, and it then simply amounts to adding extlang but coding around it so that the results are as if there were no extlang. It is better not to use the extlang mechanism at all.

> 2. Without extlangs.
> - change to filtering: none
> - change to lookup: none

Agreed so far, but...

> BUT... you want to include the macrolanguage in your ranges in some cases. Alternatively, we would have to define new filtering and lookup options that include mapping to macrolanguages. For example, with the range "yue-Hant-HK", you would want the fallback to be:
>
> yue-Hant-HK
> yue-Hant
> yue
> zh-Hant-HK
> zh-Hant
> zh
> (default)

In the case of no extlangs, we really do not want to make any changes to the algorithms. Instead, we want to point out in the text that macrolanguages *MAY* be a useful resource for some extended fallback. That is, it *MAY* be reasonable to fall back from the Chinese and Arabic microlanguages to their macrolanguages, but it also may not be. And the interaction between this additional fallback and multiple tags needs to be taken into consideration. Suppose that the input is "yue-Hant-HK; en"; the current algorithm works.
(1)

yue-Hant-HK
yue-Hant
yue
en
(default)

If a particular implementation wants to change the default processing to add an extra step of fallback to zh-Hant-HK (and so on) for certain specific languages, such as particular Chinese and Arabic microlanguages, that's a small amount of work. Moreover, it is just what a more enhanced algorithm would *currently* do with any of "nn, nb, no" and "sr, hr, bs, sh"; in those cases they are known to be good, practical fallbacks.

Alternatively, if the user just feeds in "yue-Hant-HK, en, zh-Hant-HK" or the reverse, "yue-Hant-HK, zh-Hant-HK, en", that will work without any code changes or enhancements; the user gets what was asked for.

================

What we don't want to do is make recommendations that, if implemented, make it harder for people to control the results and get the right answer. And baking extlang into the tags is even worse, since it introduces backwards incompatibilities that require old code to be modified to work around them.

Someone mentioned on this list that they thought these were only philosophical differences. I don't see it that way at all. The choices we make here will affect the ability of people to work with their own languages for a long time. It is important to get this right, and to pay attention both to backwards compatibility and to future capabilities: the choice of architecture will make real, practical differences for people, especially for minority-language users. My fear is exactly that minority-language users will be disadvantaged by extlang. It isn't the French or German speakers who will end up with problems; it is the Gondi and Grebo speakers.

Mark
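The fallback orderings discussed above for "zh-yue-Hant-HK; en", and the no-extlang alternative for "yue-Hant-HK; en", can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not a proposed algorithm: truncation follows RFC 4647 lookup (section 3.4) without wildcard handling; a second subtag of exactly three letters is treated as an extlang, which holds for regular tags under the RFC 4646 grammar but not for grandfathered ones; the strategy names "plain", "loopback" and "macro_last" and the `macro_map` allow-list are hypothetical; and "(default)" is only a placeholder for the application's default.

```python
from typing import Dict, List, Optional, Tuple

STRATEGIES = ("plain", "loopback", "macro_last")  # hypothetical names


def truncation_chain(language_range: str) -> List[str]:
    """Progressively shorter ranges, as in RFC 4647 lookup (no wildcard handling)."""
    subtags = language_range.split("-")
    chain: List[str] = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # Do not leave a singleton (such as "x") dangling at the end of the range.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def split_macro(language_range: str,
                macro_map: Dict[str, str]) -> Tuple[List[str], Optional[str]]:
    """Return (chain for the specific language, macrolanguage range or None)."""
    subtags = language_range.split("-")
    if len(subtags) > 1 and len(subtags[1]) == 3 and subtags[1].isalpha():
        # ASSUMPTION: a three-letter second subtag is an extlang (zh-yue-Hant-HK).
        # Keep it atomic with the primary subtag: drop the final bare "zh" step.
        specific = truncation_chain(language_range)[:-1]
        return specific, "-".join([subtags[0]] + subtags[2:])
    macro = macro_map.get(subtags[0].lower())
    if macro is not None:
        # Plain microlanguage tag (yue-Hant-HK) on a hypothetical allow-list.
        return truncation_chain(language_range), "-".join([macro] + subtags[1:])
    return truncation_chain(language_range), None


def lookup_fallback(ranges: List[str], strategy: str,
                    macro_map: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a lookup fallback list for a prioritized list of language ranges.

    plain:      truncate each range in turn (the current behaviour).
    loopback:   right after each extlang/allow-listed range, try its macrolanguage.
    macro_last: try every requested language first, macrolanguage forms last.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    macro_map = macro_map or {}
    result: List[str] = []
    deferred: List[str] = []
    for language_range in ranges:
        if strategy == "plain":
            result += truncation_chain(language_range)
            continue
        specific, macro = split_macro(language_range, macro_map)
        result += specific
        if macro is None:
            continue
        if strategy == "loopback":
            result += truncation_chain(macro)
        else:  # macro_last
            deferred += truncation_chain(macro)
    result += deferred
    # Remove case-insensitive duplicates, keeping the first occurrence.
    seen = set()
    ordered: List[str] = []
    for item in result:
        if item.lower() not in seen:
            seen.add(item.lower())
            ordered.append(item)
    return ordered + ["(default)"]


if __name__ == "__main__":
    for s in STRATEGIES:
        print(s, lookup_fallback(["zh-yue-Hant-HK", "en"], s))
```

With extlang tags, "plain" yields ordering (1), "loopback" yields (2), and "macro_last" yields (3). Without extlang, `lookup_fallback(["yue-Hant-HK", "en", "zh-Hant-HK"], "plain")` produces the same sequence as "macro_last" with the hypothetical map `{"yue": "zh"}`, which illustrates the point that a user-supplied list gets this behaviour with no code changes.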