Re: [Ltru] Consensus call: extlang

"Broome, Karen" <Karen_Broome@spe.sony.com> Fri, 30 May 2008 01:04 UTC

From: "Broome, Karen" <Karen_Broome@spe.sony.com>
To: Mark Davis <mark.davis@icu-project.org>
Date: Thu, 29 May 2008 18:04:45 -0700
Thread-Topic: [Ltru] Consensus call: extlang
Thread-Index: AcjB37raAZt3s1pSQjORA3q6jueBVQABt0oQ
Message-ID: <E19FDBD7A3A7F04788F00E90915BD36C13C251B4FC@USSDIXMSG20.spe.sony.com>
References: <01c301c8bbe5$8c2810c0$6801a8c0@oemcomputer> <30b660a20805252132g28ff50b0kd5b04d6f47ca35d2@mail.gmail.com> <002001c8bef3$e0497520$6801a8c0@oemcomputer> <6.0.0.20.2.20080527170755.05bd89c0@localhost> <002f01c8c024$0dcdb5c0$6801a8c0@oemcomputer> <6.0.0.20.2.20080528163346.074fac80@localhost> <001f01c8c122$0cbcae80$6801a8c0@oemcomputer> <4D25F22093241741BC1D0EEBC2DBB1DA013A84C314@EX-SEA5-D.ant.amazon.com> <007601c8c1bc$84d93920$6801a8c0@oemcomputer> <104f01c8c1d8$94ad6f30$0a00a8c0@CPQ86763045110> <30b660a20805291559x4f6243a8pecc7ee92c2a36d9c@mail.gmail.com>
In-Reply-To: <30b660a20805291559x4f6243a8pecc7ee92c2a36d9c@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
acceptlanguage: en-US
MIME-Version: 1.0
Cc: LTRU Working Group <ltru@ietf.org>
Subject: Re: [Ltru] Consensus call: extlang

Mark,

One thing I think you aren't acknowledging is that "treat as synonyms" means something very different to the vast numbers of content creators who use this standard than it does to the handful of search engines that use the fuzzy logic associated with companion standards. As you note in your document, "It is clear that companies like Google or Yahoo can work around the problems with extlang." How many other users need, and can afford to implement, the extended fallback and filtering logic? Enough that this logic should be the primary driver behind the chosen solution?

Before I spend too much time picking apart your lengthy screed involving a scenario where the BBC presents its web site in Sudanese Creole Arabic with rotating language-code logic for each day of the week ... (ahem) ... here's my real-world Chinese language list:

Chinese (Variant Unknown)
Chinese (Cantonese, Spoken)
Chinese (Cantonese, Written)
Chinese (Mandarin, Spoken)
Chinese (Mandarin, Spoken Taiwanese)
Chinese (Mandarin, Simplified)
Chinese (Mandarin, Traditional)
Chinese (Taiwanese, Spoken)
Chinese (Taiwanese, Written)

(Apologies, this is hard to represent in ASCII. I have a mini-spreadsheet if someone wants it.)


1             2          3          4
zh            zh         zh         zh
zh-yue        yue        yue        yue
zh-yue        yue        yue        yue
zh-cmn        cmn        zh         cmn
zh-cmn-TW     cmn-TW     zh-TW      cmn-TW
zh-cmn-Hans   cmn-Hans   zh-Hans    zh-Hans
zh-cmn-Hant   cmn-Hant   zh-Hant    zh-Hant
zh-min-nan    nan        nan        nan
zh-min-nan    nan        nan        nan


* Option #1 (RFC 4646) contains the codes as I have them today.
* Option #2 (RFC 4646bis) contains the codes if I choose to go against the grain and use "cmn".
* Option #3 (RFC 4646bis) treats "zh" and "cmn" as synonyms; avoids using "cmn" for compatibility.
* Option #4 (RFC 4646bis) contains the codes "cmn" for spoken context (where distinction is essential) and "zh" for written context.

Comments:

* Option #1 is unambiguous and shows that there is a relationship between these languages. It also preserves the legacy "zh" tag so developers that aren't hip to later versions of BCP 47 or 639-3 will have some idea what these tags mean. The tags are maybe longer than they need to be, but if I need a fixed-length tag, I can wait for 639-6. The languages may not be mutually intelligible in some contexts, but they are related.

* Option #2 is unambiguous, but Microsoft, Google, and Amazon won't be using the same tags for Chinese that I do. Even if I don't follow their lead, others likely will. This worries me. Also, the rules for #2 must include fuzzy guidelines such as, "use the 'zh' tag except when you think it's a bad idea" and "use the shortest tag except when you don't want to." This presents complications in trying to explain some sort of consistent method to the LTRU madness to others. Given this, I start to wish ISO 639-6 a safe and speedy passage.

* Option #3 is what I believe you might suggest, but for me, that's the worst list of all. There are five ambiguous "zh" categories on that list. It follows the "always use the shortest tag" rule and respects history, but it's useless to me from an identification perspective.

* Option #4 has three ambiguous tags and means I have to explain to people who aren't in this industry why I use different tags for the same language. This strategy is less ambiguous than #3, but I'm not sure I can explain it to other content creators, for the same reasons as #2, and it introduces a spoken/written complication others may not want. In the long run, this seems messy and unclear enough that it will result in bad tagging.

* Options #2, #3, and #4: In general, it worries me that RFC 4646bis offers so many "preferred" options for the same thing. I really can't see how this simplifies things for anyone.
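
The ambiguity counts quoted for Options #3 and #4 can be checked mechanically. The following is a minimal sketch, assuming that "ambiguous" means a tag whose primary subtag is "zh" with no three-letter subtag after it to narrow the variety. That definition, and the names OPTIONS, is_bare_zh, and ambiguous_count, are illustrative assumptions and not part of any RFC. The tag columns are copied from the table above. Note that the count includes the deliberately generic "Variant Unknown" row.

```python
# Tag columns copied from the table, in the same row order as the language list.
OPTIONS = {
    1: ["zh", "zh-yue", "zh-yue", "zh-cmn", "zh-cmn-TW",
        "zh-cmn-Hans", "zh-cmn-Hant", "zh-min-nan", "zh-min-nan"],
    2: ["zh", "yue", "yue", "cmn", "cmn-TW",
        "cmn-Hans", "cmn-Hant", "nan", "nan"],
    3: ["zh", "yue", "yue", "zh", "zh-TW",
        "zh-Hans", "zh-Hant", "nan", "nan"],
    4: ["zh", "yue", "yue", "cmn", "cmn-TW",
        "zh-Hans", "zh-Hant", "nan", "nan"],
}


def is_bare_zh(tag):
    """Assumed definition: primary subtag 'zh' with nothing narrowing the variety."""
    subtags = tag.lower().split("-")
    if subtags[0] != "zh":
        return False
    # A three-letter second subtag (cmn, yue, min) names a specific variety;
    # a region (TW) or script (Hans, Hant) does not.
    has_variety = (
        len(subtags) > 1 and len(subtags[1]) == 3 and subtags[1].isalpha()
    )
    return not has_variety


def ambiguous_count(option):
    """Number of rows in the given option whose tag is a bare 'zh' tag."""
    return sum(is_bare_zh(tag) for tag in OPTIONS[option])


if __name__ == "__main__":
    for option in sorted(OPTIONS):
        print(f"Option #{option}: {ambiguous_count(option)} ambiguous tag(s)")
```

Under this assumed definition, Options #1 and #2 each leave only the intentionally generic "zh" row, while Option #3 yields five such tags and Option #4 yields three, matching the counts in the comments.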

I don't have a need for fuzzy fallback scenarios. I need precise tags and mostly simple lookup. If you take the fallback scenarios and absurdities out of the document you reference, I don't think there's much left.
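
The difference between simple lookup and fallback can be sketched as follows. This is a minimal illustration, assuming a consumer whose catalog knows only the legacy "zh" tag. CATALOG, fallback_chain, exact_lookup, and fallback_lookup are hypothetical names, and the truncation loop is a simplified version of the right-to-left truncation used by RFC 4647 lookup. Tags are assumed to be already normalized in case.

```python
# Hypothetical catalog of a consumer that predates the ISO 639-3 codes.
CATALOG = {"zh": "Chinese (Variant Unknown)"}


def fallback_chain(tag):
    """List the tag and its progressively shorter right-truncations."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A single-character subtag left at the end is dropped as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def exact_lookup(tag, catalog):
    """Simple lookup: the tag either matches an entry or it does not."""
    return catalog.get(tag)


def fallback_lookup(tag, catalog):
    """Try the tag, then each truncation, and return the first match."""
    for candidate in fallback_chain(tag):
        if candidate in catalog:
            return catalog[candidate]
    return None


if __name__ == "__main__":
    for tag in ("zh-cmn-Hans", "cmn-Hans"):
        print(tag, fallback_chain(tag), fallback_lookup(tag, CATALOG))
```

With this catalog, "zh-cmn-Hans" truncates to "zh-cmn" and then "zh", so a legacy consumer still finds a Chinese entry, whereas "cmn-Hans" truncates only to "cmn" and never reaches "zh" unless the consumer adds extra mapping logic.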

Regards,

Karen Broome




>-----Original Message-----
>From: ltru-bounces@ietf.org [mailto:ltru-bounces@ietf.org] On Behalf
>Of Mark Davis
>Sent: Thursday, May 29, 2008 4:00 PM
>To: debbie@ictmarketing.co.uk
>Cc: LTRU Working Group
>Subject: Re: [Ltru] Consensus call: extlang
>
>What would be useful is to hear from the extlangistas what their
>concerns are specifically; many have not given reasons for favoring
>encompassed languages into extlang instead of into the primary
>language subtag. It would be useful for them to give the scenarios
>where they think extlang is an improvement. It would be useful to
>find out why they think the scenarios such as in
>http://docs.google.com/Doc?docid=dfqr8rd5_676kxxxjhd&hl=en are not a
>problem.
>
>Clearly people think that using the extlang model solves more
>problems than it causes, so it would be useful to examine specific
>cases and see if that is, in fact, true.
>
>
>Mark
