RE: [Ltru] Extended language tags (long reply)

Karen_Broome@spe.sony.com Wed, 10 October 2007 18:28 UTC

In-Reply-To: <C9BF0238EED3634BA1866AEF14C7A9E55A598803B7@NA-EXMSG-C116.redmond.corp.microsoft.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>
Subject: RE: [Ltru] Extended language tags (long reply)
MIME-Version: 1.0
X-Mailer: Lotus Notes Release 6.5.5 CCH7 December 15, 2006
Message-ID: <OFA3DD788D.8893E3ED-ON88257370.0063F700-88257370.006577A0@spe.sony.com>
From: Karen_Broome@spe.sony.com
Date: Wed, 10 Oct 2007 11:25:53 -0700
Cc: "ltru@ietf.org" <ltru@ietf.org>

I think if meaningful fallback is the goal, you need to consider both 
audio and textual forms of languages and their respective fallback 
mechanisms. While a fallback to "zh" (from "zh-cmn" or "zh-HK", etc.) 
would likely be useful in written contexts, falling back to spoken "zh" 
from spoken "cmn" may provide something unintelligible. 

In the context of my industry, it would likely be useful for a Cantonese 
speaker to be able to find subtitled films in "zh" if "zh-yue" is not 
available. Obviously there are questions of simplified vs. traditional 
orthography, but the "zh" tag alone encompasses both orthographies. But if 
the Cantonese speaker is looking at dubbed films, a fallback to spoken 
"zh-cmn" is not as likely to be useful.
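To make the subtitle vs. dub distinction concrete, here is a minimal sketch of fallback that depends on modality. The tables, the function name pick_track, and the "written"/"spoken" labels are all hypothetical, chosen only for illustration; nothing here comes from RFC 4647 or from any registry.

```python
# Hypothetical fallback tables; the entries are illustrative assumptions only.
# Written: a reader of "zh-yue" is assumed to cope with text tagged plain "zh".
WRITTEN_FALLBACK = {"zh-yue": ["zh"], "zh-cmn": ["zh"]}
# Spoken: no fallback is assumed to be useful between these varieties.
SPOKEN_FALLBACK = {"zh-yue": [], "zh-cmn": []}


def pick_track(requested, available, modality):
    """Return the first available tag for the request, or None if nothing fits."""
    table = WRITTEN_FALLBACK if modality == "written" else SPOKEN_FALLBACK
    candidates = [requested] + table.get(requested, [])
    for tag in candidates:
        if tag in available:
            return tag
    return None


# Subtitles: the Cantonese request falls back to text tagged "zh".
print(pick_track("zh-yue", {"zh", "en"}, "written"))
# Dubs: the same request finds no acceptable spoken track.
print(pick_track("zh-yue", {"zh-cmn", "en"}, "spoken"))
```

Under these assumed tables the first call returns "zh" and the second returns None, which is the asymmetry described above.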

From what I understand, the same is true of written Dutch vs. spoken 
Flemish: a Flemish speaker may be able to read standard Dutch easily, 
but the spoken forms can cause enough comprehension problems that Dutch 
is subtitled in Belgium. (I still don't know whether "vls" (639-3) falls 
back to "nld/dut" in 639-2, though Flemish is cited as a synonym for 
Vlaams in 639-3.)

Please consider the nature of both audio and written language with respect 
to extlangs. I think you'll find that what makes sense for fallback depends 
on this distinction in many cases beyond those I've mentioned. 

Regards,

Karen Broome
Metadata Systems Designer
Sony Pictures Entertainment
310.244.4384



Shawn Steele <Shawn.Steele@microsoft.com> 
10/10/2007 11:02 AM

To: Mark Davis <mark.davis@icu-project.org>, Addison Phillips <addison@yahoo-inc.com>
cc: "ltru@ietf.org" <ltru@ietf.org>
Subject: RE: [Ltru] Extended language tags (long reply)






================

What we don't want to do is make recommendations that, if implemented, make 
it harder for people to control the result and get the right answer. And 
baking extlang into the tags is even worse, since it introduces backward 
incompatibilities that old code must be modified to work around. 

================

cmn is completely incompatible with existing practice anyway, so you can’t 
claim that it solves the problem.  Existing clients ask for zh-HK (or 
whatever) and code is tagged as zh-HK (or whatever).  Those won’t match 
cmn using RFC 4647.
 
So for backwards compatibility zh-cmn is no worse than cmn.  And if you 
don’t like the inference of the zh, then you can ignore that part, but at 
least the data’s there if people do want it.
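The matching behavior in question can be sketched as follows. This is a simplified reading of RFC 4647 basic filtering (section 3.3.1) and lookup (section 3.4), not a complete implementation; the function names basic_filter and lookup are hypothetical, and the sample tag lists are made up for illustration.

```python
def basic_filter(language_range, tags):
    """Simplified RFC 4647 basic filtering: a range matches a tag that is
    equal to it, or that starts with the range followed by "-"."""
    r = language_range.lower()
    return [
        t for t in tags
        if r == "*" or t.lower() == r or t.lower().startswith(r + "-")
    ]


def lookup(language_range, tags, default=None):
    """Simplified RFC 4647 lookup: truncate the range from the right,
    one subtag at a time, until it equals an available tag."""
    by_lower = {t.lower(): t for t in tags}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # A single-character subtag left at the end is removed as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


content = ["zh-HK", "zh-cmn", "cmn", "en"]
print(basic_filter("zh", content))       # "cmn" is not selected by range "zh"
print(lookup("zh-HK", ["cmn", "en"]))    # existing request, content tagged "cmn"
print(lookup("zh-HK", ["zh-cmn", "en"])) # existing request, content tagged "zh-cmn"
print(lookup("zh-cmn", ["zh", "en"]))    # new request, legacy content tagged "zh"
print(lookup("cmn", ["zh", "en"]))       # new request, legacy content tagged "zh"
```

In this sketch a lookup for "zh-HK" finds neither "cmn" nor "zh-cmn", so the two are equally (in)compatible for existing clients. The prefix is what lets a "zh" filter select "zh-cmn" content and lets a "zh-cmn" request fall back to content tagged "zh"; bare "cmn" gets neither.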
 
For some macrolanguages the strict fallback is probably inappropriate; 
however, I don’t expect to find matches in that case (because I don’t 
expect correctly tagged data to be “zh”).  If this is a concern, those are 
easily filtered out.
 
From the discussion I don’t think the bigger problem is whether or not we 
go with zh-cmn or just cmn.  The bigger issue seems to be how to modify 
RFC 4647 to provide meaningful fallback with whichever mode is used.  
Since some applications may (or may not) want to consider zh-HK or other 
legacy behaviors, such recommendations aren’t trivial.  Given the 
differing requirements amongst us, I suspect applications will need a 
certain flexibility, perhaps several suggested behaviors rather than the 
strict behavior of RFC 4647 (which everyone seems to modify for their 
purposes anyway).
 
- Shawn

_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www1.ietf.org/mailman/listinfo/ltru

