Re: [Ltru] Macrolanguage usage

"Phillips, Addison" <addison@amazon.com> Fri, 16 May 2008 15:19 UTC

From: "Phillips, Addison" <addison@amazon.com>
To: Peter Constable <petercon@microsoft.com>, Doug Ewell <doug@ewellic.org>, LTRU Working Group <ltru@ietf.org>
Date: Fri, 16 May 2008 08:18:47 -0700
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA013A118FF0@EX-SEA5-D.ant.amazon.com>
References: <mailman.494.1210865385.5128.ltru@ietf.org> <00a901c8b6f5$c04529a0$e6f5e547@DGBP7M81> <DDB6DE6E9D27DD478AE6D1BBBB83579562E143D665@NA-EXMSG-C117.redmond.corp.microsoft.com>
In-Reply-To: <DDB6DE6E9D27DD478AE6D1BBBB83579562E143D665@NA-EXMSG-C117.redmond.corp.microsoft.com>

>
> Well, you demonstrate that it evidently reads that way, though it's not
> quite saying that: this is an exception to the general recommendation.
> In general, we would not recommend that people tag Cantonese content as
> "zh", although they certainly can. The situation in which it's *most*
> reasonable to tag Cantonese content "zh" is when it is understandable
> to those that read Mandarin -- meaning it's understandable by the vast
> majority.

So we have a small impasse, indicative of the larger problem here: do we or do we not recommend against the use of 'zh' to tag Mandarin Chinese? It isn't the tagging of Cantonese that is the controversy. It is whether 'zh' is equated with Mandarin.

>
> Do you only have a critique? Did you not find the overall text useful
> enough to even consider suggestions for how to improve? Overall, was it
> of no value?

It's not of "no value", but it is different from "option #2" and it isn't a commentary on the currently developing draft. I have pondered carefully how to incorporate these divergent views and suggest that the section on Chinese be expanded and edited as follows:

--
<t>The family of languages encompassed by the macrolanguage Chinese ('zh') provides a useful illustration of this. Historically, only the macrolanguage subtag was available for forming language tags, so many different kinds of content have been tagged with variations of the 'zh' subtag, with application-specific meanings being associated with region codes in particular. However, these languages are, in the main, not mutually intelligible when spoken. Their written forms also vary widely in form and usage, and many of these languages are written in various contexts. Note, however, that the majority of literate Chinese can read "Standard Mandarin" documents and many, if not most, written documents use this form. This should not be taken as meaning that the other languages are never (or rarely) written, only that the Standard Mandarin form is most common and serves as a kind of written 'lingua franca'.</t>

<t>With the adoption of this document, subtags for the encompassed languages became available for use in language tags. These subtags SHOULD be used instead of the macrolanguage subtag 'zh' to identify Chinese language content. While documents written in Standard Mandarin could use the 'cmn' (Mandarin) subtag, their wide accessibility can be indicated by using the 'zh' subtag in this case.</t>

<t>For example, before script codes were available, Chinese written in the Traditional script was sometimes associated with the "zh-TW" (Chinese, Taiwan) tag. Another example would be the association of the various spoken or written forms of the Cantonese language with the "zh-HK" (Chinese, Hong Kong) tag. However, each of these tags could also be (and actually were) associated with other language forms as well. For example, "zh-TW" might also indicate the Min Nan language or "zh-HK" could indicate a Chinese content item adapted for Hong Kong (but not necessarily a Cantonese item: as noted, many written documents are in Mandarin Chinese).</t>

<t>Using the encompassed language subtags (in concert with other appropriate subtags) makes clear the actual language. Script, region, or other subtags can still delineate any additional or local variations in language (such as using the 'Hans' subtag to identify the simplified Chinese script or using the 'TW' region subtag to identify a Taiwanese regional form of a language). Thus, a Min Nan document written in the Traditional script and using a Taiwanese regional form would use the tag "nan-Hant-TW", while a Cantonese sound recording could use the tag "yue" (Cantonese) or possibly "yue-HK" (Cantonese, Hong Kong) if the recording were appropriate to that region.</t>
--
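
To make the tag examples in the last paragraph concrete, here is a minimal Python sketch of how the subtags combine. The helper name make_tag and its parameters are hypothetical, invented only for this illustration; it does no validation against the subtag registry and handles only the language, script, and region positions, with the conventional casing for each.

```python
from typing import Optional


def make_tag(language: str,
             script: Optional[str] = None,
             region: Optional[str] = None) -> str:
    """Join language, script, and region subtags into one tag.

    Hypothetical helper for illustration only: it applies the
    conventional casing (lowercase language, title-case script,
    uppercase region) and does not check the subtags against
    the registry.
    """
    parts = [language.lower()]
    if script:
        parts.append(script.title())
    if region:
        parts.append(region.upper())
    return "-".join(parts)


# The examples from the proposed text above.
print(make_tag("nan", script="Hant", region="TW"))  # Min Nan, Traditional script, Taiwan
print(make_tag("yue"))                              # Cantonese
print(make_tag("yue", region="HK"))                 # Cantonese, Hong Kong
```

The point of the sketch is only that the encompassed language subtag ('nan', 'yue') carries the language identity, while script and region remain optional refinements.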



Addison