Re: [Ltru] zh != Mandarin

Shawn Steele <Shawn.Steele@microsoft.com> Thu, 05 June 2008 22:41 UTC

From: Shawn Steele <Shawn.Steele@microsoft.com>
To: "Phillips, Addison" <addison@amazon.com>, "Broome, Karen" <Karen_Broome@spe.sony.com>, Peter Constable <petercon@microsoft.com>
Date: Thu, 05 Jun 2008 15:41:54 -0700
Cc: "ltru@ietf.org" <ltru@ietf.org>
Subject: Re: [Ltru] zh != Mandarin
List-Id: Language Tag Registry Update working group discussion list <ltru.ietf.org>

Minor note on the phrase "even though the content could be more precisely tagged with 'cmn' (Mandarin)": could it also mention Cantonese, as in "even though the content could be more precisely tagged with 'cmn' (Mandarin) or 'yue' (Cantonese)"?

- Shawn

-----Original Message-----
From: ltru-bounces@ietf.org [mailto:ltru-bounces@ietf.org] On Behalf Of Phillips, Addison
Sent: Thursday, June 05, 2008 12:38 PM
To: Broome, Karen; Peter Constable
Cc: ltru@ietf.org
Subject: Re: [Ltru] zh != Mandarin

>
> I agree that zh-cmn or cmn SHOULD be used. zh will remain valid and
> includes both Mandarin and Cantonese, so MUST is too strong.
>
> I think this is the last of the disagreement. Mark Davis believes
> "zh" SHOULD be used. Could we straw poll that?
>

SHOULD be used for what? From what I've read, Mark Davis believes that he MAY use 'zh' to mean Mandarin.

We already have a "rough consensus": let's work on text instead. Here is what I have revised the draft-15 text to say on the matter:

--
<section title="Using Extended Language Subtags" anchor="choiceUsingExtlang">

<t>The Chinese ('zh') and Arabic ('ar') languages and the various sign languages ('sgn') have a long tradition of using specific primary language subtags, possibly coupled with various region subtags or as part of a registered grandfathered tag, to indicate the language. With the adoption of this document, specific ISO 639-3 assigned subtags became available to identify languages within these diverse language families or groupings. Other than the sign languages, which share a mode of communication rather than any linguistic heritage, this relationship is provided for in ISO 639-3 via a "macrolanguage" mapping. Other languages are encompassed by a macrolanguage, and guidance on tagging these languages is provided below. For Arabic and Chinese, however, compatibility with existing tagging practices is maintained through the extended language subtag feature, which allows tags consistent with user expectations in these language communities.</t>

<t>Chinese ('zh') provides a useful illustration of this. Many different kinds of content have used tags beginning with the 'zh' subtag, with application-specific meaning being associated with region codes, private-use sequences, or grandfathered registered values. This is because historically only the macrolanguage subtag 'zh' was available for forming language tags. However, the languages encompassed by the Chinese subtag are, in the main, not mutually intelligible when spoken. Written forms of these languages also show wide variation in form and usage, and many of these languages are written in various contexts.</t>

<t>Rather than require all Chinese content to be retagged, this document provides a special compatibility mechanism: the extended language subtag. Chinese languages encompassed by the 'zh' subtag are in the registry as both primary language subtags and as extended language subtags. For example, the subtag for Cantonese is 'yue'. Content in Cantonese might historically have used a tag such as "zh-HK" (since Cantonese is spoken commonly in Hong Kong), although that tag actually means any type of Chinese as used in Hong Kong. With the availability of ISO 639-3 codes in the registry, content in Cantonese SHOULD use a tag containing the 'yue' subtag. For example, a document written in the Traditional script might use a tag such as "yue-Hant" or "zh-yue-Hant-HK".</t>

<t>Applications MAY use the macrolanguage subtag to form the tag instead of using the more specific encompassed language subtag. For example, an application with large quantities of textual data already using tags with the 'zh' (Chinese) subtag might continue to use this more general subtag even for new data, even though the content could be more precisely tagged with 'cmn' (Mandarin). Similarly, an application already using tags that start with the 'ar' (Arabic) subtag might continue to use this more general subtag even for new data, which could be more precisely tagged with 'arb' (Standard Arabic).</t></section>

--
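
To make the tagging choices in the text above concrete, here is a minimal Python sketch. It is not part of the draft. The table MACROLANGUAGE_MEMBERS and both function names are hypothetical, and the table lists only the subtags mentioned in this thread; a real implementation would read the relationships from the registry. It assumes the tag shape described above, where a three-letter subtag immediately after the primary language subtag is an extended language subtag.

```python
# Hypothetical, partial table: macrolanguage -> encompassed languages
# named in this thread only (not the full registry).
MACROLANGUAGE_MEMBERS = {
    "zh": {"cmn", "yue"},
    "ar": {"arb"},
}


def specific_language(tag):
    """Return the most specific language subtag carried by a tag.

    "zh-yue-Hant-HK" and "yue-Hant" both yield "yue"; "zh-HK" yields "zh".
    """
    subtags = tag.lower().split("-")
    primary = subtags[0]
    # Assumption: a 3-letter alphabetic subtag right after the primary
    # language is an extended language subtag (scripts are 4 letters,
    # regions are 2 letters or 3 digits).
    if len(subtags) > 1 and len(subtags[1]) == 3 and subtags[1].isalpha():
        return subtags[1]
    return primary


def could_be(tag, language):
    """Return True if content carrying `tag` could be in `language`."""
    specific = specific_language(tag)
    if specific == language:
        return True
    # A macrolanguage tag such as "zh-HK" does not rule out any of the
    # languages it encompasses.
    return language in MACROLANGUAGE_MEMBERS.get(specific, set())
```

With this sketch, "zh-HK" could be either Cantonese or Mandarin, while "yue-Hant" and "zh-yue-Hant-HK" both identify Cantonese specifically, which is the distinction the SHOULD and MAY wording is meant to capture.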

Addison
_______________________________________________
Ltru mailing list
Ltru@ietf.org
https://www.ietf.org/mailman/listinfo/ltru