[Ltru] Extended language tags

Shawn Steele <Shawn.Steele@microsoft.com> Thu, 04 October 2007 22:22 UTC

From: Shawn Steele <Shawn.Steele@microsoft.com>
To: "ltru@ietf.org" <ltru@ietf.org>
Date: Thu, 04 Oct 2007 14:33:05 -0700

There's been lots of discussion about the extended language tags, but it doesn't seem like there've been many conclusions.  I've gleaned the following points from the discussion:

        * We (the bigger software ecosystem we, tools, etc.) have encouraged use of zh-HK, etc. for labeling both Mandarin and Cantonese.  Whatever happens in the future, a large number of legacy documents, both Mandarin and Cantonese, will be labeled with the existing language tag.

        * Whether yue or zh-yue is used, it is a change from the current label (in most cases).  It seems that in nearly all cases this will require a code change.  In particular, applications that want to include zh-HK in lists containing zh-cmn-HK or cmn-HK will need extra awareness.

        * I don't think the Breton example applies.  HTTP Accept-Language allows for "br-FR, fr-FR" type fallback, so that is a solution.  If an application independently wanted to make this assumption, that's fine by me, but this seems orthogonal to the problem we're trying to solve here.

        * Lots of zh data right now is zh-cmn.  Most of "us" seem to agree that we can't narrow the meaning of zh because it does allow Cantonese.  However, if some application wanted to assume that zh == zh-cmn, then that seems up to the application as a fallback choice.

        * Even for the current tags, many of the people in the teleconference seem to extend RFC 4647 in ways that are best for them.  Strict use of 4647 behavior seems rare.  It seems reasonable to me to expect that in the future people may continue to do so and that RFC 4647 and the registry can only provide guidelines.  I don't think that it can solve all problems for all applications, and I'm fine with that.

        * It was proposed that the registry could provide a link from cmn to zh for applications that want this.  My experience has been that the registry is good for providing recommendations, but in practice software is rarely kept current with the actual data currently in the registry.

        * FWIW: some documents are currently tagged zh-yue.

        * RFC 4646 strongly implies that the direction is zh-cmn rather than cmn.  By itself this isn't a big concern to me, but taken with the grandfathered and existing zh-yue data I would think there'd need to be a strong case to change directions.

So from these points, my conclusion is that the zh-cmn form is preferable.

My reasoning is that either cmn by itself or zh-cmn will require a code change in nearly all cases, either to include or exclude existing data or to adjust fallback rules.  Either way may also require knowledge that zh-HK might be interesting if the request is for {Mandarin tag}-HK.  Searching may need to include zh files for {Mandarin tag} queries and exclude them for {Cantonese tag} queries, but the actual tag doesn't really change this logic.  Neither variation is likely to work with existing code when the request is for the new name and the data is tagged with the old tag.
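
To make the "the actual tag doesn't really change this logic" point concrete, here is a rough Python sketch.  The function name and its parameters are made up for illustration, and the policy it encodes (Mandarin queries also accept legacy zh data, Cantonese queries do not) is just the one described above, not anything RFC 4647 specifies.

```python
def acceptable_doc_tags(query, mandarin_tag, cantonese_tag):
    """Return the document tags a search for `query` might accept.

    Hypothetical application policy, not RFC 4647 behavior:
    Mandarin queries also accept legacy data tagged with plain zh,
    Cantonese queries do not.
    """
    q = query.lower()
    policies = ((mandarin_tag.lower(), True), (cantonese_tag.lower(), False))
    for lang, include_legacy_zh in policies:
        if q == lang or q.startswith(lang + "-"):
            rest = q[len(lang):]          # "" or something like "-hk"
            tags = {q}
            if include_legacy_zh:
                tags.add("zh" + rest)     # e.g. legacy zh-hk documents
            return tags
    return {q}                            # not a Mandarin/Cantonese query


# Same logic under either spelling of the new tags.
print(acceptable_doc_tags("zh-cmn-HK", "zh-cmn", "zh-yue"))
print(acceptable_doc_tags("cmn-HK", "cmn", "yue"))
print(acceptable_doc_tags("yue-HK", "cmn", "yue"))
```

Only the two tag arguments differ between the two spellings; the code that has to be written is the same either way.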

The deciding factor for me is that to know that cmn is related to zh I'd have to look in the registry, but zh-cmn contains that information.  Otherwise I don't really see advantages with either method.
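
As a small illustration of that last point, here is a sketch of truncation-style fallback in the spirit of the RFC 4647 lookup scheme (drop the last subtag, and drop a trailing single-character subtag along with it).  The function names and the ASSUMED_LINKS table are my own placeholders; the table stands in for the registry link proposed in the discussion and is not real registry data.

```python
def fallback_chain(tag):
    """Progressively truncate a language tag from the right."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # Do not leave a single-character subtag dangling at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


# Assumption: a hand-maintained copy of the proposed cmn -> zh link.
ASSUMED_LINKS = {"cmn": "zh", "yue": "zh"}


def fallback_chain_with_links(tag, links):
    """Same chain, extended with a linked parent if the table has one."""
    chain = fallback_chain(tag)
    parent = links.get(chain[-1].lower())
    if parent is not None:
        chain.append(parent)
    return chain


print(fallback_chain("zh-cmn-HK"))                         # reaches zh by truncation alone
print(fallback_chain("cmn-HK"))                            # never reaches zh
print(fallback_chain_with_links("cmn-HK", ASSUMED_LINKS))  # reaches zh only via the table
```

With zh-cmn the relationship to zh falls out of the tag itself; with bare cmn it depends on the application carrying, and keeping current, something like the table above.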

- Shawn

SDE
Windows International
Visual Studio .Net
Microsoft

