RE: wadegile and pinyin LANGUAGE SUBTAG REGISTRATION FORMs

Peter Constable <petercon@microsoft.com> Wed, 03 September 2008 14:40 UTC

From: Peter Constable <petercon@microsoft.com>
To: Michael Everson <everson@evertype.com>, ietflang IETF Languages Discussion <ietf-languages@iana.org>
Date: Wed, 03 Sep 2008 07:24:24 -0700
Subject: RE: wadegile and pinyin LANGUAGE SUBTAG REGISTRATION FORMs
Thread-Topic: wadegile and pinyin LANGUAGE SUBTAG REGISTRATION FORMs
Thread-Index: AckHgDDOqLg2OgdbRb6RilrGP06uqAGTqGJw
Message-ID: <DDB6DE6E9D27DD478AE6D1BBBB835795633B2E3832@NA-EXMSG-C117.redmond.corp.microsoft.com>
References: <mailman.5.1219744802.2264.ietf-languages@alvestrand.no> <240C1D5B2BCD4479894DE98CA7FDF0C5@DGBP7M81> <CAE7BB83-82BA-4411-B2AC-9E23E090719C@evertype.com>
In-Reply-To: <CAE7BB83-82BA-4411-B2AC-9E23E090719C@evertype.com>

From: ietf-languages-bounces@alvestrand.no [mailto:ietf-languages-bounces@alvestrand.no] On Behalf Of Michael Everson
Sent: Tuesday, August 26, 2008 6:31 AM

>> Is that to say you approve them with a Prefix value of
>> "zh-Latn", as shown on Mark's "R2" registration forms?
>
> Erm, no. Both Wade Giles and Hanyu Pinyin imply Latin
> inherently, in my opinion.

I think we all agree that Latin is implied. Chinese is also implied. By this rationale, a complete tag of "wadegile" would work just as well as "zh-wadegile" (BCP47 syntax requirements aside). In terms of semantic representation, that is true: "wadegile" contains just as much information as does "zh-wadegile".

But in processing operations, they are not equal: having a separate subtag denoting the 'Chinese' semantic (or, in a 4646bis era, the 'Mandarin' semantic) makes it easy for processes to recognize that without needing to have tables recording the relationship between "wadegile" and "zh". In just the same way, including "Latn" frees processes from needing to have tables recording the relationship between "wadegile" and "Latn".
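
As a rough illustration of that point, here is a minimal Python sketch. The function name script_subtag and the table VARIANT_IMPLIES_SCRIPT are hypothetical, invented for this example; the sketch assumes only the BCP47 syntax rule that a script subtag is four ASCII letters appearing before any singleton. When "Latn" is present in the tag, the script can be read off by syntax alone; when it is absent, a process needs a side table mapping variants to scripts.

```python
def script_subtag(tag):
    """Return the script subtag of a language tag, or None if absent.

    Relies only on syntax: a script subtag is four ASCII letters and
    appears before any singleton (extension or private-use marker).
    """
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 1:
            break  # singleton: extensions or private use start here
        if len(subtag) == 4 and subtag.isascii() and subtag.isalpha():
            return subtag.title()
    return None


# Hypothetical side table a process would need if "Latn" were omitted.
VARIANT_IMPLIES_SCRIPT = {"wadegile": "Latn", "pinyin": "Latn"}

print(script_subtag("zh-Latn-wadegile"))  # script found by syntax alone
print(script_subtag("zh-wadegile"))       # no script subtag in the tag
```

With "zh-wadegile" the function finds nothing, so the script is recoverable only by consulting something like the hypothetical table.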

We need to consider how tags will get used in matching, together with the matching algorithms described in BCP47 (RFC 4647). Realistic scenarios include:

- matching a request for "zh-Latn" content with content tagged to indicate Wade Giles or Hanyu Pinyin Romanizations

- matching a request for Wade Giles or Pinyin content with the best-available match, which may be content tagged "zh-Latn"


Both scenarios become more complicated if "Latn" is not part of the prefix for "wadegile" and "pinyin".
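
The two scenarios can be sketched in Python as follows. This is a simplified reading of the basic filtering and lookup schemes in RFC 4647, not a complete implementation; the function names basic_filter and lookup and the sample tag lists are assumptions made for the illustration.

```python
def basic_filter(language_range, tag):
    """Basic filtering: the range matches a tag that equals it, or that
    begins with it followed by "-". Comparison is case-insensitive."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def lookup(language_range, available_tags, default=None):
    """Lookup: truncate the range from the end, one subtag at a time,
    until it exactly equals an available tag."""
    available = {t.lower(): t for t in available_tags}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        subtags.pop()
        # A single-character subtag left at the end is removed as well.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


# Scenario 1: a request for "zh-Latn" against tagged content.
print(basic_filter("zh-Latn", "zh-Latn-pinyin"))
print(basic_filter("zh-Latn", "zh-pinyin"))

# Scenario 2: a request for Wade Giles, best available is "zh-Latn".
print(lookup("zh-Latn-wadegile", ["zh-Latn", "zh-Hant"]))
print(lookup("zh-wadegile", ["zh-Latn", "zh-Hant"]))
```

Under these schemes, "zh-Latn" filters in "zh-Latn-pinyin" but not "zh-pinyin", and a lookup for "zh-wadegile" truncates straight to "zh" without ever trying "zh-Latn".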



Peter