RE: Pinyin

"Phillips, Addison" <addison@amazon.com> Thu, 25 September 2008 18:15 UTC

From: "Phillips, Addison" <addison@amazon.com>
To: CE Whitehead <cewcathar@hotmail.com>, "ietf-languages@iana.org" <ietf-languages@iana.org>
Date: Thu, 25 Sep 2008 11:15:37 -0700
Subject: RE: Pinyin

> Regarding Niall Tracey's concern, I think that if the various
> orthographies use an almost identical character set, it is o.k. to

Your terminology isn't well chosen here. Most of the Pinyins and GOST (a transliteration scheme used for Russian) use an "almost identical character set". Judging a Latin-script transcription scheme by its character set would suggest that we tag them all as 'Latn' and be done with it. :-)

> 
> (I've noted that we do not even bother to differentiate the various
> European orthographies used for French, Spanish, English; 

Actually, we do. Many of these fall on regional boundaries and thus use a region subtag to differentiate them. Or are you claiming that "en-US" and "en-GB" are the same? And we have, for example, the 'scotland' and 'scouse' subtags for those varieties of English; the year-based subtags for the German spelling reforms; the French subtags; a flock of Slovenian subtags; the recently mooted Belarusian subtags; and so forth.
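
To make the region-versus-variant point concrete, here is a minimal sketch in Python. It assumes a simplified reading of the BCP 47 subtag shapes (language, optional script, optional region, then variants) and ignores extensions, private-use and grandfathered tags. The function name split_tag is hypothetical, and nothing here checks subtags against the registry.

```python
import re

# Simplified subtag shapes; an illustration only, not a full BCP 47 parser.
LANGUAGE = re.compile(r"^[a-z]{2,3}$", re.IGNORECASE)
SCRIPT = re.compile(r"^[a-z]{4}$", re.IGNORECASE)
REGION = re.compile(r"^([a-z]{2}|[0-9]{3})$", re.IGNORECASE)
VARIANT = re.compile(r"^([a-z0-9]{5,8}|[0-9][a-z0-9]{3})$", re.IGNORECASE)


def split_tag(tag):
    """Split a simple language tag into language, script, region, variants."""
    subtags = tag.split("-")
    if not LANGUAGE.match(subtags[0]):
        raise ValueError("unsupported primary language subtag: %r" % subtags[0])
    result = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    rest = subtags[1:]
    # Script (four letters) comes before region, if present at all.
    if rest and SCRIPT.match(rest[0]):
        result["script"] = rest.pop(0).title()
    # Region is two letters or three digits.
    if rest and REGION.match(rest[0]):
        result["region"] = rest.pop(0).upper()
    # Whatever remains must look like a variant subtag.
    for subtag in rest:
        if not VARIANT.match(subtag):
            raise ValueError("unexpected subtag: %r" % subtag)
        result["variants"].append(subtag.lower())
    return result


for example in ("en-US", "en-GB", "en-scouse", "de-1996", "sl-rozaj"):
    print(example, split_tag(example))
```

In this sketch "en-US" and "en-GB" differ only in the region field, while "en-scouse", "de-1996" and "sl-rozaj" carry the distinction in a variant subtag.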

Or, by "orthography" do you mean "transliteration"?

> so I do
> not think it will be offensive to treat these as a group if they
> are closely related; 

It is offensive if you are offended by it: having your language treated as a corruption or dialect of someone else's "proper" language might be perceived as insensitive. Note well: I am not saying this applies to this case. But that is Niall's point as I understood it.

> on the contrary, I think it will be nice to
> have subtags for the related Romanizations of the different
> languages.  Of course, no one has requested all these but us, and
> we could wait for a request for all but the Hanyu Pinyin
> Romanization of Mandarin, but if these orthographies are related,
> we do not need to do so ).

Lots of orthographies are "related". The question isn't whether different things are "related" but whether the distinctions between them are important enough to identify. If Hanyu Pinyin is not different in any important way from other forms of Pinyin, then lumping them would be "okay" and possibly even responsible. However, the request didn't come through for just any Pinyin: the requester was clearly knowledgeable enough to request Hanyu Pinyin specifically. So, absent any comments from him here, I would tend to assume the distinction was important to him.

Furthermore, anyone is free to request subtags for other transcription schemes. That is what the registration process is for!

Addison