Re: Pinyin

"Doug Ewell" <doug@ewellic.org> Thu, 25 September 2008 01:03 UTC

From: Doug Ewell <doug@ewellic.org>
To: ietf-languages@iana.org
References: <mailman.5976.1222283002.6324.ietf-languages@alvestrand.no>
Subject: Re: Pinyin
Date: Wed, 24 Sep 2008 19:03:16 -0600

"Phillips, Addison" <addison at amazon dot com> wrote:

> On the gripping hand, we could also register 'pinyin' to mean "any 
> Pinyin". Then if somebody needs the additional distinction later (like 
> next week), they could register subtags like 'hanyu' and 'tongyong' 
> or, heck, '2009acad' as pinyin variations.

That would be my thought.  Have we already established that the 
following is unworkable or unacceptable?

1. Register 'wadegile', meaning Wade-Giles.

2. Register 'pinyin', meaning any romanization that follows the general
   orthographic conventions of Hanyu Pinyin.

3. This already allows taggers to use "zh-(Latn-)pinyin" to convey the
   meaning they probably want to convey 99% of the time anyway.

4. If and when it is determined that finer granularity is required with
   respect to the various Pinyins, then register lower-level subtags:
   'hanyu', 'tongyong', 'canton', 'tibetan', etc.

5. This allows taggers to apply whatever level of granularity they feel
   is necessary, and avoid subtags they don't feel they need.
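To make steps 1-4 concrete, here is a minimal sketch of how registry Prefix fields could enforce the layering, so that 'hanyu' or 'tongyong' is only accepted after 'pinyin'. The registry entries below are hypothetical, modeled on this proposal rather than copied from the IANA registry, and the prefix rule is simplified.

```python
# Toy registry: variant subtag -> allowed Prefix values.
# HYPOTHETICAL entries modeled on the proposal in this thread;
# not the actual content of the IANA Language Subtag Registry.
VARIANT_PREFIXES = {
    "wadegile": ["zh"],
    "pinyin": ["zh", "yue"],
    "hanyu": ["zh-pinyin", "zh-Latn-pinyin"],
    "tongyong": ["zh-pinyin", "zh-Latn-pinyin"],
}


def has_prefix(subtags, prefix):
    """True if `subtags` begins with every subtag of `prefix` (case-insensitive)."""
    wanted = prefix.lower().split("-")
    return [s.lower() for s in subtags[:len(wanted)]] == wanted


def check_variants(tag):
    """Return a list of problems; an empty list means each known variant is properly prefixed.

    Simplified rule: the subtags preceding a variant must begin with one of
    that variant's registered prefixes. Subtags absent from the toy registry
    are ignored.
    """
    subtags = tag.split("-")
    problems = []
    for i, sub in enumerate(subtags):
        prefixes = VARIANT_PREFIXES.get(sub.lower())
        if prefixes is None:
            continue  # not a variant in this toy registry
        if not any(has_prefix(subtags[:i], p) for p in prefixes):
            problems.append(f"'{sub}' requires one of {prefixes}")
    return problems


for tag in ["zh-Latn-pinyin", "zh-Latn-pinyin-hanyu", "zh-hanyu"]:
    print(tag, check_variants(tag) or "ok")
```

A tagger who only cares about "some Pinyin" stops at 'pinyin'. A finer subtag like 'hanyu' is rejected unless it rides on top of 'pinyin', which is the hierarchy steps 4 and 5 describe.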

zh = some flavor of Chinese, writing system unspecified (or not written)
zh-Latn = Chinese written in any romanization
zh-(Latn-)pinyin = Chinese written in a "pinyin" romanization, could be
   Hanyu or Tongyong but definitely not Wade-Giles
zh-(Latn-)pinyin-hanyu = Chinese (almost certainly Mandarin) in Hanyu
   Pinyin
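The levels above nest cleanly under prefix matching. The sketch below uses basic filtering as described in RFC 4647 (a range matches a tag if it equals the tag or is a prefix ending at a subtag boundary). It also includes a simplified truncation fallback in the spirit of RFC 4647 Lookup, which omits that algorithm's singleton rule. The tag list itself is illustrative.

```python
def basic_filter_match(language_range, tag):
    """Basic filtering (RFC 4647, section 3.3.1): the range matches if it equals
    the tag or is a prefix of it ending at a subtag boundary; '*' matches all."""
    r = language_range.lower()
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def fallback_chain(tag):
    """Progressively shorter tags, most specific first.

    Simplified from RFC 4647 Lookup truncation: it does not drop a
    dangling single-letter singleton as the full algorithm does."""
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


tags = ["zh", "zh-Latn", "zh-Latn-wadegile", "zh-Latn-pinyin", "zh-Latn-pinyin-hanyu"]
print([t for t in tags if basic_filter_match("zh-Latn-pinyin", t)])
# ['zh-Latn-pinyin', 'zh-Latn-pinyin-hanyu']
print(fallback_chain("zh-Latn-pinyin-hanyu"))
# ['zh-Latn-pinyin-hanyu', 'zh-Latn-pinyin', 'zh-Latn', 'zh']
```

A request for "zh-Latn-pinyin" picks up Hanyu-tagged content but not Wade-Giles. Content tagged at the finest level still falls back to the coarser "pinyin" and "Latn" levels.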

The upcoming addition of language subtags for Mandarin, Cantonese, etc. 
will facilitate the identification of the "flavor of Chinese" in 
question, which may *in some cases* provide hints that obviate the need 
for multiple "pinyin" subtags.  For example, "(zh-)yue-(Latn-)pinyin" 
would technically be ambiguous as to whether the content is written in 
Hanyu Pinyin or Cantonese Pinyin, but in practice the latter could be 
assumed.  For Mandarin, this would not be as obvious, and the subtag 
'tongyong' would be a reasonable first addition in step 4.
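The inference described here can be sketched as follows: an explicit sub-variant wins, otherwise the language subtag supplies a default, and Mandarin stays ambiguous. The 'canton' subtag and the default table are hypothetical, reflecting only the reasoning in this message.

```python
# HYPOTHETICAL tables reflecting the reasoning above, not registry data.
EXPLICIT = {
    "hanyu": "Hanyu Pinyin",
    "tongyong": "Tongyong Pinyin",
    "canton": "Cantonese Pinyin",
}
# Language subtags whose bare 'pinyin' could be read one way in practice.
DEFAULT_FOR_LANGUAGE = {
    "yue": "Cantonese Pinyin",
}


def guess_pinyin_system(tag):
    """Best guess at which Pinyin a tag means, or None if it is not
    a 'pinyin' tag or the system remains ambiguous (e.g. Mandarin)."""
    subtags = [s.lower() for s in tag.split("-")]
    if "pinyin" not in subtags:
        return None
    pos = subtags.index("pinyin")
    # 1. An explicit lower-level subtag after 'pinyin' settles it.
    for sub in subtags[pos + 1:]:
        if sub in EXPLICIT:
            return EXPLICIT[sub]
    # 2. Otherwise fall back on the language subtag, if it implies a default.
    for sub in subtags[:pos]:
        if sub in DEFAULT_FOR_LANGUAGE:
            return DEFAULT_FOR_LANGUAGE[sub]
    # 3. No hint: Hanyu and Tongyong are both plausible.
    return None


print(guess_pinyin_system("zh-yue-Latn-pinyin"))
print(guess_pinyin_system("zh-cmn-Latn-pinyin"))
```

The Mandarin case returning None is exactly where a subtag such as 'tongyong' would earn its keep.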

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages