Pinyin

Michael Everson <everson@evertype.com> Wed, 17 September 2008 19:00 UTC

Message-Id: <94F790CF-1EB5-4108-8677-F0CC494FA0E3@evertype.com>
From: Michael Everson <everson@evertype.com>
To: ietflang IETF Languages Discussion <ietf-languages@iana.org>
In-Reply-To: <30b660a20808251652le711e57vf74b07317c4d29ba@mail.gmail.com>
Content-Type: text/plain; charset="US-ASCII"; format="flowed"; delsp="yes"
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v929.2)
Subject: Pinyin
Date: Wed, 17 Sep 2008 15:40:20 +0100
References: <30b660a20808251532w617adb80w6408b78394afde60@mail.gmail.com> <E19FDBD7A3A7F04788F00E90915BD36C18B96BA8FE@USSDIXMSG20.spe.sony.com> <30b660a20808251652le711e57vf74b07317c4d29ba@mail.gmail.com>

Right. So we need to be able to tag data so that people know that the
language is Mandarin and the specific Romanization is Hanyu Pinyin, a
Latin-based orthography. However, the orthographic conventions behind
Pinyin are also applied to other Chinese languages, and a variant of
them is used in Taiwan.

zh-pinyin
zh-CN-pinyin
zh-Latn-pinyin
zh-Latn-CN-pinyin
These are formally ambiguous as to which Chinese language (in terms of
the set of languages written in Han characters) is meant. However, in
this registration, I think we ought to SPECIFY that the string
zh-pinyin refers to Mandarin Chinese in Hanyu Pinyin orthography--not to
any other form of Chinese, nor to any other orthography.

zh-cmn-pinyin
zh-cmn-Latn-pinyin
zh-cmn-CN-pinyin
zh-cmn-Latn-CN-pinyin
cmn-pinyin
cmn-Latn-pinyin
cmn-CN-pinyin
cmn-Latn-CN-pinyin
All of these can only mean Mandarin Chinese in Hanyu Pinyin  
romanization; they are not yet permitted but will be (one supposes).  
Other Chinese languages might be listed with 639-3 in due course.

zh-TW-pinyin
zh-Latn-TW-pinyin
These denote Tongyong Pinyin orthography, again defaulting to Mandarin
Chinese language.

zh-cmn-TW-pinyin
zh-cmn-Latn-TW-pinyin
cmn-TW-pinyin
cmn-Latn-TW-pinyin
All of these can only mean Mandarin Chinese in Tongyong Pinyin  
romanization; they are not yet permitted but will be (one supposes).

bo-pinyin
bo-Latn-pinyin
Both of these mean Tibetan language in Tibetan Pinyin romanization (as
opposed to Wylie, for instance).
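
To make the intended reading of these tags concrete, here is a rough
sketch in Python. It only encodes the interpretation proposed in this
message; the function name interpret_pinyin_tag is made up for
illustration, and nothing here reflects an existing registry entry or
library.

```python
# Sketch only: encodes the interpretation proposed in this message,
# not the behaviour of any registry or tag-processing library.

def interpret_pinyin_tag(tag):
    """Return (language, romanization) for a tag ending in -pinyin."""
    subtags = tag.split("-")
    if subtags[-1].lower() != "pinyin":
        raise ValueError("tag does not carry the pinyin variant: " + tag)

    primary = subtags[0].lower()
    # In the examples above the only two-letter subtags after the first
    # one are regions (CN, TW); "cmn" and "Latn" are longer.
    region = next((s.upper() for s in subtags[1:-1] if len(s) == 2), None)

    if primary == "bo":
        return ("Tibetan", "Tibetan Pinyin")
    if primary in ("zh", "cmn"):
        # zh alone is formally ambiguous; the proposal pins it to Mandarin.
        if region == "TW":
            return ("Mandarin Chinese", "Tongyong Pinyin")
        return ("Mandarin Chinese", "Hanyu Pinyin")
    raise ValueError("no interpretation proposed for: " + tag)
```

Under these assumptions, zh-pinyin and cmn-Latn-CN-pinyin both resolve
to Mandarin in Hanyu Pinyin, while any tag with the TW region resolves
to Tongyong Pinyin.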

Peter says he would like the recommended prefix to contain -Latn-.
Mark said he could live with or without it but thought that "with"
should be recommended. Should we assist users of this subtag by having
some redundancy in the registration? At this stage I think that
specifying only the "best practice" prefixes (with -Latn-) might be
insufficient.
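
For a sense of how much redundancy that would mean, the following
Python sketch enumerates the tags listed in this message, with and
without -Latn-. The helper name candidate_prefixes is hypothetical, and
the list is only the set of combinations discussed here, not a proposed
registration record.

```python
from itertools import product

def candidate_prefixes():
    """Enumerate the tags discussed in this message (illustration only)."""
    languages = ["zh", "zh-cmn", "cmn"]
    scripts = [None, "Latn"]        # without and with the script subtag
    regions = [None, "CN", "TW"]    # TW selects Tongyong Pinyin
    tags = []
    for lang, script, region in product(languages, scripts, regions):
        parts = [lang, script, region, "pinyin"]
        tags.append("-".join(p for p in parts if p))
    # Tibetan Pinyin has no region in the examples given.
    for script in scripts:
        tags.append("-".join(p for p in ["bo", script, "pinyin"] if p))
    return tags

tags = candidate_prefixes()
with_latn = [t for t in tags if "-Latn-" in t]
without_latn = [t for t in tags if "-Latn-" not in t]
```

Each tag without -Latn- has exactly one counterpart with it, so
specifying both forms doubles the list.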