Pinyin

Michael Everson <everson@evertype.com> Wed, 24 September 2008 08:58 UTC

I would like to hear specific comments about this. I first asked for
them on 2008-09-17.

Right. So we need to be able to tag data so that people know that the
language is Mandarin and the specific Romanization is Hanyu Pinyin, a
Latin-based orthography. However, the orthographic conventions behind
Pinyin are also applied to other Chinese languages, and a variant of
them is used in Taiwan.

zh-pinyin
zh-CN-pinyin
zh-Latn-pinyin
zh-Latn-CN-pinyin
These are formally ambiguous as to which Chinese language (in terms of
the set of languages written in Han characters) is meant. However, in
this registration, I think we ought to SPECIFY that the string
zh-pinyin refers to Mandarin Chinese in Hanyu Pinyin orthography--not to
any other form of Chinese nor any other orthography.

zh-cmn-pinyin
zh-cmn-Latn-pinyin
zh-cmn-CN-pinyin
zh-cmn-Latn-CN-pinyin
cmn-pinyin
cmn-Latn-pinyin
cmn-CN-pinyin
cmn-Latn-CN-pinyin
All of these can only mean Mandarin Chinese in Hanyu Pinyin
romanization; they are not yet permitted but will be (one supposes).
Other Chinese languages might be listed with 639-3 in due course.
For the present, this set doesn't matter to us. It's a large set though.

zh-TW-pinyin
zh-Latn-TW-pinyin
This is Tongyong Pinyin orthography, also to be defined as Mandarin
Chinese language.

zh-cmn-TW-pinyin
zh-cmn-Latn-TW-pinyin
cmn-TW-pinyin
cmn-Latn-TW-pinyin
All of these can only mean Mandarin Chinese in Tongyong Pinyin
romanization; they are not yet permitted but will be (one supposes).
For the present, this set doesn't matter to us.

bo-pinyin
bo-Latn-pinyin
Both of these mean Tibetan language in Tibetan Pinyin romanization (as
opposed to Wylie, for instance).
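
The readings proposed in the lists above can be summarised as a small
lookup. The Python below is only a sketch of that proposal, not part of
any registration: the function name interpret and the returned labels
are made up for illustration, and it assumes that -Latn- is redundant
for interpretation and that only the tag shapes listed in this message
need to be handled.

```python
def interpret(tag):
    """Return (language, orthography) for a '-pinyin' tag as read in this
    message, or None if the tag is not one of the shapes listed here.
    Hypothetical helper for illustration only."""
    parts = tag.split("-")
    if len(parts) < 2 or parts[-1].lower() != "pinyin":
        return None
    subtags = [p.lower() for p in parts[:-1]]
    # Assumption: the script subtag adds no information for these tags.
    subtags = [s for s in subtags if s != "latn"]
    # Only the two regions discussed in this message are recognised.
    region = None
    if subtags and subtags[-1] in ("cn", "tw"):
        region = subtags.pop()
    if subtags in (["zh"], ["cmn"], ["zh", "cmn"]):
        if region == "tw":
            return ("Mandarin Chinese", "Tongyong Pinyin")
        return ("Mandarin Chinese", "Hanyu Pinyin")
    if subtags == ["bo"] and region is None:
        return ("Tibetan", "Tibetan Pinyin")
    return None
```

Under this sketch zh-pinyin and cmn-Latn-CN-pinyin read the same way,
zh-TW-pinyin reads as Tongyong Pinyin, and anything else (yue-pinyin,
say) is left uninterpreted.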

Peter says he would like the recommended prefix to contain -Latn-.
Mark said he could live with or without it but thought that "with"
should be recommended. Should we assist users of this subtag by having
some redundancy in the registration? At this stage I think that "best
practice" (with -Latn-) being the only one specified might be
insufficient.
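
To make the trade-off concrete, here is a rough Python sketch of how a
checker might compare a tag against the registered prefixes. It is a
simplification: it assumes a tag is "covered" when the subtags before
the variant begin with one of the prefixes, and the two prefix lists
are hypothetical candidates, not the contents of any actual
registration.

```python
def matches_prefix(tag, prefixes, variant="pinyin"):
    """Return True if the subtags preceding `variant` start with one of
    the given prefixes. Simplified, hypothetical check for illustration."""
    subtags = tag.lower().split("-")
    if variant not in subtags:
        return False
    before = subtags[:subtags.index(variant)]
    for prefix in prefixes:
        wanted = prefix.lower().split("-")
        if before[:len(wanted)] == wanted:
            return True
    return False

# Hypothetical candidate prefix lists (assumptions, not registry data).
LATN_ONLY = ["zh-Latn", "bo-Latn"]
WITH_REDUNDANCY = ["zh-Latn", "zh", "bo-Latn", "bo"]
```

With only the -Latn- prefixes listed, a tag such as zh-CN-pinyin would
not be covered by this check, whereas the redundant list covers it as
well as zh-Latn-CN-pinyin.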

Michael Everson * http://www.evertype.com