[I18ndir] Review of new characters for Unicode 12.0.0

Martin J. Dürst <duerst@it.aoyama.ac.jp> Mon, 18 March 2019 00:49 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/aFkoJEp7lUCL7pRaeebznaVkqM8>

There was some talk about reviewing the new characters in 
Unicode 12.0.0, for 'due diligence'.

Here are the results of such a review of the new characters in Unicode 
12.0.0, working from http://www.unicode.org/charts/PDF/Unicode-12.0/.
I also used a small Ruby program that I wrote, attached. For it to 
work, you have to make sure you use the latest (and greatest :-) 
version of Ruby, one that supports Unicode 12.0.0.
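
For readers without the attachment, here is a minimal sketch of how 
such a check could look. It is not the attached program. It applies 
only a subset of the RFC 5892 rules (LDH, Unstable, LetterDigits); the 
Exceptions, BackwardCompatible, Unassigned, JoinControl, OldHangulJamo 
and Ignorable* rules are left out, and the name rough_idna_class is made 
up for this illustration. The results for the new characters depend on 
the Unicode version of the Ruby in use.

```ruby
# Rough approximation of the IDNA2008 derived property (RFC 5892).
# Assumption: only the LDH, Unstable and LetterDigits rules are modelled;
# all other rules (Exceptions, JoinControl, ...) are deliberately omitted.

LDH = /\A[a-z0-9-]\z/
LETTER_DIGITS = /\A[\p{Ll}\p{Lu}\p{Lo}\p{Nd}\p{Lm}\p{Mn}\p{Mc}]\z/

# Unstable: the character changes under NFKC(casefold(NFKC(cp))).
def unstable?(ch)
  ch.unicode_normalize(:nfkc).downcase(:fold).unicode_normalize(:nfkc) != ch
end

# Hypothetical helper: returns :pvalid or :disallowed for one character.
def rough_idna_class(ch)
  return :pvalid     if ch.match?(LDH)
  return :disallowed if unstable?(ch)
  return :pvalid     if ch.match?(LETTER_DIGITS)
  :disallowed
end

# Sample run: two ASCII letters, a symbol, a Vietnamese letter, and the
# new GLOTTAL A pair (the last two need Unicode 12.0.0 data in Ruby).
["a", "A", "$", "\u1EA3", "\uA7BB", "\uA7BA"].each do |ch|
  printf("U+%04X %s\n", ch.ord, rough_idna_class(ch))
end
```

Upper-case letters come out as disallowed here because case folding 
changes them, which is the Unstable rule at work.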

Telugu
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-0C00.pdf
new U+0C77 TELUGU SIGN SIDDHAM -> disallowed, okay for sign used at the 
beginning of texts as an invocation

Lao
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-0E80.pdf
new Lao letters and a sign (virama) for Pali and Sanskrit
-> pvalid, which is okay for historical letters and marks

Vedic Extensions
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1CD0.pdf
new U+1CFA VEDIC SIGN DOUBLE ANUSVARA ANTARGOMUKHA -> pvalid, okay 
because this is graphically a base letter (used as a base for a 
combining nasal sign)

Miscellaneous Symbols and Arrows
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-2B00.pdf
two new symbols -> disallowed, okay

Supplemental Punctuation
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-2E00.pdf
new U+2E4F CORNISH VERSE DIVIDER -> disallowed, okay

Latin Extended-D
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-A720.pdf
4 casing pairs, plus 3 upper-case letters completing case pairs with 
existing lower-case letters. Upper-case are disallowed (-> okay), 
lower-case are pvalid. Among the 4 new lower-case letters, the 
following three may merit a closer look:
U+A7BB LATIN SMALL LETTER GLOTTAL A
U+A7BD LATIN SMALL LETTER GLOTTAL I
U+A7BF LATIN SMALL LETTER GLOTTAL U
They are graphically very similar (if not identical) to the following 
three from the Latin extensions for Vietnamese:
(see https://www.unicode.org/charts/PDF/U1E00.pdf)
U+1EA3 ả LATIN SMALL LETTER A WITH HOOK ABOVE
U+1EC9 ỉ LATIN SMALL LETTER I WITH HOOK ABOVE
U+1EE7 ủ LATIN SMALL LETTER U WITH HOOK ABOVE
I have sent a mail to Unicode experts to get more information on these.
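
As a small illustration of why this needs a human look: normalization 
does not bring the two sets together. The Vietnamese letters decompose 
canonically into a base letter plus U+0309 COMBINING HOOK ABOVE, while 
the glottal letters have no decomposition. The following sketch (the 
helper name same_after_nfkc? is made up for this example) shows that the 
pairs stay distinct after NFKC.

```ruby
# Pairs of (new glottal letter, Vietnamese hook-above letter).
LOOKALIKE_PAIRS = {
  "\uA7BB" => "\u1EA3", # GLOTTAL A vs. A WITH HOOK ABOVE
  "\uA7BD" => "\u1EC9", # GLOTTAL I vs. I WITH HOOK ABOVE
  "\uA7BF" => "\u1EE7"  # GLOTTAL U vs. U WITH HOOK ABOVE
}.freeze

# Hypothetical helper: true if both strings are identical after NFKC.
def same_after_nfkc?(a, b)
  a.unicode_normalize(:nfkc) == b.unicode_normalize(:nfkc)
end

LOOKALIKE_PAIRS.each do |glottal, hook|
  printf("U+%04X vs. U+%04X: same after NFKC? %s\n",
         glottal.ord, hook.ord, same_after_nfkc?(glottal, hook))
end
```

So any protection against confusion between these letters would have to 
come from somewhere other than normalization, e.g. registry policy.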

Latin Extended-E
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-AB30.pdf
two new letters for Sinological transcription, digraphs with hooks
-> pvalid, okay

Elymaic
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-10FE0.pdf
new script Elymaic, 23 letters, one of which is a ligature
-> pvalid, okay

Newa
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11400.pdf
U+1145F NEWA LETTER VEDIC ANUSVARA -> pvalid, okay

Takri
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11680.pdf
U+116B8 TAKRI LETTER ARCHAIC KHA -> pvalid, okay

Nandinagari
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-119A0.pdf
new script Nandinagari, all letters pvalid, one sign (U+119E3) 
disallowed, okay

Soyombo
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11A50.pdf
two alternate visarga signs in Soyombo
-> pvalid, okay [maybe these are targets for a variant set]

Tamil Supplement
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11FC0.pdf
Tamil supplement: signs for fractions, traditional units, etc.
-> disallowed, okay

Egyptian Hieroglyph Format Controls
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-13430.pdf
Egyptian Hieroglyph Format Controls for quadrats -> all disallowed, okay

Miao
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-16F00.pdf
Miao letters and marks for writing additional languages -> all pvalid, okay

Ideographic Symbols and Punctuation
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-16FE0.pdf
two additional symbols -> disallowed, okay

Tangut
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-17000.pdf
6 additional Tangut characters -> pvalid, okay

Small Kana Extension
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1B130.pdf
historic small Hiragana/Katakana extensions -> pvalid, okay
[Should be excluded from general use because they may be confused with 
the larger versions by everyday users, but this is only really a problem 
for the last character (U+1B167 KATAKANA LETTER SMALL N) because the 
others won't usually appear in names anyway.]

Nyiakeng Puachue Hmong
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E100.pdf
new script, all characters except one are pvalid
I have done some additional checks on the following two:
U+1E14E NYIAKENG PUACHUE HMONG LOGOGRAM NYAJ
   o used to represent the word for money or currency
U+1E14F NYIAKENG PUACHUE HMONG CIRCLED CA
   o used to indicate ownership
   -> U+1E108  nyiakeng puachue hmong letter ca
Currently the former is pvalid but the latter is disallowed.
It may well be that it would be more appropriate to have the former be 
disallowed (as currently the '$' sign is disallowed in LDH) but allow 
the latter, if it is e.g. appearing in composite words which may appear 
in domain names. But according to the descriptions in
http://unicode.org/L2/L2017/17002r3-n4780r3-nyiakeng-puachue-hmong.pdf,
we are fine as is.
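
The different outcome for these two follows from their general 
categories, since only letters, marks and decimal digits can become 
pvalid under the LetterDigits rule of RFC 5892. A minimal sketch for 
looking up the general category of a character in Ruby follows; the 
helper general_category is made up for this example, and a character 
that is unassigned in the Ruby version in use is reported as Cn.

```ruby
# All two-letter General_Category values, tried in order.
GENERAL_CATEGORIES = %w[
  Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Ps Pe Pi Pf Po
  Sm Sc Sk So Zs Zl Zp Cc Cf Cs Co Cn
].freeze

# One anchored pattern per category, built once.
GC_PATTERNS = GENERAL_CATEGORIES.map do |gc|
  [gc, Regexp.new("\\A\\p{#{gc}}\\z")]
end.to_h.freeze

# Hypothetical helper: returns the category abbreviation for a single
# character, according to the Unicode data built into this Ruby.
def general_category(ch)
  pair = GC_PATTERNS.find { |_gc, pattern| ch.match?(pattern) }
  pair && pair.first
end

# The two characters discussed above.
["\u{1E14E}", "\u{1E14F}"].each do |ch|
  printf("U+%04X %s\n", ch.ord, general_category(ch))
end
```

With Unicode 12.0.0 data, this should report a letter category for the 
logogram and a symbol category for the circled letter.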

Wancho
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E2C0.pdf
new script, all letters/digits/tone marks pvalid, one currency sign 
disallowed, okay

Adlam
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E900.pdf
U+1E94B ADLAM NASALIZATION MARK -> pvalid, okay
[Looks somewhat like an apostrophe, but the same can be said of a 
Hebrew Yod and others, too.]

Ottoman Siyaq Numbers
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1ED00.pdf
These are numbers, but they are not positional, and there's no zero
-> disallowed, okay
[If these were in present-day use, we might want to check whether these 
might appear in e.g. company names and such, but these are only used 
historically, so we're okay.]

Enclosed Alphanumeric Supplement
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F100.pdf
U+1F16C RAISED MR SIGN, has an NFKC (compatibility) decomposition 
-> disallowed, okay

Transport and Map Symbols
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F680.pdf
two additional transport-related emoji -> disallowed, okay

Geometric Shapes Extended
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F780.pdf
colored circles and squares -> disallowed, okay

Supplemental Symbols and Pictographs
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F900.pdf
additional emoji of various kinds -> disallowed, okay

Chess Symbols
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1FA00.pdf
turned and other variants of chess symbols for variants of chess
-> disallowed, okay

Symbols and Pictographs Extended-A
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1FA70.pdf
additional emoji -> disallowed, okay


Regards,    Martin.