[I18ndir] Review of new characters for Unicode 12.0.0
Martin J. Dürst <duerst@it.aoyama.ac.jp> Mon, 18 March 2019 00:49 UTC
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
To: "i18ndir@ietf.org" <i18ndir@ietf.org>
Date: Mon, 18 Mar 2019 00:49:43 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/aFkoJEp7lUCL7pRaeebznaVkqM8>
There was some talk about doing a review of the new characters for Unicode 12.0.0, for 'due diligence'. Here are the results of such a review, starting from http://www.unicode.org/charts/PDF/Unicode-12.0/.

I also used a small Ruby program that I wrote, attached. For it to work, you have to make sure you use the latest (and greatest :-) version of Ruby, one that supports Unicode 12.0.0.

Telugu
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-0C00.pdf
new U+0C77 TELUGU SIGN SIDDHAM
-> disallowed, okay for a sign used at the beginning of texts as an invocation

Lao
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-0E80.pdf
new Lao letters and a sign (virama) for Pali and Sanskrit
-> pvalid, which is okay for historical letters and marks

Vedic Extensions
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1CD0.pdf
new U+1CFA VEDIC SIGN DOUBLE ANUSVARA ANTARGOMUKHA
-> pvalid, okay because this is graphically a base letter (used as a base for a combining nasal sign)

Miscellaneous Symbols and Arrows
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-2B00.pdf
two new symbols
-> disallowed, okay

Supplemental Punctuation
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-2E00.pdf
new U+2E4F CORNISH VERSE DIVIDER
-> disallowed, okay

Latin Extended-D
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-A720.pdf
4 casing pairs, plus 3 upper-case letters completing case pairs with existing lower-case letters. Upper-case letters are disallowed (-> okay), lower-case letters are pvalid.
Among these 4 casing pairs, the following three lower-case letters may merit a closer look:
U+A7BB LATIN SMALL LETTER GLOTTAL A
U+A7BD LATIN SMALL LETTER GLOTTAL I
U+A7BF LATIN SMALL LETTER GLOTTAL U
They are graphically very similar (if not identical) to the following three from the Latin extensions for Vietnamese (see https://www.unicode.org/charts/PDF/U1E00.pdf):
U+1EA3 ả LATIN SMALL LETTER A WITH HOOK ABOVE
U+1EC9 ỉ LATIN SMALL LETTER I WITH HOOK ABOVE
U+1EE7 ủ LATIN SMALL LETTER U WITH HOOK ABOVE
I have sent a mail to Unicode experts to get more information on these.

Latin Extended-E
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-AB30.pdf
two new letters for Sinological transcription, digraphs with hooks
-> pvalid, okay

Elymaic
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-10FE0.pdf
new script Elymaic, 23 letters, one of which is a ligature
-> pvalid, okay

Newa
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11400.pdf
U+1145F NEWA LETTER VEDIC ANUSVARA
-> pvalid, okay

Takri
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11680.pdf
U+116B8 TAKRI LETTER ARCHAIC KHA
-> pvalid, okay

Nandinagari
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-119A0.pdf
new script Nandinagari
-> all letters pvalid, one sign (U+119E3) disallowed, okay

Soyombo
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11A50.pdf
two alternate visarga signs in Soyombo
-> pvalid, okay
[maybe these are targets for a variant set]

Tamil Supplement
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11FC0.pdf
signs for fractions, traditional units, etc.
-> disallowed, okay

Egyptian Hieroglyph Format Controls
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-13430.pdf
Egyptian Hieroglyph Format Controls for quadrats
-> all disallowed, okay

Miao
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-16F00.pdf
Miao letters and marks for writing additional languages
-> all pvalid, okay

Ideographic Symbols and Punctuation
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-16FE0.pdf
two additional symbols
-> disallowed, okay

Tangut
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-17000.pdf
6 additional Tangut characters
-> pvalid, okay

Small Kana Extension
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1B130.pdf
historic small Hiragana/Katakana extensions
-> pvalid, okay
[Should be excluded from general use because they may be confused with the larger versions by everyday users, but this is only really a problem for the last character (U+1B167 KATAKANA LETTER SMALL N) because the others won't usually appear in names anyway.]

Nyiakeng Puachue Hmong
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E100.pdf
new script, all characters except one are pvalid
I have done some additional checks on the following two:
U+1E14E NYIAKENG PUACHUE HMONG LOGOGRAM NYAJ
  o used to represent the word for money or currency
U+1E14F NYIAKENG PUACHUE HMONG CIRCLED CA
  o used to indicate ownership
  -> U+1E108 nyiakeng puachue hmong letter ca
Currently the former is pvalid but the latter is disallowed. It may well be that it would be more appropriate to have the former be disallowed (as currently the '$' sign is disallowed in LDH) but allow the latter, if it appears, for example, in composite words that may be used in domain names. But according to the descriptions in http://unicode.org/L2/L2017/17002r3-n4780r3-nyiakeng-puachue-hmong.pdf, we are fine as is.
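
The pvalid/disallowed values quoted in this review come from the IDNA2008 derived property (RFC 5892). The following is a minimal sketch of how such a value could be approximated in Ruby. It is not the attached program: the method name `rough_idna_property` is hypothetical, and the Exceptions, BackwardCompatible, JoinControl, IgnorableProperties, IgnorableBlocks and OldHangulJamo rules are deliberately left out, so results can differ from the official tables for characters covered by those rules.

```ruby
# Rough approximation of the RFC 5892 derived property for a single character.
# Assumptions: the method name is hypothetical, and the Exceptions,
# BackwardCompatible, JoinControl, IgnorableProperties, IgnorableBlocks and
# OldHangulJamo rules are left out for brevity.
def rough_idna_property(char)
  # Code points that this Ruby's Unicode tables treat as unassigned.
  return :unassigned if char.match?(/\A\p{Cn}\z/)
  # LDH: ASCII lower-case letters, digits and hyphen.
  return :pvalid if char.match?(/\A[a-z0-9-]\z/)
  # Unstable: changed by NFKC(casefold(NFKC(cp))), e.g. upper-case letters.
  folded = char.unicode_normalize(:nfkc).downcase(:fold).unicode_normalize(:nfkc)
  return :disallowed unless folded == char
  # LetterDigits: general categories Ll, Lu, Lo, Nd, Lm, Mn, Mc.
  return :pvalid if char.match?(/\A[\p{Ll}\p{Lu}\p{Lo}\p{Nd}\p{Lm}\p{Mn}\p{Mc}]\z/)
  :disallowed
end

# A few of the code points discussed in this review.
[0x0C77, 0xA7BA, 0xA7BB, 0x1E14E, 0x1E14F].each do |cp|
  char = [cp].pack('U')
  puts format('U+%04X %s', cp, rough_idna_property(char))
end
```

With a Ruby whose Unicode tables predate 12.0.0, the new code points would be reported as unassigned by this sketch, which is why the Ruby version matters.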
Wancho
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E2C0.pdf
new script
-> all letters/digits/tone marks pvalid, one currency sign disallowed, okay

Adlam
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E900.pdf
U+1E94B ADLAM NASALIZATION MARK
-> pvalid, okay
[Looks somewhat like an apostrophe, but that can be said of a Hebrew Yod, ... too.]

Ottoman Siyaq Numbers
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1ED00.pdf
These are numbers, but they are not positional, and there's no zero
-> disallowed, okay
[If these were in present-day use, we might want to check whether they might appear in e.g. company names and such, but they are only used historically, so we're okay.]

Enclosed Alphanumeric Supplement
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F100.pdf
U+1F16C RAISED MR SIGN, has an NFKC decomposition
-> disallowed, okay

Transport and Map Symbols
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F680.pdf
two additional transport-related emoji
-> disallowed, okay

Geometric Shapes Extended
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F780.pdf
colored circles and squares
-> disallowed, okay

Supplemental Symbols and Pictographs
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F900.pdf
additional emoji of various kinds
-> disallowed, okay

Chess Symbols
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1FA00.pdf
turned and other variants of chess symbols for variants of chess
-> disallowed, okay

Symbols and Pictographs Extended-A
http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1FA70.pdf
additional emoji
-> disallowed, okay

Regards, Martin.