Re: [I18ndir] Review of new characters for Unicode 12.0.0
Martin J. Dürst <duerst@it.aoyama.ac.jp> Mon, 18 March 2019 04:58 UTC
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
To: Asmus Freytag <asmusf@ix.netcom.com>, "i18ndir@ietf.org" <i18ndir@ietf.org>
Date: Mon, 18 Mar 2019 04:58:35 +0000
Message-ID: <0f3e7559-7797-44e2-a281-02ded405e8d0@it.aoyama.ac.jp>
References: <e0174987-056d-d74e-c3fa-5b457a72f8c3@it.aoyama.ac.jp> <38438ac4-7e78-785c-2dbf-af47a9450aa0@ix.netcom.com>
In-Reply-To: <38438ac4-7e78-785c-2dbf-af47a9450aa0@ix.netcom.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/YGd7ZVp78tH08WWfhoWdyEBTQ-s>
Subject: Re: [I18ndir] Review of new characters for Unicode 12.0.0

Hello Asmus,

Many thanks for the followup, also on the Unicode side.

On 2019/03/18 13:12, Asmus Freytag wrote:
> On 3/17/2019 5:49 PM, Martin J. Dürst wrote:
>> There were some talks about doing a review of the new characters for
>> Unicode 12.0.0, for 'due diligence'.
>>
>> Here are the results such a review of new characters for Unicode 12.0.0,
>> starting off http://www.unicode.org/charts/PDF/Unicode-12.0/.
>> I also used a small Ruby program that I wrote, attached. In order for it
>> to work, you have to make sure you use the latest (and greatest :-)
>> version of Ruby that supports Unicode 12.0.0.
>>
> ....
>
>
>> Latin Extended-D
>> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-A720.pdf
>> 4 casing pairs, 3 upper-case letters completing case pairs with existing
>> lower-case letters. Upper-case are disallowed (-> okay), lower-case are
>> pvalid. Among these 4, the following three may merit a closer look:
>> U+A7BB LATIN SMALL LETTER GLOTTAL A
>> U+A7BD LATIN SMALL LETTER GLOTTAL I
>> U+A7BF LATIN SMALL LETTER GLOTTAL U
>> They are graphically very similar (if not identical) to the following
>> three from the Latin extensions for Vietnamese:
>> (see https://www.unicode.org/charts/PDF/U1E00.pdf)
>> U+1EA3 ả LATIN SMALL LETTER A WITH HOOK ABOVE
>> U+1EC9 ỉ LATIN SMALL LETTER I WITH HOOK ABOVE
>> U+1EE7 ủ LATIN SMALL LETTER U WITH HOOK ABOVE
>> I have sent a mail to Unicode experts to get more information on these.
>
>
> For A7BD (and it's uppercase, which is not relevant for IDNs) the
> documents say:
>
> "The code positions could be U+A7BA and U+A7BB (next available code
> positions), properties as follow:
> A7BA;LATIN CAPITAL LETTER EGYPTOLOGICAL YOD;Lu;0;L;;;;;N;;;;A7BB;
> A7BB;LATIN SMALL LETTER EGYPTOLOGICAL YOD;Ll;0;L;;;;;N;;;A7BA;;A7BA
> There are not confusable with any existing code points,

After close examination, I indeed realized that the above Vietnamese
characters have hooks (half-circles with a small downwards stroke
attached at the lower end) whereas the 'glottal' characters have
something more like U+2019 ’ RIGHT SINGLE QUOTATION MARK. So at least on
close inspection, they are distinguishable. Making sure that diacritic
marks are easily distinguishable is a requirement of any font that's
claiming to be suited for IDNA anyway, so this isn't new territory.
[I remember the famous font designer Chuck Bigelow talking about this in
a keynote at a Unicode conference many many years ago, see also
http://cajun.cs.nott.ac.uk/compsci/epo/papers/volume6/issue3/bigelow.pdf,
section entitled "The design of diacritics".]

> but the new
> characters could be confused with the sequence of the Latin letter I,
> Latin letter IOTA, Greek letter IOTA, followed by a diacritical mark
> resembling the top of the glyph. Given the existing confused situation,
> this does not exacerbate the risk in a meaningful way. There is a
> limited need to use a YOD in general purpose identifiers; its main
> purpose is to represent transliteration of historic Egyptian text
> originally written using Egyptian hieroglyphs." (WG2 N4792R2)

For others, this is at
https://unicode.org/wg2/docs/n4792r2-egyptologicalYod.pdf.

> On this basis, they could be exceptionally DISALLOWED for IDNA.
>
> The other two characters are for transcriptions of Ugaritic - also
> something not desperately needed and exceptional "disallowed" could be
> defended.

Yes.
The risk is then that some as-yet unalphabetized language community
picks up some of these letters, and later finds out they cannot use them
in IDNs. Anyway, I'd be open to any kind of conclusion if it can be made
quickly (weeks rather than years). Given they are indeed distinct from
the 'Vietnamese' characters, I'd be okay with just going with the result
of the algorithm (i.e. PVALID), but they might be candidates for your
'problematic characters' document.

> See WG2 document N3487.

That's
http://www.unicode.org/L2/L2008/08272-n3487-egyptological-yod.pdf.

Regards,   Martin.
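
The visual comparison above can be complemented by a mechanical check,
sketched here in Ruby. This is not the program attached to the earlier
mail; the names PAIRS, nfd_codepoints and canonically_equivalent? are
made up for illustration. The sketch only shows that the 'glottal'
letters and the Vietnamese letters are not canonically equivalent: the
Vietnamese letters decompose to a base letter plus U+0309 COMBINING HOOK
ABOVE, whereas the 'glottal' letters have no decomposition. It says
nothing about how similar the glyphs look in a given font.

```ruby
# Illustrative sketch only; helper names are hypothetical.
# Maps each 'glottal' letter to its Vietnamese look-alike.
PAIRS = {
  "\u{A7BB}" => "\u{1EA3}", # GLOTTAL A vs. A WITH HOOK ABOVE
  "\u{A7BD}" => "\u{1EC9}", # GLOTTAL I vs. I WITH HOOK ABOVE
  "\u{A7BF}" => "\u{1EE7}"  # GLOTTAL U vs. U WITH HOOK ABOVE
}.freeze

# Code points of the canonical decomposition (NFD), as "U+XXXX" strings.
def nfd_codepoints(char)
  char.unicode_normalize(:nfd).codepoints.map { |cp| format('U+%04X', cp) }
end

# True if the two strings have the same canonical decomposition.
def canonically_equivalent?(a, b)
  a.unicode_normalize(:nfd) == b.unicode_normalize(:nfd)
end

PAIRS.each do |glottal, vietnamese|
  puts "#{nfd_codepoints(glottal).join(' ')} vs. " \
       "#{nfd_codepoints(vietnamese).join(' ')}: " \
       "equivalent? #{canonically_equivalent?(glottal, vietnamese)}"
end
```

Because normalization keeps the two sets apart, any confusability here
is purely a matter of glyph design, which is why it comes down to font
quality and, possibly, the 'problematic characters' document.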