Re: [I18ndir] Review of new characters for Unicode 12.0.0

Asmus Freytag <asmusf@ix.netcom.com> Mon, 18 March 2019 04:12 UTC

To: i18ndir@ietf.org
References: <e0174987-056d-d74e-c3fa-5b457a72f8c3@it.aoyama.ac.jp>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <38438ac4-7e78-785c-2dbf-af47a9450aa0@ix.netcom.com>
Date: Sun, 17 Mar 2019 21:12:50 -0700
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.3
MIME-Version: 1.0
In-Reply-To: <e0174987-056d-d74e-c3fa-5b457a72f8c3@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/5VWgc5qtE-1LcTWeO4qaOVRhb_w>
Subject: Re: [I18ndir] Review of new characters for Unicode 12.0.0

On 3/17/2019 5:49 PM, Martin J. Dürst wrote:
> There were some talks about doing a review of the new characters for
> Unicode 12.0.0, for 'due diligence'.
>
> Here are the results of such a review of new characters for Unicode 12.0.0,
> starting off http://www.unicode.org/charts/PDF/Unicode-12.0/.
> I also used a small Ruby program that I wrote, attached. In order for it
> to work, you have to make sure you use the latest (and greatest :-)
> version of Ruby that supports Unicode 12.0.0.
>
...


> Latin Extended-D
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-A720.pdf
> 4 casing pairs, 3 upper-case letters completing case pairs with existing
> lower-case letters. Upper-case are disallowed (-> okay), lower-case are
> pvalid. Among these 4, the following three may merit a closer look:
> U+A7BB LATIN SMALL LETTER GLOTTAL A
> U+A7BD LATIN SMALL LETTER GLOTTAL I
> U+A7BF LATIN SMALL LETTER GLOTTAL U
> They are graphically very similar (if not identical) to the following
> three from the Latin extensions for Vietnamese:
> (see https://www.unicode.org/charts/PDF/U1E00.pdf)
> U+1EA3 ả LATIN SMALL LETTER A WITH HOOK ABOVE
> U+1EC9 ỉ LATIN SMALL LETTER I WITH HOOK ABOVE
> U+1EE7 ủ LATIN SMALL LETTER U WITH HOOK ABOVE
> I have sent a mail to Unicode experts to get more information on these.


For A7BD (and its uppercase, which is not relevant for IDNs) the
documents say:

"The code positions could be U+A7BA and U+A7BB (next available code 
positions), properties as follow:
A7BA;LATIN CAPITAL LETTER EGYPTOLOGICAL YOD;Lu;0;L;;;;;N;;;;A7BB;
A7BB;LATIN SMALL LETTER EGYPTOLOGICAL YOD;Ll;0;L;;;;;N;;;A7BA;;A7BA
There are not confusable with any existing code points, but the new 
characters could be confused with the sequence of the Latin letter I, 
Latin letter IOTA, Greek letter IOTA, followed by a diacritical mark 
resembling the top of the glyph. Given the existing confused situation, 
this does not exacerbate the risk in a meaningful way. There is a 
limited need to use a YOD in general purpose identifiers; its main 
purpose is to represent transliteration of historic Egyptian text 
originally written using Egyptian hieroglyphs." (WG2 N4792R2)

On this basis, they could be exceptionally DISALLOWED for IDNA.
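
For anyone who wants to read the two quoted records field by field: they
follow the semicolon-separated, 15-field layout of UnicodeData.txt. Below is
a minimal Python sketch of how one might pull out the general category and
the simple case mappings. The names `Record` and `parse_record` are made up
for illustration; they are not taken from the WG2 document or from any
existing tool.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """Illustrative subset of one UnicodeData.txt-style record."""
    code_point: int
    name: str
    category: str
    upper: Optional[int]  # simple uppercase mapping (field 12)
    lower: Optional[int]  # simple lowercase mapping (field 13)


def parse_record(line: str) -> Record:
    """Split one semicolon-separated record into the fields used here."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")

    def cp(text: str) -> Optional[int]:
        # Empty mapping fields mean "no mapping".
        return int(text, 16) if text else None

    return Record(
        code_point=int(fields[0], 16),
        name=fields[1],
        category=fields[2],
        upper=cp(fields[12]),
        lower=cp(fields[13]),
    )


# The two records exactly as quoted from the proposal (proposed code
# positions, not necessarily the ones finally assigned).
PROPOSED = [
    "A7BA;LATIN CAPITAL LETTER EGYPTOLOGICAL YOD;Lu;0;L;;;;;N;;;;A7BB;",
    "A7BB;LATIN SMALL LETTER EGYPTOLOGICAL YOD;Ll;0;L;;;;;N;;;A7BA;;A7BA",
]

capital, small = (parse_record(line) for line in PROPOSED)

for rec in (capital, small):
    print(f"U+{rec.code_point:04X} {rec.category} {rec.name}")
```

Read this way, the capital is Lu with a lowercase mapping to the small
letter, and the small letter is Ll with an uppercase mapping back. That
matches the observation earlier in the thread that the upper-case letters
come out disallowed and the lower-case ones pvalid unless an exception is
made.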

The other two characters are for transcriptions of Ugaritic, which is
also not something desperately needed, so an exceptional DISALLOWED
could be defended for them as well.

See WG2 document N3487.
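
On the similarity to the Vietnamese hook-above letters raised at the top of
this thread: normalization will not help here. The Vietnamese letters have
canonical decompositions (base letter plus U+0309 COMBINING HOOK ABOVE),
while the three glottal letters are atomic, so the two sets stay distinct
under NFC/NFD even if they look alike. A small Python sketch to illustrate;
the helper names `nfd_code_points` and `decomposes` are made up for this
example.

```python
import unicodedata

# Vietnamese letters with hook above, as listed earlier in the thread.
HOOK_ABOVE = ["\u1ea3", "\u1ec9", "\u1ee7"]
# The three new small glottal letters from Latin Extended-D.
GLOTTAL = ["\ua7bb", "\ua7bd", "\ua7bf"]


def nfd_code_points(ch: str) -> list:
    """Return the NFD form of ch as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


def decomposes(ch: str) -> bool:
    """True if NFD turns ch into something other than itself."""
    return unicodedata.normalize("NFD", ch) != ch


for ch in HOOK_ABOVE + GLOTTAL:
    print(f"U+{ord(ch):04X} -> {' '.join(nfd_code_points(ch))}")
```

Since the glottal letters never decompose, any handling of the visual
similarity would have to come from a confusables check or from an explicit
exception, not from normalization.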

A./