Re: [I18ndir] Review of new characters for Unicode 12.0.0

"Patrik Fältström " <paf@frobbit.se> Mon, 18 March 2019 10:42 UTC

From: Patrik Fältström <paf@frobbit.se>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Cc: i18ndir@ietf.org
Date: Mon, 18 Mar 2019 11:41:50 +0100
Message-ID: <A80E6FF6-4B14-42E9-B834-687393710685@frobbit.se>
In-Reply-To: <e0174987-056d-d74e-c3fa-5b457a72f8c3@it.aoyama.ac.jp>
References: <e0174987-056d-d74e-c3fa-5b457a72f8c3@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/k2BCedlC_v8gHcFfzJQwJHKtK9w>
Subject: Re: [I18ndir] Review of new characters for Unicode 12.0.0

Note that a beta version of Unicode 12.1.0 already exists.

   Patrik

On 18 Mar 2019, at 1:49, Martin J. Dürst wrote:

> There has been some talk about reviewing the new characters in Unicode 12.0.0 for 'due diligence'.
>
> Here are the results of such a review of the new characters in Unicode 12.0.0, starting from http://www.unicode.org/charts/PDF/Unicode-12.0/.
> I also used a small Ruby program that I wrote (attached). For it to work, you need the latest (and greatest :-)
> version of Ruby, which supports Unicode 12.0.0.
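A minimal sketch of the kind of check described above, in Ruby. It is not the attached program: the method names (`idna_status`, `new_code_points`) are hypothetical, and only part of the RFC 5892 derivation is implemented (Unassigned, LDH, Unstable, LetterDigits). The Exceptions, BackwardCompatible, JoinControl, IgnorableProperties, IgnorableBlocks and OldHangulJamo rules are left out. Like the original program, it needs a Ruby whose bundled Unicode data is 12.0 or later.

```ruby
# Hypothetical sketch, NOT the program attached to the original mail.
# Simplified RFC 5892 derivation: Exceptions, BackwardCompatible, JoinControl,
# IgnorableProperties, IgnorableBlocks and OldHangulJamo are omitted.
# Results depend on the Unicode version bundled with the running Ruby.

# RFC 5892 rule A (LetterDigits): General_Category in {Ll, Lu, Lo, Nd, Lm, Mn, Mc}
LETTER_DIGITS = /\A[\p{Ll}\p{Lu}\p{Lo}\p{Nd}\p{Lm}\p{Mn}\p{Mc}]\z/

# Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
def noncharacter?(cp)
  (0xFDD0..0xFDEF).cover?(cp) || (cp & 0xFFFE) == 0xFFFE
end

# RFC 5892 rule H (LDH): hyphen, ASCII digits, ASCII lower-case letters
def ldh?(cp)
  cp == 0x2D || (0x30..0x39).cover?(cp) || (0x61..0x7A).cover?(cp)
end

# RFC 5892 rule B (Unstable): toNFKC(toCaseFold(toNFKC(cp))) != cp
def unstable?(s)
  s.unicode_normalize(:nfkc).downcase(:fold).unicode_normalize(:nfkc) != s
end

def idna_status(cp)
  s = [cp].pack('U')
  return 'UNASSIGNED' if s.match?(/\p{Cn}/) && !noncharacter?(cp)
  return 'PVALID' if ldh?(cp)
  return 'DISALLOWED' if unstable?(s)
  s.match?(LETTER_DIGITS) ? 'PVALID' : 'DISALLOWED'
end

# Code points first assigned in `version`. Matching Age=version but not
# Age=previous gives the right set whether Age is treated as exact or cumulative.
def new_code_points(version, previous)
  now  = Regexp.new("\\p{Age=#{version}}")
  prev = Regexp.new("\\p{Age=#{previous}}")
  (0..0x10FFFF).each_with_object([]) do |cp, found|
    next if (0xD800..0xDFFF).cover?(cp) # surrogates cannot be encoded in UTF-8
    s = [cp].pack('U')
    found << cp if s.match?(now) && !s.match?(prev)
  end
end

NEW_IN_12 = new_code_points('12.0', '11.0')
NEW_IN_12.each { |cp| printf("U+%04X %s\n", cp, idna_status(cp)) }
```

Because the omitted rules are skipped, a few code points come out wrong (for example U+00DF, which RFC 5892 makes PVALID through its Exceptions table, would be reported as DISALLOWED here), so this is only good for eyeballing newly added characters.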
>
> Telugu
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-0C00.pdf new U+0C77 TELUGU SIGN SIDDHAM -> disallowed, okay for a sign used at the beginning of texts as an invocation
>
> Lao
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-0E80.pdf new Lao letters and a sign (virama) for Pali and Sanskrit
> -> pvalid, which is okay for historical letters and marks
>
> Vedic Extensions
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1CD0.pdf new U+1CFA VEDIC SIGN DOUBLE ANUSVARA ANTARGOMUKHA -> pvalid, okay because this is graphically a base letter (used as a base for a combining nasal sign)
>
> Miscellaneous Symbols and Arrows
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-2B00.pdf two new symbols -> disallowed, okay
>
> Supplemental Punctuation
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-2E00.pdf new U+2E4F CORNISH VERSE DIVIDER -> disallowed, okay
>
> Latin Extended-D
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-A720.pdf 4 new case pairs, plus 3 upper-case letters that complete case pairs with existing lower-case letters. The upper-case letters are disallowed (-> okay), the lower-case ones are pvalid. Among the 4 new pairs, the following three may merit a closer look:
> U+A7BB LATIN SMALL LETTER GLOTTAL A
> U+A7BD LATIN SMALL LETTER GLOTTAL I
> U+A7BF LATIN SMALL LETTER GLOTTAL U
> They are graphically very similar (if not identical) to the following three from the Latin extensions for Vietnamese:
> (see https://www.unicode.org/charts/PDF/U1E00.pdf)
> U+1EA3 ả LATIN SMALL LETTER A WITH HOOK ABOVE
> U+1EC9 ỉ LATIN SMALL LETTER I WITH HOOK ABOVE
> U+1EE7 ủ LATIN SMALL LETTER U WITH HOOK ABOVE
> I have e-mailed some Unicode experts to get more information on these.
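The upper-case/lower-case split follows from the Unstable rule of RFC 5892 (toNFKC(toCaseFold(toNFKC(cp))) != cp). A Ruby sketch, assuming Unicode 12.0 or later data: case folding maps each new capital onto its small letter, so only the small letters are stable; and NFKC does not map the glottal letters onto the Vietnamese hook-above letters, so any confusability would have to be handled outside IDNA2008 itself (for example by registry policy).

```ruby
# Sketch; assumes a Ruby built with Unicode 12.0 or later data.

# Negation of RFC 5892 rule B (Unstable): toNFKC(toCaseFold(toNFKC(cp))) == cp
def stable?(s)
  s.unicode_normalize(:nfkc).downcase(:fold).unicode_normalize(:nfkc) == s
end

# New capital => new small letter (glottal A, I, U)
GLOTTAL = { "\u{A7BA}" => "\u{A7BB}", "\u{A7BC}" => "\u{A7BD}", "\u{A7BE}" => "\u{A7BF}" }.freeze
# Vietnamese look-alikes for the small letters, in the same order
HOOK_ABOVE = ["\u{1EA3}", "\u{1EC9}", "\u{1EE7}"].freeze

GLOTTAL.each do |upper, lower|
  printf("U+%04X stable=%-5s (folds to U+%04X)   U+%04X stable=%s\n",
         upper.ord, stable?(upper), upper.downcase(:fold).ord, lower.ord, stable?(lower))
end

# NFKC keeps each glottal letter distinct from its look-alike.
GLOTTAL.values.zip(HOOK_ABOVE).each do |glottal, hook|
  same = glottal.unicode_normalize(:nfkc) == hook.unicode_normalize(:nfkc)
  printf("U+%04X vs U+%04X: equal after NFKC? %s\n", glottal.ord, hook.ord, same)
end
```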
>
> Latin Extended-E
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-AB30.pdf two new letters for Sinological transcription, digraphs with hooks -> pvalid, okay
>
> Elymaic
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-10FE0.pdf new script Elymaic, 23 letters (one of them a ligature)
> -> pvalid, okay
>
> Newa
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11400.pdf U+1145F NEWA LETTER VEDIC ANUSVARA -> pvalid, okay
>
> Takri
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11680.pdf U+116B8 TAKRI LETTER ARCHAIC KHA -> pvalid, okay
>
> Nandinagari
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-119A0.pdf new script Nandinagari, all letters pvalid, one sign (U+119E3)
> disallowed, okay
>
> Soyombo
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11A50.pdf two alternate visarga signs in Soyombo
> -> pvalid, okay [these may be candidates for a variant set]
>
> Tamil Supplement
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11FC0.pdf Tamil supplement: signs for fractions, traditional units, etc.
> -> disallowed, okay
>
> Egyptian Hieroglyph Format Controls
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-13430.pdf Egyptian Hieroglyph Format Controls for quadrats -> all disallowed, okay
>
> Miao
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-16F00.pdf Miao letters and marks for writing additional languages -> all pvalid, okay
>
> Ideographic Symbols and Punctuation
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-16FE0.pdf two additional symbols -> disallowed, okay
>
> Tangut
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-17000.pdf 6 additional Tangut characters -> pvalid, okay
>
> Small Kana Extension
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1B130.pdf historic small Hiragana/Katakana extensions -> pvalid, okay
> [These should be excluded from general use because everyday users may confuse them with the full-size versions, but this is only a real problem for the last character (U+1B167 KATAKANA LETTER SMALL N), because the others won't usually appear in names anyway.]
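To illustrate the bracketed point: NFKC does not map U+1B167 onto the full-size U+30F3 KATAKANA LETTER N, so the protocol itself keeps the two distinct, and excluding the small form from general use would be a registry-level decision. A Ruby sketch, assuming Unicode 12.0 or later data:

```ruby
# Sketch; assumes a Ruby built with Unicode 12.0 or later data.
SMALL_N = "\u{1B167}" # KATAKANA LETTER SMALL N (new in 12.0)
FULL_N  = "\u{30F3}"  # KATAKANA LETTER N

# Negation of RFC 5892 rule B (Unstable)
def stable?(s)
  s.unicode_normalize(:nfkc).downcase(:fold).unicode_normalize(:nfkc) == s
end

puts "small N is a letter (Lo):    #{SMALL_N.match?(/\A\p{Lo}\z/)}"
puts "small N is NFKC/fold-stable: #{stable?(SMALL_N)}"
puts "NFKC(small N) == full N:     #{SMALL_N.unicode_normalize(:nfkc) == FULL_N}"
```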
>
> Nyiakeng Puachue Hmong
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E100.pdf new script, all characters except one are pvalid
> I have done some additional checks on the following two:
> U+1E14E NYIAKENG PUACHUE HMONG LOGOGRAM NYAJ
>    o used to represent the word for money or currency
> U+1E14F NYIAKENG PUACHUE HMONG CIRCLED CA
>    o used to indicate ownership
>    -> cf. U+1E108 NYIAKENG PUACHUE HMONG LETTER CA
> Currently the former is pvalid, but the latter is disallowed.
> It might be more appropriate to disallow the former (just as the '$' sign is disallowed in LDH) and allow the latter, e.g. if it appears in compound words that may occur in domain names. But according to the descriptions in
> http://unicode.org/L2/L2017/17002r3-n4780r3-nyiakeng-puachue-hmong.pdf, we are fine as is.
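The pvalid/disallowed split here comes from General_Category: the LetterDigits rule of RFC 5892 admits only Ll, Lu, Lo, Nd, Lm, Mn and Mc. A Ruby sketch, assuming Unicode 12.0 or later data; the `category` helper is a hypothetical convenience that only checks a handful of categories, and the verdict uses LetterDigits alone, ignoring the other RFC 5892 rules.

```ruby
# Sketch; assumes a Ruby built with Unicode 12.0 or later data.

# RFC 5892 rule A (LetterDigits)
LETTER_DIGITS = /\A[\p{Ll}\p{Lu}\p{Lo}\p{Nd}\p{Lm}\p{Mn}\p{Mc}]\z/

# Hypothetical helper: first matching General_Category from a short list.
def category(ch)
  %w[Lo Lm Ll Lu Nd Mn Mc So Sc Po No Cf].find { |gc| ch.match?(/\A\p{#{gc}}\z/) } || 'other'
end

CANDIDATES = {
  "\u{1E14E}" => 'NYIAKENG PUACHUE HMONG LOGOGRAM NYAJ',
  "\u{1E14F}" => 'NYIAKENG PUACHUE HMONG CIRCLED CA',
  "\u{1E108}" => 'NYIAKENG PUACHUE HMONG LETTER CA',
  '$'         => 'DOLLAR SIGN'
}.freeze

CANDIDATES.each do |ch, name|
  # LetterDigits-only verdict; other RFC 5892 rules are ignored here.
  verdict = ch.match?(LETTER_DIGITS) ? 'PVALID' : 'DISALLOWED'
  printf("U+%04X %-37s gc=%-5s %s\n", ch.ord, name, category(ch), verdict)
end
```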
>
> Wancho
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E2C0.pdf new script, all letters/digits/tone marks pvalid, one currency sign disallowed, okay
>
> Adlam
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E900.pdf U+1E94B ADLAM NASALIZATION MARK -> pvalid, okay
> [It looks somewhat like an apostrophe, but the same can be said of Hebrew Yod,... too.]
>
> Ottoman Siyaq Numbers
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1ED00.pdf These are numbers, but they are not positional, and there's no zero -> disallowed, okay
> [If these were in present-day use, we might want to check whether these might appear in e.g. company names and such, but these are only used historically, so we're okay.]
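The reason these come out as disallowed: LetterDigits includes decimal digits (Nd) but not other numbers (No). A Ruby sketch comparing U+1ED01 OTTOMAN SIYAQ NUMBER ONE with U+0661 ARABIC-INDIC DIGIT ONE, assuming Unicode 12.0 or later data:

```ruby
# Sketch; assumes a Ruby built with Unicode 12.0 or later data.
# RFC 5892 LetterDigits admits Nd (decimal digits) but not No (other numbers).
SAMPLES = {
  "\u{1ED01}" => 'OTTOMAN SIYAQ NUMBER ONE',
  "\u{0661}"  => 'ARABIC-INDIC DIGIT ONE'
}.freeze

SAMPLES.each do |ch, name|
  printf("U+%04X %-26s Nd=%-5s No=%s\n", ch.ord, name,
         ch.match?(/\A\p{Nd}\z/), ch.match?(/\A\p{No}\z/))
end
```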
>
> Enclosed Alphanumeric Supplement
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F100.pdf U+1F16C RAISED MR SIGN, has a compatibility decomposition (so NFKC changes it) -> disallowed, okay
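The Unstable rule of RFC 5892, traced step by step for this character in Ruby (assuming Unicode 12.0 or later data): NFKC replaces the sign with the plain letters "MR", so the result differs from the original code point.

```ruby
# Sketch; assumes a Ruby built with Unicode 12.0 or later data.
sign   = "\u{1F16C}"                      # RAISED MR SIGN
nfkc   = sign.unicode_normalize(:nfkc)    # toNFKC: compatibility decomposition applies
folded = nfkc.downcase(:fold)             # toCaseFold
result = folded.unicode_normalize(:nfkc)  # toNFKC again

puts "toNFKC: #{nfkc.inspect}"
puts "final:  #{result.inspect}"
puts "Unstable (-> DISALLOWED): #{result != sign}"
```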
>
> Transport and Map Symbols
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F680.pdf two additional transport-related emoji -> disallowed, okay
>
> Geometric Shapes Extended
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F780.pdf colored circles and squares -> disallowed, okay
>
> Supplemental Symbols and Pictographs
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F900.pdf additional emoji of various kinds -> disallowed, okay
>
> Chess Symbols
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1FA00.pdf turned and other variants of chess symbols for variants of chess -> disallowed, okay
>
> Symbols and Pictographs Extended-A
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1FA70.pdf additional emoji -> disallowed, okay
>
>
> Regards,    Martin.
>
> -- 
> I18ndir mailing list
> I18ndir@ietf.org
> https://www.ietf.org/mailman/listinfo/i18ndir