Re: [I18ndir] Review of new characters for Unicode 12.0.0
"Patrik Fältström " <paf@frobbit.se> Mon, 18 March 2019 10:42 UTC
From: Patrik Fältström <paf@frobbit.se>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Cc: i18ndir@ietf.org
Date: Mon, 18 Mar 2019 11:41:50 +0100
X-Mailer: MailMate (1.12.4r5597)
Message-ID: <A80E6FF6-4B14-42E9-B834-687393710685@frobbit.se>
In-Reply-To: <e0174987-056d-d74e-c3fa-5b457a72f8c3@it.aoyama.ac.jp>
References: <e0174987-056d-d74e-c3fa-5b457a72f8c3@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/k2BCedlC_v8gHcFfzJQwJHKtK9w>

Note that a beta version of 12.1.0 already exists.

   Patrik

On 18 Mar 2019, at 1:49, Martin J. Dürst wrote:

> There were some talks about doing a review of the new characters for Unicode 12.0.0, for 'due diligence'.
>
> Here are the results of such a review of new characters for Unicode 12.0.0, starting off http://www.unicode.org/charts/PDF/Unicode-12.0/.
> I also used a small Ruby program that I wrote, attached. In order for it to work, you have to make sure you use the latest (and greatest :-) version of Ruby that supports Unicode 12.0.0.
>
> Telugu
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-0C00.pdf
> new U+0C77 TELUGU SIGN SIDDHAM -> disallowed, okay for a sign used at the beginning of texts as an invocation
>
> Lao
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-0E80.pdf
> new Lao letters and a sign (virama) for Pali and Sanskrit
> -> pvalid, which is okay for historical letters and marks
>
> Vedic Extensions
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1CD0.pdf
> new U+1CFA VEDIC SIGN DOUBLE ANUSVARA ANTARGOMUKHA -> pvalid, okay because this is graphically a base letter (used as a base for a combining nasal sign)
>
> Miscellaneous Symbols and Arrows
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-2B00.pdf
> two new symbols -> disallowed, okay
>
> Supplemental Punctuation
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-2E00.pdf
> new U+2E4F CORNISH VERSE DIVIDER -> disallowed, okay
>
> Latin Extended-D
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-A720.pdf
> 4 casing pairs, 3 upper-case letters completing case pairs with existing lower-case letters. Upper-case are disallowed (-> okay), lower-case are pvalid.
> Among these 4, the following three may merit a closer look:
> U+A7BB LATIN SMALL LETTER GLOTTAL A
> U+A7BD LATIN SMALL LETTER GLOTTAL I
> U+A7BF LATIN SMALL LETTER GLOTTAL U
> They are graphically very similar (if not identical) to the following three from the Latin extensions for Vietnamese:
> (see https://www.unicode.org/charts/PDF/U1E00.pdf)
> U+1EA3 ả LATIN SMALL LETTER A WITH HOOK ABOVE
> U+1EC9 ỉ LATIN SMALL LETTER I WITH HOOK ABOVE
> U+1EE7 ủ LATIN SMALL LETTER U WITH HOOK ABOVE
> I have sent a mail to Unicode experts to get more information on these.
>
> Latin Extended-E
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-AB30.pdf
> two new letters for Sinological transcription, digraphs with hooks -> pvalid, okay
>
> Elymaic
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-10FE0.pdf
> new script Elymaic, 23 letters, one of which is a ligature
> -> pvalid, okay
>
> Newa
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11400.pdf
> U+1145F NEWA LETTER VEDIC ANUSVARA -> pvalid, okay
>
> Takri
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11680.pdf
> U+116B8 TAKRI LETTER ARCHAIC KHA -> pvalid, okay
>
> Nandinagari
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-119A0.pdf
> new script Nandinagari, all letters pvalid, one sign (U+119E3)
> -> disallowed, okay
>
> Soyombo
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11A50.pdf
> two alternate visarga signs in Soyombo
> -> pvalid, okay [maybe these are targets for a variant set]
>
> Tamil Supplement
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-11FC0.pdf
> Tamil supplement: signs for fractions, traditional units, etc.
> -> disallowed, okay
>
> Egyptian Hieroglyph Format Controls
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-13430.pdf
> Egyptian Hieroglyph Format Controls for quadrats -> all disallowed, okay
>
> Miao
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-16F00.pdf
> Miao letters and marks for writing additional languages -> all pvalid, okay
>
> Ideographic Symbols and Punctuation
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-16FE0.pdf
> two additional symbols -> disallowed, okay
>
> Tangut
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-17000.pdf
> 6 additional Tangut characters -> pvalid, okay
>
> Small Kana Extension
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1B130.pdf
> historic small Hiragana/Katakana extensions -> pvalid, okay
> [Should be excluded from general use because they may be confused with the larger versions by everyday users, but this is only really a problem for the last character (U+1B167 KATAKANA LETTER SMALL N) because the others won't usually appear in names anyway.]
>
> Nyiakeng Puachue Hmong
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E100.pdf
> new script, all characters except one are pvalid
> I have done some additional checks on the following two:
> U+1E14E NYIAKENG PUACHUE HMONG LOGOGRAM NYAJ
>   o used to represent the word for money or currency
> U+1E14F NYIAKENG PUACHUE HMONG CIRCLED CA
>   o used to indicate ownership
>   -> U+1E108 nyiakeng puachue hmong letter ca
> Currently the former is pvalid but the latter is disallowed.
> It may well be that it would be more appropriate to have the former be disallowed (as currently the '$' sign is disallowed in LDH) but allow the latter, if it appears, e.g., in composite words that may appear in domain names. But according to the descriptions in
> http://unicode.org/L2/L2017/17002r3-n4780r3-nyiakeng-puachue-hmong.pdf, we are fine as is.
>
> Wancho
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E2C0.pdf
> new script, all letters/digits/tone marks pvalid, one currency sign disallowed, okay
>
> Adlam
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1E900.pdf
> U+1E94B ADLAM NASALIZATION MARK -> pvalid, okay
> [Looks somewhat like an apostrophe, but that can be said of a Hebrew Yod,... too.]
>
> Ottoman Siyaq Numbers
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1ED00.pdf
> These are numbers, but they are not positional, and there's no zero -> disallowed, okay
> [If these were in present-day use, we might want to check whether they appear in e.g. company names and such, but they are only used historically, so we're okay.]
>
> Enclosed Alphanumeric Supplement
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F100.pdf
> U+1F16C RAISED MR SIGN, has an NFKC decomposition -> disallowed, okay
>
> Transport and Map Symbols
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F680.pdf
> two additional transport-related emoji -> disallowed, okay
>
> Geometric Shapes Extended
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F780.pdf
> colored circles and squares -> disallowed, okay
>
> Supplemental Symbols and Pictographs
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1F900.pdf
> additional emoji of various kinds -> disallowed, okay
>
> Chess Symbols
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1FA00.pdf
> turned and other variants of chess symbols for variants of chess -> disallowed, okay
>
> Symbols and Pictographs Extended-A
> http://www.unicode.org/charts/PDF/Unicode-12.0/U120-1FA70.pdf
> additional emoji -> disallowed, okay
>
>
> Regards, Martin.
>
> --
> I18ndir mailing list
> I18ndir@ietf.org
> https://www.ietf.org/mailman/listinfo/i18ndir
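
For readers who want to reproduce the pvalid/disallowed values quoted above, here is a rough Python sketch of how such a check could look. It is not the Ruby program Martin attached (that attachment is not reproduced here), and the function names are made up for illustration. It assumes the values are the IDNA2008 derived properties of RFC 5892 and implements only four of that RFC's rules (Unassigned, LDH, Unstable, LetterDigits); the Exceptions, BackwardCompatible, JoinControl, Ignorable and OldHangulJamo rules are left out, so it will disagree with the real tables for characters such as U+00DF.

```python
import unicodedata

# General categories that RFC 5892 groups as "LetterDigits".
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_ldh(cp):
    """ASCII hyphen, digits and lower-case letters."""
    return cp == 0x2D or 0x30 <= cp <= 0x39 or 0x61 <= cp <= 0x7A


def is_unstable(ch):
    """True if NFKC(casefold(NFKC(ch))) differs from ch."""
    nfkc = unicodedata.normalize("NFKC", ch)
    folded = unicodedata.normalize("NFKC", nfkc.casefold())
    return folded != ch


def derived_property(cp):
    """Simplified derived property; several RFC 5892 rules are omitted."""
    ch = chr(cp)
    category = unicodedata.category(ch)
    if category == "Cn":
        # Not assigned in the Unicode version this Python build ships with.
        return "UNASSIGNED"
    if is_ldh(cp):
        return "PVALID"
    if is_unstable(ch):
        # Covers upper-case letters and compatibility characters.
        return "DISALLOWED"
    if category in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    # The result depends on the Unicode version of the interpreter;
    # an older build reports the 12.0.0 additions as UNASSIGNED.
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in (0xA7BA, 0xA7BB, 0x1EA3, 0x1E14E, 0x1E14F, 0x1F16C):
        print(f"U+{cp:04X} {derived_property(cp)}")
```

The Unstable check is what separates the upper-case halves of the new casing pairs (case folding changes them) from the lower-case halves, and it is also why a character with a compatibility decomposition such as U+1F16C ends up disallowed.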