Re: [I18ndir] [art] Modern Network Unicode

Carsten Bormann <> Thu, 11 July 2019 07:03 UTC

On Jul 11, 2019, at 01:00, Asmus Freytag <> wrote:
> There are a few scripts where un-normalized text is "preferred" by the user community over NFC. In some cases, the most natural ordering of combining marks does not match NFC's canonical ordering. In other cases, NFC does not compose some sequences while local user communities strongly prefer the precomposed code points (e.g. Bengali).
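Both effects Asmus describes can be seen with Python's standard `unicodedata` module. The Hebrew sequence is only an illustration (the message does not name Hebrew). The Bengali case uses BENGALI LETTER RRA (U+09DC), which Unicode lists as a composition exclusion and is presumably the kind of character meant: NFC decomposes it and never restores the precomposed form.

```python
import unicodedata


def describe(s: str) -> str:
    # List each code point with its canonical combining class (ccc).
    return " ".join(f"U+{ord(c):04X}(ccc={unicodedata.combining(c)})" for c in s)


# Hebrew BET + DAGESH (ccc 21) + QAMATS (ccc 18): an order a writer may type,
# used here only as an illustration.
typed_hebrew = "\u05D1\u05BC\u05B8"
nfc_hebrew = unicodedata.normalize("NFC", typed_hebrew)
print("typed:", describe(typed_hebrew))
print("NFC:  ", describe(nfc_hebrew))
# Canonical ordering sorts the marks by ascending ccc, so QAMATS moves first.
assert nfc_hebrew == "\u05D1\u05B8\u05BC"

# BENGALI LETTER RRA is a composition exclusion: NFC decomposes it to
# DDA + NUKTA and does not recompose the pair.
rra = "\u09DC"
assert unicodedata.normalize("NFC", rra) == "\u09A1\u09BC"
assert not unicodedata.is_normalized("NFC", rra)
print("RRA NFC:", describe(unicodedata.normalize("NFC", rra)))
```

In both cases an application that normalizes to NFC on input silently rewrites what the user community considers the correct spelling.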

Interesting.  There are also a few writing systems where NFC is what is strongly preferred by the user community.

Is there any effort to capture these preferences in a formal way so they become more accessible to international developer communities?

> Those scripts would be an exception to John’s statement: "NFC is also a close approximation to what any sensible terminal driver or IME is going to produce natively from a plausible keyboard layout for the relevant script", a statement that otherwise holds well.

Right.  It is interesting to see how Apple recently moved from a normalizing file system (HFS+ was normalizing [to NFD unfortunately, and using UTF-16 on disk]) to a normalization-preserving, normalize-before-comparison (“normalization-insensitive”) file system (APFS).  It is also interesting how the problem is “solved” in applications such as git (core.precomposeunicode needs to be set to true on platforms that tend to generate non-NFC names so other platforms can pretend to stay blissfully unaware).
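The difference between a normalizing file system (HFS+) and a normalization-preserving, normalization-insensitive one (APFS) can be modeled in a few lines. The class `NormInsensitiveDir`, the helper `precompose_names`, and the use of NFC as the comparison key are all hypothetical choices for this sketch; it is not how APFS or git is actually implemented.

```python
from __future__ import annotations

import unicodedata


class NormInsensitiveDir:
    """Toy directory that preserves names as written but compares them
    after normalization (hypothetical model, not APFS's actual code)."""

    def __init__(self) -> None:
        # Normalized key -> name exactly as the creator spelled it.
        self._entries: dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        # NFC is an arbitrary choice here; any canonical form works for comparison.
        return unicodedata.normalize("NFC", name)

    def create(self, name: str) -> None:
        key = self._key(name)
        if key in self._entries:
            # A canonically equivalent name already exists.
            raise FileExistsError(name)
        self._entries[key] = name  # stored unchanged: normalization-preserving

    def lookup(self, name: str) -> str | None:
        # Any canonically equivalent spelling finds the same entry.
        return self._entries.get(self._key(name))

    def listdir(self) -> list[str]:
        # Names come back exactly as they were created.
        return list(self._entries.values())


def precompose_names(names: list[str]) -> list[str]:
    # Roughly the idea behind git's core.precomposeunicode: convert names read
    # from the file system to NFC before storing or comparing them.
    return [unicodedata.normalize("NFC", n) for n in names]


if __name__ == "__main__":
    d = NormInsensitiveDir()
    d.create("Gr\u00fc\u00dfe.txt")         # precomposed ü
    print(d.lookup("Gru\u0308\u00dfe.txt"))  # decomposed spelling finds it
    print(d.listdir())
    print(precompose_names(["Gru\u0308\u00dfe.txt"]))
```

A normalizing design in the style of HFS+ would instead store `self._key(name)` (with NFD as the key form) and return that from `listdir()`, which is why names could come back in a different form than the one they were created with.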

Greetings, Carsten