Re: [I18ndir] [art] Modern Network Unicode

John C Klensin <> Thu, 11 July 2019 16:31 UTC

Date: Thu, 11 Jul 2019 12:31:31 -0400
From: John C Klensin <>
To: "Asmus Freytag (c)" <>, Carsten Bormann <>

(top post and adding the ART list back in)

I think Asmus and I are in complete agreement, but you might
find it illuminating to look at an attempt to summarize the
state of work on the relevant characteristics and behaviors on
the web, some of which interact with this discussion.  It is a chart with
explanation, not a long document.  See 


--On Thursday, July 11, 2019 08:51 -0700 "Asmus Freytag (c)"
<> wrote:

> On 7/11/2019 12:03 AM, Carsten Bormann wrote:
>> On Jul 11, 2019, at 01:00, Asmus Freytag
>> <> wrote:
>>> There are a few scripts where un-normalized text is
>>> "preferred" by the user community over NFC. In some cases,
>>> the most natural ordering of combining marks does not match
>>> NFC's canonical ordering. In other cases, NFC does not
>>> compose some sequences while local user communities strongly
>>> prefer the precomposed code points (e.g. Bengali).
>> Interesting.  There are also a few writing systems where NFC
>> is what is strongly preferred by the user community.
>> Is there any effort to capture these preferences in a formal
>> way so they become more accessible to international developer
>> communities?
> No. Not least because these preferences may not be stable
> over (longish) periods of time as they depend a bit on what
> keyboards / rendering platforms and apps handle best today.
> A./
>>> Those scripts would be an exception to John's statement:
>>> "NFC is also a close approximation to what any sensible
>>> terminal driver or IME is going to produce natively from a
>>> plausible keyboard layout for the relevant script", a
>>> statement that otherwise holds well.
>> Right.  It is interesting to see how Apple recently moved
>> from a normalizing file system (HFS+ was normalizing [to NFD
>> unfortunately, and using UTF-16 on disk]) to a
>> normalization-preserving, normalize-before-comparison
>> ("normalization-insensitive") file system (APFS).  It is
>> also interesting how the problem is "solved" in
>> applications such as git (core.precomposeunicode needs to be
>> set to true on platforms that tend to generate non-NFC names
>> so other platforms can pretend to stay blissfully unaware).
>> Grüße, Carsten
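
For readers who want to see the two behaviors discussed above in concrete terms, here is a minimal sketch using Python's standard unicodedata module.  The helper names (nfc, same_name) and the sample strings are illustrative assumptions, not anything taken from APFS, git, or the thread.  The first part shows a "normalization-preserving, normalize-before-comparison" check in the general sense Carsten describes; it is not a claim about how APFS implements it.  The second part shows one Bengali case where NFC does not yield the precomposed code point, because that code point is a composition exclusion.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return the NFC form of s."""
    return unicodedata.normalize("NFC", s)


def same_name(a: str, b: str) -> bool:
    """Normalization-insensitive comparison (hypothetical helper).

    The stored strings are left untouched; only the keys used for
    the comparison are normalized.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Two spellings of the same word: precomposed U+00FC versus
# "u" followed by U+0308 COMBINING DIAERESIS.
precomposed = "Gr\u00fc\u00dfe"
decomposed = "Gru\u0308\u00dfe"

# U+09DC BENGALI LETTER RRA is a composition exclusion, so NFC
# yields the sequence U+09A1 U+09BC rather than the single code point.
rra = "\u09dc"
rra_nfc = nfc(rra)

print(precomposed == decomposed)            # code-point comparison
print(same_name(precomposed, decomposed))   # normalization-insensitive
print([f"U+{ord(c):04X}" for c in rra_nfc])
```

A plain code-point comparison treats the two spellings as different names, while the normalize-before-comparison check treats them as the same name without rewriting either one.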