Re: [I18ndir] [art] Modern Network Unicode

Asmus Freytag <> Wed, 10 July 2019 06:08 UTC

References: <0A5251342D480BA6437F7549@PSB> <>
From: Asmus Freytag <>
Message-ID: <>
Date: Tue, 9 Jul 2019 23:08:56 -0700

On 7/9/2019 2:50 PM, Carsten Bormann wrote:
> (NFKC is in the current document mostly as a reminder that variances can be made in normalization, as well; it is probably the only reasonable one beyond NFC among the normalization forms, but has its own problems as you note.)


NFC should be lossless. Conformant processes are not allowed to assume 
that the interpretation changes based on the normalization form. (This is 
my personal, and probably insufficient, paraphrase of the language in the 
definitions in Chapter 3 of the Unicode Standard.)
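
To make "lossless" concrete, here is a minimal sketch using Python's 
standard unicodedata module. The sample strings and the helper names nfc 
and nfd are my own illustration, not anything taken from the standard.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

def nfc(s: str) -> str:
    """Hypothetical helper: normalize to NFC."""
    return unicodedata.normalize("NFC", s)

def nfd(s: str) -> str:
    """Hypothetical helper: normalize to NFD."""
    return unicodedata.normalize("NFD", s)

# Two different code point sequences for the same text element.
print(composed == decomposed)              # False
print(nfc(decomposed) == composed)         # True
print(nfd(composed) == decomposed)         # True
# Going through NFC and back to NFD recovers the decomposed spelling.
print(nfd(nfc(decomposed)) == decomposed)  # True
```

Only the encoding of the text element changes here; nothing a conformant 
process is allowed to treat as a difference in interpretation gets lost.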

NFC normalizes away variations in encoding of what Unicode considers the 
same text element. This does not agree with everybody's interpretation 
of what is the same text element, hence the occasional grousing. In 
particular, identical appearance does not make two text elements the 
same, especially if they differ in certain properties.
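
A small sketch of the "identical appearance" point, again with Python's 
unicodedata. The choice of Latin, Greek and Cyrillic capital A is my 
example; lowercase mapping stands in for "certain property differences".

```python
import unicodedata

# Three letters that usually render identically (my choice of example).
lookalikes = ["\u0041", "\u0391", "\u0410"]  # Latin A, Greek Alpha, Cyrillic A

for ch in lookalikes:
    # Neither NFC nor the lossy NFKC maps one of these onto another.
    for form in ("NFC", "NFKC"):
        assert unicodedata.normalize(form, ch) == ch
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} lower={ch.lower()!r}")

names = [unicodedata.name(ch) for ch in lookalikes]
lowercased = [ch.lower() for ch in lookalikes]
```

The three lowercase forms differ (a, α, а), which is one visible 
consequence of the characters having different properties.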

NFC was supposed to match (as much as possible) the legacy choice of 
encoding for certain text elements. Some communities disagree, but the 
larger goal for NFC is to be stable (any text converted to NFC remains 
in NFC forever). It can therefore be used as a data format.
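
A rough illustration in Python, assuming Python 3.8 or later for 
unicodedata.is_normalized. It can only show idempotence within one 
Unicode version, not stability across versions, and the Latin-1 sample 
is my own pick for a "legacy choice of encoding".

```python
import unicodedata

# A legacy 8-bit encoding delivers é as one precomposed character.
legacy = b"caf\xe9".decode("latin-1")

once = unicodedata.normalize("NFC", "cafe\u0301")
twice = unicodedata.normalize("NFC", once)

print(unicodedata.is_normalized("NFC", legacy))  # True
print(once == legacy)                            # True
print(twice == once)                             # True
```

Text that is already in NFC passes through normalization unchanged, which 
is what makes it workable as a stored form.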

Some sequences in complex scripts can render identically; none of the 
normalization forms will convert them to the preferred choice. Instead, 
the non-preferred ones are now defined as "do not use".

NFKC is lossy. It conflates variations that (in some, but not all, 
contexts) have different appearance or meaning, or both.

NFKC is great for certain loose comparisons, but not much more. The 
definition of what is a "compatibility" variant is rather arbitrary. 
NFKC, despite also being stable, is not suitable as an archival text 
format (it erases common distinctions that may well have had meaning in 
the source data).
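
A minimal sketch of both sides, loose comparison and loss, in Python. 
The helper name loose_key and the sample strings are hypothetical, 
chosen only for illustration.

```python
import unicodedata

def loose_key(s: str) -> str:
    """Hypothetical comparison key: fold compatibility variants via NFKC."""
    return unicodedata.normalize("NFKC", s)

# Superscript two, the fi ligature, fullwidth letters (my sample data).
samples = {"x\u00b2": "x2", "\ufb01le": "file", "\uff21\uff22": "AB"}

for src, folded in samples.items():
    # NFC leaves these alone; NFKC replaces them with plain characters.
    assert unicodedata.normalize("NFC", src) == src
    assert loose_key(src) == folded

# Useful for matching: the ligature spelling finds the plain one.
print(loose_key("\ufb01le") == loose_key("file"))  # True
# Lossy: x squared and the string "x2" are no longer distinguishable.
print(loose_key("x\u00b2") == loose_key("x2"))     # True
```

Once stored in NFKC, there is no way to tell which of the two inputs the 
text originally was, which is the archival problem.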

NFD is more useful for certain operations; if those are needed or common, 
then NFD might make a better choice than NFC (renormalizing to NFD 
is lossless as well).
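
One operation of that kind, as I would guess at it, is working on base 
letters separately from their combining marks. A sketch in Python; the 
helper name base_letters is hypothetical, and dropping the marks is of 
course lossy in itself, unlike the normalization step.

```python
import unicodedata

def base_letters(s: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Both spellings of the word reduce to the same base letters.
print(base_letters("caf\u00e9"))    # cafe
print(base_letters("cafe\u0301"))   # cafe
```

In NFD every mark is its own code point, so the filter is a simple 
per-character test; in NFC the accent is hidden inside U+00E9.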