Re: [apps-discuss] Review of draft-ietf-appsawg-file-scheme

John C Klensin <> Tue, 10 May 2016 16:06 UTC

Date: Tue, 10 May 2016 12:06:28 -0400
From: John C Klensin <>
To: =?UTF-8?Q?Martin_J=2E_D=C3=BCrst?= <>, Julian Reschke <>, Matthew Kerwin <>
Cc: IETF Apps Discuss <>,
Subject: Re: [apps-discuss] Review of draft-ietf-appsawg-file-scheme

--On Tuesday, May 10, 2016 19:19 +0900 "Martin J. Dürst"
<> wrote:

> In general, NFC gives you a higher chance for a match than
> NFD. The Mac filesystem uses (mostly) NFD internally, but is
> able to handle NFC. On the other hand, Windows and Linux don't
> do normalization inside the file system, but the chances that
> files were created in NFC are higher than for NFD.

Agreed.  But note that this is partially an artifact that
illustrates why the i18n / "multilingual" versus localization
issues are important.  NFC gives a higher chance for a match,
especially with strings that are not systematically normalized,
because, if one is using a keyboard designed for a particular
language or location, that keyboard is likely to support
locally-used characters and hence far more likely to produce
precomposed characters than combining sequences.  The same is
generally true when people select characters from some sort of
online character-picker, assuming the precomposed forms exist at
all.  On the other hand, if I'm an experienced user of one
script trying to use a keyboard designed for a wildly different
script or one with too many distinct character forms
("graphemes" or "grapheme clusters") to allow single-stroke
arrangements to work well, all bets are off.
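
To make the matching point concrete, here is a minimal sketch
(not taken from the draft) using Python's standard unicodedata
module.  The helper name same_name and the sample strings are
purely illustrative assumptions; the idea is only that a
precomposed name and its combining-sequence equivalent differ
code point by code point until both are put into the same
normalization form.

```python
import unicodedata


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the same form.

    Hypothetical helper for illustration; a real file system or
    URI comparison may apply different or additional rules.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The same visible name typed two ways.
precomposed = "D\u00fcrst"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "Du\u0308rst"   # "u" followed by U+0308 COMBINING DIAERESIS

# A raw comparison sees two different strings...
raw_match = precomposed == decomposed

# ...while comparing under a common form (NFC or NFD) matches them.
nfc_match = same_name(precomposed, decomposed, "NFC")
nfd_match = same_name(precomposed, decomposed, "NFD")

print(raw_match, nfc_match, nfd_match)
```

Which form to normalize to is a separate question from whether
to normalize at all; the sketch only shows that unnormalized
comparison of the two spellings fails.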

For some scripts, there are also what look from the outside like
internal consistency problems with Unicode: for example, NFD is
more internally consistent than NFC because many recently-added
precomposed characters decompose under NFC rather than
composing.  And some don't decompose at all, which is part of
the "non-decomposable character" mess that led to the
LUCID BOF and the IETF's apparent paralysis about Unicode 7.x.
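
As a hedged illustration of both behaviors, the sketch below
again uses Python's unicodedata.  The two characters are my own
choice of examples, not ones named in this thread: U+0958
(DEVANAGARI LETTER QA) is a composition-exclusion character that
NFC leaves decomposed, and U+08A1 (ARABIC LETTER BEH WITH HAMZA
ABOVE, added in Unicode 7.0) has no canonical decomposition, so
no normalization form makes it equal to the sequence U+0628
U+0654.

```python
import unicodedata

# A precomposed character that NFC decomposes instead of keeping:
# U+0958 is on the composition exclusion list.
qa = "\u0958"
qa_nfc = unicodedata.normalize("NFC", qa)
qa_nfc_points = [f"U+{ord(c):04X}" for c in qa_nfc]

# A character with no canonical decomposition: U+08A1 versus the
# visually similar sequence U+0628 U+0654.
beh_hamza = "\u08a1"
sequence = "\u0628\u0654"

# Neither NFC nor NFD brings the two spellings together.
matches = {
    form: unicodedata.normalize(form, beh_hamza)
    == unicodedata.normalize(form, sequence)
    for form in ("NFC", "NFD")
}

print(qa_nfc_points, matches)
```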

It is hard to say something in cases like this that will always
deliver the best, or even the most-expected, results.