Re: [I18ndir] I-D on filesystem I18N

Nico Williams <nico@cryptonector.com> Wed, 08 July 2020 21:18 UTC

Date: Wed, 8 Jul 2020 16:18:27 -0500
From: Nico Williams <nico@cryptonector.com>
To: John Levine <johnl@taugh.com>
Cc: i18ndir@ietf.org, john-ietf@jck.com
Message-ID: <20200708211825.GQ3100@localhost>
References: <9044C737C36C0787B9EAE190@PSB> <20200708203811.6CE9D1C6D7BD@ary.qy>
In-Reply-To: <20200708203811.6CE9D1C6D7BD@ary.qy>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/36OBJi98s6gZubj-HaLEXy2WHbc>
Subject: Re: [I18ndir] I-D on filesystem I18N

On Wed, Jul 08, 2020 at 04:38:11PM -0400, John Levine wrote:
> In article <9044C737C36C0787B9EAE190@PSB> you write:
> >Well, Unix, and every Unix-derived system I know of (definitely
> >including Linux, FreeBSD, and NetBSD) are case-sensitive and
> >getting anywhere near their file names with Case Folding or even
> >lower casing will cause rather interesting problems. 
> 
> FWIW MacOS is Unix underneath and its native file system is case
> folding.  I've been surprised how little trouble it's caused when
> I'm moving code back and forth to FreeBSD and even NFS mounting
> my MacOS filesystem on FreeBSD virtual machines.
> 
> The point about ZFS is also a good one, since it's gotten very popular
> on BSD and Solaris due to its flexible disk management. (Less so on
> most Linux distros due to poor kernel integration.) It gives you a
> per-filesystem choice of case sensitivity, and also optionally doing
> any of the four Unicode normalizations on filenames.

Yeah.

In ZFS the normalization form (NF) setting is about form-insensitivity:
names are compared after normalization, so there's no observable
difference between picking NFC and NFD, or between NFKC and NFKD.
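To illustrate with a small sketch (Python, standard `unicodedata`
module only; `names_match` is a hypothetical helper, not anything ZFS
exposes): if a file system compares names after normalizing both
sides, NFC and NFD always agree on whether two names match, because
both forms encode the same canonical equivalence.  Likewise NFKC and
NFKD both encode compatibility equivalence.  ZFS exposes the choice as
the per-dataset `normalization` property.

```python
import unicodedata

def names_match(a, b, form):
    """Hypothetical helper: compare two file names the way a
    form-insensitive file system would, after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Precomposed U+00E9 vs. "e" followed by U+0301 COMBINING ACUTE ACCENT
composed, decomposed = "\u00e9", "e\u0301"
# U+FB01 LATIN SMALL LIGATURE FI vs. plain "fi"
ligature, plain = "\ufb01", "fi"

for a, b in [(composed, decomposed), (ligature, plain), (composed, "e")]:
    # NFC and NFD always agree on whether two names match...
    assert names_match(a, b, "NFC") == names_match(a, b, "NFD")
    # ...and so do NFKC and NFKD.
    assert names_match(a, b, "NFKC") == names_match(a, b, "NFKD")
    print(ascii(a), ascii(b),
          "canonical:", names_match(a, b, "NFC"),
          "compat:", names_match(a, b, "NFKC"))
```

Only the choice of family is observable: the ligature pair matches
under NFKC/NFKD but not under NFC/NFD.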

BTW, I found this thread where Git users on OS X using ZFS were bitten
by normalization to NFD on create.  It took some doing to find it!

http://public-inbox.org/git/86wsm9dbhk.fsf@blue.stonehenge.com/

and here Linus blames ZFS (though others correctly blame Apple)

http://public-inbox.org/git/alpine.LFD.1.10.0805051049440.32269@woody.linux-foundation.org/

saying that

| UFS is a traditional Unix filesystem, and will not mangle your filenames.
| 
| ZFS apparently acts like a case-sensitive HFS+: it still tries to 
| normalize to UTF-8 (and does it badly, at that - picking an Apple-specific 
| almost-NFD form of normalization rather than the more sensible NFC form).
| 
| So ZFS may not corrupt cases, but it still corrupts UTF-8 filenames.

But again, the fault was Apple's: they chose to normalize to NFD on
create.
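A toy model of why NFD-on-create bites Git, assuming a hypothetical
`NfdOnCreateDir` standing in for the file system.  It uses standard
Unicode NFD, whereas HFS+ really uses an Apple-specific variant, so
treat it as an approximation.  A tool that compares names byte for
byte gets back something other than what it created:

```python
import unicodedata

class NfdOnCreateDir:
    """Hypothetical directory that rewrites names to NFD on create,
    approximating HFS+ (which uses an Apple-specific near-NFD form)."""

    def __init__(self):
        self._entries = set()

    def create(self, name):
        # Only the normalized name is stored; the caller's form is lost.
        self._entries.add(unicodedata.normalize("NFD", name))

    def listdir(self):
        return sorted(self._entries)

d = NfdOnCreateDir()
tracked = "caf\u00e9.txt"   # name as the application wrote it (NFC)
d.create(tracked)

on_disk = d.listdir()[0]
print(on_disk == tracked)   # False: byte-for-byte comparison fails
print(on_disk.encode("utf-8"), tracked.encode("utf-8"))
# b'cafe\xcc\x81.txt' b'caf\xc3\xa9.txt'
```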

Normalizing to NFC would have been better, because NFC is closer to
what input modes typically produce.  Input modes don't necessarily
produce NFC, though, which is why form-insensitive and form-preserving
behavior is the better choice.
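A minimal sketch of form-insensitive, form-preserving behavior,
assuming a hypothetical in-memory `FormPreservingDir` (a toy, not how
ZFS or any real file system implements it): lookups and collision
checks go through a normalized key, while the directory hands back
each name exactly as it was created.

```python
import unicodedata

class FormPreservingDir:
    """Hypothetical directory: form-insensitive lookup, form-preserving
    storage."""

    def __init__(self, form="NFC"):
        # Any form works for the key; NFC and NFD give identical lookups.
        self._form = form
        self._by_key = {}  # normalized key -> name as originally created

    def _key(self, name):
        return unicodedata.normalize(self._form, name)

    def create(self, name):
        key = self._key(name)
        if key in self._by_key:
            # An equivalent name already exists, whatever its form.
            raise FileExistsError(name)
        self._by_key[key] = name  # keep the caller's original form

    def lookup(self, name):
        # Stored spelling of an equivalent name, or None if there is none.
        return self._by_key.get(self._key(name))

    def listdir(self):
        return sorted(self._by_key.values())

d = FormPreservingDir()
d.create("caf\u00e9.txt")                        # created in NFC
print(d.lookup("cafe\u0301.txt") is not None)    # True: NFD spelling finds it
print(d.listdir() == ["caf\u00e9.txt"])          # True: NFC name preserved
```

Whatever form an input mode emits, the name can be found in any
equivalent form, and readers get back the bytes that were written.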

As it is, important apps with squeaky users get the normalization grease
(rsync, Git, as mentioned).  But if you're using ZFS on anything other
than OS X, no apps need any grease.
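The grease can be sketched roughly like this.  `status` is a
hypothetical, much-simplified stand-in for a version-control status
check; its `precompose` flag folds names read from disk back to NFC,
which is loosely what Git's `core.precomposeUnicode` setting does on
OS X.  On a form-preserving file system names come back as created, so
the flag isn't needed.

```python
import unicodedata

def status(index_names, disk_names, precompose=False):
    """Hypothetical status check: return (deleted, untracked), i.e. names
    in the index but not on disk, and names on disk but not in the index."""
    if precompose:
        # The grease: fold names returned by the file system back to NFC,
        # assuming the index records names in NFC.
        disk_names = [unicodedata.normalize("NFC", n) for n in disk_names]
    index, disk = set(index_names), set(disk_names)
    return sorted(index - disk), sorted(disk - index)

index = ["caf\u00e9.txt"]   # recorded in NFC
disk = ["cafe\u0301.txt"]   # returned in NFD by an NFD-on-create file system

print(status(index, disk))                   # same file: "deleted" and "untracked"
print(status(index, disk, precompose=True))  # ([], [])
```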

Nico
--