[nfsv4] Making I18N exciting by getting it mostly out of NFSv4

Nico Williams <nico@cryptonector.com> Thu, 09 July 2020 16:59 UTC

Date: Thu, 09 Jul 2020 11:59:11 -0500
From: Nico Williams <nico@cryptonector.com>
To: nfsv4@ietf.org
Message-ID: <20200709165910.GU3100@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/cCo_FrOrhm_1dRPbb6IgwwC0bW8>
Subject: [nfsv4] Making I18N exciting by getting it mostly out of NFSv4

I performed an early review of
draft-dnoveck-nfsv4-internationalization-01 on behalf of the i18ndir.
I'm not holding up progress; it was only an early review.  I am taking
advantage of this opportunity to revisit filesystem internationalization
though.

Yes, I do believe we (the IETF, the NFSv4 WG) have had filesystem I18N
wrong for 17 years now, i.e., since RFC 3530.  This is not something new
-- I've been saying something like this for almost as long as those 17
years, and I've written about this on my blog and elsewhere many times.

Specifically:

 - putting responsibility for normalization and case handling in a
   filesystem server is simply wrong and, since implementors understand
   that, does not reflect running code

 - I18N responsibilities belong almost entirely to each filesystem

 - clients that cache directory contents with intent to perform local
   lookups against the cache do need to know the I18N rules applied by
   the remote filesystem
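
To illustrate the last point, here is a minimal Python sketch of a
client-side lookup against a cached directory listing.  The rule names
("form-insensitive", "case-insensitive") are hypothetical labels made up
for this example, not values of any existing NFSv4 attribute, and NFD
plus Python's casefold() merely stand in for whatever comparison the
real filesystem applies.

```python
import unicodedata

def make_key(rules):
    """Return a function mapping a name to its comparison key under `rules`.

    `rules` is a set of hypothetical rule labels advertised by the server.
    """
    def key(name):
        if "form-insensitive" in rules:
            name = unicodedata.normalize("NFD", name)
        if "case-insensitive" in rules:
            # Case folding can undo normalization, so normalize again.
            name = unicodedata.normalize("NFD", name.casefold())
        return name
    return key

def cached_lookup(cached_names, wanted, rules):
    """Look `wanted` up in a cached listing; return the stored name or None."""
    key = make_key(rules)
    index = {key(n): n for n in cached_names}
    return index.get(key(wanted))

listing = ["re\u0301sume\u0301.txt"]   # stored decomposed (e + combining acute)
wanted = "r\u00e9sum\u00e9.txt"        # same name, precomposed

# A client assuming byte-for-byte matching wrongly concludes "no such file".
assert cached_lookup(listing, wanted, set()) is None
# A client told that the filesystem is form-insensitive finds the entry.
assert cached_lookup(listing, wanted, {"form-insensitive"}) == listing[0]
```

A client that does not cache directory contents never needs any of
this: it just sends the name and lets the filesystem decide.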

The NFSv4 protocol (and others) does need to support telling _caching_
clients what I18N rules are applied by each filesystem (or even each
directory).  And perhaps NFSv4 servers need to reject file names that
are not valid UTF-8 on create (and lookup), but that's it for the
server's responsibilities.
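
As a sketch of how small that server-side check could be, the Python
below only tests whether a name's bytes are well-formed UTF-8.  The
function name is hypothetical, and the sketch deliberately does no
normalization or case handling.

```python
def is_valid_utf8_name(raw: bytes) -> bool:
    """Return True if `raw` is well-formed UTF-8, False otherwise."""
    try:
        # Strict decoding rejects stray bytes, overlong encodings and
        # encoded surrogates.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

assert is_valid_utf8_name("r\u00e9sum\u00e9.txt".encode("utf-8"))
assert not is_valid_utf8_name(b"\xff\xfe")   # bytes that never occur in UTF-8
assert not is_valid_utf8_name(b"\xc0\xaf")   # overlong encoding of "/"
```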

To help with this, I've written and submitted
draft-williams-filesystem-18n-00.  Soon I'll submit -01.  (Back in 2013
I also submitted draft-williams-i18n-boundary-analysis-00.)

This has led to lively discussion on the i18ndir mailing list.  Some
concede that this approach is correct but don't care to change past
consensus.  Others think otherwise.  We've not reached consensus within
the i18ndir, but we will, I'm sure.  You can find the mailing list
archives, though subscription is limited to members of the
directorate.

For this WG, this work should be exciting because it will mostly get the
WG out of the business of dealing with I18N.  There will be a modicum of
I18N work to do with caching clients.

Also, regarding running code, now that APFS implements the same approach
as ZFS has for some 14 years now, I think it's fair to say that the best
current practice for I18N in filesystems is:

 - regarding normalization, implement form-insensitive, form-preserving
   behavior

 - do it in the filesystem

Note that this treats normalization like case is typically treated in
case-insensitive filesystems: preserve {case, form}, but be {case-,
form-} insensitive.  Indeed, case-insensitivity is a lot like extending
the equivalence problem normally dealt with by normalization.
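
A toy Python model of that behavior, for illustration only: names are
stored exactly as created, while lookups go through a comparison key.
The class and method names are made up for this sketch, and NFD plus
casefold() are assumptions standing in for whatever ZFS or APFS
actually do internally.

```python
import unicodedata

class Directory:
    """Toy directory: form-insensitive lookup, form-preserving storage."""

    def __init__(self, case_insensitive=False):
        self.case_insensitive = case_insensitive
        self._entries = {}  # comparison key -> (name as created, payload)

    def _key(self, name):
        key = unicodedata.normalize("NFD", name)
        if self.case_insensitive:
            # Case folding can undo normalization, so normalize again.
            key = unicodedata.normalize("NFD", key.casefold())
        return key

    def create(self, name, payload):
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = (name, payload)

    def lookup(self, name):
        return self._entries[self._key(name)][1]

    def readdir(self):
        # Names come back in the form (and case) they were created with.
        return [name for name, _ in self._entries.values()]

d = Directory()
d.create("r\u00e9sum\u00e9", "cv")                 # precomposed on create
assert d.lookup("re\u0301sume\u0301") == "cv"      # decomposed on lookup
assert d.readdir() == ["r\u00e9sum\u00e9"]         # original form preserved

ci = Directory(case_insensitive=True)
ci.create("README", "docs")
assert ci.lookup("readme") == "docs"
assert ci.readdir() == ["README"]
```

The only difference between the two modes is the key function, which is
the sense in which case-insensitivity extends the same equivalence.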

The only reasonable alternative is HFS+'s, which is closer to 20 years
old now, and that is to normalize on create (and lookup).  But Apple has
clearly abandoned this approach with APFS.
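
For contrast, a sketch of normalize-on-create in the same toy style.
Standard NFD is used here as a stand-in; HFS+ uses its own decomposition
variant, so this is an approximation of the idea rather than of HFS+
itself, and the class name is hypothetical.

```python
import unicodedata

class NormalizingDirectory:
    """Toy directory that normalizes names on create and on lookup."""

    def __init__(self):
        self._entries = {}  # normalized name -> payload

    def create(self, name, payload):
        stored = unicodedata.normalize("NFD", name)
        if stored in self._entries:
            raise FileExistsError(name)
        self._entries[stored] = payload
        return stored

    def lookup(self, name):
        return self._entries[unicodedata.normalize("NFD", name)]

    def readdir(self):
        # Only the normalized form survives; the created form is lost.
        return list(self._entries)

d = NormalizingDirectory()
created = "r\u00e9sum\u00e9"                       # precomposed
d.create(created, "cv")
assert d.lookup("re\u0301sume\u0301") == "cv"      # lookups still match
assert d.readdir() == ["re\u0301sume\u0301"]       # listed decomposed
assert d.readdir()[0] != created                   # not form-preserving
```

Lookups behave the same as in the form-insensitive case; the visible
difference is that the name read back is not the name that was written.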

I took a look and Linux does have kernel Unicode form- and case-
insensitivity utility code, but it's generally only used by
case-insensitive filesystems.  It should be relatively simple to add
support for a form-insensitive filesystem option.  This isn't needed to
support the argument about running code, but it'd be nice anyway.

Nico
--