[nfsv4] review of draft-ietf-nfsv4-internationalization-10.txt

Christoph Hellwig <hch@lst.de> Wed, 24 July 2024 17:23 UTC

Return-Path: <hch@lst.de>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 60B1CC157931; Wed, 24 Jul 2024 10:23:19 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.909
X-Spam-Level:
X-Spam-Status: No, score=-1.909 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id P7rdLvgbGx6r; Wed, 24 Jul 2024 10:23:18 -0700 (PDT)
Received: from verein.lst.de (verein.lst.de [213.95.11.211]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 75E48C1CAE8A; Wed, 24 Jul 2024 10:23:16 -0700 (PDT)
Received: by verein.lst.de (Postfix, from userid 2407) id C61F468AFE; Wed, 24 Jul 2024 19:23:13 +0200 (CEST)
Date: Wed, 24 Jul 2024 19:23:13 +0200
From: Christoph Hellwig <hch@lst.de>
To: David Noveck <davenoveck@gmail.com>
Message-ID: <20240724172313.GA16524@lst.de>
References: <172001876016.915067.10248092184385058958@dt-datatracker-5f88556585-g8gwj> <CADaq8jczyQ4sBb3V+eSd1ijo4JHA0cz8SoWVWTU27YrvOB5WkA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CADaq8jczyQ4sBb3V+eSd1ijo4JHA0cz8SoWVWTU27YrvOB5WkA@mail.gmail.com>
User-Agent: Mutt/1.5.17 (2007-11-01)
Message-ID-Hash: NGTWNBCXWVJAL6N6LMJG66UPPMH3VS56
X-Message-ID-Hash: NGTWNBCXWVJAL6N6LMJG66UPPMH3VS56
X-MailFrom: hch@lst.de
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-nfsv4.ietf.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: NFSv4 <nfsv4@ietf.org>, nfsv4-chairs <nfsv4-chairs@ietf.org>, arnt@gulbrandsen.priv.no
X-Mailman-Version: 3.3.9rc4
Precedence: list
Subject: [nfsv4] review of draft-ietf-nfsv4-internationalization-10.txt
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/NHf0OzZwP4nFNEiWrU-iR9UX7wQ>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Owner: <mailto:nfsv4-owner@ietf.org>
List-Post: <mailto:nfsv4@ietf.org>
List-Subscribe: <mailto:nfsv4-join@ietf.org>
List-Unsubscribe: <mailto:nfsv4-leave@ietf.org>

Hi Dave,

this is a review of draft-ietf-nfsv4-internationalization-10.txt.

First, please run the document through a spell checker, there are a lot
of typos and annoying inconsistencies like using implementers and
implementors right next to each other.

Second, please refer to my recent mail to the WG list about covering 4.0
or not.  I think that needs to be a WG conclusion an is not document
specific, but highly relevant here.

The document refers to the term 'physical file systems' without
explaining it first.  A more common term would be 'local file system'
but even that is not entirely correct as NFS servers can also
export remote or distributed file systems, so 'underlying file system'
might be a better choice.  Either way the term should be explained in
the General Definitions section.

Section 2.2. "Requirements Language Derivation" is very confusing.
The BCP 14 terms designate requirements of the specification, and
the reason why the specification has these requirements does not
matter.  As far as I can tell simply removing this section would
improve the document significantly.

The forward reference from Section 6 to Section 7 harms the readability
of the document.  Instead of referring to different modes of operation
that are defined later, please define the high level operation modes
earlier my moving section 7 before section 6. In general I also don't
find references to "NFSv3" or "older" string handling very useful.
To an unencumbered implementer, spelling out the actual modes with
descriptive names will be a lot more useful and descriptive and useful.
That does not preclude mentioning what NFSv3 did, but it should be part
of the primary definition.  It would also really help to define the
two operation modes and the terms used by them in a General Definitions
section.

The name caching described in Section 6.3 can work in a slightly modified
form when file name normalization is used.  Linux implements it for
local file systems using Unicode based normalization and case
insensitivity.  In the NFS context that would require that the
client knows the exact normalization behavior and Unicode version used
on the server, and thus is relatively impractical.  Given how little
NFS is used with case insensitive file systems is also doesn't
really matter.

Section 8 and especially Section 8.1 are very meandering and the
wording feels like an essay and not suitable for a normative document.

I would suggest to replace Section 8 with a short introduction of
what changes compared to the definition in RFC8881 and drop section
8.1 entirely.  The details history of the text is of no importance to
implementers.  If you feel it should be properly archive somewhere
an informational document on the history of internationalization in
NFS might be somewhat interesting.

Section 8.2 should be worded as actual normative language.  My
quick and dirty proposal is below:

-------------------------------- snip --------------------------------
This section replaces Section 14.4 of [RFC8881], taking into account the
behavior of existing implementation of [RFC5661] [RFC8881] while
providing best effort compatibility with the definition in [RFC5661] and
[RFC8881].


  const FSCHARSET_CAP4_CONTAINS_NON_UTF8  = 0x1;
  const FSCHARSET_CAP4_ALLOWS_ONLY_UTF8   = 0x2;

  typedef uint32_t        fs_charset_cap4;

This attribute provides a simple way of determining whether a particular
file system behaves as a UTF-8-only server and rejects file names which
are not valid UTF8-encoded strings. When this attribute is supported and
the value returned has the FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 flag set, the
error NFS4ERR_INVAL MUST be returned if any file name argument contains a
string which is not a valid UTF8-encoded string.

When this attribute is supported and the value returned has the
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 flag clear, the error NFS4ERR_INVAL will
not be returned based on adherence to the rules of UTF-8.

The FSCHARSET_CAP4_CONTAINS_NON_UTF8 flags exist for historical reasons
only and has no clear behavior associated with it.  Servers SHOULD set
the FSCHARSET_CAP4_CONTAINS_NON_UTF8 when they set the
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 flag, and clear the
FSCHARSET_CAP4_CONTAINS_NON_UTF8 when they clear
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 to provide the best compatibility to
historic clients.

Clients SHOULD ignore the FSCHARSET_CAP4_CONTAINS_NON_UTF8 flag.

When the fs_charset_cap attribute is not supported, the client can
perform a LOOKUP using a name not conforming to the rules of UTF-8
and use the error returned to determine whether non-UTF-8 names are
accepted.
-------------------------------- snip --------------------------------

Section 12 is closely related to Sections 6 and 7 and it would
be very helpful if it was located close to them in the document.

The section could use a bit of rewording, and the use of BCP 14
terms to describe actual server behavior and not just protocol
requirements seems wrong.  And possible replacement for the start
of this section is provided below:

-------------------------------- snip --------------------------------
Servers MAY accept component names that are not valid UTF-8 strings on
all or on some subset of file systems exported.

A typical pattern is for a server to use UTF‑8-unaware underlying file
systems that treat component names as uninterpreted strings of bytes,
rather than having any awareness of the character set being used.
Such servers do not change the stored representation of component names
from those received on the wire and use an octet-by-octet comparison of
component name strings to determine equivalence (as opposed to any
broader notion of string comparison).

This is because the server has no knowledge of the specific character
encoding being used.
-------------------------------- snip --------------------------------

Section 13 feels like editorializing.  I don't think this belongs into
a normative section.  If you feel attached to it maybe move it to
an appendix or the above mentioned informational document?

In general the document could use a bit less of term "As stated"
{,above,previously}".

This review didn't make it to the appendices yet, but I plan to
look over them in the future.