[nfsv4] Genart last call review of draft-ietf-nfsv4-layrec-01

Dale Worley via Datatracker <noreply@ietf.org> Fri, 23 August 2024 01:29 UTC

Return-Path: <noreply@ietf.org>
X-Original-To: nfsv4@ietf.org
Delivered-To: nfsv4@ietfa.amsl.com
Received: from [10.244.2.19] (unknown [104.131.183.230]) by ietfa.amsl.com (Postfix) with ESMTP id 8A20FC16941F; Thu, 22 Aug 2024 18:29:53 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: Dale Worley via Datatracker <noreply@ietf.org>
To: gen-art@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 12.22.1
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <172437659311.377.1936980941446051875@dt-datatracker-584cd6c8dd-fvr2f>
Date: Thu, 22 Aug 2024 18:29:53 -0700
Message-ID-Hash: UNXOEAAOTN5JSXQHVSQMTO4OVPR7U4VZ
X-Message-ID-Hash: UNXOEAAOTN5JSXQHVSQMTO4OVPR7U4VZ
X-MailFrom: noreply@ietf.org
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-nfsv4.ietf.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: draft-ietf-nfsv4-layrec.all@ietf.org, last-call@ietf.org, nfsv4@ietf.org
X-Mailman-Version: 3.3.9rc4
Reply-To: Dale Worley <worley@ariadne.com>
Subject: [nfsv4] Genart last call review of draft-ietf-nfsv4-layrec-01
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/ublZFtYtyS3nvdsyU1mtkkDaqPU>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Owner: <mailto:nfsv4-owner@ietf.org>
List-Post: <mailto:nfsv4@ietf.org>
List-Subscribe: <mailto:nfsv4-join@ietf.org>
List-Unsubscribe: <mailto:nfsv4-leave@ietf.org>

Reviewer: Dale Worley
Review result: Ready with Issues

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document:  draft-ietf-nfsv4-layrec-01
Reviewer:  Dale R. Worley
Review Date:  2024-08-22
IETF LC End Date:  2024-08-27
IESG Telechat date:  [not known]

Summary:

    This draft is on the right track but has open issues, described in
    the review.

Major issues:

There seems to be no discussion of upward-compatibility.
Particularly, what happens if the metadata server supports this
extension but the client does not, and what happens if the client
supports this extension but the metadata server does not.  I expect
that following the details of the text, the servers will work things
out correctly, but it's best if there is a holistic description of the
compatibility situations so the reader can verify that they've been
handled correctly, and that the implementer knows what to look for in
version-mismatch situations.

The text is written for people who have the entirety of the previously
defined protocol in their heads, and know all of the processing
paths.  That is, it's a very densely-written sent of amendments, with
no clear indexing of exactly what execution paths are affected by what
extensions/requirements.  It would be better if the items were broken
apart, the text expanded, and keyed to the definitions of the
procedures which are being amended.

Minor issues:

A section of the text is written in the subjunctive mood, despite that
it seems to be the core of the extension.  It should be written
normally, with as much use of normative keywords as possible.

Nits/editorial comments:

1.1.  Definitions

   See Section 1.1 of [RFC8435] for a more complete set of definitions.

However, the definitions that are given here duplicate RFC 8435.
Perhaps better to make that clear, e.g.

   The following definitions are from Section 1.1 of [RFC8435].  See it
   for a more complete set of definitions.

2.  Layout State Recovery

   The LAYOUTRETURN needs
   a layout stateid to proceed and there is no way for the client to
   recover layout state.

Does this mean "the previous layout stateid known by the client has
been rendered invalid by the MDS restart and there is no way for the
client to obtain a valid stateid"?  Or is there more to "recover
layout state" than just obtaining a valid a stateid?

   If the server were to allow the client to use the anonymous stateid
   of all zeros (see Section 8.2.3 of [RFC8881]) for lrf_stateid in
   LAYOUTRETURN (see Section 18.44.1 of [RFC8881]), then the client
   could inform the metadata server of errors encountered.

This paragraph and the following three paragraphs are written in the
subjunctive mood.  This leaves it unclear to the reader whether this
the text is prescribing the extension mentioned in the Abstract, or
just presenting a proposal which will be discussed further.

   That in turn
   would allow the metadata server to accurately resilver the file by
   picking the correct mirror(s).

It seems desirable to state explicitly here how the metadata server
"picks the correct mirror(s)", given that the LAYOUTRETURN seems to be
used to inform the MDS of the copies that are corrupt rather than the
ones that are not corrupt.  It's possible that this process is implied
by previously defined parts of NFSv4.1, but if so it might be useful
to point the reader to that description.

   There are two error scenarios that can occur:

   During the grace period:  If the client were to send any lrf_stateid
      in the LAYOUTRETURN other than the anonymous stateid of all zeros,
      then the metadata server would respond with an error of
      NFS4ERR_GRACE (see Section of 15.1.9.2 [RFC8881]).

   After the grace period:  If the client were to send any lrf_stateid
      in the LAYOUTRETURN with the anonymous stateid of all zeros, then
      the metadata server would respond with an error of
      NFS4ERR_NO_GRACE (see Section 15.1.9.3 of [RFC8881]).

This text is unclear what of this error processing is added in the
extension and what of it is part of the base that this document
extends.  Also, the text says "the metadata server would respond"
rather than "the metadata server MUST respond" or other normative
language that makes it clear how strict the requirements are.

   Also, when the metadata server builds the reply to the LAYOUTRETURN,
   it MUST NOT bump the seqid of the lorr_stateid.

My suspicion is that "it MUST NOT bump the seqid of the lorr_stateid"
only applies when the stateid of the request is all zeros, but the
text doesn't state that, inviting implementation errors.

   The metadata
   server MUST NOT have been resilvering the file such that it has a
   different layout (set of mirror instances) than the client before the
   restart of the metadata server.

Presumably the metadata server might be restarted at any instant,
regardless of what tasks it was executing.  That seems to make it
impossible to conform to this requirement.  I suspect that the meaning
is

   If the metadata server at the point of restarting was resilvering
   the file such that the MDS has a different layout (set of mirror
   instances) than the client, then upon restart, the MDS MUST NOT allow
   <the procedures described in this document> [<-- replace those
   words with the correct term].

Also this assumes that the metadata server has some way of knowing
what layout the client has, despite that the metadata server has
restarted.  Presumably that is previously defined in NFSv4.1, but it
might be helpful to point to that.

   The metadata server MUST NOT resilver a file if there are clients
   with outstanding layouts with iomode of LAYOUTIOMODE4_RW.  Whether
   the metadata server prevents all I/O to the file until the
   resilvering is done or [...]

Are these two sentences related?  The first talks of when the MDS must
not resilver, but the second talks of what is to happen when
resilvering is being done.  It seems that there should be a paragraph
break here and then some text setting the context in which "The
metadata server prevents all I/O ... or ...".

   Whether
   the metadata server prevents all I/O to the file until the
   resilvering is done or forces all I/O to go through the metadata
   server or allows a proxy server to update the new data file as it is
   being reslivered is all an implementation choice.

This sentence interacts in complex ways with a lot of specifics of how
the metadata server behaves.  I assume that the WG has verified that
this description is sufficient to clearly specify what behaviors are
allowed by the client and servers.

   Finally, the metadata server MUST determine that any files which are
   neither explicitly recovered with a CLAIM_PREVIOUS nor have a
   reported error via a LAYOUTRETURN, have to be resilvered.

would be simpler as

   Finally, the metadata server MUST resilver any files which are
   neither explicitly recovered with a CLAIM_PREVIOUS nor have a
   reported error via a LAYOUTRETURN.

4.  IANA Considerations

   IANA should use the current document (RFC-TBD) as the reference for
   the new entries.

This text speaks of new entries but does not specify what the new
entries are.  As far as I can tell, there are no new entries in the
IANA databases.  The text should be adjusted appropriately.

[END]