Re: [nfsv4] Benjamin Kaduk's Discuss on draft-ietf-nfsv4-rfc5661sesqui-msns-03: (with DISCUSS and COMMENT)

David Noveck <davenoveck@gmail.com> Sat, 18 January 2020 13:30 UTC


On Mon, Jan 13, 2020 at 5:54 PM Benjamin Kaduk <kaduk@mit.edu> wrote:

> Hi David,
>
> Trimming lots of good stuff here as well...
>
> On Thu, Jan 02, 2020 at 10:09:02AM -0500, David Noveck wrote:
> > On Wed, Dec 18, 2019 at 3:32 AM Benjamin Kaduk via Datatracker <
> > noreply@ietf.org> wrote:
> >
> > > Benjamin Kaduk has entered the following ballot position for
> > > draft-ietf-nfsv4-rfc5661sesqui-msns-03: Discuss
> > >
> > > ----------------------------------------------------------------------
> > > DISCUSS:
> > > ----------------------------------------------------------------------
> > >
> > > Responded to these on 12/20.
> >
> > >
> > > ----------------------------------------------------------------------
> > > COMMENT:
> > > ----------------------------------------------------------------------
> > >
> > > I think I may have mistakenly commented on some sections that are
> > > actually just moved text, since my lookahead window in the diff was too
> > > small.
> >
> >
> > No harm, no foul.
> >
> >
> > >
> > > Since the "Updates:" header is part of the immutable RFC text (though
> > > "Updated by:" is mutable), we should probably explicitly state that
> "the
> > > updates that RFCs 8178 and 8434 made to RFC 5661 apply equally to this
> > > document".
> > >
> >
> > I think we could update the last paragraph of Section 1.1 to be more
> > explicit about
> > this.  Perhaps it could read:
> >
> >    Until the above work is done, there will not be a consistent set of
> >    documents providing a description of the NFSv4.1 protocol and any
> >    full description would involve documents updating other documents
> >    within the specification.   The updates applied by RFC 8434 [66] and
> >    RFC 8178 [63] to RFC5661 also apply to this specification, and
> >    will apply to any subsequent v4.1 specification until that work is
> done.
>
> Sounds good.
>
> > >
> > > I note inline (in what is probably too many places; please don't reply
> > > at all of them!) some question about how clear the text is that a file
> > > system migration is something done at a per-file-system granularity,
> and
> > > that migrating a client at a time is not possible.
> >
> >
> > It might be possible, but doing so is not a goal of this specification.
> >
> > I'm not sure how to address your concern.   I don't know why anyone would
> > assume that migrating entire clients is a goal of this specification.
>  As
> > far as
> > I can see, when the word "migration" is used it is always in connection
> with
> > migrating a file system.   Is there some specific place where you think
> > this
> > issue is likely to arise?
>
> I think I garbled my point; my apologies.
> To give a semi-concrete example, suppose I have clients A and B that are
> accessing filesystem F on server X, and filesystem F is also available on
> server Y.  If X decides that it needs to migrate access to F away from X
> (e.g., for maintenance), then the "file system migration event" involves
> telling both A and B to look to Y for access to F, at basically the same
> time.


This clarifies things for me.   When you were speaking of "migrating a
client", I assumed you were worried about consistency of fs's F, G, H for
a particular client.  Now it appears the issue is consistency among
clients A, B, and C, all accessing a common F.

If X tries to tell only A but not B to access F via Y but lets B
> continue to access F at X, then I think there can be some subtle
> consistency issues.
>

Or worse, some decidedly unsubtle ones :-(

>
> In some sense, this is easy to consider as a dichotomy between "migration
> is for server maintenance" vs. "migration is for load balancing".


That categorization helps.


> Assuming
> I understand correctly (not a trivial assumption!), there was never any
> intent to use these mechanisms for load balancing,


Well "Never" covers a lot.    There are cases which you do want to do
load balancing.   For example, if you are dealing with multiple network
access path to the same replica, there is no issue with the load balancing
approach.   In the case of multiple replicas where data consistency applies
between them, then you might lod balance but it is the server's
resposibility to
provide the consistency, meaning that he needs to be warned of the
possibility
of issues that might arise if clients   modifying the same dara are placed
on
different replicas. In the case in which you don't guarantee data
consistency among
replicas, you might as well say about doing load balacing that "there be
dragons".

and if we can explicitly
> disclaim such usage, then we don't have to try to reason through any
> potential subtle consistency issues.
>

I think we can disclaim the really problematic part.  I think the new text
will be needed in the migration section.  Issues with replication are
different and do not involve any server choice.

I anticipate revising section 11.5.5 to read as follows:

   When a file system is present and becomes inaccessible using the
   current access path, the NFSv4.1 protocol provides a means by which
   clients can be given the opportunity to have continued access to
   their data.  This may involve the use of a different access path to
   the existing replica or the use of a path to a different replica.  The
   new access path or the location of the new replica is specified by a
   file system location attribute.  The ensuing migration of access
   includes the ability to retain locks across the transition.
   Depending on circumstances, this can involve:

   o  The continued use of the existing clientid when accessing the
      current replica using a new access path.

   o  Use of lock reclaim, taking advantage of a per-fs grace period.

   o  Use of Transparent State Migration.

   Typically, a client will be accessing the file system in question,
   get an NFS4ERR_MOVED error, and then use a file system location
   attribute to determine the new access path for the data.  When
   fs_locations_info is used, additional information will be available
   that will define the nature of the client's handling of the
   transition to a new server.

   In most instances, servers will choose to migrate all clients using a
   particular file system to a successor replica at the same time to
   avoid cases in which different clients are updating different
   replicas.  However, migration of individual clients can be helpful in
   providing load balancing, as long as the replicas in question are
   such that they represent the same data, as described in
   Section 11.11.8.

   o  In the case in which there is no transition between replicas
      (i.e., only a change in access path), there are no special
      difficulties in using this mechanism to effect load balancing.

   o  In the case in which the two replicas are sufficiently co-
      ordinated as to allow coherent simultaneous access to both by a
      single client, there is, in general, no obstacle to use of
      migration of particular clients to effect load balancing.
      Generally, such simultaneous use involves co-operation between
      servers to ensure that locks granted on two co-ordinated replicas
      cannot conflict and can remain effective when transferred to a
      common replica.

   o  In the case in which a large set of clients are accessing a file
      system in a read-only fashion, it can be helpful to migrate all
      clients with writable access simultaneously, while using load
      balancing on the set of read-only copies, as long as the rules
      appearing in Section 11.11.8, designed to prevent data reversion,
      are adhered to.

   In other cases, the client might not have sufficient guarantees of
   data similarity/coherence to function properly (e.g., the data in the
   two replicas is similar but not identical), and the possibility that
   different clients are updating different replicas can exacerbate the
   difficulties, making use of load balancing in such situations a
   perilous enterprise.

   The protocol does not specify how the file system will be moved
   between servers or how updates to multiple replicas will be co-
   ordinated.  It is anticipated that a number of different server-to-
   server co-ordination mechanisms might be used with the choice left to
   the server implementer.  The NFSv4.1 protocol specifies the method
   used to communicate the migration event between client and server.

   The new location may be, in the case of various forms of server
   clustering, another server providing access to the same physical file
   system.  The client's responsibilities in dealing with this
   transition will depend on whether a switch between replicas has
   occurred and the means the server has chosen to provide continuity of
   locking state.  These issues will be discussed in detail below.

   Although a single successor location is typical, multiple locations
   may be provided.  When multiple locations are provided, the client
   will typically use the first one provided.  If that is inaccessible
   for some reason, later ones can be used.  In such cases the client
   might consider the transition to the new replica to be a migration
   event, even though some of the servers involved might not be aware of
   the use of the server which was inaccessible.  In such a case, a
   client might lose access to locking state as a result of the access
   transfer.

   When an alternate location is designated as the target for migration,
   it must designate the same data (with metadata being the same to the
   degree indicated by the fs_locations_info attribute).  Where file
   systems are writable, a change made on the original file system must
   be visible on all migration targets.  Where a file system is not
   writable but represents a read-only copy (possibly periodically
   updated) of a writable file system, similar requirements apply to the
   propagation of updates.  Any change visible in the original file
   system must already be effected on all migration targets, to avoid
   any possibility that a client, in effecting a transition to the
   migration target, will see any reversion in file system state.
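
To make the expected client behavior concrete, the recovery logic might
look roughly like the following C sketch (every type and helper name here
is illustrative, not taken from the spec or any implementation):

    #define MAXLOCS 8

    static int handle_moved(struct nfs_client *clp, struct nfs_fs *fs)
    {
            struct fs_location locs[MAXLOCS];
            int n, i;

            /* fs_locations or fs_locations_info gives the candidate
             * access paths/replicas; the first one is preferred. */
            n = fetch_fs_locations(clp, fs, locs, MAXLOCS);
            for (i = 0; i < n; i++) {
                    /* Depending on the case, this reuses the existing
                     * client ID, reclaims locks under a per-fs grace
                     * period, or uses Transparent State Migration. */
                    if (transition_to_replica(clp, fs, &locs[i]) == 0)
                            return 0;
            }
            return -1;      /* no accessible replica found */
    }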


> > As was the case for
> > > my Discuss point about addresses/port-numbers, I'm missing the context
> > > of the rest of the document, so perhaps this is a non-issue, but the
> > > consequences of getting it wrong seem severe enough that I wanted to
> > > check.
> > >
> >
> > I'm not seeing any severe consequences.   Am I missing something?
> >
> >
>

This is clearer now. I think we can avoid any severe consequences.


> >
> > Section 1.1
> > >
> > >    The revised description of the NFS version 4 minor version 1
> > >    (NFSv4.1) protocol presented in this update is necessary to enable
> > >    full use of trunking in connection with multi-server namespace
> > >    features and to enable the use of transparent state migration in
> > >    connection with NFSv4.1.  [...]
> > >
> > > nit: do we expect all readers to know what is meant by "trunking" with
> > > no other lead-in?
> > >
> >
> > Good point.  Perhaps it could be addressed by rewriting the material in
> the
> > first paragraph of  Section 1.1 to read as follows;.
> >
> >    Two important features previously defined in minor version 0 but
> >    never fully addressed in minor version 1 are trunking, the use of
> >    multiple connections between a client and server potentially to
> >    different network addresses, and transparent state migration, which
> >    allows a file system to be transferred betwwen servers in a way that
> >    provides for the client to maintain its existing locking state across
> >    the transfer.
>
> Maybe "the simultaneous use of multiple connections"?
>

Will add.


> nit: s/betwwen/between/
>

Fixed.

>

>    The revised description of the NFS version 4 minor version 1
> >    (NFSv4.1) protocol presented in this update is necessary to enable
> >    full use of these features with other multi-server namespace features.
> >    This document is in the form of an updated description of the NFS 4.1
> >    protocol previously defined in RFC5661 [62].  RFC5661 is obsoleted by
> >    this document.  However, the update has a limited scope and is focused
> >    on enabling full use of trunking and transparent state migration.
> The
> >    need for these changes is discussed in Appendix A.  Appendix B
> describes
> >    the specific changes made to arrive at the current text.
>
> This looks good, thanks.
>

:-)


>
> [...]
> > >
> > >    o  Work would have to be done to address many erratas relevant to
> RFC
> > >       5661, other than errata 2006 [60], which is addressed in this
> > >       document.  That errata was not deferrable because of the
> > >       interaction of the changes suggested in that errata and handling
> > >       of state and session migration.  The erratas that have been
> > >       deferred include changes originally suggested by a particular
> > >       errata, which change consensus decisions made in RFC 5661, which
> > >       need to be changed to ensure compatibility with existing
> > >       implementations that do not follow the handling delineated in RFC
> > >       5661.  Note that it is expected that such erratas will remain
> > >
> > > This sentence is pretty long and hard to follow; maybe it could be
> split
> > > after "change consensus decisions made in RFC 5661" and the second half
> > > start with a more declarative statement about existing implementations?
> > > (E.g., "Existing implementations did not perform handling as
> delineated in
> > > RFC
> > > 5661 since the procedures therein were not workable, and in order to
> > > have the specification accurately reflect the existing deployment base,
> > > changes are needed [...]")
> > >
> >
> > I will clean this bullet up.  See below for a proposed replacement.
> >
> >
> > >
> > >       relevant to implementers and the authors of an eventual
> > >       rfc5661bis, despite the fact that this document, when approved,
> > >       will obsolete RFC 5661.
> > >
> > > (I assume the RFC Editor can tweak this line to reflect what actually
> > > happens; my understanding is that the errata reports will get cloned to
> > > this-RFC.)
> > >
> >
> > I understand that Magnus has already got that issue addressed.  I'll
> > discuss the appropriate text with him.
> >
> >
> > > [rant about "errata" vs. "erratum" elided]
> > >
> >
> > This is annoying but there is no way we are going to get people to use
> > "erratum".   What I've tried to do in my propsed replacement text
> > is to refer to "errata report(s)", which is more accurate and allows
> > people who speak English to use English singulars and plurals, without
> > having to worry about Latin grammar.
>
> That's what I try to do as well :)
>
> > Here's my proposed replacement for the troubled bullet:
> >
> >    o  Work needs to be done to address many errata reports relevant to
> >       RFC 5661, other than errata report 2006 [60], which is addressed
> >       in this document.  Addressing of that report was not deferrable
> >       because of the interaction of the changes suggested there and the
> >       newly described handling of state and session migration.
> >
> >       The errata reports that have been deferred and that will need to
> >       be addressed in a later document include reports currently
> >       assigned a range of statuses in the errata reporting system
> >       including reports marked Accepted and those marked Held Over
>
> nit: it's "Hold For Document Update"
>
Fixed.

> >       because the change was too minor to address immediately.
> >
> >       In addition, there is a set of other reports, including at least
> >       one in state Rejected, which will need to be addressed in a later
> >       document.  This will involve making changes to consensus decisions
> >       reflected in RFC 5661, in situations in which the working group has
> >       already decided that the treatment in RFC 5661 is incorrect, and
> > needs
> >       to be revised to reflect the working group's new consensus and
> ensure
> >       compatibility with existing implementations that do not follow the
> >       handling described in RFC 5661.
> >
> >       Note that it is expected that such all errata reports will remain
>
> nit: s/such all/all such/
>
Fixed.

> >       relevant to implementers and the authors of an eventual
> >       rfc5661bis, despite the fact that this document, when approved,
> >       will obsolete RFC 5661 [62].
>
> This looks really good!
>
> >
> > > Section 2.10.4
> > >
> > >    Servers each specify a server scope value in the form of an opaque
> > >    string eir_server_scope returned as part of the results of an
> > >    EXCHANGE_ID operation.  The purpose of the server scope is to allow
> a
> > >    group of servers to indicate to clients that a set of servers
> sharing
> > >    the same server scope value has arranged to use compatible values of
> > >    otherwise opaque identifiers.  Thus, the identifiers generated by
> two
> > >    servers within that set can be assumed compatible so that, in some
> > >    cases, identifiers generated by one server in that set may be
> > >    presented to another server of the same scope.
> > >
> > > Is there more that we can say than "in some cases"?
> >
> >
> > Not really.  In general, when a server sends you an id, it comes with an
> > implied promise to recognize it when you present it subsequently to the
> > same server.
> >
> > The fact that two servers have decided to co-operate in their Id
> assignment
> > does not change that.
> >
> > The previous text
> > > implies a higher level of reliability than just "some cases", to me.
> > >
> >
> > I think I need to change the text, perhaps by replacing "use compatible
> > values of otherwise
> > opaque identifiers" by "use distinct values of otherwise opaque
> identifiers
> > so that the two
> > servers never assign the same value to two distinct objects".
> >
> > I anticipate the following replacement for the first two paragraphs of
> > Section 2.10.4:
> >
> >    Servers each specify a server scope value in the form of an opaque
> >    string eir_server_scope returned as part of the results of an
> >    EXCHANGE_ID operation.  The purpose of the server scope is to allow a
> >    group of servers to indicate to clients that a set of servers sharing
> >    the same server scope value has arranged to use distinct values of
> >    opaque identifiers so that the two servers never assign the same
> >    value to two distinct objects.  Thus, the identifiers generated by two
> >    servers within that set can be assumed compatible so that, in certain
> >    important cases, identifiers generated by one server in that set may
> >    be presented to another server of the same scope.
> >
> >    The use of such compatible values does not imply that a value
> >    generated by one server will always be accepted by another.  In most
> >    cases, it will not.  However, a server will not accept a value
> >    generated by another inadvertently.  When it does accept it, it will
>
> nit: I think it flows better to put "invertently" as "will not
> inadvertently accept".
>

OK.  Fixed.


>
> >    be because it is recognized as valid and carrying the same meaning as
> >    on another server of the same scope.
> >
> >
> > As an illustration of the (limited) value of this information, consider
> the
> > case of client recovery from a server reboot.  The client has to reclaim
> > his locks using file handles returned by the previous server instance.
> If
> > the server scopes are the same (they almost always are), the client is
> not
> > sure he will get his locks back (e.g. the file might have been deleted),
> > but he does know that, if the lock reclaim succeeds, it is for the same
> > file.  If the server scopes are not the same, he has no such assurance.
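
In implementation terms, the client's check is just byte-wise equality on
the opaque scope values; a minimal sketch (names mine):

    /* Sketch: eir_server_scope values are opaque, so two scopes match
     * only if identical octet-for-octet.  Needs <stdbool.h>, <string.h>. */
    static bool same_server_scope(const uint8_t *s1, size_t l1,
                                  const uint8_t *s2, size_t l2)
    {
            return l1 == l2 && memcmp(s1, s2, l1) == 0;
    }

If this returns false across a reboot, the client has no assurance that a
successful reclaim refers to the same file.
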
>
> Thanks, the new text (and explanation here) is very clear about what's
> going on.
>
> [...]
> > > Section 11.5.5
> > >
> > >    will typically use the first one provided.  If that is inaccessible
> > >    for some reason, later ones can be used.  In such cases the client
> > >    might consider that the transition to the new replica as a migration
> > >    event, even though some of the servers involved might not be aware
> of
> > >    the use of the server which was inaccessible.  In such a case, a
> > >
> > > nit: the grammar here got wonky; maybe s/as a/is a/?

> >
> >
> > How about s/as a/to be a/ ?
>
> That works if you drop the earlier "that", for "the client might consider
> the transition to the new replica to be a migration event".
>
Did that.

> [...]
> > >
> > >    o  The "local" representation of all owners and groups must be the
> > >       same on all servers.  The word "local" is used here since that is
> > >       the way that numeric user and group ids are described in
> > >       Section 5.9.  However, when AUTH_SYS or stringified owners or
> > >       group are used, these identifiers are not truly local, since they
> > >       are known tothe clients as well as the server.
> > >
> > > I am trying to find a way to note that the AUTH_SYS case mentioned here
> > > is precisely because of the requirement being imposed by this bullet
> > > point,
> >
> >
> > Not sure what you mean by that.  I think the requirement is to allow the
> > client
> > to be able to use AUTH_SYS, without the contortions that would be
> required
> > if
> > different fs's had the same uid's meaning different things.
> >
> > while acknowledging that the "stringified owners or group" case
> > > is separate, but not having much luck.
> > >
> >
> > My attempt to revise this area is below:
> >
> >    Note that there is no requirement in general that the users
> >    corresponding to particular security principals have the same local
> >    representation on each server, even though it is most often the case
> >    that this is so.
> >
> >    When AUTH_SYS is used, the following additional requirements must be
> >    met:
> >
> >    o  Only a single NFSv4 domain can be supported through use of
> >       AUTH_SYS.
> >
> >    o  The "local" representation of all owners and groups must be the
> >       same on all servers.  The word "local" is used here since that is
> >       the way that numeric user and group ids are described in
> >       Section 5.9.  However, when AUTH_SYS or stringified numeric owners
> >       or groups are used, these identifiers are not truly local, since
> >       they are known to the clients as well as the server.
> >
> >    Similarly, when stringified numeric user and group ids are used, the
> >    "local" representation of all owners and groups must be the same on
> >    all servers, even when AUTH_SYS is not used.
>
> I really like this rewriting; thank you for undertaking it.
> I think that what I was trying to say here is roughly that we need
> scare-quotes for "local" because of things like AUTH_SYS (or stringified
> user/group ids) that involve sending local representations over the
> network.  So your rewrite did in fact address my concern, even though I
> didn't manage to say it very well the first time :)
>
> [...]
> > >
> > >    o  When there are no potential replacement addresses in use but
> there
> > >
> > > What is a "replacement address"?
> > >
> >
> > I've explained that in some new text added before these bullets, as a new
> > second
> > paragraph of this section:
> >
> >    The appropriate action depends on the set of replacement addresses
> >    (i.e. server endpoints which are server-trunkable with one previously
> >    being used) which are available for use.
> >
> > >
> > >       are valid addresses session-trunkable with the one whose use is
> to
> > >       be discontinued, the client can use BIND_CONN_TO_SESSION to
> access
> > >       the existing session using the new address.  Although the target
> > >       session will generally be accessible, there may be cases in which
> > >       that session is no longer accessible.  In this case, the client
> > >       can create a new session to enable continued access to the
> > >       existing instance and provide for use of existing filehandles,
> > >       stateids, and client ids while providing continuity of locking
> > >       state.
> > >
> > > I'm not sure I understand this last sentence.  On its own, the "new
> > > session to enable continued access to the existing instance" sounds
> like
> > > the continued access would be on the address whose use is to cease, and
> > > thus the new session would be there.
> >
> >
> > That is not the intention.  Will need to clarify.
> >
> >
> > > But why make a new session when
> > > the old one is still good,
> >
> >
> > It isn't usable on the new connection.
> >
> >
> > > especially when we just said in the previous
> > > sentence that the old session can't be moved to the new
> > > connection/address?
> > >
> >
> > Because we can't use it on the new connection, we have to create a
> > new session to access  the client.
> >
> > Perhaps a forward reference down to Section 11.12.{4,5} for this and the
> > > next bullet point would help as well as rewording?
> > >
> >
> > It turns out these would add confusion since they deal with migration
> > situations and deciding whether transparent state migration has
> > occurred in the switch between replicas.  In the cases we are dealing
> > with, there is only a single replica/fs and no migration.
> >
> > Here is my proposed replacement text for the two bullets in question:
> >
> >    o  When there are no potential replacement addresses in use but there
> >       are valid addresses session-trunkable with the one whose use is to
> >       be discontinued, the client can use BIND_CONN_TO_SESSION to access
> >       the existing session using the new address.  Although the target
> >       session will generally be accessible, there may be rare situations
> >       in which that session is no longer accessible, when an attempt is
> >       made tto bind the new conntectin to it.  In this case, the client
>
> nits: s/tto/to/, s/conntectin/connection/
>

Fixed.

>
> >       can create a new session to enable continued access to the
> >       existing instance and provide for use of existing filehandles,
> >       stateids, and client ids while providing continuity of locking
> >       state.
>
> Just to check: this sounds like even in the case where the client creates
> a new session, the filehandle, stateid, clientid, and locking state
> (values) are in effect "transparently preserved" by the server, so the
> client has no need to do any reclamation of locking state.  I think that's
> what's intended, but holler if I'm wrong about that.
>
Ok.
I'll holler that *you're right about that.*

>
> >    o  When there is no potential replacement address in use and there
> >       are no valid addresses session-trunkable with the one whose use is
> >       to be discontinued, other server-trunkable addresses may be used
> >       to provide continued access.  Although use of CREATE_SESSION is
> >       available to provide continued access to the existing instance,
> >       servers have the option of providing continued access to the
> >       existing session through the new network access path in a fashion
> >       similar to that provided by session migration (see Section 11.12).
> >       To take advantage of this possibility, clients can perform an
> >       initial BIND_CONN_TO_SESSION, as in the previous case, and use
> >       CREATE_SESSION only if that fails.
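
The client logic implied by these two bullets might be sketched as follows
(the helpers stand in for issuing the corresponding NFSv4.1 operations;
error handling is elided):

    static int switch_to_new_address(struct nfs_client *clp, int newconn)
    {
            /* First try to bind the new connection to the old session. */
            if (bind_conn_to_session(newconn, clp->session) == 0)
                    return 0;

            /* Session not accessible on this path: create a new session
             * against the same client ID.  Filehandles, stateids, and
             * locking state remain usable; no reclaim is needed. */
            return create_session(newconn, clp->clientid);
    }
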
> >
> >
> > > Section 11.10.6
> > >
> > >    In a file system transition, the two file systems might be clustered
> > >    in the handling of unstably written data.  When this is the case,
> and
> > >
> > > What does "clustered in the handling of unstably written data" mean?
> > >
> > >    the two file systems belong to the same write-verifier class, write
> > >
> > > How is the client supposed to determine "when this is the case"?
> > >
> >
> > Here's a proposed replacement for this paragraph:
> >
> >    In a file system transition, the two file systems might be
> >    cooperating in the handling of unstably written data.  Clients can
> >    determine if this is the case by seeing if the two file systems
> >    belong to the same write-verifier class.  When this is the case,
> >    write verifiers returned from one system may be compared to those
> >    returned by the other and superfluous writes avoided.
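
For illustration, the client-side decision might look like this (sketch;
helper and field names are mine):

    /* Decide whether unstably written data must be re-sent after a
     * transition.  Verifiers are 8 opaque octets and are comparable
     * only within a single write-verifier class. */
    static bool must_resend_unstable_writes(const struct nfs_fs *from,
                                            const struct nfs_fs *to,
                                            const uint8_t verf_before[8],
                                            const uint8_t verf_after[8])
    {
            if (!in_same_write_verifier_class(from, to))
                    return true;    /* verifiers not comparable */
            return memcmp(verf_before, verf_after, 8) != 0;
    }
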
> >
> >
> > > Section 11.10.7
> > >
> > >    In a file system transition, the two file systems might be
> consistent
> > >    in their handling of READDIR cookies and verifiers.  When this is
> the
> > >    case, and the two file systems belong to the same readdir class,
> > >
> > > As above, how is the client supposed to determine "when this is the
> > > case"?
> >
> >
> > >    READDIR cookies and verifiers from one system may be recognized by
> > >    the other and READDIR operations started on one server may be
> validly
> > >    continued on the other, simply by presenting the cookie and verifier
> > >    returned by a READDIR operation done on the first file system to the
> > >    second.
> > >
> > > Are these "may be"s supposed to admit the possibility that the
> > > destination server can just decide to not honor them arbitrarily?
> > >
> >
> > No. They are intended to indicate that the client might or might not use
> > the capability
> >
> > Here is proposed replacement text for the paragraph:
> >
> >    In a file system transition, the two file systems might be consistent
> >    in their handling of READDIR cookies and verifiers.  Clients can
> >    determine if this is the case by seeing if the two file systems
> >    belong to the same readdit class.  When this is the case,
>
> nit: s/readdit/readdir/
>

Fixed.


>
> >    READDIR cookies and verifiers from one system will be
> >    recognized by the other and READDIR operations started on one server
> >    can be validly continued on the other, simply by presenting the
> >    cookie and verifier returned by a READDIR operation done on the
> >    first file system to the second.
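
A corresponding sketch for continuing a directory listing (again, all
names are mine):

    /* Resume a listing on the destination only when the two file
     * systems are in the same readdir class. */
    if (in_same_readdir_class(from, to)) {
            /* Saved cookie/verifier remain meaningful on the new server. */
            resume_readdir(to, dir_fh, saved_cookie, saved_verifier);
    } else {
            /* Not comparable: restart from the beginning (cookie 0). */
            resume_readdir(to, dir_fh, 0, zero_verifier);
    }
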
>
> Ah, this formulation (for both write-verifier and readdir) is very helpful.
>
> [...]
> > > Section 11.16.1
> > >
> > >    With the exception of the transport-flag field (at offset
> > >    FSLI4BX_TFLAGS with the fls_info array), all of this data applies to
> > >    the replica specified by the entry, rather that the specific network
> > >    path used to access it.
> > >
> > > Is it clear that this applies only to the fields defined by this
> > > specification (since, as mentioned later, future extensions must
> specify
> > > whether they apply to the replica or the entry)?
> > >
> >
> > Intend to use the following replacement text:
> >
> >    With the exception of the transport-flag field (at offset
> >    FSLI4BX_TFLAGS with the fls_info array), all of this data defuined in
>
> nit: s/defuined/defined/
>

 Fixed.


> >    this specification applies to the replica specified by the entry,
> >    rather than the specific network path used to access it.  The
> >    classification of data in extensions to this data is discussed
> >    below.
>
> [...]
> > > Section 18.35.3
> > >
> > > I a little bit wonder if we want to reaffirm that co_verifier remains
> > > fixed when the client is establishing multiple connections for trunking
> > > usage -- the "incarnation of the client" language here could make a
> > > reader wonder, though I think the discussion of its use elsewhere as
> > > relating to "client restart" is sufficiently clear.
> > >
> >
> > This should be made clearer but the clarification needs to be done
> multiple
> > places.
> >
> > Possible replacement text for eighth non-code paragraph of section 2.4:
> >
> >    The first field, co_verifier, is a client incarnation verifier,
> >    allowing the server to distinguish successive incarnations (e.g.
> >    reboots) of the same client.  The server will start the process of
> >    canceling the client's leased state if co_verifier is different than
> >    what the server has previously recorded for the identified client (as
> >    specified in the co_ownerid field).
> >
> > Likely replacement text for the seventh paragraph of this section:
> >
> >    The eia_clientowner field is composed of a co_verifier field and a
> >    co_ownerid string.  As noted in Section 2.4, the co_ownerid describes
> >    the client, and the co_verifier is the incarnation of the client.  An
> >    EXCHANGE_ID sent with a new incarnation of the client will lead to
> >    the server removing lock state of the old incarnation.  Whereas an
> >    EXCHANGE_ID sent with the current incarnation and co_ownerid will
> >    result in an error, an update of the client ID's properties,
> >    depending on the arguments to EXCHANGE_ID, or the return of
> >    information about the existing client_id as might happen when this
> >    operation is done to the same server using different network
> >    addresses as part of creating trunked connections.
>

Not sure what error that text was referring to above.   I think it added
to the confusion.

>
> I think I get the general sense of what is going on here (i.e., the last
> sentence) but am still uncertain on the specifics.  Namely, "most of the
> time" (TM), sending EXCHANGE_ID with current incarnation/ownerid will be an
> error, since it's a client bug to try to register the same way twice in a
> row.


No, it isn't.   This is case 2 on page 508, "Non-Update on Existing
Client ID".  Given retries and possible communication difficulties, it is
just too hard to make this case an error.

> However, sometimes we might have to do that in order to update
> properties of the client or get some new information that a server has
> associated to a given client ID.  I *think* (but am not sure) that the
> error case is exactly when the (same-incarnation/ownerid) EXCHANGE_ID is
> done to the same *server and address* as the original EXCHANGE_ID, and that
> the "update properties or get new information back" case is exactly when
> the EXCHANGE_ID is done to a different server/address combination.
>
> If I'm right about that, then I'd suggest:
>
> %    the server removing lock state of the old incarnation.  Whereas an
> %    EXCHANGE_ID sent with the current incarnation and co_ownerid will
> %    result in an error when sent to a given server at a given address for
> %    a second time, it is not an error to send EXCHANGE_ID with current
> %    incarnation and co_ownerid to a different server (e.g., as part of a
> %    migration event).  In such cases, the EXCHANGE_ID can allow for an
> %    update of the client ID's properties, depending on the arguments to
> %    EXCHANGE_ID, or the return of (potentially updated) information about
> %    the existing client_id, as might happen when this operation is done to
> %    the same server using different network addresses as part of creating
> %    trunked connections.
>

I think I have to revise the paragraph above to be clearer.   I anticipate
replacing the seventh paragraph of Section 18.35.3 with the following:

   The eia_clientowner field is composed of a co_verifier field and a
   co_ownerid string.  As noted in Section 2.4, the co_ownerid
   identifies the client, and the co_verifier specifies a particular
   incarnation of that client.  An EXCHANGE_ID sent with a new
   incarnation of the client will lead to the server removing lock state
   of the old incarnation.  On the other hand, an EXCHANGE_ID sent with
   the current incarnation and co_ownerid will, when it does not result
   in an unrelated error, potentially update an existing client ID's
   properties, or simply return information about the existing
   client ID.  The latter would happen when this operation is done to
   the same server using different network addresses as part of creating
   trunked connections.
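
To make the cases concrete, rough server-side pseudocode in C (helper and
field names are mine; the unrelated-error paths are omitted):

    void exchange_id(const char *co_ownerid, const uint8_t co_verifier[8])
    {
            struct client_rec *rec = lookup_client(co_ownerid);

            if (rec == NULL) {
                    /* Unknown client: establish a new client ID. */
                    new_client_record(co_ownerid, co_verifier);
            } else if (memcmp(rec->verifier, co_verifier, 8) != 0) {
                    /* New incarnation (e.g., reboot): remove the old
                     * incarnation's lock state. */
                    remove_lock_state(rec);
                    update_verifier(rec, co_verifier);
            } else {
                    /* Same incarnation: not an error.  Update the client
                     * ID's properties if the arguments call for it, or
                     * simply return the existing client ID, as happens
                     * when EXCHANGE_ID arrives on another address during
                     * trunking discovery. */
                    return_existing_clientid(rec);
            }
    }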


> > > Section 21
> > >
> > > Some other topics at least somewhat related to trunking and migration
> > > that we could potentially justify including in the current,
> > > limited-scope, update (as opposed to deferring for a full -bis)
> include:
> > >
> >
> > Some of these are related to multi-server namespace but not related to
> > security, as far as I can see.
>
> It does look like it; in some sense I was going through a brainstorming
> exercise to make this list, and appreciate the sanity checks.  (To be
> clear, I am not insisting that any of them get covered in specifically the
> sesqui update, just mentioning topics for potential consideration.)
> >
> > >
> > > - clients that lie about reclaimed locks during a post-migration grace
> > >   period
> > >
> >
> > Will address in a number of places:
> >
> > First of all, I intend to add a new paragraph to Section 21, to be placed
> as
> > the
> > sixth non-bulleted paragraph and to read as follows:
> >
> >    Security considerations for lock reclaim differ between the state
> >    reclaim done after server failure (discussed in Section 8.4.2.1.1) and
> >    the per-fs state reclaim done in support of migration/replication
> >    (discussed in Section 11.11.9.1).
> >
> > Next is a new proposed new section to appear as Section 11.11.9.1:
> >
> > 11.11.9.1.  Security Consideration Related to Reclaiming Lock State
> >             after File System Transitions
> >
> >    Although it is possible for a client reclaiming state to misrepresent
> >    its state, in the same fashion as described in Section 8.4.2.1.1,
> >    most implementations providing for such reclamation in the case of
> >    file system transitions will have the ability to detect such
> >    misreprsentations.  this limits the ability of unauthenicatd clients
>
> typos: "misrepresentations", "This", "unauthenticated"
>

Fixed.


>
> >    to execute denial-of-service attacks in these cirsumstances.
>
> "circumstances"
>
>
Fixed.


> >    Nevertheless, the rules stated in Section 8.4.2.1.1, regarding
> >    principal verification for reclaim requests, apply in this situation
> >    as well.
> >
> >    Typically,implementations support file system transitions will have
>
> nits: space after comma, and "that" for "that support"
>
Fixed.


> >    extensive information about the locks to be transferred.  This is
> >    because:
> >
> >    o  Since failure is not involved, there is no need to store locking
> >       information in persistent storage.
> >
> >    o  There is no need, as there is in the failure case, to update
> >       multiple repositories containing locking state to keep them in sync.
> >       Instead, there is a one-time communication of locking state from
> >       the source to the destination server.
> >
> >    o  Providing this information avoids potential interference with
> >       existing clients using the destination file system, by denying
> >       them the ability to obtain new locks during the grace period.
> >
> >    When such detailed locking infornation, not necessarily including the
> >    associated stateid,s is available,
>
> nits: "information", s/stateid,s/stateids,/
>

Fixed.


> >
> >    o  It is possible to detect reclaim requests that attempt to reclsim
>
> nit: s/reclsim/reclaim/
>

Fixed.

>
> >       locks that did not exist before the transfer, rejecting them with
> >       NFS4ERR_RECLAIM_BAD (Section 15.1.9.4).
> >
> >    o  It is possible, when dealing with non-reclaim requests, to
> >       determine whether they conflict with existing locks, eliminating
> >       the need to return NFS4ERR_GRACE (Section 15.1.9.2) on non-
> >       reclaim requests.
> >
> >    It is possible for implementations of grace periods in connection
> >    with file system transitions not to have detailed locking information
> >    available at the destination server, in which case the security
> >    situation is exactly as described in Section 8.4.2.1.1.
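
With that detailed information in hand, the destination server's checks
reduce to something like the following (illustrative names):

    /* Validate a reclaim against the lock table transferred from the
     * source server. */
    nfsstat4 check_reclaim(struct fs *fs, struct owner *ow,
                           struct lock_range *r)
    {
            if (!lock_existed_before_transfer(fs, ow, r))
                    return NFS4ERR_RECLAIM_BAD;      /* never held */
            if (conflicts_with_granted(fs, ow, r))
                    return NFS4ERR_RECLAIM_CONFLICT; /* invalid reclaim */
            return NFS4_OK;
    }
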
> >
> > I think I should also draw your attention to a revised Section 15.1.9.
> > This includes some revisions originally done for
> > draft-ietf-nfsv4-rfc5661-msns-update, which somehow got dropped, as
> > well as a few that turned up as necessary in writing 11.11.9.1:
> >
> > 15.1.9.  Reclaim Errors
> >
> >    These errors relate to the process of reclaiming locks after a server
> >    restart.
> >
> > 15.1.9.1.  NFS4ERR_COMPLETE_ALREADY (Error Code 10054)
> >
> >    The client previously sent a successful RECLAIM_COMPLETE operation
> >    specifying the same scope, whether that scope is global or for the
> >    same file system in the case of a per-fs RECLAIM_COMPLETE.  An
> >    additional RECLAIM_COMPLETE operation is not necessary and results in
> >    this error.
> >
> > 15.1.9.2.  NFS4ERR_GRACE (Error Code 10013)
> >
> >    This error is returned when the server was in its recovery or grace
> >    period. with regard to the file system object for which the lock was
>
> (no full stop)
>

 Fixed.


> >    requested resulting in a situation in which a non-reclaim locking
> >    request could not be granted.  This can occur because either
> >
> >    o  The server does not have sufficient information about locks that
> >       might potentially be reclaimed to determine whether the lock could
> >       validly be granted.
> >
> >    o  The request is made by a client responsible for reclaiming its
> >       locks that has not yet done the appropriate RECLAIM_COMPLETE
> >       operation, allowing it to proceed to obtain new locks.
> >
> >    It should be noted that, in the case of a per-fs grace period, there
> >    may be clients, i.e. those currently using the destination file
> >    system who might be unaware of the circumstances resulting in the
>
> nit: comma after "file system"
>

This phrase is now within parentheses.

>
> >    initiation of the grace period.  Such clients need to periodically
> >    retry the request until the grace period is over, just as other
> >    clients do.
> >
> > 15.1.9.3.  NFS4ERR_NO_GRACE (Error Code 10033)
> >
> >    A reclaim of client state was attempted in circumstances in which the
> >    server cannot guarantee that conflicting state has not been provided
> >    to another client.  This occurs if there is no active grace period
> >    applying to the file system object for which the request was made, if
> >    the client making the request has no current role in reclaiming
> >    locks, or because previous operations have created a situation in
> >    which the server is not able to determine that a reclaim-interfering
> >    edge condition does not exist.
> >
> > 15.1.9.4.  NFS4ERR_RECLAIM_BAD (Error Code 10034)
> >
> >    The server has determined that a reclaim attempted by the client is
> >    not valid, i.e. the lock specified as being reclaimed could not
> >    possibly have existed before the server restart or file system
> >    migration event.  A server is not obliged to make this determination
> >    and will typically rely on the client to only reclaim locks that the
> >    client was granted prior to restart.  However, when a server does
> >    have reliable information to enable it to make this determination, this
> >    error indicates that the reclaim has been rejected as invalid.  This
> >    is as opposed to the error NFS4ERR_RECLAIM_CONFLICT (see
> >    Section 15.1.9.5) where the server can only determine that there has
> >    been an invalid reclaim, but cannot determine which request is
> >    invalid.
> >
> > 15.1.9.5.  NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)
> >
> >    The reclaim attempted by the client has encountered a conflict and
> >    cannot be satisfied.  This potentially indicates a misbehaving
> >    client, although not necessarily the one receiving the error.  The
> >    misbehavior might be on the part of the client that established the
> >    lock with which this client conflicted.  See also Section 15.1.9.4
> >    for the related error, NFS4ERR_RECLAIM_BAD.
>
> Thanks for remembering to fetch these updates from the full bis WIP!
>
> >
> > > - how attacker capabilities compare by using a compromised server to
> > >   give bogus referrals/etc. as opposed to just giving bogus data/etc.
> > >
> >
> > Will address. See the paragraphs to be added to the end of Section 21.
> >
> >
> > > - an attacker in the network trying to shift client traffic (in terms
> of
> > >   what endpoints/connections they use) to overload a server
> > >
> >
> > Will address. See the paragraphs to be added to the end of Section 21.
> >
> >
> > > - how asynchronous replication can cause clients to repeat
> > >   non-idempotent actions
> > >
> >
> > Not sure what you are referring to.
>
> I don't have something fully fleshed out here, but it's in the general
> space when there are multiple replicas that get updates at (varying)
> delays from the underlying write.  A contrived situation would be if you
> have a pool of worker machines that use NFS for state management (I know, a
> pretty questionable idea), and try to do compare-and-set on a state file.
> If one worker tries to assert that it owns the state but other NFS replicas
> see delayed updates, additional worker machines could also try to claim the
> state and perform whatever operation the state file is controlling.
>
> Basically, the point here is that if you as an NFS consumer are using NFS
> with relaxed replication semantics, you have to think through how your
> workflow will behave in the presence of such relaxed updates.  Which ought
> to be obvious, when I say it like that, but perhaps is not always actually
> obvious.
>
> >
> > > - the potential for state skew and/or data loss if migration events
> > >   happen in close succession and the client "misses a notification"
> > >
> >
> > Is there a specific problem that needs to be addressed?
>
> I don't have a concrete scenario that's specific to NFS, no; this is just a
> generic possibility for any scheme that involves discrete updates (e.g.,
> file-modifying RPCs) and the potential for asynchronous replication.
>

I think that the necessary discussion can be folded into some clarification
of replication discussed in
another thread.


> >
> > > - cases where a filesystem moves and there's no longer anything running
> > >   at the old network endpoint to return NFS4ERR_MOVED
> > >
> >
> > This seems to me just a recognition that sometimes systems fail.  Not sure
> > specifically what to address.
>
> Okay.
>
> > > - what can happen when non-idempotent requests are in a COMPOUND before
> > >   a request that gets NFS4ERR_MOVED
> > >
> >
> > Intend to address in Section 15.1.2.4:
> >
> >    The file system that contains the current filehandle object is not
> >    present at the server, or is not accessible using the network
> >    address used.  It may have been made accessible on a different set of
> >    network addresses, relocated or migrated to another server, or it may
> >    have never been present.  The client may obtain the new file system
> >    location by obtaining the "fs_locations" or "fs_locations_info"
> >    attribute for the current filehandle.  For further discussion, refer
> >    to Section 11.3.
> >
> >    As with the case of NFS4ERR_DELAY, it is possible that one or more
> >    non-idempotent operations may have been successfully executed within a
> >    COMPOUND before NFS4ERR_MOVED is returned.  Because of this, once the
> >    new location is determined, the original request which received the
> >    NFS4ERR_MOVED should not be re-executed in full.  Instead, a new
> >    COMPOUND, with any successfully executed non-idempotent operations
> >    removed, should be executed.  This new request should have a different
> >    slot id or sequence in those cases in which the same session is used
> >    for the new request (i.e. transparent session migration or an
>
> nit: comma after "i.e.".
>

Fixed.


>
> >    endpoint transition to a new address session-trunkable with the
> >    original one).
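
To make the retry concrete, a sketch (illustrative names; the point is
dropping the already-executed prefix and using a fresh slot):

    /* ops[0..moved_idx-1] executed successfully before the operation at
     * moved_idx returned NFS4ERR_MOVED. */
    static void retry_after_moved(struct compound *old, int moved_idx,
                                  struct session *ses)
    {
            struct compound new_req;
            int i;

            compound_init(&new_req);
            /* Re-send only the unexecuted tail, so non-idempotent
             * operations that already succeeded are not repeated. */
            for (i = moved_idx; i < old->nops; i++)
                    compound_add(&new_req, &old->ops[i]);

            /* If the same session is still in use (transparent session
             * migration, or a session-trunkable address), send under a
             * new slot id/sequence rather than as a retry. */
            compound_send(ses, next_free_slot(ses), &new_req);
    }
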
> >
> >
> > > - how bad it is if the client messes up at Transparent State Migration
> > >   discovery, most notably in the case when some lock state is lost
> > >
> >
> > Propose to address this by adding the following paragraph to the end of
> > Section 11.13.2:
> >
> >    Lease discovery needs to be provided as described above, in order to
> >    ensure that migrations are discovered soon enough to ensure that
> >    leases moved to new servers are discovred in time to make sure that
> >    leases are renewed early enough to avoid lease expiration, leading to
> >    loss of locking state.  While the consequences of such loss can be
>
> nit: the double "are discovered {soon enough,in time}" is a little awkward
> of a construction; how about "Lease discovery needs to be provided as

described above, in order to ensure that migrations are discovered soon
> enough that leases moved to new servers can successfully be renewed before
> they expire, avoiding loss of locking state"?
>

Went with the following:

        Lease discovery needs to be provided as described above, in order to
        ensure that migrations are discovered soon enough to enable
        leases moved to new servers to be appropriately renewed in time to
        avoid lease expiration and the resulting loss of locking state.
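
As a sketch, the constraint is simply that discovery plus renewal complete
within the lease interval (illustrative names):

    /* Once migration is discovered, renew against the destination
     * before the lease inherited from the source expires. */
    static void on_migration_discovered(struct nfs_client *clp,
                                        struct nfs_server *dst)
    {
            time_t expiry = clp->last_renewal + clp->lease_time;

            if (time(NULL) < expiry)
                    renew_lease(dst, clp);           /* e.g., via SEQUENCE */
            else
                    recover_expired_state(dst, clp); /* worst case */
    }
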
>
>
> >    ameliorated through implementations of courtesy locks, servers are
> >    under no obligation to do, and a conflicting lock request may means
>
> nit: s/means/mean/
>

Fixed.


>
> >    that a lock is revoked unexpectedly.  Clients should be aware of this
> >    possibility.
> >
> >
> >
> > > - the interactions between cached replies and migration(-like) events,
> > >   though a lot of this is discussed in section 11.13.X and 15.1.1.3
> > >   already
> > >
> >
> > Will address any specifics that you feel aren't adequately addressed.
>
> I don't remember any particular specifics, so we should probably just let
> this go for now.
>
> >
> > >
> > > but I defer to the WG as to what to cover now vs. later.
> > >
> > > In light of the ongoing work on draft-ietf-nfsv4-rpc-tls, it might be
> > > reasonable to just talk about "integrity protection" as an abstract
> > > thing without the specific focus on RPCSEC_GSS's integrity protection
> > > (or authentication)
> > >
> > >
> > I was initially leery of this, but when I looked at the text, I was able
> to
> > avoid referring to RPCSEC_GSS in most cases in which integrity was
> > mentioned :-).  The same does not seem possible for authentication :-(
>
> We'll take the easy wins and try to not fret too much about the other
> stuff.
>
> > > RPCSEC_GSS does not
> > > %   protect the binding from one server to another as part of a
> referral
> > > %   or migration event.  The source server must be trusted to provide
> > > %   the correct information, based on whatever factors are available to
> > > %   the client.
> > >
> >
> > These are both situations for which RPCSEC_GSS has no solution, but
> neither
> > is there another one.   It is probably best to just say that without
> > reference
> > to integrity protection.
>
> True.
>
> > I have added new paragraphs after these bullets that may address some of
> > the
> > issues you were concerned about.
> >
> >    Even if such requests are not interfered with in flight, it is
> >    possible for a compromised server to direct the client to use
> >    inappropriate servers, such as those under the control of the
> >    attacker.  It is not clear that being directed to such servers
> >    represents a greater threat to the client than the damage that could
> >    be done by the compromised server itself.  However, it is possible
> >    that some sorts of transient server compromises might be taken
> >    advantage of to direct a client to a server capable of doing greater
> >    damage over a longer time.  One useful step to guard against this
> >    possibility is to issue requests to fetch location data using
> >    RPCSEC_GSS, even if no mapping to an RPCSEC_GSS principal is
> >    available.  In this case, RPCSEC_GSS would not be used, as it
> >    typically is, to identify the client principal to the server, but
> >    rather to make sure (via RPCSEC_GSS mutual authentication) that the
> >    server being contacted is the one intended.
> >
> >    Similar considerations apply if the threat to be avoided is the
> >    direction of client traffic to inappropriate (i.e. poorly performing)
> >    servers.  In both cases, there is no reason for the information
> >    returned to depend on the identity of the client principal requesting
> >    it, while the validity of the server information, which has the
> >    capability to affect all client principals, is of considerable
> >    importance.
>
> These do address some of the issues I mentioned; thank you.  I do have a
> couple further comments:
>
> - I'm not sure what "no mapping to an RPCSEC_GSS principal is available"
>   means (but maybe that's just because I've not read the RPCSEC_GSS RFCs
>   recently enough)
>

This is partly because " even if no mapping to an RPCSEC_GSS principal is
available"
is misleading.   It would have been better to say "even if no mapping  is
to an
RPCSEC_GSS principal is available for the user currently obtaining the
information".
The issue is not within RPCSEC_GSS itself but relates to how it is used by
NFSv4.
Servers are required to support RPCSEG_GSS but to use it, you need to
translate
a uid to an RPCSEC_GSS principal,   Where that mapping is not available, as
it often
is not, you cannot use RPCSEC_GSS for that user.
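
In rough pseudocode, the suggestion comes down to this (illustrative
helpers; whether a GSS context can be established at all depends on
credential availability):

    /* Prefer RPCSEC_GSS, for its mutual authentication of the server,
     * when fetching location attributes, even if the requesting user
     * has no RPCSEC_GSS principal mapping. */
    if (gss_machine_context_available(server))
            fetch_location_attrs(server, fh, RPCSEC_GSS);
    else
            fetch_location_attrs(server, fh, AUTH_SYS);  /* weaker */
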
>
> - w.r.t. "there is no reason for the information returned to depnd on the
>   identity of the client principal", I could perhaps imagine some setup
>   that uses information from a corporate contacts database to determine the
>   current office/location of a given user and provide a referral to a
>   spatially-local replica.  So "no reason" may be too absolute (but I don't
>   have a proposed alternative and don't object to using this text).
>
>
> [...]
> > > Section B.4
> > >
> > >    o  The discussion of trunking which appeared in Section 2.10.5 of
> > >       RFC5661 [62] needed to be revised, to more clearly explain the
> > >       multiple types of trunking supporting and how the client can be
> > >       made aware of the existing trunking configuration.  In addition
> > >       the last paragraph (exclusive of sub-sections) of that section,
> > >       dealing with server_owner changes, is literally true, it has been
> > >       a source of confusion.  [...]
> > >
> > > nit: the grammar here is weird; I think there's a missing "while" or
> > > similar.

> >
> > >
> >
> >  Anticipate using the following replacement text:
> >
> >   o  The discussion of trunking which appeared in Section 2.10.5 of
> >       RFC5661 [62] needed to be revised, to more clearly explain the
> >       multiple types of trunking supporting and how the client can be
>
> nit: just "trunking support" (not "-ing")?
>

Fixed.


>
> >       made aware of the existing trunking configuration.  In addition,
> >       while the last paragraph (exclusive of sub-sections) of that
> >       section, dealing with server_owner changes, is literally true, it
> >       has been a source of confusion.  Since the existing paragraph can
> >       be read as suggesting that such changes be dealt with non-
> >       disruptively, the issue needs to be clarified in the revised
> >       section, which appears in Section 2.10.5.
>
> Thanks again for going through my giant pile of comments; I hope that you
> think the improvements to the document are worth the time spent.
>

I do.


> -Ben
>