[nfsv4] Benjamin Kaduk's Discuss on draft-ietf-nfsv4-rfc5661sesqui-msns-03: (with DISCUSS and COMMENT)
David Noveck <davenoveck@gmail.com> Thu, 02 January 2020 15:09 UTC
From: David Noveck <davenoveck@gmail.com>
Date: Thu, 02 Jan 2020 10:09:02 -0500
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: The IESG <iesg@ietf.org>, draft-ietf-nfsv4-rfc5661sesqui-msns@ietf.org, "nfsv4-chairs@ietf.org" <nfsv4-chairs@ietf.org>, Magnus Westerlund <magnus.westerlund@ericsson.com>, NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/dR_kJrRSEQpfndPY393D--ZAm88>
On Wed, Dec 18, 2019 at 3:32 AM Benjamin Kaduk via Datatracker <noreply@ietf.org> wrote:

> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-nfsv4-rfc5661sesqui-msns-03: Discuss
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------

Responded to these on 12/20.

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> I think I may have mistakenly commented on some sections that are
> actually just moved text, since my lookahead window in the diff was too
> small.

No harm, no foul.

> Since the "Updates:" header is part of the immutable RFC text (though
> "Updated by:" is mutable), we should probably explicitly state that "the
> updates that RFCs 8178 and 8434 made to RFC 5661 apply equally to this
> document".

I think we could update the last paragraph of Section 1.1 to be more explicit about this. Perhaps it could read:

   Until the above work is done, there will not be a consistent set of
   documents providing a description of the NFSv4.1 protocol and any
   full description would involve documents updating other documents
   within the specification.  The updates applied by RFC 8434 [66] and
   RFC 8178 [63] to RFC5661 also apply to this specification, and will
   apply to any subsequent v4.1 specification until that work is done.

> I note inline (in what is probably too many places; please don't reply
> at all of them!) some question about how clear the text is that a file
> system migration is something done at a per-file-system granularity, and
> that migrating a client at a time is not possible.

It might be possible, but doing so is not a goal of this specification. I'm not sure how to address your concern. I don't know why anyone would assume that migrating entire clients is a goal of this specification. As far as I can see, when the word "migration" is used, it is always in connection with migrating a file system. Is there some specific place where you think this issue is likely to arise?

> As was the case for
> my Discuss point about addresses/port-numbers, I'm missing the context
> of the rest of the document, so perhaps this is a non-issue, but the
> consequences of getting it wrong seem severe enough that I wanted to
> check.

I'm not seeing any severe consequences. Am I missing something?

> Does a client have any way to know in advance that two addresses will be
> session-trunkable other than the one listed in Section 11.1.1 that "when
> two connections of different connection types are made to the same
> network address and are based on a single file system location entry
> they are always session-trunkable"?

No.

> It seems like mostly we're defining
> the property by saying that the client has to try it and see if it
> works; I'd love to be wrong about that.

We could add an extension to provide easier access to this information, but the goal of this document is to clarify what is possible with the current protocol.

> Section 1.1
>
>    The revised description of the NFS version 4 minor version 1
>    (NFSv4.1) protocol presented in this update is necessary to enable
>    full use of trunking in connection with multi-server namespace
>    features and to enable the use of transparent state migration in
>    connection with NFSv4.1.  [...]
>
> nit: do we expect all readers to know what is meant by "trunking" with
> no other lead-in?

Good point. Perhaps it could be addressed by rewriting the material in the first paragraph of Section 1.1 to read as follows:
   Two important features previously defined in minor version 0 but
   never fully addressed in minor version 1 are trunking, the use of
   multiple connections between a client and server potentially to
   different network addresses, and transparent state migration, which
   allows a file system to be transferred between servers in a way
   that provides for the client to maintain its existing locking state
   across the transfer.  The revised description of the NFS version 4
   minor version 1 (NFSv4.1) protocol presented in this update is
   necessary to enable full use of these features with other
   multi-server namespace features.

   This document is in the form of an updated description of the NFS
   4.1 protocol previously defined in RFC5661 [62].  RFC5661 is
   obsoleted by this document.  However, the update has a limited
   scope and is focused on enabling full use of trunking and
   transparent state migration.  The need for these changes is
   discussed in Appendix A.  Appendix B describes the specific changes
   made to arrive at the current text.

> This limited scope update is applied to the main NFSv4.1 RFC with the
>
> nit: hyphenate "limited-scope"

Will fix.

>    scope as could expected by a full update of the protocol.  Below are
>    some areas which are known to need addressing in a future update of
>    the protocol.
>    [...]
>
> side note: I'd be interested in better understanding the preference for
> the subjunctive verb tense for most of these points ("work would have to
> be done"); my naive expectation would be that since there are plans to
> undertake the work, just "work needs to be done" or "work will be done"
> might suffice.

I think that when these words were written, the wg had not fully embraced the idea that rfc5661bis had to be done. The difficulty/length of rfc7530 was a focus of attention. Since that time, it has become generally accepted that this work is unavoidable and I intend to switch to the future indicative in the next revision.
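As a rough illustration of the trunking concepts discussed in the proposed Section 1.1 text above, the sketch below shows how a client might classify the relationship between two network addresses from their EXCHANGE_ID results. This is an illustrative sketch only, not normative text: the field names mirror the protocol's eir_server_scope and eir_server_owner, the helper names are assumptions, and any claimed trunkability would still need verification before being relied on.

```python
# Illustrative sketch: classifying the trunking relationship between
# two addresses from EXCHANGE_ID results.  Matching server scope and
# server-owner major ID indicate the same server (server-trunkable);
# a matching minor ID as well indicates the addresses may be usable
# within a single session (session-trunkable).  A real client must
# still verify any such claim before trusting it.
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    server_scope: bytes   # eir_server_scope
    so_major_id: bytes    # eir_server_owner.so_major_id
    so_minor_id: int      # eir_server_owner.so_minor_id


def trunking_relation(a: ExchangeIdResult, b: ExchangeIdResult) -> str:
    if a.server_scope != b.server_scope or a.so_major_id != b.so_major_id:
        return "distinct-servers"
    if a.so_minor_id != b.so_minor_id:
        return "server-trunkable"
    return "session-trunkable"
```

The design point here is simply that trunking is a property of pairs of addresses, derived from server-reported identity, rather than something a client can assume from addresses alone.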
> o Work would have to be done with regard to RFC8178 [63] which
>    establishes NFSv4-wide versioning rules.  As RFC5661 is curretly
>    inconsistent with this document, changes are needed in order to
>    arrive at a situation in which there would be no need for RFC8178
>    to update the NFSv4.1 specfication.
>
> nit: s/this document/that document/ -- "this document" is
> draft-ietf-nfsv4-rfc5661sesqui-msns.

Will fix.

> o Work would have to be done with regard to RFC8434 [66], which
>    establishes the requirements for pNFS layout types, which are not
>    clearly defined in RFC5661.  When that work is done and the
>    resulting documents approved, the new NFSv4.1 specfication
>    document will provide a clear set of requirements for layout types
>    and a description of the file layout type that conforms to those
>    requirements.  Other layout types will have their own specfication
>    documents that conforms to those requirements as well.
>
> It's not entirely clear to me that the other layout types need to get
> mentioned in this document; how do they relate to the formal status of
> the "current NFSv4.1 core protocol specification document"?

Other layout types are not specifically mentioned, but it is clear that they will exist and that this document needs to make provision for them, as rfc5661 did. Such documents refer normatively to rfc5661, but rfc5661 (and this document) would only refer to them informatively. There is some text in Section 22.3 about setting up a layout type registry and it refers (informatively) to the defining documents. It may be that incorporation of RFC8434 in rfc5661bis will involve further references to layout type specifications, but I expect that all such references would be informative.

> o Work would have to be done to address many erratas relevant to RFC
>    5661, other than errata 2006 [60], which is addressed in this
>    document.  That errata was not deferrable because of the
>    interaction of the changes suggested in that errata and handling
>    of state and session migration.  The erratas that have been
>    deferred include changes originally suggested by a particular
>    errata, which change consensus decisions made in RFC 5661, which
>    need to be changed to ensure compatibility with existing
>    implementations that do not follow the handling delineated in RFC
>    5661.  Note that it is expected that such erratas will remain
>
> This sentence is pretty long and hard to follow; maybe it could be split
> after "change consensus decisions made in RFC 5661" and the second half
> start with a more declarative statement about existing implementations?
> (E.g., "Existing implementations did not perform handling as delineated
> in RFC 5661 since the procedures therein were not workable, and in order
> to have the specification accurately reflect the existing deployment
> base, changes are needed [...]")

I will clean this bullet up. See below for a proposed replacement.

>    relevant to implementers and the authors of an eventual
>    rfc5661bis, despite the fact that this document, when approved,
>    will obsolete RFC 5661.
>
> (I assume the RFC Editor can tweak this line to reflect what actually
> happens; my understanding is that the errata reports will get cloned to
> this-RFC.)

I understand that Magnus has already got that issue addressed. I'll discuss the appropriate text with him.

> [rant about "errata" vs. "erratum" elided]

This is annoying, but there is no way we are going to get people to use "erratum". What I've tried to do in my proposed replacement text is to refer to "errata report(s)", which is more accurate and allows people who speak English to use English singulars and plurals, without having to worry about Latin grammar.
Here's my proposed replacement for the troubled bullet:

   o  Work needs to be done to address many errata reports relevant to
      RFC 5661, other than errata report 2006 [60], which is addressed
      in this document.  Addressing of that report was not deferrable
      because of the interaction of the changes suggested there and
      the newly described handling of state and session migration.

      The errata reports that have been deferred and that will need to
      be addressed in a later document include reports currently
      assigned a range of statuses in the errata reporting system,
      including reports marked Accepted and those marked Held Over
      because the change was too minor to address immediately.

      In addition, there is a set of other reports, including at least
      one in state Rejected, which will need to be addressed in a
      later document.  This will involve making changes to consensus
      decisions reflected in RFC 5661, in situations in which the
      working group has already decided that the treatment in RFC 5661
      is incorrect and needs to be revised to reflect the working
      group's new consensus and ensure compatibility with existing
      implementations that do not follow the handling described in RFC
      5661.

      Note that it is expected that all such errata reports will
      remain relevant to implementers and the authors of an eventual
      rfc5661bis, despite the fact that this document, when approved,
      will obsolete RFC 5661 [62].

> Section 2.10.4
>
>    Servers each specify a server scope value in the form of an opaque
>    string eir_server_scope returned as part of the results of an
>    EXCHANGE_ID operation.  The purpose of the server scope is to allow a
>    group of servers to indicate to clients that a set of servers sharing
>    the same server scope value has arranged to use compatible values of
>    otherwise opaque identifiers.  Thus, the identifiers generated by two
>    servers within that set can be assumed compatible so that, in some
>    cases, identifiers generated by one server in that set may be
>    presented to another server of the same scope.
>
> Is there more that we can say than "in some cases"?

Not really. In general, when a server sends you an id, it comes with an implied promise to recognize it when you present it subsequently to the same server. The fact that two servers have decided to co-operate in their id assignment does not change that.

> The previous text
> implies a higher level of reliability than just "some cases", to me.

I think I need to change the text, perhaps by replacing "use compatible values of otherwise opaque identifiers" by "use distinct values of otherwise opaque identifiers so that the two servers never assign the same value to two distinct objects". I anticipate the following replacement for the first two paragraphs of Section 2.10.4:

   Servers each specify a server scope value in the form of an opaque
   string eir_server_scope returned as part of the results of an
   EXCHANGE_ID operation.  The purpose of the server scope is to allow
   a group of servers to indicate to clients that a set of servers
   sharing the same server scope value has arranged to use distinct
   values of opaque identifiers so that the two servers never assign
   the same value to two distinct objects.  Thus, the identifiers
   generated by two servers within that set can be assumed compatible
   so that, in certain important cases, identifiers generated by one
   server in that set may be presented to another server of the same
   scope.

   The use of such compatible values does not imply that a value
   generated by one server will always be accepted by another.  In
   most cases, it will not.  However, a server will not accept a value
   generated by another inadvertently.  When it does accept it, it
   will be because it is recognized as valid and carrying the same
   meaning as on another server of the same scope.
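A minimal sketch of the client-side consequence of the proposed text above: identifiers are candidates for presentation to another server only when both servers returned byte-wise identical scope strings, and even then acceptance is not guaranteed. The function and data-structure names here are illustrative assumptions, not anything defined by the protocol.

```python
# Illustrative sketch of the proposed Section 2.10.4 rule: a client
# tracks which server scope minted each identifier and treats an
# identifier as even *potentially* valid only on a server that
# returned an identical eir_server_scope string.  Matching scopes do
# not guarantee acceptance; the only promise is that a server will
# never accept a foreign identifier inadvertently or with a
# different meaning.
def usable_ids(ids_by_scope, target_scope):
    """Return the identifiers that are candidates for use against a
    server that returned target_scope; all others must be discarded
    or re-obtained."""
    return set(ids_by_scope.get(target_scope, set()))
```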
As an illustration of the (limited) value of this information, consider the case of client recovery from a server reboot. The client has to reclaim his locks using file handles returned by the previous server instance. If the server scopes are the same (they almost always are), the client is not sure he will get his locks back (e.g. the file might have been deleted), but he does know that, if the lock reclaim succeeds, it is for the same file. If the server scopes are not the same, he has no such assurance.

> Section 2.10.4
>
> I see the list of identifier types for which same-scope compatibility
> applies got reduced from RFC 5661 to this document, by removing session
> ID, client ID, and state ID values.  For at least one of those I can see
> this making sense as only being workable when the server really is "the
> same server", inline with the improved discussion of migration vs.
> trunking that is a main focus of this document.  Does that
> justification apply to all of them, or are there more reasons involved?

That was involved, but overall we wound up deciding to have the list reflect actual server practice, as opposed to rfc5661, which focused on what it thought servers should do.

> We also remove the text about a client needing to compare server scope
> values during a potential migration event, to determine whether the
> migration preserved state or a reclaim is needed.  I thought this
> scenario would still be possible (and thus still need to be listed),
> though perhaps we are claiming that it is so under-specified so as to be
> never workable in practice?

It's not workable in practice.

> Section 2.10.5
>
>    o  When eir_server_scope changes, the client has no assurance that
>       any id's it obtained previously (e.g. file handles, state ids,
>       client ids) can be validly used on the new server, and, even if
>
> It's interesting to see file handles, state ids, and client ids listed
> together here (nit: also with lowercase "id"), when in the previous
> section we have removed state IDs and client IDs from a list that
> includes all three in RFC 5661.

They shouldn't be. Given the potential interaction with migration, I will be fixing this in the document.

>    o  When eir_server_scope remains the same and
>       eir_server_owner.so_major_id changes, the client can use the
>       filehandles it has, consider its locking state lost, and attempt
>       to reclaim or otherwise re-obtain its locks.  It may find that its
>       file handle IS now stale but if NFS4ERR_STALE is not received, it
>       can proceed to reclaim or otherwise re-obtain its open locking
>       state.
>
> nit(?): this bit about "It may find that its file handle IS now stale
> but if NFS4ERR_STALE is not received" seems to assume some familiarity
> by the reader as to what actions would be performed that would get
> NFS4ERR_STALE back.

No actions needed. I think the last sentence would be better as two sentences:

   It might find out that its file handle is now stale.  However, if
   NFS4ERR_STALE is not returned, the client can proceed to reclaim or
   otherwise re-obtain its open locking state.

> Section 2.10.5.1
>
>    When the server responds using two different connections claim
>    matching or partially matching eir_server_owner, eir_server_scope,
>
> nit: The grammar got wonky here; maybe s/claim/claiming/?

Will fix.

> Section 11.1.1
>
>    In the case of NFS version 4.1 and later minor versions, the means
>    of trunking detection are as described in this document and are
>    available to every client.  Two network addresses connected to the
>    same server are always server-trunkable but cannot necessarily be
>    used together to access a single session.
> nit: we haven't defined "server-trunkable" yet, so it may be worth a
> hint that the definition is coming soon.

I would prefer not using "server-trunkable" before it is defined. I anticipate rewriting this paragraph as follows:

   In the case of NFS version 4.1 and later minor versions, the means
   of trunking detection are as described in this document and are
   available to every client.  Two network addresses connected to the
   same server can always be used together to access a particular
   server but cannot necessarily be used together to access a single
   session.  See below for definitions of the terms "server-trunkable"
   and "session-trunkable".

>    The combination of a server network address and a particular
>    connection type to be used by a connection is referred to as a
>    "server endpoint".  Although using different connection types may
>    result in different ports being used, the use of different ports by
>    multiple connections to the same network address is not the essence
>    of the distinction between the two endpoints used.
>
> There's perhaps a fine line to walk here, as the port can still have
> significant relevance, in general,

I'd prefer not to keep walking a fine line if that can be avoided and intend to allow port specification as indicated in my response to your DISCUSS points. Given that I've done that, I need to make clear that, in this case, I am only referring to differences in ports due to different connection types, i.e. 2049 vs 20049. I anticipate revising this paragraph to read:

   The combination of a server network address and a particular
   connection type to be used by a connection is referred to as a
   "server endpoint".  Although using different connection types may
   result in different ports being used, the use of different ports by
   multiple connections to the same network address in such cases is
   not the essence of the distinction between the two endpoints used.
   This is in contrast to the case in which the explicit specification
   of port numbers within network addresses is used to allow a single
   server node to support multiple NFS servers.

> and we are frequently in the IETF
> told to make no assumption about what is behind specific port values at
> a given network address.

That may be, but there is a lot of client code out there that assumes that you will find an NFS server at port 2049.

> (Consider, for example, a hypothetical virtual
> hosting service that provides "DS-as-a-service" where customers run
> their own MDS that point to configured DSes for actual storage.
> Different ports on that cloud provider would represent entirely
> different customers/servers!)  [This became a discuss point but it
> didn't end up including all the discussion here, so I left it as an
> informational thing; discussion should happen in the Discuss section]

DS's at different ports, although interesting, are not addressed by any of this text. Network addresses for DS's are specified in layouts, which do use XDR structures to define network addresses. I assume port numbers are appropriately dealt with. If they aren't, then this would be an issue for rfc5661bis, in the case of pNFS files, and for other documents in the case of other layout types (e.g. flex files).

> Section 11.1.2
>
>    o  In some cases, a server will have a namespace more extensive than
>       its local namespace by using features associated with attributes
>       that provide file system location information.  These features,
>       which allow construction of a multi-server namespace are all
>
> nit: comma after "multi-server namespace".

Will fix.

>    o  A file system present in a server's pseudo-fs may have multiple
>       file system instances on different servers associated with it.
>       All such instances are considered replicas of one another.
> [Some readers might take this as requiring live read/write replication
> such that all writes to any instance are immediately visible on all
> other instances.

I expect that such readers would have led exceptionally sheltered lives :-)

> The rest of the document ought to disabuse them of
> that notion, and yet...]

It is reasonable to give them more warning. I intend to address this by adding the following to the paragraph:

   Whether such replicas can be used simultaneously is discussed in
   Section 11.11.1, while the level of co-ordination between them
   (important when switching between them) is discussed in Sections
   11.11.2 through 11.11.8 below.

>    o  File system location entries provide the individual file system
>       locations within the file system location attributes.  Each such
>       entry specifies a server, in the form of a host name or IP an
>       address, and an fs name, which designates the location of the file
>
> nit: s/IP an/an IP/.

Will fix.

>       client may establish connections.  There may be multiple endpoints
>       because a host name may map to multiple network addresses and
>       because multiple connection types may be used to communicate with
>       a single network address.  However, all such endpoints MUST
>       provide a way of connecting to a single server.  The exact form of
>
> nit: "MUST provide" feels strange here, since it implies in some sense
> an extra layer of indirection ("A lists X, and X among other things
> provides Y"); would a different word like "indicate" work?

I think "designate" would work. Will use that in the next revision.

>       element derives from a corresponding location entry.  When a
>       location entry specifies an IP address there is only a single
>       corresponding location element.  File system location entries that
>       contain a host name are resolved using DNS, and may result in one
>       or more location elements.  All location elements consist of a
>       location address which is the IP address of an interface to a
>       server and an fs name which is the location of the file system
>       within the server's local namespace.  The fs name can be empty if
>
> I can't decide whether both instances of "IP address" are pedantically
> correct, in the presence of the potential for port information to be
> included/available.  The former is probably okay, but the latter might
> need some clarification.

I think you can switch from "IP address" to "network address containing an IP address" or from "is an IP address" to "includes an IP address". Will fix.

> Section 11.2
>
>    The fs_locations attribute defined in NFSv4.0 is also a part of
>    NFSv4.1.  This attribute only allows specification of the file system
>    locations where the data corresponding to a given file system may be
>    found.  Servers should make this attribute available whenever
>    fs_locations_info is supported, but client use of fs_locations_info
>    is preferable, as it provides more information.
>
> I think this was probably okay as "SHOULD make this attribute available"
> (as it was in 5661), but don't object to the lowercase version either.

I think "SHOULD" is right. Will fix.

> Section 11.5
>
>    Where a file system had been absent, specification of file system

I expect to change this to say "Where a file system is currently absent".

> I guess I'm probably in the rough on this one (since 5661 had my
> more-preferred language), but it still feels like "had been absent"
> implies that it is no longer absent, i.e., that it is now present or has
> otherwise changed.  What's going on here with referrals is more like a
> "was never present" case, though using "never" is of course problematic
> as it's more absolute than is appropriate.

I did later use "never previously present" but will switch that to "not previously present".
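Returning to the location-entry text quoted above, the expansion of a file system location entry into location elements can be sketched as follows. This is a hedged illustration under stated assumptions: the helper name, the use of port 2049, and the tuple representation are all mine, not the protocol's, and a real client would fold connection types and any explicit port information into the result.

```python
# Illustrative sketch: expanding a file system location entry (a host
# name or literal IP address, plus an fs name) into location elements.
# A literal address yields exactly one element; a host name is
# resolved via DNS and may yield several, one per resolved address.
import ipaddress
import socket


def location_elements(host, fs_name):
    try:
        ipaddress.ip_address(host)      # literal IP address?
        return [(host, fs_name)]        # single location element
    except ValueError:
        pass                            # not a literal; resolve via DNS
    infos = socket.getaddrinfo(host, 2049, proto=socket.IPPROTO_TCP)
    # Each resolved address becomes a (location address, fs name) pair.
    return [(info[4][0], fs_name) for info in infos]
```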
> If we're going to talk about "pure referral"s, do we want to make
> mention of or otherwise differentiate/characterize "non-pure"
> ("impure"?) referrals?

I'd prefer not to do that.

> Section 11.5.1
>
>    In order to simplify client handling and allow the best choice of
>    replicas to access, the server should adhere to the following
>    guidelines.
>
> Just to check: these are just informal "guidelines" and not something
> that a server SHOULD or even MUST adhere to?

They are informal. It would be nice to go farther, but we can't, as it raises compatibility issues to suddenly tell a server that what it had been doing is now illegal/strongly disfavored.

> Section 11.5.2
>
>    Locations entries used to discover candidate addresses for use in
>
> nit(?): is this supposed to just be "Location" singular

Yes. Will fix.

> Section 11.5.3
>
>    Irrespective of the particular attribute used, when there is no
>    indication that a step-up operation can be performed, a client
>    supporting RDMA operation can establish a new RDMA connection and it
>    can be bound to the session already established by the TCP
>    connection, allowing the TCP connection to be dropped and the session
>    converted to further use in RDMA node.
>
> Should we say something to make this contingent on the server also
> supporting RDMA?

I will add ", if the server supports that" to the sentence.

> Section 11.5.5
>
>    will typically use the first one provided.  If that is inaccessible
>    for some reason, later ones can be used.  In such cases the client
>    might consider that the transition to the new replica as a migration
>    event, even though some of the servers involved might not be aware of
>    the use of the server which was inaccessible.  In such a case, a
>
> nit: the grammar here got wonky; maybe s/as a/is a/?

How about s/as a/to be a/ ?

> Section ??
>
> The old (RFC 5661) Section 11.5 mentioned several things, and I'd like
> to check that we have either covered or disavowed all of them.
> No disavowal, as far as I know. > My current understanding is that: > > The first paragraph basically talked about trunking detection, and is > covered elsewhere. > Yes. > The second paragraph talks about something that I would call "implicit > replication" with the 5661 definition of "replica", but in the new model > is essentially definitionally true, since we consider all addresses for the same server to be ... part of the same server, so of course that > server's namespaces match up. We've disavowed the description although not the reality described. > Though perhaps the discussion about not > all of the cartesian product of (addresses-for-server, local path) being > listed is still worth having? > I think this is already discussed (in two places in the next revision). > > The third paragraph basically talks about the need for trunking > detection, and includes some guidance to clients about assuming server > misconfiguration that seems of questionable merit. > Agree. I think the treatment of misconfiguration is primarily in 2.10.5. Not sure what else to add. > Section 11.5.7 > > o Deletions from the list of network addresses for the current file > system instance need not be acted on immediately, although the > client might need to be prepared for a shift in access whenever > the server indicates that a network access path is not usable to > access the current file system, by returning NFS4ERR_MOVED. > > I think this should be wordsmithed a bit more, as (IIUC) the idea here > is that if a client notices in a location response that the address the > client is currently using for a filesystem has disappeared from the > list, the client should be prepared for imminent changes in server > behavior relating to the presumed-move. 
Those imminent changes would > most likely be reflected in the form of the server returning > NFS4ERR_MOVED, but there is no NFS4ERR_MOVED involved in the actual > deletion from the list of network instances of the current system > instance.
> Yes. If there is a deletion, there need not be any MOVED error. However, if you get a MOVED error, then there has to be a deletion of your address from the list. I anticipate replacing this bullet by the following:

o Deletions from the list of network addresses for the current file system instance do not need to be acted on immediately by ceasing use of existing access paths, although new connections are not to be established on addresses that have been deleted. However, clients can choose to act on such deletions by making preparations for an eventual shift in access, which would become unavoidable as soon as the server indicates that a particular network access path is not usable to access the current file system, by returning NFS4ERR_MOVED.

> Section 11.6 > > corresponding attribute is interrogated subsequently. In the case of > a multi-server namespace, that same promise applies even if server > boundaries have been crossed. Similarly, when the owner attribute of > a file is derived from the securiy principal which created the file, > that attribute should have the same value even if the interrogation > occurs on a different server from the file creation. > > I can see how the interrogation would be on a different server from file > creation for "simple" replication scenarios, but I'm not sure I'm seeing > how non-replication cases would arise, particularly ones that cross server > boundaries in a multi-server (hierarchical?) namespace. Am I missing > something obvious? > I suppose non-replication cases cannot arise. nit: s/securiy/security/ > Will fix.
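The deletion-handling behavior in the replacement bullet above reduces to simple client-side bookkeeping. The following sketch is illustrative only (the function and variable names are hypothetical, not from the draft): existing paths may continue, no new connections go to deleted addresses, and the client prepares for NFS4ERR_MOVED if its active address was deleted.

```python
# Illustrative sketch (not draft text) of the Section 11.5.7 guidance:
# deletions from a file system's location list need not force an immediate
# switch, but new connections must not be made to deleted addresses, and
# the client should prepare for NFS4ERR_MOVED on a deleted active address.

def refresh_locations(known_addrs, latest_location_list, active_addr):
    """Return (addresses usable for new connections, prepare_for_move flag)."""
    usable_for_new = set(latest_location_list)
    deleted = set(known_addrs) - usable_for_new
    # Existing access paths may continue for now, but a shift becomes
    # unavoidable once the server starts returning NFS4ERR_MOVED.
    prepare_for_move = active_addr in deleted
    return sorted(usable_for_new), prepare_for_move
```

For example, a client actively using an address that has vanished from a freshly fetched location list would keep its connection open for the moment while getting ready to transition.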
> > o All servers support a common set of domains which includes all of > the domains clients use and expect to see returned as the domain > portion of an owner or group in the form "id@domain". Note that > although this set most ofen consists of a single domain, it is > possible for mutiple domains to be supported. > > I a little bit wonder if the "most often" still holds when client > principals come from an AD forest. > "Most often" describes the reality that exists. The protocol allows multiple domains, but the big obstacle to multi-domain support is the on-disk file systems that support only 32-bit uids and gids, as well as the vfs's which have the same restriction. :-(

> > o All servers recognize the same set of security principals, and > each principal, the same credential are required, independent of > the server being accessed. In addition, the group membership for > > nit: I think there's a missing word here, maybe "and for each > principal"? > Yes, but I needed to rewrite the bullet as follows:

o All servers recognize the same set of security principals. For each principal, the same credential is required, independent of the server being accessed. In addition, the group membership for each such principal is to be the same, independent of the server accessed.

> > Note that there is no requirment that the users corresponding to > > nit: "requirement" > Will fix.

> > o The "local" representation of all owners and groups must be the > same on all servers. The word "local" is used here since that is > the way that numeric user and group ids are described in > Section 5.9. However, when AUTH_SYS or stringified owners or > group are used, these identifiers are not truly local, since they > are known tothe clients as well as the server. > > I am trying to find a way to note that the AUTH_SYS case mentioned here > is precisely because of the requirement being imposed by this bullet > point, Not sure what you mean by that.
I think the requirement is to allow the client to be able to use AUTH_SYS, without the contortions that would be required if different fs's had the same uids meaning different things. while acknowledging that the "stringified owners or group" case > is separate, but not having much luck. > My attempt to revise this area is below:

Note that there is no requirement in general that the users corresponding to particular security principals have the same local representation on each server, even though it is most often the case that this is so.

When AUTH_SYS is used, the following additional requirements must be met:

o Only a single NFSv4 domain can be supported through use of AUTH_SYS.

o The "local" representation of all owners and groups must be the same on all servers. The word "local" is used here since that is the way that numeric user and group ids are described in Section 5.9. However, when AUTH_SYS or stringified numeric owners or groups are used, these identifiers are not truly local, since they are known to the clients as well as the server.

Similarly, when stringified numeric user and group ids are used, the "local" representation of all owners and groups must be the same on all servers, even when AUTH_SYS is not used.

Also, nit: "to the" > Fixed.

> Section 11.9 > > o When use of a particular address is to cease and there is also one > currently in use which is server-trunkable with it, requests that > would have been issued on the address whose use is to be > discontinued can be issued on the remaining address(es). When an > address is not a session-trunkable one, the request might need to > be modified to reflect the fact that a different session will be > used. > > I suggest writing this as "when an address is server-trunkable but not > session-trunkable, > OK. Will fix.
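The Section 11.9 bullet just discussed can be sketched as a small decision procedure. This is a hypothetical illustration, not draft text; `session_trunkable` stands in for whatever the client has learned from EXCHANGE_ID/trunking detection and is not a real API.

```python
# Hypothetical sketch of the Section 11.9 guidance: when use of an address
# ceases, requests move to a remaining server-trunkable address; the same
# session can be kept only if that address is session-trunkable with the
# old one; otherwise requests must be modified for a different session.

def plan_reissue(old_addr, replacement_addrs, session_trunkable):
    """Pick a replacement address and report whether the session is reusable."""
    for addr in replacement_addrs:
        if session_trunkable(old_addr, addr):
            return addr, True   # reissue requests unchanged on the same session
    if replacement_addrs:
        # Server-trunkable but not session-trunkable: new session needed.
        return replacement_addrs[0], False
    return None, False
```

The design point here is simply that session reuse is a property of the address pair, so the client checks trunkability before deciding how to reissue.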
> > o When use of a particular connection is to cease, as indicated by > receiving NFS4ERR_MOVED when using that connection but that > address is still indicated as accessible according to the > appropriate file system location entries, it is likely that > requests can be issued on a new connection of a different > connection type, once that connection is established. Since any > two server endpoints that share a network address are inherently > session-trunkable, the client can use BIND_CONN_TO_SESSION to > access the existing session using the new connection and proceed > to access the file system using the new connection. > > I'm not entirely sure how "inherent" this is (in the vein of my Discuss > point, and what we mean by "network address"). > Will need to address for specified ports, given that these are now allowed. Part of this will be new text in the definition of "server endpoints":

The combination of a server network address and a particular connection type to be used by a connection is referred to as a "server endpoint". Although using different connection types may result in different ports being used, the use of different ports by multiple connections to the same network address in such cases is not the essence of the distinction between the two endpoints used. This is in contrast to the case of port-specific endpoints, in which the explicit specification of port numbers within network addresses is used to allow a single server node to support multiple NFS servers.

Also need to revise, as follows, the paragraph containing "inherently":

o When use of a particular connection is to cease, as indicated by receiving NFS4ERR_MOVED when using that connection but that address is still indicated as accessible according to the appropriate file system location entries, it is likely that requests can be issued on a new connection of a different connection type, once that connection is established.
Since any two, non-port-specific server endpoints that share a network address are inherently session-trunkable, the client can use BIND_CONN_TO_SESSION to access the existing session using the new connection and proceed to access the file system using the new connection.

> > o When there are no potential replacement addresses in use but there > > What is a "replacement address"? > I've explained that in some new text added before these bullets, as a new second paragraph of this section:

The appropriate action depends on the set of replacement addresses (i.e. server endpoints which are server-trunkable with one previously being used) which are available for use.

> > are valid addresses session-trunkable with the one whose use is to > be discontinued, the client can use BIND_CONN_TO_SESSION to access > the existing session using the new address. Although the target > session will generally be accessible, there may be cases in which > that session is no longer accessible. In this case, the client > can create a new session to enable continued access to the > existing instance and provide for use of existing filehandles, > stateids, and client ids while providing continuity of locking > state. > > I'm not sure I understand this last sentence. On its own, the "new > session to enable continued access to the existing instance" sounds like > the continued access would be on the address whose use is to cease, and > thus the new session would be there. That is not the intention. Will need to clarify. > But why make a new session when > the old one is still good, It isn't usable on the new connection. > especially when we just said in the previous > sentence that the old session can't be moved to the new > connection/address? > Because we can't use it on the new connection, we have to create a new session to continue access. Perhaps a forward reference down to Section 11.12.{4,5} for this and the > next bullet point would help as well as rewording?
> It turns out these would add confusion, since they deal with migration situations and deciding whether transparent state migration has occurred in the switch between replicas. In the cases we are dealing with, there is only a single replica/fs and no migration. Here is my proposed replacement text for the two bullets in question:

o When there are no potential replacement addresses in use but there are valid addresses session-trunkable with the one whose use is to be discontinued, the client can use BIND_CONN_TO_SESSION to access the existing session using the new address. Although the target session will generally be accessible, there may be rare situations in which that session is no longer accessible when an attempt is made to bind the new connection to it. In this case, the client can create a new session to enable continued access to the existing instance and provide for use of existing filehandles, stateids, and client ids while providing continuity of locking state.

o When there is no potential replacement address in use and there are no valid addresses session-trunkable with the one whose use is to be discontinued, other server-trunkable addresses may be used to provide continued access. Although use of CREATE_SESSION is available to provide continued access to the existing instance, servers have the option of providing continued access to the existing session through the new network access path in a fashion similar to that provided by session migration (see Section 11.12). To take advantage of this possibility, clients can perform an initial BIND_CONN_TO_SESSION, as in the previous case, and use CREATE_SESSION only if that fails.

> Section 11.10.6 > > In a file system transition, the two file systems might be clustered > in the handling of unstably written data. When this is the case, and > > What does "clustered in the handling of unstably written data" mean?
> > the two file systems belong to the same write-verifier class, write > > How is the client supposed to determine "when this is the case"? > Here's a proposed replacement for this paragraph:

In a file system transition, the two file systems might be cooperating in the handling of unstably written data. Clients can determine if this is the case by seeing if the two file systems belong to the same write-verifier class. When this is the case, write verifiers returned from one system may be compared to those returned by the other and superfluous writes avoided.

> Section 11.10.7 > > In a file system transition, the two file systems might be consistent > in their handling of READDIR cookies and verifiers. When this is the > case, and the two file systems belong to the same readdir class, > > As above, how is the client supposed to determine "when this is the > case"? > READDIR cookies and verifiers from one system may be recognized by > the other and READDIR operations started on one server may be validly > continued on the other, simply by presenting the cookie and verifier > returned by a READDIR operation done on the first file system to the > second. > > Are these "may be"s supposed to admit the possibility that the > destination server can just decide to not honor them arbitrarily? > No. They are intended to indicate that the client might or might not use the capability. Here is proposed replacement text for the paragraph:

In a file system transition, the two file systems might be consistent in their handling of READDIR cookies and verifiers. Clients can determine if this is the case by seeing if the two file systems belong to the same readdir class.
When this is the case, READDIR cookies and verifiers from one system will be recognized by the other and READDIR operations started on one server can be validly continued on the other, simply by presenting the cookie and verifier returned by a READDIR operation done on the first file system to the second.

> Section 11.10.8 > > the degree indicated by the fs_locations_info attribute). However, > when multiple file systems are presented as replicas of one another, > the precise relationship between the data of one and the data of > another is not, as a general matter, specified by the NFSv4.1 > protocol. It is quite possible to present as replicas file systems > where the data of those file systems is sufficiently different that > some applications have problems dealing with the transition between > replicas. The namespace will typically be constructed so that > applications can choose an appropriate level of support, so that in > one position in the namespace a varied set of replicas will be > listed, while in another only those that are up-to-date may be > considered replicas. [...] > > This seems quite wishy-washy for a standards-track protocol! We give no > hard bounds on how "different" replicas may be, no protocol element to > convey even a qualitative sense of where on the spectrum of replication > fidelity a replica may lie, and no indication as to how the namespace > might be constructed to indicate a level of support. > I agree but feel that fixing this situation gets us beyond rfc5661sesqui or even rfc5661bis, but it would be possible to extend the fs_location_info attribute to accommodate this sort of information. This would take the form of an extension to NFSv4.2.

> > The protocol does define three special cases of > the relationship among replicas to be specified by the server and > relied upon by clients: > > I'd like to hear from the rest of the IESG, but we may need to consider > limiting "replication" to just these special cases until we can be more > precise about the other cases.
> The troublesome problem is that this terminology (and the associated wishy-washiness) has been used for multiple minor versions. That makes it hard to change now.

> > o When multiple replicas exist and are used simultaneously by a > client (see the FSLIB4_CLSIMUL definition within > fs_locations_info), they must designate the same data. Where file > systems are writable, a change made on one instance must be > visible on all instances, immediately upon the earlier of the > return of the modifying requester or the visibility of that change > on any of the associated replicas. This allows a client to use > > Hmm, how would this "earlier of [...]" work when there are three > nominally equivalent machines? Assume the RPC is made to A, and the > other two are B and C. If the update first goes visible on B, it must > also be visible on C, instilling what is apparently a hard requirement > for exact synchronization between B and C, perhaps by some sort of > negotiated "make visible at timestamp X" mechanism. But if the RPC > returns from A first, then the change still has to be visible on B and C > at the same time. Does this phrasing give any weaker a requirement than > "must be visible on all machines at the same time", in practice? (There > are, of course, various distributed-consensus protocols that can do > this, as could a scenario where all NFS servers are connected to a > common file store backend.) > Actually it leads to a stronger requirement and I'm not sure that's right. The current text would not allow A to store the new data in non-volatile memory, but delay propagation until one of A, B, C is asked for the data. Will switch to use your suggestion. I anticipate using the following text for this bullet:

o When multiple replicas exist and are used simultaneously by a client (see the FSLIB4_CLSIMUL definition within fs_locations_info), they must designate the same data.
Where file systems are writable, a change made on one instance must be visible on all instances at the same time, regardless of whether the interrogated instance is the one on which the modification was done. This allows a client to use these replicas simultaneously without any special adaptation to the fact that there are multiple replicas, beyond adapting to the fact that locks obtained on one replica are maintained separately (i.e. under a different client ID). In this case, locks (whether share reservations or byte-range locks) and delegations obtained on one replica are immediately reflected on all replicas, in the sense that access from all other servers is prevented regardless of the replica used. However, because the servers are not required to treat two associated client IDs as representing the same client, it is best to access each file using only a single client ID.

> Section 11.10.9 > > When access is transferred between replicas, clients need to be > assured that the actions disallowed by holding these locks cannot > > To check my understanding: this "access is transferred" means *all* > clients' access (not just one particular client)? Otherwise I'm not > sure how the destination would know to enforce the grace period. > It doesn't imply that. Even if all clients of the source fs were transferred, there will be clients of the destination fs who have not been transferred. As far as grace periods go, co-operating servers could transfer locks transparently or limit a grace period to a single transferred client. The following anticipated text for Section 11.11.9 may be helpful:

When access is transferred between replicas, clients need to be assured that the actions disallowed by holding these locks cannot have occurred during the transition. This can be ensured by the methods below. Unless at least one of these is implemented, clients will not be assured of continuity of lock possession across a migration event.
o Providing the client an opportunity to re-obtain its locks via a per-fs grace period on the destination server, denying all clients using the destination filesystem the opportunity to obtain new locks that conflict with those held by the transferred client as long as that client has not completed its per-fs grace period. Because the lock reclaim mechanism was originally defined to support server reboot, it implicitly assumes that file handles will, upon reclaim, be the same as those at open. In the case of migration, this requires that source and destination servers use the same filehandles, as evidenced by using the same server scope (see Section 2.10.4) or by showing this agreement using fs_locations_info (see Section 11.11.2 above).

Note that such a grace period can be implemented without interfering with the ability of non-transferred clients to obtain new locks while it is going on. As long as the destination server is aware of the transferred locks, it can distinguish requests to obtain new locks that conflict with existing locks from those that do not, allowing it to treat such client requests without reference to the ongoing grace period.

> Section 11.11.1 > > I think the last two paragraphs might be duplicating some things > mentioned earlier in the section, but the repetition is probably not > harmful.

> Section 11.12.1 > > Because of the absence of NFSV4ERR_LEASE_MOVED, it is possible for > file systems whose access path has not changed to be successfully > > It might be worth phrasing this as "SEQ4_STATUS_LEASE_MOVED is not an > error condition". > Will do.

> Section 11.12.2 > > o No action needs to be taken for such indications received by the > those performing migration discovery, since continuation of that > work will address the issue. > > nit: "by the those" is not right, but the proper fix eludes me, as this > bullet point needs to be more specific somehow than the next one.
I expect to use the following replacement:

o No action needs to be taken for such indications received by any threads performing migration discovery, since continuation of that work will address the issue.

> > o If the fs_status attribute indicates that the file system is a > migrated one (i.e. fss_absent is true and fss_type != > STATUS4_REFERRAL) and thus that it is likely that the fetch of the > file system location attribute has cleared one the file systems > contributing to the lease-migrated indication. > > This looks like a sentence fragment -- it's of the form "If X, and thus > Y." with no concluding clause. > It is. Will fix by using the following replacement:

o If the fs_status attribute indicates that the file system is a migrated one (i.e. fss_absent is true and fss_type != STATUS4_REFERRAL) then a migrated file system has been found. In this situation, it is likely that the fetch of the file system location attribute has cleared one of the file systems contributing to the lease-migrated indication.

> Section 11.12.4 > > Once the client has determined the initial migration status, and > determined that there was a shift to a new server, it needs to re- > establish its locking state, if possible. To enable this to happen > without loss of the guarantees normally provided by locking, the > destination server needs to implement a per-fs grace period in all > cases in which lock state was lost, including those in which > Transparent State Migration was not implemented. > > Similarly to above, does this imply that the migration has to happen for > all clients concurrently, as opposed to clients getting migrated in > sequence? > No, it doesn't. The following replacement text should help make this clear:

Once the client has determined the initial migration status, and determined that there was a shift to a new server, it needs to re-establish its locking state, if possible.
To enable this to happen without loss of the guarantees normally provided by locking, the destination server needs to implement a per-fs grace period in all cases in which lock state was lost, including those in which Transparent State Migration was not implemented. Each client for which there was a shift of locking state to the new server will have the length of the grace period to reclaim its locks, from the time its locks were transferred.

> Section 11.3.1 > > In this case, destination server need have no knowledge of the locks > > nit: singular/plural mismatch "destination server"/"need" > "needs have" would not work as "need" was being used as an (undeclinable) auxiliary verb. Went with "does not need any knowledge". Also clarified stuff about grace period with the following replacement text:

In this case, the destination server does not need any knowledge of the locks held on the source server, but relies on the clients to accurately report (via reclaim operations) the locks previously held, not allowing new locks to be granted on the migrated file system until the grace period expires. Note that the disallowing of new locks applies to all clients accessing the file system, while grace period expiration occurs for each migrated client independently.

> Section 11.13.3 > > o Not responding with NFS4ERR_SEQ_MISORDERED for the initial request > on a slot within a transferred session, since the destination > > Does this then translate to "process as usual in the absence of > migration"? "Don't return error X" tells me what not to do, but doesn't > really tell me what to do instead. > Intend to use the following replacement text:

o Not responding with NFS4ERR_SEQ_MISORDERED for the initial request on a slot within a transferred session, since the destination server cannot be aware of requests made by the client after the server handoff but before the client became aware of the shift.
In cases in which NFS4ERR_SEQ_MISORDERED would normally have been reported, the request is to be processed normally, as a new request.

> Section 11.16.1 > > With the exception of the transport-flag field (at offset > FSLI4BX_TFLAGS with the fls_info array), all of this data applies to > the replica specified by the entry, rather that the specific network > path used to access it. > > Is it clear that this applies only to the fields defined by this > specification (since, as mentioned later, future extensions must specify > whether they apply to the replica or the entry)? > Intend to use the following replacement text:

With the exception of the transport-flag field (at offset FSLI4BX_TFLAGS with the fls_info array), all of this data defined in this specification applies to the replica specified by the entry, rather than the specific network path used to access it. The classification of data in extensions to this data is discussed below.

> Section 15.1.1.3 > > o When NFS4ERR_DELAY is returned on an operation other than the > first within a request and there has been a non-idempotent > operation processed before the NFS4ERR_DELAY was returned, the > reissued request should avoid the non-idempotent operation. The > request still must use a SEQUENCE operation with either a > different slot id or sequence value from the SEQUENCE in the > original request. Because this is done, there is no way the > replier could avoid spuriously re-executing the non-idempotent > operation since the different SEQUENCE parameters prevent the > requester from recognizing that the non-idempotent operation us > being retried. > > I don't think that this is very clear about the counterfactual scenario > in which the replier is trying to avoid spuriously re-executing the > non-idempotent operation. Is it supposed to be explaining why the > client has to use a different slot or sequence value, because the > replier would reexecute the non-idempotent operation otherwise?
> Expect to use the following replacement text:

o When NFS4ERR_DELAY is returned on an operation other than the first within a request and there has been a non-idempotent operation processed before the NFS4ERR_DELAY was returned, reissuing the request as is normally done would incorrectly cause the re-execution of the non-idempotent operation. To avoid this, the reissued request should avoid the non-idempotent operation. The request still must use a SEQUENCE operation with either a different slot id or sequence value from the SEQUENCE in the original request. Because this is done, there is no way the replier could avoid spuriously re-executing the non-idempotent operation since the different SEQUENCE parameters prevent the requester from recognizing that the non-idempotent operation is being retried.

> Section 18.35.3 > > I a little bit wonder if we want to reaffirm that co_verifier remains > fixed when the client is establishing multiple connections for trunking > usage -- the "incarnation of the client" language here could make a > reader wonder, though I think the discussion of its use elsewhere as > relating to "client restart" is sufficiently clear. > This should be made clearer, but the clarification needs to be done in multiple places. Possible replacement text for the eighth non-code paragraph of Section 2.4:

The first field, co_verifier, is a client incarnation verifier, allowing the server to distinguish successive incarnations (e.g. reboots) of the same client. The server will start the process of canceling the client's leased state if co_verifier is different than what the server has previously recorded for the identified client (as specified in the co_ownerid field).

Likely replacement text for the seventh paragraph of this section:

The eia_clientowner field is composed of a co_verifier field and a co_ownerid string. As noted in Section 2.4, the co_ownerid describes the client, and the co_verifier is the incarnation of the client.
An EXCHANGE_ID sent with a new incarnation of the client will lead to the server removing lock state of the old incarnation. In contrast, an EXCHANGE_ID sent with the current incarnation and co_ownerid will result in an error, an update of the client ID's properties, or the return of information about the existing client_id (as might happen when this operation is done to the same server using different network addresses as part of creating trunked connections), depending on the arguments to EXCHANGE_ID.

> The eia_clientowner field is composed of a co_verifier field and a > co_ownerid string. As noted in s Section 2.4, the co_ownerid > > s/s // > Will fix.

> Section 18.51.4 > > o When a server might become the destination for a file system being > migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more > concerning. In the case in which the file system designated is > not within a per-fs grace period, the per-fs RECLAIM_COMPLETE > SHOULD be ignored, with the negative consequences of accepting it > being limited, as in the case in which migration is not supported. > However, if the server encounters a file system undergoing > migration, the operation cannot be accepted as if it were a global > RECLAIM_COMPLETE without invalidating its intended use. > > This seems to be the only place where we acknowledge that the "misuse" > in question was to "treat rca_one_fs of TRUE as if it was FALSE", which > is probably not so great for clarity. > Addressed in the following revised paragraph in Section 18.51.4:

Because previous descriptions of RECLAIM_COMPLETE were not sufficiently explicit about the circumstances in which use of RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, there have been cases in which it has been misused by clients who have issued RECLAIM_COMPLETE with rca_one_fs set to TRUE when it should not have been.
There have also been cases in which servers have, in various ways, not responded to such misuse as described above, either ignoring the rca_one_fs setting (treating the operation as a global RECLAIM_COMPLETE) or ignoring the entire operation.

> Section 21 > > Some other topics at least somewhat related to trunking and migration > that we could potentially justify including in the current, > limited-scope, update (as opposed to deferring for a full -bis) include: > Some of these are related to multi-server namespace but not related to security, as far as I can see.

> > - clients that lie about reclaimed locks during a post-migration grace > period > Will address in a number of places: First of all, I intend to add a new paragraph to Section 21, to be placed as the sixth non-bulleted paragraph and to read as follows:

Security considerations for lock reclaim differ between the state reclaim done after server failure (discussed in Section 8.4.2.1.1) and the per-fs state reclaim done in support of migration/replication (discussed in Section 11.11.9.1).

Next is a proposed new section to appear as Section 11.11.9.1:

11.11.9.1. Security Considerations Related to Reclaiming Lock State after File System Transitions

Although it is possible for a client reclaiming state to misrepresent its state, in the same fashion as described in Section 8.4.2.1.1, most implementations providing for such reclamation in the case of file system transitions will have the ability to detect such misrepresentations. This limits the ability of unauthenticated clients to execute denial-of-service attacks in these circumstances. Nevertheless, the rules stated in Section 8.4.2.1.1, regarding principal verification for reclaim requests, apply in this situation as well.

Typically, implementations supporting file system transitions will have extensive information about the locks to be transferred.
   This is because:

   o  Since failure is not involved, there is no need to store locking
      information in persistent storage.

   o  There is no need, as there is in the failure case, to update
      multiple repositories containing locking state to keep them in
      sync. Instead, there is a one-time communication of locking state
      from the source to the destination server.

   o  Providing this information avoids potential interference with
      existing clients using the destination file system, since the
      server need not deny them the ability to obtain new locks during
      the grace period.

   When such detailed locking information, not necessarily including
   the associated stateids, is available:

   o  It is possible to detect reclaim requests that attempt to reclaim
      locks that did not exist before the transfer, rejecting them with
      NFS4ERR_RECLAIM_BAD (Section 15.1.9.4).

   o  It is possible, when dealing with non-reclaim requests, to
      determine whether they conflict with existing locks, eliminating
      the need to return NFS4ERR_GRACE (Section 15.1.9.2) on
      non-reclaim requests.

   It is possible for implementations of grace periods in connection
   with file system transitions not to have detailed locking
   information available at the destination server, in which case the
   security situation is exactly as described in Section 8.4.2.1.1.

I think I should also draw your attention to a revised Section 15.1.9.
This includes some revisions originally done for
draft-ietf-nfsv4-rfc5661-msns-update, which somehow got dropped, as
well as a few that turned up as necessary in writing 11.11.9.1:

15.1.9.  Reclaim Errors

   These errors relate to the process of reclaiming locks after a
   server restart.

15.1.9.1.  NFS4ERR_COMPLETE_ALREADY (Error Code 10054)

   The client previously sent a successful RECLAIM_COMPLETE operation
   specifying the same scope, whether that scope is global or for the
   same file system in the case of a per-fs RECLAIM_COMPLETE. An
   additional RECLAIM_COMPLETE operation is not necessary and results
   in this error.

15.1.9.2.
NFS4ERR_GRACE (Error Code 10013)

   This error is returned when the server was in its recovery or grace
   period, with regard to the file system object for which the lock was
   requested, resulting in a situation in which a non-reclaim locking
   request could not be granted. This can occur because either:

   o  The server does not have sufficient information about locks that
      might potentially be reclaimed to determine whether the lock
      could validly be granted.

   o  The request is made by a client responsible for reclaiming its
      locks that has not yet done the appropriate RECLAIM_COMPLETE
      operation, allowing it to proceed to obtain new locks.

   It should be noted that, in the case of a per-fs grace period, there
   may be clients, i.e., those currently using the destination file
   system, who might be unaware of the circumstances resulting in the
   initiation of the grace period. Such clients need to periodically
   retry the request until the grace period is over, just as other
   clients do.

15.1.9.3.  NFS4ERR_NO_GRACE (Error Code 10033)

   A reclaim of client state was attempted in circumstances in which
   the server cannot guarantee that conflicting state has not been
   provided to another client. This occurs if there is no active grace
   period applying to the file system object for which the request was
   made, if the client making the request has no current role in
   reclaiming locks, or because previous operations have created a
   situation in which the server is not able to determine that a
   reclaim-interfering edge condition does not exist.

15.1.9.4.  NFS4ERR_RECLAIM_BAD (Error Code 10034)

   The server has determined that a reclaim attempted by the client is
   not valid, i.e., the lock specified as being reclaimed could not
   possibly have existed before the server restart or file system
   migration event. A server is not obliged to make this determination
   and will typically rely on the client to only reclaim locks that the
   client was granted prior to restart.
   However, when a server does have reliable information to enable it
   to make this determination, this error indicates that the reclaim
   has been rejected as invalid. This is as opposed to the error
   NFS4ERR_RECLAIM_CONFLICT (see Section 15.1.9.5), where the server
   can only determine that there has been an invalid reclaim but cannot
   determine which request is invalid.

15.1.9.5.  NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)

   The reclaim attempted by the client has encountered a conflict and
   cannot be satisfied. This potentially indicates a misbehaving
   client, although not necessarily the one receiving the error. The
   misbehavior might be on the part of the client that established the
   lock with which this client conflicted. See also Section 15.1.9.4
   for the related error, NFS4ERR_RECLAIM_BAD.

> - how attacker capabilities compare by using a compromised server to
>   give bogus referrals/etc. as opposed to just giving bogus data/etc.

Will address. See the paragraphs to be added to the end of Section 21.

> - an attacker in the network trying to shift client traffic (in terms of
>   what endpoints/connections they use) to overload a server

Will address. See the paragraphs to be added to the end of Section 21.

> - how asynchronous replication can cause clients to repeat
>   non-idempotent actions

Not sure what you are referring to.

> - the potential for state skew and/or data loss if migration events
>   happen in close succession and the client "misses a notification"

Is there a specific problem that needs to be addressed?

> - cases where a filesystem moves and there's no longer anything running
>   at the old network endpoint to return NFS4ERR_MOVED

This seems to me just a recognition that systems sometimes fail. Not
sure specifically what to address.
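The error-selection rules discussed for the reclaim errors above can be summarized as a small decision procedure. The following is a hypothetical sketch, not implementation text from the draft; the function and parameter names (`check_reclaim`, `grace_active`, `transferred_locks`) are invented for illustration:

```python
# Hypothetical sketch of the reclaim-error selection rules described in
# the revised Section 15.1.9. All names are invented for illustration.

NFS4_OK = 0
NFS4ERR_GRACE = 10013
NFS4ERR_NO_GRACE = 10033
NFS4ERR_RECLAIM_BAD = 10034

def check_reclaim(grace_active, transferred_locks, requested_lock):
    """Decide the outcome of a reclaim request against one file system.

    grace_active      -- whether a (possibly per-fs) grace period applies
    transferred_locks -- detailed lock info from the source server, or
                         None when no such information is available
    requested_lock    -- the lock the client claims to have held
    """
    if not grace_active:
        # No active grace period applies to this file system object.
        return NFS4ERR_NO_GRACE
    if transferred_locks is not None and requested_lock not in transferred_locks:
        # Detailed information lets the server reject reclaims of locks
        # that could not have existed before the transfer.
        return NFS4ERR_RECLAIM_BAD
    # Otherwise the reclaim proceeds (a conflict would surface as
    # NFS4ERR_RECLAIM_CONFLICT, not modeled here).
    return NFS4_OK

def check_non_reclaim(grace_active, transferred_locks, requested_lock):
    """Decide whether a non-reclaim lock request must wait out the grace period."""
    if grace_active and transferred_locks is None:
        # Without detailed lock information the server cannot rule out a
        # conflict with a lock yet to be reclaimed.
        return NFS4ERR_GRACE
    if grace_active and requested_lock in transferred_locks:
        # A known reclaimable lock conflicts with this request.
        return NFS4ERR_GRACE
    return NFS4_OK
```

The sketch makes the trade-off concrete: a destination server holding detailed locking information can return the sharper NFS4ERR_RECLAIM_BAD and avoid blanket NFS4ERR_GRACE responses; without that information it falls back to the behavior of Section 8.4.2.1.1.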
> - what can happen when non-idempotent requests are in a COMPOUND before
>   a request that gets NFS4ERR_MOVED

Intend to address in Section 15.1.2.4:

   The file system that contains the current filehandle object is not
   present at the server, or is not accessible using the network
   address used. It may have been made accessible on a different set of
   network addresses, relocated or migrated to another server, or it
   may have never been present. The client may obtain the new file
   system location by obtaining the "fs_locations" or
   "fs_locations_info" attribute for the current filehandle. For
   further discussion, refer to Section 11.3.

   As with the case of NFS4ERR_DELAY, it is possible that one or more
   non-idempotent operations may have been successfully executed within
   a COMPOUND before NFS4ERR_MOVED is returned. Because of this, once
   the new location is determined, the original request which received
   the NFS4ERR_MOVED should not be re-executed in full. Instead, a new
   COMPOUND, with any successfully executed non-idempotent operations
   removed, should be executed. This new request should have a
   different slot id or sequence in those cases in which the same
   session is used for the new request (i.e., transparent session
   migration or an endpoint transition to a new address
   session-trunkable with the original one).

> - how bad it is if the client messes up at Transparent State Migration
>   discovery, most notably in the case when some lock state is lost

Propose to address this by adding the following paragraph to the end of
Section 11.13.2:

   Lease discovery needs to be provided as described above, in order to
   ensure that migrations are discovered soon enough that leases moved
   to new servers are renewed early enough to avoid lease expiration,
   which would lead to loss of locking state.
   While the consequences of such loss can be ameliorated through
   implementations of courtesy locks, servers are under no obligation
   to do so, and a conflicting lock request may mean that a lock is
   revoked unexpectedly. Clients should be aware of this possibility.

> - the interactions between cached replies and migration(-like) events,
>   though a lot of this is discussed in section 11.13.X and 15.1.1.3
>   already

Will address any specifics that you feel aren't adequately addressed.

> but I defer to the WG as to what to cover now vs. later.
>
> In light of the ongoing work on draft-ietf-nfsv4-rpc-tls, it might be
> reasonable to just talk about "integrity protection" as an abstract
> thing without the specific focus on RPCSEC_GSS's integrity protection
> (or authentication)

I was initially leery of this, but when I looked at the text, I was
able to avoid referring to RPCSEC_GSS in most cases in which integrity
was mentioned :-). The same does not seem possible for
authentication :-(

> being returned. These include cases in which the client is
> directed a server under the control of an attacker, who might get
>
> nit: "directed to"

Fixed.

> o Despite the fact that it is a requirement that "implementations"
>   provide "support" for use of RPCSEC_GSS, it cannot be assumed that
>   use of RPCSEC_GSS is always available between any particular
>   client-server pair.
>
> side note: scare-quotes around "support" makes sense to me, but not
> around "implementations".

Fixed.

> the destination. Even when RPCSEC_GSS authentication is available on
> the destination, the server might validly represent itself as the
> server to which the client was erroneously directed. Without a way
>
> Something about the wording here tickles me funny; at first I thought it
> was the "validly", but now I think it's "represent itself", perhaps
> because that phrasing can have connotations of "falsely represent".
> ("Valid" is fine -- the attack here is the misdirection, and the target
> of the misdirection doesn't have to misbehave at all for it to be a
> damaging attack.) The best remedy I can come up with is a somewhat
> drastic change, and thus questionable: "Even when [...], the server
> might still properly authenticate as the server to which the client was
> erroneously directed."

I think it's a good remedy and I intend to adopt it.

> I'd also consider adding a third bullet point to the final list ("to
> summarize considerations regarding the use of RPCSEC_GSS"):

Actually, it is to summarize considerations regarding its use in
fetching location information.

> % o The integrity protection afforded to results by RPCSEC_GSS protects
> %   only a given request/response transaction;

True, but that is in the nature of fetching location information.

> %   RPCSEC_GSS does not
> %   protect the binding from one server to another as part of a referral
> %   or migration event. The source server must be trusted to provide
> %   the correct information, based on whatever factors are available to
> %   the client.

These are both situations for which RPCSEC_GSS has no solution, but
neither is there another one. It is probably best to just say that
without reference to integrity protection.

I have added new paragraphs after these bullets that may address some
of the issues you were concerned about.

   Even if such requests are not interfered with in flight, it is
   possible for a compromised server to direct the client to use
   inappropriate servers, such as those under the control of the
   attacker. It is not clear that being directed to such servers
   represents a greater threat to the client than the damage that could
   be done by the compromised server itself. However, it is possible
   that some sorts of transient server compromises might be taken
   advantage of to direct a client to a server capable of doing greater
   damage over a longer time.
   One useful step to guard against this possibility is to issue
   requests to fetch location data using RPCSEC_GSS, even if no mapping
   to an RPCSEC_GSS principal is available. In this case, RPCSEC_GSS
   would not be used, as it typically is, to identify the client
   principal to the server, but rather to make sure (via RPCSEC_GSS
   mutual authentication) that the server being contacted is the one
   intended.

   Similar considerations apply if the threat to be avoided is the
   direction of client traffic to inappropriate (i.e., poorly
   performing) servers. In both cases, there is no reason for the
   information returned to depend on the identity of the client
   principal requesting it, while the validity of the server
   information, which has the capability to affect all client
   principals, is of considerable importance.

> Section 22.1
>
> Thank you for thinking about how the IANA considerations should be
> presented in the post-update document. (I think I've had to place at
> least two Discuss positions on bis documents that did not...)
>
> Section 23.2
>
> I'm not sure that all of the moves from Normative to Informative should
> stick; e.g., HMAC (which went from [11] to [59]) is needed for SSV
> calculation. Hmm, actually, maybe that's the only one.

Moved it back to being Normative.

> Appendix B
>
> I have mixed feelings about whether to keep this content for the final
> RFC. (Appendix A seems clearly useful; the specific details of the
> reorganization are less clear, as to some extent they can be deduced
> from the changes themselves. But only to some extent...)
>
> Appendix B.1.2
>
> o The new Sections 11.8 and 11.9 have resulted in existing sections
>   wit these numbers to be renumbered.
>
> s/wit/with/
>
> Section B.2.1
>
> The new treatment can be found in Section 18.35 below. It is
>
> s/below/above/
>
> intended to supersede the treatment in Section 18.35 of RFC5661 [62].
> Publishing a complete replacement for Section 18.35 allows the
> corrected definition to be read as a whole, in place of the one in
> RFC5661 [62].
>
> This seems like it was more appropriate in the scope of
> draft-ietf-nfsv4-mv1-msns-update but could be obsolete here.
>
> Section B.4
>
> o The discussion of trunking which appeared in Section 2.10.5 of
>   RFC5661 [62] needed to be revised, to more clearly explain the
>   multiple types of trunking supporting and how the client can be
>   made aware of the existing trunking configuration. In addition
>   the last paragraph (exclusive of sub-sections) of that section,
>   dealing with server_owner changes, is literally true, it has been
>   a source of confusion. [...]
>
> nit: the grammar here is weird; I think there's a missing "while" or
> similar.

Anticipate using the following replacement text:

   o  The discussion of trunking which appeared in Section 2.10.5 of
      RFC5661 [62] needed to be revised, to more clearly explain the
      multiple types of trunking supported and how the client can be
      made aware of the existing trunking configuration. In addition,
      while the last paragraph (exclusive of sub-sections) of that
      section, dealing with server_owner changes, is literally true, it
      has been a source of confusion. Since the existing paragraph can
      be read as suggesting that such changes be dealt with non-
      disruptively, the issue needs to be clarified in the revised
      section, which appears in Section 2.10.5.
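The retry behavior proposed earlier for Section 15.1.2.4 (re-issuing a COMPOUND after NFS4ERR_MOVED with already-executed non-idempotent operations removed) can be sketched as follows. This is a hypothetical illustration only; the operation names and the `IDEMPOTENT` classification are invented for the sketch, and a real client would also use a different slot id or sequence when reusing the same session:

```python
# Hypothetical sketch of the client retry behavior proposed for
# Section 15.1.2.4: when NFS4ERR_MOVED stops a COMPOUND partway
# through, the retried COMPOUND (sent to the new location) omits any
# non-idempotent operations that already executed successfully.
# Operation names and the IDEMPOTENT set are invented for illustration.

IDEMPOTENT = {"SEQUENCE", "PUTFH", "GETATTR", "READ", "LOOKUP"}

def build_retry_compound(original_ops, results):
    """original_ops -- list of operation names in the failed COMPOUND
    results      -- per-op status strings for the ops that were
                    processed, ending with the op that returned
                    "NFS4ERR_MOVED"; later ops were never attempted
    Returns the operation list for the new COMPOUND."""
    retry = []
    for op, status in zip(original_ops, results):
        if status == "OK" and op not in IDEMPOTENT:
            # Already executed successfully and not safely repeatable:
            # leave it out of the retried request.
            continue
        retry.append(op)
    # Operations after the failure point were never attempted and are
    # carried over unchanged.
    retry.extend(original_ops[len(results):])
    return retry
```

For example, if a COMPOUND of SEQUENCE, PUTFH, CREATE, GETATTR fails with NFS4ERR_MOVED on the GETATTR, the sketch drops only the successfully executed CREATE, re-issuing SEQUENCE, PUTFH, GETATTR at the new location.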