Re: [nfsv4] Benjamin Kaduk's Discuss on draft-ietf-nfsv4-rfc5661sesqui-msns-03: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Mon, 13 January 2020 22:54 UTC

Date: Mon, 13 Jan 2020 14:54:11 -0800
From: Benjamin Kaduk <kaduk@mit.edu>
To: David Noveck <davenoveck@gmail.com>
Cc: The IESG <iesg@ietf.org>, draft-ietf-nfsv4-rfc5661sesqui-msns@ietf.org, "nfsv4-chairs@ietf.org" <nfsv4-chairs@ietf.org>, Magnus Westerlund <magnus.westerlund@ericsson.com>, NFSv4 <nfsv4@ietf.org>
Message-ID: <20200113225411.GI66991@kduck.mit.edu>
References: <157665795217.30033.16985899397047966102.idtracker@ietfa.amsl.com> <CADaq8jegizL79V4yJf8=itMVUYDuHf=-pZgZEh-yqdT30ZdJ5w@mail.gmail.com> <CADaq8jcURAKZsNvs17MhNFT7eBNtkvOdrur5hHY2J1gXH7QdsA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CADaq8jcURAKZsNvs17MhNFT7eBNtkvOdrur5hHY2J1gXH7QdsA@mail.gmail.com>
User-Agent: Mutt/1.12.1 (2019-06-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/1L6svx8igxYBlaRwlNXMfQ1W3pQ>
Subject: Re: [nfsv4] Benjamin Kaduk's Discuss on draft-ietf-nfsv4-rfc5661sesqui-msns-03: (with DISCUSS and COMMENT)
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Jan 2020 22:54:31 -0000

Hi David,

Trimming lots of good stuff here as well...

On Thu, Jan 02, 2020 at 10:09:02AM -0500, David Noveck wrote:
> On Wed, Dec 18, 2019 at 3:32 AM Benjamin Kaduk via Datatracker <
> noreply@ietf.org> wrote:
> 
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-nfsv4-rfc5661sesqui-msns-03: Discuss
> >
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> >
> > Responded to these on 12/20.
> 
> >
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> >
> > I think I may have mistakenly commented on some sections that are
> > actually just moved text, since my lookahead window in the diff was too
> > small.
> 
> 
> No harm, no foul.
> 
> 
> >
> > Since the "Updates:" header is part of the immutable RFC text (though
> > "Updated by:" is mutable), we should probably explicitly state that "the
> > updates that RFCs 8178 and 8434 made to RFC 5661 apply equally to this
> > document".
> >
> 
> I think we could update the last paragraph of Section 1.1 to be more
> explicit about
> this.  Perhaps it could read:
> 
>    Until the above work is done, there will not be a consistent set of
>    documents providing a description of the NFSv4.1 protocol and any
>    full description would involve documents updating other documents
>    within the specification.   The updates applied by RFC 8434 [66] and
>    RFC 8178 [63] to RFC5661 also apply to this specification, and
>    will apply to any subsequent v4.1 specification until that work is done.

Sounds good.

> >
> > I note inline (in what is probably too many places; please don't reply
> > at all of them!) some question about how clear the text is that a file
> > system migration is something done at a per-file-system granularity, and
> > that migrating a client at a time is not possible.
> 
> 
> It might be possible but doing so is not a goal of this specification.
> 
> I'm not sure how to address your concern.   I don't know why anyone would
> assume that migrating entire clients is a goal of this specification.   As
> far as
> I can see, when the word "migration" is used it is always in connection with
> migrating a file system.   Is there some specific place where you think
> this
> issue is likely to arise?

I think I garbled my point; my apologies.
To give a semi-concrete example, suppose I have clients A and B that are
accessing filesystem F on server X, and filesystem F is also available on
server Y.  If X decides that it needs to migrate access to F away from X
(e.g., for maintenance), then the "file system migration event" involves
telling both A and B to look to Y for access to F, at basically the same
time.  If X tries to tell only A but not B to access F via Y but lets B
continue to access F at X, then I think there can be some subtle
consistency issues.

In some sense, this is easy to consider as a dichotomy between "migration
is for server maintenance" vs. "migration is for load balancing".  Assuming
I understand correctly (not a trivial assumption!), there was never any
intent to use these mechanisms for load balancing, and if we can explicitly
disclaim such usage, then we don't have to try to reason through any
potential subtle consistency issues.

> As was the case for
> > my Discuss point about addresses/port-numbers, I'm missing the context
> > of the rest of the document, so perhaps this is a non-issue, but the
> > consequences of getting it wrong seem severe enough that I wanted to
> > check.
> >
> 
> I'm not seeing any severe consequences.   Am I missing something?
> 
> 
[...]
> 
> Section 1.1
> >
> >    The revised description of the NFS version 4 minor version 1
> >    (NFSv4.1) protocol presented in this update is necessary to enable
> >    full use of trunking in connection with multi-server namespace
> >    features and to enable the use of transparent state migration in
> >    connection with NFSv4.1.  [...]
> >
> > nit: do we expect all readers to know what is meant by "trunking" with
> > no other lead-in?
> >
> 
> Good point.  perhaps it could be addressed by rewriting the material in the
> first paragraph of  Section 1.1 to read as follows;.
> 
>    Two important features previously defined in minor version 0 but
>    never fully addressed in minor version 1 are trunking, the use of
>    multiple connections between a client and server potentially to
>    different network addresses, and transparent state migration, which
>    allows a file system to be transferred betwwen servers in a way that
>    provides for the client to maintain its existing locking state across
>    the transfer.

Maybe "the simultaneous use of multiple connections"?
nit: s/betwwen/between/

>    The revised description of the NFS version 4 minor version 1
>    (NFSv4.1) protocol presented in this update is necessary to enable
>    full use of these features with other multi-server namespace features.
>    This document is in the form of an updated description of the NFS 4.1
>    protocol previously defined in RFC5661 [62].  RFC5661 is obsoleted by
>    this document.  However, the update has a limited scope and is focused
>    on enabling full use of trunking and transparent state migration.  The
>    need for these changes is discussed in Appendix A.  Appendix B describes
>    the specific changes made to arrive at the current text.

This looks good, thanks.

[...]
> >
> >    o  Work would have to be done to address many erratas relevant to RFC
> >       5661, other than errata 2006 [60], which is addressed in this
> >       document.  That errata was not deferrable because of the
> >       interaction of the changes suggested in that errata and handling
> >       of state and session migration.  The erratas that have been
> >       deferred include changes originally suggested by a particular
> >       errata, which change consensus decisions made in RFC 5661, which
> >       need to be changed to ensure compatibility with existing
> >       implementations that do not follow the handling delineated in RFC
> >       5661.  Note that it is expected that such erratas will remain
> >
> > This sentence is pretty long and hard to follow; maybe it could be split
> > after "change consensus decisions made in RFC 5661" and the second half
> > start with a more declarative statement about existing implementations?
> > (E.g., "Existing implementations did not perform handling as delineated in
> > RFC
> > 5661 since the procedures therein were not workable, and in order to
> > have the specification accurately reflect the existing deployment base,
> > changes are needed [...]")
> >
> 
> I will clean this bullet up.  See below for a proposed replacement.
> 
> 
> >
> >       relevant to implementers and the authors of an eventual
> >       rfc5661bis, despite the fact that this document, when approved,
> >       will obsolete RFC 5661.
> >
> > (I assume the RFC Editor can tweak this line to reflect what actually
> > happens; my understanding is that the errata reports will get cloned to
> > this-RFC.)
> >
> 
> I understand that Magnus has already got that issue addressed.  I'll
> discuss the appropriate text with him.
> 
> 
> > [rant about "errata" vs. "erratum" elided]
> >
> 
> This is annoying but there is no way we are going to get people to use
> "erratum".   What I've tried to do in my proposed replacement text
> is to refer to "errata report(s)", which is more accurate and allows
> people who speak English to use English singulars and plurals, without
> having to worry about Latin grammar.

That's what I try to do as well :)

> Here's my proposed replacement for the troubled bullet:
> 
>    o  Work needs to be done to address many errata reports relevant to
>       RFC 5661, other than errata report 2006 [60], which is addressed
>       in this document.  Addressing of that report was not deferrable
>       because of the interaction of the changes suggested there and the
>       newly described handling of state and session migration.
> 
>       The errata reports that have been deferred and that will need to
>       be addressed in a later document include reports currently
>       assigned a range of statuses in the errata reporting system
>       including reports marked Accepted and those marked Held Over

nit: it's "Hold For Document Update"

>       because the change was too minor to address immediately.
> 
>       In addition, there is a set of other reports, including at least
>       one in state Rejected, which will need to be addressed in a later
>       document.  This will involve making changes to consensus decisions
>       reflected in RFC 5661, in situations in which the working group has
>       already decided that the treatment in RFC 5661 is incorrect and
>       needs to be revised to reflect the working group's new consensus
>       and ensure compatibility with existing implementations that do not
>       follow the handling described in RFC 5661.
> 
>       Note that it is expected that such all errata reports will remain

nit: s/such all/all such/

>       relevant to implementers and the authors of an eventual
>       rfc5661bis, despite the fact that this document, when approved,
>       will obsolete RFC 5661 [62].

This looks really good!

> 
> > Section 2.10.4
> >
> >    Servers each specify a server scope value in the form of an opaque
> >    string eir_server_scope returned as part of the results of an
> >    EXCHANGE_ID operation.  The purpose of the server scope is to allow a
> >    group of servers to indicate to clients that a set of servers sharing
> >    the same server scope value has arranged to use compatible values of
> >    otherwise opaque identifiers.  Thus, the identifiers generated by two
> >    servers within that set can be assumed compatible so that, in some
> >    cases, identifiers generated by one server in that set may be
> >    presented to another server of the same scope.
> >
> > Is there more that we can say than "in some cases"?
> 
> 
> Not really.  In general, when a server sends you an id, it comes with an
> implied promise to recognize it when you present it subsequently to the
> same server.
> 
> The fact that two servers have decided to co-operate in their Id assignment
> does not change that.
> 
> The previous text
> > implies a higher level of reliability than just "some cases", to me.
> >
> 
> I think I need to change the text, perhaps by replacing "use compatible
> values of otherwise
> opaque identifiers" by "use distinct values of otherwise opaque identifiers
> so that the two
> servers never assign the same value to two distinct objects".
> 
> I anticipate the following replacement for the first two paragraphs of
> Section 2.10.4:
> 
>    Servers each specify a server scope value in the form of an opaque
>    string eir_server_scope returned as part of the results of an
>    EXCHANGE_ID operation.  The purpose of the server scope is to allow a
>    group of servers to indicate to clients that a set of servers sharing
>    the same server scope value has arranged to use distinct values of
>    opaque identifiers so that the two servers never assign the same
>    value to two distinct objects.  Thus, the identifiers generated by two
>    servers within that set can be assumed compatible so that, in certain
>    important cases, identifiers generated by one server in that set may
>    be presented to another server of the same scope.
> 
>    The use of such compatible values does not imply that a value
>    generated by one server will always be accepted by another.  In most
>    cases, it will not.  However, a server will not accept a value
>    generated by another inadvertently.  When it does accept it, it will

nit: I think it flows better to put "inadvertently" as "will not
inadvertently accept".

>    be because it is recognized as valid and carrying the same meaning as
>    on another server of the same scope.
> 
> 
> As an illustration of the (limited) value of this information, consider the
> case of client recovery from a server reboot.  The client has to reclaim
> his locks using file handles returned by the previous server instance.  If
> the server scopes are the same (they almost always are), the client is not
> sure he will get his locks back (e.g. the file might have been deleted),
> but he does know that, if the lock reclaim succeeds, it is for the same
> file.  If the server scopes are not the same, he has no such assurance.

Thanks, the new text (and explanation here) is very clear about what's
going on.
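
To make sure I've internalized it, here's a rough client-side sketch of
the scope check and reboot reclaim you describe (all of the names here
are my own illustrations, not taken from any real NFS client library):

```python
# Purely illustrative sketch of the server-scope check; none of these
# names come from a real NFS implementation.

def can_present_old_ids(old_scope: bytes, new_scope: bytes) -> bool:
    # eir_server_scope is an opaque string: equality is the only
    # meaningful comparison a client may perform on it.
    return old_scope == new_scope

def reclaim_locks(old_scope, new_scope, locks, reclaim_op):
    """Reclaim locks after a server reboot, presenting filehandles from
    the previous server instance only when the scopes match."""
    if not can_present_old_ids(old_scope, new_scope):
        # Different scope: the old filehandles carry no meaning here,
        # so the client has no assurance at all.
        return []
    reclaimed = []
    for lock in locks:
        # A reclaim can still fail per-lock (e.g., the file was
        # deleted), but a success is known to refer to the same file.
        if reclaim_op(lock):
            reclaimed.append(lock)
    return reclaimed
```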

[...]
> > Section 11.5.5
> >
> >    will typically use the first one provided.  If that is inaccessible
> >    for some reason, later ones can be used.  In such cases the client
> >    might consider that the transition to the new replica as a migration
> >    event, even though some of the servers involved might not be aware of
> >    the use of the server which was inaccessible.  In such a case, a
> >
> > nit: the grammar here got wonky; maybe s/as a/is a/?
> >
> 
> How about s/as a/to be a/ ?

That works if you drop the earlier "that", for "the client might consider
the transition to the new replica to be a migration event".

[...]
> >
> >    o  The "local" representation of all owners and groups must be the
> >       same on all servers.  The word "local" is used here since that is
> >       the way that numeric user and group ids are described in
> >       Section 5.9.  However, when AUTH_SYS or stringified owners or
> >       group are used, these identifiers are not truly local, since they
> >       are known tothe clients as well as the server.
> >
> > I am trying to find a way to note that the AUTH_SYS case mentioned here
> > is precisely because of the requirement being imposed by this bullet
> > point,
> 
> 
> Not sure what you mean by that.  I think the requirement is to allow the
> client
> to be able to use AUTH_SYS, without the contortions that would be required
> if
> different fs's had the same uid's meaning different things.
> 
> while acknowledging that the "stringified owners or group" case
> > is separate, but not having much luck.
> >
> 
> My attempt to revise this area is below:
> 
>    Note that there is no requirement in general that the users
>    corresponding to particular security principals have the same local
>    representation on each server, even though it is most often the case
>    that this is so.
> 
>    When AUTH_SYS is used, the following additional requirements must be
>    met:
> 
>    o  Only a single NFSv4 domain can be supported through use of
>       AUTH_SYS.
> 
>    o  The "local" representation of all owners and groups must be the
>       same on all servers.  The word "local" is used here since that is
>       the way that numeric user and group ids are described in
>       Section 5.9.  However, when AUTH_SYS or stringified numeric owners
>       or groups are used, these identifiers are not truly local, since
>       they are known to the clients as well as the server.
> 
>    Similarly, when stringified numeric user and group ids are used, the
>    "local" representation of all owners and groups must be the same on
>    all servers, even when AUTH_SYS is not used.

I really like this rewriting; thank you for undertaking it.
I think that what I was trying to say here is roughly that we need
scare-quotes for "local" because of things like AUTH_SYS (or stringified
user/group ids) that involve sending local representations over the
network.  So your rewrite did in fact address my concern, even though I
didn't manage to say it very well the first time :)

[...]
> >
> >    o  When there are no potential replacement addresses in use but there
> >
> > What is a "replacement address"?
> >
> 
> I've explained that in some new text added before these bullets, as a new
> second
> paragraph of this section:
> 
>    The appropriate action depends on the set of replacement addresses
>    (i.e. server endpoints which are server-trunkable with one previously
>    being used) which are available for use.
> 
> >
> >       are valid addresses session-trunkable with the one whose use is to
> >       be discontinued, the client can use BIND_CONN_TO_SESSION to access
> >       the existing session using the new address.  Although the target
> >       session will generally be accessible, there may be cases in which
> >       that session is no longer accessible.  In this case, the client
> >       can create a new session to enable continued access to the
> >       existing instance and provide for use of existing filehandles,
> >       stateids, and client ids while providing continuity of locking
> >       state.
> >
> > I'm not sure I understand this last sentence.  On its own, the "new
> > session to enable continued access to the existing instance" sounds like
> > the continued access would be on the address whose use is to cease, and
> > thus the new session would be there.
> 
> 
> That is not the intention.  Will need to clarify.
> 
> 
> > But why make a new session when
> > the old one is still good,
> 
> 
> It isn't usable on the new connection.
> 
> 
> > especially when we just said in the previous
> > sentence that the old session can't be moved to the new
> > connection/address?
> >
> 
> Because we can't use it on the new connection, we have to create a
> new session to access  the client.
> 
> Perhaps a forward reference down to Section 11.12.{4,5} for this and the
> > next bullet point would help as well as rewording?
> >
> 
> It turns out these would add confusion since they deal with migration
> situations and deciding whether transparent state migration has occurred
> in the switch between replicas.  In the cases we are dealing with, there
> is only a single replica/fs and no migration.
> 
> Here is my proposed replacement text for the two bullets in question:
> 
>    o  When there are no potential replacement addresses in use but there
>       are valid addresses session-trunkable with the one whose use is to
>       be discontinued, the client can use BIND_CONN_TO_SESSION to access
>       the existing session using the new address.  Although the target
>       session will generally be accessible, there may be rare situations
>       in which that session is no longer accessible, when an attempt is
>       made tto bind the new conntectin to it.  In this case, the client

nits: s/tto/to/, s/conntectin/connection/

>       can create a new session to enable continued access to the
>       existing instance and provide for use of existing filehandles,
>       stateids, and client ids while providing continuity of locking
>       state.

Just to check: this sounds like even in the case where the client creates
a new session, the filehandle, stateid, clientid, and locking state
(values) are in effect "transparently preserved" by the server, so the
client has no need to do any reclamation of locking state.  I think that's
what's intended, but holler if I'm wrong about that.

>    o  When there is no potential replacement address in use and there
>       are no valid addresses session-trunkable with the one whose use is
>       to be discontinued, other server-trunkable addresses may be used
>       to provide continued access.  Although use of CREATE_SESSION is
>       available to provide continued access to the existing instance,
>       servers have the option of providing continued access to the
>       existing session through the new network access path in a fashion
>       similar to that provided by session migration (see Section 11.12).
>       To take advantage of this possibility, clients can perform an
>       initial BIND_CONN_TO_SESSION, as in the previous case, and use
>       CREATE_SESSION only if that fails.
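
To check my reading of the fallback sequence in these two bullets, a
rough sketch (the callables and the exception type here are stand-ins
for the real NFSv4.1 operations, not an actual client API):

```python
# Hypothetical sketch of the BIND_CONN_TO_SESSION-then-CREATE_SESSION
# fallback described above; names are illustrative.

class SessionUnavailable(Exception):
    """Raised when the existing session cannot be bound to the new
    connection (the rare case discussed above)."""

def continue_access(new_addr, session, bind_conn_to_session, create_session):
    try:
        # Preferred path: keep using the existing session, now reached
        # via the replacement address.
        bind_conn_to_session(new_addr, session)
        return session
    except SessionUnavailable:
        # Same server instance, so filehandles, stateids, and the
        # client ID (and hence locking state) remain valid; only the
        # session itself is recreated.
        return create_session(new_addr)
```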
> 
> 
> > Section 11.10.6
> >
> >    In a file system transition, the two file systems might be clustered
> >    in the handling of unstably written data.  When this is the case, and
> >
> > What does "clustered in the handling of unstably written data" mean?
> >
> >    the two file systems belong to the same write-verifier class, write
> >
> > How is the client supposed to determine "when this is the case"?
> >
> 
> Here's a proposed replacement for this paragraph:
> 
>    In a file system transition, the two file systems might be
>    cooperating in the handling of unstably written data.  Clients can
>    determine if this is the case by seeing if the two file systems
>    belong to the same write-verifier class.  When this is the case,
>    write verifiers returned from one system may be compared to those
>    returned by the other and superfluous writes avoided.
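
As a sanity check on how I read that paragraph, something like the
following sketch (FsInfo/Write and the class names are my inventions,
purely illustrative):

```python
from collections import namedtuple

# Illustrative only: these are not real NFS client types.
FsInfo = namedtuple("FsInfo", "write_verifier_class commit_verifier")
Write = namedtuple("Write", "verifier")

def writes_to_resend(old_fs, new_fs, unstable_writes):
    """After a file system transition, return the unstable writes that
    must be re-sent to the destination server."""
    if old_fs.write_verifier_class == new_fs.write_verifier_class:
        # Same class: verifiers from the two servers are comparable, so
        # only writes whose verifier no longer matches need re-sending.
        return [w for w in unstable_writes
                if w.verifier != new_fs.commit_verifier]
    # Different classes: verifiers are not comparable, so all unstable
    # writes must be re-sent (the superfluous writes cannot be avoided).
    return list(unstable_writes)
```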
> 
> 
> > Section 11.10.7
> >
> >    In a file system transition, the two file systems might be consistent
> >    in their handling of READDIR cookies and verifiers.  When this is the
> >    case, and the two file systems belong to the same readdir class,
> >
> > As above, how is the client supposed to determine "when this is the
> > case"?
> 
> 
> >    READDIR cookies and verifiers from one system may be recognized by
> >    the other and READDIR operations started on one server may be validly
> >    continued on the other, simply by presenting the cookie and verifier
> >    returned by a READDIR operation done on the first file system to the
> >    second.
> >
> > Are these "may be"s supposed to admit the possibility that the
> > destination server can just decide to not honor them arbitrarily?
> >
> 
> No. They are intended to indicate that the client might or might not use
> the capability
> 
> Here is proposed replacement text for the paragraph:
> 
>    In a file system transition, the two file systems might be consistent
>    in their handling of READDIR cookies and verifiers.  Clients can
>    determine if this is the case by seeing if the two file systems
>    belong to the same readdit class.  When this is the case,

nit: s/readdit/readdir/

>    READDIR cookies and verifiers from one system will be
>    recognized by the other and READDIR operations started on one server
>    can be validly continued on the other, simply by presenting the
>    cookie and verifier returned

Ah, this formulation (for both write-verifier and readdir) is very helpful.
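In case it helps confirm we're reading it the same way, here's how I'd
sketch the client's decision (the types and the readdir callable are
hypothetical, not a real API):

```python
from collections import namedtuple

# Hypothetical types/callable; only the readdir-class comparison itself
# comes from the text above.
FsInfo = namedtuple("FsInfo", "readdir_class")
ReaddirState = namedtuple("ReaddirState", "cookie verifier")

def continue_readdir(state, old_fs, new_fs, readdir):
    if old_fs.readdir_class == new_fs.readdir_class:
        # Same readdir class: the cookie/verifier from the source
        # server are valid on the destination, so resume mid-stream.
        return readdir(new_fs, state.cookie, state.verifier)
    # Different classes: restart the enumeration from the beginning.
    return readdir(new_fs, 0, b"")
```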

[...]
> > Section 11.16.1
> >
> >    With the exception of the transport-flag field (at offset
> >    FSLI4BX_TFLAGS within the fls_info array), all of this data applies to
> >    the replica specified by the entry, rather than the specific network
> >    path used to access it.
> >
> > Is it clear that this applies only to the fields defined by this
> > specification (since, as mentioned later, future extensions must specify
> > whether they apply to the replica or the entry)?
> >
> 
> Intend to use the following replacement text:
> 
>    With the exception of the transport-flag field (at offset
>    FSLI4BX_TFLAGS with the fls_info array), all of this data defuined in

nit: s/defuined/defined/

>    this specification applies to the replica specified by the entry,
>    rather than the specific network path used to access it.  The
>    classification of data in extensions to this data is discussed
>    below.

[...]
> > Section 18.35.3
> >
> > I a little bit wonder if we want to reaffirm that co_verifier remains
> > fixed when the client is establishing multiple connections for trunking
> > usage -- the "incarnation of the client" language here could make a
> > reader wonder, though I think the discussion of its use elsewhere as
> > relating to "client restart" is sufficiently clear.
> >
> 
> This should be made clearer but the clarification needs to be done in
> multiple places.
> 
> Possible replacement text for eighth non-code paragraph of section 2.4:
> 
>    The first field, co_verifier, is a client incarnation verifier,
>    allowing the server to distinguish successive incarnations (e.g.
>    reboots) of the same client.  The server will start the process of
>    canceling the client's leased state if co_verifier is different than
>    what the server has previously recorded for the identified client (as
>    specified in the co_ownerid field).
> 
> Likely replacement text for the seventh paragraph of this section:
> 
>    The eia_clientowner field is composed of a co_verifier field and a
>    co_ownerid string.  As noted in Section 2.4, the co_ownerid describes
>    the client, and the co_verifier is the incarnation of the client.  An
>    EXCHANGE_ID sent with a new incarnation of the client will lead to
>    the server removing lock state of the old incarnation.  Whereas an
>    EXCHANGE_ID sent with the current incarnation and co_ownerid will
>    result in an error, an update of the client ID's properties,
>    depending on the arguments to EXCHANGE_ID, or the return of
>    information about the existing client_id as might happen when this
>    operation is done to the same server using different network
>    addresses as part of creating trunked connections.

I think I get the general sense of what is going on here (i.e., the last
sentence) but am still uncertain on the specifics.  Namely, "most of the
time" (TM), sending EXCHANGE_ID with current incarnation/ownerid will be an
error, since it's a client bug to try to register the same way twice in a
row.  However, some times we might have to do that in order to update
properties of the client or get some new information that a server has
associated to a given client ID.  I *think* (but am not sure) that the
error case is exactly when the (same-incarnation/ownerid) EXCHANGE_ID is
done to the same *server and address* as the original EXCHANGE_ID, and that
the "update properties or get new information back" case is exactly when
the EXCHANGE_ID is done to a different server/address combination.

If I'm right about that, then I'd suggest:

%    the server removing lock state of the old incarnation.  Whereas an
%    EXCHANGE_ID sent with the current incarnation and co_ownerid will
%    result in an error when sent to a given server at a given address for
%    a second time, it is not an error to send EXCHANGE_ID with current
%    incarnation and co_ownerid to a different server (e.g., as part of a
%    migration event).  In such cases, the EXCHANGE_ID can allow for an
%    update of the client ID's properties, depending on the arguments to
%    EXCHANGE_ID, or the return of (potentially updated) information about
%    the existing client_id, as might happen when this operation is done to
%    the same server using different network addresses as part of creating
%    trunked connections.
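
Put as a (purely illustrative) server-side sketch of how I read the
cases -- the record store and the result strings are my own shorthand,
not RFC-defined names:

```python
# Server-side sketch of the EXCHANGE_ID cases as I read them; "store"
# and the returned strings are illustrative only.

def exchange_id(store, co_ownerid, co_verifier):
    known = store.get(co_ownerid)
    if known is None:
        store[co_ownerid] = co_verifier
        return "new client ID"
    if known != co_verifier:
        # New incarnation (e.g., client reboot): remove the old
        # incarnation's lock state.
        store[co_ownerid] = co_verifier
        return "old state removed; new client ID"
    # Same incarnation and co_ownerid: not inherently an error; this is
    # how the client ID's properties get updated, or how the existing
    # client ID is confirmed when trunked connections are created via
    # other addresses.
    return "existing client ID confirmed/updated"
```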


> > Section 21
> >
> > Some other topics at least somewhat related to trunking and migration
> > that we could potentially justify including in the current,
> > limited-scope, update (as opposed to deferring for a full -bis) include:
> >
> 
> Some of these are related to multi-server namespace but not related to
> security, as far as I can see.

It does look like it; in some sense I was going through a brainstorming
exercise to make this list, and appreciate the sanity checks.  (To be
clear, I am not insisting that any of them get covered in specifically the
sesqui update, just mentioning topics for potential consideration.)
> 
> >
> > - clients that lie about reclaimed locks during a post-migration grace
> >   period
> >
> 
> Will address in a number of places:
> 
> First of all, I intend to add a new paragraph to Section 21, to be
> placed as the sixth non-bulleted paragraph and to read as follows:
> 
>    Security considerations for lock reclaim differ between the state
>    reclaim done after server failure (discussed in Section 8.4.2.1.1)
>    and the per-fs state reclaim done in support of migration/replication
>    (discussed in Section 11.11.9.1).
> 
> Next is a new proposed new section to appear as Section 11.11.9.1:
> 
> 11.11.9.1.  Security Consideration Related to Reclaiming Lock State
>             after File System Transitions
> 
>    Although it is possible for a client reclaiming state to misrepresent
>    its state, in the same fashion as described in Section 8.4.2.1.1,
>    most implementations providing for such reclamation in the case of
>    file system transitions will have the ability to detect such
>    misreprsentations.  this limits the ability of unauthenicatd clients

typos: "misrepresentations", "This", "unauthenticated"

>    to execute denial-of-service attacks in these cirsumstances.

"circumstances"

>    Nevertheless, the rules stated in Section 8.4.2.1.1, regarding
>    principal verification for reclaim requests, apply in this situation
>    as well.
> 
>    Typically,implementations support file system transitions will have

nits: space after comma, and "that" for "that support"

>    extensive information about the locks to be transferred.  This is
>    because:
> 
>    o  Since failure is not involved, there is no need to store locking
>       information in persistent storage.
> 
>    o  There is no need, as there is in the failure case, to update
>       multiple repositories containing locking state to keep them in sync.
>       Instead, there is a one-time communication of locking state from
>       the source to the destination server.
> 
>    o  Providing this information avoids potential interference with
>       existing clients using the destination file system, by denying
>       them the ability to obtain new locks during the grace period.
> 
>    When such detailed locking infornation, not necessarily including the
>    associated stateid,s is available,

nits: "information", s/stateid,s/stateids,/
> 
>    o  It is possible to detect reclaim requests that attempt to reclsim

nit: s/reclsim/reclaim/

>       locks that did not exist before the transfer, rejecting them with
>       NFS4ERR_RECLAIM_BAD (Section 15.1.9.4).
> 
>    o  It is possible, when dealing with non-reclaim requests, to
>       determine whether they conflict with existing locks, eliminating
>       the need to return NFS4ERR_GRACE (Section 15.1.9.2) on non-
>       reclaim requests.
> 
>    It is possible for implementations of grace periods in connection
>    with file system transitions not to have detailed locking information
>    available at the destination server, in which case the security
>    situation is exactly as described in Section 8.4.2.1.1.
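If it helps, the destination server's two checks could be sketched roughly as follows (a toy Python illustration; the error-code values come from RFC 5661, but the function and data layout are invented for this sketch and are not real server code):

```python
# Toy sketch of the destination-server checks described above, assuming
# the full table of locks was transferred from the source server.
# Error-code values are from RFC 5661; everything else is invented.

NFS4_OK = 0
NFS4ERR_DENIED = 10010
NFS4ERR_RECLAIM_BAD = 10034

def check_lock_request(transferred_locks, client, filehandle, is_reclaim):
    """transferred_locks: set of (client, filehandle) pairs moved from
    the source server during the file system transition."""
    if is_reclaim:
        # A reclaim must correspond to a lock that existed before the
        # transfer; otherwise reject it as invalid.
        if (client, filehandle) not in transferred_locks:
            return NFS4ERR_RECLAIM_BAD
        return NFS4_OK
    # Non-reclaim request during the per-fs grace period: because the
    # full lock table is known, there is no need for NFS4ERR_GRACE --
    # grant unless the request conflicts with a transferred lock.
    if any(fh == filehandle for (_, fh) in transferred_locks):
        return NFS4ERR_DENIED
    return NFS4_OK

locks = {("client-A", "fh-1")}
print(check_lock_request(locks, "client-A", "fh-1", True))   # 0 (valid reclaim)
print(check_lock_request(locks, "client-B", "fh-2", True))   # 10034
print(check_lock_request(locks, "client-B", "fh-3", False))  # 0 (no conflict)
```

The point of the sketch is just that with the transferred lock table in hand, both NFS4ERR_RECLAIM_BAD and ordinary conflict checking become possible without falling back to NFS4ERR_GRACE.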
> 
> I think I should also draw your attention to a revised Section 15.1.9.
> This includes some revisions originally done for
> draft-ietf-nfsv4-rfc5661-msns-update, which somehow got dropped, as well
> as a few that turned up as necessary in writing 11.11.9.1:
> 
> 15.1.9.  Reclaim Errors
> 
>    These errors relate to the process of reclaiming locks after a server
>    restart.
> 
> 15.1.9.1.  NFS4ERR_COMPLETE_ALREADY (Error Code 10054)
> 
>    The client previously sent a successful RECLAIM_COMPLETE operation
>    specifying the same scope, whether that scope is global or for the
>    same file system in the case of a per-fs RECLAIM_COMPLETE.  An
>    additional RECLAIM_COMPLETE operation is not necessary and results in
>    this error.
> 
> 15.1.9.2.  NFS4ERR_GRACE (Error Code 10013)
> 
>    This error is returned when the server was in its recovery or grace
>    period. with regard to the file system object for which the lock was

(no full stop)

>    requested resulting in a situation in which a non-reclaim locking
>    request could not be granted.  This can occur because either
> 
>    o  The server does not have sufficient information about locks that
>       might be potentially reclaimed to determine whether the lock could
>       validly be granted.
> 
>    o  The request is made by a client responsible for reclaiming its
>       locks that has not yet done the appropriate RECLAIM_COMPLETE
>       operation, allowing it to proceed to obtain new locks.
> 
>    It should be noted that, in the case of a per-fs grace period, there
>    may be clients, i.e. those currently using the destination file
>    system who might be unaware of the circumstances resulting in the

nit: comma after "file system"

>    initiation of the grace period.  Such clients need to periodically
>    retry the request until the grace period is over, just as other
>    clients do.
> 
> 15.1.9.3.  NFS4ERR_NO_GRACE (Error Code 10033)
> 
>    A reclaim of client state was attempted in circumstances in which the
>    server cannot guarantee that conflicting state has not been provided
>    to another client.  This occurs if there is no active grace period
>    applying to the file system object for which the request was made, if
>    the client making the request has no current role in reclaiming
>    locks, or if previous operations have created a situation in which
>    the server is not able to determine that a reclaim-interfering edge
>    condition does not exist.
> 
> 15.1.9.4.  NFS4ERR_RECLAIM_BAD (Error Code 10034)
> 
>    The server has determined that a reclaim attempted by the client is
>    not valid, i.e. the lock specified as being reclaimed could not
>    possibly have existed before the server restart or file system
>    migration event.  A server is not obliged to make this determination
>    and will typically rely on the client to only reclaim locks that the
>    client was granted prior to restart.  However, when a server does
>    have reliable information to enable it to make this determination, this
>    error indicates that the reclaim has been rejected as invalid.  This
>    is as opposed to the error NFS4ERR_RECLAIM_CONFLICT (see
>    Section 15.1.9.5) where the server can only determine that there has
>    been an invalid reclaim, but cannot determine which request is
>    invalid.
> 
> 15.1.9.5.  NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)
> 
>    The reclaim attempted by the client has encountered a conflict and
>    cannot be satisfied.  This potentially indicates a misbehaving
>    client, although not necessarily the one receiving the error.  The
>    misbehavior might be on the part of the client that established the
>    lock with which this client conflicted.  See also Section 15.1.9.4
>    for the related error, NFS4ERR_RECLAIM_BAD.

Thanks for remembering to fetch these updates from the full bis WIP!

> 
> > - how attacker capabilities compare by using a compromised server to
> >   give bogus referrals/etc. as opposed to just giving bogus data/etc.
> >
> 
> Will address. See the paragraphs to be added to the end of Section 21.
> 
> 
> > - an attacker in the network trying to shift client traffic (in terms of
> >   what endpoints/connections they use) to overload a server
> >
> 
> Will address. See the paragraphs to be added to the end of Section 21.
> 
> 
> > - how asynchronous replication can cause clients to repeat
> >   non-idempotent actions
> >
> 
> Not sure what you are referring to.

I don't have something fully fleshed out here, but it's in the general
space when there are multiple replicas that get updates at (varying)
delays from the underlying write.  A contrived situation would be if you
have a pool of worker machines that use NFS for state management (I know, a
pretty questionable idea), and try to do compare-and-set on a state file.
If one worker tries to assert that it owns the state but other NFS replicas
see delayed updates, additional worker machines could also try to claim the
state and perform whatever operation the state file is controlling.

Basically, the point here is that if you as an NFS consumer are using NFS
with relaxed replication semantics, you have to think through how your
workflow will behave in the presence of such relaxed updates.  Which ought
to be obvious, when I say it like that, but perhaps is not always actually
obvious.
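For concreteness, here is a toy sketch of the race (plain Python, with two objects standing in for asynchronously-replicated NFS replicas; nothing here is real NFS client code, and the names are invented):

```python
# Hypothetical illustration of the compare-and-set race described above.
# "primary" and "stale" stand in for two NFS replicas with asynchronous
# replication; the Replica class is invented for this sketch.

class Replica:
    def __init__(self):
        self.state = {"owner": None}

    def compare_and_set(self, key, expected, new):
        # A naive CAS against this replica's (possibly stale) view.
        if self.state.get(key) == expected:
            self.state[key] = new
            return True
        return False

primary = Replica()
stale = Replica()  # has not yet received the update from primary

# Worker 1 claims the state file via the primary replica.
assert primary.compare_and_set("owner", None, "worker-1")

# Replication is delayed, so the second replica still shows no owner;
# worker 2, reading the stale replica, also believes its claim succeeded.
assert stale.compare_and_set("owner", None, "worker-2")

# Both workers now think they own the state -- the split-brain outcome
# the text above warns about.
print(primary.state["owner"], stale.state["owner"])  # -> worker-1 worker-2
```

Which is just the "think through your workflow under relaxed updates" point in executable form.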

> 
> > - the potential for state skew and/or data loss if migration events
> >   happen in close succession and the client "misses a notification"
> >
> 
> Is there a specific problem that needs to be addressed?

I don't have a concrete scenario that's specific to NFS, no; this is just a
generic possibility for any scheme that involves discrete updates (e.g.,
file-modifying RPCs) and the potential for asynchronous replication.

> 
> > - cases where a filesystem moves and there's no longer anything running
> >   at the old network endpoint to return NFS4ERR_MOVED
> >
> 
> This seems to me just a recognition that sometimes systems fail.  Not sure
> specifically what to address.

Okay.

> > - what can happen when non-idempotent requests are in a COMPOUND before
> >   a request that gets NFS4ERR_MOVED
> >
> 
> Intend to address in Section 15.1.2.4:
> 
>    The file system that contains the current filehandle object is not
>    present at the server, or is not accessible using the network
>    address used.  It may have been made accessible on a different set
>    of network addresses, relocated or migrated to another server, or
>    it may have never been present.  The client may obtain the new file
>    system location by obtaining the "fs_locations" or
>    "fs_locations_info" attribute for the current filehandle.  For
>    further discussion, refer to Section 11.3.
> 
>    As with the case of NFS4ERR_DELAY, it is possible that one or more
>    non-idempotent operations may have been successfully executed within
>    a COMPOUND before NFS4ERR_MOVED is returned.  Because of this, once
>    the new location is determined, the original request which received
>    the NFS4ERR_MOVED should not be re-executed in full.  Instead, a new
>    COMPOUND, with any successfully executed non-idempotent operations
>    removed, should be executed.  This new request should have a
>    different slot id or sequence in those cases in which the same
>    session is used for the new request (i.e. transparent session
>    migration or an

nit: comma after "i.e.".

>    endpoint transition to a new address session-trunkable with the
>    original one).
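To make the retry rule concrete, here is a rough client-side sketch (the helper function and list-of-strings representation are invented for illustration; this is not a real NFS client API):

```python
# Hypothetical sketch of the retry rule described above: after
# NFS4ERR_MOVED, drop the operations that already executed and resend
# only the remainder, rather than replaying the whole COMPOUND.

NFS4ERR_MOVED = 10019  # error-code value from RFC 5661

def retry_after_moved(ops, results):
    """ops: the original COMPOUND's operations, in order.
    results: per-op status codes returned so far; the op that drew
    NFS4ERR_MOVED is the last entry."""
    executed = len(results) - 1  # ops that succeeded before the failure
    # Re-execute only the remaining operations (including the one that
    # received NFS4ERR_MOVED, now directed at the new location), using
    # a different slot id or sequence if the same session is reused.
    return ops[executed:]

original = ["PUTFH", "CREATE", "GETATTR"]
statuses = [0, 0, NFS4ERR_MOVED]  # CREATE succeeded; GETATTR hit MOVED
print(retry_after_moved(original, statuses))  # -> ['GETATTR']
```

The non-idempotent CREATE is not replayed; only the operation that drew NFS4ERR_MOVED (and anything after it) goes into the new COMPOUND.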
> 
> 
> > - how bad it is if the client messes up at Transparent State Migration
> >   discovery, most notably in the case when some lock state is lost
> >
> 
> Propose to address this by adding the following paragraph to the end of
> Section 11.13.2:
> 
>    Lease discovery needs to be provided as described above, in order to
>    ensure that migrations are discovered soon enough to ensure that
>    leases moved to new servers are discovred in time to make sure that
>    leases are renewed early enough to avoid lease expiration, leading to
>    loss of locking state.  While the consequences of such loss can be

nit: the double "are discovered {soon enough,in time}" is a little awkward
of a construction; how about "Lease discovery needs to be provided as
described above, in order to ensure that migrations are discovered soon
enough that leases moved to new servers can successfully be renewed before
they expire, avoiding loss of locking state"?

>    ameliorated through implementations of courtesy locks, servers are
>    under no obligation to do so, and a conflicting lock request may means

nit: s/means/mean/

>    that a lock is revoked unexpectedly.  Clients should be aware of this
>    possibility.
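A toy way to see the timing constraint (all names and numbers invented; just the inequality the paragraph above is describing):

```python
# Toy illustration of the timing concern above: if migration discovery
# is delayed past lease expiry, the lease lapses before the client can
# renew at the destination server, and (absent courtesy locks, which
# servers need not provide) locking state is lost.

def lease_survives(migration_time, discovery_delay, lease_expiry):
    # The client can only renew at the destination once it has
    # discovered the migration.
    return migration_time + discovery_delay < lease_expiry

assert lease_survives(10, 5, 30)       # discovered in time to renew
assert not lease_survives(10, 25, 30)  # discovered too late; state lost
```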
> 
> 
> 
> > - the interactions between cached replies and migration(-like) events,
> >   though a lot of this is discussed in section 11.13.X and 15.1.1.3
> >   already
> >
> 
> Will address any specifics that you feel aren't adequately addressed.

I don't remember any particular specifics, so we should probably just let
this go for now.

> 
> >
> > but I defer to the WG as to what to cover now vs. later.
> >
> > In light of the ongoing work on draft-ietf-nfsv4-rpc-tls, it might be
> > reasonable to just talk about "integrity protection" as an abstract
> > thing without the specific focus on RPCSEC_GSS's integrity protection
> > (or authentication)
> >
> >
> I was initially leery of this, but when I looked at the text, I was able to
> avoid referring to RPCSEC_GSS in most cases in which integrity was
> mentioned :-).  The same does not seem possible for authentication :-(

We'll take the easy wins and try to not fret too much about the other
stuff.

> > RPCSEC_GSS does not
> > %   protect the binding from one server to another as part of a referral
> > %   or migration event.  The source server must be trusted to provide
> > %   the correct information, based on whatever factors are available to
> > %   the client.
> >
> 
> These are both situations for which RPCSEC_GSS has no solution, but neither
> is there another one.   It is probably best to just say that without
> reference
> to integrity protection.

True.

> I have added new paragraphs after these bullets that may address some of
> the
> issues you were concerned about.
> 
>    Even if such requests are not interfered with in flight, it is
>    possible for a compromised server to direct the client to use
>    inappropriate servers, such as those under the control of the
>    attacker.  It is not clear that being directed to such servers
>    represents a greater threat to the client than the damage that could
>    be done by the compromised server itself.  However, it is possible
>    that some sorts of transient server compromises might be taken
>    advantage of to direct a client to a server capable of doing greater
>    damage over a longer time.  One useful step to guard against this
>    possibility is to issue requests to fetch location data using
>    RPCSEC_GSS, even if no mapping to an RPCSEC_GSS principal is
>    available.  In this case, RPCSEC_GSS would not be used, as it
>    typically is, to identify the client principal to the server, but
>    rather to make sure (via RPCSEC_GSS mutual authentication) that the
>    server being contacted is the one intended.
> 
>    Similar considerations apply if the threat to be avoided is the
>    direction of client traffic to inappropriate (i.e. poorly performing)
>    servers.  In both cases, there is no reason for the information
>    returned to depend on the identity of the client principal requesting
>    it, while the validity of the server information, which has the
>    capability to affect all client principals, is of considerable
>    importance.

These do address some of the issues I mentioned; thank you.  I do have a
couple further comments:

- I'm not sure what "no mapping to an RPCSEC_GSS principal is available"
  means (but maybe that's just because I've not read the RPCSEC_GSS RFCs
  recently enough)
- w.r.t. "there is no reason for the information returned to depend on the
  identity of the client principal", I could perhaps imagine some setup
  that uses information from a corporate contacts database to determine the
  current office/location of a given user and provide a referral to a
  spatially-local replica.  So "no reason" may be too absolute (but I don't
  have a proposed alternative and don't object to using this text).


[...]
> > Section B.4
> >
> >    o  The discussion of trunking which appeared in Section 2.10.5 of
> >       RFC5661 [62] needed to be revised, to more clearly explain the
> >       multiple types of trunking supporting and how the client can be
> >       made aware of the existing trunking configuration.  In addition
> >       the last paragraph (exclusive of sub-sections) of that section,
> >       dealing with server_owner changes, is literally true, it has been
> >       a source of confusion.  [...]
> >
> > nit: the grammar here is weird; I think there's a missing "while" or
> > similar.
> >
> >
> 
>  Anticipate using the following replacement text:
> 
>   o  The discussion of trunking which appeared in Section 2.10.5 of
>       RFC5661 [62] needed to be revised, to more clearly explain the
>       multiple types of trunking supporting and how the client can be

nit: just "trunking support" (not "-ing")?

>       made aware of the existing trunking configuration.  In addition,
>       while the last paragraph (exclusive of sub-sections) of that
>       section, dealing with server_owner changes, is literally true, it
>       has been a source of confusion.  Since the existing paragraph can
>       be read as suggesting that such changes be dealt with non-
>       disruptively, the issue needs to be clarified in the revised
>       section, which appears in Section 2.10.5.

Thanks again for going through my giant pile of comments; I hope that you
think the improvements to the document are worth the time spent.

-Ben