Re: [nfsv4] Benjamin Kaduk's Discuss on draft-ietf-nfsv4-rfc5661sesqui-msns-03: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Wed, 22 January 2020 03:17 UTC

Date: Tue, 21 Jan 2020 19:16:50 -0800
From: Benjamin Kaduk <kaduk@mit.edu>
To: David Noveck <davenoveck@gmail.com>
Cc: The IESG <iesg@ietf.org>, draft-ietf-nfsv4-rfc5661sesqui-msns@ietf.org, "nfsv4-chairs@ietf.org" <nfsv4-chairs@ietf.org>, Magnus Westerlund <magnus.westerlund@ericsson.com>, NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/1iwbuLLfaugA4UBC3lFYW9kjCrk>

Not attempting to trim anything, so sparsely inline...

On Sat, Jan 18, 2020 at 08:30:18AM -0500, David Noveck wrote:
> On Mon, Jan 13, 2020 at 5:54 PM Benjamin Kaduk <kaduk@mit.edu> wrote:
> 
> > Hi David,
> >
> > Trimming lots of good stuff here as well...
> >
> > On Thu, Jan 02, 2020 at 10:09:02AM -0500, David Noveck wrote:
> > > On Wed, Dec 18, 2019 at 3:32 AM Benjamin Kaduk via Datatracker <
> > > noreply@ietf.org> wrote:
> > >
> > > > Benjamin Kaduk has entered the following ballot position for
> > > > draft-ietf-nfsv4-rfc5661sesqui-msns-03: Discuss
> > > >
> > > > ----------------------------------------------------------------------
> > > > DISCUSS:
> > > > ----------------------------------------------------------------------
> > > >
> > > > Responded to these on 12/20.
> > >
> > > >
> > > > ----------------------------------------------------------------------
> > > > COMMENT:
> > > > ----------------------------------------------------------------------
> > > >
> > > > I think I may have mistakenly commented on some sections that are
> > > > actually just moved text, since my lookahead window in the diff was too
> > > > small.
> > >
> > >
> > > No harm, no foul.
> > >
> > >
> > > >
> > > > Since the "Updates:" header is part of the immutable RFC text (though
> > > > "Updated by:" is mutable), we should probably explicitly state that
> > "the
> > > > updates that RFCs 8178 and 8434 made to RFC 5661 apply equally to this
> > > > document".
> > > >
> > >
> > > I think we could update the last paragraph of Section 1.1 to be more
> > > explicit about
> > > this.  Perhaps it could read:
> > >
> > >    Until the above work is done, there will not be a consistent set of
> > >    documents providing a description of the NFSv4.1 protocol and any
> > >    full description would involve documents updating other documents
> > >    within the specification.   The updates applied by RFC 8434 [66] and
> > >    RFC 8178 [63] to RFC5661 also apply to this specification, and
> > >    will apply to any subsequent v4.1 specification until that work is
> > done.
> >
> > Sounds good.
> >
> > > >
> > > > I note inline (in what is probably too many places; please don't reply
> > > > at all of them!) some question about how clear the text is that a file
> > > > system migration is something done at a per-file-system granularity,
> > and
> > > > that migrating a client at a time is not possible.
> > >
> > >
> > > It might be possible but doing so is not a goal of this specification.
> > >
> > > I'm not sure how to address your concern.   I don't know why anyone would
> > > assume that migrating entire clients is a goal of this specification.  As
> > > far as I can see, when the word "migration" is used it is always in
> > > connection with migrating a file system.   Is there some specific place
> > > where you think this issue is likely to arise?
> >
> > I think I garbled my point; my apologies.
> > To give a semi-concrete example, suppose I have clients A and B that are
> > accessing filesystem F on server X, and filesystem F is also available on
> > server Y.  If X decides that it needs to migrate access to F away from X
> > (e.g., for maintenance), then the "file system migration event" involves
> > telling both A and B to look to Y for access to F, at basically the same
> > time.
> 
> 
> This clarifies things for me.   When you were speaking of "migrating a
> client" I assumed you were worried about consistency of fs's F, G, H for a
> particular client.  Now it appears the issue is consistency among clients
> A, B, C, all accessing a common F.
> 
> > If X tries to tell only A but not B to access F via Y but lets B
> > continue to access F at X, then I think there can be some subtle
> > consistency issues.
> >
> 
> Or worse, some decidedly unsubtle ones :-(
> 
> >
> > In some sense, this is easy to consider as a dichotomy between "migration
> > is for server maintenance" vs. "migration is for load balancing".
> 
> 
> That categorization helps.
> 
> 
> > Assuming
> > I understand correctly (not a trivial assumption!), there was never any
> > intent to use these mechanisms for load balancing,
> 
> 
> Well "Never" covers a lot.    There are cases in which you do want to do
> load balancing.   For example, if you are dealing with multiple network
> access paths to the same replica, there is no issue with the load balancing
> approach.   In the case of multiple replicas where data consistency applies
> between them, then you might load balance but it is the server's
> responsibility to provide the consistency, meaning that he needs to be
> warned of the possibility of issues that might arise if clients modifying
> the same data are placed on different replicas. In the case in which you
> don't guarantee data consistency among replicas, you might as well say
> about doing load balancing that "there be dragons".
> 
> > and if we can explicitly
> > disclaim such usage, then we don't have to try to reason through any
> > potential subtle consistency issues.
> >
> 
> I think we can disclaim the really problematic part.  I think the new text
> will be needed in the migration section.  Issues with replication are
> different and do not involve any server choice.
> 
> I anticipate revising section 11.5.5 to read as follows:
> 
>    When a file system is present and becomes inaccessible using the
>    current access path, the NFSv4.1 protocol provides a means by which
>    clients can be given the opportunity to have continued access to
>    their data.  This may involve use of a different access path to the
>    existing replica or providing a path to a different replica.  The
>    new access path or the location of the new replica is specified by a
>    file system location attribute.  The ensuing migration of access
>    includes the ability to retain locks across the transition.
>    Depending on circumstances, this can involve:
> 
>    o  The continued use of the existing clientid when accessing the
>       current replica using a new access path.
> 
>    o  Use of lock reclaim, taking advantage of a per-fs grace period.
> 
>    o  Use of Transparent State Migration.
> 
>    Typically, a client will be accessing the file system in question,
>    get an NFS4ERR_MOVED error, and then use a file system location
>    attribute to determine the new access path for the data.  When
>    fs_locations_info is used, additional information will be available
>    that will define the nature of the client's handling of the
>    transition to a new server.
> 
>    In most instances clients will choose to migrate all clients using a

(I assume s/clients/servers/ (just the first time))

>    particular file system to a successor replica at the same time to
>    avoid cases in which different clients are updating different
>    replicas.  However, migration of an individual client can be helpful in
>    providing load balancing, as long as the replicas in question are
>    such that they represent the same data as described in
>    Section 11.11.8.
> 
>    o  In the case in which there is no transition between replicas
>       (i.e., only a change in access path), there are no special
>       difficulties in using this mechanism to effect load balancing.
> 
>    o  In the case in which the two replicas are sufficiently co-
>       ordinated as to allow coherent simultaneous access to both by a
>       single client, there is, in general, no obstacle to use of
>       migration of particular clients to effect load balancing.
>       Generally, such simultaneous use involves co-operation between
>       servers to ensure that locks granted on two co-ordinated replicas
>       cannot conflict and can remain effective when transferred to a
>       common replica.
> 
>    o  In the case in which a large set of clients are accessing a file
>       system in a read-only fashion, it can be helpful to migrate all
>       clients with writable access simultaneously, while using load
>       balancing on the set of read-only copies, as long as the rules
>       appearing in Section 11.11.8, designed to prevent data reversion,
>       are adhered to.
> 
>    In other cases, the client might not have sufficient guarantees of
>    data similarity/coherence to function properly (e.g., the data in the
>    two replicas is similar but not identical), and the possibility that
>    different clients are updating different replicas can exacerbate the
>    difficulties, making use of load balancing in such situations a
>    perilous enterprise.
> 
>    The protocol does not specify how the file system will be moved
>    between servers or how updates to multiple replicas will be co-
>    ordinated.  It is anticipated that a number of different server-to-
>    server co-ordination mechanisms might be used with the choice left to
>    the server implementer.  The NFSv4.1 protocol specifies the method
>    used to communicate the migration event between client and server.
> 
>    The new location may be, in the case of various forms of server
>    clustering, another server providing access to the same physical file
>    system.  The client's responsibilities in dealing with this
>    transition will depend on whether a switch between replicas has
>    occurred and the means the server has chosen to provide continuity of
>    locking state.  These issues will be discussed in detail below.
> 
>    Although a single successor location is typical, multiple locations
>    may be provided.  When multiple locations are provided, the client
>    will typically use the first one provided.  If that is inaccessible
>    for some reason, later ones can be used.  In such cases the client
>    might consider the transition to the new replica to be a migration
>    event, even though some of the servers involved might not be aware of
>    the use of the server which was inaccessible.  In such a case, a
>    client might lose access to locking state as a result of the access
>    transfer.
> 
>    When an alternate location is designated as the target for migration,
>    it must designate the same data (with metadata being the same to the
>    degree indicated by the fs_locations_info attribute).  Where file
>    systems are writable, a change made on the original file system must
>    be visible on all migration targets.  Where a file system is not
>    writable but represents a read-only copy (possibly periodically
>    updated) of a writable file system, similar requirements apply to the
>    propagation of updates.  Any change visible in the original file
>    system must already be effected on all migration targets, to avoid
>    any possibility that a client, in effecting a transition to the
>    migration target, will see any reversion in file system state.
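As a rough illustration of the "multiple locations" paragraph in the proposed text, here is a minimal sketch of a client walking an ordered list of successor locations and using the first one it can reach. The names `Location`, `pick_successor`, and the `is_reachable` callback are hypothetical and come from neither the draft nor any implementation; the only behavior taken from the text is "use the first one provided, fall back to later ones".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Location:
    """Hypothetical stand-in for one entry of a file system location attribute."""
    server: str
    path: str


def pick_successor(
    locations: Sequence[Location],
    is_reachable: Callable[[Location], bool],
) -> Tuple[Optional[Location], bool]:
    """Return (chosen location, skipped_earlier).

    Locations are tried in the order provided.  skipped_earlier is True when
    at least one earlier entry was unreachable; per the proposed text, the
    client might then lose access to locking state in the transfer.
    """
    for index, loc in enumerate(locations):
        if is_reachable(loc):
            return loc, index > 0
    # Nothing reachable: no successor could be chosen.
    return None, False
```

A caller could treat a True `skipped_earlier` as a hint that locking state may need to be re-established, since the servers involved might not know about the location that was skipped.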
> 
> 
> > > As was the case for
> > > > my Discuss point about addresses/port-numbers, I'm missing the context
> > > > of the rest of the document, so perhaps this is a non-issue, but the
> > > > consequences of getting it wrong seem severe enough that I wanted to
> > > > check.
> > > >
> > >
> > > I'm not seeing any severe consequences.   Am I missing something?
> > >
> > >
> >
> 
> This is clearer now. I think we can avoid any severe consequences.
> 
> 
> > >
> > > Section 1.1
> > > >
> > > >    The revised description of the NFS version 4 minor version 1
> > > >    (NFSv4.1) protocol presented in this update is necessary to enable
> > > >    full use of trunking in connection with multi-server namespace
> > > >    features and to enable the use of transparent state migration in
> > > >    connection with NFSv4.1.  [...]
> > > >
> > > > nit: do we expect all readers to know what is meant by "trunking" with
> > > > no other lead-in?
> > > >
> > >
> > > Good point.  Perhaps it could be addressed by rewriting the material in
> > > the first paragraph of Section 1.1 to read as follows:
> > >
> > >    Two important features previously defined in minor version 0 but
> > >    never fully addressed in minor version 1 are trunking, the use of
> > >    multiple connections between a client and server potentially to
> > >    different network addresses, and transparent state migration, which
> > >    allows a file system to be transferred betwwen servers in a way that
> > >    provides for the client to maintain its existing locking state across
> > >    the transfer.
> >
> > Maybe "the simultaneous use of multiple connections"?
> >
> 
> Will add.
> 
> 
> > nit: s/betwwen/between/
> >
> 
> Fixed.
> 
> >
> 
> > >    The revised description of the NFS version 4 minor version 1
> > >    (NFSv4.1) protocol presented in this update is necessary to enable
> > >    full use of these features with other multi-server namespace features.
> > >    This document is in the form of an updated description of the NFS 4.1
> > >    protocol previously defined in RFC5661 [62].  RFC5661 is obsoleted by
> > >    this document.  However, the update has a limited scope and is focused
> > >    on enabling full use of trunking and transparent state migration.  The
> > >    need for these changes is discussed in Appendix A.  Appendix B
> > >    describes the specific changes made to arrive at the current text.
> >
> > This looks good, thanks.
> >
> 
> :-)
> 
> 
> >
> > [...]
> > > >
> > > >    o  Work would have to be done to address many erratas relevant to
> > RFC
> > > >       5661, other than errata 2006 [60], which is addressed in this
> > > >       document.  That errata was not deferrable because of the
> > > >       interaction of the changes suggested in that errata and handling
> > > >       of state and session migration.  The erratas that have been
> > > >       deferred include changes originally suggested by a particular
> > > >       errata, which change consensus decisions made in RFC 5661, which
> > > >       need to be changed to ensure compatibility with existing
> > > >       implementations that do not follow the handling delineated in RFC
> > > >       5661.  Note that it is expected that such erratas will remain
> > > >
> > > > This sentence is pretty long and hard to follow; maybe it could be
> > split
> > > > after "change consensus decisions made in RFC 5661" and the second half
> > > > start with a more declarative statement about existing implementations?
> > > > (E.g., "Existing implementations did not perform handling as
> > delineated in
> > > > RFC
> > > > 5661 since the procedures therein were not workable, and in order to
> > > > have the specification accurately reflect the existing deployment base,
> > > > changes are needed [...]")
> > > >
> > >
> > > I will clean this bullet up.  See below for a proposed replacement.
> > >
> > >
> > > >
> > > >       relevant to implementers and the authors of an eventual
> > > >       rfc5661bis, despite the fact that this document, when approved,
> > > >       will obsolete RFC 5661.
> > > >
> > > > (I assume the RFC Editor can tweak this line to reflect what actually
> > > > happens; my understanding is that the errata reports will get cloned to
> > > > this-RFC.)
> > > >
> > >
> > > I understand that Magnus has already got that issue addressed.  I'll
> > > discuss the appropriate text with him.
> > >
> > >
> > > > [rant about "errata" vs. "erratum" elided]
> > > >
> > >
> > > This is annoying but there is no way we are going to get people to use
> > > "erratum".   What I've tried to do in my proposed replacement text
> > > is to refer to "errata report(s)", which is more accurate and allows
> > > people who speak English to use English singulars and plurals, without
> > > having to worry about Latin grammar.
> >
> > That's what I try to do as well :)
> >
> > > Here's my proposed replacement for the troubled bullet:
> > >
> > >    o  Work needs to be done to address many errata reports relevant to
> > >       RFC 5661, other than errata report 2006 [60], which is addressed
> > >       in this document.  Addressing of that report was not deferrable
> > >       because of the interaction of the changes suggested there and the
> > >       newly described handling of state and session migration.
> > >
> > >       The errata reports that have been deferred and that will need to
> > >       be addressed in a later document include reports currently
> > >       assigned a range of statuses in the errata reporting system
> > >       including reports marked Accepted and those marked Held Over
> >
> > nit: it's "Hold For Document Update"
> >
> Fixed
> 
> > >       because the change was too minor to address immediately.
> > >
> > >       In addition, there is a set of other reports, including at least
> > >       one in state Rejected, which will need to be addressed in a later
> > >       document.  This will involve making changes to consensus decisions
> > >       reflected in RFC 5661, in situations in which the working group has
> > >       already decided that the treatment in RFC 5661 is incorrect, and
> > >       needs to be revised to reflect the working group's new consensus and
> > >       ensure compatibility with existing implementations that do not follow
> > >       the handling described in RFC 5661.
> > >
> > >       Note that it is expected that such all errata reports will remain
> >
> > nit: s/such all/all such/
> >
> Fixed.
> 
> > >       relevant to implementers and the authors of an eventual
> > >       rfc5661bis, despite the fact that this document, when approved,
> > >       will obsolete RFC 5661 [62].
> >
> > This looks really good!
> >
> > >
> > > > Section 2.10.4
> > > >
> > > >    Servers each specify a server scope value in the form of an opaque
> > > >    string eir_server_scope returned as part of the results of an
> > > >    EXCHANGE_ID operation.  The purpose of the server scope is to allow
> > a
> > > >    group of servers to indicate to clients that a set of servers
> > sharing
> > > >    the same server scope value has arranged to use compatible values of
> > > >    otherwise opaque identifiers.  Thus, the identifiers generated by
> > two
> > > >    servers within that set can be assumed compatible so that, in some
> > > >    cases, identifiers generated by one server in that set may be
> > > >    presented to another server of the same scope.
> > > >
> > > > Is there more that we can say than "in some cases"?
> > >
> > >
> > > Not really.  In general, when a server sends you an id, it comes with an
> > > implied promise to recognize it when you present it subsequently to the
> > > same server.
> > >
> > > The fact that two servers have decided to co-operate in their Id
> > assignment
> > > does not change that.
> > >
> > > The previous text
> > > > implies a higher level of reliability than just "some cases", to me.
> > > >
> > >
> > > I think I need to change the text, perhaps by replacing "use compatible
> > > values of otherwise
> > > opaque identifiers" by "use distinct values of otherwise opaque
> > identifiers
> > > so that the two
> > > servers never assign the same value to two distinct objects".
> > >
> > > I anticipate the following replacement for the first two paragraphs of
> > > Section 2.10.4:
> > >
> > >    Servers each specify a server scope value in the form of an opaque
> > >    string eir_server_scope returned as part of the results of an
> > >    EXCHANGE_ID operation.  The purpose of the server scope is to allow a
> > >    group of servers to indicate to clients that a set of servers sharing
> > >    the same server scope value has arranged to use distinct values of
> > >    opaque identifiers so that the two servers never assign the same
> > >    value to two distinct objects.  Thus, the identifiers generated by two
> > >    servers within that set can be assumed compatible so that, in certain
> > >    important cases, identifiers generated by one server in that set may
> > >    be presented to another server of the same scope.
> > >
> > >    The use of such compatible values does not imply that a value
> > >    generated by one server will always be accepted by another.  In most
> > >    cases, it will not.  However, a server will not accept a value
> > >    generated by another inadvertently.  When it does accept it, it will
> >
> > nit: I think it flows better to put "inadvertently" as "will not
> > inadvertently accept".
> >
> 
> OK.  Fixed.
> 
> 
> >
> > >    be because it is recognized as valid and carrying the same meaning as
> > >    on another server of the same scope.
> > >
> > >
> > > As an illustration of the (limited) value of this information, consider
> > the
> > > case of client recovery from a server reboot.  The client has to reclaim
> > > his locks using file handles returned by the previous server instance.
> > If
> > > the server scopes are the same (they almost always are), the client is
> > not
> > > sure he will get his locks back (e.g. the file might have been deleted),
> > > but he does know that, if the lock reclaim succeeds, it is for the same
> > > file.  If the server scopes are not the same, he has no such assurance.
> >
> > Thanks, the new text (and explanation here) is very clear about what's
> > going on.
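The reboot-recovery illustration above can be restated as a small decision function. This is only a sketch of the reasoning in the quoted explanation: `reclaim_assurance` and its string results are hypothetical names, and the scope values stand in for the opaque eir_server_scope strings returned by EXCHANGE_ID.

```python
def reclaim_assurance(
    old_scope: bytes,
    new_scope: bytes,
    reclaim_succeeded: bool,
) -> str:
    """Summarize what a client can conclude after attempting a lock reclaim.

    old_scope and new_scope are the server scope values seen before and
    after the server reboot (compared as opaque byte strings).
    """
    if not reclaim_succeeded:
        # A matching scope never guarantees success; the file might have
        # been deleted, for example.
        return "lock not recovered"
    if old_scope == new_scope:
        # Same scope: a successful reclaim is known to be for the same file.
        return "lock recovered for the same file"
    # Different scope: success carries no such assurance.
    return "no assurance about which file the lock covers"
```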
> >
> > [...]
> > > > Section 11.5.5
> > > >
> > > >    will typically use the first one provided.  If that is inaccessible
> > > >    for some reason, later ones can be used.  In such cases the client
> > > >    might consider that the transition to the new replica as a migration
> > > >    event, even though some of the servers involved might not be aware
> > of
> > > >    the use of the server which was inaccessible.  In such a case, a
> > > >
> > > > nit: the grammar here got wonky; maybe s/as a/is a/?
> 
> > >
> > >
> > > How about s/as a/to be a/ ?
> >
> > That works if you drop the earlier "that", for "the client might consider
> > the transition to the new replica to be a migration event".
> >
> Did that.
> 
> > [...]
> > > >
> > > >    o  The "local" representation of all owners and groups must be the
> > > >       same on all servers.  The word "local" is used here since that is
> > > >       the way that numeric user and group ids are described in
> > > >       Section 5.9.  However, when AUTH_SYS or stringified owners or
> > > >       group are used, these identifiers are not truly local, since they
> > > >       are known to the clients as well as the server.
> > > >
> > > > I am trying to find a way to note that the AUTH_SYS case mentioned here
> > > > is precisely because of the requirement being imposed by this bullet
> > > > point,
> > >
> > >
> > > Not sure what you mean by that.  I think the requirement is to allow the
> > > client to be able to use AUTH_SYS, without the contortions that would be
> > > required if different fs's had the same uid's meaning different things.
> > >
> > > > while acknowledging that the "stringified owners or group" case
> > > > is separate, but not having much luck.
> > > >
> > >
> > > My attempt to revise this area is below:
> > >
> > >    Note that there is no requirement in general that the users
> > >    corresponding to particular security principals have the same local
> > >    representation on each server, even though it is most often the case
> > >    that this is so.
> > >
> > >    When AUTH_SYS is used, the following additional requirements must be
> > >    met:
> > >
> > >    o  Only a single NFSv4 domain can be supported through use of
> > >       AUTH_SYS.
> > >
> > >    o  The "local" representation of all owners and groups must be the
> > >       same on all servers.  The word "local" is used here since that is
> > >       the way that numeric user and group ids are described in
> > >       Section 5.9.  However, when AUTH_SYS or stringified numeric owners
> > >       or groups are used, these identifiers are not truly local, since
> > >       they are known to the clients as well as the server.
> > >
> > >    Similarly, when stringified numeric user and group ids are used, the
> > >    "local" representation of all owners and groups must be the same on
> > >    all servers, even when AUTH_SYS is not used.
> >
> > I really like this rewriting; thank you for undertaking it.
> > I think that what I was trying to say here is roughly that we need
> > scare-quotes for "local" because of things like AUTH_SYS (or stringified
> > user/group ids) that involve sending local representations over the
> > network.  So your rewrite did in fact address my concern, even though I
> > didn't manage to say it very well the first time :)
> >
> > [...]
> > > >
> > > >    o  When there are no potential replacement addresses in use but
> > there
> > > >
> > > > What is a "replacement address"?
> > > >
> > >
> > > I've explained that in some new text added before these bullets, as a new
> > > second paragraph of this section:
> > >
> > >    The appropriate action depends on the set of replacement addresses
> > >    (i.e. server endpoints which are server-trunkable with one previously
> > >    being used) which are available for use.
> > >
> > > >
> > > >       are valid addresses session-trunkable with the one whose use is
> > to
> > > >       be discontinued, the client can use BIND_CONN_TO_SESSION to
> > access
> > > >       the existing session using the new address.  Although the target
> > > >       session will generally be accessible, there may be cases in which
> > > >       that session is no longer accessible.  In this case, the client
> > > >       can create a new session to enable continued access to the
> > > >       existing instance and provide for use of existing filehandles,
> > > >       stateids, and client ids while providing continuity of locking
> > > >       state.
> > > >
> > > > I'm not sure I understand this last sentence.  On its own, the "new
> > > > session to enable continued access to the existing instance" sounds
> > like
> > > > the continued access would be on the address whose use is to cease, and
> > > > thus the new session would be there.
> > >
> > >
> > > That is not the intention.  Will need to clarify.
> > >
> > >
> > > > But why make a new session when
> > > > the old one is still good,
> > >
> > >
> > > It isn't usable on the new connection.
> > >
> > >
> > > > especially when we just said in the previous
> > > > sentence that the old session can't be moved to the new
> > > > connection/address?
> > > >
> > >
> > > Because we can't use it on the new connection, we have to create a
> > > new session to access  the client.
> > >
> > > > Perhaps a forward reference down to Section 11.12.{4,5} for this and the
> > > > next bullet point would help as well as rewording?
> > > >
> > >
> > > It turns out these would add confusion since they deal with migration
> > > situations and deciding whether transparent state migration has occurred
> > > in the switch between replicas.  In the cases we are dealing with, there
> > > is only a single replica/fs and no migration.
> > >
> > > Here is my proposed replacement text for the two bullets in question:
> > >
> > >    o  When there are no potential replacement addresses in use but there
> > >       are valid addresses session-trunkable with the one whose use is to
> > >       be discontinued, the client can use BIND_CONN_TO_SESSION to access
> > >       the existing session using the new address.  Although the target
> > >       session will generally be accessible, there may be rare situations
> > >       in which that session is no longer accessible, when an attempt is
> > >       made tto bind the new conntectin to it.  In this case, the client
> >
> > nits: s/tto/to/, s/conntectin/connection/
> >
> 
> Fixed.
> 
> >
> > >       can create a new session to enable continued access to the
> > >       existing instance and provide for use of existing filehandles,
> > >       stateids, and client ids while providing continuity of locking
> > >       state.
> >
> > Just to check: this sounds like even in the case where the client creates
> > a new session, the filehandle, stateid, clientid, and locking state
> > (values) are in effect "transparently preserved" by the server, so the
> > client has no need to do any reclamation of locking state.  I think that's
> > what's intended, but holler if I'm wrong about that.
> >
> Ok.
> I'll holler that *you're right about that.*
> 
> >
> > >    o  When there is no potential replacement address in use and there
> > >       are no valid addresses session-trunkable with the one whose use is
> > >       to be discontinued, other server-trunkable addresses may be used
> > >       to provide continued access.  Although use of CREATE_SESSION is
> > >       available to provide continued access to the existing instance,
> > >       servers have the option of providing continued access to the
> > >       existing session through the new network access path in a fashion
> > >       similar to that provided by session migration (see Section 11.12).
> > >       To take advantage of this possibility, clients can perform an
> > >       initial BIND_CONN_TO_SESSION, as in the previous case, and use
> > >       CREATE_SESSION only if that fails.
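
The fallback order in the two bullets above can be sketched as follows.  This
is only an illustration, not text from the draft: bind_conn_to_session and
create_session are hypothetical callables standing in for a client's RPC
layer, and each is assumed to return True on success and False on failure.

```python
from enum import Enum, auto


class Outcome(Enum):
    BOUND_EXISTING_SESSION = auto()
    CREATED_NEW_SESSION = auto()


def restore_access(bind_conn_to_session, create_session):
    """Prefer the existing session; create a new one only if binding fails.

    Both arguments are hypothetical callables (assumptions for this sketch)
    that return True on success and False on failure.
    """
    # First try to attach the new connection to the existing session.
    if bind_conn_to_session():
        return Outcome.BOUND_EXISTING_SESSION
    # Only when that fails, fall back to a new session on the same instance.
    if create_session():
        return Outcome.CREATED_NEW_SESSION
    raise ConnectionError("BIND_CONN_TO_SESSION and CREATE_SESSION both failed")
```

In either outcome the client keeps using its existing filehandles, stateids,
and client ID, which is the point of the proposed text.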
> > >
> > >
> > > > Section 11.10.6
> > > >
> > > >    In a file system transition, the two file systems might be clustered
> > > > >    in the handling of unstably written data.  When this is the case, and
> > > >
> > > > What does "clustered in the handling of unstably written data" mean?
> > > >
> > > >    the two file systems belong to the same write-verifier class, write
> > > >
> > > > How is the client supposed to determine "when this is the case"?
> > > >
> > >
> > > > Here's a proposed replacement for this paragraph:
> > >
> > >    In a file system transition, the two file systems might be
> > >    cooperating in the handling of unstably written data.  Clients can
> > > >    determine if this is the case, by seeing if the two file systems
> > >    belong to the same write-verifier class.  When this is the case,
> > >    write verifiers returned from one system may be compared to those
> > >    returned by the other and superfluous writes avoided.
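
A rough sketch of the client-side check this paragraph implies, under stated
assumptions: the class values are treated as opaque identifiers taken from
location information, None is used here (hypothetically) to mean "no class
known", and the function name is invented for illustration.

```python
from typing import Optional


def can_skip_rewrite(src_class: Optional[int], dst_class: Optional[int],
                     verifier_at_write: bytes, verifier_at_commit: bytes) -> bool:
    """Return True if unstable data written before the transition need not be resent.

    Verifiers are only comparable when both file systems are known to be in
    the same write-verifier class; otherwise the client conservatively rewrites.
    """
    same_class = src_class is not None and src_class == dst_class
    return same_class and verifier_at_write == verifier_at_commit
```

When the classes differ or are unknown, the sketch always answers False, so
the client falls back to resending the data.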
> > >
> > >
> > > > Section 11.10.7
> > > >
> > > > >    In a file system transition, the two file systems might be consistent
> > > > >    in their handling of READDIR cookies and verifiers.  When this is the
> > > > >    case, and the two file systems belong to the same readdir class,
> > > >
> > > > As above, how is the client supposed to determine "when this is the
> > > > case"?
> > >
> > >
> > > >    READDIR cookies and verifiers from one system may be recognized by
> > > > >    the other and READDIR operations started on one server may be validly
> > > >    continued on the other, simply by presenting the cookie and verifier
> > > >    returned by a READDIR operation done on the first file system to the
> > > >    second.
> > > >
> > > > Are these "may be"s supposed to admit the possibility that the
> > > > destination server can just decide to not honor them arbitrarily?
> > > >
> > >
> > > > No. They are intended to indicate that the client might or might not use
> > > > the capability.
> > >
> > > Here is proposed replacement text for the paragraph:
> > >
> > >    In a file system transition, the two file systems might be consistent
> > >    in their handling of READDIR cookies and verifiers.  Clients can
> > >    determine if this is the case, by seeing if the two file systems
> > >    belong to the same readdit class.  When this is the case, readdir
> >
> > > nit: s/readdit/readdir/
> >
> 
> Fixed.
> 
> 
> >
> > >    class, READDIR cookies and verifiers from one system will be
> > >    recognized by the other and READDIR operations started on one server
> > >    can be validly continued on the other, simply by presenting the
> > >    cookie and verifier returned
> >
> > Ah, this formulation (for both write-verifier and readdir) is very helpful.
> >
> > [...]
> > > > Section 11.16.1
> > > >
> > > >    With the exception of the transport-flag field (at offset
> > > >    FSLI4BX_TFLAGS with the fls_info array), all of this data applies to
> > > > >    the replica specified by the entry, rather than the specific network
> > > >    path used to access it.
> > > >
> > > > Is it clear that this applies only to the fields defined by this
> > > > > specification (since, as mentioned later, future extensions must specify
> > > > whether they apply to the replica or the entry)?
> > > >
> > >
> > > Intend to use the following replacement text:
> > >
> > >    With the exception of the transport-flag field (at offset
> > >    FSLI4BX_TFLAGS with the fls_info array), all of this data defuined in
> >
> > nit: s/defuined/defined/
> >
> 
>  Fixed.
> 
> 
> > >    this specification applies to the replica specified by the entry,
> > > >    rather than the specific network path used to access it.  The
> > >    classification of data in extensions to this data is discussed
> > >    below
> >
> > [...]
> > > > Section 18.35.3
> > > >
> > > > I a little bit wonder if we want to reaffirm that co_verifier remains
> > > > fixed when the client is establishing multiple connections for trunking
> > > > usage -- the "incarnation of the client" language here could make a
> > > > reader wonder, though I think the discussion of its use elsewhere as
> > > > relating to "client restart" is sufficiently clear.
> > > >
> > >
> > > > This should be made clearer but the clarification needs to be done in
> > > > multiple places.
> > >
> > > Possible replacement text for eighth non-code paragraph of section 2.4:
> > >
> > >    The first field, co_verifier, is a client incarnation verifier,
> > > >    allowing the server to distinguish successive incarnations (e.g.
> > >    reboots) of the same client.  The server will start the process of
> > >    canceling the client's leased state if co_verifier is different than
> > >    what the server has previously recorded for the identified client (as
> > >    specified in the co_ownerid field).
> > >
> > > Likely replacement text for the seventh paragraph of this section:
> > >
> > >    The eia_clientowner field is composed of a co_verifier field and a
> > >    co_ownerid string.  As noted in Section 2.4, the co_ownerid describes
> > >    the client, and the co_verifier is the incarnation of the client.  An
> > >    EXCHANGE_ID sent with a new incarnation of the client will lead to
> > >    the server removing lock state of the old incarnation.  Whereas an
> > >    EXCHANGE_ID sent with the current incarnation and co_ownerid will
> > >    result in an error, an update of the client ID's properties,
> > >    depending on the arguments to EXCHANGE_ID, or the return of
> > >    information about the existing client_id as might happen when this
> > > >    operation is done to the same server using different network
> > >    addresses as part of creating trunked connections.
> >
> 
> Not sure what error that error text was referring to above.  Think it
> added to the confusion.
> 
> >
> > I think I get the general sense of what is going on here (i.e., the last
> > sentence) but am still uncertain on the specifics.  Namely, "most of the
> > time" (TM), sending EXCHANGE_ID with current incarnation/ownerid will be an
> > error, since it's a client bug to try to register the same way twice in a
> > row.
> 
> 
> No it isn't.   This is case 2 on page 508, "Non-Update on Existing Client ID".
> Given retries and possible communication difficulties, it is just too hard to
> make this case an error.
> 
> > However, sometimes we might have to do that in order to update
> > properties of the client or get some new information that a server has
> > associated to a given client ID.  I *think* (but am not sure) that the
> > error case is exactly when the (same-incarnation/ownerid) EXCHANGE_ID is
> > done to the same *server and address* as the original EXCHANGE_ID, and that
> > the "update properties or get new information back" case is exactly when
> > the EXCHANGE_ID is done to a different server/address combination.
> >
> > If I'm right about that, then I'd suggest:
> >
> > %    the server removing lock state of the old incarnation.  Whereas an
> > %    EXCHANGE_ID sent with the current incarnation and co_ownerid will
> > %    result in an error when sent to a given server at a given address for
> > %    a second time, it is not an error to send EXCHANGE_ID with current
> > %    incarnation and co_ownerid to a different server (e.g., as part of a
> > %    migration event).  In such cases, the EXCHANGE_ID can allow for an
> > %    update of the client ID's properties, depending on the arguments to
> > %    EXCHANGE_ID, or the return of (potentially updated) information about
> > > %    the existing client_id, as might happen when this operation is done to
> > %    the same server using different network addresses as part of creating
> > %    trunked connections.
> >
> 
> I think I have to revise the paragraph above to be clearer.   I anticipate
> replacing the seventh paragraph of section 18.35.3 by the following
> replacement:
> 
>    The eia_clientowner field is composed of a co_verifier field and a
>    co_ownerid string.  As noted in Section 2.4, the co_ownerid
>    identifies the client, and the co_verifier specifies a particular
>    incarnation of that client.  An EXCHANGE_ID sent with a new
>    incarnation of the client will lead to the server removing lock state
>    of the old incarnation.  On the other hand, an EXCHANGE_ID sent with
>    the current incarnation and co_ownerid will, when it does not result
>    in an unrelated error, potentially update an existing client ID's
>    properties, or simply return information about the existing
>    client_id.  The latter would happen when this operation is done to
>    the same server using different network addresses as part of creating
>    trunked connections.

Ah, I think I get it now.  Thanks.
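
For readers following along, here is a deliberately simplified sketch of the
distinction drawn in the revised paragraph.  It is an assumption-laden
illustration rather than the server algorithm: principal checks, confirmed
versus unconfirmed records, and unrelated errors are all ignored, and the
function name, parameter names, and returned labels are invented.

```python
from typing import Optional


def classify_exchange_id(recorded_verifier: Optional[bytes],
                         co_verifier: bytes,
                         wants_update: bool) -> str:
    """Classify an EXCHANGE_ID for a given co_ownerid (simplified sketch).

    recorded_verifier is what the server previously stored for this
    co_ownerid, or None if the owner is unknown.  wants_update is a
    hypothetical flag meaning the client asked to change properties.
    """
    if recorded_verifier is None:
        return "new_client"
    if co_verifier != recorded_verifier:
        # New incarnation: the server removes lock state of the old one.
        return "new_incarnation"
    # Same incarnation and co_ownerid: not an error in itself.
    if wants_update:
        return "update_properties"
    # E.g. the same server reached via another address when trunking.
    return "return_existing_info"
```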

> 
> > > > Section 21
> > > >
> > > > Some other topics at least somewhat related to trunking and migration
> > > > that we could potentially justify including in the current,
> > > > > limited-scope, update (as opposed to deferring for a full -bis) include:
> > > >
> > >
> > > Some of these are related to multi-server namespace but not related to
> > > security, as far as I can see.
> >
> > It does look like it; in some sense I was going through a brainstorming
> > exercise to make this list, and appreciate the sanity checks.  (To be
> > clear, I am not insisting that any of them get covered in specifically the
> > sesqui update, just mentioning topics for potential consideration.)
> > >
> > > >
> > > > - clients that lie about reclaimed locks during a post-migration grace
> > > >   period
> > > >
> > >
> > > Will address in a number of places:
> > >
> > > > First of all, I intend to add a new paragraph to Section 21, to be placed as
> > > > the sixth non-bulleted paragraph and to read as follows:
> > >
> > > >    Security considerations for lock reclaim differ between the state
> > > >    reclaim done after server failure (discussed in Section 8.4.2.1.1) and
> > > >    the per-fs state reclaim done in support of migration/replication
> > > >    (discussed in Section 11.11.9.1).
> > >
> > > > Next is a proposed new section to appear as Section 11.11.9.1:
> > >
> > > 11.11.9.1.  Security Consideration Related to Reclaiming Lock State
> > >             after File System Transitions
> > >
> > >    Although it is possible for a client reclaiming state to misrepresent
> > >    its state, in the same fashion as described in Section 8.4.2.1.1,
> > >    most implementations providing for such reclamation in the case of
> > >    file system transitions will have the ability to detect such
> > >    misreprsentations.  this limits the ability of unauthenicatd clients
> >
> > typos: "misrepresentations", "This", "unauthenticated"
> >
> 
> Fixed.
> 
> 
> >
> > >    to execute denial-of-service attacks in these cirsumstances.
> >
> > "circumstances"
> >
> >
> Fixed.
> 
> 
> > >    Nevertheless, the rules stated in Section 8.4.2.1.1, regarding
> > > >    principal verification for reclaim requests, apply in this situation
> > >    as well.
> > >
> > >    Typically,implementations support file system transitions will have
> >
> > nits: space after comma, and "that" for "that support"
> >
> Fixed.
> 
> 
> > >    extensive information about the locks to be transferred.  This is
> > >    because:
> > >
> > > >    o  Since failure is not involved, there is no need to store locking
> > > >       information in persistent storage.
> > > >
> > > >    o  There is no need, as there is in the failure case, to update
> > > >       multiple repositories containing locking state to keep them in sync.
> > >       Instead, there is a one-time communication of locking state from
> > >       the source to the destination server.
> > >
> > >    o  Providing this information avoids potential interference with
> > >       existing clients using the destination file system, by denying
> > >       them the ability to obtain new locks during the grace period.
> > >
> > >    When such detailed locking infornation, not necessarily including the
> > >    associated stateid,s is available,
> >
> > nits: "information", s/stateid,s/stateids,/
> >
> 
> Fixed.
> 
> 
> > >
> > >    o  It is possible to detect reclaim requests that attempt to reclsim
> >
> > nit: s/reclsim/reclaim/
> >
> 
> Fixed.
> 
> >
> > >       locks that did not exist before the transfer, rejecting them with
> > >       NFS4ERR_RECLAIM_BAD (Section 15.1.9.4).
> > >
> > > >    o  It is possible, when dealing with non-reclaim requests, to
> > > >       determine whether they conflict with existing locks, eliminating
> > > >       the need to return NFS4ERR_GRACE (Section 15.1.9.2) on non-
> > > >       reclaim requests.
> > >
> > >    It is possible for implementations of grace periods in connection
> > >    with file system transitions not to have detailed locking information
> > >    available at the destination server, in which case the security
> > >    situation is exactly as described in Section 8.4.2.1.1.
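
As a way of checking my reading of the proposed section, the behavior of a
destination server that does have detailed locking information might look
roughly like the sketch below.  Everything here is a simplifying assumption
for illustration: one exclusive lock per resource, a plain dict as the
transferred lock table, and error names returned as strings.

```python
def handle_lock_request(transferred: dict, owner: str, resource: str,
                        is_reclaim: bool) -> str:
    """Decide a lock request during a per-fs grace period (simplified sketch).

    transferred maps a resource to the owner that held its lock on the
    source server before the file system transition (a hypothetical model).
    """
    holder = transferred.get(resource)
    if is_reclaim:
        # A reclaim is valid only if this owner held the lock beforehand.
        return "NFS4_OK" if holder == owner else "NFS4ERR_RECLAIM_BAD"
    # Non-reclaim: the server can check for conflicts directly, so it has
    # no need to return NFS4ERR_GRACE.
    if holder is not None and holder != owner:
        return "NFS4ERR_DENIED"
    return "NFS4_OK"
```

A server without such information cannot make these distinctions, which is
the case the last quoted paragraph sends back to Section 8.4.2.1.1.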
> > >
> > > > I think I should also draw your attention to a revised Section 15.1.9.
> > > > This includes some revisions originally done for
> > > > draft-ietf-nfsv4-rfc5661-msns-update, which somehow got dropped, as well
> > > > as a few that turned up as necessary in writing 11.11.9.1:
> > >
> > > 15.1.9.  Reclaim Errors
> > >
> > >    These errors relate to the process of reclaiming locks after a server
> > >    restart.
> > >
> > > 15.1.9.1.  NFS4ERR_COMPLETE_ALREADY (Error Code 10054)
> > >
> > >    The client previously sent a successful RECLAIM_COMPLETE operation
> > >    specifying the same scope, whether that scope is global or for the
> > >    same file system in the case of a per-fs RECLAIM_COMPLETE.  An
> > >    additional RECLAIM_COMPLETE operation is not necessary and results in
> > >    this error.
> > >
> > > 15.1.9.2.  NFS4ERR_GRACE (Error Code 10013)
> > >
> > >    This error is returned when the server was in its recovery or grace
> > >    period. with regard to the file system object for which the lock was
> >
> > (no full stop)
> >
> 
>  Fixed.
> 
> 
> > >    requested resulting in a situation in which a non-reclaim locking
> > >    request could not be granted.  This can occur because either
> > >
> > > >    o  The server does not have sufficient information about locks that
> > > >       might be potentially reclaimed to determine whether the lock could
> > >       validly be granted.
> > >
> > >    o  The request is made by a client responsible for reclaiming its
> > >       locks that has not yet done the appropriate RECLAIM_COMPLETE
> > >       operation, allowing it to proceed to obtain new locks.
> > >
> > >    It should be noted that, in the case of a per-fs grace period, there
> > >    may be clients, i.e. those currently using the destination file
> > >    system who might be unaware of the circumstances resulting in the
> >
> > nit: comma after "file system"
> >
> 
> This phrase is now within parentheses.
> 
> >
> > > >    initiation of the grace period.  Such clients need to periodically
> > >    retry the request until the grace period is over, just as other
> > >    clients do.
> > >
> > > 15.1.9.3.  NFS4ERR_NO_GRACE (Error Code 10033)
> > >
> > >    A reclaim of client state was attempted in circumstances in which the
> > >    server cannot guarantee that conflicting state has not been provided
> > >    to another client.  This occurs if there is no active grace period
> > > >    applying to the file system object for which the request was made, if
> > > >    the client making the request has no current role in reclaiming
> > >    locks, or because previous operations have created a situation in
> > >    which the server is not able to determine that a reclaim-interfering
> > >    edge condition does not exist.
> > >
> > > 15.1.9.4.  NFS4ERR_RECLAIM_BAD (Error Code 10034)
> > >
> > >    The server has determined that a reclaim attempted by the client is
> > >    not valid, i.e. the lock specified as being reclaimed could not
> > >    possibly have existed before the server restart or file system
> > >    migration event.  A server is not obliged to make this determination
> > >    and will typically rely on the client to only reclaim locks that the
> > >    client was granted prior to restart.  However, when a server does
> > > >    have reliable information to enable it to make this determination, this
> > >    error indicates that the reclaim has been rejected as invalid.  This
> > >    is as opposed to the error NFS4ERR_RECLAIM_CONFLICT (see
> > >    Section 15.1.9.5) where the server can only determine that there has
> > >    been an invalid reclaim, but cannot determine which request is
> > >    invalid.
> > >
> > > 15.1.9.5.  NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)
> > >
> > >    The reclaim attempted by the client has encountered a conflict and
> > >    cannot be satisfied.  This potentially indicates a misbehaving
> > >    client, although not necessarily the one receiving the error.  The
> > >    misbehavior might be on the part of the client that established the
> > >    lock with which this client conflicted.  See also Section 15.1.9.4
> > >    for the related error, NFS4ERR_RECLAIM_BAD
> >
> > Thanks for remembering to fetch these updates from the full bis WIP!
> >
> > >
> > > > - how attacker capabilities compare by using a compromised server to
> > > >   give bogus referrals/etc. as opposed to just giving bogus data/etc.
> > > >
> > >
> > > Will address. See the paragraphs to be added to the end of Section 21.
> > >
> > >
> > > > > - an attacker in the network trying to shift client traffic (in terms of
> > > > >   what endpoints/connections they use) to overload a server
> > > >
> > >
> > > Will address. See the paragraphs to be added to the end of Section 21.
> > >
> > >
> > > > - how asynchronous replication can cause clients to repeat
> > > >   non-idempotent actions
> > > >
> > >
> > > Not sure what you are referring to.
> >
> > I don't have something fully fleshed out here, but it's in the general
> > space when there are multiple replicas that get updates at (varying)
> > delays from the underlying write.  A contrived situation would be if you
> > have a pool of worker machines that use NFS for state management (I know, a
> > pretty questionable idea), and try to do compare-and-set on a state file.
> > If one worker tries to assert that it owns the state but other NFS replicas
> > see delayed updates, additional worker machines could also try to claim the
> > state and perform whatever operation the state file is controlling.
> >
> > Basically, the point here is that if you as an NFS consumer are using NFS
> > with relaxed replication semantics, you have to think through how your
> > workflow will behave in the presence of such relaxed updates.  Which ought
> > to be obvious, when I say it like that, but perhaps is not always actually
> > obvious.
> >
> > >
> > > > - the potential for state skew and/or data loss if migration events
> > > >   happen in close succession and the client "misses a notification"
> > > >
> > >
> > > > Is there a specific problem that needs to be addressed?
> >
> > I don't have a concrete scenario that's specific to NFS, no; this is just a
> > generic possibility for any scheme that involves discrete updates (e.g.,
> > file-modifying RPCs) and the potential for asynchronous replication.
> >
> 
> I think that the necessary discussion can be folded into some clarification
> of replication discussed in
> another thread.

Sure.

> 
> > >
> > > > - cases where a filesystem moves and there's no longer anything running
> > > >   at the old network endpoint to return NFS4ERR_MOVED
> > > >
> > >
> > > > This seems to me just a recognition that sometimes systems fail.  Not sure
> > > specifically what to address.
> >
> > Okay.
> >
> > > > - what can happen when non-idempotent requests are in a COMPOUND before
> > > >   a request that gets NFS4ERR_MOVED
> > > >
> > >
> > > Intend to address in Section 15.1.2.4:
> > >
> > >    The file system that contains the current filehandle object is not
> > > >    present at the server, or is not accessible with the network
> > > >    address used.  It may have been made accessible on a different set of
> > >    network addresses, relocated or migrated to another server, or it may
> > >    have never been present.  The client may obtain the new file system
> > >    location by obtaining the "fs_locations" or "fs_locations_info"
> > >    attribute for the current filehandle.  For further discussion, refer
> > >    to Section 11.3.
> > >
> > > >    As with the case of NFS4ERR_DELAY, it is possible that one or more
> > > >    non-idempotent operations may have been successfully executed within a
> > > >    COMPOUND before NFS4ERR_MOVED is returned.  Because of this, once the
> > > >    new location is determined, the original request which received the
> > > >    NFS4ERR_MOVED should not be re-executed in full.  Instead, a new
> > > >    COMPOUND, with any successfully executed non-idempotent operations
> > > >    removed, should be executed.  This new request should have a different
> > > >    slot id or sequence in those cases in which the same session is used
> > >    for the new request (i.e. transparent session migration or an
> >
> > nit: comma after "i.e.".
> >
> 
> Fixed.
> 
> 
> >
> > >    endpoint transition to a new address session-trunkable with the
> > >    original one).
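
The rebuild step described in this paragraph can be sketched as below.  This
is an illustration under assumptions, not draft text: the Op type, its
idempotent flag, and the example operation list are hypothetical, and
SEQUENCE and slot handling are left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    idempotent: bool


def rebuild_after_moved(ops: List[Op], failed_index: int) -> List[Op]:
    """Build the COMPOUND to send once the new location is known.

    ops[failed_index] is the operation that returned NFS4ERR_MOVED, so every
    earlier operation completed successfully.  Completed non-idempotent
    operations are dropped; everything else is kept in order.
    """
    completed = ops[:failed_index]
    remaining = ops[failed_index:]
    return [op for op in completed if op.idempotent] + remaining


# Hypothetical request: REMOVE succeeded before READ hit NFS4ERR_MOVED.
original = [Op("PUTFH", True), Op("REMOVE", False),
            Op("PUTFH", True), Op("READ", True)]
resend = rebuild_after_moved(original, failed_index=3)
```

The resent request keeps both PUTFH operations and the READ but omits the
REMOVE, which must not be executed a second time.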
> > >
> > >
> > > > - how bad it is if the client messes up at Transparent State Migration
> > > >   discovery, most notably in the case when some lock state is lost
> > > >
> > >
> > > Propose to address this by adding the following paragraph to the end of
> > > Section 11.13.2:
> > >
> > >    Lease discovery needs to be provided as described above, in order to
> > >    ensure that migrations are discovered soon enough to ensure that
> > > >    leases moved to new servers are discovered in time to make sure that
> > >    leases are renewed early enough to avoid lease expiration, leading to
> > >    loss of locking state.  While the consequences of such loss can be
> >
> > nit: the double "are discovered {soon enough,in time}" is a little awkward
> > of a construction; how about "Lease discovery needs to be provided as
> > described above, in order to ensure that migrations are discovered soon
> > enough that leases moved to new servers can successfully be renewed before
> > they expire, avoiding loss of locking state"?
> >
> 
> Went with the following:
> 
>         Lease discovery needs to be provided as described above, in order to
>         ensure that migrations are discovered soon enough to enable
>         leases moved to new servers to be appropriately renewed in order to
>         avoid lease expiration, leading to loss of locking state.

I think maybe we should leave any further tweaks to the RFC Editor; my only
concern is whether it can be misread as saying that the renewal leads to
loss of locking state (which is, admittedly, a nonsensical interpretation).

> >
> > >    ameliorated through implementations of courtesy locks, servers are
> > >    under no obligation to do, and a conflicting lock request may means
> >
> > nit: s/means/mean/
> >
> 
> Fixed.
> 
> 
> >
> > >    that a lock is revoked unexpectedly.  Clients should be aware of this
> > >    possibility.
> > >
> > >
> > >
> > > > - the interactions between cached replies and migration(-like) events,
> > > >   though a lot of this is discussed in section 11.13.X and 15.1.1.3
> > > >   already
> > > >
> > >
> > > > Will address any specifics that you feel aren't adequately addressed.
> >
> > I don't remember any particular specifics, so we should probably just let
> > this go for now.
> >
> > >
> > > >
> > > > but I defer to the WG as to what to cover now vs. later.
> > > >
> > > > In light of the ongoing work on draft-ietf-nfsv4-rpc-tls, it might be
> > > > reasonable to just talk about "integrity protection" as an abstract
> > > > thing without the specific focus on RPCSEC_GSS's integrity protection
> > > > (or authentication)
> > > >
> > > >
> > > > I was initially leery of this, but when I looked at the text, I was able to
> > > > avoid referring to RPCSEC_GSS in most cases in which integrity was
> > > > mentioned :-).  The same does not seem possible for authentication :-(
> >
> > We'll take the easy wins and try to not fret too much about the other
> > stuff.
> >
> > > > RPCSEC_GSS does not
> > > > > %   protect the binding from one server to another as part of a referral
> > > > %   or migration event.  The source server must be trusted to provide
> > > > %   the correct information, based on whatever factors are available to
> > > > %   the client.
> > > >
> > >
> > > > These are both situations for which RPCSEC_GSS has no solution, but neither
> > > > is there another one.   It is probably best to just say that without
> > > > reference to integrity protection.
> >
> > True.
> >
> > > > I have added new paragraphs after these bullets that may address some of
> > > > the issues you were concerned about.
> > >
> > >    Even if such requests are not interfered with in flight, it is
> > >    possible for a compromised server to direct the client to use
> > >    inappropriate servers, such as those under the control of the
> > >    attacker.  It is not clear that being directed to such servers
> > >    represents a greater threat to the client than the damage that could
> > > >    be done by the compromised server itself.  However, it is possible
> > >    that some sorts of transient server compromises might be taken
> > >    advantage of to direct a client to a server capable of doing greater
> > >    damage over a longer time.  One useful step to guard against this
> > >    possibility is to issue requests to fetch location data using
> > >    RPCSEC_GSS, even if no mapping to an RPCSEC_GSS principal is
> > >    available.  In this case, RPCSEC_GSS would not be used, as it
> > >    typically is, to identify the client principal to the server, but
> > >    rather to make sure (via RPCSEC_GSS mutual authentication) that the
> > >    server being contacted is the one intended.
> > >
> > > >    Similar considerations apply if the threat to be avoided is the
> > >    direction of client traffic to inappropriate (i.e. poorly performing)
> > >    servers.  In both cases, there is no reason for the information
> > >    returned to depend on the identity of the client principal requesting
> > >    it, while the validity of the server information, which has the
> > >    capability to affect all client principals, is of considerable
> > >    importance.
> >
> > These do address some of the issues I mentioned; thank you.  I do have a
> > couple further comments:
> >
> > - I'm not sure what "no mapping to an RPCSEC_GSS principal is available"
> >   means (but maybe that's just because I've not read the RPCSEC_GSS RFCs
> >   recently enough)
> >
> 
> This is partly because "even if no mapping to an RPCSEC_GSS principal is
> available" is misleading.   It would have been better to say "even if no
> mapping to an RPCSEC_GSS principal is available for the user currently
> obtaining the information".
> The issue is not within RPCSEC_GSS itself but relates to how it is used by
> NFSv4.  Servers are required to support RPCSEC_GSS but to use it, you need to
> translate a uid to an RPCSEC_GSS principal.  Where that mapping is not
> available, as it often is not, you cannot use RPCSEC_GSS for that user.

Ah, that was enough to jostle the neurons into place (and no change to the
text needed).

Thanks again,

Ben

> >    available
> >
> > > - w.r.t. "there is no reason for the information returned to depend on the
> >   identity of the client principal", I could perhaps imagine some setup
> >   that uses information from a corporate contacts database to determine the
> >   current office/location of a given user and provide a referral to a
> >   spatially-local replica.  So "no reason" may be too absolute (but I don't
> >   have a proposed alternative and don't object to using this text).
> >
> >
> > [...]
> > > > Section B.4
> > > >
> > > >    o  The discussion of trunking which appeared in Section 2.10.5 of
> > > >       RFC5661 [62] needed to be revised, to more clearly explain the
> > > >       multiple types of trunking supporting and how the client can be
> > > >       made aware of the existing trunking configuration.  In addition
> > > >       the last paragraph (exclusive of sub-sections) of that section,
> > > >       dealing with server_owner changes, is literally true, it has been
> > > >       a source of confusion.  [...]
> > > >
> > > > nit: the grammar here is weird; I think there's a missing "while" or
> > > > similar.
> 
> > >
> > > >
> > >
> > >  Anticipate using the following replacement text:
> > >
> > >   o  The discussion of trunking which appeared in Section 2.10.5 of
> > >       RFC5661 [62] needed to be revised, to more clearly explain the
> > >       multiple types of trunking supporting and how the client can be
> >
> > nit: just "trunking support" (not "-ing")?
> >
> 
> Fixed.
> 
> 
> >
> > >       made aware of the existing trunking configuration.  In addition,
> > >       while the last paragraph (exclusive of sub-sections) of that
> > >       section, dealing with server_owner changes, is literally true, it
> > >       has been a source of confusion.  Since the existing paragraph can
> > >       be read as suggesting that such changes be dealt with non-
> > >       disruptively, the issue needs to be clarified in the revised
> > >       section, which appears in Section 2.10.5.
> >
> > Thanks again for going through my giant pile of comments; I hope that you
> > think the improvements to the document are worth the time spent.
> >
> 
> I do.
> 
> 
> > -Ben
> >