Re: [nfsv4] Preventing an NFSv4.1 client from destroying a migrated lease after TSM

David Noveck <davenoveck@gmail.com> Wed, 19 April 2017 16:37 UTC

From: David Noveck <davenoveck@gmail.com>
Date: Wed, 19 Apr 2017 12:37:23 -0400
Message-ID: <CADaq8jdcYN3JZGLgqs3JGmPyQdyQ8OubPj4i9jRVhW5Uk7ZHHQ@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/kLNx_5RJiLzufn7PBzs86yib1i8>
Subject: Re: [nfsv4] Preventing an NFSv4.1 client from destroying a migrated lease after TSM

> I believe that is what I have prototyped in the Linux
> client, currently, and it is known to work as well as
> a non-migration EXCHANGE_ID / CREATE_SESSION.

Good to know.

> I don't object to this as long as the WG has blessed
> this mechanism

I don't know exactly what is involved in the WG
"bless[ing]" this mechanism.  Perhaps it is not distinct from:

> documents it in an "NFSv4.1
> migration update" as mentioned above.

In any case, we'll discuss the possibility of moving forward on
a standards-track document once migration-issues-13 is out.
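To make option 1a concrete, here is a minimal Python sketch. It is a toy model, not taken from the Linux client or any server. It covers only the client ID's CREATE_SESSION sequence-ID check as described in RFC 5661 Section 18.36 (same value means replay, value plus one means new request, anything else is NFS4ERR_SEQ_MISORDERED). It also reproduces the race Chuck describes, in which the client performs another CREATE_SESSION on the source after the lease has been copied. All names (Server, import_migrated_lease, demo) are hypothetical. The behavior in which the destination returns the copied slot sequence plus one in eir_sequenceid, even with CONFIRMED_R set, is an assumption: it is what option 1a would need a standards-track update to specify.

```python
"""Toy model of the CREATE_SESSION sequence-ID check (RFC 5661, 18.36).

Everything here is hypothetical and heavily simplified: no sessions, no
principals, no cached reply contents, just the client ID's slot sequence.
"""
from dataclasses import dataclass

MOD = 2 ** 32  # sequence IDs are unsigned 32-bit values that wrap


@dataclass
class ClientRecord:
    clientid: int
    confirmed: bool
    slot_seqid: int  # last csa_sequence the server accepted


class Server:
    def __init__(self) -> None:
        self.clients: dict[str, ClientRecord] = {}
        self.next_id = 1

    def exchange_id(self, owner: str) -> tuple[int, int, bool]:
        """Return (clientid, eir_sequenceid, CONFIRMED_R)."""
        rec = self.clients.get(owner)
        if rec is None:  # brand-new registration, still unconfirmed
            rec = ClientRecord(self.next_id, False, 0)
            self.next_id += 1
            self.clients[owner] = rec
        # Assumption (option 1a): eir_sequenceid is always the next expected
        # csa_sequence, even when the record is already confirmed.
        return rec.clientid, (rec.slot_seqid + 1) % MOD, rec.confirmed

    def create_session(self, clientid: int, csa_sequence: int) -> str:
        rec = next(r for r in self.clients.values() if r.clientid == clientid)
        if csa_sequence == rec.slot_seqid:
            return "NFS4_OK (replay)"
        if csa_sequence == (rec.slot_seqid + 1) % MOD:
            rec.slot_seqid = csa_sequence
            rec.confirmed = True
            return "NFS4_OK"
        return "NFS4ERR_SEQ_MISORDERED"

    def import_migrated_lease(self, owner: str, rec: ClientRecord) -> None:
        """Transparent state migration: copy the lease as it is right now."""
        self.clients[owner] = ClientRecord(rec.clientid, rec.confirmed, rec.slot_seqid)


def demo() -> dict[str, str]:
    owner = "client-A"
    source, dest = Server(), Server()

    cid, seq, _ = source.exchange_id(owner)
    source.create_session(cid, seq)
    client_last_seq = seq

    # The lease is copied to the destination...
    dest.import_migrated_lease(owner, source.clients[owner])
    # ...and then the client does one more CREATE_SESSION on the source.
    client_last_seq += 1
    source.create_session(cid, client_last_seq)

    cid2, eir_seq, confirmed_r = dest.exchange_id(owner)
    assert confirmed_r  # destination asserts CONFIRMED_R
    return {
        # Ignore eir_sequenceid and continue from the source-server sequence.
        "guess_from_source": dest.create_session(cid2, (client_last_seq + 1) % MOD),
        # Option 1a: use the returned eir_sequenceid anyway.
        "option_1a": dest.create_session(cid2, eir_seq),
    }


print(demo())
```

Run as-is, the source-derived guess is rejected with NFS4ERR_SEQ_MISORDERED, while the CREATE_SESSION built from the returned eir_sequenceid succeeds. Option 1b would replace eir_sequenceid with an agreed constant, which in this model only works if the destination sets the imported slot so that the constant is the next expected value.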




On Wed, Apr 19, 2017 at 12:09 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Apr 18, 2017, at 3:30 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > To avoid burying the lead, I'll respond waaay out of order:
> >
> > > In sum, here are some options we've considered:
> >
> > > 1a. destination asserts CONF_R, but client uses the
> > > returned contrived slot sequence anyway
> >
> > Clearly, migration-issues-13 needs to take account of this issue.
> > I believe 1a is the best choice and will explain why below.
> >
> > > We've explored this mechanism a little more, and now we seem
> > > to be hamstrung on the first paragraph of RFC 5661, Section
> > > 18.35.3:
> >
> > Like much of RFCs 5661 and 7530, the paragraph in question
> > was written without any consideration of issues related to
> > transparent state migration.  Sigh!
> >
> > I think I have not been as explicit about this as I should have been
> > but, as we have started to address a number of migration-related issues
> for
> > NFSv4.1 it appears that we will need a standards-track document,
> > updating RFC5661, paralleling RFC7931 for NFSv4.0.
> >
> > RFC7931 had a replacement SETCLIENTID description and it now
> > appears that the new document will need a replacement for the
> > EXCHANGE_ID description. Coincidence? .... or something
> > else? :-)
> >
> > >   The client uses the EXCHANGE_ID operation to register a particular
> > >   client owner with the server.
> >
> > However, when the client_owner has already been registered
> > by other means (e.g., transparent state migration), the client may still
> > use EXCHANGE_ID to obtain the clientid assigned previously.
> >
> > >  The client ID returned from this
> > >   operation will be necessary for requests that create state on the
> > >   server and will serve as a parent object to sessions created by the
> > >   client.
> >
> > OK.
> >
> > >   In order to confirm the client ID it must first be used,
> > >   along with the returned eir_sequenceid, as arguments to
> > >   CREATE_SESSION.
> >
> > In situations in which the registration of the client_owner has not
> > occurred previously ...
> >
> > > If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> > > result, eir_flags, then eir_sequenceid MUST be ignored, as it has no
> > >  relevancy.
> >
> > It is not central to my argument, but I believe that the "MUST" is not in
> > accord with RFC2119. The point here is that if a CREATE_SESSION
> > has been done, which this text assumes is the case, it has already
> > established the proper sequence id to use. This is explaining how the
> > protocol works, not establishing an interoperability REQUIREMENT.
> >
> > Anyway, I would go on:
> >
> > If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> > result, eir_flags, then it is an indication that the registration
> > of the client_owner has already occurred and that a further
> > CREATE_SESSION is not needed to confirm it. Of course,
> > subsequent CREATE_SESSIONs may be needed for other
> > reasons.
> >
> > The value eir_sequenceid is used to establish an initial
> > sequence value associated with the clientid returned. In cases
> > in which a CREATE_SESSION has already been done, there
> > is no need for this value, since sequencing of such requests has
> > already been established, and the client will ignore it.
> >
> > > The problem is the normative requirement in the last
> > > sentence.
> >
> > I agree that this REQUIREMENT is a problem.
> >
> > > If CONFIRMED_R is set, a client is required
> > > to ignore the returned contrived slot sequence number.
> >
> > According to RFC5661, yes.
> >
> > > [ This is likely the reason that clients react to this
> > > flag by purging the lease and starting over ].
> >
> > I believe that the reason is that if the client thinks/knows a
> > clientid is unconfirmed and he finds out that the server thinks
> > otherwise, he figures something is really messed up, in which case
> > punting like this is preferable to a panic :-(
> >
> > > Our first thought was to use the sequence number in the
> > > migrated lease (in other words, the sequence number that
> > > was established on the source server). A client knows
> > > that sequence number by looking at the lease it has with
> > > the source server.
> >
> > In this case, you are moving over the single-slot
> > quasi-session associated with the client.
> >
> > > However, there's no guarantee that the client will not
> > > perform additional CREATE_SESSION operations against
> > > the source server between the time the source server
> > > copies the migrating lease and the client starts trunking
> > > discovery for the destination server.
> >
> > True. The client can hack around this by starting with
> > the last sequence he has and backing up if it is too
> > high. I'd rather not go there, though.
> >
> > > Another thought was to use a well-known constant, like
> > > zero, when creating the first session on the destination
> > > server.
> >
> > That is probably the best option if you are committed to respecting
> > the language of the current RFC5661 requirement. However, you
> > would still need to revise the EXCHANGE_ID description so that
> > it is coherent.
> >
> > > We've established that trunking discovery works correctly
> > > when the destination server does not assert CONFIRMED_R.
> >
> > :-)
> >
> > > This is because the client may use the contrived slot
> > > sequence number provided by the destination server.
> >
> > It's because he does use that slot sequence number.  To me, that is
> > an additional reason he should be allowed to do so.
> >
> > > If the destination server does not assert CONFIRMED_R,
> > > then how does the client determine whether Transparent
> > > State Migration has occurred? Can it simply start using
> > > the open and lock state it has, and deal with BAD_STATEID
> > > if the servers did not perform TSM?
> >
> > I think that may be possible. However, it is ugly. If you have another
> > way to find out that state was transferred, you can look at the status
> > bits from SEQUENCE and only worry about lost stateids when there is an
> > indication that some state was lost.
> >
> > > Confirmation of the lease means more than just that it
> > > is present on the queried server
> >
> > Confirmation means that the client is not uncertain about
> > whether it exists. Confirmation by means of CREATE_SESSION
> > is one way to do that, by making sure that the EXCHANGE_ID was
> > not a random retry, since abandoned.
> >
> > > it also means there is
> > > now a cached CREATE_SESSION reply, and a known good
> > > contrived slot sequence number.
> >
> > In many situations, that is a reasonable inference, but it doesn't
> > change the meaning of "confirmation".
> >
> > > Neither of those is true
> > > for a migrated lease when the client's sessions were not
> > > also migrated.
> >
> > Yes, but I'm not certain that both of these are true in the case in
> > which the client's sessions are migrated.
> >
> > > Thus we should consider that a migrated lease is really
> > > not confirmed at all,
> >
> > I consider it confirmed, but confirmed in a different manner.
> >
> > > but in some intermediate state.
> >
> > If it were an intermediate state, there would be some way to get into
> > what you call the confirmed state (and I would call the
> > confirmed-by-CREATE_SESSION state). You could do that by doing
> > a CREATE_SESSION if you knew the sequenceid, but I don't see the
> > point in explicating the fine structure of the various confirmed clientid
> > sub-states.
> >
> > > A more ideal solution would be to create an additional
> > > EXCHID4_FLAG that signifies the existence of a migrated
> > > lease for the querying client.
> >
> > If you were starting today (or had a time machine), you might do it
> > that way.
> >
> >
> > > In sum, here are some options we've considered:
> >
> > > 1a. destination asserts CONF_R, but client uses the
> > > returned contrived slot sequence anyway
> >
> >
> > My first choice.
>
> I believe that is what I have prototyped in the Linux
> client, currently, and it is known to work as well as
> a non-migration EXCHANGE_ID / CREATE_SESSION.
>
> I don't object to this as long as the WG has blessed
> this mechanism and documents it in an "NFSv4.1
> migration update" as mentioned above.
>
>
> > > 1b. destination asserts CONF_R, and client uses a fixed
> > > constant starting slot sequence
> >
> > My second choice.
> >
> > > 2. destination does not assert CONF_R, and client is
> > > prepared for BAD_STATEID if TSM did not occur
> >
> > A real drag.
> >
> > > 3. destination asserts a new EXCHID4_FLAG that signifies
> > > a TSM has made the client's lease available on that
> > > server
> >
> > Unfortunately, NFSv4.1 is not an extensible minor version.
> >
> > > 4. NFSv4.1 TSM cannot occur unless the client's sessions
> > > are also migrated
> >
> >
> > > Are there others?
> >
> > There are others but none come to mind right now.
> >
> > On Mon, Apr 17, 2017 at 2:45 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> > > On Mar 9, 2017, at 2:00 PM, David Noveck <davenoveck@gmail.com> wrote:
> > >
> > > > I'm not certain what you mean by "handled ... as it
> > > > would have been in 4.0". Server trunking discovery in
> > > > NFSv4.0 requires an elaborate dance. In NFSv4.1,
> > > > trunking discovery can be done by a single operation
> > > > (EXCHANGE_ID).
> > >
> > > What I meant was that, apart from the greater
> > > calories expended in the v4.0 case, the result should have
> > > been the same, i.e., that there is no trunking.
> > >
> > > > I'm speculating that if the server hadn't asserted
> > > > CONFIRMED_R, this client could have recovered
> > > > correctly from the migration event. Is that what you
> > > > mean?
> > >
> > > It wasn't what I meant, but, now that you say it, I think it is
> > > probably right.
> >
> > We've explored this mechanism a little more, and now we seem
> > to be hamstrung on the first paragraph of RFC 5661, Section
> > 18.35.3:
> >
> >    The client uses the EXCHANGE_ID operation to register a particular
> >    client owner with the server.  The client ID returned from this
> >    operation will be necessary for requests that create state on the
> >    server and will serve as a parent object to sessions created by the
> >    client.  In order to confirm the client ID it must first be used,
> >    along with the returned eir_sequenceid, as arguments to
> >    CREATE_SESSION.  If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> >    result, eir_flags, then eir_sequenceid MUST be ignored, as it has no
> >    relevancy.
> >
> > The problem is the normative requirement in the last
> > sentence. If CONFIRMED_R is set, a client is required
> > to ignore the returned contrived slot sequence number.
> > [ This is likely the reason that clients react to this
> > flag by purging the lease and starting over ].
> >
> > Our first thought was to use the sequence number in the
> > migrated lease (in other words, the sequence number that
> > was established on the source server). A client knows
> > that sequence number by looking at the lease it has with
> > the source server.
> >
> > However, there's no guarantee that the client will not
> > perform additional CREATE_SESSION operations against
> > the source server between the time the source server
> > copies the migrating lease and the client starts trunking
> > discovery for the destination server.
> >
> > Another thought was to use a well-known constant, like
> > zero, when creating the first session on the destination
> > server.
> >
> > What sequence number should be used for the first
> > CREATE_SESSION with the destination server? If it is
> > the copied sequence number, what can prevent mutation
> > of that sequence number while migration of that lease
> > is not yet complete?
> >
> > We've established that trunking discovery works correctly
> > when the destination server does not assert CONFIRMED_R.
> > This is because the client may use the contrived slot
> > sequence number provided by the destination server.
> >
> > If the destination server does not assert CONFIRMED_R,
> > then how does the client determine whether Transparent
> > State Migration has occurred? Can it simply start using
> > the open and lock state it has, and deal with BAD_STATEID
> > if the servers did not perform TSM?
> >
> >
> > Confirmation of the lease means more than just that it
> > is present on the queried server: it also means there is
> > now a cached CREATE_SESSION reply, and a known good
> > contrived slot sequence number. Neither of those is true
> > for a migrated lease when the client's sessions were not
> > also migrated.
> >
> > Thus we should consider that a migrated lease is really
> > not confirmed at all, but in some intermediate state.
> > A more ideal solution would be to create an additional
> > EXCHID4_FLAG that signifies the existence of a migrated
> > lease for the querying client.
> >
> >
> > In sum, here are some options we've considered:
> >
> > 1a. destination asserts CONF_R, but client uses the
> > returned contrived slot sequence anyway
> >
> > 1b. destination asserts CONF_R, and client uses a fixed
> > constant starting slot sequence
> >
> > 2. destination does not assert CONF_R, and client is
> > prepared for BAD_STATEID if TSM did not occur
> >
> > 3. destination asserts a new EXCHID4_FLAG that signifies
> > a TSM has made the client's lease available on that
> > server
> >
> > 4. NFSv4.1 TSM cannot occur unless the client's sessions
> > are also migrated
> >
> > Are there others?
> >
> > --
> > Chuck Lever
> >
> >
> >
> >
>
> --
> Chuck Lever