Re: [nfsv4] Preventing an NFSv4.1 client from destroying a migrated lease after TSM
Chuck Lever <chuck.lever@oracle.com> Wed, 19 April 2017 16:16 UTC
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CADaq8jeuPQfUNM-QH2jEKuMRSjsEwWQU_0ZBVJH0KDU7-vipmQ@mail.gmail.com>
Date: Wed, 19 Apr 2017 12:09:15 -0400
Cc: NFSv4 <nfsv4@ietf.org>
Message-Id: <0C02F2F9-A202-45A7-844B-55CA6D308CCC@oracle.com>
References: <ED0D48EE-4618-4E07-B97F-8320C77CF1EC@oracle.com> <CADaq8jfHAF6+2AfuRGr2a=D0FX96YAVu=gGJTqGwSvhXCbntfg@mail.gmail.com> <BADB9283-3B08-49DD-A2E2-5C0C20054C45@oracle.com> <CADaq8jfhxcGqJd0U40JkyT9eT+hj24dovMmqFUtkSic_b08jDg@mail.gmail.com> <213DBA50-FB48-4B48-A177-22C2B6879D49@oracle.com> <CADaq8jeuPQfUNM-QH2jEKuMRSjsEwWQU_0ZBVJH0KDU7-vipmQ@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/D30WINMYWGOURbbQKLUE_AEHwT4>
Subject: Re: [nfsv4] Preventing an NFSv4.1 client from destroying a migrated lease after TSM

> On Apr 18, 2017, at 3:30 PM, David Noveck <davenoveck@gmail.com> wrote:
>
> To avoid burying the lead, I'll respond waaay out of order:
>
> > In sum, here are some options we've considered:
>
> > 1a. destination asserts CONF_R, but client uses the
> > returned contrived slot sequence anyway
>
> Clearly, migration-issues-13 needs to take account of this issue.
> I believe 1a is the best choice and will explain why below.
>
> > We've explored this mechanism a little more, and now we seem
> > to be hamstrung on the first paragraph of RFC 5661, Section
> > 18.35.3:
>
> Like much of RFCs 5661 and 7530, the paragraph in question
> was written without any consideration of issues related to
> transparent state migration. Sigh!
>
> I think I have not been as explicit about this as I should have been
> but, as we have started to address a number of migration-related issues for
> NFSv4.1, it appears that we will need a standards-track document,
> updating RFC5661, paralleling RFC7931 for NFSv4.0.
>
> RFC7931 had a replacement SETCLIENTID description and it now
> appears that the new document will need a replacement for the
> EXCHANGE_ID description. Coincidence? .... or something
> else? :-)
>
> > The client uses the EXCHANGE_ID operation to register a particular
> > client owner with the server.
>
> However, when the client_owner has already been registered
> by other means (e.g. transparent state migration), the client may still
> use EXCHANGE_ID to obtain the clientid assigned previously.
>
> > The client ID returned from this
> > operation will be necessary for requests that create state on the
> > server and will serve as a parent object to sessions created by the
> > client.
>
> OK.
>
> > In order to confirm the client ID it must first be used,
> > along with the returned eir_sequenceid, as arguments to
> > CREATE_SESSION.
>
> In situations in which the registration of the client_owner has not occurred
> previously ...
>
> > If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> > result, eir_flags, then eir_sequenceid MUST be ignored, as it has no
> > relevancy.
>
> It is not central to my argument but I believe that the "MUST" is not in
> accord with RFC2119. The point here is that if a CREATE_SESSION
> has been done, which this text assumes is the case, that has already
> established the proper sequence id to use. This is explaining how the
> protocol works and not establishing an interoperability REQUIREMENT.
>
> Anyway, I would go on:
>
> If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> result, eir_flags, then it is an indication that the registration
> of the client_owner has already occurred and that a further
> CREATE_SESSION is not needed to confirm it. Of course,
> subsequent CREATE_SESSIONs may be needed for other
> reasons.
>
> The value eir_sequenceid is used to establish an initial
> sequence value associated with the clientid returned. In cases
> in which a CREATE_SESSION has already been done, there
> is no need for this value, since sequencing of such requests has
> already been established and the client has no need for this
> value and will ignore it.
>
> > The problem is the normative requirement in the last
> > sentence.
>
> I agree that this REQUIREMENT is a problem.
>
> > If CONFIRMED_R is set, a client is required
> > to ignore the returned contrived slot sequence number.
>
> According to RFC5661, yes.
>
> > [ This is likely the reason that clients react to this
> > flag by purging the lease and starting over ].
>
> I believe that the reason is that if the client thinks/knows a
> clientid is unconfirmed and he finds out that the server thinks
> otherwise, he figures something is really messed up, in which
> case punting like this is to be preferred to a panic :-(
>
> > Our first thought was to use the sequence number in the
> > migrated lease (in other words, the sequence number that
> > was established on the source server).
> > A client knows
> > that sequence number by looking at the lease it has with
> > the source server.
>
> In this case, you are moving over the single-slot
> quasi-session associated with the client.
>
> > However, there's no guarantee that the client will not
> > perform additional CREATE_SESSION operations against
> > the source server between the time the source server
> > copies the migrating lease and the client starts trunking
> > discovery for the destination server.
>
> True. The client can hack around this by starting with
> the last sequence he has and backing up if it is too
> high. I'd rather not go there, though.
>
> > Another thought was to use a well-known constant, like
> > zero, when creating the first session on the destination
> > server.
>
> That is probably the best option if you are committed to respecting
> the language of the current RFC5661 requirement. However,
> you would still need to revise the EXCHANGE_ID description
> to provide a coherent description, in any case.
>
> > We've established that trunking discovery works correctly
> > when the destination server does not assert CONFIRMED_R.
>
> :-)
>
> > This is because the client may use the contrived slot
> > sequence number provided by the destination server.
>
> It's because he does use that slot sequence number. To me, that is
> an additional reason he should be allowed to do so.
>
> > If the destination server does not assert CONFIRMED_R,
> > then how does the client determine whether Transparent
> > State Migration has occurred? Can it simply start using
> > the open and lock state it has, and deal with BAD_STATEID
> > if the servers did not perform TSM?
>
> I think that may be possible. However, it is ugly. If you have another way
> to find out that state was transferred, you can look at the status bits from SEQUENCE,
> and only worry about lost stateids when there is an indication that some
> state was lost.
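
The suggestion just quoted, consulting the SEQUENCE status bits before worrying about lost stateids, might look roughly like the sketch below. It is only an illustration: the four flag values are the SEQ4_STATUS_* "state revoked" bits defined in RFC 5661, but the choice of which bits count as "some state was lost" and the helper name state_may_be_lost are assumptions made here, not something the thread specifies.

```python
# sr_status_flags bits from the SEQUENCE reply, as defined in RFC 5661.
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010
SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040

# Assumption for this sketch: any "revoked" bit is treated as a hint
# that some stateids may no longer be valid on the destination server.
STATE_LOSS_MASK = (
    SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED
    | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED
    | SEQ4_STATUS_ADMIN_STATE_REVOKED
    | SEQ4_STATUS_RECALLABLE_STATE_REVOKED
)


def state_may_be_lost(sr_status_flags: int) -> bool:
    """Hypothetical helper: True if the reply hints that state was lost."""
    return bool(sr_status_flags & STATE_LOSS_MASK)
```

A client following this approach would keep using its existing open and lock state after migration and only begin checking individual stateids once state_may_be_lost() returns True for a SEQUENCE reply.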
>
> > Confirmation of the lease means more than just that it
> > is present on the queried server
>
> Confirmation means that the client is not uncertain about
> whether it exists. Confirmation by means of CREATE_SESSION
> is one way to do that, by making sure that the EXCHANGE_ID was
> not a random retry, since abandoned.
>
> > it also means there is
> > now a cached CREATE_SESSION reply, and a known good
> > contrived slot sequence number.
>
> In many situations, that is a reasonable inference but it doesn't
> change the meaning of "confirmation".
>
> > Neither of those is true
> > for a migrated lease when the client's sessions were not
> > also migrated.
>
> Yes, but I'm not certain that both of these are true in the case in which the client's
> sessions are migrated.
>
> > Thus we should consider that a migrated lease is really
> > not confirmed at all,
>
> I consider it confirmed, but confirmed in a different manner.
>
> > but in some intermediate state.
>
> If it were an intermediate state, there would be some way to get into
> what you call the confirmed state (and I would call the
> confirmed-by-CREATE_SESSION state). You could do that by doing
> a CREATE_SESSION if you knew the sequenceid but I don't see the
> point in explicating the fine structure of the various confirmed clientid
> sub-states.
>
> > A more ideal solution would be to create an additional
> > EXCHGID4_FLAG that signifies the existence of a migrated
> > lease for the querying client.
>
> If you were starting today (or had a time machine) you might do it that way.
>
>
> > In sum, here are some options we've considered:
>
> > 1a. destination asserts CONF_R, but client uses the
> > returned contrived slot sequence anyway
>
>
> My first choice.

I believe that is what I have prototyped in the Linux client,
currently, and it is known to work as well as a non-migration
EXCHANGE_ID / CREATE_SESSION.
I don't object to this as long as the WG has blessed this mechanism
and documents it in an "NFSv4.1 migration update" as mentioned above.

> > 1b. destination asserts CONF_R, and client uses a fixed
> > constant starting slot sequence
>
> My second choice.
>
> > 2. destination does not assert CONF_R, and client is
> > prepared for BAD_STATEID if TSM did not occur
>
> A real drag.
>
> > 3. destination asserts a new EXCHGID4_FLAG that signifies
> > a TSM has made the client's lease available on that
> > server
>
> Unfortunately, NFSv4.1 is not an extensible minor version.
>
> > 4. NFSv4.1 TSM cannot occur unless the client's sessions
> > are also migrated
> >
> > Are there others?
>
> There are others but none come to mind right now.
>
> On Mon, Apr 17, 2017 at 2:45 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> > On Mar 9, 2017, at 2:00 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > I'm not certain what you mean by "handled ... as it
> > > would have been in 4.0". Server trunking discovery in
> > > NFSv4.0 requires an elaborate dance. In NFSv4.1,
> > > trunking discovery can be done by a single operation
> > > (EXCHANGE_ID).
> >
> > What I meant was that, apart from the greater
> > calories expended in v4.0 case, the result should have
> > been the same, i.e. that there is no trunking.
> >
> > > I'm speculating that if the server hadn't asserted
> > > CONFIRMED_R, this client could have recovered
> > > correctly from the migration event. Is that what you
> > > mean?
> >
> > It wasn't what I meant, but, now that you say it, I think is
> > probably right.
>
> We've explored this mechanism a little more, and now we seem
> to be hamstrung on the first paragraph of RFC 5661, Section
> 18.35.3:
>
> The client uses the EXCHANGE_ID operation to register a particular
> client owner with the server.
> The client ID returned from this
> operation will be necessary for requests that create state on the
> server and will serve as a parent object to sessions created by the
> client. In order to confirm the client ID it must first be used,
> along with the returned eir_sequenceid, as arguments to
> CREATE_SESSION. If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> result, eir_flags, then eir_sequenceid MUST be ignored, as it has no
> relevancy.
>
> The problem is the normative requirement in the last
> sentence. If CONFIRMED_R is set, a client is required
> to ignore the returned contrived slot sequence number.
> [ This is likely the reason that clients react to this
> flag by purging the lease and starting over ].
>
> Our first thought was to use the sequence number in the
> migrated lease (in other words, the sequence number that
> was established on the source server). A client knows
> that sequence number by looking at the lease it has with
> the source server.
>
> However, there's no guarantee that the client will not
> perform additional CREATE_SESSION operations against
> the source server between the time the source server
> copies the migrating lease and the client starts trunking
> discovery for the destination server.
>
> Another thought was to use a well-known constant, like
> zero, when creating the first session on the destination
> server.
>
> What sequence number should be used for the first
> CREATE_SESSION with the destination server? If it is
> the copied sequence number, what can prevent mutation
> of that sequence number while migration of that lease
> is not yet complete?
>
> We've established that trunking discovery works correctly
> when the destination server does not assert CONFIRMED_R.
> This is because the client may use the contrived slot
> sequence number provided by the destination server.
>
> If the destination server does not assert CONFIRMED_R,
> then how does the client determine whether Transparent
> State Migration has occurred? Can it simply start using
> the open and lock state it has, and deal with BAD_STATEID
> if the servers did not perform TSM?
>
> Confirmation of the lease means more than just that it
> is present on the queried server: it also means there is
> now a cached CREATE_SESSION reply, and a known good
> contrived slot sequence number. Neither of those is true
> for a migrated lease when the client's sessions were not
> also migrated.
>
> Thus we should consider that a migrated lease is really
> not confirmed at all, but in some intermediate state.
> A more ideal solution would be to create an additional
> EXCHGID4_FLAG that signifies the existence of a migrated
> lease for the querying client.
>
> In sum, here are some options we've considered:
>
> 1a. destination asserts CONF_R, but client uses the
> returned contrived slot sequence anyway
>
> 1b. destination asserts CONF_R, and client uses a fixed
> constant starting slot sequence
>
> 2. destination does not assert CONF_R, and client is
> prepared for BAD_STATEID if TSM did not occur
>
> 3. destination asserts a new EXCHGID4_FLAG that signifies
> a TSM has made the client's lease available on that
> server
>
> 4. NFSv4.1 TSM cannot occur unless the client's sessions
> are also migrated
>
> Are there others?
>
> --
> Chuck Lever
>

--
Chuck Lever
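
Options 1a and 1b above differ only in which sequence value the client passes to its first CREATE_SESSION on the destination server once EXCHANGE_ID has returned CONFIRMED_R. The following is a minimal sketch of that choice, not the Linux client's code: EXCHGID4_FLAG_CONFIRMED_R is the value defined in RFC 5661, while first_create_session_seq, the option strings, and the use of zero as the fixed constant (the thread only says "like zero") are assumptions made for illustration.

```python
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000  # eir_flags bit, RFC 5661

# Assumption: option 1b's "fixed constant" is zero, the example
# value mentioned in the thread.
FIXED_START_SEQ = 0


def first_create_session_seq(eir_flags: int, eir_sequenceid: int,
                             option: str) -> int:
    """Hypothetical helper: pick the sequence for the first CREATE_SESSION.

    option is "1a" or "1b", matching the labels used in the thread.
    """
    if not eir_flags & EXCHGID4_FLAG_CONFIRMED_R:
        # Unconfirmed client ID: the ordinary RFC 5661 path, where the
        # returned eir_sequenceid is used to confirm the client ID.
        return eir_sequenceid
    if option == "1a":
        # Use the returned value even though CONFIRMED_R is set.
        return eir_sequenceid
    if option == "1b":
        # Ignore the returned value and start from a well-known constant.
        return FIXED_START_SEQ
    raise ValueError(f"unknown option: {option!r}")
```

For example, with eir_flags carrying CONFIRMED_R and eir_sequenceid equal to 7, option "1a" yields 7 and option "1b" yields 0; under 1b the destination server would have to expect the same constant for the scheme to interoperate.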