Re: [nfsv4] Preventing an NFSv4.1 client from destroying a migrated lease after TSM
David Noveck <davenoveck@gmail.com> Wed, 19 April 2017 16:37 UTC
In-Reply-To: <0C02F2F9-A202-45A7-844B-55CA6D308CCC@oracle.com>
References: <ED0D48EE-4618-4E07-B97F-8320C77CF1EC@oracle.com> <CADaq8jfHAF6+2AfuRGr2a=D0FX96YAVu=gGJTqGwSvhXCbntfg@mail.gmail.com> <BADB9283-3B08-49DD-A2E2-5C0C20054C45@oracle.com> <CADaq8jfhxcGqJd0U40JkyT9eT+hj24dovMmqFUtkSic_b08jDg@mail.gmail.com> <213DBA50-FB48-4B48-A177-22C2B6879D49@oracle.com> <CADaq8jeuPQfUNM-QH2jEKuMRSjsEwWQU_0ZBVJH0KDU7-vipmQ@mail.gmail.com> <0C02F2F9-A202-45A7-844B-55CA6D308CCC@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Wed, 19 Apr 2017 12:37:23 -0400
Message-ID: <CADaq8jdcYN3JZGLgqs3JGmPyQdyQ8OubPj4i9jRVhW5Uk7ZHHQ@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/kLNx_5RJiLzufn7PBzs86yib1i8>
Subject: Re: [nfsv4] Preventing an NFSv4.1 client from destroying a migrated lease after TSM
> I believe that is what I have prototyped in the Linux
> client, currently, and it is known to work as well as
> a non-migration EXCHANGE_ID / CREATE_SESSION.

Good to know.

> I don't object to this as long as the WG has blessed
> this mechanism

I don't know exactly what is involved in the WG "bless[ing]" this
mechanism. Perhaps it is not distinct from:

> documents it in an "NFSv4.1
> migration update" as mentioned above.

In any case, we'll discuss the possibility of moving forward on a
standards-track document once migration-issues-13 is out.

On Wed, Apr 19, 2017 at 12:09 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Apr 18, 2017, at 3:30 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > To avoid burying the lead, I'll respond waaay out of order:
> >
> > > In sum, here are some options we've considered:
> >
> > > 1a. destination asserts CONF_R, but client uses the
> > > returned contrived slot sequence anyway
> >
> > Clearly, migration-issues-13 needs to take account of this issue.
> > I believe 1a is the best choice and will explain why below.
> >
> > > We've explored this mechanism a little more, and now we seem
> > > to be hamstrung on the first paragraph of RFC 5661, Section
> > > 18.35.3:
> >
> > Like much of RFCs 5661 and 7530, the paragraph in question
> > was written without any consideration of issues related to
> > transparent state migration. Sigh!
> >
> > I think I have not been as explicit about this as I should have been,
> > but as we have started to address a number of migration-related
> > issues for NFSv4.1, it appears that we will need a standards-track
> > document, updating RFC5661, paralleling RFC7931 for NFSv4.0.
> >
> > RFC7931 had a replacement SETCLIENTID description and it now
> > appears that the new document will need a replacement for the
> > EXCHANGE_ID description. Coincidence? .... or something
> > else? :-)
> >
> > > The client uses the EXCHANGE_ID operation to register a particular
> > > client owner with the server.
> >
> > However, when the client_owner has already been registered
> > by other means (e.g., transparent state migration), the client may
> > still use EXCHANGE_ID to obtain the clientid assigned previously.
> >
> > > The client ID returned from this
> > > operation will be necessary for requests that create state on the
> > > server and will serve as a parent object to sessions created by the
> > > client.
> >
> > OK.
> >
> > > In order to confirm the client ID it must first be used,
> > > along with the returned eir_sequenceid, as arguments to
> > > CREATE_SESSION.
> >
> > In situations in which the registration of the client_owner has not
> > occurred previously ...
> >
> > > If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> > > result, eir_flags, then eir_sequenceid MUST be ignored, as it has no
> > > relevancy.
> >
> > It is not central to my argument, but I believe that the "MUST" is
> > not in accord with RFC2119. The point here is that if a CREATE_SESSION
> > has been done, which this text assumes is the case, that has already
> > established the proper sequence id to use. This is explaining how the
> > protocol works and not establishing an interoperability REQUIREMENT.
> >
> > Anyway, I would go on:
> >
> >    If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> >    result, eir_flags, then it is an indication that the registration
> >    of the client_owner has already occurred and that a further
> >    CREATE_SESSION is not needed to confirm it. Of course,
> >    subsequent CREATE_SESSIONs may be needed for other
> >    reasons.
> >
> >    The value eir_sequenceid is used to establish an initial
> >    sequence value associated with the clientid returned. In cases
> >    in which a CREATE_SESSION has already been done, there
> >    is no need for this value, since sequencing of such requests has
> >    already been established; the client has no need for this
> >    value and will ignore it.
> >
> > > The problem is the normative requirement in the last
> > > sentence.
> >
> > I agree that this REQUIREMENT is a problem.
> >
> > > If CONFIRMED_R is set, a client is required
> > > to ignore the returned contrived slot sequence number.
> >
> > According to RFC5661, yes.
> >
> > > [ This is likely the reason that clients react to this
> > > flag by purging the lease and starting over ].
> >
> > I believe that the reason is that if the client thinks/knows a
> > clientid is unconfirmed and he finds out that the server thinks
> > otherwise, he figures something is really messed up, in which case
> > punting like this is to be preferred to a panic :-(
> >
> > > Our first thought was to use the sequence number in the
> > > migrated lease (in other words, the sequence number that
> > > was established on the source server). A client knows
> > > that sequence number by looking at the lease it has with
> > > the source server.
> >
> > In this case, you are moving over the single-slot
> > quasi-session associated with the client.
> >
> > > However, there's no guarantee that the client will not
> > > perform additional CREATE_SESSION operations against
> > > the source server between the time the source server
> > > copies the migrating lease and the client starts trunking
> > > discovery for the destination server.
> >
> > True. The client can hack around this by starting with
> > the last sequence he has and backing up if it is too
> > high. I'd rather not go there, though.
> >
> > > Another thought was to use a well-known constant, like
> > > zero, when creating the first session on the destination
> > > server.
> >
> > That is probably the best option if you are committed to respecting
> > the language of the current RFC5661 requirement. However,
> > you would still need to revise the EXCHANGE_ID description
> > to provide a coherent description, in any case.
> >
> > > We've established that trunking discovery works correctly
> > > when the destination server does not assert CONFIRMED_R.
> >
> > :-)
> >
> > > This is because the client may use the contrived slot
> > > sequence number provided by the destination server.
> >
> > It's because he does use that slot sequence number. To me, that is
> > an additional reason he should be allowed to do so.
> >
> > > If the destination server does not assert CONFIRMED_R,
> > > then how does the client determine whether Transparent
> > > State Migration has occurred? Can it simply start using
> > > the open and lock state it has, and deal with BAD_STATEID
> > > if the servers did not perform TSM?
> >
> > I think that may be possible. However, it is ugly. If you have
> > another way to find out that state was transferred, you can look at
> > the status bits from SEQUENCE, and only worry about lost stateids
> > when there is an indication that some state was lost.
> >
> > > Confirmation of the lease means more than just that it
> > > is present on the queried server
> >
> > Confirmation means that the client is not uncertain about
> > whether it exists. Confirmation by means of CREATE_SESSION
> > is one way to do that, by making sure that the EXCHANGE_ID was
> > not a random retry, since abandoned.
> >
> > > it also means there is
> > > now a cached CREATE_SESSION reply, and a known good
> > > contrived slot sequence number.
> >
> > In many situations, that is a reasonable inference, but it doesn't
> > change the meaning of "confirmation".
> >
> > > Neither of those is true
> > > for a migrated lease when the client's sessions were not
> > > also migrated.
> >
> > Yes, but I'm not certain that both of these are true in the case in
> > which the client's sessions are migrated.
> >
> > > Thus we should consider that a migrated lease is really
> > > not confirmed at all,
> >
> > I consider it confirmed, but confirmed in a different manner.
> >
> > > but in some intermediate state.
> >
> > If it were an intermediate state, there would be some way to get into
> > what you call the confirmed state (and I would call the
> > confirmed-by-CREATE_SESSION state). You could do that by doing
> > a CREATE_SESSION if you knew the sequenceid, but I don't see the
> > point in explicating the fine structure of the various confirmed
> > clientid sub-states.
> >
> > > A more ideal solution would be to create an additional
> > > EXCHGID4_FLAG that signifies the existence of a migrated
> > > lease for the querying client.
> >
> > If you were starting today (or had a time machine) you might do it
> > that way.
> >
> > > In sum, here are some options we've considered:
> >
> > > 1a. destination asserts CONF_R, but client uses the
> > > returned contrived slot sequence anyway
> >
> > My first choice.
>
> I believe that is what I have prototyped in the Linux
> client, currently, and it is known to work as well as
> a non-migration EXCHANGE_ID / CREATE_SESSION.
>
> I don't object to this as long as the WG has blessed
> this mechanism and documents it in an "NFSv4.1
> migration update" as mentioned above.
>
> > > 1b. destination asserts CONF_R, and client uses a fixed
> > > constant starting slot sequence
> >
> > My second choice.
> >
> > > 2. destination does not assert CONF_R, and client is
> > > prepared for BAD_STATEID if TSM did not occur
> >
> > A real drag.
> >
> > > 3. destination asserts a new EXCHGID4_FLAG that signifies
> > > a TSM has made the client's lease available on that
> > > server
> >
> > Unfortunately, NFSv4.1 is not an extensible minor version.
> >
> > > 4. NFSv4.1 TSM cannot occur unless the client's sessions
> > > are also migrated
> >
> > > Are there others?
> >
> > There are others but none come to mind right now.
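Options 1a and 1b differ only in the csa_sequence value the client sends in its first CREATE_SESSION to the destination when CONFIRMED_R is set on a lease it obtained through transparent state migration. A minimal sketch, assuming the client can tell whether it has itself already done a CREATE_SESSION with this server; `ExchangeIdResult` and `first_create_session_seq` are hypothetical names, and `FIXED_START_SEQUENCE = 0` only follows the "like zero" suggestion in the thread, since RFC 5661 defines no such constant.

```python
from dataclasses import dataclass
from typing import Optional

EXCHGID4_FLAG_CONFIRMED_R = 0x80000000  # value from the RFC 5661 XDR

# Hypothetical constant for option 1b; RFC 5661 defines no such value.
FIXED_START_SEQUENCE = 0


@dataclass
class ExchangeIdResult:
    """Hypothetical container for the EXCHANGE_ID4resok fields used here."""
    eir_clientid: int
    eir_sequenceid: int
    eir_flags: int


def first_create_session_seq(res: ExchangeIdResult,
                             already_confirmed_here: bool,
                             option: str = "1a") -> Optional[int]:
    """Pick csa_sequence for the client's first CREATE_SESSION.

    Returns None when the client has already done a CREATE_SESSION
    with this server and should keep using the sequence it has.
    """
    if not (res.eir_flags & EXCHGID4_FLAG_CONFIRMED_R):
        # Ordinary case: CREATE_SESSION with eir_sequenceid confirms
        # the client ID.
        return res.eir_sequenceid
    if already_confirmed_here:
        # Plain trunking: sequencing is already established.
        return None
    # CONFIRMED_R is set, but the lease arrived via transparent state
    # migration, so this client has never sent CREATE_SESSION here.
    if option == "1a":
        return res.eir_sequenceid  # use the contrived value anyway
    if option == "1b":
        return FIXED_START_SEQUENCE
    raise ValueError(f"unknown option {option!r}")


if __name__ == "__main__":
    migrated = ExchangeIdResult(eir_clientid=0x1234, eir_sequenceid=7,
                                eir_flags=EXCHGID4_FLAG_CONFIRMED_R)
    print(first_create_session_seq(migrated, False, "1a"))  # 7
    print(first_create_session_seq(migrated, False, "1b"))  # 0
    print(first_create_session_seq(migrated, True))         # None
```

Under 1a the client follows the same path it takes after an unconfirmed EXCHANGE_ID, which is the symmetry argued for above; 1b presumably also requires the destination server to initialize its slot to agree with the chosen constant.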
> >
> > On Mon, Apr 17, 2017 at 2:45 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> > > On Mar 9, 2017, at 2:00 PM, David Noveck <davenoveck@gmail.com> wrote:
> > >
> > > > I'm not certain what you mean by "handled ... as it
> > > > would have been in 4.0". Server trunking discovery in
> > > > NFSv4.0 requires an elaborate dance. In NFSv4.1,
> > > > trunking discovery can be done by a single operation
> > > > (EXCHANGE_ID).
> > >
> > > What I meant was that, apart from the greater
> > > calories expended in the v4.0 case, the result should have
> > > been the same, i.e. that there is no trunking.
> > >
> > > > I'm speculating that if the server hadn't asserted
> > > > CONFIRMED_R, this client could have recovered
> > > > correctly from the migration event. Is that what you
> > > > mean?
> > >
> > > It wasn't what I meant, but, now that you say it, I think it is
> > > probably right.
> >
> > We've explored this mechanism a little more, and now we seem
> > to be hamstrung on the first paragraph of RFC 5661, Section
> > 18.35.3:
> >
> >    The client uses the EXCHANGE_ID operation to register a particular
> >    client owner with the server. The client ID returned from this
> >    operation will be necessary for requests that create state on the
> >    server and will serve as a parent object to sessions created by the
> >    client. In order to confirm the client ID it must first be used,
> >    along with the returned eir_sequenceid, as arguments to
> >    CREATE_SESSION. If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the
> >    result, eir_flags, then eir_sequenceid MUST be ignored, as it has no
> >    relevancy.
> >
> > The problem is the normative requirement in the last
> > sentence. If CONFIRMED_R is set, a client is required
> > to ignore the returned contrived slot sequence number.
> > [ This is likely the reason that clients react to this
> > flag by purging the lease and starting over ].
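Read strictly, the RFC 5661 paragraph quoted above leaves a client that sees CONFIRMED_R for a client ID it never confirmed itself with no sequence number it is permitted to use. The hypothetical sketch below models that dead end; the `Action` names are illustrative only, and the purge branch reflects the client reaction described in this thread rather than anything RFC 5661 requires.

```python
from enum import Enum

EXCHGID4_FLAG_CONFIRMED_R = 0x80000000  # value from the RFC 5661 XDR


class Action(Enum):
    """Illustrative client reactions; the names are hypothetical."""
    CREATE_SESSION_WITH_EIR_SEQUENCEID = 1
    KEEP_EXISTING_SEQUENCE = 2
    PURGE_LEASE_AND_START_OVER = 3


def strict_rfc5661_reaction(eir_flags: int,
                            confirmed_by_this_client: bool) -> Action:
    """Client reaction under a strict reading of RFC 5661, 18.35.3.

    confirmed_by_this_client: whether this client has itself already
    done the CREATE_SESSION that confirmed the client ID on this server.
    """
    if not (eir_flags & EXCHGID4_FLAG_CONFIRMED_R):
        return Action.CREATE_SESSION_WITH_EIR_SEQUENCEID
    if confirmed_by_this_client:
        return Action.KEEP_EXISTING_SEQUENCE
    # CONFIRMED_R on a client ID this client never confirmed (the TSM
    # case). eir_sequenceid "MUST be ignored", so there is no sequence
    # number the client may use, and it punts.
    return Action.PURGE_LEASE_AND_START_OVER


if __name__ == "__main__":
    # After transparent state migration to a server the client never
    # confirmed itself with:
    print(strict_rfc5661_reaction(EXCHGID4_FLAG_CONFIRMED_R, False))
    # Action.PURGE_LEASE_AND_START_OVER
```

Under the replacement EXCHANGE_ID text David proposes earlier in this thread, the last branch would instead become a CREATE_SESSION using eir_sequenceid, which is option 1a.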
> >
> > Our first thought was to use the sequence number in the
> > migrated lease (in other words, the sequence number that
> > was established on the source server). A client knows
> > that sequence number by looking at the lease it has with
> > the source server.
> >
> > However, there's no guarantee that the client will not
> > perform additional CREATE_SESSION operations against
> > the source server between the time the source server
> > copies the migrating lease and the client starts trunking
> > discovery for the destination server.
> >
> > Another thought was to use a well-known constant, like
> > zero, when creating the first session on the destination
> > server.
> >
> > What sequence number should be used for the first
> > CREATE_SESSION with the destination server? If it is
> > the copied sequence number, what can prevent mutation
> > of that sequence number while migration of that lease
> > is not yet complete?
> >
> > We've established that trunking discovery works correctly
> > when the destination server does not assert CONFIRMED_R.
> > This is because the client may use the contrived slot
> > sequence number provided by the destination server.
> >
> > If the destination server does not assert CONFIRMED_R,
> > then how does the client determine whether Transparent
> > State Migration has occurred? Can it simply start using
> > the open and lock state it has, and deal with BAD_STATEID
> > if the servers did not perform TSM?
> >
> > Confirmation of the lease means more than just that it
> > is present on the queried server: it also means there is
> > now a cached CREATE_SESSION reply, and a known good
> > contrived slot sequence number. Neither of those is true
> > for a migrated lease when the client's sessions were not
> > also migrated.
> >
> > Thus we should consider that a migrated lease is really
> > not confirmed at all, but in some intermediate state.
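The race described above, where the client performs another CREATE_SESSION on the source after the lease has been copied, can be illustrated with a simplified model of the client ID's single CREATE_SESSION slot. The sketch assumes the usual slot rule (a value one greater than the cached sequence id is a new request, an equal value is a replay, anything else gets NFS4ERR_SEQ_MISORDERED) with 32-bit wraparound; the `CsSlot` class is hypothetical and ignores the case of a request still in progress.

```python
MASK32 = 0xFFFFFFFF  # sequence ids are 32-bit and wrap around


class CsSlot:
    """Hypothetical, simplified model of a client ID's CREATE_SESSION slot.

    Ignores the case of a request still in progress on the slot.
    """

    def __init__(self, seqid: int) -> None:
        self.seqid = seqid & MASK32

    def check(self, csa_sequence: int) -> str:
        """Classify csa_sequence as 'new', 'replay' or 'misordered'."""
        if csa_sequence == (self.seqid + 1) & MASK32:
            self.seqid = csa_sequence  # accepted: the slot advances
            return "new"
        if csa_sequence == self.seqid:
            return "replay"  # server would return the cached reply
        return "misordered"  # NFS4ERR_SEQ_MISORDERED


if __name__ == "__main__":
    source = CsSlot(5)                  # last CREATE_SESSION on source used 5
    destination = CsSlot(source.seqid)  # TSM copies the lease with seqid 5
    print(source.check(6))              # new: one more CREATE_SESSION on source
    client_seq = 6                      # what the client now believes
    print(destination.check(client_seq + 1))  # misordered: the copy is stale
```

In this model, the backing-up workaround mentioned earlier in the thread would mean retrying with `client_seq`, then `client_seq - 1`, and so on after each misordered reply; a value equal to the copied seqid would be treated as a replay rather than as a new request.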
> >
> > A more ideal solution would be to create an additional
> > EXCHGID4_FLAG that signifies the existence of a migrated
> > lease for the querying client.
> >
> > In sum, here are some options we've considered:
> >
> > 1a. destination asserts CONF_R, but client uses the
> > returned contrived slot sequence anyway
> >
> > 1b. destination asserts CONF_R, and client uses a fixed
> > constant starting slot sequence
> >
> > 2. destination does not assert CONF_R, and client is
> > prepared for BAD_STATEID if TSM did not occur
> >
> > 3. destination asserts a new EXCHGID4_FLAG that signifies
> > a TSM has made the client's lease available on that
> > server
> >
> > 4. NFSv4.1 TSM cannot occur unless the client's sessions
> > are also migrated
> >
> > Are there others?
> >
> > --
> > Chuck Lever
>
> --
> Chuck Lever
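Option 2 amounts to the client continuing to use its open and lock stateids and sorting out afterwards which ones the destination actually holds. A minimal sketch, assuming a hypothetical `probe` callable that sends some operation carrying the stateid to the destination (TEST_STATEID would be a natural choice in NFSv4.1) and returns an nfsstat4 code; stateids are simplified to strings, and the status values are from the RFC 5661 XDR.

```python
from typing import Callable, Iterable, List, Tuple

NFS4_OK = 0
NFS4ERR_BAD_STATEID = 10025  # value from the RFC 5661 XDR


def sort_migrated_stateids(
    stateids: Iterable[str],
    probe: Callable[[str], int],
) -> Tuple[List[str], List[str]]:
    """Option 2 sketch: split stateids into usable and needing recovery.

    probe is a hypothetical callable that sends an operation carrying
    the stateid to the destination server and returns its nfsstat4.
    """
    usable: List[str] = []
    needs_recovery: List[str] = []
    for sid in stateids:
        status = probe(sid)
        if status == NFS4_OK:
            usable.append(sid)          # state came over with the lease
        elif status == NFS4ERR_BAD_STATEID:
            needs_recovery.append(sid)  # TSM did not carry it: reopen/relock
        else:
            raise RuntimeError(f"unexpected status {status} for {sid}")
    return usable, needs_recovery


if __name__ == "__main__":
    destination = {"open-1": NFS4_OK, "lock-2": NFS4ERR_BAD_STATEID}
    print(sort_migrated_stateids(["open-1", "lock-2"], destination.__getitem__))
    # (['open-1'], ['lock-2'])
```

Every stateid has to be exercised before the client knows where it stands, which is the cost that makes option 2 unattractive in the discussion above.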