Re: [nfsv4] DESTROY_SESSION and clientid trunking
<Noveck_David@emc.com> Mon, 26 July 2010 22:03 UTC
Date: Mon, 26 Jul 2010 18:02:59 -0400
Message-ID: <BF3BB6D12298F54B89C8DCC1E4073D8001F6BF97@CORPUSMX50A.corp.emc.com>
In-Reply-To: <20100720221023.GB15024@fieldses.org>
References: <BF3BB6D12298F54B89C8DCC1E4073D8001D44311@CORPUSMX50A.corp.emc.com> <20100719214218.GE29058@fieldses.org> <BF3BB6D12298F54B89C8DCC1E4073D8001D44429@CORPUSMX50A.corp.emc.com> <20100720221023.GB15024@fieldses.org>
From: Noveck_David@emc.com
To: bfields@fieldses.org
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] DESTROY_SESSION and clientid trunking
> If an entire server has become unresponsive, then you're really doing
> failover--at which point, don't you need a persistent and/or distributed
> reply cache if you really want exactly once semantics?

First, we need to clarify which "entire server" may be becoming
unresponsive. Do we mean the set of things having the same value of
so_major_id and so_minor_id, which I'm going to call a server-minor? Or
do we mean the set of things having the same value of so_major_id and
any set of potentially different so_minor_id values, which I'll call a
server-major?

If the entire server which has become unresponsive is a server-major,
then yes, you have failover. So the client will typically do an
EXCHANGE_ID to establish a clientid and, depending on whether there are
persistent sessions, he will either have the EOS (exactly-once
semantics) that he really needs or not. But what is important in terms
of the issue we are talking about here is that the EXCHANGE_ID draws a
line under everything that happened under the previous clientid. Whether
you have EOS or not, what the client cannot live with is a situation in
which a WRITE started under the previous clientid might happen later,
after he starts doing I/O under the new clientid. That is a recipe for
data corruption.

Now consider the same situation for the case in which a server-minor is
what becomes unresponsive. You are not going to establish a new
clientid, but if you want to reissue on a different server-minor a WRITE
previously issued to the unresponsive one, you need an analogous
guarantee: that everything previously issued to that server-minor has
either been done or it hasn't and never will be. What you can't live
with is a situation in which it might happen at any subsequent time.

Leaving aside the case of read-only data, everybody needs true EOS,
whether they are attentive enough to want it or not. But v4.1 does not
make true EOS with persistent sessions a mandatory feature, probably
because of implementation difficulty.
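To make the hazard concrete, here is a minimal sketch of the "WRITE that springs back to life" problem and of what drawing a line under a server-minor would buy. It is a toy model, not NFSv4.1 code: the `ServerMinor` class, its `fence` method, and the `run` helper are all hypothetical names invented for illustration, and the file is just a four-byte buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Write:
    offset: int
    data: bytes


@dataclass
class ServerMinor:
    """Hypothetical model of one unresponsive node (one so_minor_id)."""
    name: str
    pending: list = field(default_factory=list)  # WRITEs received, not yet applied
    fenced: bool = False

    def submit(self, write: Write) -> None:
        # The node is unresponsive: the request is queued and never answered.
        if self.fenced:
            raise RuntimeError("node is fenced; no new operations may start")
        self.pending.append(write)

    def fence(self) -> None:
        # "Draw a line": drop pending WRITEs and refuse any later ones.
        self.pending.clear()
        self.fenced = True

    def wake_up(self, storage: bytearray) -> None:
        # The node comes back and applies whatever is still queued.
        for w in self.pending:
            storage[w.offset:w.offset + len(w.data)] = w.data
        self.pending.clear()


def run(fence_first: bool) -> bytes:
    storage = bytearray(b"....")
    stuck = ServerMinor("minor-1")
    stuck.submit(Write(0, b"AAAA"))   # original WRITE, no reply ever arrives
    if fence_first:
        stuck.fence()
    # The client reissues the WRITE through another server-minor,
    # then later writes newer data to the same range.
    storage[0:4] = b"AAAA"
    storage[0:4] = b"BBBB"
    stuck.wake_up(storage)            # late execution of the old WRITE, if any
    return bytes(storage)


print(run(fence_first=False))  # b'AAAA': the stale WRITE clobbered newer data
print(run(fence_first=True))   # b'BBBB': nothing old can run after the fence
```

The point of the sketch is only that reissuing an idempotent WRITE is harmless in itself; the damage comes from the old copy running after newer data has been written, which is what the fence rules out.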
Now in our example of an unresponsive server-minor, if there is support
to update the session reply cache on disk or on another backup piece of
hardware, then we have EOS in this case. It doesn't show up as
persistent sessions but rather as connections becoming unresponsive, and
when they are re-established, they will wind up on an alternate piece of
hardware with the correct reply cache being available, whether from disk
or otherwise.

But just as we deal with servers that don't have persistent sessions in
the server-major case, we should deal as well as we can with the case of
a server-minor which does not have the ability to store its reply cache
information persistently. I agree that people need EOS for this case,
but we do have to deal with the case in which we don't have the needed
infrastructure.

Here I'm thinking about a case in which we have unresponsive connections
to a server-minor and we need to be able to assure that operations
previously started on these are either done or not; what is critical is
that they are not done in the future. You could do this explicitly with
an op that takes a server-owner, or you could do this at the session
level. It seems to me that doing that as part of DESTROY_SESSION makes
sense, but you would have to deal with the current spec issues that make
this difficult or impossible.

My feeling is that DESTROY_SESSION needs to assure people that ops
started under that session are not going to execute after the
DESTROY_SESSION completes, and I would like to understand if there is a
good reason for the current restriction as to what connection the op can
be issued on.

-----Original Message-----
From: J. Bruce Fields [mailto:bfields@fieldses.org]
Sent: Tuesday, July 20, 2010 6:10 PM
To: Noveck, David
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] DESTROY_SESSION and clientid trunking

On Mon, Jul 19, 2010 at 09:06:45PM -0400, Noveck_David@emc.com wrote:
> > I don't get it--isn't it the *connections* associated with the session
> > that become unresponsive, not the session itself?
>
> OK, but if the single connection associated with a session or all the
> connections associated with a session become unresponsive, then you
> still have the same issue whether you are saying the session is
> unresponsive or the connections.
>
> I think I neglected to mention one important piece of context. If you
> have two sessions associated with a single clientid and this is the
> result of the client on its own deciding on doing this form of clientid
> trunking, then what you say is true.
>
> However, if the two sessions are established because the server returned
> different so_minor_id values, then we have a different situation. In
> this case the server is telling you that two different IP's are not
> capable of supporting a single session. If that is the case, typically
> it is because you have a clustered server such that the two IP's
> terminate in different hardware which is incapable of sharing memory.

If an entire server has become unresponsive, then you're really doing
failover--at which point, don't you need a persistent and/or distributed
reply cache if you really want exactly once semantics?

--b.

> With shared memory, you would simply be able to support a common session
> and its associated replay cache.
>
> In such cases, it is very likely that a session, or rather the set of all
> sessions sharing a given so_minor_id, will become unresponsive, as the
> result of various hardware problems in addition to bugs that you can
> simply fix.
>
> > If you wanted EOS across sessions, then it seems to me that you'll end
> > up creating another session abstraction on top of sessions.
>
> That may seem to be a flaw in terms of abstraction management, but it is
> in response to a practical requirement. Clients want an EOS guarantee
> for their requests. It is simple (by which I mean we have already done
> it) to provide it within the context of a single session. However, if
> one is forced to move requests from one session to another, and if one
> can be forced into either doing this or having the requests wait
> forever, then if one is not willing to accept the possibility of data
> corruption, one has to address the issue of how to make sure the old
> request cannot come to life.
>
> Note that this doesn't fully address the issue of EOS across sessions
> for non-idempotent requests and I think some v4.2 changes would be
> required, but v4.1 does allow different so_minor_id values for clustered
> servers, and it seems kind of rough to say you can only do this if you
> accept the possibility of data corruption.
>
> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Monday, July 19, 2010 5:42 PM
> To: Noveck, David
> Cc: nfsv4@ietf.org
> Subject: Re: [nfsv4] DESTROY_SESSION and clientid trunking
>
> On Mon, Jul 19, 2010 at 05:22:01PM -0400, Noveck_David@emc.com wrote:
> > This mail concerns the use of DESTROY_SESSION in the context of
> > clientid trunking. Specifically, there are going to be cases in which
> > a client will encounter a situation in which connections to one
> > session are unresponsive, whether due to client-side networking
> > issues, a wire being cut, server-side networking issues, or a node of
> > a clustered server being down. No matter what the cause, the client
> > will need to transfer work from one session to another. For
> > simplicity, let's assume we have clientid CL with two sessions S1 and
> > S2, and that S1 becomes unresponsive, making it necessary to transfer
> > work so that everything for the client is being done on a single
> > session S2.
>
> I don't get it--isn't it the *connections* associated with the session
> that become unresponsive, not the session itself?
>
> If you wanted EOS across sessions, then it seems to me that you'll end
> up creating another session abstraction on top of sessions.
>
> And just fixing whatever bug is causing the session "unresponsiveness"
> would seem simpler than the infinite regress.... But I probably don't
> understand what you mean: how could a session become unresponsive?
>
> --b.
>
> > The problem I'd like to consider is that of modifying-idempotent
> > requests such as WRITE. The question of non-idempotent requests such
> > as RENAME is harder and probably requires some work in v4.2, but the
> > simpler case is important since WRITEs are typically much more common.
> >
> > So the issue is that if you have a request issued on S1 (we'll use
> > WRITE throughout as the example) and you get no response, then you
> > would like to issue the WRITE on the other session. Since WRITEs are,
> > in the strict sense, idempotent, you don't have a problem if two
> > WRITEs are issued. What is a problem is if there were a WRITE issued
> > on S1 that was just lazily hanging around somewhere: if you then did
> > the WRITE on S2 and it completed, and the WRITE on S1 were to spring
> > back to life, you have the likelihood of data corruption.
> >
> > What you need is the assurance that the first WRITE (the one on S1)
> > either succeeded or failed but is not sitting around waiting to be
> > done at a later point. You want to draw a line under all requests
> > initiated as part of S1. I'm assuming the right way to do that is to
> > destroy S1. Does anybody have any other ways to do that? Things like
> > closing the file will assure that the writes are done, but they have
> > unsatisfactory locking issues in that someone can open the file with
> > a deny-mode and you might not be allowed to open it again.
> >
> > However, there are some problems in the way that DESTROY_SESSION is
> > specified and I think they need to be addressed, whether as errata in
> > the v4.1 context or in the v4.2 context.
> >
> > The first problem is that you would expect to have such a guarantee
> > specified in the definition of DESTROY_SESSION and it isn't there.
> > You could argue that since it specifies that the reply information is
> > dropped, it would be bad for requests to terminate in a situation
> > where there is session reply information to update, but that isn't
> > ironclad. I think it makes sense for the initial paragraph to be
> > changed to something like this:
> >
> >    The DESTROY_SESSION operation closes the session and discards the
> >    session's reply cache, if any. All pending operations initiated on
> >    the session are terminated and no additional ones can be started.
> >    Thus, the requester is assured, once it receives the response, that
> >    no operations initiated on the session will modify any file data
> >    or attributes or locking state associated with the client.
> >
> >    Any remaining connections associated with the session are
> >    immediately disassociated. If the connection has no remaining
> >    associated sessions, the connection MAY be closed by the server.
> >    Locks, delegations, layouts, wants, and the lease, which are all
> >    tied to the client ID, are not affected by DESTROY_SESSION.
> >
> > What do people think? Is this reasonable? Is there some way we can
> > get by without it?
> >
> > The other issue is that the spec says:
> >
> >    DESTROY_SESSION MUST be invoked on a connection that is associated
> >    with the session being destroyed.
> >
> > So here I have a couple of questions:
> >
> > * Is there a good reason for this? Does it make sense?
> > * Is this a big problem for the issue we are talking about?
> >
> > I believe the answers to both are "no", but I'm not sure.
> >
> > The problem I'm worried about is that if you are doing a
> > DESTROY_SESSION to clean up from a non-responsive session, you have
> > to issue it on connections associated with that session. So the
> > problem would be that the DESTROY_SESSION can only be sent when you
> > don't need it to be sent, i.e. when the destination IP's are
> > functioning, but if they are, you don't want to send it.
> >
> > There may be security issues, but aren't the security issues dealt
> > with by the text that follows the text above? Would there really be
> > any security issue in allowing this on any connection that is
> > associated with a session that is part of the same clientid as the
> > session being destroyed? After all, you can do an EXCHANGE_ID with a
> > new client verifier and get rid of all sessions as long as you can
> > confirm it by creating one new session. Given that you can do that,
> > why can't you destroy the session?
> >
> > So one possible way of getting around this would be to associate
> > connections that are part of S2 (let's call them C2A and C2B) with
> > S1 just so they would be allowed in the context in which you need
> > them. While the text talks about the security issues of making sure
> > that we have the right server, it doesn't talk about how you might
> > reject the BIND_CONN_TO_SESSION because it is for a connection that
> > is for the wrong so_minor_id. Is the server obliged to reject this?
> > If it isn't, then it might have a connection associated with a
> > different session, e.g. C2A with S1. If it allows that, is the server
> > obliged to accept S1 requests (in general) on the same basis as it
> > accepts S2 requests? Or can it treat these specially, i.e. slow for
> > normal requests (because they are routed over a cluster interconnect
> > to the node normally handling S2), but capable of using clustering
> > knowledge to respond to DESTROY_SESSION even when the S1 node is not
> > functioning?
> >
> > Is there anything in the spec that prevents this?
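The two connection rules discussed in this thread for DESTROY_SESSION can be compared in a small sketch. This is an illustration under stated assumptions, not server code: the `Session` class and both `allowed_*` functions are hypothetical, connections and sessions are plain strings, and the security checks that the spec text describes are left out entirely. The names CL, S1, S2, C2A and C2B follow the example in the original message; C1A, S3, C3A and CL-other are invented here.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    client_id: str
    connections: set = field(default_factory=set)  # connections bound to this session


def allowed_current(conn: str, target: Session) -> bool:
    # Rule quoted from the spec: the request must arrive on a connection
    # that is associated with the session being destroyed.
    return conn in target.connections


def allowed_relaxed(conn: str, target: Session, sessions: list) -> bool:
    # Relaxation floated in this thread (not in the spec): accept the request
    # on any connection associated with a session of the same clientid.
    return any(
        s.client_id == target.client_id and conn in s.connections
        for s in sessions
    )


s1 = Session("S1", "CL", {"C1A"})            # the unresponsive session
s2 = Session("S2", "CL", {"C2A", "C2B"})     # the healthy session
other = Session("S3", "CL-other", {"C3A"})   # a different client's session
all_sessions = [s1, s2, other]

print(allowed_current("C2A", s1))                 # False
print(allowed_relaxed("C2A", s1, all_sessions))   # True
print(allowed_relaxed("C3A", s1, all_sessions))   # False
```

Under the current rule the only connections that may carry the DESTROY_SESSION for S1 are exactly the ones assumed to be dead, which is the circularity the original message points out. The relaxed rule lets C2A or C2B carry it while still refusing connections that belong to another clientid.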