Re: [nfsv4] DESTROY_SESSION and clientid trunking
"J. Bruce Fields" <bfields@fieldses.org> Tue, 20 July 2010 22:10 UTC
Date: Tue, 20 Jul 2010 18:10:23 -0400
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Noveck_David@emc.com
Message-ID: <20100720221023.GB15024@fieldses.org>
References: <BF3BB6D12298F54B89C8DCC1E4073D8001D44311@CORPUSMX50A.corp.emc.com> <20100719214218.GE29058@fieldses.org> <BF3BB6D12298F54B89C8DCC1E4073D8001D44429@CORPUSMX50A.corp.emc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <BF3BB6D12298F54B89C8DCC1E4073D8001D44429@CORPUSMX50A.corp.emc.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] DESTROY_SESSION and clientid trunking
On Mon, Jul 19, 2010 at 09:06:45PM -0400, Noveck_David@emc.com wrote:
> > I don't get it--isn't it the *connections* associated with the session
> > that become unresponsive, not the session itself?
>
> OK, but if the single connection associated with a session or all the
> connections associated with a session becomes unresponsive, then you
> still have the same issue whether you are saying the session is
> unresponsive or the connections.
>
> I think I neglected to mention one important piece of context. If you
> have two sessions associated with a single clientid and this is the
> result of the client on its own deciding on doing this form of clientid
> trunking, then what you say is true.
>
> However, if the two sessions are established because the server returned
> different so_minor_id values, then we have a different situation. In
> this case the server is telling you that two different IP's are not
> capable of supporting a single session. If that is the case, typically
> it is because you have a clustered server such that the two IP's
> terminate in different hardware which is incapable of sharing memory.

If an entire server has become unresponsive, then you're really doing
failover--at which point, don't you need a persistent and/or distributed
reply cache if you really want exactly once semantics?

--b.

> With shared memory, you would simply be able to support a common session
> and its associated replay cache.
>
> In such cases, it is very likely that a session or rather the set of all
> sessions sharing a given so_minor_id, will become unresponsive, as the
> result of various hardware problems in addition to bugs that you can
> simply fix.
>
> > If you wanted EOS across sessions, then it seems to me that you'll end
> > up creating another session abstraction on top of sessions.
>
> That may seem to be a flaw in terms of abstraction management, but it is
> in response to a practical requirement. Clients want an EOS guarantee
> for their requests.
> It is simple (by which I mean we have already done
> it) to provide it within the context of a single session. However, if
> one is forced to move requests from one session to another, and if one
> can be forced into either doing this or having the requests wait
> forever, then if one is not willing to accept the possibility of data
> corruption, one has to address the issue of how to make sure the old
> request cannot come to life.
>
> Note that this doesn't fully address the issue of EOS across sessions
> for non-idempotent requests and I think some v4.2 changes would be
> required, but v4.1 does allow different so_minor_id value for clustered
> servers, and it seems kind of rough to say you can only do this if you
> accept the possibility of data corruption.
>
> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Monday, July 19, 2010 5:42 PM
> To: Noveck, David
> Cc: nfsv4@ietf.org
> Subject: Re: [nfsv4] DESTROY_SESSION and clientid trunking
>
> On Mon, Jul 19, 2010 at 05:22:01PM -0400, Noveck_David@emc.com wrote:
> > This mail concerns the use of DESTROY_SESSION in the context of clientid
> > trunking. Specifically, there are going to be cases in which a client
> > will encounter a situation in which connections to one session are
> > unresponsive, whether due to client-side networking issues, a wire being
> > cut, server-side networking issues, or a node of a clustered server
> > being down. No matter what the cause the client will need to transfer
> > work from one session to another. For simplicity, let's assume we have
> > clientid CL and with two sessions S1 and S2 and that S1 becomes
> > unresponsive making it necessary to transfer work so that everything for
> > the client is being done on a single session S2.
>
> I don't get it--isn't it the *connections* associated with the session
> that become unresponsive, not the session itself?
>
> If you wanted EOS across sessions, then it seems to me that you'll end
> up creating another session abstraction on top of sessions.
>
> And just fixing whatever bug is causing the session "unresponsiveness"
> would seem simpler than the infinite regress.... But I probably don't
> understand what you mean: how could a session become unresponsive?
>
> --b.
>
> >
> > The problem I'd like to consider is that of modifying-idempotent
> > requests such as WRITE. The question of non-idempotent requests such as
> > RENAME is harder and probably requires some work in v4.2, but the
> > simpler case is important since WRITEs are typically much more common.
> >
> > So the issue is that if you have a request issued on S1 and we'll use
> > WRITE throughout as the example, and you get no response, then you would
> > like to issue the WRITE on the other session. Since WRITEs are, in the
> > strict sense, idempotent, you don't have a problem if two WRITEs are
> > issued but what is a problem is if there were a WRITE issued on S1, and
> > it was just lazily hanging around somewhere, and then if you did the
> > WRITE on S2, and it completed then if the WRITE on S1 were to spring
> > back to life, you have the likelihood of data corruption.
> >
> > What you need is the assurance that the first WRITE (the one on S1),
> > either succeeded or failed but is not sitting around waiting to be done
> > at a later point. You want to draw a line under all requests initiated
> > as part of S1. I'm assuming the right way to do that is to destroy S1.
> > Does anybody have any other ways to do that? Things like closing the
> > file will assure that the writes are done but they have unsatisfactory
> > locking issues in that someone can open the file with a deny-mode and
> > you might not be allowed to open it again.
> >
> > However, there are some problems in the way that DESTROY_SESSION is
> > specified and I think they need to be addressed, whether as errata in
> > the v4.1 context or in the v4.2 context.
> >
> > The first problem is that you would expect to have such a guarantee
> > specified in the definition of DESTROY_SESSION and it isn't there. You
> > could argue that since it specifies that the reply information is
> > dropped, it would be bad for requests to terminate in a situation where
> > there is session reply information to update, but that isn't ironclad.
> > I think it makes sense for the initial paragraph to be changed to
> > something like this:
> >
> >    The DESTROY_SESSION operation closes the session and discards the
> >    session's reply cache, if any. All pending operations initiated on
> >    the session are terminated and no additional ones can be started.
> >    Thus, the requester is assured, once it receives the response, that
> >    no operations initiated on the session will modify any file data
> >    or attributes or locking state associated with the client.
> >
> >    Any remaining connections associated with the session are
> >    immediately disassociated. If the connection has no remaining
> >    associated sessions, the connection MAY be closed by the server.
> >    Locks, delegations, layouts, wants, and the lease, which are all
> >    tied to the client ID, are not affected by DESTROY_SESSION.
> >
> > What do people think? Is this reasonable? Is there some way we can get
> > by without it?
> >
> > The other issue is that the spec says:
> >
> >    DESTROY_SESSION MUST be invoked on a connection that is associated
> >    with the session being destroyed.
> >
> > So here I have a couple of questions:
> > * Is there a good reason for this? Does it make sense?
> > * Is this a big problem for the issue we are talking about?
> >
> > I believe the answers to both are "no", but I'm not sure.
> >
> > The problem I'm worried about is that if you are doing a DESTROY_SESSION
> > to clean up from a non-responsive session, you have to issue it on
> > connections associated with that session. So the problem would be that
> > the DESTROY_SESSION can only be sent when you don't need it to be sent,
> > i.e. when the destination IP's are functioning, but if they are, you
> > don't want to send it.
> >
> > There may be security issues, but aren't the security issues dealt
> > with by the text that follows the text above? Would there really be any
> > security issue allowing this on any connection that is associated with a
> > session that is part of the same clientid as the session being
> > destroyed? After all, you can do an EXCHANGE_ID with a new client
> > verifier and get rid of all sessions as long as you can confirm
> > it by creating one new session. Given that you can do that, why can't
> > you destroy the session?
> >
> > So one possible way of getting around this would be to associate
> > connections that are part of S2 (let's call them C2A and C2B) with
> > S1 just so they would be allowed in the context in which you need them.
> > While the text talks about the security issues of making sure that we
> > have the right server, it doesn't talk about how you might reject the
> > BIND_CONN_TO_SESSION because it is for a connection that is for the
> > wrong so_minor_id. Is the server obliged to reject this? If it isn't
> > then it might have a connection associated with a different session,
> > e.g. C2A with S1. If it allows that, is the server obliged to accept S1
> > requests (in general) on the same basis as it accepts S2 requests? Or
> > can it treat these specially, i.e. slow for normal requests (because
> > they are routed over a cluster interconnect to the node normally
> > handling S2), but capable of using clustering knowledge to respond to
> > DESTROY_SESSION even when the S1 node is not functioning?
> >
> > Is there anything in the spec that prevents this?
> >
> > _______________________________________________
> > nfsv4 mailing list
> > nfsv4@ietf.org
> > https://www.ietf.org/mailman/listinfo/nfsv4
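
For readers following the thread, the stale-WRITE hazard and the guarantee proposed for DESTROY_SESSION can be illustrated with a toy model. This is only a sketch under stated assumptions, not how any NFSv4.1 server is implemented: the ToyServer class, its methods, and the run function are all hypothetical names, the "file" is a small in-memory buffer, and the model simply assumes the server is able to process DESTROY_SESSION for S1 while a WRITE sent on S1 is still delayed somewhere (which is exactly the part the thread is debating).

```python
# Toy model of the stale-WRITE hazard discussed in the thread.
# Everything here (ToyServer, run, the 4-byte payloads) is hypothetical.

class ToyServer:
    def __init__(self):
        self.file = bytearray(4)            # the file contents
        self.live_sessions = {"S1", "S2"}   # sessions under one clientid

    def write(self, session, offset, data):
        # A request arriving on a destroyed session must not touch the file.
        if session not in self.live_sessions:
            return "NFS4ERR_BADSESSION"
        self.file[offset:offset + len(data)] = data
        return "NFS4_OK"

    def destroy_session(self, session):
        # Models the proposed guarantee: once this returns, nothing that
        # was initiated on the session can modify file data any more.
        self.live_sessions.discard(session)
        return "NFS4_OK"


def run(fence):
    """Replay the scenario; if fence is True, destroy S1 before retrying."""
    server = ToyServer()
    delayed = ("S1", 0, b"AAAA")        # WRITE sent on S1, no reply seen
    if fence:
        server.destroy_session("S1")    # draw a line under S1's requests
    server.write("S2", 0, b"AAAA")      # client retries the WRITE on S2
    server.write("S2", 0, b"BBBB")      # a later, newer WRITE on S2
    server.write(*delayed)              # the S1 WRITE springs back to life
    return bytes(server.file)


print(run(fence=False))  # stale S1 WRITE overwrites the newer data
print(run(fence=True))   # stale S1 WRITE is rejected, newer data survives
```

Without the fence, the delayed S1 WRITE lands after the newer S2 WRITE and the file ends up with the old bytes; with the fence, the delayed request is refused and the newer bytes remain. The model says nothing about the harder question raised above, namely on which connection the DESTROY_SESSION could be sent when S1's own connections are unresponsive.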