Re: [nfsv4] DESTROY_SESSION and clientid trunking
<Noveck_David@emc.com> Tue, 20 July 2010 01:06 UTC
Date: Mon, 19 Jul 2010 21:06:45 -0400
From: Noveck_David@emc.com
To: bfields@fieldses.org
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] DESTROY_SESSION and clientid trunking

> I don't get it--isn't it the *connections* associated with the session
> that become unresponsive, not the session itself?

OK, but if the single connection associated with a session, or all the connections associated with a session, become unresponsive, then you still have the same issue whether you say the session is unresponsive or the connections are.

I think I neglected to mention one important piece of context. If you have two sessions associated with a single clientid, and this is the result of the client deciding on its own to do this form of clientid trunking, then what you say is true. However, if the two sessions are established because the server returned different so_minor_id values, then we have a different situation. In this case the server is telling you that two different IPs are not capable of supporting a single session. When that is the case, it is typically because you have a clustered server in which the two IPs terminate in different hardware that cannot share memory. With shared memory, you would simply be able to support a common session and its associated replay cache. In such cases, it is very likely that a session (or rather, the set of all sessions sharing a given so_minor_id) will become unresponsive as the result of various hardware problems, in addition to bugs that you can simply fix.

> If you wanted EOS across sessions, then it seems to me that you'll end
> up creating another session abstraction on top of sessions.

That may seem to be a flaw in terms of abstraction management, but it is in response to a practical requirement. Clients want an EOS guarantee for their requests. It is simple (by which I mean we have already done it) to provide it within the context of a single session.
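In v4.1 terms, exactly-once semantics within a single session come from the per-slot sequence IDs carried in SEQUENCE and the reply cached for each slot. The following is a deliberately simplified model of that mechanism, not code from any server: `Slot`, `Session.sequence` and the `execute` callback are hypothetical names, and the model ignores slot-table sizing, false-retry detection and reply-cache limits. It shows why a retransmitted WRITE on the same session is answered from the cache instead of being executed twice.

```python
"""Simplified model of NFSv4.1 per-slot exactly-once semantics (illustrative)."""
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Slot:
    seqid: int = 0                      # highest sequence ID executed on this slot
    cached_reply: Optional[str] = None  # reply kept to answer retransmissions


@dataclass
class Session:
    slots: Dict[int, Slot] = field(default_factory=dict)

    def sequence(self, slotid: int, seqid: int, execute: Callable[[], str]) -> str:
        """Handle one request tagged with (slotid, seqid) on this session."""
        slot = self.slots.setdefault(slotid, Slot())
        if seqid == slot.seqid and slot.cached_reply is not None:
            # Retransmission of the last request: replay, never re-execute.
            return slot.cached_reply
        if seqid == slot.seqid + 1:
            # New request on this slot: execute once and cache the reply.
            reply = execute()
            slot.seqid, slot.cached_reply = seqid, reply
            return reply
        # Any other sequence ID is out of order for this slot.
        return "NFS4ERR_SEQ_MISORDERED"


if __name__ == "__main__":
    executed = []

    def write() -> str:
        executed.append("WRITE")
        return "NFS4_OK"

    s1 = Session()
    print(s1.sequence(0, 1, write))  # new request: executed
    print(s1.sequence(0, 1, write))  # retransmission: replayed from the cache
    print(len(executed))             # the WRITE ran only once
```

Running it prints NFS4_OK twice and then 1. The guarantee is local to the session: a second session has its own slots and cache, so resending the same WRITE there is simply a new request that S1's cache knows nothing about.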
However, if one is forced to move requests from one session to another (and one can be forced either to do this or to have the requests wait forever), then, unless one is willing to accept the possibility of data corruption, one has to address the issue of how to make sure the old request cannot come back to life. Note that this doesn't fully address the issue of EOS across sessions for non-idempotent requests, and I think some v4.2 changes would be required for that. But v4.1 does allow different so_minor_id values for clustered servers, and it seems kind of rough to say you can only do this if you accept the possibility of data corruption.

-----Original Message-----
From: J. Bruce Fields [mailto:bfields@fieldses.org]
Sent: Monday, July 19, 2010 5:42 PM
To: Noveck, David
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] DESTROY_SESSION and clientid trunking

On Mon, Jul 19, 2010 at 05:22:01PM -0400, Noveck_David@emc.com wrote:
> This mail concerns the use of DESTROY_SESSION in the context of clientid
> trunking. Specifically, there are going to be cases in which a client
> will encounter a situation in which connections to one session are
> unresponsive, whether due to client-side networking issues, a wire being
> cut, server-side networking issues, or a node of a clustered server
> being down. No matter what the cause, the client will need to transfer
> work from one session to another. For simplicity, let's assume we have
> clientid CL with two sessions S1 and S2, and that S1 becomes
> unresponsive, making it necessary to transfer work so that everything for
> the client is being done on a single session S2.

I don't get it--isn't it the *connections* associated with the session
that become unresponsive, not the session itself?

If you wanted EOS across sessions, then it seems to me that you'll end
up creating another session abstraction on top of sessions. And just
fixing whatever bug is causing the session "unresponsiveness" would seem
simpler than the infinite regress....

But I probably don't understand what you mean: how could a session
become unresponsive?

--b.

>
> The problem I'd like to consider is that of modifying-idempotent
> requests such as WRITE. The question of non-idempotent requests such as
> RENAME is harder and probably requires some work in v4.2, but the
> simpler case is important since WRITEs are typically much more common.
>
> So the issue is this: if you have a request issued on S1 (we'll use
> WRITE throughout as the example) and you get no response, then you would
> like to issue the WRITE on the other session. Since WRITEs are, in the
> strict sense, idempotent, you don't have a problem if two WRITEs are
> issued. What is a problem is if a WRITE was issued on S1 and it was
> just lazily hanging around somewhere: if you then did the WRITE on S2
> and it completed, and the WRITE on S1 were then to spring back to life,
> you have the likelihood of data corruption.
>
> What you need is the assurance that the first WRITE (the one on S1)
> either succeeded or failed but is not sitting around waiting to be done at
> a later point. You want to draw a line under all requests initiated as
> part of S1. I'm assuming the right way to do that is to destroy S1.
> Does anybody have any other ways to do that? Things like closing the
> file will assure that the writes are done, but they have unsatisfactory
> locking issues in that someone can open the file with a deny-mode and
> you might not be allowed to open it again.
>
> However, there are some problems in the way that DESTROY_SESSION is
> specified, and I think they need to be addressed, whether as errata in
> the v4.1 context or in the v4.2 context.
>
> The first problem is that you would expect to have such a guarantee
> specified in the definition of DESTROY_SESSION and it isn't there.
> You could argue that, since it specifies that the reply information is
> dropped, it would be bad for requests to terminate in a situation where
> there is session reply information to update, but that isn't ironclad.
> I think it makes sense for the initial paragraph to be changed to
> something like this:
>
>    The DESTROY_SESSION operation closes the session and discards the
>    session's reply cache, if any. All pending operations initiated on
>    the session are terminated and no additional ones can be started.
>    Thus, the requester is assured, once it receives the response, that
>    no operations initiated on the session will modify any file data
>    or attributes or locking state associated with the client.
>
>    Any remaining connections associated with the session are immediately
>    disassociated. If the connection has no remaining associated sessions,
>    the connection MAY be closed by the server. Locks, delegations, layouts,
>    wants, and the lease, which are all tied to the client ID, are not
>    affected by DESTROY_SESSION.
>
> What do people think? Is this reasonable? Is there some way we can get
> by without it?
>
> The other issue is that the spec says:
>
>    DESTROY_SESSION MUST be invoked on a connection that is associated
>    with the session being destroyed.
>
> So here I have a couple of questions:
> * Is there a good reason for this? Does it make sense?
> * Is this a big problem for the issue we are talking about?
>
> I believe the answers to both are "no", but I'm not sure.
>
> The problem I'm worried about is that if you are doing a DESTROY_SESSION
> to clean up from a non-responsive session, you have to issue it on
> connections associated with that session. So the problem would be that
> the DESTROY_SESSION can only be sent when you don't need it to be sent,
> i.e. when the destination IPs are functioning, but if they are, you
> don't want to send it.
>
> There may be security issues, but they aren't the security issues dealt
> with by the text that follows the text above. Would there really be any
> security issue in allowing this on any connection that is associated with
> a session that is part of the same clientid as the session being
> destroyed? After all, you can do an EXCHANGE_ID with a new client
> verifier and get rid of all of its sessions, as long as you can confirm
> it by creating one new session. Given that you can do that, why can't
> you destroy the session?
>
> So one possible way of getting around this would be to associate
> connections that are part of S2 (let's call them C2A and C2B) with
> S1 just so they would be allowed in the context in which you need them.
> While the text talks about the security issues of making sure that we
> have the right server, it doesn't talk about how you might reject the
> BIND_CONN_TO_SESSION because it is for a connection that is for the
> wrong so_minor_id. Is the server obliged to reject this? If it isn't,
> then it might have a connection associated with a different session,
> e.g. C2A with S1. If it allows that, is the server obliged to accept S1
> requests (in general) on the same basis as it accepts S2 requests? Or
> can it treat these specially, i.e. slow for normal requests (because
> they are routed over a cluster interconnect to the node normally
> handling S2), but capable of using clustering knowledge to respond to
> DESTROY_SESSION even when the S1 node is not functioning?
>
> Is there anything in the spec that prevents this?
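A toy model of the scenario in the original message may make the two proposals concrete: DESTROY_SESSION terminating pending work before it replies, and accepting DESTROY_SESSION on any connection bound to a session of the same client ID. Everything here is hypothetical: `ToyServer`, the `stall` flag, `revive` and `scenario` are illustrative names, the `relaxed` flag stands for the proposed rule rather than anything the v4.1 spec says, and SEQUENCE, leases and state protection are ignored. The status strings reuse NFSv4.1 error names purely as labels; this sketch does not claim which error a real server returns.

```python
"""Hypothetical toy model: draining session S1 before retrying its WRITE on S2."""
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

NFS4_OK = "NFS4_OK"
NFS4ERR_BADSESSION = "NFS4ERR_BADSESSION"
NFS4ERR_CONN_NOT_BOUND_TO_SESSION = "NFS4ERR_CONN_NOT_BOUND_TO_SESSION"


@dataclass
class Session:
    clientid: int
    conns: Set[str] = field(default_factory=set)
    # WRITEs received but not yet performed, e.g. stuck on a failing node.
    pending: List[Tuple[int, bytes]] = field(default_factory=list)


class ToyServer:
    def __init__(self, relaxed: bool) -> None:
        # relaxed=False: the connection must be bound to the session being
        # destroyed (current rule). relaxed=True: any connection bound to a
        # session of the same client ID is accepted (proposed change).
        self.relaxed = relaxed
        self.sessions: Dict[str, Session] = {}
        self.data = bytearray(b"....")

    def create_session(self, sid: str, clientid: int, conns: Set[str]) -> None:
        self.sessions[sid] = Session(clientid, set(conns))

    def write(self, sid: str, offset: int, buf: bytes, stall: bool = False) -> None:
        if stall:
            self.sessions[sid].pending.append((offset, buf))  # no reply yet
        else:
            self.data[offset:offset + len(buf)] = buf

    def revive(self, sid: str) -> None:
        # A stalled request "springs back to life", unless its session is gone.
        session = self.sessions.get(sid)
        if session is None:
            return
        for offset, buf in session.pending:
            self.data[offset:offset + len(buf)] = buf
        session.pending.clear()

    def destroy_session(self, conn: str, sid: str) -> str:
        target = self.sessions.get(sid)
        if target is None:
            return NFS4ERR_BADSESSION
        allowed = conn in target.conns or (
            self.relaxed
            and any(conn in s.conns and s.clientid == target.clientid
                    for s in self.sessions.values()))
        if not allowed:
            return NFS4ERR_CONN_NOT_BOUND_TO_SESSION
        # Proposed guarantee: pending operations are terminated before the
        # reply, so none of them can modify file data afterwards.
        target.pending.clear()
        del self.sessions[sid]
        return NFS4_OK


def scenario(relaxed: bool) -> Tuple[str, bytes]:
    srv = ToyServer(relaxed)
    srv.create_session("S1", clientid=1, conns={"C1"})
    srv.create_session("S2", clientid=1, conns={"C2A", "C2B"})
    srv.write("S1", 0, b"AAAA", stall=True)    # WRITE on S1 gets no reply
    status = srv.destroy_session("C2A", "S1")  # C1 is unusable, so try C2A
    srv.write("S2", 0, b"AAAA")                # client reissues the WRITE on S2
    srv.write("S2", 0, b"BBBB")                # a later WRITE to the same range
    srv.revive("S1")                           # stale S1 WRITE resurfaces if alive
    return status, bytes(srv.data)


if __name__ == "__main__":
    print(scenario(relaxed=False))  # destroy refused; stale WRITE clobbers BBBB
    print(scenario(relaxed=True))   # S1 drained first; BBBB survives
```

Under the current rule the DESTROY_SESSION sent on C2A is refused, the client proceeds anyway, and the stalled S1 WRITE later overwrites the newer data. Under the proposed rule S1 is drained before the reply, so the newer data survives. Note that the relaxed check still refuses a connection that belongs only to a different client ID.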