[nfsv4] DESTROY_SESSION and clientid trunking
<Noveck_David@emc.com> Mon, 19 July 2010 21:22 UTC
This mail concerns the use of DESTROY_SESSION in the context of clientid trunking. Specifically, there will be cases in which a client finds that connections to one session are unresponsive, whether due to client-side networking issues, a wire being cut, server-side networking issues, or a node of a clustered server being down. No matter what the cause, the client will need to transfer work from one session to another.

For simplicity, let's assume we have clientid CL with two sessions, S1 and S2, and that S1 becomes unresponsive, making it necessary to transfer work so that everything for the client is being done on the single session S2.

The problem I'd like to consider is that of requests that modify data but are idempotent, such as WRITE. The question of non-idempotent requests such as RENAME is harder and probably requires some work in v4.2, but the simpler case is important since WRITEs are typically much more common.

So the issue is this: if you have a request issued on S1 (we'll use WRITE throughout as the example) and you get no response, then you would like to issue the WRITE on the other session. Since WRITEs are, in the strict sense, idempotent, you don't have a problem if two WRITEs are issued. What is a problem is the case in which the WRITE issued on S1 is just lazily hanging around somewhere: if you then do the WRITE on S2 and it completes, and the WRITE on S1 later springs back to life, you have the likelihood of data corruption. What you need is the assurance that the first WRITE (the one on S1) either succeeded or failed, but is not sitting around waiting to be done at a later point. You want to draw a line under all requests initiated as part of S1.

I'm assuming the right way to do that is to destroy S1. Does anybody have any other ways to do that?
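
To make the hazard concrete, here is a minimal sketch in Python. It is a toy model only: the class ToyServer, its stall flag, and resume_stalled() are hypothetical names invented for illustration, not anything defined by NFSv4.1. It also assumes that a later WRITE to the same range lands on S2 before the stalled S1 WRITE resurfaces, which is the case in which the stale data does damage.

```python
class ToyServer:
    """Toy model of a server holding one file (hypothetical, for illustration)."""

    def __init__(self):
        self.data = {}      # offset -> payload currently stored
        self.stalled = []   # (session, offset, payload) received but not yet applied

    def write(self, session, offset, payload, stall=False):
        if stall:
            # The request is "lazily hanging around"; the client gets no reply.
            self.stalled.append((session, offset, payload))
            return None
        self.data[offset] = payload
        return "OK"

    def resume_stalled(self):
        # Stalled requests spring back to life and are applied late.
        for _session, offset, payload in self.stalled:
            self.data[offset] = payload
        self.stalled.clear()


def stale_write_scenario():
    server = ToyServer()
    server.write("S1", 0, b"v1", stall=True)   # WRITE on S1, no response
    server.write("S2", 0, b"v1")               # same WRITE reissued on S2
    server.write("S2", 0, b"v2")               # a later update to the same range
    server.resume_stalled()                    # the S1 WRITE finally executes
    return server.data[0]


print(stale_write_scenario())  # prints b'v1': the stale WRITE overwrote b'v2'
```

Nothing in this model draws a line under S1, so the late WRITE is applied as if it were current.
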
Things like closing the file will ensure that the writes are done, but they have unsatisfactory locking issues, in that someone else can open the file with a deny mode and you might not be allowed to open it again.

However, there are some problems in the way that DESTROY_SESSION is specified, and I think they need to be addressed, whether as errata in the v4.1 context or in the v4.2 context.

The first problem is that you would expect to have such a guarantee specified in the definition of DESTROY_SESSION, and it isn't there. You could argue that since the definition specifies that the reply information is dropped, it would be bad for requests to terminate in a situation where there is session reply information to update, but that isn't ironclad. I think it makes sense for the initial paragraph to be changed to something like this:

   The DESTROY_SESSION operation closes the session and discards the session's reply cache, if any. All pending operations initiated on the session are terminated and no additional ones can be started. Thus, the requester is assured, once it receives the response, that no operations initiated on the session will modify any file data or attributes or locking state associated with the client. Any remaining connections associated with the session are immediately disassociated. If the connection has no remaining associated sessions, the connection MAY be closed by the server. Locks, delegations, layouts, wants, and the lease, which are all tied to the client ID, are not affected by DESTROY_SESSION.

What do people think? Is this reasonable? Is there some way we can get by without it?

The other issue is that the spec says:

   DESTROY_SESSION MUST be invoked on a connection that is associated with the session being destroyed.

So here I have a couple of questions:

* Is there a good reason for this? Does it make sense?
* Is this a big problem for the issue we are talking about?

I believe the answers to both are "no", but I'm not sure.
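
The guarantee in the proposed paragraph can be sketched as follows. This is a hedged toy model of the proposed semantics, not of any existing server: FencingServer, SessionDestroyed, and resume() are hypothetical names, and the exception merely stands in for whatever error a server would return for a request on a destroyed session (NFS4ERR_BADSESSION would be the natural candidate).

```python
class SessionDestroyed(Exception):
    """Stand-in for the error returned on a destroyed session (assumption)."""


class FencingServer:
    """Toy server implementing the proposed DESTROY_SESSION guarantee."""

    def __init__(self):
        self.data = {}          # offset -> payload currently stored
        self.pending = {}       # session -> list of stalled (offset, payload)
        self.reply_cache = {}   # session -> list of cached replies
        self.destroyed = set()

    def write(self, session, offset, payload, stall=False):
        if session in self.destroyed:
            raise SessionDestroyed(session)   # no additional operations can start
        if stall:
            self.pending.setdefault(session, []).append((offset, payload))
            return None
        self.data[offset] = payload
        self.reply_cache.setdefault(session, []).append("OK")
        return "OK"

    def resume(self, session):
        # Apply whatever is still pending for the session, if anything.
        for offset, payload in self.pending.pop(session, []):
            self.data[offset] = payload

    def destroy_session(self, session):
        self.pending.pop(session, None)       # terminate pending operations
        self.reply_cache.pop(session, None)   # discard the reply cache
        self.destroyed.add(session)
        return "OK"


def fenced_scenario():
    server = FencingServer()
    server.write("S1", 0, b"v1", stall=True)   # WRITE on S1, no response
    server.destroy_session("S1")               # draw a line under S1 first
    server.write("S2", 0, b"v1")               # now it is safe to reissue
    server.write("S2", 0, b"v2")
    server.resume("S1")                        # nothing is left to apply
    return server.data[0]


print(fenced_scenario())  # prints b'v2'
```

The point of the sketch is the ordering: the client reissues the WRITE on S2 only after the DESTROY_SESSION reply, which is what makes the late S1 WRITE impossible in this model.
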
The problem I'm worried about is that if you are doing a DESTROY_SESSION to clean up from a non-responsive session, you have to issue it on connections associated with that session. So the DESTROY_SESSION can only be sent when you don't need it to be sent, i.e. when the destination IPs are functioning, but if they are functioning, you don't want to send it.

There may be security issues, but aren't the security issues dealt with by the text that follows the text quoted above? Would there really be any security issue in allowing this on any connection that is associated with a session that is part of the same clientid as the session being destroyed? After all, you can do an EXCHANGE_ID with a new client verifier and get rid of all sessions, as long as you can confirm it by creating one new session. Given that you can do that, why can't you destroy the session?

So one possible way of getting around this would be to associate connections that are part of S2 (let's call them C2A and C2B) with S1, just so they would be allowed in the context in which you need them. While the text talks about the security issues of making sure that we have the right server, it doesn't talk about how you might reject the BIND_CONN_TO_SESSION because it is for a connection that is for the wrong so_minor_id. Is the server obliged to reject this? If it isn't, then it might have a connection associated with a different session, e.g. C2A with S1.

If it allows that, is the server obliged to accept S1 requests (in general) on the same basis as it accepts S2 requests? Or can it treat these specially, i.e. slow for normal requests (because they are routed over a cluster interconnect to the node normally handling S2), but capable of using clustering knowledge to respond to DESTROY_SESSION even when the S1 node is not functioning? Is there anything in the spec that prevents this?
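
The difference between the rule as currently specified and the relaxation asked about above can be sketched as two checks. This is only an illustration under stated assumptions: the Server class, its two dictionaries, the connection name C3A, and the clientid CX are hypothetical, and the relaxed rule is the suggestion from this mail, not spec behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    session_client: dict = field(default_factory=dict)   # session -> clientid
    conn_sessions: dict = field(default_factory=dict)     # connection -> set of sessions

    def allowed_current_rule(self, conn, target):
        # Current text: the connection must be associated with the target session.
        return target in self.conn_sessions.get(conn, set())

    def allowed_relaxed_rule(self, conn, target):
        # Suggested relaxation: any session on the connection belonging to the
        # same clientid as the target session is enough.
        owner = self.session_client[target]
        return any(self.session_client[s] == owner
                   for s in self.conn_sessions.get(conn, set()))


server = Server(
    session_client={"S1": "CL", "S2": "CL", "S3": "CX"},
    conn_sessions={"C2A": {"S2"}, "C2B": {"S2"}, "C3A": {"S3"}},
)

print(server.allowed_current_rule("C2A", "S1"))   # False: C2A is not bound to S1
print(server.allowed_relaxed_rule("C2A", "S1"))   # True: S2 and S1 share clientid CL
print(server.allowed_relaxed_rule("C3A", "S1"))   # False: S3 belongs to another client
```

Under the relaxed check, a client whose S1 connections are all dead could still destroy S1 over C2A without first binding C2A to S1.
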