Re: [nfsv4] DESTROY_SESSION and clientid trunking

"J. Bruce Fields" <bfields@fieldses.org> Mon, 19 July 2010 21:42 UTC

Date: Mon, 19 Jul 2010 17:42:18 -0400
To: Noveck_David@emc.com
From: "J. Bruce Fields" <bfields@fieldses.org>
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] DESTROY_SESSION and clientid trunking

On Mon, Jul 19, 2010 at 05:22:01PM -0400, Noveck_David@emc.com wrote:
> This mail concerns the use of DESTROY_SESSION in the context of clientid
> trunking.  Specifically, there are going to be cases in which a client
> will encounter a situation in which connections to one session are
> unresponsive, whether due to client-side networking issues, a wire being
> cut, server-side networking issues, or a node of a clustered server
> being down.  No matter what the cause, the client will need to transfer
> work from one session to another.  For simplicity, let's assume we have
> clientid CL with two sessions, S1 and S2, and that S1 becomes
> unresponsive, making it necessary to transfer work so that everything for
> the client is being done on a single session, S2.

I don't get it--isn't it the *connections* associated with the session
that become unresponsive, not the session itself?

If you wanted EOS across sessions, then it seems to me that you'll end
up creating another session abstraction on top of sessions.

And just fixing whatever bug is causing the session "unresponsiveness"
would seem simpler than the infinite regress....  But I probably don't
understand what you mean: how could a session become unresponsive?

--b.

> 
> The problem I'd like to consider is that of modifying-idempotent
> requests such as WRITE.  The question of non-idempotent requests such as
> RENAME is harder and probably requires some work in v4.2, but the
> simpler case is important since WRITEs are typically much more common.
> 
> So the issue is this: suppose a request is issued on S1 (we'll use WRITE
> throughout as the example) and you get no response.  You would then like
> to issue the WRITE on the other session.  Since WRITEs are, in the strict
> sense, idempotent, two WRITEs being issued is not in itself a problem.
> The problem arises if the WRITE issued on S1 is just lazily hanging
> around somewhere: if you then do the WRITE on S2 and it completes, and
> the WRITE on S1 later springs back to life, you have the likelihood of
> data corruption.
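
A minimal sketch of the hazard described above, using a hypothetical
in-memory model rather than any real NFS client or server API.  It
assumes the client goes on to write newer data to the same byte range on
S2 before the stalled S1 WRITE is finally executed; the names
`apply_write` and `stalled_on_s1` are illustrative only.

```python
# Hypothetical in-memory model; not an NFS client or server API.
def apply_write(file_data: bytearray, offset: int, payload: bytes) -> None:
    """Apply one WRITE to the file contents in place."""
    file_data[offset:offset + len(payload)] = payload


file_data = bytearray(b"....")

# The WRITE sent on S1 gets no reply and is still queued somewhere.
stalled_on_s1 = (0, b"AAAA")

# The client retries the same WRITE on S2, then writes newer data there.
apply_write(file_data, 0, b"AAAA")
apply_write(file_data, 0, b"BBBB")
assert bytes(file_data) == b"BBBB"

# The stalled S1 WRITE finally executes and overwrites the newer data.
apply_write(file_data, *stalled_on_s1)
print(bytes(file_data))
```

The retry by itself is harmless because the WRITE is idempotent; the
damage comes only from the late replay landing after newer data.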
> 
> What you need is the assurance that the first WRITE (the one on S1)
> either succeeded or failed but is not sitting around waiting to be done at
> a later point.  You want to draw a line under all requests initiated as
> part of S1.  I'm assuming the right way to do that is to destroy S1.
> Does anybody have any other ways to do that?  Things like closing the
> file will assure that the writes are done but they have unsatisfactory
> locking issues in that someone can open the file with a deny-mode and
> you might not be allowed to open it again.
> 
> However, there are some problems in the way that DESTROY_SESSION is
> specified and I think they need to be addressed, whether as errata in
> the v4.1 context or in the v4.2 context.
> 
> The first problem is that you would expect to have such a guarantee
> specified in the definition of DESTROY_SESSION and it isn't there.  You
> could argue that since it specifies that the reply information is
> dropped, it would be bad for requests to terminate in a situation where
> there is session reply information to update, but that isn't ironclad.
> I think it makes sense for the initial paragraph to be changed to
> something like this:
> 
>    The DESTROY_SESSION operation closes the session and discards the
>    session's reply cache, if any.  All pending operations initiated on 
>    the session are terminated and no additional ones can be started.
>    Thus, the requester is assured, once it receives the response, that
>    no operations initiated on the session will modify any file data 
>    or attributes or locking state associated with the client.
> 
>    Any remaining connections associated with the session are immediately
>    disassociated.  If the connection has no remaining associated sessions,
>    the connection MAY be closed by the server.  Locks, delegations, layouts,
>    wants, and the lease, which are all tied to the client ID, are not
>    affected by DESTROY_SESSION.
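
One way to picture the guarantee proposed in the text above is a toy
server that discards any not-yet-applied operations when a session is
destroyed.  This is only a sketch under stated assumptions: `ToyServer`
and its methods are hypothetical, `LookupError` merely stands in for an
NFS4ERR_BADSESSION reply, and a real server could equally satisfy the
wording by letting pending operations finish before it replies.

```python
# Toy model only: class and method names are hypothetical, not a real NFS API.
class ToyServer:
    def __init__(self):
        self.file_data = bytearray(b"....")
        self.sessions = {"S1", "S2"}
        self.pending = []  # (session, offset, payload) accepted but not yet applied

    def queue_write(self, session, offset, payload):
        if session not in self.sessions:
            # Stands in for an NFS4ERR_BADSESSION reply.
            raise LookupError(f"unknown session {session}")
        self.pending.append((session, offset, payload))

    def run_pending(self):
        for _session, offset, payload in self.pending:
            self.file_data[offset:offset + len(payload)] = payload
        self.pending.clear()

    def destroy_session(self, session):
        # Proposed guarantee: once this returns, nothing initiated on the
        # session can still modify file data.
        self.sessions.discard(session)
        self.pending = [op for op in self.pending if op[0] != session]


server = ToyServer()
server.queue_write("S1", 0, b"AAAA")   # stalls; the client never sees a reply
server.destroy_session("S1")           # draw the line under S1
server.queue_write("S2", 0, b"AAAA")   # retry on S2
server.queue_write("S2", 0, b"BBBB")   # newer data
server.run_pending()
print(bytes(server.file_data))         # the S1 WRITE never resurfaces
```

With the line drawn under S1 before the retry, the final contents are
determined only by the operations issued on S2.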
> 
> What do people think?  Is this reasonable?  Is there some way we can get
> by without it?
> 
> The other issue is that the spec says:
> 
>    DESTROY_SESSION MUST be invoked on a connection that is associated
>    with the session being destroyed.
> 
> So here I have a couple of questions:
> *	Is there a good reason for this?  Does it make sense?
> *	Is this a big problem for the issue we are talking about?
> 
> I believe the answers to both are "no", but I'm not sure.
> 
> The problem I'm worried about arises when you are doing a DESTROY_SESSION
> to clean up from a non-responsive session and you have to issue it on
> connections associated with that session.  The DESTROY_SESSION can then
> only be sent when you don't need it to be sent, i.e. when the destination
> IPs are functioning, but if they are, you don't want to send it.
> 
> There may be security issues, but aren't the security issues dealt with
> by the text that follows the text above?  Would there really be any
> security issue in allowing this on any connection that is associated with a
> session that is part of the same clientid as the session being
> destroyed?  After all, you can do an EXCHANGE_ID with a new client
> verifier and get rid of all sessions as long as you can confirm
> it by creating one new session.  Given that you can do that, why can't
> you destroy the session?
> 
> So one possible way of getting around this would be to associate
> connections that are part of S2 (let's call them C2A and C2B) with
> S1 just so they would be allowed in the context in which you need them.
> While the text talks about the security issues of making sure that we
> have the right server, it doesn't talk about how you might reject the
> BIND_CONN_TO_SESSION because it is for a connection that is for the
> wrong so_minor_id.  Is the server obliged to reject this?  If it isn't,
> then it might have a connection associated with a different session,
> e.g. C2A with S1.  If it allows that, is the server obliged to accept S1
> requests (in general) on the same basis as it accepts S2 requests?  Or
> can it treat these specially, i.e. slow for normal requests (because
> they are routed over a cluster interconnect to the node normally
> handling S2), but capable of using clustering knowledge to respond to
> DESTROY_SESSION even when the S1 node is not functioning?
> 
> Is there anything in the spec that prevents this?  
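
The connection rule and the workaround being asked about can be modeled
with a little bookkeeping.  This is a sketch only: the dictionaries, the
connection name C1A, and `connections_for_destroy` are hypothetical, and
it assumes the server accepts the BIND_CONN_TO_SESSION of C2A to S1,
which is exactly the open question.

```python
# Hypothetical bookkeeping; the names below are illustrative, not a real API.
bound = {"C1A": {"S1"}, "C2A": {"S2"}, "C2B": {"S2"}}   # connection -> sessions
reachable = {"C1A": False, "C2A": True, "C2B": True}    # S1's path is down


def connections_for_destroy(session):
    """Connections allowed by the rule that are also still working."""
    return sorted(c for c, s in bound.items() if session in s and reachable[c])


# Under the current rule, no working connection may carry DESTROY_SESSION(S1).
before = connections_for_destroy("S1")

# Workaround under discussion: BIND_CONN_TO_SESSION of C2A to S1,
# assuming the server does not reject it.
bound["C2A"].add("S1")
after = connections_for_destroy("S1")
print(before, after)
```

If the bind is rejected, the list stays empty and DESTROY_SESSION for S1
can only be sent once S1's own connections work again.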