Re: [nfsv4] DESTROY_SESSION and clientid trunking

"J. Bruce Fields" <bfields@fieldses.org> Thu, 05 August 2010 16:20 UTC

Date: Thu, 05 Aug 2010 12:19:21 -0400
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Noveck_David@emc.com
Cc: nfsv4@ietf.org

On Mon, Jul 26, 2010 at 06:02:59PM -0400, Noveck_David@emc.com wrote:
> Leaving aside the case of read-only data, everybody needs true EOS,
> whether they are attentive enough to want it or not.  But v4.1 does not
> make true EOS with persistent sessions a mandatory feature, probably
> because of implementation difficulty.  
> 
> Now in our example of an unresponsive server-minor, if there is support
> to update the session reply cache on disk or on another backup piece of
> hardware, then we have EOS in this case.  It doesn't show up as
> persistent sessions but rather as connections becoming unresponsive and
> when they are re-established, they will wind up on an alternate piece of
> hardware with the correct reply cache being available, whether from disk
> or otherwise.
> 
> But just as we deal with servers that don't have persistent sessions in
> the server-major case, we should deal as well as we can, when we have
> the case of a server-minor which does not have the ability to store its
> reply cache information persistently.  I agree that people need EOS for
> this case, but we do have to deal with the case in which we don't have
> the needed infrastructure.  In this case, I'm thinking about a situation
> in which we have unresponsive connections to a server-minor and we need
> to be able to assure that operations previously started on these are
> either done or not, and what is critical is that they are not done in
> the future.

OK, so we could call the sort of failover you're talking about
cross-minor_id-failover, since it occurs between two servers with
different minor_id's (but the same major_id)?

And we're assuming a server implementation for which:

	- It is hard to implement shared sessions (shared minor_id)
	- It is easy for one server to fence off another (meaning, it
	  can perform some operation on completion of which it knows no
	  further writes from the other server will hit the filesystem).
	- Full EOS across cross-minor_id-failover is not required.
	- Preventing out-of-order writes over such failover is.

With "hard" and "easy" defined as strongly as necessary to justify the
exercise.  OK, makes sense, thanks for the patient explanation!
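
To make the fencing assumption concrete, here is a minimal sketch of the property being assumed: once one server has fenced another, a delayed write from the fenced server cannot reach the filesystem. Everything in it (the FencedStore class, its methods, the "minor-A"/"minor-B" labels) is hypothetical and for illustration only; it is a toy model, not part of NFSv4.1 or of any server implementation.

```python
class FencedStore:
    """Toy shared store that rejects writes from fenced servers (hypothetical)."""

    def __init__(self):
        self.fenced = set()  # minor_ids whose writes must be rejected
        self.data = {}

    def fence(self, minor_id):
        # Once this returns, no later write from minor_id can land.
        self.fenced.add(minor_id)

    def write(self, minor_id, key, value):
        if minor_id in self.fenced:
            return False  # a delayed write from the failed server is dropped
        self.data[key] = value
        return True


store = FencedStore()
store.write("minor-A", "blk0", "old")           # normal operation through A
store.fence("minor-A")                          # B takes over and fences A
late = store.write("minor-A", "blk0", "stale")  # A's delayed write arrives
store.write("minor-B", "blk0", "new")           # client retries through B
```

Note that this gives only the ordering guarantee (no stale write after failover), not full EOS: nothing here tells the client whether A's write completed before the fence.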

My concerns:

	- Just one data point: looking at the Linux NFSv4.1 server, I
	  haven't been personally worrying about making DESTROY_SESSION
	  wait for anything.  I don't consider the implementation done,
	  so that's fixable, but maybe it's some evidence that (given a
	  sufficiently dim implementor) readers of the current spec
	  won't be doing this, and hence it may be difficult to just
	  slip this in as a clarification of the semantics of
	  DESTROY_SESSION.
	- Your initial suggestion seemed to be that we would eventually
	  build on whatever solution we chose here as a way of providing
	  full EOS across this sort of failover.  I worry that the
	  result would no longer be significantly easier to implement
	  than shared sessions, so wonder if it would be worth the
	  additional protocol.

> You could do this explicitly with an op that takes a server-owner, or you
> could do this at the session level.  It seems to me that doing that as
> part of DESTROY_SESSION makes sense, but you would have to deal with the
> current spec issues that make this difficult or impossible.  My feeling
> is DESTROY_SESSION needs to assure people that ops started under that
> session are not going to execute after the DESTROY_SESSION completes,

Right, so that part wasn't obvious to me: I assumed the DESTROY_SESSION
was the client's way of saying: "I give up; I no longer care what
happens to these operations; let's just drop everything and get on with
our lives".  Also, if we've lost contact with a disk it may be nice to
be able to destroy the state and let the client continue unmounting (or
whatever it's doing) without having to wait forever.  I don't know.
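
The two readings of DESTROY_SESSION discussed here can be sketched side by side. This is a hedged illustration only: the Session class and its begin_op/end_op/destroy methods are hypothetical names, and the real operation involves far more state. With wait=True, destroy returns success only once every operation started under the session has finished; with wait=False it models the "I give up" reading and returns immediately.

```python
import threading


class Session:
    """Toy session tracking in-flight operations (hypothetical model)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._destroyed = False

    def begin_op(self):
        with self._cond:
            if self._destroyed:
                return False  # a real server might return NFS4ERR_BADSESSION
            self._in_flight += 1
            return True

    def end_op(self):
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def destroy(self, wait, timeout=None):
        with self._cond:
            self._destroyed = True  # no new ops may start from here on
            if not wait:
                return True  # "drop everything": do not wait for old ops
            # Draining reading: succeed only once nothing is in flight.
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


abandon = Session()
abandon.begin_op()
abandoned = abandon.destroy(wait=False)  # returns with an op still running

drain = Session()
drain.begin_op()
stuck = drain.destroy(wait=True, timeout=0.01)  # op still running: times out
drain.end_op()
drained = drain.destroy(wait=True, timeout=0.01)  # now nothing is in flight
```

The timeout in the draining case stands in for the lost-disk concern above: a server that must wait for outstanding operations needs some answer for operations that never complete.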

> and I would like to understand if there is a good reason for the current
> restriction as to what connection the op can be issued on.

Yeah, I did a little work on implementing that but honestly I never
understood the point either.  My first thought was that it was to ensure
that a rogue client wanting to destroy the session would be required to
do a TCP injection attack.  But surely it could just associate another
connection first instead.

--b.