Re: [nfsv4] re: re: NFS4ERR_ADMIN_REVOKE

Spencer Shepler <spencer.shepler@sun.com> Fri, 07 January 2005 21:31 UTC

Date: Fri, 07 Jan 2005 15:29:22 -0600
From: Spencer Shepler <spencer.shepler@sun.com>
To: nfsv4@ietf.org
Subject: Re: [nfsv4] re: re: NFS4ERR_ADMIN_REVOKE
Message-ID: <20050107212922.GF101566@nfsclient.central.sun.com>
Mail-Followup-To: Spencer Shepler <spencer.shepler@sun.com>, nfsv4@ietf.org
References: <C98692FD98048C41885E0B0FACD9DFB840D218@exnane01.hq.netapp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <C98692FD98048C41885E0B0FACD9DFB840D218@exnane01.hq.netapp.com>
User-Agent: Mutt/1.4.2.1i

On Fri, Noveck, Dave wrote:
> Spencer Shepler wrote:
> > It might help me if we split the question of returning NFS4ERR_EXPIRED
> > into two pieces: with and without the use of SETCLIENTID...
> 
> That makes sense.  I'll try my best to avoid splitting those into
> four sub-cases :-)
> 
> > So in the case of a network partition between client and server in
> > which the partition lasts longer than the lease period, the server
> > presumably has to return some error to the client.  It wouldn't allow
> > the client to continue using state associated with the now-expired
> > lease.  NFS4ERR_EXPIRED is the most appropriate.  Not sure what else
> > the server could do in this case.
> 
> As you note below, BAD_STATEID would give the client the message that
> his state-id is no longer usable.  It doesn't give the reason but 
> I'm not sure the reason is all that helpful to the client.  The
> fact that is important is that the stateid is no longer valid and
> it isn't clear what the client would do with the information
> that lease expiration was the reason.  In many cases, it is the
> only possible reason, though.

For a well-behaved client and server, yes, it is the most likely cause.
I suppose the best viewpoint to take throughout this discussion
is that we have a client and server that are plainly broken.
The Solaris client will handle either error return.  Recovery is started
upon receipt of the _EXPIRED error (SETCLIENTID sequence) and in the
case of _BAD_STATEID, the client will check to see if it is outside
of the lease period; if so, it will start the recovery sequence.
I believe the _BAD_STATEID handling was put into place because of
a server bug that existed for some period in early development and
was left behind. :-)
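
A minimal sketch of the client-side handling described above, in Python.
The names (classify, Action, the parameters) are hypothetical and are not
taken from the Solaris client.  The STATE_LOST branch for a _BAD_STATEID
received inside the lease period is an assumption drawn from later in
this message, not from the Solaris behavior itself.

```python
from enum import Enum


class Nfs4Error(Enum):
    EXPIRED = "NFS4ERR_EXPIRED"
    BAD_STATEID = "NFS4ERR_BAD_STATEID"


class Action(Enum):
    RECOVER = "start the SETCLIENTID recovery sequence"
    STATE_LOST = "treat only this piece of state as lost"  # assumed behavior


def classify(error, last_renewal, lease_seconds, now):
    """Map a stateid error to a client action.

    last_renewal and now are timestamps in seconds; lease_seconds is the
    client's view of the lease period.
    """
    if error is Nfs4Error.EXPIRED:
        return Action.RECOVER
    if error is Nfs4Error.BAD_STATEID:
        # Outside the lease period, treat BAD_STATEID like EXPIRED.
        if now - last_renewal > lease_seconds:
            return Action.RECOVER
        return Action.STATE_LOST
    raise ValueError(f"unexpected error: {error}")
```

For example, classify(Nfs4Error.BAD_STATEID, last_renewal=0,
lease_seconds=90, now=200) selects recovery, while the same error at
now=30 is handled as a single lost piece of state.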

> The problem with EXPIRED is the one Rick mentioned, that there is
> no way to delimit when it is no longer needed and so you have a piece
> of state that there is no way to deallocate, at least within a client
> instance, and it is troubling to have something where there is a
> resource leak by design, even when the magnitude of leakage means
> that it is not a big issue in practice.  Note also that if the stateid's
> can't go away, neither can the owner, and so there is another thing 
> which leaks.
>
> By the way, one interesting side question about delimiting the 
> scope of EXPIRED concerns RELEASE_OWNER.  If I do a RELEASE_OWNER
> and all of the associated stateid's are EXPIRED, then I would say
> that this would go through, allow deallocation of those stateid's
> and the lockowner.  This leaves open stateid's and allowing CLOSE
> of those would allow us a way to avoid leakage if the client takes
> care to get rid of such stuff.
> 
> But, if you deallocate revoked state, then you don't face leakage
> issues, and the client still gets the message ("That state you just
> handed me is gone.  Live with it.") via the BAD_STATEID and life goes 
> on.

So the server can determine that it should return _BAD_STATEID.  If
that can be done, would it be possible to verify the structure of the
stateid and that it was provided during the current server instance
and return _EXPIRED instead (without holding all of the associated
state)?  
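
One way to picture that question is the sketch below.  It assumes a
hypothetical stateid layout (a server boot epoch plus a table index) and
a server that remembers only how many stateids it has issued; none of
this comes from the RFC or from any real server.  The mapping of a
mismatched epoch to NFS4ERR_STALE_STATEID is also an assumption of the
sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stateid:
    boot_epoch: int  # hypothetical: identifies the issuing server instance
    index: int       # hypothetical: slot number, handed out sequentially


def check_stateid(stateid, current_epoch, live, issued_count):
    """Return an NFSv4 status name for a stateid presented by a client.

    live is the set of indexes that still have state attached;
    issued_count is how many indexes this server instance has handed out.
    """
    if stateid.index < 0:
        return "NFS4ERR_BAD_STATEID"       # structurally invalid
    if stateid.boot_epoch != current_epoch:
        return "NFS4ERR_STALE_STATEID"     # from another server instance
    if stateid.index in live:
        return "NFS4_OK"
    if stateid.index < issued_count:
        # Issued by this instance but no longer held.  Reporting EXPIRED
        # here is a guess: the state may equally have been closed or
        # discarded by a SETCLIENTID.
        return "NFS4ERR_EXPIRED"
    return "NFS4ERR_BAD_STATEID"           # never issued
```

The comment in the EXPIRED branch is the weak point: without the old
state, the server cannot tell expiry apart from other ways the state
went away.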

Not sure it really matters given that we seem to be agreeing that the
client should interpret the receipt of _BAD_STATEID outside of the
lease period as equivalent to _EXPIRED.

> Bias alert:  My approval of returning BAD_STATEID may have something
> to do with the fact that that is what our server currently does.

Then the rest of the client implementations should take note. :-)

> > In the other case, use of SETCLIENTID, the client may still have some
> > inflight requests.  For example, the client may have received a
> > NFS4ERR_EXPIRED error on one request and started its recovery and sent
> > the SETCLIENTID whilst other requests were still outstanding (which
> > happen to use state from the previous client/server instantiation).
> 
> If you do setcl/setcl-cf with a new verifier and you have outstanding
> requests that refer to states within the state corpus of the previous
> client instance that this setcl/setcl-cf will trash, you need to deal
> with the consequences.  My inclination, if I had to write a client, 
> would be to simplify things and drain that stuff before proceeding to
> what is, from the v4 state point of view, the moral equivalent of a 
> client reboot.

Agreed and this is what the Solaris client does because it does make
it easier to deal with a lot of things if all of the requests are
"drained" first.

My comments were made in the context of the RFC in that it does not
dictate the client's implementation in this area.
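
As an illustration of the "drain first" approach, here is a minimal
single-threaded sketch.  RecoveryGate and its methods are hypothetical
names, not the Solaris implementation; a real client would also need
locking and a way to wait for the count to reach zero.

```python
class RecoveryGate:
    """Track in-flight state-bearing requests and gate SETCLIENTID."""

    def __init__(self):
        self.outstanding = 0
        self.recovering = False

    def try_send(self):
        # New requests are held back once recovery has begun.
        if self.recovering:
            return False
        self.outstanding += 1
        return True

    def reply_received(self):
        if self.outstanding == 0:
            raise RuntimeError("reply without a matching request")
        self.outstanding -= 1

    def begin_recovery(self):
        self.recovering = True

    def may_send_setclientid(self):
        # Only proceed once every earlier request has been answered.
        return self.recovering and self.outstanding == 0

    def recovery_done(self):
        self.recovering = False
```

With this gate, no request carrying old state can race with the
SETCLIENTID, so the client never sees the AFTER cases discussed below.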

> I want to leave aside expired stateid's for the moment.  When you issue
> the setcl/setcl-cf you don't know that all of the locks, and thus the
> stateids are in the expired state, some may be fine.  For those non-
> expired stateids that are trashed by the new setcl/setcl-cf, the 
> client has to be prepared for BAD_STATEID.  You can't return EXPIRED
> and they are not valid.  They are bad stateid's and there is nothing
> the server can return but BAD_STATEID.
> 
> > It seems that the server will again need to return some error based on
> > checking the stateid and NFS4ERR_EXPIRED seems most appropriate.  I
> > suppose that NFS4ERR_BAD_STATEID may be appropriate but _EXPIRED is
> > friendlier.
> 
> Despite my resolution to avoid four sub-cases, I appear forced to it:
> 
> If you have setcl racing with other state referencing requests, then 
> you may have the request hit the server BEFORE or AFTER the setcl
> and it may encounter a state that was OK or REVOKED (due to lease
> expiration).  So, we have four cases:
> 
> 1) AFTER/OK
> 
>    It seems like the only thing that could be returned here is
>    BAD_STATEID.  So the client has to be prepared for that case.
> 
> 2) AFTER/REVOKED
> 
>    I would return BAD_STATEID here, indicating that if there were any
>    state corresponding to that stateid, it has been trashed, at the
>    client's request.
> 
>    Returning EXPIRED to indicate that the state which was trashed was
>    revoked at the time does not seem helpful to me.  You might describe
>    it as friendlier, but I would consider it obsessively friendly, in
>    giving you dubiously helpful information you don't care about.
> 
> 3) BEFORE/OK
> 
>    State is OK and the operation goes through.
> 
> 4) BEFORE/REVOKED
> 
>    I agree that EXPIRED can be returned here, but I would argue that
>    BAD_STATEID is just as good.  The client knows that the requested
>    operation did not happen and that when the setcl/setcl-cf completes,
>    he has a clean slate, statewise.
> 
> So the client has to be prepared for OK, EXPIRED and BAD_STATEID and it
> isn't clear what he would do differently in the two error cases.
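
The four cases above can be written out as a small decision function.
This is only a sketch of the proposal as quoted; the names are
hypothetical, and the prefer_expired flag stands in for the open choice
in case 4 between EXPIRED and BAD_STATEID.

```python
from enum import Enum


class Arrival(Enum):
    BEFORE = "before the setcl"
    AFTER = "after the setcl"


class State(Enum):
    OK = "ok"
    REVOKED = "revoked by lease expiration"


def server_result(arrival, state, prefer_expired=False):
    """Status for a state-referencing request racing with SETCLIENTID."""
    if arrival is Arrival.AFTER:
        # Cases 1 and 2: the old state corpus was trashed at the
        # client's request, whatever condition it was in.
        return "NFS4ERR_BAD_STATEID"
    if state is State.OK:
        # Case 3: the operation goes through.
        return "NFS4_OK"
    # Case 4: either error tells the client the operation did not happen.
    return "NFS4ERR_EXPIRED" if prefer_expired else "NFS4ERR_BAD_STATEID"
```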

If BAD_STATEID is received within the lease period (or the client's
perception of it), it is difficult to determine what happened and the
best assumption is that this piece of state has been munged somehow
(either administratively or because of a broken implementation).

> > I agree that the RFC doesn't mandate the return of _EXPIRED but as
> > mentioned above, it seems most appropriate.
> 
> Not to me.  If the client issues a setcl/setcl-cf with a new verifier
> then it is asking for the state corpus associated with the same id and
> different verifier, to be trashed/eliminated.  All the stateids, locks,
> etc, become invalid.  You cannot issue a new lock request and have it
> conflict with a lock from the previous instance.  Stateids from that
> instance become invalid and trying to maintain some across the instance
> boundary seems wrong to me (and besides that I don't want to do the
> work to do that unless it is *really* needed).

I agree that the client, through implementation, can ensure that
outstanding requests are cleared and will be able to deal with the
respective errors.  Outside of the lease period, seeing _EXPIRED makes
recovery easier, but _BAD_STATEID is not impossible to deal with (as the
Solaris implementation has shown).

> > Are you concerned about exhaustion of the stateid space and reuse such
> > that the server will be unable to meet a MUST statement about _EXPIRED
> > error returns?
> 
> That's certainly part of it.  The other part, to be honest, is that our
> server does not do this, and we've got enough work to do that we are
> reluctant to make changes unless they are either clearly required by
> the spec or are for other reasons something that clients truly need.
> 
> If clients would find EXPIRED helpful, I can see returning it on a
> best-effort basis, and only within the code of a single client setcl
> instance.  Keeping the EXPIRED state for a few lease times and then
> freeing it, allowing any subsequent references to get BAD_STATEID,
> seems a reasonable way of providing this more detailed error
> information to clients who want it and are interested enough to obtain
> it within a reasonable time.

Seems reasonable.
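
A minimal sketch of that best-effort scheme, assuming the server keeps a
small record per revoked stateid and that "a few lease times" means
three (the retain_leases default is a made-up value).  The class and
method names are hypothetical.

```python
class RevokedStateCache:
    """Remember revoked stateids for a bounded time, then forget them."""

    def __init__(self, lease_seconds, retain_leases=3):
        self.retain_seconds = lease_seconds * retain_leases
        self.revoked_at = {}  # stateid -> time of revocation

    def revoke(self, stateid, now):
        self.revoked_at[stateid] = now

    def purge(self, now):
        # Free records older than the retention window to avoid leakage.
        old = [s for s, t in self.revoked_at.items()
               if now - t >= self.retain_seconds]
        for s in old:
            del self.revoked_at[s]

    def error_for(self, stateid, now):
        """Error for a stateid that is not in the live state table."""
        self.purge(now)
        if stateid in self.revoked_at:
            return "NFS4ERR_EXPIRED"
        return "NFS4ERR_BAD_STATEID"
```

A client that presents the stateid promptly gets the detailed error; one
that waits longer than the retention window gets BAD_STATEID, and the
server holds nothing indefinitely.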

Any other opinions?

Spencer
