RE: [nfsv4] re: re: NFS4ERR_ADMIN_REVOKE

Spencer Shepler wrote:
> It might help me if we split the question of returning NFS4ERR_EXPIRED
> into two pieces: with and without the use of SETCLIENTID...

That makes sense.  I'll try my best to avoid splitting those into
four sub-cases :-)

> So in the case of a network partition between client and server in
> which the partition lasts longer than the lease period, the server
> presumably has to return some error to the client.  It wouldn't allow
> the client to continue using state associated with the now-expired
> lease.  NFS4ERR_EXPIRED is the most appropriate.  Not sure what else
> the server could do in this case.

As you note below, BAD_STATEID would give the client the message that
his state-id is no longer usable.  It doesn't give the reason but 
I'm not sure the reason is all that helpful to the client.  The
fact that is important is that the stateid is no longer valid and
it isn't clear what the client would do with the information
that lease expiration was the reason.  In many cases, it is the
only possible reason, though.

The problem with EXPIRED is the one Rick mentioned, that there is
no way to delimit when it is no longer needed and so you have a piece
of state that there is no way to deallocate, at least within a client
instance, and it is troubling to have something where there is a
resource leak by design, even when the magnitude of leakage means
that it is not a big issue in practice.  Note also that if the stateid's
can't go away, neither can the owner, and so there is another thing 
which leaks.

By the way, one interesting side question about delimiting the 
scope of EXPIRED concerns RELEASE_OWNER.  If I do a RELEASE_OWNER
and all of the associated stateid's are EXPIRED, then I would say
that this would go through, allowing deallocation of those stateid's
and the lockowner.  This leaves open stateid's, and allowing CLOSE
of those would give us a way to avoid leakage if the client takes
care to get rid of such stuff.

But, if you deallocate revoked state, then you don't face leakage
issues, and the client still gets the message ("That state you just
handed me is gone.  Live with it.") via the BAD_STATEID and life goes 
on.

Bias alert:  My approval of returning BAD_STATEID may have something
to do with the fact that that is what our server currently does.

> In the other case, use of SETCLIENTID, the client may still have some
> inflight requests.  For example, the client may have received a
> NFS4ERR_EXPIRED error on one request and started its recovery and sent
> the SETCLIENTID whilst other requests were still outstanding (which
> happen to use state from the previous client/server instantiation).

If you do setcl/setcl-cf with a new verifier and you have outstanding
requests that refer to states within the state corpus of the previous
client instance that this setcl/setcl-cf will trash, you need to deal
with the consequences.  My inclination, if I had to write a client, 
would be to simplify things and drain that stuff before proceeding to
what is, from the v4 state point of view, the moral equivalent of a 
client reboot.
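
To make the "drain first" idea concrete, here is a minimal sketch of the
bookkeeping a client might do.  The class and method names are made up
for illustration and do not come from any existing client; a real client
would also need locking around the counter.

```python
class RecoveryGate:
    """Hypothetical client-side gate: hold setcl until old requests drain."""

    def __init__(self):
        self.in_flight = 0       # state-referencing requests awaiting a reply
        self.recovering = False  # True once the client has decided to recover

    def try_send(self):
        # Refuse new state-referencing requests once recovery has started.
        if self.recovering:
            return False
        self.in_flight += 1
        return True

    def complete(self):
        # Called when the reply to an outstanding request arrives.
        self.in_flight -= 1

    def start_recovery(self):
        self.recovering = True

    def may_send_setclientid(self):
        # Only proceed to setcl/setcl-cf when nothing old is outstanding.
        return self.recovering and self.in_flight == 0

    def finish_recovery(self):
        self.recovering = False
```

With this arrangement no request from the old instance can arrive after
the setcl, so the client never has to sort out which error a late
request might draw.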

I want to leave aside expired stateid's for the moment.  When you issue
the setcl/setcl-cf you don't know that all of the locks, and thus the
stateids, are in the expired state; some may be fine.  For those
non-expired stateids that are trashed by the new setcl/setcl-cf, the
client has to be prepared for BAD_STATEID.  You can't return EXPIRED
and they are not valid.  They are bad stateid's and there is nothing
the server can return but BAD_STATEID.

> It seems that the server will again need to return some error based on
> checking the stateid and NFS4ERR_EXPIRED seems most appropriate.  I
> suppose that NFS4ERR_BAD_STATEID may be appropriate but _EXPIRED is
> friendlier.

Despite my resolution to avoid four sub-cases, I appear forced to it:

If you have setcl racing with other state referencing requests, then 
you may have the request hit the server BEFORE or AFTER the setcl
and it may encounter a state that was OK or REVOKED (due to lease
expiration).  So, we have four cases:

1) AFTER/OK

   It seems like the only thing that could be returned here is
   BAD_STATEID.  So the client has to be prepared for that case.

2) AFTER/REVOKED

   I would return BAD_STATEID here, indicating that if there were any
   state corresponding to that stateid, it has been trashed, at the
   client's request.

   Returning EXPIRED to indicate that the state which was trashed was
   revoked at the time does not seem helpful to me.  You might describe
   it as friendlier, but I would consider it obsessively friendly, in
   giving you dubiously helpful information you don't care about.

3) BEFORE/OK

   State is OK and the operation goes through.

4) BEFORE/REVOKED

   I agree that EXPIRED can be returned here, but I would argue that
   BAD_STATEID is just as good.  The client knows that the requested
   operation did not happen and that when the setcl/setcl-cf completes,
   he has a clean slate, statewise.

So the client has to be prepared for OK, EXPIRED and BAD_STATEID and it
isn't clear what he would do differently in the two error cases.
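
The four cases can be written down as a small decision function.  This
is only a sketch of the argument in this mail, not spec text: the
function name and the report_expiry switch are made up for illustration,
with the switch standing for a server that chooses to return EXPIRED in
case 4.

```python
from enum import Enum


class Status(Enum):
    OK = "NFS4_OK"
    BAD_STATEID = "NFS4ERR_BAD_STATEID"
    EXPIRED = "NFS4ERR_EXPIRED"


def reply_for(arrives_after_setcl, state_revoked, report_expiry=False):
    """Hypothetical helper: reply to a request racing with setcl/setcl-cf."""
    if arrives_after_setcl:
        # Cases 1 and 2: the old state corpus was trashed at the client's
        # request, so whether it had been revoked no longer matters.
        return Status.BAD_STATEID
    if not state_revoked:
        # Case 3: state is OK and the operation goes through.
        return Status.OK
    # Case 4: EXPIRED is permissible; BAD_STATEID is argued to be as good.
    return Status.EXPIRED if report_expiry else Status.BAD_STATEID
```

Either way the client sees only OK, EXPIRED or BAD_STATEID, and the two
error returns call for the same recovery.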

> I agree that the RFC doesn't mandate the return of _EXPIRED but as
> mentioned above, it seems most appropriate.

Not to me.  If the client issues a setcl/setcl-cf with a new verifier
then it is asking for the state corpus associated with the same id and
different verifier, to be trashed/eliminated.  All the stateids, locks,
etc, become invalid.  You cannot issue a new lock request and have it
conflict with a lock from the previous instance.  Stateids from that
instance become invalid and trying to maintain some across the instance
boundary seems wrong to me (and besides that I don't want to do the
work to do that unless it is *really* needed).
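
As a toy model of what I mean, consider the sketch below.  It is not our
server's code: the class and its methods are hypothetical, and it folds
setcl and setcl-cf into a single call for brevity.

```python
class Server:
    """Toy model of per-client state; not a real NFSv4 server."""

    def __init__(self):
        self._clients = {}       # client id -> (verifier, valid stateids)
        self._next_stateid = 0   # never reused in this model

    def setclientid(self, client_id, verifier):
        current = self._clients.get(client_id)
        if current is None or current[0] != verifier:
            # New verifier: a new client instance.  The whole state corpus
            # of the previous instance is discarded, revoked or not.
            self._clients[client_id] = (verifier, set())

    def open(self, client_id):
        self._next_stateid += 1
        self._clients[client_id][1].add(self._next_stateid)
        return self._next_stateid

    def check(self, client_id, stateid):
        _, valid = self._clients[client_id]
        return "NFS4_OK" if stateid in valid else "NFS4ERR_BAD_STATEID"
```

Once the verifier changes, nothing is left that remembers the old
stateids, so BAD_STATEID is the only thing check() can say about them.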

> Are you concerned about exhaustion of the stateid space and reuse such
> that the server will be unable to meet a MUST statement about _EXPIRED
> error returns?

That's certainly part of it.  The other part, to be honest, is that our
server does not do this, and we've got enough work to do that we are
reluctant to make changes unless they are either clearly required by
the spec or are for other reasons something that clients truly need.

If clients would find EXPIRED helpful, I can see returning it on a
best-effort basis, and only within the scope of a single client setcl
instance.  Keeping the EXPIRED state for a few lease times and then
freeing it, allowing any subsequent references to get BAD_STATEID,
seems a reasonable way of providing this more detailed error
information to clients who want it and are interested enough to obtain
it within a reasonable time.
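
A rough sketch of that best-effort scheme follows.  The names and the
default of three lease times are assumptions picked for illustration,
and the sketch covers only stateids that are no longer valid.  Times are
passed in explicitly to keep it self-contained.

```python
class RevokedStateCache:
    """Hypothetical best-effort memory of stateids revoked by lease expiry."""

    def __init__(self, lease_time, keep_leases=3):
        # How long a revoked stateid is remembered (assumed: a few leases).
        self.ttl = lease_time * keep_leases
        self._revoked_at = {}  # stateid -> time it was revoked

    def revoke(self, stateid, now):
        self._revoked_at[stateid] = now

    def purge(self, now):
        # Free records older than the retention period, so nothing leaks.
        stale = [s for s, t in self._revoked_at.items() if now - t >= self.ttl]
        for s in stale:
            del self._revoked_at[s]

    def new_client_instance(self):
        # setcl/setcl-cf with a new verifier: forget everything.
        self._revoked_at.clear()

    def error_for(self, stateid, now):
        # Error for a stateid that is not (or is no longer) valid.
        self.purge(now)
        if stateid in self._revoked_at:
            return "NFS4ERR_EXPIRED"
        return "NFS4ERR_BAD_STATEID"
```

A client that asks within the retention period gets EXPIRED; one that
asks later, or after a new setcl, gets BAD_STATEID, and the server's
memory use stays bounded.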

On Thu, Noveck, Dave wrote:
> I was just reviewing the ADMIN_REVOKED mail in connection with dealing
> with the case of a delegation which is not returned when recalled
> while the client continues to renew its lease.  This isn't exactly
> administrative action in the sense of an administrator deciding to do
> something but since we will be getting rid of the delegation while the
> client has a valid (non-expired) lease, it is the closest fit we could
> find.
> 
> I'll probably be sending more mail on that subject when I get some of
> my thoughts/questions together but for now I noticed the following:
> 
> rick@snowhite.cis.uoguelph.ca wrote:
> > > I also allow the SetClientID/confirm to lift the embargo, although I
> > > am still on the fence as to whether or not that should be the case.
> 
> Spencer Shepler wrote:
> > The server has to return NFS4ERR_EXPIRED to old state from the client
> > regardless of how long the client keeps sending it (unless the server
> > eventually reboots).  Even if the client does a
> > SETCLIENTID/SETCLIENTID_CONFIRM and is confused enough to send old
> > state requests, those old state requests still have to receive
> > NFS4ERR_EXPIRED.
> 
> Why?
> 
> This is two questions I guess:
> 
>      What purpose is served in doing this?
> 
>      Where does the spec say that this has to be done?
> 
> I am looking for answers to either question (or both).
> 
> I have a real problem with keeping state around forever that will
> never be used.  If the client rebooting once will not clear it then no
> number of client reboots will do it and we will wind up keeping revoked
> state that the client lost interest in months ago.  To what purpose?
> 
> Also, do you mean this requirement to apply just to cases of
> NFS4ERR_EXPIRED that happen after admin revocation or to cases in which
> NFS4ERR_EXPIRED is the result of lease expiration as well?  If the
> latter, it is very likely that a common sequence with a buggy client
> kernel (not necessarily the v4 client code having the bug):
> 
>      Client comes up and does setclientid
> 
>      Client gets a bunch of locks/opens
> 
>      Client dies
>    
>      Lease expires
> 
>      Client reboots
> 
>      Repeat until the damn bug gets fixed
> 
> would result in a lot of state being saved to distinguish states
> expired N client invocations ago (EXPIRED) from just random stateid
> (BAD_STATEID).  Is the confused client using obsolete stateids going to
> be any better off if the ones that corresponded to expired obsolete
> stateids returned EXPIRED in order to justify the work of maintaining
> that sort of archival information?
> 
> 
> -----Original Message-----
> From: Spencer Shepler [mailto:spencer.shepler@sun.com]
> Sent: Friday, October 01, 2004 6:24 PM
> To: nfsv4@ietf.org
> Subject: Re: [nfsv4] re: re: NFS4ERR_ADMIN_REVOKE
> 
> 
> On Mon, rick@snowhite.cis.uoguelph.ca wrote:
> > > We use the NFS4ERR_ADMIN_REVOKED error for indicating when state
> > > [clientid, stateid (either open, lock, or delegation)] has been
> > > revoked and will no longer be accepted by the server.
> > 
> > Yep. My main concern in this area was "how permanent" this has to be.
> > Which you've addressed later. (Put another way when/if the
> > NFS4ERR_ADMIN_REVOKED can be replaced by NFS4ERR_EXPIRED.)
> 
> Clientid doesn't apply and I think that is understood in that if the
> clientid is revoked then NFS4ERR_EXPIRED should be returned, not
> NFS4ERR_ADMIN_REVOKED.
> 
> > > Once the client as a whole has been marked as revoked, we do not do
> > > any renewing of it.
> > > Once the client's state structures are reclaimed (which may be some
> > > time after it has really "expired"), then the client would get back
> > > the NFS4ERR_EXPIRED error.
> > 
> > This sounds exactly like what my current code will do. (I, also,
> > don't expire the client until sometime after the expiry.)
> > 
> > > The only way for the client to get past this client wide revoke is
> > > to then issue a SETCLIENTID/SETCLIENTID_CONFIRM sequence. This will
> > > purge all revoked state from the client and it can begin anew. Do you
> > > do something like this OR do you make the revoke permanent? That is,
> > > require administrative intervention to lift the embargo on the bad
> > > client?
> > 
> > I also allow the SetClientID/confirm to lift the embargo, although I
> > am still on the fence as to whether or not that should be the case.
> 
> The server has to return NFS4ERR_EXPIRED to old state from the client
> regardless of how long the client keeps sending it (unless the server
> eventually reboots).  Even if the client does a
> SETCLIENTID/SETCLIENTID_CONFIRM and is confused enough to send old
> state requests, those old state requests still have to receive
> NFS4ERR_EXPIRED.
> 
> > > We do not tie the revokes to a specific open owner. The main reason
> > > is that the server hands out clientids and stateids, but not
> > > "open owners". The server only revokes things it hands out.
> > 
> > This sounds like the only place where our current code differs. I used
> > the open/lockowner since it was explicitly referred to in the RFC. (My
> > assumption was that the author was thinking that an open/lockowner
> > equates to a client process/task that went south without releasing the
> > state as it should.) btw, when I say "revoke an openowner" I mean that
> > all stateids for all opens and lockowners associated with that
> > openowner will be revoked and use of all those stateids will get
> > NFS4ERR_ADMIN_REVOKED.
> > 
> > My current code could easily be changed to revoke opens instead of
> > openowners.
> > 
> > Do others see this as an issue? (ie. Does it matter w.r.t the protocol
> > whether the openowner with all associated opens or the individual
> > opens, gets revoked?)
> 
> Administratively, I would expect that files or shares/exports are the
> unit of revocation.  The admin is unlikely to care about openowners
> unless there seems to be something broken with the NFSv4 client and in
> that case the entire client might as well be revoked.
> 
> Spencer
> 
> 
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www1.ietf.org/mailman/listinfo/nfsv4
> 

_______________________________________________
nfsv4 mailing list
nfsv4@ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4