RE: [nfsv4] Directory delegations, take 2

"Noveck, Dave" <Dave.Noveck@netapp.com> Wed, 22 October 2003 14:40 UTC

From: "Noveck, Dave" <Dave.Noveck@netapp.com>
To: David.Robinson@Sun.COM, nfsv4@ietf.org

 
> The purist in me would love to have directory delegations
> be symmetric with file delegations, but I grant that the
> complexity and recovery semantics of revoking a write
> delegation can get really ugly.

The purist in me would like that symmetry as well and we're 
not foreclosing it in some 4.x for x > 1, but I can't see
us dealing with all the issues in any reasonable 4.1
time frame.

> I will claim that one of the set of requirements that we are
> trying to solve is the ability of a client to cache the
> contents of a directory (the names and file handles) in
> order to efficiently operate on part of the namespace tree.

> I will argue that the ability for a client to modify a
> directory without going to the server is not a requirement.
> For all the arguments laid out by Dave, it will be difficult
> to handle not only the error semantics, but it would imply that
> the client should have a reasonable ability to determine what
> the new cookies should be, otherwise we take the rare problem
> of invalid cookies and make it the common case.

> I would support read-only delegations of directory contents.
> All directory modifying operations MUST be written through to
> the server and any client having a delegation MUST be notified.
> In the unlucky event that a client is unreachable, the effect is
> that directory entries created or deleted by another client
> may go undetected.  But I will claim that we have this problem
> today in NFSv3 with normal client DNLC's and attribute caching,
> this just opens the window a bit wider in case of a lost revocation.

The other issue is that the client doing the modification must
wait for the unresponsive client, but since, as you point out,
the effects of giving up on the notification, and allowing the
update to proceed without it are not dire, we can be aggressive 
in allowing that to happen, and so the client will not have to
wait a very long time.
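
A minimal sketch of that bounded wait, in Python. Everything here is hypothetical (the Holder type, the notify_holders function, the two-second limit); it only illustrates that the modifying client's delay is capped by how long the server is willing to wait for each delegation holder.

```python
from dataclasses import dataclass


@dataclass
class Holder:
    name: str
    ack_delay: float  # simulated seconds until this client acks; inf = never


def notify_holders(holders, max_wait=2.0):
    """Notify each delegation holder, waiting at most max_wait for its ack.

    Returns (waited, given_up): the simulated time the modifying client was
    held up, and the holders the server stopped waiting for.
    Holders are handled one after another only to keep the sketch short.
    """
    waited = 0.0
    given_up = []
    for h in holders:
        if h.ack_delay <= max_wait:
            waited += h.ack_delay
        else:
            # Unresponsive client: give up and let the update proceed.
            waited += max_wait
            given_up.append(h.name)
    return waited, given_up


holders = [Holder("b", 0.1), Holder("c", float("inf"))]
waited, given_up = notify_holders(holders)
```

The smaller max_wait is, the less the writer waits and the more likely a slow client runs briefly on a stale cache, which is the trade-off described above.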

> I agree that the delegation recall (effectively change notification)
> should be synchronous. It should not be a large burden on the
> client as by definition there is nothing that will need to be
> written back, only a flush of cached entries.  It even leaves open the
> option for client to ignore the recall and continue to run out
> of their caches knowing that there may be an inconsistency. Seat belts?
> Don't need no stinking seat belts! To advocate async recalls will
> prohibit an interesting class of clients that need to know
> about changes.

If the client is going to ignore the recall, why is he bothering
to get the delegation in the first place?  It seems a shame to order
those special seat-belts if you're not going to use them.  With a person,
it would probably be that he wasn't drunk when he ordered them, but 
does that apply to v4 clients?  Sometimes, at bakeathon, it seems like
my server has had a few beers too many.

> > Tom Talpey (private e-mail) has raised the issue of whether change
> > notification is worth it at all, and whether instead you just have a
> > recall/revocation event and let the client just get his delegation
> > again and refetch the modified directory contents.  This does make the
> > feature easier to spec (e.g. There is a tough case of sequencing
> > notifications and successive READDIR's when fetching a big directory,
> > as well as a more complicated set of callbacks to define).  However, I
> > worry about very large directories and the effort of refetching, when
> > there is a modest level of exogenous directory change.  What do other
> > people think?  I think I'm going to go forward trying to do the
> > notifications, unless it turns out that the complexities make this too
> > difficult to do for v4.1.

> Assuming that as part of the addition of directory delegations, we
> add in the concept of reacquiring a delegation, I see no
> significant difference between a recall and a change notification.
> Both must occur on a directory modifying event, I can see an
> argument that a change notification can be made on a per file basis
> versus a delegation on a per directory basis. Because the order,
> cookies, and contents can have a ripple effect through the
> directory I am not sure that a client can safely do anything other
> than reread the directory. I am explicitly not addressing attribute
> changes here, more on that later.

The significant difference is that you don't have to re-read the
directory, in most (almost all?) cases, and when directories are
large (and many are) that is a big deal indeed.

You say that it might be the case that a single change could have a
ripple effect (order, cookies, etc.) and that is possible but not
very common.  Such a thing would cause clients doing multiple
READDIR's to get the whole directory to lose their mind.  The spec
tells servers to try not to do this, and I suspect most servers don't.

I agree that you do need the option of recall if you are in a situation
where the server for whatever reason has changed so much stuff that
re-READDIR is the best thing.  However, in most cases, something simple
has happened and the server knows that.  When a file is created and nothing
else changes in the directory, it makes sense for the server, which
knows that, to tell the client, so he can get rid of his negative DNLC
entry without rereading the whole directory simply because some
other changes might have happened when in fact they didn't.

There is still room for argument about whether the benefits justify the 
complexity costs of notifications but there clearly is benefit, which
is pretty significant in many cases.
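
To make the difference concrete, here is a rough Python sketch of a client-side directory cache. The class and method names (DirCache, on_notify_add, on_recall, and so on) are made up for illustration and are not proposed protocol elements. A notification touches one entry; a recall throws the whole cache away and forces a re-READDIR.

```python
class DirCache:
    """Hypothetical client cache for one delegated directory."""

    def __init__(self, entries):
        self.entries = dict(entries)  # name -> filehandle
        self.negative = set()         # names known not to exist
        self.valid = True             # False once the delegation is recalled

    def lookup(self, name):
        """Return (known, filehandle); known is False if the server must be asked."""
        if not self.valid:
            return (False, None)
        if name in self.entries:
            return (True, self.entries[name])
        if name in self.negative:
            return (True, None)
        return (False, None)

    def note_missing(self, name):
        self.negative.add(name)

    def on_notify_add(self, name, fh):
        # Cheap path: drop the negative entry, add the new one.
        self.negative.discard(name)
        self.entries[name] = fh

    def on_notify_remove(self, name):
        self.entries.pop(name, None)
        self.negative.add(name)

    def on_recall(self):
        # Expensive path: everything must be refetched with READDIR.
        self.entries.clear()
        self.negative.clear()
        self.valid = False


cache = DirCache({"a": "fh-a"})
cache.note_missing("b")
cache.on_notify_add("b", "fh-b")   # "a" stays cached, only "b" changes
```

With a directory of many thousands of entries, on_notify_add costs the same as with one entry, while on_recall costs a full refetch.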

> > One issue that Saadia Khan raised is the issue of directory changes made 
> > by the delegate himself.  I think we have to make clear that this is 
> > allowed and the client is presumed to know about directory changes it 
> > makes itself.  Doing otherwise would compromise the usefulness of 
> > directory delegations in the case in which a single client is modifying 
> > the directory.  My assumption is that it is just too difficult to do write 
> > directory delegation, but exclusive use is still a very important case, 
> > and we should do what we can to make read directory delegations useful in 
> > the exclusive use environment.  

> I find this contradictory with your previous arguments against write
> delegations. All the issues of revocation and unresponsive clients
> exist in this scenario as well.  The only way to make this sort of
> exclusive access work is to make irrevocable delegations which will
> suffer the problem of stale locks blocking any directory access.
> You must require all modifying directory operations to write through.

Yes, all modifying directory operations must be written through.  We
agree on that.  The question is whether the client who did that write
loses his delegation (or gets a notification).  I'm saying he doesn't.
He is presumed to know about changes he makes himself (by getting
the reply) and he retains the delegation and thus the ability to be
notified (whether through change notifications or recall) about 
changes made by other clients.
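
Sketched in Python, with invented names (Directory, delegate, create) that stand in for whatever the protocol ends up defining: the update is written through, every other delegation holder is told about it, and the writer keeps his delegation without being notified.

```python
class Directory:
    """Hypothetical server-side view of one directory."""

    def __init__(self):
        self.names = set()
        self.delegations = set()  # client ids holding a read delegation
        self.sent = []            # (client, event, name) notifications issued

    def delegate(self, client):
        self.delegations.add(client)

    def create(self, client, name):
        # The modification always goes through to the server.
        self.names.add(name)
        # Everyone except the writer is notified; the writer learns of
        # the change from the reply and keeps his delegation.
        for other in sorted(self.delegations - {client}):
            self.sent.append((other, "add", name))
        return "ok"


d = Directory()
d.delegate("A")
d.delegate("B")
d.create("A", "newfile")
```

After the create, client A still holds the delegation and only B has a notification queued.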

> On the argument against a separate attribute delegation it appears
> to come down primarily to that of performance. One recent a-ha that
> I had with respect to the old READDIRPLUS as translated to v4,
> the straightforward implementation of requesting the desired
> attributes as part of the READDIR operation may not be the most
> efficient mechanism to process from a client perspective. With
> the availability of COMPOUND, the client can alternatively issue
> a READDIR with no requested attributes (trivial decode) and then
> issue a single COMPOUND of LOOKUP's for all the names returned.
> (Alternatively request just the nfs_fh4 and COMPOUND GETATTRs).

Clearly you could do this and the performance might not be terrible
but you seem to claim ("the READDIR operation may not be the most
efficient mechanism to process from a client perspective") that
this can perform better or at least equal to getting the attributes
with READDIR and I don't see how that possibly could be true.  
Besides the latency issue of two round-trips vs. one, the client is
going to have to process the directory and turn it into a large
compound and that is going to take time.  The client would also
have to deal with the fact that the directory could change before
the second compound, and one of the LOOKUP's/GETATTR's could fail,
breaking the chain, and requiring another COMPOUND be issued.
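
The chain-breaking problem can be sketched like this. The Python below is a toy model, not real client code: compound_lookup and fetch_all are hypothetical helpers, and a dict stands in for the server's directory. The one protocol fact it relies on is that COMPOUND stops evaluating at the first operation that fails.

```python
def compound_lookup(directory, names):
    """Evaluate one lookup per name in order, stopping at the first failure."""
    results = []
    for name in names:
        if name not in directory:
            results.append((name, "NFS4ERR_NOENT"))
            break  # the rest of the COMPOUND is not evaluated
        results.append((name, directory[name]))
    return results


def fetch_all(directory, names):
    """Fetch attributes for every name, reissuing COMPOUNDs after a failure.

    Returns (attrs, round_trips), not counting the initial READDIR.
    """
    attrs = {}
    round_trips = 0
    pending = list(names)
    while pending:
        round_trips += 1
        results = compound_lookup(directory, pending)
        for name, value in results:
            if value != "NFS4ERR_NOENT":
                attrs[name] = value
        # Skip past everything that was evaluated, including the failure.
        pending = pending[len(results):]
    return attrs, round_trips


directory = {"a": 1, "b": 2, "c": 3, "d": 4}
names = sorted(directory)   # what the attribute-less READDIR returned
del directory["b"]          # another client removes "b" in the meantime
attrs, trips = fetch_all(directory, names)
```

Here one removal turns the second step into two COMPOUNDs, so the client pays three round trips in all where a READDIR with attributes would have paid one.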

If there is going to be a proposal to get rid of READDIR in v4.1 
(or more accurately make it mandatory to not implement) and add a 
simplified version (READDIRMINUS?), I'm definitely against it.

> With this sort of approach, if the client wanted to gather
> delegations of attributes it could just COMPOUND the delegation
> requests with the GETATTRS, likewise revocation could also be
> COMPOUND'd. From a performance perspective this seems trivially
> more work than doing it on a per directory basis.

I think this depends on your definition of "trivially".  Clearly 
this is possible, but it can get expensive.  Big compounds have a
lot of interpretation overhead.  Doing a thousand ops is not as
bad as doing a thousand requests but it takes more time than doing
a single op.

For me the big issue is that client and server have to keep track
of a lot of separate stateid's for each of these files.  In 
environments where you have tons of files that change rarely, 
I'm worried about the overhead of adding all that state management.
It seems better to me to aggregate it so that it is more manageable.

> More importantly, unless we want to ban hard links, or delegation of
> directories that contain files with hard links (something I would
> probably support notwithstanding Posix), determining which directory
> delegations need to get revoked due to a SETATTR on a hard linked
> file is incredibly hard to implement.

First point: even if it were the case that determining which delegations
to recall was very hard in the presence of hard links, you would
not have to ban such delegations.  Delegations, as I see it, are
optional for the server, and if that server would have trouble
providing the requisite guarantees for a particular directory, such
as one with files that had hard links, then it is perfectly free
to return some sort of NFS4ERR_SORRY_NO_CAN_DO.

As to the difficulty of this, let me turn your COMPOUND argument
back on you.  You are saying that if I send back information on each of
the files and that the client asks for a delegation for each,
that is OK, i.e. not so hard to implement.  On the other hand, if
the server in response to a directory delegation does exactly the
same operations it would have done had the client requested all
those individual delegations (the only difference is that the
client doesn't have to do individual delegation requests for files
and that he only has a single delegation stateid instead of one
for each file), the task of notifying the clients on a SETATTR of
a hard-linked file has become difficult.  Why?  I can't see
anything that would make this harder for the server.  On the other 
hand, in many cases, such as where there are no or few hard links 
in the directory, it might be considerably easier.  The point is that
with the semantic of attributes being guaranteed by the directory 
delegation, the server can take advantage of those common, easier
cases, to do the simpler thing, if that is available to him. 
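
Both points can be put in a short Python sketch. All names here (Server, decline_hard_links, holders_for_setattr) are invented for illustration, and the bookkeeping is an assumption about how a server might track links, not a description of any real implementation.

```python
from collections import defaultdict


class Server:
    """Hypothetical server tracking links and directory delegations."""

    def __init__(self, decline_hard_links=False):
        self.decline_hard_links = decline_hard_links
        self.parents = defaultdict(list)   # fileid -> one directory per link
        self.contents = defaultdict(set)   # directory -> fileids it names
        self.delegs = defaultdict(set)     # directory -> delegation holders

    def link(self, directory, fileid):
        self.parents[fileid].append(directory)
        self.contents[directory].add(fileid)

    def request_delegation(self, directory, client):
        # A server that finds hard links troublesome may simply say no.
        has_hard_links = any(
            len(self.parents[f]) > 1 for f in self.contents[directory]
        )
        if self.decline_hard_links and has_hard_links:
            return False
        self.delegs[directory].add(client)
        return True

    def holders_for_setattr(self, fileid):
        # Same walk a server would need for per-file delegations: find
        # every directory naming the file, then everyone caching it.
        holders = set()
        for d in self.parents[fileid]:
            holders |= self.delegs[d]
        return holders


s = Server()
s.link("/d1", 7)
s.link("/d2", 7)   # file 7 is hard-linked into two directories
s.link("/d1", 8)   # file 8 has a single link
s.request_delegation("/d1", "A")
s.request_delegation("/d2", "B")
```

For the single-link file 8 the walk visits one directory, which is the common, easy case; for file 7 it visits two, which is no more than the per-file scheme would need.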

> I have not thought of how to spec this in a protocol, but Mike's
> idea of OPENDIR/CLOSEDIR looks interesting...

> While I can see how we could nicely add in symlink delegation,
> I am not sure that is a "problem" that needs to be solved.

Since Carl is the one with a need for this, I'll leave this to him.
