Re: [nfsv4] Directory delegations, take 2

Carl Burnett <cburnett@us.ibm.com> Mon, 20 October 2003 12:55 UTC

To: "Noveck, Dave" <Dave.Noveck@netapp.com>
Cc: nfsv4@ietf.org
MIME-Version: 1.0
Subject: Re: [nfsv4] Directory delegations, take 2
X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001
Message-ID: <OF121DC528.C89BE01A-ON87256DC5.00443F86@us.ibm.com>
From: Carl Burnett <cburnett@us.ibm.com>
Date: Mon, 20 Oct 2003 07:53:36 -0500

Look below. My remarks are within [Carl: .... ].

Thanks,
Carl

Carl Burnett
AIX Kernel Architecture - Network File System
(512) 838-8498, TL 678-8498
(please reply to cburnett@us.ibm.com)





"Noveck, Dave" <Dave.Noveck@netapp.com>
Sent by: nfsv4-admin@ietf.org
10/17/2003 06:55 PM

 
        To:     <nfsv4@ietf.org>
        cc: 
        Subject:        [nfsv4] Directory delegations, take 2



So this is my attempt to update the directory delegation approach 
to reflect comments I have recently received.  In some cases, I've 
changed things and in some cases I've simply tried to explain things 
better.  I'm not going to do extensive quoting of comments I've 
received, but I hope my summaries are not misrepresenting anyone. 
The first thing that has been mentioned, by David Robinson I believe, 
is directory write delegation.  I'd like to explain why I've stayed 
away from directory write delegation. 
Directory write delegations scare the hell out of me.  There are two 
important practical issues.  First, you have the problem of an 
unresponsive client stopping things on the server from proceeding. 
It is true that this problem already exists with write delegations 
for files, but the problem is much more likely if you have a whole 
directory such that any lookup through it will cause people to wait 
for a long time, while we decide whether the client holding the write 
delegation is ever going to respond.  The second issue, which exacerbates 
the first, is that the effect of revoking a write directory delegation 
is liable to be extremely disconcerting to the client/user, making the 
server extremely reluctant to revoke it and so lengthening the 
delay to other clients when there is an unresponsive client.  The problem 
is that once you do a directory-modifying operation, and it succeeds at 
the application level, if you have your delegation ripped away, you are 
in a tough situation.  Your application syscall has succeeded, the 
application may have terminated, and now you have created a file, other 
clients have seen the directory without that file, and thus you don't want 
to push the create out; and if you don't, you essentially have a 
corrupt-fs/reboot situation.  It is theoretically possible to embed the 
directory-modifying operations in transactions such that we have a nice 
recovery, but my guess is that there aren't going to be actual clients 
able to use this kind of thing safely for a long long time, if ever. 
Carl Burnett made reference to write directory delegation in DFS (I think 
???*****).  I'm guessing that the issues of lack of shared semantics that 
he mentions could be worked around, if it was worth it.  But I wonder 
about the effects of communications problems, as I mentioned above, 
particularly in an internet environment.  So I would be interested in 
hearing about actual experiences with this.  Is it worth it, given that 
spec-ing and implementing this is liable to be a lot of work? 

[Carl: DFS did not use write guarantees on dirs for some of the reasons 
you listed above. AFS had them. It was largely possible in AFS because the 
server-side filesystem was well known and it was the only one that AFS 
worked with.]

There are a number of other issues that people have brought up that seem 
to be inter-related: 
     Delegated directory contents as READDIR or READDIR+ (or what is 
     the role of attributes?). 
     Synchronous or asynchronous notification. 
     Notification of changes vs. a clear-your-dnlc-for-directory model. 
     (raised by Tom Talpey in some private e-mail). 
These all relate to the issue of what directory delegations are intended to 
do (Oh Gosh, I need a Problem Statement :-).  So the following things are 
possibilities (not mutually exclusive). I'm particularly interested in 
additions to this list, except when they complicate the design.  Come to 
think of it, I'm not interested in changes to the list :-), but I know 
that won't stop anybody. 
     Enable files to be accessed without significant server interaction 
     when they exist in read-mostly directories, or rather, in directories 
     that are not being written by other clients. 
     Tracking changes in specific directories for programs that display 
     directories for GUI tools, without ugly polling. 
     Accessing non-existent files in a non-changing directory, presumably 
     one which exists :-), or the issue of ENOENT lookups/opens mentioned 
     by Carl Burnett. 
Let's first consider the issue of asynchronous vs. synchronous 
notification. 
The motivation for asynchronous notification is that it is better from the 
server's point of view, in that operations will not be held up due to 
network problems or a client not responding quickly to a callback (or 
being down).  Even when everything is working OK, synchronous notification 
has a cost, in that a delay equal to twice the latency to the most distant 
notified client is added. 
So the issue boils down to whether asynchronous notification will do the 
job. 
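As a rough illustration of the cost just described, here is a minimal 
sketch, assuming the server sends all callbacks in parallel and waits for 
every reply.  The function name and the latency figures are made up for 
the example. 

```python
def added_sync_delay(one_way_latencies_ms):
    """Extra delay synchronous notification adds to one directory change.

    The server waits for the slowest reply, so the added delay is one
    round trip (twice the one-way latency) to the most distant notified
    client. Returns 0 when there is nobody to notify.
    """
    if not one_way_latencies_ms:
        return 0
    return 2 * max(one_way_latencies_ms)


# Hypothetical one-way latencies (ms) to three delegation holders.
latencies = [1, 20, 75]
print(added_sync_delay(latencies))  # 150
```

With asynchronous notification the same change would not wait on any of 
these round trips, which is the server-side attraction. 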
Carl's comments have caused me to rethink the issue and I've decided that 
it won't, at least for anything other than the case of the GUI tools. So 
I think I'm back in the synchronous notification camp. 
Tom Talpey (private e-mail) has raised the issue of whether change 
notification is worth it at all, and whether instead you just have a 
recall/revocation event and let the client just get his delegation again 
and refetch the modified directory contents.  This does make the feature 
easier to spec (e.g. there is a tough case of sequencing notifications and 
successive READDIR's when fetching a big directory, as well as a more 
complicated set of callbacks to define).  However, I worry about very 
large directories and the effort of refetching, when there is a modest 
level of exogenous directory change.  What do other people think?  I think 
I'm going to go forward trying to do the notifications, unless it turns 
out that the complexities make this too difficult to do for v4.1. 
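To make the tradeoff concrete, here is a minimal sketch comparing the two 
models.  The ("add", name) / ("remove", name) event shape and both 
function names are invented for this example, and the cost model is a 
deliberately crude worst case in which the client re-reads the whole 
directory after every change. 

```python
def apply_notifications(cached_names, events):
    """Update a cached directory listing from change notifications.

    events is a list of ("add", name) or ("remove", name) tuples, a
    hypothetical notification shape used only in this sketch.
    """
    names = set(cached_names)
    for kind, name in events:
        if kind == "add":
            names.add(name)
        elif kind == "remove":
            names.discard(name)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return names


def entries_transferred(dir_size, num_changes, use_notifications):
    """Rough count of directory entries sent to the client.

    With notifications, each change costs one entry. With recall and
    refetch, each change is assumed to cost a full re-read of a
    directory of dir_size entries (size drift is ignored).
    """
    if use_notifications:
        return num_changes
    return dir_size * num_changes


print(sorted(apply_notifications({"a", "b"}, [("add", "c"), ("remove", "a")])))
print(entries_transferred(100_000, 10, use_notifications=True))
print(entries_transferred(100_000, 10, use_notifications=False))
```

Under these assumptions the gap grows with directory size, which is the 
worry about very large directories; for small directories the simpler 
recall model costs little. 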
One issue that Saadia Khan raised is the issue of directory changes made 
by the delegate himself.  I think we have to make clear that this is 
allowed and the client is presumed to know about directory changes it 
makes itself.  Doing otherwise would compromise the usefulness of 
directory delegations in the case in which a single client is modifying 
the directory.  My assumption is that it is just too difficult to do write 
directory delegation, but exclusive use is still a very important case, 
and we should do what we can to make read directory delegations useful in 
the exclusive use environment. 
This would be particularly important if we are not doing notifications 
and have to recall the delegation and re-READDIR, but even if we do 
notification, the thought of a RENAME on a high-latency link waiting 
for a high-latency callback to the client doing the rename, makes me 
kind of sick. 
The issue of not sending callbacks to the client making the change 
could require stateid's in all directory-modifying operations, but 
with sessions, we can simply not notify delegations associated with 
the session making the change. 
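A minimal sketch of that session-based rule, assuming the server can tell 
which session issued the directory-modifying operation.  The class, its 
methods, and the string identifiers are hypothetical and only illustrate 
the bookkeeping. 

```python
class DelegationTable:
    """Tracks which sessions hold a read delegation on which directory."""

    def __init__(self):
        self._holders = {}  # directory id -> set of session ids

    def grant(self, directory, session):
        """Record that a session holds a delegation on a directory."""
        self._holders.setdefault(directory, set()).add(session)

    def sessions_to_notify(self, directory, changing_session):
        """Return the sessions that need a callback for a change.

        The session that made the change is skipped, since the client
        is presumed to know about directory changes it makes itself.
        """
        holders = self._holders.get(directory, set())
        return sorted(holders - {changing_session})


table = DelegationTable()
table.grant("dirA", "s1")
table.grant("dirA", "s2")
print(table.sessions_to_notify("dirA", "s1"))  # ['s2']
```

In the exclusive-use case the result is an empty list, so a RENAME by the 
only delegation holder waits on no callback at all. 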
Regardless of all the IETF procedural stuff, my impression is that 
sessions are going to be in v4.1 and I don't want to waste my time 
defining new operations, that won't be needed for v4.1 if sessions 
are present. 
Another issue that was raised is the requirement that attributes not 
be changed.  There were some objections to this by David Robinson on 
what I take to be architectural grounds, in that directories and 
the attributes of files within them are just different sorts of 
objects.  Also, Rob Thurlow worries about the difficulty of 
implementing the callback in response to, for example, a SETATTR 
on a filehandle which just happens to be in the subject directory. 

[Carl: I agree with Dave and Rob. You want to keep them separate, with 
the possible exception of symlinks that you allude to below.]

So let me first explain my basic motivation for the attribute 
requirement.  The attributes I am basically concerned with are 
those that have to do with access to the file: mode, owner, 
group, and acl.  Also the change attribute so that the client can 
see if he has the right version of the file.  We could try to 
reduce the attributes guaranteed constant to the minimum, but 
there doesn't seem to be a lot of reason to do that.  This is the 
same situation as with file delegation.  Any SETATTR causes the 
delegation to be recalled, even though it might be possible to 
allow a few marginal attributes to be changed.  Having the client 
able to assume that all attributes remain unchanged just makes 
things simpler. 
I want the directory-delegated client to be able to access 
(i.e. open and read) files within the directory without needing 
to contact the server.  So this is why it makes sense to impose 
a similar attribute constancy requirement on directory delegation. 
If you didn't, you could not determine whether a given user 
could access the file and would have to contact the server 
for each individual file. 
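A minimal sketch of the kind of local check this would allow, assuming 
the client holds cached mode, owner, and group for the file.  It covers 
only classic POSIX mode bits; ACLs and any server-side policy are 
ignored, and the function name and arguments are hypothetical. 

```python
import stat


def may_read(mode, owner_uid, owner_gid, uid, gids):
    """Decide read access from cached attributes, POSIX mode bits only.

    Exactly one permission class applies: owner if the uid matches,
    else group if the file's group is among the caller's groups,
    else other.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# File with mode 0640, owned by uid 100, group 20.
print(may_read(0o640, 100, 20, uid=100, gids={5}))   # True (owner)
print(may_read(0o640, 100, 20, uid=200, gids={20}))  # True (group)
print(may_read(0o640, 100, 20, uid=300, gids={5}))   # False (other)
```

The check is only as trustworthy as the cached attributes, which is why 
the attribute constancy requirement matters for this use. 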
 
Even when you have read delegations available for the individual 
files, you have to get a delegation for each one being accessed, 
and then return that delegation.  Given that clients may cache 
copies of infrequently changed files on disk, a simple way of 
validating such copies and securing access would be very 
nice indeed, especially without forcing a per-file state 
housekeeping requirement.  The number of directories you are 
going to be accessing is much smaller than the number of files, 
in almost all cases. 

[Carl: I think one problem with the above is security. The security 
settings of the directory and each object's attributes are not enough to 
determine whether access to each object is allowed, and the client should 
not assume that it is. At a minimum it needs to check with the server to 
make sure access is allowed for the entity operating on the file. The 
server could have security policies that go beyond file attributes, 
including NFS V4 ACLs. For example, time of day policies, etc.]

So I'd argue that the performance benefits of this override any 
architectural reservations, but that is generally the way I lean 
on these things.  After all, READDIRPLUS (now READDIR) returns 
attribute information together with directory information, in the 
face of the same architectural disconnect for the same sorts of 
pragmatic reasons.  As far as the difficulty of implementation, 
I'd say "No pain, no gain" but I would be open to an option that 
would allow servers that couldn't implement this to obtain all the 
benefits that they could get without it.  Let me also offer the 
following full disclosure.  WAFL does not have pointers in the inode 
back to the enclosing directory but it has been discussed.  I'm 
pretty firm in believing that this is something that filesystems 
will just have to do.  When things go wrong, for example, saying you 
have a problem with inode xxx (as opposed to the file named 
aaa/bbb/cc) as is part of the typical UNIX fs paradigm is not 
something that users can or should be asked to accept. 
I guess it is possible to reduce the attributes to the critical 
set, if someone can make a strong case for this.  However, once 
you subtract what are basically filesystem attributes, get rid 
of atime which has to be excluded, take away unchanging attributes 
such as fileid and fsid, there isn't all that much left.  Also, 
the difficulty of implementation does not seem to be reduced with 
fewer attributes. 
One issue that has come up recently that we will have to resolve 
for directory delegation, and that appears particularly relevant to the 
client looking at the acls and granting access to the individual 
user processes, is the relation of credentials and state, particularly 
delegation state.  I haven't followed the ongoing discussion of 
this issue well enough to determine my exact position on how it 
does or should affect directory delegations, although it is clearly 
quite relevant.  This needs further discussion. 
 
Carl also mentioned some ideas for structuring requests to get 
directory delegation.  I'm thinking a request to get a directory 
delegation alone would work OK.  You can add a READDIR to the 
COMPOUND.  There would have to be an option so that failure to 
get the delegation would not cause an error so that you could 
try for a delegation and get the directory information whether 
you got the delegation or not.  Mike Eisler has suggested (in 
private e-mail), the possibility that this would fit well with 
OPENDIR/CLOSEDIR operations in which a delegation request was 
a client option.  Since OPENDIR would allow the server to know 
when the directory was open, it could make the cookie verifier 
useful by enabling the server to switch the verifier only when 
the directory was not open. 
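A minimal sketch of the "try for a delegation, get the directory 
information either way" behaviour, assuming a COMPOUND normally stops at 
the first failing operation.  The per-operation optional flag, the 
operation names, and the return shapes are placeholders invented for the 
example, not protocol definitions. 

```python
def run_compound(ops):
    """Run operations in order, stopping at the first hard failure.

    Each op is (name, func, optional). func returns (ok, result). A
    failed op marked optional is recorded but does not stop the rest.
    """
    results = []
    for name, func, optional in ops:
        ok, result = func()
        results.append((name, ok, result))
        if not ok and not optional:
            break
    return results


def delegation_denied():
    # Stand-in for a server that declines to grant the delegation.
    return (False, "delegation not available")


def readdir():
    # Stand-in for a successful READDIR.
    return (True, ["a.txt", "b.txt"])


for entry in run_compound([
    ("GET_DELEGATION", delegation_denied, True),
    ("READDIR", readdir, False),
]):
    print(entry)
```

Without the optional flag on the first operation, the same compound would 
stop before READDIR and the client would need a second round trip. 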
Carl also mentioned the possibility of symlink delegations.  I 
don't think this is needed and it would be a lot of delegation 
stateid's for the server and client to keep track of.  At least 
within the nfs protocols, there is no way to change a symlink 
without changing the directory.  Symlinks are not writable 
objects.  You have to delete the existing one and then create a 
new one of the same name to get the effect of changing symlink 
contents, and even this would change the filehandle of the 
symlink, rather than being seen as modifying an existing object. 
So changing a symlink is always going to involve a directory 
delegation callback in any case.  To deal with the possibility 
that the local server OS has an API to modify a symlink, we merely 
have to make the rule that a read directory delegation provides 
an assurance that there will be no change in symlinks within the directory 
without a callback. 

[Carl: Using the directory delegation to cover the validity of symlink 
data sounds like it could work, and it would provide the fundamental 
benefit.]

