Re: [nfsv4] Directory delegations, take 2
Carl Burnett <cburnett@us.ibm.com> Mon, 20 October 2003 12:55 UTC
To: "Noveck, Dave" <Dave.Noveck@netapp.com>
Cc: nfsv4@ietf.org
MIME-Version: 1.0
Subject: Re: [nfsv4] Directory delegations, take 2
X-Mailer: Lotus Notes Release 5.0.7 March 21, 2001
Message-ID: <OF121DC528.C89BE01A-ON87256DC5.00443F86@us.ibm.com>
From: Carl Burnett <cburnett@us.ibm.com>
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 20 Oct 2003 07:53:36 -0500
Look below. My remarks are within [Carl: .... ].

Thanks,
Carl

Carl Burnett
AIX Kernel Architecture - Network File System
(512) 838-8498, TL 678-8498
(please reply to cburnett@us.ibm.com)

"Noveck, Dave" <Dave.Noveck@netapp.com>
Sent by: nfsv4-admin@ietf.org
10/17/2003 06:55 PM
To: <nfsv4@ietf.org>
cc:
Subject: [nfsv4] Directory delegations, take 2

So this is my attempt to update the directory delegation approach to reflect comments I have recently received. In some cases, I've changed things and in some cases I've simply tried to explain things better. I'm not going to do extensive quoting of comments I've received, but I hope my summaries are not misrepresenting anyone.

The first thing that has been mentioned, by David Robinson I believe, is directory write delegation. I'd like to explain why I've stayed away from directory write delegation.

Directory write delegations scare the hell out of me. There are two important practical issues. First, you have the problem of an unresponsive client stopping things on the server from proceeding. It is true that this problem already exists with write delegations for files, but the problem is much more likely if you have a whole directory such that any lookup through it will cause people to wait for a long time, while we decide whether the client holding the write delegation is ever going to respond.

The second issue, which exacerbates the first, is that the effect of revoking a write directory delegation is liable to be extremely disconcerting to the client/user, causing the server to be extremely reluctant to revoke the thing, which lengthens the delay to other clients when there is an unresponsive client. The problem is that once you do a directory-modifying operation, and it succeeds at the application level, if you have your delegation ripped away, you are in a tough situation.
Your application syscall has succeeded, the application may have terminated, and now you have created a file, other clients have seen the directory when you didn't, and thus you don't want to push the create out and if you don't, you essentially have a corrupt-fs/reboot situation. It is theoretically possible to embed the directory-modifying operations in transactions such that we have a nice recovery but my guess is that there aren't going to be actual clients able to use this kind of thing safely for a long long time, if ever.

Carl Burnett made reference to write directory delegation in DFS (I think ???). I'm guessing that the issues of lack of shared semantics that he mentions could be worked around, if it was worth it. But I wonder about the effects of communications problems, as I mentioned above, particularly in an internet environment. So I would be interested in hearing about actual experiences with this. Is it worth it, given that spec-ing and implementing this is liable to be a lot of work?

[Carl: DFS did not use write guarantees on dirs for some of the reasons you listed above. AFS had it. It was largely possible in AFS because the server side filesystem was well known and it was the only one that AFS worked with.]

There are a number of other issues that people have brought up that seem to be inter-related:

- Delegated directory contents as READDIR or READDIR+ (or what is the role of attributes?).
- Synchronous or asynchronous notification.
- Notification of changes vs. a clear-your-dnlc-for-directory model (raised by Tom Talpey in some private e-mail).

These all relate to the issue of what write delegations are intended to do (Oh Gosh, I need a Problem Statement :-). So the following things are possibilities (not mutually exclusive). I'm particularly interested in additions to this list, except when they complicate the design. Come to think of it, I'm not interested in changes to the list :-), but I know that won't stop anybody.
- Enable files to be accessed without significant server interaction when they exist in read-mostly directories, or rather, in directories that are not being written by other clients.
- Tracking changes in specific directories for programs that display directories for GUI tools, without ugly polling.
- Accessing non-existent files in a non-changing directory, presumably one which exists :-), or the issue of ENOENT lookups/opens mentioned by Carl Burnett.

Let's first consider the issue of asynchronous vs. synchronous notification. The motivation for asynchronous notification is that it is better from the server's point of view: operations will not be held up due to network problems or a client not responding quickly to a callback (or being down). Even when everything is working OK, synchronous notification has a cost, in that a delay equal to twice the latency to the most distant notified client is added. So the issue boils down to whether asynchronous notification will do the job. Carl's comments have caused me to rethink the issue and I've decided that it won't, at least for anything other than the case of the GUI tools. So I think I'm back in the synchronous notification camp.

Tom Talpey (private e-mail) has raised the issue of whether change notification is worth it at all, and whether instead you just have a recall/revocation event and let the client just get his delegation again and refetch the modified directory contents. This does make the feature easier to spec (e.g. there is a tough case of sequencing notifications and successive READDIR's when fetching a big directory, as well as a more complicated set of callbacks to define). However, I worry about very large directories and the effort of refetching, when there is a modest level of exogenous directory change. What do other people think? I think I'm going to go forward trying to do the notifications, unless it turns out that the complexities make this too difficult to do for v4.1.
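To make the two trade-offs above concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of any proposal: the `Holder` type, the function names, and the latency figures are all hypothetical, and it assumes callbacks go out to all delegation holders in parallel and that a recall forces a full re-READDIR after every change (a worst case).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Holder:
    """A client holding a read directory delegation (hypothetical model)."""
    name: str
    one_way_latency_ms: float  # assumed one-way latency from server to client


def sync_notify_delay_ms(holders: Iterable[Holder]) -> float:
    """Delay added to a directory-modifying operation when the server waits
    for every holder to acknowledge the callback before replying.

    Callbacks are assumed to be sent in parallel, so the operation waits for
    the slowest round trip: twice the latency to the most distant holder.
    """
    latencies = [h.one_way_latency_ms for h in holders]
    return 2 * max(latencies) if latencies else 0.0


def entries_refetched_on_recall(dir_entries: int, changes: int) -> int:
    """Worst case for recall-and-refetch: every change recalls the delegation
    and the client re-reads the whole directory."""
    return dir_entries * changes


def entries_sent_by_notification(changes: int) -> int:
    """With change notification, roughly one entry's worth of data per change."""
    return changes


if __name__ == "__main__":
    holders = [Holder("lan-client", 0.5), Holder("wan-client", 40.0)]
    print("added delay per change (ms):", sync_notify_delay_ms(holders))
    print("refetch, 100k entries, 10 changes:",
          entries_refetched_on_recall(100_000, 10))
    print("notify, 10 changes:", entries_sent_by_notification(10))
```

The numbers are only illustrative, but they show why a single distant client dominates the synchronous cost and why refetching scales with directory size rather than with the rate of change.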
One issue that Saadia Khan raised is the issue of directory changes made by the delegate himself. I think we have to make clear that this is allowed and the client is presumed to know about directory changes it makes itself. Doing otherwise would compromise the usefulness of directory delegations in the case in which a single client is modifying the directory. My assumption is that it is just too difficult to do write directory delegation, but exclusive use is still a very important case, and we should do what we can to make read directory delegations useful in the exclusive use environment. This would be particularly important if we are not doing notifications and have to recall the delegation and re-READDIR, but even if we do notification, the thought of a RENAME on a high-latency link waiting for a high-latency callback to the client doing the rename makes me kind of sick.

The issue of not sending callbacks to the client making the change could require stateid's in all directory-modifying operations, but with sessions, we can simply not notify delegations associated with the session making the change. Regardless of all the IETF procedural stuff, my impression is that sessions are going to be in v4.1 and I don't want to waste my time defining new operations that won't be needed for v4.1 if sessions are present.

Another issue that was raised is the requirement that attributes not be changed. There were some objections to this from David Robinson on what I take to be architectural grounds, in that directories and the attributes of files within them are just different sorts of objects. Also, Rob Thurlow worries about the difficulty of implementing the callback in response to, for example, a SETATTR on a filehandle which just happens to be in the subject directory.

[Carl: I agree with Dave and Rob. You want to keep them separate, with maybe the possible exception for symlinks that you allude to below.]

So let me first explain my basic motivation for the attribute requirement.
The attributes I am basically concerned with are those that have to do with access to the file: mode, owner, group, and acl. Also the change attribute, so that the client can see if he has the right version of the file. We could try to reduce the attributes guaranteed constant to the minimum, but there doesn't seem to be a lot of reason to do that. This is the same situation as with file delegation. Any SETATTR causes the delegation to be recalled, even though it might be possible to allow a few marginal attributes to be changed. Having the client able to assume that all attributes remain unchanged just makes things simpler.

I want the directory-delegated client to be able to access (i.e. open and read) files within the directory without needing to contact the server. So this is why it makes sense to impose a similar attribute constancy requirement on directory delegation. If you didn't, you could not determine whether a given user could access the file and would have to contact the server for each individual file. Even when you have read delegations available for the individual files, you have to get a delegation for each one being accessed, and then return that delegation. Given that clients may cache copies of infrequently changed files on disk, a simple way of validating such copies and securing access would be very nice indeed, especially without forcing a per-file state housekeeping requirement. The number of directories you are going to be accessing is much smaller than the number of files, in almost all cases.

[Carl: I think one problem with the above is security. The security of the directory and each object's attributes are not enough to determine the security of each object and the client should not assume the security. At a minimum it needs to check with the server to make sure access is allowed by the entity operating on the file. The server could have security policies that go beyond file attributes, including NFS V4 ACLs.
For example, time of day policies, etc.]

So I'd argue that the performance benefits of this override any architectural reservations, but that is generally the way I lean on these things. After all, READDIRPLUS (now READDIR) returns attribute information together with directory information, in the face of the same architectural disconnect for the same sorts of pragmatic reasons.

As far as the difficulty of implementation, I'd say "No pain, no gain" but I would be open to an option that would allow servers that couldn't implement this to obtain all the benefits that they could get without it. Let me also offer the following full disclosure. WAFL does not have pointers in the inode back to the enclosing directory but it has been discussed. I'm pretty firm in believing that this is something that filesystems will just have to do. When things go wrong, for example, saying you have a problem with inode xxx (as opposed to the file named aaa/bbb/cc), as is part of the typical UNIX fs paradigm, is not something that users can or should be asked to accept.

I guess it is possible to reduce the attributes to the critical set, if someone can make a strong case for this. However, once you subtract what are basically filesystem attributes, get rid of atime which has to be excluded, and take away unchanging attributes such as fileid and fsid, there isn't all that much left. Also, the difficulty of implementation does not seem to be reduced with fewer attributes.

One issue that has come up recently that we will have to resolve for directory delegation, and that appears particularly relevant to the client looking at the acls and granting access to the individual user processes, is the relation of credentials and state, particularly delegation state. I haven't followed the ongoing discussion of this issue well enough to determine my exact position on how it does or should affect directory delegations, although it is clearly quite relevant. This needs further discussion.
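As a small illustration of the attribute-constancy argument above, the following Python sketch shows the kind of local check a client could make from cached attributes while it holds a directory delegation. Everything here is hypothetical (`CachedAttrs`, `may_read_locally`, `cached_copy_is_valid` are made-up names), and it deliberately uses only simple mode bits: it ignores ACLs, superuser handling, and any server-side policy beyond file attributes, which is exactly the gap Carl's remark points at.

```python
import stat
from dataclasses import dataclass
from typing import AbstractSet


@dataclass(frozen=True)
class CachedAttrs:
    """Attributes the client cached under the delegation (hypothetical)."""
    mode: int    # permission bits, e.g. 0o640
    owner: int   # numeric owner id
    group: int   # numeric group id
    change: int  # value of the change attribute when the data was cached


def may_read_locally(attrs: CachedAttrs, uid: int, gids: AbstractSet[int]) -> bool:
    """Simplified mode-bit read check using cached attributes only.

    Assumption: no ACLs and no server-side policy (time of day, etc.).
    """
    if uid == attrs.owner:
        return bool(attrs.mode & stat.S_IRUSR)
    if attrs.group in gids:
        return bool(attrs.mode & stat.S_IRGRP)
    return bool(attrs.mode & stat.S_IROTH)


def cached_copy_is_valid(attrs: CachedAttrs, cached_change: int) -> bool:
    """A cached copy is usable only if its change value matches the
    attribute value that the delegation guarantees is still current."""
    return attrs.change == cached_change


if __name__ == "__main__":
    attrs = CachedAttrs(mode=0o640, owner=1000, group=100, change=7)
    print(may_read_locally(attrs, uid=1000, gids={100}))  # owner
    print(may_read_locally(attrs, uid=3000, gids={5}))    # other
    print(cached_copy_is_valid(attrs, cached_change=7))
```

Whether a client may rely on such a check at all is the open question in this thread; the sketch only shows what the attribute guarantee would make possible, not that it is sufficient.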
Carl also mentioned some ideas for structuring requests to get directory delegation. I'm thinking a request to get a directory delegation alone would work OK. You can add a READDIR to the COMPOUND. There would have to be an option so that failure to get the delegation would not cause an error, so that you could try for a delegation and get the directory information whether you got the delegation or not.

Mike Eisler has suggested (in private e-mail) the possibility that this would fit well with OPENDIR/CLOSEDIR operations in which a delegation request was a client option. Since OPENDIR would allow the server to know when the directory was open, it could make the cookie verifier useful by enabling the server to switch the verifier only when the directory was not open.

Carl also mentioned the possibility of symlink delegations. I don't think this is needed and it would be a lot of delegation stateid's for the server and client to keep track of. At least within the nfs protocols, there is no way to change a symlink without changing the directory. Symlinks are not writable objects. You have to delete the existing one and then create a new one of the same name to get the effect of changing symlink contents, and even this would change the filehandle of the symlink, rather than being seen as modifying an existing object. So changing a symlink is always going to involve a directory delegation callback in any case. To deal with the possibility that the local server OS has an API to modify a symlink, we merely have to make the rule that a read directory delegation provides an assurance that there be no change in symlinks within the directory without a callback.

[Carl: Using the directory delegation to cover the validity of symlink data sounds like it could work and it would provide the fundamental benefit.]