RE: [nfsv4] Directory delegations, take 2

"Noveck, Dave" <Dave.Noveck@netapp.com> Thu, 23 October 2003 14:49 UTC

Subject: RE: [nfsv4] Directory delegations, take 2
Message-ID: <C8CF60CFC4D8A74E9945E32CF096548A6D358F@silver.nane.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>
To: Ted Anderson <TedAnderson@mindspring.com>, nfsv4@ietf.org
Cc: Nicolas Williams <Nicolas.Williams@Sun.COM>, David.Robinson@Sun.COM
Date: Thu, 23 Oct 2003 07:47:33 -0700

I'd like to discuss two issues that were raised by Nico and addressed
by Ted.  I agree with what Ted has said about these but I'd like to
add some detail and a bit of an extension.

The first concerns the need for synchronous notifications (or recall).  
Here is the concrete example that convinced me we had to be synchronous.
Client A creates a file and then renames it into a directory that
client B happens to have a directory delegation for.  Client A sends
(an application on) client B a (non-NFS) message saying "I've created 
a file for you to process so do that."  The application stats the file.  
If the notification were asynchronous, then you would have a race and the
stat of the created file could get ENOENT due to a negative dnlc
entry, which the racing notification/recall would cause to be deleted,
but possibly, alas, too late.
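
A minimal sketch of that race, as a toy Python model rather than
anything resembling a real client.  The class ClientB, the function
run() and the file name "job.dat" are all made up for illustration; the
only point is the ordering of the notification relative to the stat.

```python
class ClientB:
    """Toy client holding a read directory delegation (hypothetical)."""

    def __init__(self):
        # Names the client believes, under the delegation, do not exist.
        self.negative = set()

    def notify(self, name):
        # Callback (recall or notification) drops the negative entry.
        self.negative.discard(name)

    def stat(self, name, server_dir):
        if name in self.negative:
            return "ENOENT"          # answered from the cache, no server trip
        return "OK" if name in server_dir else "ENOENT"


def run(synchronous):
    server_dir = set()
    b = ClientB()
    b.negative.add("job.dat")        # B looked earlier and cached a miss
    in_flight = []

    # Client A renames job.dat into the delegated directory.
    if synchronous:
        b.notify("job.dat")          # rename completes only after B responds
    else:
        in_flight.append("job.dat")  # rename completes, callback still pending
    server_dir.add("job.dat")

    # A messages B's application out of band; it stats the file right away.
    result = b.stat("job.dat", server_dir)

    for name in in_flight:
        b.notify(name)               # arrives, but too late for the stat
    return result


print(run(synchronous=True))   # OK
print(run(synchronous=False))  # ENOENT
```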

I agree that Ted is going in the right direction with tying this to the
lease interval but I think we can do better than special failure 
indications that "can be delivered to highly paranoid applications",
at least for read directory delegations.

Let's use the example above.  Client A does a rename at time T, and
so we issue the callback (recall or notification).  Since this is a
pure read delegation, the response to the callback can be considered
sufficient acknowledgement in either case (i.e. no forward
delegation-return op is required).  Now if we don't get an immediate
response, let's say after one second, we start using the normal
(non-callback) path to notify the client.  Within the lease period he
has to do some state-related request, and I think we can contrive to
provide a notification that there is a problem with the callback path
(details to be worked out -- there are probably useful interactions
with the sessions proposal to be explored).  So once the callback
becomes pending, within L+1 seconds, where L is the lease time, the
client will either:

     Get the callback, causing him to drop his negative dnlc entry
     for the file being created/renamed.

     Issue a (state-related) request which will get back a callback-
     problem indication, telling him he'd better drop his directory-
     delegation-related dnlc entries including the troublesome one. 

     Have the lease time go by without getting a response from the server
     for a state-related request, indicating the possibility/probability
     that there is a communication problem making it advisable that
     he drop directory-delegation-related dnlc info (among other stuff).

So if the rename waits for L+1+epsilon seconds before actually
proceeding, then by the time the rename returns to the application on
A and it sends the message to the application on B, we know that if
the client on B is working correctly, then regardless of any possible
network problems, the client will, in one way or another, have dropped
his negative dnlc entry for the file in question, so that the stat
will succeed.
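
The arithmetic behind that bound can be sketched as follows.  This is
only an illustration under assumptions: all times are seconds measured
from T, the function and parameter names are invented, and it assumes
the client's last unflagged state-related reply arrived no later than
the one-second grace point at which the server starts flagging the
callback problem.

```python
def rename_hold(lease, grace=1.0, epsilon=0.1):
    """How long the rename is held at most: L + 1 + epsilon."""
    return lease + grace + epsilon


def drop_time(lease, last_renewal, callback_at=None, flagged_reply_at=None):
    """Time at which client B drops its negative entry (hypothetical model).

    last_renewal     -- last state-related reply B got before flagging began
    callback_at      -- when the callback reaches B, or None if it never does
    flagged_reply_at -- when B gets a reply carrying the callback-problem
                        indication, or None if no reply gets through
    """
    # Case 3: lease runs out with no reply, so B drops on its own.
    candidates = [last_renewal + lease]
    if callback_at is not None:          # case 1: callback arrives
        candidates.append(callback_at)
    if flagged_reply_at is not None:     # case 2: callback-problem indication
        candidates.append(flagged_reply_at)
    return min(candidates)


L = 90.0
hold = rename_hold(L)
cases = [
    drop_time(L, last_renewal=0.5, callback_at=0.2),
    drop_time(L, last_renewal=0.5, flagged_reply_at=40.0),
    drop_time(L, last_renewal=1.0),      # worst case: total silence
]
print(all(t < hold for t in cases))      # True
```

In each of the three cases the entry is gone before the held rename is
allowed to complete, which is all the argument needs.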



-----Original Message-----
From: Ted Anderson [mailto:TedAnderson@mindspring.com]
Sent: Thursday, October 23, 2003 9:21 AM
To: nfsv4@ietf.org
Cc: Nicolas Williams; Noveck, Dave; David.Robinson@Sun.COM
Subject: Re: [nfsv4] Directory delegations, take 2


On 10/21/2003 16:41, David Robinson wrote:
 > I will claim that one of the set of requirements that we are trying to
 > solve is the ability of a client to cache the contents of a directory
 > (the names and file handles) in order to efficiently operate on part
 > of the namespace tree.

Yes, and not just efficiently, but with well-defined semantics.

 > I will argue that the ability for a client to modify a directory
 > without going to the server is not a requirement.  For all the
 > arguments laid out by Dave, it will be difficult to handle not only
 > the error semantics, but it would imply that the client should have a
 > reasonable ability to determine what the new cookies should be,
 > otherwise we take the rare problem of invalid cookies and make it the
 > common case.

I even agree with this.  I think in the long run, providing very high
scalability and client performance will require write delegation of
directories, but there are significant complications.  So putting this
off makes sense.

 > I would support read-only delegations of directory contents.  All
 > directory modifying operations MUST be written through to the server
 > and any client having a delegation MUST be notified.  [...]  I agree
 > that the delegation recall (effectively change notification) should be
 > synchronous. ...

Yes.  I think this would be sufficient to provide very useful semantics
to clients.  Anything less than synchronous notifications leads to
update semantics whose specification is too fuzzy to be very
useful.  Asynchronous notification avoids polling and the performance
problems associated with that approach.  But when you consider delivery
delays, there is essentially no improvement over the traditional
time-based approach WRT consistency semantics.  Making the notifications
synchronous provides a qualitative improvement in the protocol.

On 10/22/2003 12:58, Nicolas Williams wrote:
 > Synchronous notification delegations sound both neat and like trouble.
 > If we can't guarantee synchronicity (and from these arguments it's
 > clear that we shouldn't), what good is it to provide synchronous
 > notification delegations?  Applications will not be able to use the
 > notification mechanism for synchronization without the guarantee, so
 > why would they want synchronous notifications?

Nico wades in on the other side, due to concern about failures.  My
counterargument is that failures can be handled effectively while
preserving the benefits synchronicity provides.

It is certainly possible to make the notification timeout smaller for
read delegations than for write delegations.  Bounding the time required
to send back cached updates is hard for write delegations, but easy for
read delegations, which require relatively small amounts of work on the
client.  The important aspect of the timeout, however small it is made,
is to ensure that it is tied to the lease interval.  In this way the
client always knows when the server might have failed to deliver a
delegation related message.  This has the big advantage that consistency
semantics are preserved as long as there are no failures of this sort.
These failure events can be delivered to highly paranoid applications or
can be considered after the fact when debugging application problems.
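
One way to picture the client side of this, as a rough Python sketch
with invented names (DirDelegationCache, renewed, lookup): the client
trusts entries cached under the delegation only while its lease is
known to be current, and otherwise discards them and goes back to the
server.  How a real client tracks lease renewal is not specified here.

```python
class DirDelegationCache:
    """Toy name cache guarded by the lease interval (hypothetical)."""

    def __init__(self, lease):
        self.lease = lease
        self.last_renewal = None     # time of last successful lease renewal
        self.entries = {}            # name -> cached result, e.g. "NEGATIVE"

    def renewed(self, now):
        self.last_renewal = now

    def lookup(self, name, now):
        expired = (self.last_renewal is None
                   or now - self.last_renewal >= self.lease)
        if expired:
            # The server may have failed to reach us: trust nothing.
            self.entries.clear()
            return None              # caller must ask the server
        return self.entries.get(name)


cache = DirDelegationCache(lease=90)
cache.renewed(now=0)
cache.entries["job.dat"] = "NEGATIVE"
print(cache.lookup("job.dat", now=10))   # NEGATIVE
print(cache.lookup("job.dat", now=95))   # None
```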


There are still interesting issues to debate.  I like the idea of
delivering notifications which include a description of the directory
update so that delegations do not have to be recalled.  It seems to have
large performance benefits in the common case of low-flux updates.
However, Nico's discussion of "directories as databases" is germane.

The topic of granularity is also very interesting and important.  Should
a directory delegation apply to its contents only or also the attributes
of its children?  Since a file delegation can protect file attributes,
extending directory delegation to its children is not strictly necessary
from a functional point of view.  However, the benefits of reducing the
state management burden are not to be overlooked.  Once you consider
children, it is tempting to allow entire subtrees as well (again,
hardlinks rear their ugly heads here).  A delegation on an entire file
system would also be a very nice feature for when the update-flux is low
enough, as it often is.

I am mindful of the complexity and timing concerns of addressing all
these additional issues.  At a minimum, I think read directory
delegation with synchronous recall is necessary to provide a useful
increment in functionality over NFSv4.0.

Ted Anderson

