Re: [nfsv4] can a server replace a read deleg with a write deleg?

<david.noveck@emc.com> Sat, 27 August 2011 23:37 UTC

From: david.noveck@emc.com
To: rmacklem@uoguelph.ca, nfsv4@ietf.org
Date: Sat, 27 Aug 2011 19:43:38 -0400
Message-ID: <5DEA8DB993B81040A21CF3CB332489F68146A637@MX31A.corp.emc.com>
References: <Pine.GSO.4.63.1007041217520.6453@muncher.cs.uoguelph.ca>
In-Reply-To: <Pine.GSO.4.63.1007041217520.6453@muncher.cs.uoguelph.ca>
Cc: trond.myklebust@fys.uio.no
Subject: Re: [nfsv4] can a server replace a read deleg with a write deleg?

Although I'm responding to Rick's original message on this subject, I'm
not ignoring what Trond has since said on the topic, and I will refer to
his mails as appropriate.

I believe that the spec needs to be clarified (to agree with what I'm
saying :-) or else clearly say that what I'm saying is wrong :-(

This is apart from my belief that the restriction that Trond notes is
mistaken.  For now, I'll just assume it as currently written.  I will 
not address how its possible mistakenness relates to the issue until later
on in the message after a big "================================="

------------------------------------------

So my answers to Rick are
1) it does not need to CBRecall the read delegation.  However, if it does
   not recall it, it needs to revoke it.

As Trond points out, the spec says you "may not" change the file while holding
the read delegation.  It does not say "MUST NOT" or "SHOULD NOT", because that
language implies that one party can depend (at least normally) on the other
party not doing the action.  That is normally said about what the responder does,
with the requester relying on it.  Here it is about what the requester would do, and
the server is not relying on the client not doing these things.  If the client does
them, the server has to do something, such as return an error, and I don't see any
statement about an error being returned.

How I interpret this is that the client can attempt any writes
he wants, but that they are incompatible with retaining the delegation,
so that the server has to somehow get rid of it.

Rick points out that the WRITE and not the OPEN is the important thing, but
I think Trond is right in saying the server need not wait for the WRITE, i.e. 
that he MAY and should take the OPEN for write as evidence that a WRITE will 
most likely follow, although I am not quite as judgmental as Trond in making 
that assessment. 

So if the read delegation should go, is a CBRecall needed?  I don't think so,
since the client knows he is opening for write.  I think the server is justified
in skipping recall in this case and going directly to revocation of the read
delegation.

Once the read delegation is revoked, the server is free to grant a write delegation. 

So the answer to the other question is:
2) yes, it can return a write delegation in the open reply but only if it has already
   revoked the read delegation.  The client might consider that a replacement but the
   two are independent since they have different stateids.  The client could interpret
   the write delegation as incompatible with the read delegation and he could simply
   consider the read delegation as gone without actually verifying that the revoke
   happened.

Although this is not a replacement in the sense of an upgrade of an existing object, the 
client might well consider it as essentially the same in effect, although there is 
no guarantee that it is done atomically.
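The server-side sequence argued for above (revoke the opener's read delegation without a recall, then grant a write delegation under a new stateid if nothing conflicts) might be sketched as follows. This is only an illustration: FileState, open_for_write and the integer stateids are hypothetical names, not anything taken from the spec or from a real server, and the recall of other clients' delegations is deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileState:
    """Hypothetical per-file server state; all names are illustrative."""
    read_delegs: dict = field(default_factory=dict)  # client id -> stateid
    write_deleg: Optional[tuple] = None              # (client id, stateid)
    opens: set = field(default_factory=set)          # client ids with opens
    next_stateid: int = 1

    def new_stateid(self):
        sid = self.next_stateid
        self.next_stateid += 1
        return sid


def open_for_write(state, client):
    """Handle an OPEN for write; return (revoked_stateid, granted_stateid)."""
    # Revoke the opener's own read delegation first, skipping the recall.
    revoked = state.read_delegs.pop(client, None)
    # Any other holder or opener blocks the write delegation in this sketch;
    # recalling their delegations is outside its scope.
    others = (set(state.read_delegs) | state.opens) - {client}
    state.opens.add(client)
    granted = None
    if not others and state.write_deleg is None:
        # Independent object: a fresh stateid, not an upgrade of the old one.
        granted = state.new_stateid()
        state.write_deleg = (client, granted)
    return revoked, granted
```

The two return values are kept separate to reflect the point that the write delegation is a new object with its own stateid, so the client sees a revocation and a grant rather than an atomic upgrade.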

One way in which the server might well abide by a replacement paradigm (not "SHOULD" or
"should" but more like "would be well-advised to") is in terms of resource allocation.
If a server is limited to having N outstanding delegations, it could try to make sure
that the revoked delegation was not snapped up by an unrelated request, making it
likely that he will be able to grant the write delegation if nothing else (conflicting
opens or delegations) prevents it.
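The resource-allocation idea can be sketched as a pool that holds back the slot freed by a revocation for the same client. DelegationPool and its method names are hypothetical, and a real server would presumably tie the reservation to the lifetime of the OPEN being processed; this is a sketch under those assumptions, not a description of any implementation.

```python
class DelegationPool:
    """Hypothetical pool limited to `limit` outstanding delegations."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0
        self.reserved = {}  # client id -> number of slots held back

    def acquire(self, client):
        """Try to take a slot for `client`; return True on success."""
        if self.reserved.get(client, 0) > 0:
            # Use the slot held back when this client's delegation was revoked.
            self.reserved[client] -= 1
            self.in_use += 1
            return True
        if self.in_use + sum(self.reserved.values()) < self.limit:
            self.in_use += 1
            return True
        return False

    def revoke_and_reserve(self, client):
        """Free the revoked delegation's slot but keep it for `client`."""
        self.in_use -= 1
        self.reserved[client] = self.reserved.get(client, 0) + 1

    def release_reservation(self, client):
        """Drop a held-back slot, e.g. if no write delegation is granted."""
        if self.reserved.get(client, 0) > 0:
            self.reserved[client] -= 1
```

With a limit of one, a slot freed by revoke_and_reserve("A") is refused to an unrelated client "B" and remains available to "A".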

-----------------------------------------------------------------------------------

Having answered Rick's questions about what the server might do, a related question
is whether a client should (by which I mean "may, if determined to act prudently") depend
on this behavior.  While this is allowed behavior, and seems to me sensible, it is
not mandated by the spec and clients should not expect it to always happen.

It would be far better for the client in this situation to put a delegation return in
front of the COMPOUND and do the OPEN after that.  The server might well take the
same resource trading precautions as it does in the revoke case.

With the delegation return in the COMPOUND, if there are delegations that are
incompatible with the OPEN, the read delegation could be gone by the time you receive
the DELAY error, so you might have resource issues when you re-issue the OPEN that
prevent you from getting the delegation.  Without the delegation return, the server
could be coded so that the revoke of the read delegation would not happen in the
DELAY case.  But I don't think that is a big enough deal to justify not putting the
delegation return in the COMPOUND.
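As a rough illustration of the client-side ordering suggested here, the sketch below builds a COMPOUND as a plain list of operations with DELEGRETURN ahead of OPEN, and evaluates it with the usual stop-at-first-error rule. The operation names are real NFSv4 operations, but the tuple layout, the argument dictionaries and the function names are assumptions made for the example and are not wire format.

```python
def build_open_for_write_compound(file_fh, dir_fh, name, read_deleg_stateid):
    """Return a hypothetical op list: return the delegation, then open."""
    return [
        ("PUTFH", file_fh),                  # current filehandle = the file
        ("DELEGRETURN", read_deleg_stateid),
        ("PUTFH", dir_fh),                   # current filehandle = directory
        ("OPEN", {"name": name, "share_access": "WRITE"}),
        ("GETFH",),
    ]


def run_compound(ops, handler):
    """Evaluate ops in order, stopping at the first non-OK status."""
    results = []
    for op in ops:
        status = handler(op)
        results.append((op[0], status))
        if status != "NFS4_OK":
            break
    return results
```

If the handler answers NFS4ERR_DELAY for the OPEN, the results show DELEGRETURN already completed, which is the situation described above: the read delegation is gone before the OPEN is retried.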

==================================================================================

I've previously explained why I think this "may not" is a mistake as follows and so 
far I've not heard any argument that it isn't:

> When I get a read delegation, I'm assured that nobody else is 
> changing the file.

> I don't need an assurance that I'm not changing the file.  I know 
> whether I'm changing the file.   

> It could be that this restriction helps the others who have a read 
> delegation but it doesn't.  If I had no delegation, then when I opened 
> for write, all of the clients holding read delegations would have their 
> delegations recalled.  That makes sense.

> Now if I also hold a read delegation, then the spec indicates, as Trond 
> points out, that I will lose mine as well.  The question is "why?"  It 
> can't be to inform me that I'm writing.  I know that.

> The problem here is that the protocol, as now specified, cannot give 
> you a delegation-based assurance that you are the only one writing.  You 
> can get an open-based non-revocable assurance that you are the only writer 
> by opening for write with deny-write, but it is anomalous that you can't 
> get a delegation-style assurance of this.

Now as to my concern as to the mistakenness of the "may not" and why I join Rick in
exploring the ragged edges of the protocol here, let me explain why I think this is
important now.

We now have cheaper fast caching media in the form of flash memory and soon MRAM may
continue in this vein.  Delegations are v4's prime means of providing support for more
abundant and long-lived caching.

So you want to have read delegations to make sure that your flash copies of files remain
up-to-date.  When you don't have a read-delegation, you can just interrogate the change
attribute, as long as you aren't writing.  When you are writing and don't have any assurance
that nobody else is, the data that you have cached is not reliable.  The problem is
compounded (not intended as a pun when written, honest :-) by the absence of atomic 
pre-and-post-attributes for write.

So the problem is that what you need is an assurance that nobody else is writing the file,
to assure validity of your cache.  A write delegation might allow you to do write-back caching,
but you might not be that aggressive (e.g. out of concern about disasters like asteroids, hurricanes
and earthquakes); you really want to be able to at least do write-through caching.

With the ability to trade (in some sense) a read delegation for a write delegation, you have
the assurance you need to do write-through caching.  Non-atomicity is not a problem as long
as you can check the change attribute before you start writing.  So if the client could
return the read delegation and get the write delegation, he would be OK, even if there were
read delegations held by others that had to be recalled.
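A minimal sketch of the client-side check described here: once the write delegation is in hand, compare the server's change attribute with the value recorded for the cached copy, and only then do write-through updates. CachedFile, revalidate_before_write, write_through and the send_write callback are hypothetical names, and the change values are treated as opaque numbers that are only compared for equality.

```python
class CachedFile:
    """Hypothetical local (e.g. flash) copy of a file plus its change value."""

    def __init__(self, data, change):
        self.data = bytearray(data)
        self.change = change
        self.valid = True


def revalidate_before_write(cache, server_change):
    """Discard the cached copy if the file changed while unprotected."""
    if server_change != cache.change:
        cache.valid = False
        cache.data.clear()
    return cache.valid


def write_through(cache, offset, payload, send_write):
    """Send the write to the server, then mirror it into the cached copy."""
    if not cache.valid:
        raise ValueError("cache must be revalidated or refilled first")
    send_write(offset, payload)  # assumed to raise if the server rejects it
    end = offset + len(payload)
    if len(cache.data) < end:
        # Zero-fill any gap so the slice assignment lands at `offset`.
        cache.data.extend(b"\0" * (end - len(cache.data)))
    cache.data[offset:end] = payload
```

The gap between losing the read delegation and gaining the write delegation is covered by the single comparison in revalidate_before_write, which is why the non-atomic trade is tolerable in this scheme.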

The fly in this particular ointment is that if other clients had the file opened for read,
the write delegation could not be granted and the writing client would have to flush his copy
of the file, which might be fairly large in the case of flash caching.  The problem here is
that the client needs an assurance that nobody else is writing and can't get one, all 
because of "the gratuitous 'may not'".

So how about getting rid of "the gratuitous 'may not'"?

Would that hurt the protocol in any way?

If we deleted this "may not", you could keep your read delegation across the OPEN, 
retaining the assurance that nobody else is writing.  The open-for-write would cause
read delegations held by others to be recalled but opens for read would not interfere
with the read delegation, as they should not.
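The difference between the two rules can be stated as a small function. This is only a sketch of the behavior as read in this message, with hypothetical names: under the current text the opener's own read delegation goes along with everyone else's, while with the "may not" deleted only the other holders are recalled.

```python
def delegations_to_recall(read_holders, opener, keep_own=False):
    """Return the set of clients whose read delegations must go
    when `opener` opens the file for write.

    keep_own=False models the current "may not" reading;
    keep_own=True models the proposed rule.
    """
    holders = set(read_holders)
    if keep_own:
        # Proposed: the opener keeps its assurance that nobody else writes.
        return holders - {opener}
    # Current reading: the opener loses its own delegation as well.
    return holders
```

For holders {"A", "B"} and opener "A", the current reading returns both clients and the proposed rule returns only "B".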

If nobody has a reason in hand and this is just a time-to-bis issue, I can see how the
need to "ship product" might get in the way.  If so, might we reconsider the issue in 
the context of v4.1?  If the deferral of this is motivated by time considerations, as 
opposed to deciding that the restriction is correct, I ask that we not consider that 
deferral as a precedent for the case of v4.1, where the same schedule considerations 
do not apply.

-----Original Message-----
From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of Rick Macklem
Sent: Sunday, July 04, 2010 12:28 PM
To: nfsv4@ietf.org
Subject: [nfsv4] can a server replace a read deleg with a write deleg?

Somewhat tangential to the recent thread, what about the following
simple case:
- client opens foo for reading and gets read_deleg_stateid_foo
- client opens foo for writing against the server (which it must do
   when it holds a read delegation)

Now, if the server sees that this client is the only one with a read
delegation for (and opens on) "foo", it could issue a write delegation
to the client for "foo".
- To do this does it first need to CBRecall the read delegation? (If so,
   it probably cannot issue the write delegation for this open, since it
   would take too long to reply to the Open unless it returns NFS4ERR_DELAY
   for the Open for a while. Not an ideal situation.)
OR
- Can it return a write delegation for "foo" in the write open reply?
   - If it is allowed to do this, does this delegation replace
     read_delegation_stateid_foo or are there now multiple delegations for
     "foo" issued to the same client?
     (I don't like the concept of having multiple delegations issued to
      the same client for the same file concurrently and I'm pretty sure
      my client isn't implemented to handle this case. Without looking at
      the code to be sure, I think it logs an error and throws away the
      new second delegation.)

I don't think this is clarified in RFC3530, but please correct me if I'm
incorrect w.r.t. this.

rick
