Re: [nfsv4] Write-behind caching

Spencer Shepler <sshepler@microsoft.com> Tue, 26 October 2010 03:32 UTC

From: Spencer Shepler <sshepler@microsoft.com>
To: "david.noveck@emc.com" <david.noveck@emc.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Date: Tue, 26 Oct 2010 03:34:18 +0000
Subject: Re: [nfsv4] Write-behind caching

Fair enough.  I haven't looked to see whether the layout types
address this specific, needed behavior.  Obviously the
statement you reference and the individual layout descriptions
should be tied together.  Again, I don't remember, but there
may be layout-specific steps needed when handling
LAYOUTRETURNs.

In any case, we can handle the eventual conclusion as an erratum.

Spencer


> -----Original Message-----
> From: david.noveck@emc.com [mailto:david.noveck@emc.com]
> Sent: Monday, October 25, 2010 8:25 PM
> To: Spencer Shepler; nfsv4@ietf.org
> Subject: RE: [nfsv4] Write-behind caching
> 
> I agree that the intent was to cover a variety of layout types.
> 
> I think what you are saying about the issue of different throughputs for
> having and not having layouts also makes sense.  It may in some way have
> led to the statement in RFC5661 but those statements are by no means the
> same.  They have different consequences.  I take it that you are saying
> (correctly) something like:
> 
>      However, write-behind implementations will generally need to bound
>      the amount of unwritten data so that, given the bandwidth of the
>      output path, the data can be written in a reasonable time.  Clients
>      which have layouts should avoid keeping larger amounts to reflect a
>      situation in which a layout provides a write path of higher
>      bandwidth.  This is because a CB_LAYOUTRECALL may be received.  The
>      client should not delay returning the layout so as to use that
>      higher-bandwidth path, so it is best if it assumes, in limiting the
>      amount of data to be written, that the write bandwidth is only what
>      is available without the layout, and that it uses this bandwidth
>      assumption even if it does happen to have a layout.
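> 
> To make the arithmetic behind that bound concrete, here is a minimal
> sketch in C; the bandwidth and drain-time numbers, and all the names,
> are invented for illustration:
> 
>     #include <stdint.h>
> 
>     /* Bandwidth available WITHOUT a layout, i.e. writing through
>      * the MDS.  Invented number. */
>     #define MDS_WRITE_BW_BYTES_PER_SEC (50ull * 1024 * 1024)
> 
>     /* Longest acceptable time to drain unwritten data. */
>     #define MAX_DRAIN_SECONDS 5
> 
>     /* Bound on dirty data: computed from the no-layout bandwidth,
>      * and applied whether or not a layout is currently held. */
>     static uint64_t max_unwritten_bytes(void)
>     {
>         return (uint64_t)MDS_WRITE_BW_BYTES_PER_SEC * MAX_DRAIN_SECONDS;
>     }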
> 
> This differs from the text in RFC5661 in a few respects.
> 
> 	First, it says that the amount of dirty data should be the same when
> 	you have the layout and when you don't, rather than simply saying it
> 	should be small when you have the layout, possibly implying that it
> 	should be smaller than when you don't have a layout.
> 
> 	Second, the text now in RFC5661 strongly implies that when you get a
> 	CB_LAYOUTRECALL, you would normally start new IO's, rather than
> 	simply drain the pending IO's and return the layout ASAP.
> 
> So I don't agree that what is in RFC5661 is good implementation advice,
> particularly in suggesting that clients should delay the LAYOUTRETURN
> while doing a bunch of IO, including starting new IO's.
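> 
> In code terms, the recall handling I am arguing for is roughly the
> following sketch; the types and helper functions are hypothetical,
> not taken from any existing client:
> 
>     struct layout;                      /* client's record of a layout */
> 
>     void stop_new_layout_io(struct layout *lo);
>     void wait_for_inflight_io(struct layout *lo);
>     void send_layoutreturn(struct layout *lo);
> 
>     /* On CB_LAYOUTRECALL: quiesce and return, nothing more. */
>     void handle_cb_layoutrecall(struct layout *lo)
>     {
>         stop_new_layout_io(lo);    /* issue no new IO's via the layout */
>         wait_for_inflight_io(lo);  /* drain only what is in flight */
>         send_layoutreturn(lo);     /* return the layout ASAP */
>         /* Deliberately no flush here: unsubmitted dirty data stays
>          * cached and can be written later, through the MDS or under
>          * a re-acquired layout. */
>     }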
> 
> 
> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of
> Spencer Shepler
> Sent: Monday, October 25, 2010 10:07 PM
> To: Noveck, David; nfsv4@ietf.org
> Subject: Re: [nfsv4] Write-behind caching
> 
> 
> Since this description is part of the general pNFS description, the intent
> may have been to cover a variety of layout types.  However, I agree that
> the client is not guaranteed access to the layout and is fully capable of
> writing the data via the MDS if all else fails (inability to obtain the
> layout after a return); it may not be the most performant path but it
> should be functional.  And maybe that is the source of the statement that
> the client should take care in managing its dirty pages given the lack of
> guarantee of access to the supposedly higher-throughput path for
> writing data.
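> 
> A sketch of the fallback I mean, with invented names and types (any
> real client will differ):
> 
>     #include <stddef.h>
>     #include <sys/types.h>
> 
>     struct inode;   /* the file being written back */
>     struct layout;
> 
>     struct layout *find_valid_layout(struct inode *ino, off_t off,
>                                      size_t len);
>     ssize_t write_via_layout(struct layout *lo, const void *buf,
>                              size_t len, off_t off);
>     ssize_t write_via_mds(struct inode *ino, const void *buf,
>                           size_t len, off_t off);
> 
>     /* A layout is an optimization, not a requirement: without one,
>      * writing through the MDS is always available, just possibly
>      * slower. */
>     ssize_t writeback(struct inode *ino, const void *buf,
>                       size_t len, off_t off)
>     {
>         struct layout *lo = find_valid_layout(ino, off, len);
>         if (lo != NULL)
>             return write_via_layout(lo, buf, len, off);
>         return write_via_mds(ino, buf, len, off);
>     }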
> 
> As implementation guidance it seems okay, but it is not truly a
> requirement for correct function.
> 
> Spencer
> 
> > -----Original Message-----
> > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf
> > Of david.noveck@emc.com
> > Sent: Monday, October 25, 2010 6:58 PM
> > To: nfsv4@ietf.org
> > Subject: [nfsv4] Write-behind caching
> >
> > The following statement appears at the bottom of page 292 of RFC5661.
> >
> >    However, write-behind caching may negatively
> >    affect the latency in returning a layout in response to a
> >    CB_LAYOUTRECALL; this is similar to file delegations and the impact
> >    that file data caching has on DELEGRETURN.  Client implementations
> >    SHOULD limit the amount of unwritten data they have outstanding at
> >    any one time in order to prevent excessively long responses to
> >    CB_LAYOUTRECALL.
> >
> > This does not seem to make sense to me.
> >
> > First of all, the analogy between DELEGRETURN and
> > CB_LAYOUTRECALL/LAYOUTRETURN doesn't seem to me to be correct.  In the
> > case of DELEGRETURN, at least if the file in question has been closed
> > during the pendency of the delegation, you do need to write all of the
> > dirty data associated with those previously open files.  Normally,
> > clients just write all dirty data.
> >
> > LAYOUTRETURN does not have that sort of requirement.  If it is valid
> > to hold the dirty data when you do have the layout, it is just as
> > valid to hold it when you don't.  You could very well return the
> > layout and get it again before some of those dirty blocks are
> > written.  Having a layout grants you the right to do IO using a
> > particular means (different based on the mapping type), but if you
> > don't have the layout, you still have a way to do the writeback, and
> > there is no particular need to write back all the data before
> > returning the layout.  As mentioned above, you may well get the
> > layout again before there is any need to actually do the write-back.
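> >
> > In code terms, the contrast I mean is roughly this (all helper
> > names are made up):
> >
> >     struct nfs_open_file;
> >     struct layout;
> >
> >     void flush_all_dirty_data(struct nfs_open_file *f);
> >     void wait_for_inflight_io(struct layout *lo);
> >
> >     /* DELEGRETURN: if the file was closed while the delegation
> >      * was held, its dirty data must be written out first. */
> >     void prepare_delegreturn(struct nfs_open_file *f)
> >     {
> >         flush_all_dirty_data(f);
> >     }
> >
> >     /* LAYOUTRETURN: no such obligation.  Only IO's already in
> >      * flight need to finish; dirty data may stay cached across
> >      * the return. */
> >     void prepare_layoutreturn(struct layout *lo)
> >     {
> >         wait_for_inflight_io(lo);
> >     }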
> >
> > You have to wait until IO's that are in flight are completed before
> > you return the layout.  However, I don't see why you would have to or
> > want to start new IO's using the layout if you have received a
> > CB_LAYOUTRECALL.
> >
> > Am I missing something?  Is there some valid reason for this
> > statement?  Or should this be dealt with via the errata mechanism?
> >
> > What do existing clients actually do with pending writeback data when
> > they get a CB_LAYOUTRECALL?  Do they start new IO's using the layout?
> > If so, is there any other reason other than the paragraph above?
> > _______________________________________________
> > nfsv4 mailing list
> > nfsv4@ietf.org
> > https://www.ietf.org/mailman/listinfo/nfsv4
> 
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www.ietf.org/mailman/listinfo/nfsv4
>