Re: [nfsv4] Write-behind caching

Benny Halevy <bhalevy@panasas.com> Wed, 27 October 2010 16:25 UTC

Message-ID: <4CC852FA.4050903@panasas.com>
Date: Wed, 27 Oct 2010 18:27:38 +0200
From: Benny Halevy <bhalevy@panasas.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.9) Gecko/20100921 Fedora/3.1.4-1.fc13 Thunderbird/3.1.4
MIME-Version: 1.0
To: david.noveck@emc.com
References: <BF3BB6D12298F54B89C8DCC1E4073D80028C76DB@CORPUSMX50A.corp.emc.com> <E043D9D8EE3B5743B8B174A814FD584F0D498D54@TK5EX14MBXC126.redmond.corp.microsoft.com> <BF3BB6D12298F54B89C8DCC1E4073D80028C76E0@CORPUSMX50A.corp.emc.com>
In-Reply-To: <BF3BB6D12298F54B89C8DCC1E4073D80028C76E0@CORPUSMX50A.corp.emc.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 27 Oct 2010 16:27:40.0237 (UTC) FILETIME=[DFCBCFD0:01CB75F3]
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] Write-behind caching

On 2010-10-26 05:24, david.noveck@emc.com wrote:
> I agree that the intent was to cover a variety of layout types.
> 
> I think what you are saying about the issue of different throughputs for
> having and not having layouts also makes sense.  It may in some way have
> led to the statement in RFC5661, but the two statements are by no means
> the same.  They have different consequences.  I take it that you are
> saying (correctly) something like:
> 
>      However, write-behind implementations will generally need to bound
>      the amount of unwritten data so that, given the bandwidth of the
>      output path, the data can be written in a reasonable time.  Clients
>      which have layouts should avoid keeping larger amounts to reflect a
>      situation in which a layout provides a write path of higher
>      bandwidth.  This is because a CB_LAYOUTRECALL may be received.  The
>      client should not delay returning the layout so as to use that
>      higher-bandwidth path, so it is best if it assumes, in limiting the
>      amount of data to be written, that the write bandwidth is only what
>      is available without the layout, and that it uses this bandwidth
>      assumption even if it does happen to have a layout.
> 
> This differs from the text in RFC5661 in a few respects.
> 
> 	First, it says that the amount of dirty data should be the same
> 	when you have the layout and when you don't, rather than simply
> 	saying it should be small when you have the layout, possibly
> 	implying that it should be smaller than when you don't have a
> 	layout.
> 
> 	Second, the text now in RFC5661 strongly implies that when you
> 	get a CB_LAYOUTRECALL, you would normally start new IO's, rather
> 	than simply drain the pending IO's and return the layout ASAP.
> 
> So I don't agree that what is in RFC5661 is good implementation advice,
> particularly in suggesting that clients should delay the LAYOUTRETURN
> while doing a bunch of IO, including starting new IO's.


That's what clora_changed is for.
It's up to the server to provide this hint to the client,
and it's up to the client to throttle its dirty-cache write-behind in response
to CB_LAYOUTRECALL.
If in your implementation flushing dirty data always seems to be a bad idea,
the server can just always set clora_changed to true (though the hint's name
is somewhat too specific for the implied semantics).

Benny
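A rough sketch of the policy described above, with clora_changed steering whether dirty data is flushed through the recalled layout, might look like the following. The function and action names are purely illustrative and are not taken from any real client implementation:

```python
def handle_cb_layoutrecall(clora_changed):
    """Illustrative client response to CB_LAYOUTRECALL.

    Returns the ordered list of actions a client would take; the action
    names are hypothetical labels, not real operations.
    """
    # New I/Os must not be started against a recalled layout; only the
    # ones already in flight are allowed to complete.
    actions = ["drain_inflight_io"]

    if clora_changed:
        # The server hints that the layout is changing: do not push
        # dirty data through the recalled layout.  Return it promptly
        # and write the data back later, via the MDS or via a freshly
        # acquired layout.
        actions += ["return_layout", "writeback_later_via_mds"]
    else:
        # The layout still reflects reality: the client may flush dirty
        # data through it before returning, at the cost of a longer
        # recall latency.
        actions += ["flush_dirty_via_layout", "return_layout"]

    return actions
```

A server that never wants clients flushing through recalled layouts would, as noted above, simply set clora_changed to true on every recall.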

>  
> 
> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf
> Of Spencer Shepler
> Sent: Monday, October 25, 2010 10:07 PM
> To: Noveck, David; nfsv4@ietf.org
> Subject: Re: [nfsv4] Write-behind caching
> 
> 
> Since this description is part of the general pNFS description, the
> intent may have been to cover a variety of layout types.  However,
> I agree that the client is not guaranteed access to the layout and
> is fully capable of writing the data via the MDS if all else
> fails (inability to obtain the layout after a return); it may not
> be the most performant path but it should be functional.  And maybe
> that is the source of the statement that the client should take
> care in managing its dirty pages given the lack of guarantee of
> access to the supposed, higher throughput path for writing data.
> 
> As implementation guidance it seems okay, but it is not truly a
> requirement for correct function.
> 
> Spencer
> 
>> -----Original Message-----
>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf
>> Of david.noveck@emc.com
>> Sent: Monday, October 25, 2010 6:58 PM
>> To: nfsv4@ietf.org
>> Subject: [nfsv4] Write-behind caching
>>
>> The following statement appears at the bottom of page 292 of RFC5661.
>>
>>    However, write-behind caching may negatively
>>    affect the latency in returning a layout in response to a
>>    CB_LAYOUTRECALL; this is similar to file delegations and the impact
>>    that file data caching has on DELEGRETURN.  Client implementations
>>    SHOULD limit the amount of unwritten data they have outstanding at
>>    any one time in order to prevent excessively long responses to
>>    CB_LAYOUTRECALL.
>>
>> This does not seem to make sense to me.
>>
>> First of all, the analogy between DELEGRETURN and
>> CB_LAYOUTRECALL/LAYOUTRETURN doesn't seem to me to be correct.  In the
>> case of DELEGRETURN, at least if the file in question has been closed
>> during the pendency of the delegation, you do need to write all of the
>> dirty data associated with those previously open files.  Normally,
>> clients just write all dirty data.
>>
>> LAYOUTRETURN does not have that sort of requirement.  If it is valid
>> to hold the dirty data when you do have the layout, it is just as
>> valid to hold it when you don't.  You could very well return the
>> layout and get it again before some of those dirty blocks are written.
>> Having a layout grants you the right to do IO using a particular means
>> (different based on the mapping type), but if you don't have the
>> layout, you still have a way to do the writeback, and there is no
>> particular need to write back all the data before returning the
>> layout.  As mentioned above, you may well get the layout again before
>> there is any need to actually do the write-back.
>>
>> You have to wait until IO's that are in flight have completed before
>> you return the layout.  However, I don't see why you would have to or
>> want to start new IO's using the layout if you have received a
>> CB_LAYOUTRECALL.
>>
>> Am I missing something?  Is there some valid reason for this
>> statement?  Or should this be dealt with via the errata mechanism?
>>
>> What do existing clients actually do with pending writeback data when
>> they get a CB_LAYOUTRECALL?  Do they start new IO's using the layout?
>> If so, is there any reason other than the paragraph above?
>> _______________________________________________
>> nfsv4 mailing list
>> nfsv4@ietf.org
>> https://www.ietf.org/mailman/listinfo/nfsv4
> 