Re: [nfsv4] Write-behind caching

Dean Hildebrand <seattleplus@gmail.com> Wed, 27 October 2010 05:06 UTC


I remember at one time there was a thought that all dirty data would 
have to be written to disk when the client receives a layoutrecall.  
Once the data was written, the client would send a layoutreturn.  I 
think this was the thinking before all the timing issues and other such 
things cropped up.  I assume someone wrote that as general advice, 
somehow thinking that responding to a layoutrecall was more important 
than actually achieving good write performance.
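
(Just to make that old model concrete: a rough sketch, with invented 
names, not taken from any real client.  The handler flushes everything, 
including starting new I/O, before answering the recall.)

    # Old model: on CB_LAYOUTRECALL, push every dirty byte (starting new
    # I/O through the layout if needed) and only then send LAYOUTRETURN.
    # All names (client, recall, write_through_layout, ...) are invented.
    def handle_layoutrecall_flush_everything(client, recall):
        layout = client.layouts[recall.file_handle]
        for extent in client.dirty_extents(recall.file_handle):
            client.write_through_layout(layout, extent)  # new I/O issued here
        client.wait_for_all_io(recall.file_handle)       # drain everything
        client.send_layoutreturn(layout)                 # recall answered late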

In this light, the analogy with delegreturn makes sense if you take a 
very specific example, but obviously not in general.

I would vote to just cut this text, as I think it is simply outdated.
Dean

On 10/26/2010 3:34 AM, david.noveck@emc.com wrote:
> That makes sense.  Let me take on this issue with regard to the file
> layout.  Are there volunteers to address it with regard to block and
> object?  It would be great if we could get together in Beijing, discuss
> this, and come to a joint conclusion to present to the working group
> (via email I mean).  I'm not planning on trying to do this before the
> working group meeting.  In any case, I'm pretty sure there won't be any
> time during the working group meeting.
>
> -----Original Message-----
> From: Spencer Shepler [mailto:sshepler@microsoft.com]
> Sent: Monday, October 25, 2010 11:34 PM
> To: Noveck, David; nfsv4@ietf.org
> Subject: RE: [nfsv4] Write-behind caching
>
>
> Fair enough.  I haven't looked to see if the layout types
> address this specific, needed, behavior.  Obviously the
> statement you reference and the individual layout descriptions
> should be tied together.  Again, I don't remember but there
> may be layout-specific steps needed in the case of handling
> layoutreturns.
>
> In any case, we can handle the eventual conclusion as an errata.
>
> Spencer
>
>
>> -----Original Message-----
>> From: david.noveck@emc.com [mailto:david.noveck@emc.com]
>> Sent: Monday, October 25, 2010 8:25 PM
>> To: Spencer Shepler; nfsv4@ietf.org
>> Subject: RE: [nfsv4] Write-behind caching
>>
>> I agree that the intent was to cover a variety of layout types.
>>
>> I think what you are saying about the issue of different throughputs
>> for having and not having layouts also makes sense.  It may in some
>> way have led to the statement in RFC5661, but those statements are by
>> no means the same.  They have different consequences.  I take it that
>> you are saying (correctly) something like:
>>
>>       However, write-behind implementations will generally need to
>>       bound the amount of unwritten data so that, given the bandwidth
>>       of the output path, the data can be written in a reasonable
>>       time.  Clients which have layouts should avoid keeping larger
>>       amounts to reflect a situation in which a layout provides a
>>       write path of higher bandwidth.  This is because a
>>       CB_LAYOUTRECALL may be received.  The client should not delay
>>       returning the layout so as to use that higher-bandwidth path,
>>       so it is best if it assumes, in limiting the amount of data to
>>       be written, that the write bandwidth is only what is available
>>       without the layout, and that it uses this bandwidth assumption
>>       even if it does happen to have a layout.
>>
>> This differs from the text in RFC5661 in a few respects.
>>
>> 	First, it says that the amount of dirty data should be the same
>> 	when you have the layout and when you don't, rather than simply
>> 	saying it should be small when you have the layout, possibly
>> 	implying that it should be smaller than when you don't have a
>> 	layout.
>>
>> 	Second, the text now in RFC5661 strongly implies that when you
>> 	get a CB_LAYOUTRECALL, you would normally start new IO's, rather
>> 	than simply drain the pending IO's and return the layout ASAP.
>>
>> So I don't agree that what is in RFC5661 is good implementation
>> advice, particularly in suggesting that clients should delay the
>> LAYOUTRETURN while doing a bunch of IO, including starting new IO's.
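
(For what it's worth, the practical upshot of the wording above is a 
sizing rule along these lines; the numbers and names are invented 
purely for illustration:)

    # Size the write-behind cache from the bandwidth available WITHOUT a
    # layout (the MDS path), and use that bound whether or not a layout
    # happens to be held at the moment.
    MDS_WRITE_BANDWIDTH    = 100 * 1024 * 1024    # bytes/sec via the MDS (assumed)
    LAYOUT_WRITE_BANDWIDTH = 1024 * 1024 * 1024   # bytes/sec via the layout (assumed)
    ACCEPTABLE_FLUSH_TIME  = 5.0                  # seconds we will tolerate flushing

    # Per the proposed wording: bound from the non-layout path.
    max_dirty_bytes = MDS_WRITE_BANDWIDTH * ACCEPTABLE_FLUSH_TIME       # ~500 MB

    # The reading of the current RFC5661 text being objected to would
    # instead allow something like:
    #   max_dirty_bytes = LAYOUT_WRITE_BANDWIDTH * ACCEPTABLE_FLUSH_TIME  # ~5 GB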
>>
>>
>> -----Original Message-----
>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of
>> Spencer Shepler
>> Sent: Monday, October 25, 2010 10:07 PM
>> To: Noveck, David; nfsv4@ietf.org
>> Subject: Re: [nfsv4] Write-behind caching
>>
>>
>> Since this description is part of the general pNFS description, the
>> intent may have been to cover a variety of layout types.  However, I
>> agree that the client is not guaranteed access to the layout and is
>> fully capable of writing the data via the MDS if all else fails
>> (inability to obtain the layout after a return); it may not be the
>> most performant path but it should be functional.  And maybe that is
>> the source of the statement that the client should take care in
>> managing its dirty pages given the lack of guarantee of access to the
>> supposed higher-throughput path for writing data.
>>
>> As implementation guidance it seems okay, but it is not truly a
>> requirement for correct function.
>>
>> Spencer
>>
>>> -----Original Message-----
>>> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of
>>> david.noveck@emc.com
>>> Sent: Monday, October 25, 2010 6:58 PM
>>> To: nfsv4@ietf.org
>>> Subject: [nfsv4] Write-behind caching
>>>
>>> The following statement appears at the bottom of page 292 of RFC5661.
>>>
>>>     However, write-behind caching may negatively
>>>     affect the latency in returning a layout in response to a
>>>     CB_LAYOUTRECALL; this is similar to file delegations and the impact
>>>     that file data caching has on DELEGRETURN.  Client implementations
>>>     SHOULD limit the amount of unwritten data they have outstanding at
>>>     any one time in order to prevent excessively long responses to
>>>     CB_LAYOUTRECALL.
>>>
>>> This does not seem to make sense to me.
>>>
>>> First of all, the analogy between DELEGRETURN and
>>> CB_LAYOUTRECALL/LAYOUTRETURN doesn't seem to me to be correct.  In
>>> the case of DELEGRETURN, at least if the file in question has been
>>> closed during the pendency of the delegation, you do need to write
>>> all of the dirty data associated with those previously open files.
>>> Normally, clients just write all dirty data.
>>>
>>> LAYOUTRETURN does not have that sort of requirement.  If it is valid
>>> to hold the dirty data when you do have the layout, it is just as
>>> valid to hold it when you don't.  You could very well return the
>>> layout and get it again before some of those dirty blocks are
>>> written.  Having a layout grants you the right to do IO using a
>>> particular means (different based on the mapping type), but if you
>>> don't have the layout, you still have a way to do the writeback, and
>>> there is no particular need to write back all the data before
>>> returning the layout.  As mentioned above, you may well get the
>>> layout again before there is any need to actually do the write-back.
>>>
>>> You have to wait until IO's that are in flight are completed before
>>> you return the layout.  However, I don't see why you would have to
>>> or want to start new IO's using the layout if you have received a
>>> CB_LAYOUTRECALL.
>>> Am I missing something?  Is there some valid reason for this
>>> statement?  Or should this be dealt with via the errata mechanism?
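
(The behaviour being described, in sketch form: wait only for what is 
already in flight, don't start new I/O through the layout, and return 
promptly.  The names are invented for illustration.)

    def handle_layoutrecall_drain_only(client, recall):
        layout = client.layouts[recall.file_handle]
        client.stop_new_io_via(layout)         # no new I/O on the recalled layout
        client.wait_for_inflight_io(layout)    # only what is already outstanding
        client.send_layoutreturn(layout)       # answer the recall promptly
        # Remaining dirty pages are flushed later, through the MDS or
        # through a fresh layout if LAYOUTGET succeeds again first.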
>>>
>>> What do existing clients actually do with pending writeback data
>>> when they get a CB_LAYOUTRECALL?  Do they start new IO's using the
>>> layout?  If so, is there any reason other than the paragraph above?