Re: [nfsv4] Write-behind caching

<david.noveck@emc.com> Tue, 26 October 2010 10:33 UTC

From: david.noveck@emc.com
To: sshepler@microsoft.com, nfsv4@ietf.org
Date: Tue, 26 Oct 2010 06:34:05 -0400
Subject: Re: [nfsv4] Write-behind caching
Message-ID: <BF3BB6D12298F54B89C8DCC1E4073D80028C76EA@CORPUSMX50A.corp.emc.com>
In-Reply-To: <E043D9D8EE3B5743B8B174A814FD584F0D498E1D@TK5EX14MBXC126.redmond.corp.microsoft.com>
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Archive: <http://www.ietf.org/mail-archive/web/nfsv4>

That makes sense.  Let me take on this issue with regard to the file
layout.  Are there volunteers to address it with regard to block and
object?  It would be great if we could get together in Beijing, discuss
this, and come to a joint conclusion to present to the working group
(via email, I mean).  I'm not planning to try to do this before the
working group meeting.  In any case, I'm pretty sure there won't be any
time during the working group meeting.

-----Original Message-----
From: Spencer Shepler [mailto:sshepler@microsoft.com] 
Sent: Monday, October 25, 2010 11:34 PM
To: Noveck, David; nfsv4@ietf.org
Subject: RE: [nfsv4] Write-behind caching


Fair enough.  I haven't looked to see if the layout types
address this specific, needed behavior.  Obviously the
statement you reference and the individual layout descriptions
should be tied together.  Again, I don't remember, but there
may be layout-specific steps needed in the case of handling
LAYOUTRETURNs.

In any case, we can handle the eventual conclusion as an erratum.

Spencer


> -----Original Message-----
> From: david.noveck@emc.com [mailto:david.noveck@emc.com]
> Sent: Monday, October 25, 2010 8:25 PM
> To: Spencer Shepler; nfsv4@ietf.org
> Subject: RE: [nfsv4] Write-behind caching
> 
> I agree that the intent was to cover a variety of layout types.
> 
> I think what you are saying about the issue of different throughputs
> for having and not having layouts also makes sense.  It may in some
> way have led to the statement in RFC5661, but those statements are by
> no means the same.  They have different consequences.  I take it that
> you are saying (correctly) something like:
> 
>      However, write-behind implementations will generally need to
>      bound the amount of unwritten data so that, given the bandwidth
>      of the output path, the data can be written in a reasonable
>      time.  Clients which have layouts should avoid keeping larger
>      amounts to reflect a situation in which a layout provides a
>      write path of higher bandwidth.  This is because a
>      CB_LAYOUTRECALL may be received.  The client should not delay
>      returning the layout so as to use that higher-bandwidth path,
>      so it is best if it assumes, in limiting the amount of data to
>      be written, that the write bandwidth is only what is available
>      without the layout, and that it uses this bandwidth assumption
>      even if it does happen to have a layout.
> 
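
[To make that bound concrete -- a minimal sketch in C, with made-up
numbers; the MDS bandwidth figure and the flush-time target are
assumptions for illustration, not anything specified by RFC5661:]

    /* Cap write-behind dirty data using the bandwidth available
     * WITHOUT a layout (i.e., writing through the MDS), so that
     * holding a layout never tempts the client to accumulate more
     * than it could flush promptly after a CB_LAYOUTRECALL. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t dirty_limit_bytes(uint64_t mds_bw_bytes_per_sec,
                                      uint64_t max_flush_secs)
    {
        /* Deliberately ignore any higher-bandwidth path a layout
         * may provide; use the layout-less path's bandwidth. */
        return mds_bw_bytes_per_sec * max_flush_secs;
    }

    int main(void)
    {
        uint64_t mds_bw = 100u * 1024 * 1024;  /* assume 100 MiB/s via MDS */
        uint64_t limit = dirty_limit_bytes(mds_bw, 2);  /* 2 s flush target */
        printf("dirty-data cap: %llu bytes\n",
               (unsigned long long)limit);
        return 0;
    }
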
> This differs from the text in RFC5661 in a few respects.
> 
> 	First, it says that the amount of dirty data should be the same
> 	when you have the layout and when you don't, rather than simply
> 	saying it should be small when you have the layout, possibly
> 	implying that it should be smaller than when you don't have a
> 	layout.
> 
> 	Second, the text now in RFC5661 strongly implies that when you
> 	get a CB_LAYOUTRECALL, you would normally start new IO's,
> 	rather than simply drain the pending IO's and return the layout
> 	ASAP.
> 
> So I don't agree that what is in RFC5661 is good implementation
> advice, particularly in suggesting that clients should delay the
> LAYOUTRETURN while doing a bunch of IO, including starting new IO's.
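
[To make the contrast concrete -- a rough sketch in C of the recall
handling being advocated here: drain only what is in flight, start
nothing new, and return promptly.  The types and helpers are invented
for illustration and do not reflect any real client's internals:]

    /* Hypothetical recall handling: on CB_LAYOUTRECALL, stop issuing
     * new I/O with the layout, wait out what is already in flight,
     * and send LAYOUTRETURN.  Dirty data stays cached and is written
     * later via the MDS (or a re-acquired layout). */
    #include <stdbool.h>
    #include <stdio.h>

    struct layout {
        int  ios_in_flight;  /* I/Os already issued using the layout */
        bool recalled;       /* CB_LAYOUTRECALL has been received */
    };

    static bool may_start_layout_io(const struct layout *lo)
    {
        return !lo->recalled;  /* otherwise fall back to the MDS path */
    }

    static void handle_layoutrecall(struct layout *lo)
    {
        lo->recalled = true;           /* no new I/O with this layout */
        while (lo->ios_in_flight > 0)  /* drain in-flight I/O only */
            lo->ios_in_flight--;       /* stand-in for completion waits */
        printf("LAYOUTRETURN sent; dirty data remains cached\n");
    }

    int main(void)
    {
        struct layout lo = { .ios_in_flight = 3, .recalled = false };
        handle_layoutrecall(&lo);
        printf("new layout I/O allowed: %s\n",
               may_start_layout_io(&lo) ? "yes" : "no");
        return 0;
    }
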
> 
> 
> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of
> Spencer Shepler
> Sent: Monday, October 25, 2010 10:07 PM
> To: Noveck, David; nfsv4@ietf.org
> Subject: Re: [nfsv4] Write-behind caching
> 
> 
> Since this description is part of the general pNFS description, the
> intent may have been to cover a variety of layout types.  However, I
> agree that the client is not guaranteed access to the layout and is
> fully capable of writing the data via the MDS if all else fails
> (inability to obtain the layout after a return); it may not be the
> most performant path, but it should be functional.  And maybe that
> is the source of the statement that the client should take care in
> managing its dirty pages, given the lack of guarantee of access to
> the supposed higher-throughput path for writing data.
> 
> As implementation guidance it seems okay, but it is not truly a
> requirement for correct function.
> 
> Spencer
> 
> > -----Original Message-----
> > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of
> > david.noveck@emc.com
> > Sent: Monday, October 25, 2010 6:58 PM
> > To: nfsv4@ietf.org
> > Subject: [nfsv4] Write-behind caching
> >
> > The following statement appears at the bottom of page 292 of RFC5661.
> >
> >    However, write-behind caching may negatively
> >    affect the latency in returning a layout in response to a
> >    CB_LAYOUTRECALL; this is similar to file delegations and the
> >    impact that file data caching has on DELEGRETURN.  Client
> >    implementations SHOULD limit the amount of unwritten data they
> >    have outstanding at any one time in order to prevent
> >    excessively long responses to CB_LAYOUTRECALL.
> >
> > This does not seem to make sense to me.
> >
> > First of all, the analogy between DELEGRETURN and
> > CB_LAYOUTRECALL/LAYOUTRETURN doesn't seem to me to be correct.  In
> > the case of DELEGRETURN, at least if the file in question has been
> > closed during the pendency of the delegation, you do need to write
> > all of the dirty data associated with those previously open files.
> > Normally, clients just write all dirty data.
> >
> > LAYOUTRETURN does not have that sort of requirement.  If it is
> > valid to hold the dirty data when you do have the layout, it is
> > just as valid to hold it when you don't.  You could very well
> > return the layout and get it again before some of those dirty
> > blocks are written.  Having a layout grants you the right to do IO
> > using a particular means (different based on the mapping type), but
> > if you don't have the layout, you still have a way to do the
> > writeback, and there is no particular need to write back all the
> > data before returning the layout.  As mentioned above, you may well
> > get the layout again before there is any need to actually do the
> > write-back.
> >
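
[A rough sketch in C of the contrast being drawn above; the types and
helpers are invented for illustration and do not reflect any real
client.  DELEGRETURN for a closed file obliges the client to write the
dirty data first; LAYOUTRETURN carries no such obligation, since dirty
data can always be written later via the MDS or a re-acquired layout:]

    #include <stdbool.h>
    #include <stdio.h>

    static void flush_dirty_data(const char *when)
    {
        printf("flushing dirty data %s\n", when);
    }

    static void delegreturn(bool file_closed)
    {
        if (file_closed)
            flush_dirty_data("before DELEGRETURN");  /* required */
        printf("DELEGRETURN\n");
    }

    static void layoutreturn(void)
    {
        /* No flush required: dirty data stays cached and is written
         * on the normal write-behind schedule, layout or no layout. */
        printf("LAYOUTRETURN (dirty data may remain cached)\n");
    }

    int main(void)
    {
        delegreturn(true);
        layoutreturn();
        return 0;
    }
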
> > You have to wait until IO's that are in flight are completed
> > before you return the layout.  However, I don't see why you would
> > have to or want to start new IO's using the layout if you have
> > received a CB_LAYOUTRECALL.
> >
> > Am I missing something?  Is there some valid reason for this
> > statement?  Or should this be dealt with via the errata mechanism?
> >
> > What do existing clients actually do with pending writeback data
> > when they get a CB_LAYOUTRECALL?  Do they start new IO's using the
> > layout?  If so, is there any reason other than the paragraph
> > above?
> > _______________________________________________
> > nfsv4 mailing list
> > nfsv4@ietf.org
> > https://www.ietf.org/mailman/listinfo/nfsv4
> 
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www.ietf.org/mailman/listinfo/nfsv4
>