[nfsv4] Write-behind caching

<david.noveck@emc.com> Tue, 26 October 2010 01:57 UTC

The following statement appears at the bottom of page 292 of RFC 5661:

   However, write-behind caching may negatively
   affect the latency in returning a layout in response to a
   CB_LAYOUTRECALL; this is similar to file delegations and the impact
   that file data caching has on DELEGRETURN.  Client implementations
   SHOULD limit the amount of unwritten data they have outstanding at
   any one time in order to prevent excessively long responses to
   CB_LAYOUTRECALL.

This does not seem to make sense to me.

First of all, the analogy between DELEGRETURN and
CB_LAYOUTRECALL/LAYOUTRETURN doesn't seem correct to me.  In the case of
DELEGRETURN, at least if the file in question was closed while the
delegation was held, you do need to write all of the dirty data
associated with that file before returning the delegation.  Normally,
clients just write all dirty data.

LAYOUTRETURN does not have that sort of requirement.  If it is valid to
hold the dirty data when you do have the layout, it is just as valid to
hold it when you don't.  You could very well return the layout and get
it again before some of those dirty blocks are written.  Having a layout
grants you the right to do I/O using a particular means (which differs
with the mapping type), but if you don't have the layout, you still have
a way to do the write-back, and there is no particular need to write
back all the data before returning the layout.  As mentioned above, you
may well get the layout again before there is any need to actually do
the write-back.
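
To make the point above concrete, here is a minimal sketch in Python.
It is a toy model only: every name in it (ToyClient, write_cached, and
so on) is hypothetical and is not taken from RFC 5661 or from any real
client.  It assumes that dirty data lives in the cache independently of
the layout and that the write-back path is chosen only when a write is
actually issued.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClient:
    """Toy model of a pNFS client cache; all names are hypothetical."""
    has_layout: bool = False
    dirty: dict = field(default_factory=dict)  # offset -> bytes not yet written
    log: list = field(default_factory=list)    # (path, offset) per write issued

    def write_cached(self, offset, data):
        # Write-behind: only the cache is updated, no I/O is issued yet.
        self.dirty[offset] = data

    def layout_return(self):
        # Returning the layout leaves the dirty data untouched.
        self.has_layout = False

    def layout_get(self):
        self.has_layout = True

    def writeback(self):
        # The path is picked when the write is issued: via the layout if one
        # is held, otherwise through the metadata server with plain WRITEs.
        path = "layout" if self.has_layout else "mds"
        for offset in sorted(self.dirty):
            self.log.append((path, offset))
        self.dirty.clear()


client = ToyClient(has_layout=True)
client.write_cached(0, b"a")
client.write_cached(4096, b"b")
client.layout_return()                    # dirty data is still cached
pending_after_return = len(client.dirty)
client.layout_get()                       # layout re-acquired before write-back
client.writeback()
```

In this model, returning the layout never forces a flush; the two
cached blocks simply go out over whichever path is available when
write-back eventually runs.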

You do have to wait until I/Os that are in flight have completed before
you return the layout.  However, I don't see why you would have to, or
want to, start new I/Os using the layout once you have received a
CB_LAYOUTRECALL.
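
The recall handling I have in mind could be sketched as follows, again
in Python and again as a toy model with purely hypothetical names
(RecallModel, start_layout_io, and so on), not a description of any
existing client.  It assumes the client tracks in-flight layout I/O
separately from dirty data that has not been issued yet.

```python
class RecallModel:
    """Toy model of CB_LAYOUTRECALL handling; all names are hypothetical."""

    def __init__(self):
        self.has_layout = True
        self.recalled = False
        self.in_flight = set()  # blocks already issued via the layout
        self.dirty = set()      # cached blocks not yet issued
        self.events = []

    def start_layout_io(self, block):
        # New layout I/O is refused once a recall has been received.
        if self.recalled or not self.has_layout:
            return False
        self.dirty.discard(block)
        self.in_flight.add(block)
        return True

    def complete_io(self, block):
        self.in_flight.discard(block)
        self._maybe_return()

    def on_layout_recall(self):
        self.recalled = True
        self._maybe_return()

    def _maybe_return(self):
        # LAYOUTRETURN waits only for in-flight I/O, not for dirty data.
        if self.recalled and self.has_layout and not self.in_flight:
            self.has_layout = False
            self.events.append("LAYOUTRETURN")


m = RecallModel()
m.dirty.update({1, 2, 3})
m.start_layout_io(1)
m.on_layout_recall()            # block 1 is still in flight: no return yet
returned_early = bool(m.events)
started = m.start_layout_io(2)  # refused because of the recall
m.complete_io(1)                # last in-flight I/O done: layout is returned
```

With this approach the time to respond to the recall is bounded by the
I/O already in flight, not by the amount of unwritten data in the cache;
blocks 2 and 3 stay dirty after the layout has been returned.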

Am I missing something?  Is there some valid reason for this statement?
Or should this be dealt with via the errata mechanism?

What do existing clients actually do with pending write-back data when
they get a CB_LAYOUTRECALL?  Do they start new I/Os using the layout?
If so, is there any reason for that other than the paragraph above?