Re: [nfsv4] Write-behind caching

Trond Myklebust <trond.myklebust@fys.uio.no> Fri, 29 October 2010 22:01 UTC

From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: sfaibish <sfaibish@emc.com>
Date: Fri, 29 Oct 2010 18:03:43 -0400
Message-ID: <1288389823.3701.59.camel@heimdal.trondhjem.org>
Cc: bhalevy@panasas.com, nfsv4@ietf.org
Subject: Re: [nfsv4] Write-behind caching

On Fri, 2010-10-29 at 17:48 -0400, Trond Myklebust wrote:
> On Fri, 2010-10-29 at 17:32 -0400, sfaibish wrote:
> > On Fri, 29 Oct 2010 13:39:55 -0400, Trond Myklebust  
> > <trond.myklebust@fys.uio.no> wrote:
> > 
> > > On Fri, 2010-10-29 at 13:20 -0400, david.noveck@emc.com wrote:
> > >> There are two issues here with regard to handling of layout recall.
> > >>
> > >> One is with regard to in-flight IO.  As Benny points out, you cannot be  
> > >> sure that the in-flight IO can be completed in time to avoid the MDS  
> > >> losing patience.  That should rarely be the case though, if things are  
> > >> working right.  The client has to be prepared to deal with IO failures  
> > >> due to layout revocation.  Any IO that was in flight and failed because
> > >> of layout revocation will need to be reissued to the MDS.  Is there
> > >> anybody who disagrees with that?
> > >>
> > >> The second issue concerns IO not in flight (in other words, not IOs
> > >> yet but potential IOs) when the recall is received.  I just don't see
> > >> that it is reasonable to start IOs using layout segments that are being
> > >> recalled (whether for dirty buffers or anything else).  Doing IOs to the
> > >> MDS is fine, but there is no real need for the layout recall to
> > >> specifically trigger them, whether clora_changed is set or not.
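The client-side behaviour described in the two points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any real client's implementation: `Segment`, `ClientLayoutState`, `route()` and `on_io_error()` are hypothetical names, and the "DS"/"MDS" strings merely stand for "send via the data server using the layout" versus "send via the metadata server". Note that the recall only marks the segment unusable; it does not itself trigger any writeback.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    # Hypothetical byte-range layout segment covering [offset, offset + length).
    offset: int
    length: int

    def covers(self, offset: int) -> bool:
        return self.offset <= offset < self.offset + self.length


@dataclass
class ClientLayoutState:
    # Hypothetical bookkeeping: segments held, and those under recall.
    segments: list = field(default_factory=list)
    recalled: set = field(default_factory=set)

    def on_recall(self, seg: Segment) -> None:
        # A recalled segment must not be used to start new I/O.
        # The recall itself does not flush dirty data.
        self.recalled.add(seg)

    def route(self, offset: int) -> str:
        """Choose a path for an I/O that has not been issued yet."""
        for seg in self.segments:
            if seg.covers(offset) and seg not in self.recalled:
                return "DS"
        # No usable (non-recalled) segment covers this offset.
        return "MDS"

    def on_io_error(self, offset: int, layout_revoked: bool) -> str:
        """Decide where to reissue an in-flight I/O that failed."""
        if layout_revoked:
            # Failed because the layout was revoked: reissue to the MDS.
            return "MDS"
        # Other failures are simply re-routed as if newly issued.
        return self.route(offset)
```

For example, after `on_recall(seg)` any pending write that falls inside `seg` is routed to the MDS, while I/O already in flight is left to complete or fail on its own.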
> > >
> > > This should be _very_ rare. Any cases where two clients are trying to do
> > > conflicting I/O on the same data is likely to be either a violation of
> > > the NFS cache consistency rules, or a scenario where it is in any case
> > > more efficient to go through the MDS (e.g. writing to adjacent records
> > > that share the same extent).
> > Well, this is a different discussion: what was the reason for the recall
> > in the first place? This is one use case, but there could be other use
> > cases for the recall, and here we are discussing how to implement the
> > protocol more than how to solve a real problem. My 2c.
> 
> I strongly disagree. If this is an unrealistic scenario, then we don't
> have to care about devising an optimal strategy for it. The 'there could
> be other use cases' scenario needs to be fleshed out before we can deal
> with it.

To clarify a bit what I mean: we MUST devise optimal strategies for
realistic and useful scenarios. It is entirely OPTIONAL to devise
optimal strategies for unrealistic ones.

If writing back all data before returning the layout causes protocol
issues because the server cannot distinguish between a bad client and
one that is waiting for I/O to complete, then my argument is that we're
in the second case: we don't have to optimise for it, and so it is safe
for the server to assume 'bad client'...
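The server-side policy argued for above can be sketched as a simple deadline check. This is a hypothetical sketch, not a description of any real server: `RecallTracker`, `grace_seconds` and the method names are invented for illustration, and time is passed in explicitly so the logic is easy to follow. The point it shows is that the server does not try to distinguish a stuck client from one still flushing data; any client that has not returned the layout by the deadline is treated as a bad client.

```python
from dataclasses import dataclass, field


@dataclass
class RecallTracker:
    # Hypothetical server-side bookkeeping; grace_seconds is an assumed policy knob.
    grace_seconds: float
    outstanding: dict = field(default_factory=dict)  # client_id -> recall deadline

    def recall_sent(self, client_id: str, now: float) -> None:
        # Start the clock when the layout recall is sent.
        self.outstanding[client_id] = now + self.grace_seconds

    def layout_returned(self, client_id: str) -> None:
        # The client returned the layout in time; nothing to revoke.
        self.outstanding.pop(client_id, None)

    def clients_to_revoke(self, now: float) -> list:
        # No attempt is made to tell a misbehaving client from one still
        # waiting for I/O to complete: past the deadline means 'bad client'.
        expired = [cid for cid, deadline in self.outstanding.items()
                   if now >= deadline]
        for cid in expired:
            del self.outstanding[cid]
        return sorted(expired)
```

A client that wants to avoid revocation therefore has to return the layout promptly and push any remaining dirty data through the MDS, rather than holding the layout until its writeback finishes.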

   Trond