RE: [nfsv4] Block Layout and CB_SIZECHANGED

Trond Myklebust <trond.myklebust@fys.uio.no> Fri, 14 July 2006 16:12 UTC

Subject: RE: [nfsv4] Block Layout and CB_SIZECHANGED
From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: "Noveck, Dave" <Dave.Noveck@netapp.com>
In-Reply-To: <C98692FD98048C41885E0B0FACD9DFB8023DF68A@exnane01.hq.netapp.com>
References: <C98692FD98048C41885E0B0FACD9DFB8023DF68A@exnane01.hq.netapp.com>
Content-Type: text/plain
Date: Fri, 14 Jul 2006 12:11:49 -0400
Message-Id: <1152893509.5729.12.camel@lade.trondhjem.org>
Cc: Black_David@emc.com, nfsv4@ietf.org

On Fri, 2006-07-14 at 11:52 -0400, Noveck, Dave wrote:
> > forcing the application above NFS to fflush() or the
> > equivalent (to force an earlier LAYOUT COMMIT)
> 
> If he doesn't do a flush, then the data can be in the
> buffer cache and in that case the data will reappear 
> after the truncate, in NFS as well as pNFS.  So the 
> client has to do something to force his writes to be
> committed at least as far as necessary to ensure that
> they don't happen again.  Given that he has to do 
> something, what is the difficulty with saying that 
> something has to include LAYOUTCOMMIT as well as 
> WRITE and COMMIT?
> 
> By the way, I think I was wrong about the unstable write case.
> While it is true that if I do a COMMIT after truncate 
> no additional data will be written, if I do an unstable
> write and do not COMMIT, old-style NFS is just as 
> exposed to this issue.  This is because after the 
> unstable writes and the truncate the server may reboot,
> in which case I am going to have my COMMIT fail and I 
> am going to redo my writes, extending the file.
> 
> So I think the proper distinction here is between writes
> that others may see and those that others may but don't
> have to see.  It makes sense that writes in that latter 
> state, whether due to not doing a COMMIT or not doing a 
> LAYOUTCOMMIT are inherently subject to appearing after
> a truncate.  This means that the rule is that if you 
> want to make sure that they are included as subject to
> the truncate you have to convert them from possibly-
> visible-by-others to really-done-and-I-mean-it-and-
> others-must-be-able-to-see-them status. 

Agreed! ...and as far as a POSIX client is concerned, the operations
that guarantee visibility of the writes on the disk/server should be
well defined. I'm not sure this is an exhaustive list, but it should be
close:

  write(O_SYNC)/write(O_DSYNC)/write(O_DIRECT)
  fcntl(F_SETLK)/fcntl(F_SETLKW)
  fsync()/msync(MS_SYNC)
  close()

In addition, it is common NFS client practice to flush writes on
truncate() and/or fstat().

In all those cases, the client must do a LAYOUTCOMMIT, and (assuming I
didn't miss something above) that should suffice to deal with the
situations where the application is using some sneaky out-of-band
communication.
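
To make the rule concrete, here is a rough toy model in Python. It is
only a sketch under stated assumptions: the class and method names
(ToyMetadataServer, ToyPnfsClient, run_scenario) are hypothetical, the
"server" tracks nothing but the file size visible to others, and
LAYOUTCOMMIT is assumed to extend the file to the last written offset.
It replays the 32k-64k write followed by a truncate to 4k, with and
without a flush at one of the visibility points listed above.

```python
from dataclasses import dataclass


@dataclass
class ToyMetadataServer:
    """Hypothetical stand-in for the metadata server (size only)."""
    size: int = 0

    def layoutcommit(self, last_write_end: int) -> None:
        # Assumption: committing the layout makes the written range
        # visible and extends the file if it lies past the current EOF.
        self.size = max(self.size, last_write_end)

    def setattr_size(self, new_size: int) -> None:
        self.size = new_size


@dataclass
class ToyPnfsClient:
    """Hypothetical client that tracks data awaiting LAYOUTCOMMIT."""
    server: ToyMetadataServer
    pending_end: int = 0  # end of uncommitted data, 0 means none

    def write(self, offset: int, length: int) -> None:
        # Data goes to the data server; metadata is not yet updated.
        self.pending_end = max(self.pending_end, offset + length)

    def _layoutcommit_if_needed(self) -> None:
        if self.pending_end:
            self.server.layoutcommit(self.pending_end)
            self.pending_end = 0

    # Visibility points: each one forces a LAYOUTCOMMIT first.
    def fsync(self) -> None:
        self._layoutcommit_if_needed()

    def close(self) -> None:
        self._layoutcommit_if_needed()

    def truncate(self, size: int) -> None:
        self._layoutcommit_if_needed()
        self.server.setattr_size(size)


def run_scenario(flush_before_truncate: bool) -> int:
    """Return the final file size seen by the metadata server."""
    server = ToyMetadataServer()
    writer = ToyPnfsClient(server)
    other = ToyPnfsClient(server)

    writer.write(32 * 1024, 32 * 1024)   # extent from 32k to 64k
    if flush_before_truncate:
        writer.fsync()                   # LAYOUTCOMMIT happens here
    other.truncate(4 * 1024)             # SETATTR from the other client
    writer.close()                       # late LAYOUTCOMMIT, if any
    return server.size


if __name__ == "__main__":
    print(run_scenario(True))
    print(run_scenario(False))
```

In this model, flushing before the truncate leaves the file at 4k,
while skipping the flush lets the delayed LAYOUTCOMMIT re-extend it to
64k, which is the "writes reappear" behaviour discussed in this thread.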

The "broken client" scenario need not be fixed in the protocol.

Cheers,
  Trond

> -----Original Message-----
> From: Black_David@emc.com [mailto:Black_David@emc.com] 
> Sent: Friday, July 14, 2006 11:22 AM
> To: trond.myklebust@fys.uio.no
> Cc: nfsv4@ietf.org
> Subject: RE: [nfsv4] Block Layout and CB_SIZECHANGED
> 
> 
> > I'd argue that until you commit the layout, you are still in the
> > situation where the data has not been written. You have not done the
> > equivalent of a full NFSv4.0 unstable WRITE since a successful
> > unstable write must update both the data _and_ the metadata in the
> > server's cache.
> > IOW the point at which the written data becomes visible to others is
> > what matters, and that means after LAYOUTCOMMIT.
> 
> And if NFS were the only possible communication channel, I might agree,
> but going back to my scenario (and inserting a couple of instances of
> "[layout]" for clarification):
> 
> 1) pNFS client takes out an extent from 32k to 64k, and writes data.
> 	It marks the written area as needing to be [layout] COMMIT-ed,
> 	but doesn't do the [layout] COMMIT.
> 2) Some other client uses SETATTR to truncate the file to be 4k in
> 	size.
> 
> Suppose that the clients are in cahoots - there was an out-of-band
> communication between them, and the SETATTR was supposed to throw
> away the first client's writes (and some other data).  Having it
> reappear because pNFS did something strange (first client does the
> delayed LAYOUTCOMMIT after the SETATTR) would be peculiar, and
> to my mind, forcing the application above NFS to fflush() or the
> equivalent (to force an earlier LAYOUT COMMIT) before the out-of-band
> communication is tantamount to admitting that there is a problem here
> but we're going to force applications to fix it.  This is an NFS vs.
> pNFS behavior difference that I'd prefer to eliminate.
> 
> Thanks,
> --David
> ----------------------------------------------------
> David L. Black, Senior Technologist
> EMC Corporation, 176 South St., Hopkinton, MA  01748
> +1 (508) 293-7953             FAX: +1 (508) 293-7786
> black_david@emc.com        Mobile: +1 (978) 394-7754
> ----------------------------------------------------
> 
> > > But David is not talking about cached writes but writes done to 
> > > the data server which have not been LAYOUTCOMMITed.  There is no 
> > > non-pnfs equivalent of that.
> > > 
> > > The closest I can come is unstable writes done to the server
> > > which have not been COMMITed.  In this case a truncate is effective
> > > without locking.  You do the COMMIT and the file is not extended.
> > > 
> > > How you judge this case depends on what analogies you make.  Is
> > > writing to the data server more like putting things in your cache
> > > or is it more like doing an unstable write?  I'd argue that the latter
> > > is a more appropriate analogy.
> > 
> > I'd argue that until you commit the layout, you are still in the
> > situation where the data has not been written. You have not done the
> > equivalent of a full NFSv4.0 unstable WRITE since a successful
> > unstable write must update both the data _and_ the metadata in the
> > server's cache.
> > IOW the point at which the written data becomes visible to others is
> > what matters, and that means after LAYOUTCOMMIT.
> > 
> > Cheers,
> >   Trond
> 
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www1.ietf.org/mailman/listinfo/nfsv4

