RE: [nfsv4] Block Layout and CB_SIZECHANGED

On Fri, 2006-07-14 at 11:52 -0400, Noveck, Dave wrote:
> > forcing the application above NFS to fflush() or the
> > equivalent (to force an earlier LAYOUT COMMIT)
> 
> If he doesn't do a flush, then the data can be in the
> buffer cache and in that case the data will reappear 
> after the truncate, in NFS as well as pNFS.  So the 
> client has to do something to force his writes to be
> committed at least as far as necessary to ensure that
> they don't happen again.  Given that he has to do 
> something, what is the difficulty with saying that 
> something has to include LAYOUTCOMMIT as well as 
> WRITE and COMMIT?
> 
> By the way, I think I was wrong about the unstable write case.
> While it is true that if I do a COMMIT after truncate 
> no additional data will be written, if I do an unstable
> write and do not COMMIT, old-style NFS is just as 
> exposed to this issue.  This is because after the 
> unstable writes and the truncate the server may reboot,
> in which case I am going to have my COMMIT fail and I 
> am going to redo my writes, extending the file.
> 
> So I think the proper distinction here is between writes
> that others may see and those that others may but don't
> have to see.  It makes sense that writes in that latter 
> state, whether due to not doing a COMMIT or not doing a 
> LAYOUTCOMMIT are inherently subject to appearing after
> a truncate.  This means that the rule is that if you 
> want to make sure that they are included as subject to
> the truncate you have to convert them from possibly-
> visible-by-others to really-done-and-I-mean-it-and-
> others-must-be-able-to-see-them status. 

Agreed! ...and as far as a POSIX client is concerned, the operations
that guarantee visibility of the writes on the disk/server should be
well defined. I'm not sure this is an exhaustive list, but it should be
close:

  write(O_SYNC)/write(O_DSYNC)/write(O_DIRECT)
  fcntl(F_SETLK)/fcntl(F_SETLKW)
  fsync()/msync(MS_SYNC)
  close()
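
As a rough illustration of what "flush before you tell anyone" looks
like from the application side, here is a minimal Python sketch. The
helper name write_and_flush() and the file name are made up for the
example, and the claim that fsync() triggers a LAYOUTCOMMIT on a pNFS
mount is the assumption under discussion in this thread, not something
the code can verify.

```python
import os
import tempfile


def write_and_flush(path, data):
    """Write data to path and call fsync() before returning.

    Assumption: on an NFS/pNFS mount, fsync() is one of the points at
    which the client pushes its writes out (including any LAYOUTCOMMIT
    it needs). On a local filesystem it just flushes to stable storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.dat")
        size = write_and_flush(path, b"x" * 4096)
        # Only at this point would the application tell its peer, out of
        # band, that the data is there and may be truncated away.
        return size
```

The point of the ordering is that the out-of-band message is sent only
after fsync() has returned.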

In addition, it is common NFS client practice to flush writes on
truncate() and/or fstat().

In all those cases, the client must do a LAYOUTCOMMIT, and (assuming I
didn't miss something above) that should suffice to deal with any
situation where the application is using some sneaky out-of-band
communication.
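
To make the ordering concrete, here is a toy Python model of the
truncate scenario quoted below. Everything in it is hypothetical:
ToyMetadataServer and its methods are invented for illustration, the
initial file size of 32k is an assumption, and the model ignores layout
recall, CB_SIZECHANGED and everything else a real server would do. It
only shows why a LAYOUTCOMMIT that arrives after the SETATTR makes the
truncated data reappear, while one that arrives before it does not.

```python
K = 1024


class ToyMetadataServer:
    """Tracks nothing but the file size seen by other clients."""

    def __init__(self, size=0):
        self.size = size

    def setattr_size(self, new_size):
        # Stand-in for a SETATTR that truncates (or extends) the file.
        self.size = new_size

    def layoutcommit(self, last_byte_written):
        # Stand-in for LAYOUTCOMMIT: the client reports the last byte it
        # wrote, and the file grows if that lies beyond the current size.
        self.size = max(self.size, last_byte_written + 1)


def commit_after_truncate():
    mds = ToyMetadataServer(size=32 * K)
    last_byte = 64 * K - 1      # client 1 wrote the 32k..64k extent
    mds.setattr_size(4 * K)     # client 2 truncates first
    mds.layoutcommit(last_byte) # delayed LAYOUTCOMMIT re-extends the file
    return mds.size


def commit_before_truncate():
    mds = ToyMetadataServer(size=32 * K)
    last_byte = 64 * K - 1
    mds.layoutcommit(last_byte) # client 1 commits on flush
    mds.setattr_size(4 * K)     # the truncate now covers those writes
    return mds.size
```

In this model commit_after_truncate() leaves the file at 64k and
commit_before_truncate() leaves it at 4k.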

The "broken client" scenario need not be fixed in the protocol.

Cheers,
  Trond

> -----Original Message-----
> From: Black_David@emc.com [mailto:Black_David@emc.com] 
> Sent: Friday, July 14, 2006 11:22 AM
> To: trond.myklebust@fys.uio.no
> Cc: nfsv4@ietf.org
> Subject: RE: [nfsv4] Block Layout and CB_SIZECHANGED
> 
> 
> > I'd argue that until you commit the layout, you are still in the
> > situation where the data has not been written. You have not done the
> > equivalent of a full NFSv4.0 unstable WRITE since a successful
> > unstable write must update both the data _and_ the metadata in the
> > server's cache.
> > IOW the point at which the written data becomes visible to others is
> > what matters, and that means after LAYOUTCOMMIT.
> 
> And if NFS were the only possible communication channel, I might agree,
> but going back to my scenario (and inserting a couple of instances of
> "[layout]" for clarification):
> 
> 1) pNFS client takes out an extent from 32k to 64k, and writes data.
> 	It marks the written area as needing to be [layout] COMMIT-ed, but
> 	doesn't do the [layout] COMMIT.
> 2) Some other client uses SETATTR to truncate the file to be 4k in size.
> 
> Suppose that the clients are in cahoots - there was an out-of-band
> communication between them, and the SETATTR was supposed to throw
> away the first client's writes (and some other data).  Having it
> reappear because pNFS did something strange (first client does the
> delayed LAYOUTCOMMIT after the SETATTR) would be peculiar, and
> to my mind, forcing the application above NFS to fflush() or the
> equivalent (to force an earlier LAYOUT COMMIT) before the out-of-band
> communication is tantamount to admitting that there is a problem here
> but we're going to force applications to fix it.  This is an NFS vs.
> pNFS behavior difference that I'd prefer to eliminate.
> 
> Thanks,
> --David
> ----------------------------------------------------
> David L. Black, Senior Technologist
> EMC Corporation, 176 South St., Hopkinton, MA  01748
> +1 (508) 293-7953             FAX: +1 (508) 293-7786
> black_david@emc.com        Mobile: +1 (978) 394-7754
> ----------------------------------------------------
> 
> > > But David is not talking about cached writes but writes done to 
> > > the data server which have not been LAYOUTCOMMITed.  There is no 
> > > non-pnfs equivalent of that.
> > > 
> > > The closest I can come is unstable writes done to the server
> > > which have not been COMMITed.  In this case a truncate is effective
> > > without locking.  You do the COMMIT and the file is not extended.
> > > 
> > > How you judge this case depends on what analogies you make.  Is
> > > writing to the data server more like putting things in your cache
> > > or is it more like doing an unstable write?  I'd argue that the latter
> > > is a more appropriate analogy.
> > 
> > I'd argue that until you commit the layout, you are still in the
> > situation where the data has not been written. You have not done the
> > equivalent of a full NFSv4.0 unstable WRITE since a successful
> > unstable write must update both the data _and_ the metadata in the
> > server's cache.
> > IOW the point at which the written data becomes visible to others is
> > what matters, and that means after LAYOUTCOMMIT.
> > 
> > Cheers,
> >   Trond
> 
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www1.ietf.org/mailman/listinfo/nfsv4
