RE: [nfsv4] Block Layout and CB_SIZECHANGED
Trond Myklebust <trond.myklebust@fys.uio.no> Fri, 14 July 2006 16:12 UTC
Subject: RE: [nfsv4] Block Layout and CB_SIZECHANGED
From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: "Noveck, Dave" <Dave.Noveck@netapp.com>
In-Reply-To: <C98692FD98048C41885E0B0FACD9DFB8023DF68A@exnane01.hq.netapp.com>
References: <C98692FD98048C41885E0B0FACD9DFB8023DF68A@exnane01.hq.netapp.com>
Content-Type: text/plain
Date: Fri, 14 Jul 2006 12:11:49 -0400
Message-Id: <1152893509.5729.12.camel@lade.trondhjem.org>
Cc: Black_David@emc.com, nfsv4@ietf.org
On Fri, 2006-07-14 at 11:52 -0400, Noveck, Dave wrote:
> > forcing the application above NFS to fflush() or the
> > equivalent (to force an earlier LAYOUT COMMIT)
>
> If he doesn't do a flush, then the data can be in the
> buffer cache and in that case the data will reappear
> after the truncate, in NFS as well as pNFS. So the
> client has to do something to force his writes to be
> committed at least as far as necessary to ensure that
> they don't happen again. Given that he has to do
> something, what is the difficulty with saying that
> something has to include LAYOUTCOMMIT as well as
> WRITE and COMMIT?
>
> By the way, I think I was wrong about the unstable write case.
> While it is true that if I do a COMMIT after truncate
> no additional data will be written, if I do an unstable
> write and do not COMMIT, old-style NFS is just as
> exposed to this issue. This is because after the
> unstable writes and the truncate the server may reboot,
> in which case I am going to have my COMMIT fail and I
> am going to redo my writes, extending the file.
>
> So I think the proper distinction here is between writes
> that others may see and those that others may but don't
> have to see. It makes sense that writes in that latter
> state, whether due to not doing a COMMIT or not doing a
> LAYOUTCOMMIT, are inherently subject to appearing after
> a truncate. This means that the rule is that if you
> want to make sure that they are included as subject to
> the truncate you have to convert them from possibly-
> visible-by-others to really-done-and-I-mean-it-and-
> others-must-be-able-to-see-them status.

Agreed!

...and as far as a POSIX client is concerned, the operations that
guarantee visibility of the writes on the disk/server should be well
defined.
I'm not sure this is an exhaustive list, but it should be close:

        write(O_SYNC)/write(O_DSYNC)/write(O_DIRECT)
        fcntl(F_SETLK)/fcntl(F_SETLKW)
        fsync()/msync(MS_SYNC)
        close()

In addition, it is common NFS client practice to flush writes on
truncate() and/or fstat().

In all those cases, the client must do a LAYOUTCOMMIT, and (assuming I
didn't miss something above) that should suffice to deal with all those
cases where the application is using some sneaky out-of-band
communication.
The "broken client" scenario need not be fixed in the protocol.

Cheers,
  Trond

> -----Original Message-----
> From: Black_David@emc.com [mailto:Black_David@emc.com]
> Sent: Friday, July 14, 2006 11:22 AM
> To: trond.myklebust@fys.uio.no
> Cc: nfsv4@ietf.org
> Subject: RE: [nfsv4] Block Layout and CB_SIZECHANGED
>
> > I'd argue that until you commit the layout, you are still in the
> > situation where the data has not been written. You have not done the
> > equivalent of a full NFSv4.0 unstable WRITE since a successful unstable
> > write must update both the data _and_ the metadata in the server's cache.
> > IOW the point at which the written data becomes visible to others is
> > what matters, and that means after LAYOUTCOMMIT.
>
> And if NFS were the only possible communication channel, I might agree,
> but going back to my scenario (and inserting a couple of instances of
> "[layout]" for clarification):
>
> 1) pNFS client takes out an extent from 32k to 64k, and writes data.
>    It marks the written area as needing to be [layout] COMMIT-ed, but
>    doesn't do the [layout] COMMIT.
> 2) Some other client uses SETATTR to truncate the file to be 4k in
>    size.
>
> Suppose that the clients are in cahoots - there was an out-of-band
> communication between them, and the SETATTR was supposed to throw
> away the first client's writes (and some other data).
> Having it reappear because pNFS did something strange (first client
> does the delayed LAYOUTCOMMIT after the SETATTR) would be peculiar, and
> to my mind, forcing the application above NFS to fflush() or the
> equivalent (to force an earlier LAYOUT COMMIT) before the out-of-band
> communication is tantamount to admitting that there is a problem here
> but we're going to force applications to fix it. This is an NFS vs.
> pNFS behavior difference that I'd prefer to eliminate.
>
> Thanks,
> --David
> ----------------------------------------------------
> David L. Black, Senior Technologist
> EMC Corporation, 176 South St., Hopkinton, MA 01748
> +1 (508) 293-7953 FAX: +1 (508) 293-7786
> black_david@emc.com Mobile: +1 (978) 394-7754
> ----------------------------------------------------
>
> > > But David is not talking about cached writes but writes done to
> > > the data server which have not been LAYOUTCOMMITed. There is no
> > > non-pnfs equivalent of that.
> > >
> > > The closest I can come is unstable writes done to the server
> > > which have not been COMMITed. In this case a truncate is effective
> > > without locking. You do the COMMIT and the file is not extended.
> > >
> > > How you judge this case depends on what analogies you make. Is
> > > writing to the data server more like putting things in your cache
> > > or is it more like doing an unstable write? I'd argue that the latter
> > > is a more appropriate analogy.
> >
> > I'd argue that until you commit the layout, you are still in the
> > situation where the data has not been written. You have not done the
> > equivalent of a full NFSv4.0 unstable WRITE since a successful unstable
> > write must update both the data _and_ the metadata in the server's cache.
> > IOW the point at which the written data becomes visible to others is
> > what matters, and that means after LAYOUTCOMMIT.
> >
> > Cheers,
> >   Trond
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www1.ietf.org/mailman/listinfo/nfsv4
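
For illustration only, the truncate scenario discussed in this thread can be sketched as a toy model. Everything here is an assumption made for the example: the class names, the FLUSH_POINTS set, and the simplification that a LAYOUTCOMMIT only ever raises the file size seen at the metadata server. It is not taken from any NFS client or server implementation; it just shows why a LAYOUTCOMMIT delayed past the SETATTR makes the truncated data reappear, and why committing at the POSIX flush points avoids that.

```python
from dataclasses import dataclass

# Hypothetical labels for the POSIX operations listed in the thread as
# points where written data must become visible to others.
FLUSH_POINTS = {
    "write_sync", "fcntl_setlk", "fsync", "msync_sync",
    "close", "truncate", "fstat",
}


@dataclass
class MetadataServer:
    """Toy metadata server: tracks only the file size others can see."""
    size: int = 0

    def layoutcommit(self, end_offset: int) -> None:
        # Assumption: committing a layout extends the file if needed.
        self.size = max(self.size, end_offset)

    def setattr_size(self, new_size: int) -> None:
        # Truncate (or extend) as another client would via SETATTR.
        self.size = new_size


@dataclass
class PnfsClient:
    """Toy pNFS client with written-but-uncommitted layout state."""
    mds: MetadataServer
    pending_end: int = 0  # end of data not yet LAYOUTCOMMITted

    def write_extent(self, start: int, end: int) -> None:
        # Data goes to the data server; the metadata server is not told.
        self.pending_end = max(self.pending_end, end)

    def layoutcommit(self) -> None:
        if self.pending_end:
            self.mds.layoutcommit(self.pending_end)
            self.pending_end = 0

    def posix_call(self, name: str) -> None:
        # The rule proposed in the thread: flush points force LAYOUTCOMMIT.
        if name in FLUSH_POINTS:
            self.layoutcommit()


def final_size(flush_before_truncate: bool) -> int:
    mds = MetadataServer()
    writer = PnfsClient(mds)
    writer.write_extent(32 * 1024, 64 * 1024)   # step 1 of the scenario
    if flush_before_truncate:
        writer.posix_call("fsync")              # application flushes first
    mds.setattr_size(4 * 1024)                  # step 2: other client truncates
    writer.layoutcommit()                       # delayed commit (no-op if done)
    return mds.size


if __name__ == "__main__":
    print(final_size(flush_before_truncate=False))  # file grows back to 64k
    print(final_size(flush_before_truncate=True))   # file stays at 4k
```

In this model the only difference between the two runs is whether the writer reaches a flush point before the out-of-band communication that triggers the truncate.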