[nfsv4] LAYOUTCOMMIT discussion

<Noveck_David@emc.com> Tue, 27 July 2010 19:30 UTC

Date: Tue, 27 Jul 2010 15:30:06 -0400
Message-ID: <BF3BB6D12298F54B89C8DCC1E4073D8001F6CA5F@CORPUSMX50A.corp.emc.com>
From: Noveck_David@emc.com
To: nfsv4@ietf.org
Subject: [nfsv4] LAYOUTCOMMIT discussion

I'd like to make sure we pursue this and come to some sort of conclusion
or at least make progress toward a solution.

One problem with the LAYOUTCOMMIT description is that the text is very
block-oriented.  That's a result of the fact that 80% of the
functionality is appropriate to block layouts, but it needs to explain
what is and what isn't done in the case of file layouts.  A particular
example is loca_last_write_offset.  This is required for block layouts
because you may write one block of a file when only one byte is part of
the file.  But, for file layouts, the DS knows the last offset, and
allowing you to specify a possibly invalid one is not appropriate.
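
To make the block case concrete, here is a minimal sketch of why the
storage extent and the last file offset differ.  The 4096-byte block
size and both function names are assumptions made up for illustration;
they are not taken from any layout specification.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_write_extent(offset, length, block_size=BLOCK_SIZE):
    """Return (first_byte, last_byte) of the block-aligned region that
    has to be written to storage to cover [offset, offset + length)."""
    start = (offset // block_size) * block_size
    # Round the end of the write up to the next block boundary.
    end = -(-(offset + length) // block_size) * block_size
    return start, end - 1


def last_write_offset(offset, length):
    """Offset of the last byte of real file data in the write, i.e. the
    kind of value a block client would report in loca_last_write_offset."""
    return offset + length - 1
```

For a one-byte write at offset 0, the storage sees bytes 0 through 4095
written, but only offset 0 is file data, so the server cannot infer the
file size from the blocks alone.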

But as for the issue of when and whether to use it, the troubling part
is the last paragraph of section 13.10, quoted below.

   The NFSv4.1 protocol only provides close-to-open file data cache
   semantics;

This is NOT what the v4.1 protocol provides.  It is up to the client to
determine whether it caches data, and many clients don't, for good
reasons.  As far as the protocol goes, leaving pNFS aside, if you do a
write the change attribute will change, as it is supposed to.

   meaning that when the file is closed, all modified data is
   written to the server.  When a subsequent OPEN of the file is done,
   the change attribute is inspected for a difference from a cached
   value for the change attribute.  For the case above, this means that
   a LAYOUTCOMMIT will be done at close (along with the data WRITEs) and
   will update the file's size and change attribute.  Access from
   another client after that point will result in the appropriate size
   being returned.

"a LAYOUTCOMMIT 'will be done' at close".  Not MUST or SHOULD.  And the
text suggests that the client adds this op as opposed to it being part
of CLOSE, but it isn't absolutely clear about that.

The reason that it is better for this to be an automatic part of CLOSE is
that there are going to be instances in which the file is closed when no
CLOSE is done (e.g. lease expiration), and the server had better do a
LAYOUTCOMMIT at that point or everything is a mess.

So here is my proposal for what should be written in this place:

    Many uses of the NFSv4.1 protocol are based on close-to-open
    caching semantics and it is a requirement that the modify time
    is updated frequently enough to support this.  In the case of
    the file layout type, when there have been writes done through
    a file layout on a file being closed, the client SHOULD do a 
    LAYOUTCOMMIT after completion of all the writes and before the
    CLOSE.  This will ensure that the change attribute and file size
    are updated appropriately so that access from another client
    after that point will result in the appropriate attributes
    being returned.
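
The ordering proposed in the paragraph above can be sketched as
follows.  SketchClient and its methods are hypothetical stand-ins, not
a real client API; the class only records the operations it would send
so that the ordering is visible.

```python
class SketchClient:
    """Hypothetical stand-in for an NFSv4.1 client using a file layout.
    It records the operations it would send instead of sending them."""

    def __init__(self):
        self.sent = []                 # operations in the order issued
        self.dirty_via_layout = False  # any writes done through a layout?

    def write_via_layout(self, offset, data):
        # WRITE goes to the data server through the file layout.
        self.sent.append(("WRITE", offset, len(data)))
        self.dirty_via_layout = True

    def close(self):
        # Proposed behavior: LAYOUTCOMMIT after all writes, before CLOSE.
        if self.dirty_via_layout:
            self.sent.append(("LAYOUTCOMMIT",))
            self.dirty_via_layout = False
        self.sent.append(("CLOSE",))
```

A client that wrote nothing through the layout sends only CLOSE; one
that did write sends WRITE, LAYOUTCOMMIT, CLOSE, in that order.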

    When a file open for write is closed as a result of a lease
    expiration, and the client has a layout for that file, the server
    must perform the equivalent of a LAYOUTCOMMIT before closing the
    file, typically by fetching current attributes from the DS, to
    ensure that the file's attributes are properly updated.  When a
    lease expiration involves multiple files open for write for which
    there are associated layouts, the server is free to fetch
    attributes for multiple files from a DS in a single control
    protocol request.
    
    When a lease has expired but the close does not occur immediately,
    the server still SHOULD perform the equivalent of a LAYOUTCOMMIT
    at the point of lease expiration.

    In many applications, there are files that essentially are never
    closed.  In order to provide more meaningful attribute values in
    such environments, the server SHOULD ensure that the propagation
    of attributes (size, change, etc.) is done at least every lease
    period.  Because attribute updates for many files from a single DS
    could be gathered in a single request, the resources needed to do
    this are not very large.
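
As a rough illustration of the batching argument, the server-side
grouping could look like the sketch below.  The function name and the
(file_id, ds_id) input shape are assumptions for illustration, and the
sketch simplifies by placing each file on a single DS; the control
protocol itself is not specified here.

```python
from collections import defaultdict


def batch_attr_fetches(open_files):
    """Group files by data server so that the attributes for all files
    on one DS can be requested in a single control-protocol request.

    open_files is an iterable of (file_id, ds_id) pairs (hypothetical).
    Returns a dict mapping each ds_id to the list of its file_ids."""
    by_ds = defaultdict(list)
    for file_id, ds_id in open_files:
        by_ds[ds_id].append(file_id)
    return dict(by_ds)
```

With three files spread over two data servers, the server issues two
requests per lease period rather than three, and the saving grows with
the number of files per DS.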

 Comments?