Re: [nfsv4] List of possible work items for NFSv4.2

"William A. (Andy) Adamson" <androsadamson@gmail.com> Thu, 13 August 2009 17:51 UTC


On Wed, Aug 12, 2009 at 11:41 PM, Mike Eisler<mre-ietf@eisler.com> wrote:
>
> At the Stockholm IETF meeting, I left with an
> action item to list and summarize possible work
> items for NFSv4.2 and provide this within two
> weeks of the meeting.
>
> Please read this list, and discuss it, and where
> possible indicate your willingness to:
>
> - contribute to NFSv4.2
>
> - review NFSv4.2 proposals

yes.

>
> - implement an NFSv4.2 client and/or server that
>  supports one or more of these ideas.

The striped meta-data sounds interesting.

>
> And of course, suggest items to add to the list,
> and items to remove from the list.
>
> Work items apparently requiring a new minor version of NFSv4
> ============================================================
>
> Peer-to-Peer NFS
> ----------------
>
> See
> http://tools.ietf.org/id/draft-myklebust-nfsv4-pnfs-backend-00.txt .
>
> The proposal involves allowing pNFS clients to
> become data servers for other pNFS clients thus
> offloading the primary storage array.  As
> presented at the Stockholm IETF meeting it
> appears to be limited to read-only workloads
> based on whole-file caching.  Feedback in
> Stockholm was that read/write workloads should be
> considered and that perhaps something along the
> lines of
> http://tools.ietf.org/html/draft-eisler-nfsv4-pnfs-dedupe-00
> could be combined with the peer-to-peer NFS proposal to offer
> sub-file caching. Note that the dedupe proposal
> is easily extended to provide sub-file caching.
> Regardless, the concept was well received.
>
> Given industry trends toward flash memory, and
> that the economics of flash memory are best realized
> if the flash is close to the application,
> Peer-to-Peer NFS with sub-file caching is very
> timely.
>
> Copy
> ----
>
> See http://tools.ietf.org/html/draft-lentini-nfsv4-server-side-copy-03 .
>
> The proposal supports intra-NFS server and
> inter-NFS server file copy.  This is not a new
> idea for the WG; it was mentioned in the BOF that
> predated the formation of the WG over 10 years
> ago. The idea has never had merit because there
> have been no APIs on existing NFS clients to use
> it. This is starting to change.  Hypervisors are
> starting to have facilities (e.g.  VMware vSphere
> system, see
> http://www.vmware.com/files/pdf/key_features_vsphere.pdf)
> that require the ability to copy storage between
> storage devices. In addition, there is a proposal
> to add a reflink() ( see
> http://lwn.net/Articles/331576/ ) system call
> that enables applications to leverage a file
> system's zero copy cloning capability.
>
> There seems to be little controversy about
> adopting file copy as a work item.
>
> Note that as proposed, file copy requires a
> revision to RPCSEC_GSS to enable a concept called
> "privileges".  See
> http://tools.ietf.org/html/draft-williams-rpcsecgssv3-00 .
> While the original motivation for RPCSEC_GSSv3
> was to support Mandatory Access Control, the
> privileges concept has proven to be very useful
> to express the type of delegated security
> inter-server file copy requires.
>
> Hole Punching
> -------------
>
> Regular files can contain holes: byte ranges that
> are zero filled but do not contain allocated
> space. Holes are useful for saving space.
>
> Hole Punching is the act of zeroing out a byte
> range of a file, and de-allocating the storage
> for that byte range. This is easy to do in NFSv4
> today if hole punching from the end of the file
> downward: invoke SETATTR to reduce the file's
> length, then invoke SETATTR to restore the file to
> its original size.
>
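
The shrink-then-restore trick described above can be tried against a local file. This is only a sketch: os.truncate() stands in for the two SETATTR calls, the helper names are made up for illustration, and whether the tail blocks are really de-allocated depends on the file system underneath.

```python
import os
import tempfile


def punch_tail_hole(path: str, hole_start: int) -> None:
    """Zero the range [hole_start, EOF) by shrinking and then restoring the size.

    Local stand-in for the two SETATTR calls; actual de-allocation of the
    tail blocks is up to the underlying file system.
    """
    original_size = os.path.getsize(path)
    if hole_start >= original_size:
        return  # nothing to punch
    os.truncate(path, hole_start)     # first "SETATTR": reduce the length
    os.truncate(path, original_size)  # second "SETATTR": restore the length


def demo() -> bytes:
    """Write 8 KiB of data, punch the second half, and return the file content."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"A" * 8192)
        path = f.name
    try:
        punch_tail_hole(path, 4096)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```

Note that this only works for a range that runs to the end of the file, which is the limitation that motivates a real hole punching operation.
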
> Hole punching has been proposed for NFS in the
> past. In the 1980s a draft proposal for NFSv3
> included it. However, as with file copy, the lack
> of APIs caused the idea to die.
>
> What has changed is the existence of hypervisors.
> A hypervisor virtualizes the storage of guest
> operating systems. The guest thinks it is dealing
> with a physical storage device and sends commands
> to deallocate storage. The hypervisor intercepts
> these requests. If using network storage, then
> the hypervisor needs support in the storage
> access protocol to deallocate blocks of storage.
> I.e., it needs hole punching support.
>
> Hole punching has had little (if not zero)
> discussion on the NFSv4 mailing list, but it
> would be inconsistent to adopt file copy without
> hole punching.
>
> MAC Security Attribute
> ----------------------
>
> See http://tools.ietf.org/html/draft-quigley-nfsv4-sec-label-00.
>
> The attribute is necessary to support Mandatory
> Access Control.
>
> This work has been presented several times at
> IETF and has been discussed several times on the
> mailing list. One issue is that there is no
> consensus on defining REQUIRED
> Domain-Of-Interpretations (DOIs) to ensure
> interoperability, nor is there consensus that
> such a thing is required. There is arguably a
> precedent:  the mimetype attribute. The NFSv4.1
> protocol does not explicitly define the content
> of this attribute, relying on the standards
> bodies that control mime types to define the
> content. Mime types are similar to DOIs in that the
> action taken by the NFS client when reading the
> type is not explained by the NFS protocol. Nor
> are mandatory mime types defined. The difference
> is that DOIs potentially have explicit
> requirements on the NFSv4 server.
>
> The idea of associating MAC with an
> IANA-registered named attribute has been
> suggested in the past, but according to the I-D
> rejected because:
>
> "[named attributes] lack a way to atomically set
>  the attribute on creation.  In addition, named
>  attributes themselves are file system objects
>  which need to be assigned a security attribute.
>  "
>
> These are certainly issues. While we could
> imagine changing the NFSv4.x protocol to allow
> named attributes to be created atomically, more
> thought would have to be put into what it means
> to set what is effectively a permission attribute
> on the permission attribute. Assigning special
> semantics to one particular named attribute seems
> to be what a RECOMMENDED or REQUIRED attribute
> is designed to do.
>
> It is clear (as evidenced by the energy behind
> SELinux) that on the NFS client-side, especially
> Linux, there is strong desire to support MAC. The
> same level of desire does not appear to exist on
> the NFS server-side, with several storage vendors
> at IETF meetings indicating that in the absence of
> customer demand, they would not be likely to
> support the feature.
>
> Traffic Classification
> ----------------------
> Near the end of the feature freeze for NFSv4.1, a
> proposal was made to specify priority channels.
> This was very controversial and quickly
> withdrawn. Nonetheless, the demand for
> classifying or tagging streams of traffic never
> goes away.
>
> End-to-End Data Integrity
> -------------------------
> At various times during the NFSv4.1 standards
> process, the topic of defining a data integrity
> checksum that would be kept in the storage device
> and provided to the client when it read the data
> was discussed. The motivation was to protect data
> from silent corruption as it left the storage
> media on read, or was sent from the client on
> write.
>
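
To make the idea concrete, here is a rough sketch of a checksum that is stored alongside the data and handed back on read so the client can verify it. Everything here is an assumption for illustration: the thread does not pick an algorithm (CRC-32 is just a convenient stand-in), and ChecksummedStore and verified_read are hypothetical names, not protocol elements.

```python
import zlib


class ChecksummedStore:
    """Toy storage device that keeps a checksum next to each block."""

    def __init__(self) -> None:
        self._blocks = {}  # offset -> (data, checksum)

    def write(self, offset: int, data: bytes, client_crc: int) -> None:
        # Reject data that was corrupted between the client and the device.
        if zlib.crc32(data) != client_crc:
            raise ValueError("checksum mismatch on write")
        self._blocks[offset] = (data, client_crc)

    def read(self, offset: int):
        # Return the data together with the stored checksum.
        return self._blocks[offset]


def verified_read(store: ChecksummedStore, offset: int) -> bytes:
    """Client-side read that recomputes the checksum over what it received."""
    data, stored_crc = store.read(offset)
    if zlib.crc32(data) != stored_crc:
        raise IOError("silent corruption detected on read")
    return data
```

The sketch covers only READ and WRITE payloads, which is exactly one of the objections listed below.
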
> Various issues were raised:
> - the method by which this checksum was provided,
>  as explicit NFSv4.x operations or via
>  RPCSEC_GSS was controversial
>
> - additional performance impact, especially if
>  the client or server was already using
>  integrity or privacy in RPCSEC_GSS (i.e. why
>  calculate two different checksums)
>
> - no support for operations other than READ and WRITE. I.e. metadata
>  was not protected.
>
> - how should mismatches between the alignment or
>  transfer size of the client's I/O versus the
>  server's on-media checksum be handled?
>
> - controversy as to whether this was a significant problem
>
> The last issue is the key one. Without consensus
> that there is a problem to solve, this work item
> won't go forward.
>
> Umask Attribute
> ---------------
> See http://www.ietf.org/proceedings/74/slides/nfsv4-3.pdf
>
> The proposal is to include a umask attribute that
> would be provided with the OPEN operation during
> file creation.  This is not an attribute that
> would be stored in the file but instead would
> allow the NFSv4 client to indicate to the NFSv4
> server what umask to apply to the file when combining
> the mode and/or acl attributes in the OPEN
> arguments.
>
> The proposal goes on to say that if there is a
> default ACL on the file's directory, the server
> can ignore the umask.  What is not explained is
> what problem this solves, since the client could
> combine the umask and mode on its side, and send
> the OPEN with a mode attribute reflecting the
> combination of umask and the mode argument to
> the open() system call.
>
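
For reference, the client-side combination mentioned above is the usual mask operation. A minimal sketch (effective_mode is a made-up helper name, not part of any proposal):

```python
def effective_mode(requested_mode: int, umask: int) -> int:
    """Clear the permission bits named in the umask from the requested mode."""
    return requested_mode & ~umask & 0o7777
```

For example, open() with mode 0666 under a umask of 022 would yield 0644, and the client could send that value as the mode attribute in OPEN.
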
> The proposal does say that if there is a default
> ACL on the file's parent directory, the server
> can ignore the umask. Apparently the purpose here
> is to emulate a UNIX semantic that says that the
> mode should be used as is when there is a default
> ACL (but then how is the mode combined with any
> corresponding user, group, and other ACEs in the
> default ACL?).
>
> More discussion is likely required for this
> proposed item.
>
> Shutdown Callback
> -----------------
> See http://www.ietf.org/proceedings/74/slides/nfsv4-3.pdf
>
> The proposal is that the server will send a
> callback in preparation for a planned shutdown.
>
> The client can then react as needed: inform the user,
> unmount NFS file systems, etc.
>
> One reaction not mentioned is that the client
> could commit modified data to the server.
>
> This functionality replaces the "rwall" ONC RPC service.
>
> Readahead Hint
> --------------
> See http://www.ietf.org/proceedings/74/slides/nfsv4-3.pdf
>
> Today NFS servers use heuristics to determine if
> a sequential read pattern exists, and if so, they
> will schedule reads from their storage devices in
> anticipation that by the time the client sends a
> READ, the data will be in the server's cache.
> This has drawbacks:
>
> - With pNFS, a given storage device has
>  difficulty detecting a read pattern, since the
>  next logical block might be on the next
>  device.
>
> - NFS clients often have parallel threads issuing
>  read requests. The pattern of READs as received
>  by the server is not sequential.
>
> - Detecting readahead requires a set of READs.
>
> - For small files, the set of READs needed might
>  exceed the length of the file
>
> - The heuristics on the server can produce false
>  positives.
>
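
A toy version of such a heuristic shows two of the drawbacks listed above. This is a sketch under assumptions: the detector, its threshold of three consecutive in-order READs, and the 4096-byte request size are all invented for illustration and do not describe any particular server.

```python
class SequentialDetector:
    """Flag a sequential pattern after `threshold` back-to-back in-order READs."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.next_offset = None  # offset expected if the stream is sequential
        self.run_length = 0

    def observe(self, offset: int, length: int) -> bool:
        if offset == self.next_offset:
            self.run_length += 1
        else:
            self.run_length = 0  # pattern broken, start over
        self.next_offset = offset + length
        return self.run_length >= self.threshold


def triggers_readahead(offsets, length: int = 4096, threshold: int = 3) -> bool:
    """Feed a sequence of READ offsets to the detector; True if it ever fires."""
    detector = SequentialDetector(threshold)
    return any([detector.observe(off, length) for off in offsets])
```

In-order offsets 0, 4096, 8192, 12288 trip the detector on the fourth READ. The same file read by two threads whose requests arrive swapped pairwise never does, and neither does a file that only needs two READs.
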
> It appears the proposal would consist of a new
> operation that would be like READ, but would not
> return data. Possible return values might be:
>
> - request ignored (server is too loaded)
>
> - range is already in cache
>
> - request in progress
>
> pNFS Connectivity/Access Indication
> -----------------------------------
> See http://www.ietf.org/proceedings/75/slides/nfsv4-0.pdf, slides 112-121.
>
> The issue is that a pNFS client might not be able to
> reach a storage device identified in a layout,
> due to a misconfiguration in the network or on
> the pNFS server. Ease-of-use considerations
> motivate a way for the pNFS client to communicate
> the problem to the MDS.
>
> This communication could be in the form of an
> extension to LAYOUTRETURN, or a new operation.
>
> There seemed to be consensus at the Stockholm
> meeting that we want to solve this.
>
> Better Negotiation of Session Reply Cache Sizes
> -----------------------------------------------
> After the WG meeting in Stockholm there was
> discussion around how to enable a replier on a
> session to pre-allocate the necessary space
> needed for the reply cache without
> over-provisioning. One idea discussed was to add an
> operation that limits the set of operations that
> can be used on the session. For example, a client
> might create a session used only for operations
> with results that are never cached, such as READ,
> READDIR, and another session used only for
> operations that are invariably cached, such as
> WRITE, RENAME, REMOVE, etc. One problem with this
> approach is that the operation would be sent
> after the session was created, making it too late
> for the server to pre-allocate the optimal size
> for its reply cache.
>
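
A back-of-the-envelope sketch of why restricting the operation set helps with sizing. The numbers and names are assumptions for illustration only; the split into cached and non-cached operations just follows the examples given in the text above.

```python
# Operations the text gives as examples of results that are never cached.
NON_CACHED_OPS = {"READ", "READDIR"}


def reply_cache_bytes(allowed_ops, slots: int, max_cached_reply: int) -> int:
    """Estimate the reply cache a replier must pre-allocate for a session.

    If every allowed operation is non-cached, nothing needs to be reserved;
    otherwise assume one cached reply of maximum size per slot.
    """
    if set(allowed_ops) <= NON_CACHED_OPS:
        return 0
    return slots * max_cached_reply
```

With 16 slots and a 1024-byte maximum cached reply, a READ/READDIR-only session needs nothing reserved, while a WRITE/RENAME/REMOVE session needs 16384 bytes. The estimate is only useful if the replier knows the operation set before it allocates, which is the ordering problem noted above.
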
> Work items apparently not requiring a new minor version of NFSv4
> ================================================================
>
> Metadata Striping
> -----------------
>
> See http://www.ietf.org/proceedings/73/slides/nfsv4-3.pdf .
>
> The proposal is to extend pNFS via a new layout type
> to support distribution of metadata in a pNFS
> server. A second type of MDS, the lMDS, is
> described. A pNFS client would be directed to an
> lMDS via a layout returned by LAYOUTGET on the
> new layout type. As proposed, only a new layout
> type is needed.
>
> The proposal has had little discussion on the mailing
> list, other than to clarify some points. At the
> Minneapolis IETF meeting, it was noted that the
> registered algorithms used for distributing
> metadata by file name needed to be small in
> number if pNFS clients were going to successfully
> interoperate with any pNFS server.
>
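
To illustrate why both sides must agree on the algorithm, here is a minimal sketch of distributing metadata by file name. The actual registered algorithms are not described in this thread; CRC-32 modulo the lMDS count is only a stand-in, and pick_lmds is a hypothetical helper.

```python
import zlib


def pick_lmds(file_name: str, lmds_count: int) -> int:
    """Map a file name to an lMDS index using a hash both sides agree on."""
    if lmds_count <= 0:
        raise ValueError("need at least one lMDS")
    return zlib.crc32(file_name.encode("utf-8")) % lmds_count
```

A client that implemented a different hash would send its requests to the wrong lMDS, hence the point about keeping the set of registered algorithms small.
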
> De-Dupe Awareness and Sub-File Caching
> --------------------------------------
>
> See http://www.ietf.org/proceedings/73/slides/nfsv4-3.pdf .
>
> The proposal is that NFS servers that support
> space efficiency (i.e. data that is the same between
> two files is stored once) provide the space
> efficiency maps to the NFS client. The maps are
> encoded as bit maps, each bit corresponding to a
> fixed sized block of a file.
>
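
A sketch of how a client might turn such a bit map into byte ranges. The encoding details are assumptions (least-significant bit first within each byte, a set bit meaning "shared"); the draft may define them differently, and shared_ranges is a made-up name.

```python
def shared_ranges(bitmap: bytes, block_size: int, file_size: int):
    """Return (offset, length) byte ranges whose bit is set, merging neighbors."""
    ranges = []
    n_blocks = (file_size + block_size - 1) // block_size
    for block in range(n_blocks):
        byte_index, bit_index = divmod(block, 8)
        if byte_index >= len(bitmap):
            break  # bit map is shorter than the file; treat the rest as unset
        if not (bitmap[byte_index] >> bit_index) & 1:
            continue
        start = block * block_size
        length = min(block_size, file_size - start)  # last block may be short
        if ranges and ranges[-1][0] + ranges[-1][1] == start:
            # Extend the previous range instead of starting a new one.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
        else:
            ranges.append((start, length))
    return ranges
```

With 4096-byte blocks, a 16000-byte file, and bits 0, 2, and 3 set, this yields the ranges (0, 4096) and (8192, 7808).
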
> The proposal does not require a new minor version
> of NFS, but instead requires 64 new pNFS layout
> types.
>
> The proposal can be extended to support sub-file
> caching, whether the file has de-duplication or
> not, and is a candidate for marrying with the
> peer-to-peer NFS proposal.
>
> At the San Francisco and Minneapolis IETF
> meetings, the feedback on the proposals has been
> that block sizes and alignments that are powers
> of 2 don't match up with all forms of
> de-duplication and major use cases of caching.
> For example suppose file 1 is 9111 bytes long and
> file 2 is 1000 bytes long.  At offset 111, the
> next 1000 bytes are equal to all of file 2. File
> 1 and file 2 are de-duplicated in some storage
> devices.  A major use case of caching that is
> not covered by the proposal might be an HPC
> application that has records that are each aligned on
> 64 bit boundaries but with lengths that are not
> powers of 2, e.g. the record lengths might be 104
> bytes each (a multiple of 64 bits).
>
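
The unaligned example above can be checked with a little arithmetic. This sketch assumes that a fixed-block bit map can only mark a block as shared when the duplicated data starts on a block boundary in both files; since file 2 starts at offset 0, only the offset within file 1 matters. The helper name is made up.

```python
def usable_block_sizes(dup_offset: int, max_exponent: int = 16):
    """List the power-of-2 block sizes for which dup_offset is block aligned."""
    return [2 ** e for e in range(max_exponent + 1) if dup_offset % (2 ** e) == 0]
```

For the offset of 111 in the example, the only power-of-2 block size that lines up is 1 byte, which defeats the purpose of a block bit map. Had the duplicate started at offset 4096, every power of 2 up to 4096 would have worked.
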
> It seems obvious how the proposal could address
> the HPC caching use case; simply relax the
> requirement that block sizes be powers of 2.  More
> thought will be needed to address the unaligned
> de-duplication use case, at least in its most
> general forms.
>
>
> --
> Mike Eisler, Senior Technical Director, NetApp, 719 599 9026,
> http://blogs.netapp.com/eislers_nfs_blog/
>