Re: [nfsv4] List of possible work items for NFSv4.2
"William A. (Andy) Adamson" <androsadamson@gmail.com> Thu, 13 August 2009 17:51 UTC
In-Reply-To: <fe7adea4b3fba5af3e063472b7048e41.squirrel@webmail.eisler.com>
References: <fe7adea4b3fba5af3e063472b7048e41.squirrel@webmail.eisler.com>
Date: Thu, 13 Aug 2009 13:51:00 -0400
Message-ID: <89c397150908131051r610dad9uca34cc33db4e80bb@mail.gmail.com>
From: "William A. (Andy) Adamson" <androsadamson@gmail.com>
To: Mike Eisler <mre-ietf@eisler.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Cc: nfsv4@ietf.org
Subject: Re: [nfsv4] List of possible work items for NFSv4.2
On Wed, Aug 12, 2009 at 11:41 PM, Mike Eisler <mre-ietf@eisler.com> wrote:
>
> At the Stockholm IETF meeting, I left with an action item to list
> and summarize possible work items for NFSv4.2 and provide this
> within two weeks of the meeting.
>
> Please read this list, and discuss it, and where possible indicate
> your willingness to:
>
> - contribute to NFSv4.2
>
> - review NFSv4.2 proposals

yes.

>
> - implement an NFSv4.2 client and/or server that supports one or
>   more of these ideas.

The striped meta-data sounds interesting.

>
> And of course, suggest items to add to the list, and items to
> remove from the list.
>
> Work items apparently requiring a new minor version of NFSv4
> ============================================================
>
> Peer-to-Peer NFS
> ----------------
>
> See
> http://tools.ietf.org/id/draft-myklebust-nfsv4-pnfs-backend-00.txt .
>
> The proposal involves allowing pNFS clients to become data servers
> for other pNFS clients, thus offloading the primary storage array.
> As presented at the Stockholm IETF meeting it appears to be limited
> to read-only workloads based on whole-file caching. Feedback in
> Stockholm was that read/write workloads should be considered and
> that perhaps something along the lines of
> http://tools.ietf.org/html/draft-eisler-nfsv4-pnfs-dedupe-00
> could be combined with the peer-to-peer NFS proposal to offer
> sub-file caching. Note that the dedupe proposal is easily extended
> to provide sub-file caching. Regardless, the concept was well
> received.
>
> Given industry trends toward flash memory, and that the economics
> of flash memory are best realized if the flash is close to the
> application, Peer-to-Peer NFS with sub-file caching is very timely.
>
> Copy
> ----
>
> See http://tools.ietf.org/html/draft-lentini-nfsv4-server-side-copy-03 .
>
> The proposal supports intra-NFS server and inter-NFS server file
> copy.
> This is not a new idea for the WG; it was mentioned in the BOF that
> predated the formation of the WG over 10 years ago. The idea has
> never had merit because there have been no APIs on existing NFS
> clients to use it. This is starting to change. Hypervisors are
> starting to have facilities (e.g. VMware vSphere system, see
> http://www.vmware.com/files/pdf/key_features_vsphere.pdf)
> that require the ability to copy storage between storage devices.
> In addition, there is a proposal to add a reflink() (see
> http://lwn.net/Articles/331576/ ) system call that enables
> applications to leverage a file system's zero copy cloning
> capability.
>
> There seems to be little controversy about adopting file copy as a
> work item.
>
> Note that as proposed, file copy requires a revision to RPCSEC_GSS
> to enable a concept called "privileges". See
> http://tools.ietf.org/html/draft-williams-rpcsecgssv3-00 .
> While the original motivation for RPCSEC_GSSv3 was to support
> Mandatory Access Control, the privileges concept has proven to be
> very useful to express the type of delegated security inter-server
> file copy requires.
>
> Hole Punching
> -------------
>
> Regular files can contain holes: byte ranges that are zero filled
> but do not contain allocated space. Holes are useful for saving
> space.
>
> Hole punching is the act of zeroing out a byte range of a file, and
> de-allocating the storage for that byte range. This is easy to do
> in NFSv4 today if hole punching from the end of the file downward:
> invoke SETATTR to reduce the file's length, then invoke SETATTR to
> restore the file to its original size.
>
> Hole punching has been proposed for NFS in the past. In the 1980s a
> draft proposal for NFSv3 included it. However, as with file copy,
> the lack of APIs caused the idea to die.
>
> What has changed is the existence of hypervisors. A hypervisor
> virtualizes the storage of guest operating systems.
> The guest thinks it is dealing with a physical storage device and
> sends commands to deallocate storage. The hypervisor intercepts
> these requests. If using network storage, then the hypervisor needs
> support in the storage access protocol to deallocate blocks of
> storage. I.e., it needs hole punching support.
>
> Hole punching has had little (if not zero) discussion on the NFSv4
> mailing list, but it would be inconsistent to adopt file copy
> without hole punching.
>
> MAC Security Attribute
> ----------------------
>
> See http://tools.ietf.org/html/draft-quigley-nfsv4-sec-label-00 .
>
> The attribute is necessary to support Mandatory Access Control.
>
> This work has been presented several times at IETF and has been
> discussed several times on the mailing list. One issue is that
> there is no consensus on defining REQUIRED
> Domain-Of-Interpretations (DOIs) to ensure interoperability, nor is
> there consensus that such a thing is required. There is arguably a
> precedent: the mimetype attribute. The NFSv4.1 protocol does not
> explicitly define the content of this attribute, relying on the
> standards bodies that control mime types to define the content.
> Mime types are similar to DOIs in that the action taken by the NFS
> client when reading the type is not explained by the NFS protocol.
> Nor are mandatory mime types defined. The difference is that DOIs
> potentially have explicit requirements on the NFSv4 server.
>
> The idea of associating MAC with an IANA-registered named attribute
> has been suggested in the past, but according to the I-D rejected
> because:
>
> "[named attributes] lack a way to atomically set the attribute on
> creation. In addition, named attributes themselves are file system
> objects which need to be assigned a security attribute."
>
> These are certainly issues.
> While we could imagine changing the NFSv4.x protocol to allow named
> attributes to be created atomically, more thought would have to be
> put into what it means to set what is effectively a permission
> attribute on the permission attribute. Assigning special semantics
> to one particular named attribute seems to be what a RECOMMENDED or
> REQUIRED attribute is designed to do.
>
> It is clear (as evidenced by the energy behind SELinux) that on the
> NFS client-side, especially Linux, there is strong desire to
> support MAC. The same level of desire does not appear to exist on
> the NFS server-side, with several storage vendors at IETF meetings
> indicating that in the absence of customer demand, they would not
> be likely to support the feature.
>
> Traffic Classification
> ----------------------
> Near the end of the feature freeze for NFSv4.1, a proposal was made
> to specify priority channels. This was very controversial and
> quickly withdrawn. Nonetheless, the demand for classifying or
> tagging streams of traffic never goes away.
>
> End-to-End Data Integrity
> -------------------------
> At various times during the NFSv4.1 standards process, the topic of
> defining a data integrity checksum that would be kept in the
> storage device and provided to the client when it read the data was
> discussed. The motivation was to protect data from silent
> corruption as it left the storage media on read, or was sent from
> the client on write.
>
> Various issues were raised:
> - the method by which this checksum was provided, as explicit
>   NFSv4.x operations or via RPCSEC_GSS, was controversial
>
> - additional performance impact, especially if the client or server
>   was already using integrity or privacy in RPCSEC_GSS (i.e. why
>   calculate two different checksums)
>
> - no support for operations other than READ and WRITE. I.e.
>   metadata was not protected.
>
> - how should mismatches between the alignment of transfer size of
>   the client's I/O versus the server's on-media checksum be
>   handled?
>
> - controversy as to whether this was a significant problem
>
> The last issue is the key one. Without consensus that there is a
> problem to solve, this work item won't go forward.
>
> Umask Attribute
> ---------------
> See http://www.ietf.org/proceedings/74/slides/nfsv4-3.pdf
>
> The proposal is to include a umask attribute that would be provided
> with the OPEN operation during file creation. This is not an
> attribute that would be stored in the file but instead would allow
> the NFSv4 client to indicate to the NFSv4 server what umask to
> apply to the file when combining the mode and/or acl attributes in
> the OPEN arguments.
>
> The proposal goes on to say that if there is a default ACL on the
> file's directory, the server can ignore the umask. What is not
> explained is what problem this solves, since the client could
> combine the umask and mode on its side, and send the OPEN with a
> mode attribute reflecting the combination of umask and the mode
> argument to the open() system call.
>
> The proposal does say that if there is a default ACL on the file's
> parent directory, the server can ignore the umask. Apparently the
> purpose here is to emulate a UNIX semantic that says that the mode
> should be used as is when there is a default ACL (but then how is
> the mode combined with any corresponding user, group, and other
> ACEs in the default ACL?).
>
> More discussion is likely required for this proposed item.
>
> Shutdown Callback
> -----------------
> See http://www.ietf.org/proceedings/74/slides/nfsv4-3.pdf
>
> The proposal is that the server will send a callback in preparation
> for a planned shutdown.
>
> The client can then react as needed: inform user, unmount NFS file
> systems etc.
>
> One reaction not mentioned is that the client could commit modified
> data to the server.
>
> This functionality replaces the "rwall" ONC RPC service.
>
> Readahead Hint
> --------------
> See http://www.ietf.org/proceedings/74/slides/nfsv4-3.pdf
>
> Today NFS servers use heuristics to determine if a sequential read
> pattern exists, and if so, they will schedule reads from their
> storage devices in anticipation that by the time the client sends a
> READ, the data will be in the server's cache. This has drawbacks:
>
> - With pNFS, a given storage device has difficulty detecting a read
>   pattern, since the next logical block might be on the next
>   device.
>
> - NFS clients often have parallel threads issuing read requests.
>   The pattern of READs as received by the server is not sequential.
>
> - Detecting readahead requires a set of READs.
>
> - For small files, the set of READs needed might exceed the length
>   of the file
>
> - The heuristics on the server can produce false positives.
>
> It appears the proposal would consist of a new operation that would
> be like READ, but would not return data. Possible return values
> might be:
>
> - request ignored (server is too loaded)
>
> - range is already in cache
>
> - request in progress
>
> pNFS Connectivity/Access Indication
> -----------------------------------
> See http://www.ietf.org/proceedings/75/slides/nfsv4-0.pdf, slides 112-121.
>
> The issue is that a pNFS client might not be able to reach a
> storage device identified in a layout, due to a misconfiguration in
> the network or on the pNFS server. Ease-of-use considerations
> motivate a way for the pNFS client to communicate the problem to
> the MDS.
>
> This communication could be in the form of an extension to
> LAYOUT_RETURN, or a new operation.
>
> There seemed to be consensus at the Stockholm meeting that we want
> to solve this.
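
A minimal sketch of the first readahead drawback listed above. It
assumes round-robin striping and that each storage device sees the
READs at their file offsets; the stripe unit, device count, and the
naive "next READ starts where the last one ended" detector are all
hypothetical and are not taken from any draft.

```python
def is_sequential(offsets, block_size):
    """Naive heuristic: every READ starts where the previous one ended."""
    return all(b - a == block_size for a, b in zip(offsets, offsets[1:]))


def reads_seen_by_device(file_offsets, block_size, num_devices, device):
    """File offsets of the READs that land on one device (round-robin)."""
    return [off for off in file_offsets
            if (off // block_size) % num_devices == device]


BLOCK = 4096      # hypothetical stripe unit and READ size
DEVICES = 4       # hypothetical number of storage devices

# The client reads the file strictly sequentially.
client_reads = [i * BLOCK for i in range(16)]
device0_reads = reads_seen_by_device(client_reads, BLOCK, DEVICES, 0)

print(is_sequential(client_reads, BLOCK))    # whole-file view
print(is_sequential(device0_reads, BLOCK))   # view from device 0 only
```

Under these assumptions the whole-file stream passes the detector
while the per-device stream does not, which is the gap an explicit
hint from the client would close.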
>
> Better Negotiation of Session Reply Cache Sizes
> -----------------------------------------------
> After the WG meeting in Stockholm there was discussion around how
> to enable a replier on a session to pre-allocate the necessary
> space needed for the reply cache without over provisioning. One
> idea discussed was to add an operation that limits the set of
> operations that can be used on the session. For example, a client
> might create a session used only for operations with results that
> are never cached, such as READ, READDIR, and another session used
> only for operations that are invariably cached, such as WRITE,
> RENAME, REMOVE, etc. One problem with this approach is that the
> operation would be sent after the session was created, making it
> too late for the server to pre-allocate the optimal size for its
> reply cache.
>
> Work items apparently not requiring a new minor version of NFSv4
> ================================================================
>
> Metadata Striping
> -----------------
>
> See http://www.ietf.org/proceedings/73/slides/nfsv4-3.pdf .
>
> The proposal is to extend pNFS via a new layout type to support
> distribution of metadata in a pNFS server. A second type of MDS,
> the lMDS, is described. A pNFS client would be directed to an lMDS
> via a layout returned by LAYOUTGET on the new layout type. As
> proposed, only a new layout type is needed.
>
> The proposal has had little discussion on the mailing list, other
> than to clarify some points. At the Minneapolis IETF meeting, it
> was noted that the registered algorithms used for distributing
> metadata by file name needed to be small in number if pNFS clients
> were going to successfully interoperate with any pNFS server.
>
> De-Dupe Awareness and Sub-File Caching
> --------------------------------------
>
> See http://www.ietf.org/proceedings/73/slides/nfsv4-3.pdf .
>
> The proposal is that NFS servers that support space efficiency
> (i.e.
> data that is the same between two files is stored once), provide
> the space efficiency maps to the NFS client. The maps are encoded
> as bit maps, each bit corresponding to a fixed sized block of a
> file.
>
> The proposal does not require a new minor version of NFS, but
> instead requires 64 new pNFS layout types.
>
> The proposal can be extended to support sub-file caching, whether
> the file has de-duplication or not, and is a candidate for marrying
> with the peer-to-peer NFS proposal.
>
> At the San Francisco and Minneapolis IETF meetings, the feedback on
> the proposals has been that block sizes and alignments that are
> powers of 2 don't match up with all forms of de-duplication and
> major use cases of caching. For example suppose file 1 is 9111
> bytes long and file 2 is 1000 bytes long. At offset 111, the next
> 1000 bytes are equal to all of file 2. File 1 and file 2 are
> de-duplicated in some storage devices. A major use case of caching
> that is not covered by the proposal might be an HPC application
> that has records that are each aligned on 64 bit boundaries but
> with lengths that are not powers of 2, e.g. the record lengths
> might be 108 bytes each (a multiple of 64 bits).
>
> It seems obvious how the proposal could address the HPC caching use
> case; simply relax the requirement that block sizes be powers of 2.
> More thought will be needed to address the unaligned de-duplication
> use case, at least in its most general forms.
>
>
> --
> Mike Eisler, Senior Technical Director, NetApp, 719 599 9026,
> http://blogs.netapp.com/eislers_nfs_blog/
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www.ietf.org/mailman/listinfo/nfsv4
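
A rough sketch of the unaligned de-duplication example quoted above
(file 1 is 9111 bytes, file 2 is 1000 bytes and appears in file 1 at
offset 111). The file contents are seeded pseudo-random filler, and
the 512-byte block size and SHA-256 block comparison are assumptions
made only for illustration; the proposal itself specifies neither.

```python
import hashlib
import random


def block_digests(data, block_size):
    """Digest of every full, aligned block of `data`."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data) - block_size + 1, block_size)]


rng = random.Random(0)   # fixed seed so the run is repeatable


def filler(n):
    """n bytes of pseudo-random filler standing in for file contents."""
    return bytes(rng.getrandbits(8) for _ in range(n))


file2 = filler(1000)
file1 = filler(111) + file2 + filler(9111 - 111 - 1000)

BLOCK = 512   # hypothetical power-of-2 block size
shared_blocks = (set(block_digests(file1, BLOCK))
                 & set(block_digests(file2, BLOCK)))

print(len(file1), len(file2))   # sizes from the example
print(file1.find(file2))        # byte offset where file 2 occurs in file 1
print(len(shared_blocks))       # aligned blocks the two files have in common
```

With this filler, a byte-granular search locates file 2 inside
file 1, while a map restricted to aligned fixed-size blocks finds
nothing in common, because the duplicate starts 111 bytes into a
block.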