Re: [nfsv4] rfc5667bis open issues

Chuck Lever <chuck.lever@oracle.com> Mon, 26 September 2016 17:28 UTC

From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <11DB3812-B605-4426-A316-176CE31910B2@oracle.com>
Date: Mon, 26 Sep 2016 13:27:24 -0400
Message-Id: <7F9A17B8-B79B-497D-9DA5-03AD6AE586CE@oracle.com>
References: <15F62327-B73F-45CF-B4A5-8535955E954F@oracle.com> <65E80EDE-6031-4A83-9B73-3A88C91F8E6A@oracle.com> <CADaq8jc50Ca6eDZ3D6zRvfG+Q2DngNN6+mN9WKXj9AS=d1iQVg@mail.gmail.com> <D0ECCDF7-F785-4419-AA93-33B2054C4737@oracle.com> <CADaq8jcSxc6BQKJ1SZ=OrpRcEGpgpfdLDcPpBp=GfGQJwkbLEw@mail.gmail.com> <11DB3812-B605-4426-A316-176CE31910B2@oracle.com>
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/wiPDriP1om8jqc9LE2HFeZCkF-k>
Cc: NFSv4 <nfsv4@ietf.org>
Subject: Re: [nfsv4] rfc5667bis open issues

> On Sep 26, 2016, at 1:20 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> 
>> On Sep 24, 2016, at 8:11 AM, David Noveck <davenoveck@gmail.com> wrote:
>> 
>>> The issue is that the language allows ERR_CHUNK to be
>>> returned after a server has processed the RPC request when 
>>> a client has not provided adequate Write list or Reply chunk
>>> resources to convey the reply.
>> 
>> I forgot that we made ERR_CHUNK ambiguous, apparently because of a reluctance to add things to the XDR. There is no pressing need to disambiguate it, since the responder knows which case applies and can manage its DRC accordingly.
>> 
>> However, in the context of Version Two it would be better if we avoided those ambiguities, given that we have lots of space for distinct error codes.
>> 
>>> In that case, it makes sense for the server to have added 
>>> the request to its DRC.
>> 
>> Agree that in the case in which the server has executed the request, it should add the request to the DRC.  In practical terms, there are not likely to be cases in which there is a non-idempotent request with a reply longer than 1K.
> 
> For NFSv3, that is largely true.
> 
> For NFSv4, I believe a large reply to a non-idempotent request is
> possible, and may even be common. Any time a client does something
> like this:
> 
>  { SEQUENCE, PUTFH, SETATTR, GETATTR }
> 
> where the GETATTR requests an ACL or security label, the reply can
> overflow the client's resources if the client does not estimate the
> reply buffer size correctly.
> 
> However, this case arises only when the client has a bug; it's not
> a case where we expect either side to perform heroic recovery.
> ERR_CHUNK would terminate the RPC on the client, which would very
> likely return EIO to the application. I think that's about the
> sanest outcome we can expect.
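The server-side behavior discussed above can be sketched as follows. This is a minimal illustration, assuming the server has the fully encoded reply in hand after executing the request; `ReplyResources`, `convey_reply`, and the dict standing in for the DRC are hypothetical names, and Write list handling is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReplyResources:
    inline_max: int       # largest reply the client can receive inline, in bytes
    reply_chunk_len: int  # length of the client's Reply chunk; 0 if none


def convey_reply(xid: int, reply: bytes, res: ReplyResources,
                 non_idempotent: bool, drc: Dict[int, bytes]) -> str:
    """Choose how to send an already-executed reply.

    Returns "inline", "reply_chunk", or "ERR_CHUNK".
    """
    # The request has been executed by now, so a non-idempotent request is
    # cached whether or not its reply can actually be conveyed.
    if non_idempotent:
        drc[xid] = reply
    if len(reply) <= res.inline_max:
        return "inline"
    if len(reply) <= res.reply_chunk_len:
        return "reply_chunk"
    # The client under-estimated the reply size (for example, a GETATTR
    # that returns a large ACL). The client terminates the RPC, most
    # likely surfacing EIO to the application.
    return "ERR_CHUNK"
```

Caching before choosing the transport path reflects the point that the responder knows whether it executed the request, so its DRC management does not depend on how ERR_CHUNK is reported on the wire.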
> 
> 
>> As for the session case, the server will consider the request executed, but the client does not have a reply containing the slot and sequence ID. To handle that case, the client would have to use the rdma_xid in the ERR_CHUNK message to recover the needed context, and so conclude that the slot is available for reuse.
> 
> This feels like something that belongs in 5667bis. I'm no real
> expert on session behavior. Can you sketch some text that can be
> added?
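The slot-recovery idea in the quoted text can be sketched from the client's side. This assumes that an ERR_CHUNK received for a given xid means the server did execute the request (exactly the ambiguity discussed earlier), and `SessionSlotTable`, `start`, and `on_err_chunk` are hypothetical names, not part of any NFS client.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Slot:
    seqid: int = 1
    busy: bool = False


@dataclass
class SessionSlotTable:
    slots: Dict[int, Slot] = field(default_factory=dict)
    inflight: Dict[int, int] = field(default_factory=dict)  # rdma_xid -> slot id

    def start(self, xid: int, slot_id: int) -> int:
        """Claim a slot for a new request; return the sequence ID to send."""
        slot = self.slots.setdefault(slot_id, Slot())
        if slot.busy:
            raise RuntimeError("slot already in use")
        slot.busy = True
        self.inflight[xid] = slot_id
        return slot.seqid

    def on_err_chunk(self, rdma_xid: int) -> Optional[int]:
        """Recover the slot for a request whose reply was replaced by ERR_CHUNK."""
        slot_id = self.inflight.pop(rdma_xid, None)
        if slot_id is None:
            return None  # unknown xid: nothing to recover
        slot = self.slots[slot_id]
        # Assumption: the server executed the request and so advanced this
        # slot's sequence ID; the client does the same, as it would after a
        # normal reply, and frees the slot for reuse.
        slot.seqid += 1
        slot.busy = False
        return slot_id
```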
> 
> 
>>> What I propose is that if the first READ_PLUS returns 
>>> NFS4_CONTENT_HOLE, the server would return an empty first
>>> Write chunk. Then the second READ_PLUS result always 
>>> lines up with the second Write chunk, which IMO is much better
>>> for clients.
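The alignment argument in the quoted proposal can be illustrated with a small mapping function. It assumes a COMPOUND carrying one READ_PLUS per Write chunk, with each result simplified to a single HOLE or DATA element; `map_write_chunks` is a hypothetical name.

```python
from typing import List, Tuple


def map_write_chunks(results: List[Tuple[str, bytes]],
                     write_chunks: List[int]) -> List[Tuple[int, bytes]]:
    """Pair the k-th READ_PLUS result with the k-th Write chunk.

    results: (kind, payload) per READ_PLUS; kind is "HOLE" or "DATA".
    write_chunks: capacity in bytes of each Write chunk, in order.
    Returns (chunk_index, bytes_written) for each result.
    """
    if len(results) > len(write_chunks):
        raise ValueError("more READ_PLUS results than Write chunks")
    out = []
    for k, (kind, payload) in enumerate(results):
        if kind == "HOLE":
            # The hole still consumes its chunk, returned empty, so later
            # results keep their positions.
            out.append((k, b""))
        else:
            if len(payload) > write_chunks[k]:
                raise ValueError(f"Write chunk {k} is too small")
            out.append((k, payload))
    return out
```

Without the empty chunk, the client would have to decode the first result before knowing which of its buffers received the second payload.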
>> 
>> I'm OK with this, but I think you will need to adjust the text to reflect the fact that READ_PLUS can return an array of read_plus_content elements, although in practice replies containing more than one are extremely rare.
> 
> I hadn't realized READ_PLUS returned an array.
> 
> If an NFS server is allowed to structure its reply in a way that
> the client cannot predict, then I think we'll have to limit the
> way READ_PLUS uses DDP. I propose these rules:
> 
> - The client can provide no more than one Write chunk if it expects
> NFS4_CONTENT_DATA. (Following the previous rules, the client provides
> no Write chunk, or an empty one, when it predicts that the reply can
> go inline.)
> 
> - If that Write chunk is non-empty, it MUST be large enough to
> receive all expected payload bytes in a single NFS4_CONTENT_DATA
> element.
> 
> - The server uses that Write chunk for the first array element that
> has an NFS4_CONTENT_DATA arm.
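The three rules proposed above can be expressed as a server-side check. This is a sketch only: `place_read_plus` is a hypothetical helper, each read_plus_content element is reduced to an (arm, length) pair, and rejecting a non-conforming request with ValueError is an assumption, since the proposal does not say which error the server returns.

```python
from typing import List, Optional, Tuple

HOLE, DATA = "NFS4_CONTENT_HOLE", "NFS4_CONTENT_DATA"


def place_read_plus(contents: List[Tuple[str, int]],
                    write_chunks: List[int],
                    requested: int) -> Optional[int]:
    """Return the index of the element conveyed in the Write chunk,
    or None if the reply goes entirely inline.

    contents: (arm, length) for each read_plus_content element.
    write_chunks: lengths of the Write chunks the client supplied.
    requested: number of payload bytes the client asked for.
    """
    if len(write_chunks) > 1:
        raise ValueError("client may provide at most one Write chunk")
    if not write_chunks or write_chunks[0] == 0:
        return None  # client predicted an inline reply
    if write_chunks[0] < requested:
        raise ValueError("Write chunk must hold all expected payload bytes")
    for i, (arm, _length) in enumerate(contents):
        if arm == DATA:
            return i  # the first DATA arm uses the Write chunk
    return None  # only holes: nothing is placed in the Write chunk
```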
> 
> 
> Then we have a choice, depending on whether it is more desirable
> to return data in a single round-trip, or more desirable to preserve
> holes. Either:
> 
> - If the server finds that the array has grown larger than can be
> returned inline or via the supplied Reply chunk, it MUST return
> the payload data in a single NFS4_CONTENT_DATA element via the
> provided Write chunk.
> 
> Or:
> 
> - The server MUST return as much payload as it can fit within the
> resources provided by the client, and return it as a short READ
> result.

Sorry, that's not clear. By "short READ result" I mean that the
server indicates it did not return all the bytes the client
requested.


> The client is responsible for retrying the READ_PLUS to
> read the remaining payload.
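The two alternatives can be compared side by side. This is a simplified sketch: the client's inline and Reply chunk resources are collapsed into a single `budget` of payload bytes, holes are assumed to consume none of it, and `coalesce` and `short_read` are hypothetical names.

```python
from typing import List, Tuple

HOLE, DATA = "NFS4_CONTENT_HOLE", "NFS4_CONTENT_DATA"
Content = Tuple[str, int, bytes]  # (arm, length, data); data is b"" for a hole


def coalesce(contents: List[Content]) -> Content:
    """First option: return everything as one NFS4_CONTENT_DATA element,
    filling holes with zero bytes, so it fits the single Write chunk."""
    buf = b"".join(data if arm == DATA else bytes(length)
                   for arm, length, data in contents)
    return (DATA, len(buf), buf)


def short_read(contents: List[Content], budget: int) -> List[Content]:
    """Second option: keep elements in order until `budget` payload bytes
    are used, truncating the last data element (a short READ result)."""
    out: List[Content] = []
    used = 0
    for arm, length, data in contents:
        if arm == HOLE:
            out.append((arm, length, data))  # assumption: holes cost no payload bytes
            continue
        room = budget - used
        if room <= 0:
            break
        piece = data[:room]
        out.append((DATA, len(piece), piece))
        used += len(piece)
        if len(piece) < length:
            break  # truncated: the client must ask for the rest
    return out
```

`coalesce` trades hole preservation for a single round trip; `short_read` preserves holes but may cost the client extra round trips.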
> 
> Somehow we have to deal with the case where the server cannot fit
> any of the payload in the client-provided resources.
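On the client side, the short-read option implies a retry loop. A minimal sketch, assuming a hypothetical `read_plus(offset, count)` call that returns the payload with holes already zero-filled plus an EOF flag. How to handle the zero-progress case is still open in this thread; raising an error here is only a placeholder that keeps the loop from spinning.

```python
from typing import Callable, Tuple


def read_all(read_plus: Callable[[int, int], Tuple[bytes, bool]],
             offset: int, count: int) -> bytes:
    """Re-issue READ_PLUS until `count` bytes are read or EOF is reached."""
    buf = bytearray()
    while len(buf) < count:
        data, eof = read_plus(offset + len(buf), count - len(buf))
        buf += data
        if eof:
            break
        if not data:
            # The server could not fit any payload in the resources the
            # client supplied; retrying unchanged would loop forever.
            raise OSError("READ_PLUS made no progress")
        # Otherwise this was a short READ result: ask for the remainder.
    return bytes(buf)
```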
> 
> 
> READ_PLUS is actually a poor fit for offloaded DDP anyway. The whole
> point of offload is that the client has to do no work; the payload
> arrives in its memory without any effort on its part.
> 
> I would just as soon require that, on RDMA transports, READ_PLUS
> returns only a hole or exactly one contiguous piece of content.
> 
> 
> --
> Chuck Lever

--
Chuck Lever