Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <11DB3812-B605-4426-A316-176CE31910B2@oracle.com>
Date: Mon, 26 Sep 2016 13:27:24 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <7F9A17B8-B79B-497D-9DA5-03AD6AE586CE@oracle.com>
References: <15F62327-B73F-45CF-B4A5-8535955E954F@oracle.com>
 <65E80EDE-6031-4A83-9B73-3A88C91F8E6A@oracle.com>
 <CADaq8jc50Ca6eDZ3D6zRvfG+Q2DngNN6+mN9WKXj9AS=d1iQVg@mail.gmail.com>
 <D0ECCDF7-F785-4419-AA93-33B2054C4737@oracle.com>
 <CADaq8jcSxc6BQKJ1SZ=OrpRcEGpgpfdLDcPpBp=GfGQJwkbLEw@mail.gmail.com>
 <11DB3812-B605-4426-A316-176CE31910B2@oracle.com>
To: David Noveck <davenoveck@gmail.com>
X-Mailer: Apple Mail (2.3124)
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/wiPDriP1om8jqc9LE2HFeZCkF-k>
Cc: NFSv4 <nfsv4@ietf.org>
Subject: Re: [nfsv4] rfc5667bis open issues


> On Sep 26, 2016, at 1:20 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
>>
>> On Sep 24, 2016, at 8:11 AM, David Noveck <davenoveck@gmail.com> wrote:
>>
>>> The issue is that the language allows ERR_CHUNK to be
>>> returned after a server has processed the RPC request when
>>> a client has not provided adequate Write list or Reply chunk
>>> resources to convey the reply.
>>
>> I forgot that we made ERR_CHUNK ambiguous, apparently because of
>> reluctance to add things to the XDR. There is no pressing need to do
>> this since the responder is aware of the difference and his management
>> of the DRC can follow based on his knowledge.
>>
>> However, in the context of Version Two it would be better if we
>> avoided those ambiguities, given that we have lots of space for distinct
>> error codes.
>>
>>> In that case, it makes sense for the server to have added
>>> the request to its DRC.
>>
>> Agree that in the case in which the server has executed the request,
>> it should add the request to the DRC. In practical terms, there are not
>> likely to be cases in which there is a non-idempotent request with a
>> reply longer than 1K.
>
> For NFSv3, that is largely true.
>
> For NFSv4, I believe a large reply to a non-idempotent request is
> possible, and may even be common. Any time a client sends something
> like this:
>
>  { SEQUENCE, PUTFH, SETATTR, GETATTR }
>
> where the GETATTR requests an ACL or security label, it is problematic
> if the client does not estimate the reply buffer size correctly.
>
> However, this case is for when the client has a bug; it's not a
> case where we expect one or the other side to perform heroic
> recovery. ERR_CHUNK would terminate the RPC on the client, which
> would very likely return EIO to the application. I think that's
> about the sanest outcome we can expect.
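
To make the ordering concrete, here is a minimal sketch of a server that
executes the request, records the reply in its DRC, and only then finds
that the client's reply resources are too small. Every name here
(Server, ReplyResources, handle) is hypothetical, the Write list is
ignored, and a plain dict stands in for a real DRC. It illustrates the
idea and is not proposed text.

```python
from dataclasses import dataclass, field

ERR_CHUNK = "ERR_CHUNK"  # stand-in for the RPC-over-RDMA error code


@dataclass
class ReplyResources:
    """Reply resources offered by the client (hypothetical model)."""
    inline_threshold: int      # bytes that fit in an inline reply
    reply_chunk_len: int = 0   # bytes in the Reply chunk; 0 means none


@dataclass
class Server:
    drc: dict = field(default_factory=dict)  # xid -> cached reply bytes

    def handle(self, xid, execute, resources):
        # Execute at most once per XID; a retransmission is answered
        # from the cache and the operation is not run again.
        if xid not in self.drc:
            self.drc[xid] = execute()
        reply = self.drc[xid]

        # The reply travels either inline or in the Reply chunk.
        capacity = max(resources.inline_threshold, resources.reply_chunk_len)
        if len(reply) > capacity:
            # The request has already run and is already cached.
            return ERR_CHUNK
        return reply
```

With a 2048-byte reply and a 1024-byte inline threshold and no Reply
chunk, handle() returns ERR_CHUNK, and a retransmission with the same
XID does not run the operation a second time.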
>
>
>> As far as the session case, the server will consider the request
>> executed but the client does not have a reply containing the slot and
>> sequence. To deal with that case, he would have to use rdma_xid in the
>> message with the ERR_CHUNK to get the needed context and so conclude
>> that the slot was available for reuse.
>
> This feels like something that belongs in 5667bis. I'm no real
> expert on session behavior. Can you sketch some text that can be
> added?
>
>
>>> What I propose is that if the first READ_PLUS returns
>>> NFS4_CONTENT_HOLE, the server would return an empty first
>>> Write chunk. Then the second READ_PLUS result always
>>> lines up with the second Write chunk, which IMO is much better
>>> for clients.
>>
>> I'm OK with this but I think you will need to adjust the text to
>> reflect the fact that READ_PLUS can return an array of
>> read_plus_content's, although, in practice, those that return more than
>> one are extremely rare.
>
> I hadn't realized READ_PLUS returned an array.
>
> If an NFS server is allowed to structure its reply in a way that
> the client cannot predict, then I think we'll have to limit the
> way READ_PLUS uses DDP. I propose these rules:
>
> - The client can provide no more than one Write chunk if it expects
> NFS4_CONTENT_DATA. (No Write chunk or an empty Write chunk, following
> the previous rules, would be for when the client predicts that the
> reply can go inline).
>
> - If that Write chunk is non-empty, it MUST be large enough to
> receive all expected payload bytes in a single NFS4_CONTENT_DATA
> element.
>
> - The server uses that Write chunk for the first array element that
> has an NFS4_CONTENT_DATA arm.
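
The three rules above can be sketched from the server's side as follows.
The Content enum, the (kind, payload) tuple layout, and the function
name are all made up for illustration. Raising an error when the chunk
is too small is my assumption; the rules only say the client MUST size
the chunk correctly, not what the server does otherwise.

```python
from enum import Enum


class Content(Enum):
    DATA = "NFS4_CONTENT_DATA"
    HOLE = "NFS4_CONTENT_HOLE"


def choose_write_chunk_target(segments, write_chunk_len):
    """Return the index of the array element that goes in the Write chunk.

    segments: list of (Content, payload); payload is bytes for DATA and
    an integer hole length for HOLE.
    write_chunk_len: size of the single Write chunk; 0 or None means the
    client provided none, or an empty one.
    Returns None when the whole reply is expected to go inline.
    """
    if not write_chunk_len:
        return None
    for index, (kind, payload) in enumerate(segments):
        if kind is Content.DATA:
            if len(payload) > write_chunk_len:
                # Assumption: treat an undersized chunk as a client error.
                raise ValueError("Write chunk too small for DATA element")
            return index  # only the first DATA element uses the chunk
    return None  # reply contains holes only
```

For a reply of [HOLE, DATA, DATA] this picks element 1, and the second
DATA element would have to go inline or in the Reply chunk.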
>
>
> Then we have a choice, depending on whether it is more desirable
> to return data in a single round-trip, or more desirable to preserve
> holes. Either:
>
> - If the server finds that the array has grown larger than can be
> returned inline or via the supplied Reply chunk, it MUST return
> the payload data in a single NFS4_CONTENT_DATA element via the
> provided Write chunk.
>
> Or:
>
> - The server MUST return as much payload as it can fit within the
> resources provided by the client, and return it as a short READ
> result.

Sorry, that's not clear. I mean a "short read result": In other
words, the server indicates that it didn't return all the bytes
the client requested.


> The client is responsible for retrying the READ_PLUS to
> read the remaining payload.
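
Here is a rough sketch of that second option, with the file modelled as
a bytes object and "capacity" standing for whatever the client-provided
resources can hold. The function names are invented, and the error
raised for zero capacity is only a placeholder, since how to handle that
case is still open.

```python
def serve_read(data, offset, count, capacity):
    """Server side: return (payload, eof), trimmed to the client's capacity."""
    if capacity <= 0:
        # Placeholder only: the right behavior here is undecided.
        raise ValueError("no room for any payload")
    end = min(offset + count, len(data))
    payload = data[offset:end][:capacity]   # short read if capacity is small
    eof = offset + len(payload) >= len(data)
    return payload, eof


def read_fully(data, offset, count, capacity):
    """Client side: reissue the read until count bytes arrive or EOF is hit."""
    out = bytearray()
    while len(out) < count:
        payload, eof = serve_read(
            data, offset + len(out), count - len(out), capacity
        )
        out += payload
        if eof or not payload:
            break
    return bytes(out)
```

The client sees fewer bytes than it asked for with eof unset, and that
is its cue to send another request starting where the last one stopped.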
>
> Somehow we have to deal with the case where the server cannot fit
> any of the payload in the client-provided resources.
>
>
> READ_PLUS is actually a poor fit for offloaded DDP anyway. The whole
> point of offload is that the client has to do no work; the payload
> arrives in its memory without any effort on its part.
>
> I would just as soon require that, on RDMA transports, READ_PLUS
> returns only a hole or exactly one contiguous piece of content.
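
One way a server might apply that restriction, sketched with a made-up
(kind, offset, length) tuple for each segment: reply with the first
segment only and let the client ask again for the rest. This is an
illustration of the idea, not proposed protocol text.

```python
def first_segment_only(segments):
    """Limit a READ_PLUS result to a single hole or a single data segment.

    segments: list of (kind, offset, length), kind is "hole" or "data".
    Returns (reply_segments, next_offset); next_offset is where the
    client would resume, or None if there was nothing to return.
    """
    if not segments:
        return [], None
    kind, offset, length = segments[0]
    if kind not in ("hole", "data"):
        raise ValueError("unknown segment kind: %r" % (kind,))
    return [segments[0]], offset + length
```

A result of hole(0, 4096) followed by data(4096, 8192) would go back as
just the hole, and the client's next READ_PLUS would start at 4096.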
>
>
> --
> Chuck Lever

--
Chuck Lever



