Re: [nfsv4] rfc5667bis open issues

David Noveck <davenoveck@gmail.com> Tue, 27 September 2016 02:39 UTC

From: David Noveck <davenoveck@gmail.com>
Date: Mon, 26 Sep 2016 22:39:01 -0400
Message-ID: <CADaq8jfVDVWcqu2tHBG7dvFDHRo7HGqUANthP4hQp9UwiyZBVw@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/7PBseg4jd_gukCOU26aV40CUJaI>
Cc: NFSv4 <nfsv4@ietf.org>
Subject: Re: [nfsv4] rfc5667bis open issues

> > As for the session case, the server will consider the request
> > executed but the client does not have a reply containing the slot and
> > sequence. To deal with that case, he would have to use rdma_xid in the
> > message with the ERR_CHUNK to get the needed context and so conclude that
> > the slot was available for reuse.

> This feels like something that belongs in 5667bis. I'm no real
> expert on session behavior. Can you sketch some text that can be added?

I'll write it assuming it will constitute a new subsection 4.x dealing with
session-related issues.
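As a rough illustration of what such a subsection would ask of the client, here is a Python sketch, assuming a hypothetical client that records the session slot used by each outstanding call, keyed by the rdma_xid it sent. All names (SessionClient, send_call, on_err_chunk, server_executed) are made up for illustration and come from neither the draft nor any implementation. Starting sequence IDs at 1 is also an assumption. Because ERR_CHUNK in Version One does not say whether the server executed the request, that fact is passed in as a flag rather than derived.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    # Client's view of one NFSv4.1 session slot; starting at 1 is an assumption.
    seqid: int = 1
    busy: bool = False


@dataclass
class SessionClient:
    # Hypothetical client state: a small slot table plus the calls still
    # outstanding, keyed by the rdma_xid that was sent with each call.
    slot_table: list = field(default_factory=lambda: [Slot() for _ in range(4)])
    pending: dict = field(default_factory=dict)  # rdma_xid -> slot id

    def send_call(self, xid: int) -> tuple:
        """Claim a free slot for a call and return (slotid, seqid) for SEQUENCE."""
        for slotid, slot in enumerate(self.slot_table):
            if not slot.busy:
                slot.busy = True
                self.pending[xid] = slotid
                return slotid, slot.seqid
        raise RuntimeError("no free session slot")

    def on_err_chunk(self, xid: int, server_executed: bool) -> int:
        """Handle an ERR_CHUNK error that arrives instead of an RPC reply.

        There is no SEQUENCE result to read, so the rdma_xid carried in the
        error message is the only way back to the slot the call was using.
        server_executed stands in for information ERR_CHUNK does not carry.
        """
        slotid = self.pending.pop(xid)
        slot = self.slot_table[slotid]
        if server_executed:
            # The server processed SEQUENCE, so its copy of the slot's
            # sequence ID has moved on; the next call must use the next value.
            slot.seqid += 1
        slot.busy = False  # the slot is available for reuse either way
        return slotid
```

For example, after `send_call(0x2A)` returns `(0, 1)` and `on_err_chunk(0x2A, server_executed=True)` is handled, the next call that lands on slot 0 is sent with sequence ID 2.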

On Mon, Sep 26, 2016 at 1:20 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Sep 24, 2016, at 8:11 AM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > The issue is that the language allows ERR_CHUNK to be
> > > returned after a server has processed the RPC request when
> > > a client has not provided adequate Write list or Reply chunk
> > > resources to convey the reply.
> >
> > I forgot that we made ERR_CHUNK ambiguous, apparently because of
> > reluctance to add things to the XDR. There is no pressing need to resolve
> > this, since the responder is aware of the difference and can manage its
> > DRC based on that knowledge.
> >
> > However, in the context of Version Two it would be better if we avoided
> > those ambiguities, given that we have lots of space for distinct error
> > codes.
> >
> > > In that case, it makes sense for the server to have added
> > > the request to its DRC.
> >
> > Agreed that in the case in which the server has executed the request, it
> > should add the request to the DRC. In practical terms, there are not
> > likely to be cases in which a non-idempotent request has a reply
> > longer than 1K.
>
> For NFSv3, that is largely true.
>
> For NFSv4, I believe a large reply to a non-idempotent request is
> possible, and may even be common. Any time a client does something
> like this:
>
>   { SEQUENCE, PUTFH, SETATTR, GETATTR }
>
> Where the GETATTR requests an ACL or security label, is problematic
> if the client does not estimate the reply buffer size correctly.
>
> However, this case arises only when the client has a bug; it's not a
> case where we expect either side to perform heroic
> recovery. ERR_CHUNK would terminate the RPC on the client, which
> would very likely return EIO to the application. I think that's
> about the sanest outcome we can expect.
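A simplified Python sketch of the server-side ordering discussed above: an executed non-idempotent request goes into the DRC before the responder decides whether the reply can be conveyed at all, so ERR_CHUNK never leaves an executed request uncached. It assumes a 1K inline limit (the figure mentioned above), ignores Write-list DDP and header overhead, and uses a hypothetical function name, finish_request. Returning RDMA_NOMSG when the whole reply goes in the Reply chunk follows RPC-over-RDMA Version One.

```python
from typing import Dict

INLINE_LIMIT = 1024  # the 1K inline threshold mentioned above (assumed value)


def finish_request(xid: int, reply: bytes, executed: bool, non_idempotent: bool,
                   reply_chunk_len: int, drc: Dict[int, bytes]) -> str:
    """Decide how a responder conveys a reply (simplified, hypothetical).

    Write-list DDP is ignored; the reply either fits inline, fits in the
    client-provided Reply chunk, or cannot be conveyed at all.
    """
    if executed and non_idempotent:
        # Cache first: the request has taken effect even if the reply
        # cannot be delivered, so a retransmission must hit the DRC.
        drc[xid] = reply
    if len(reply) <= INLINE_LIMIT:
        return "RDMA_MSG"      # reply sent inline
    if len(reply) <= reply_chunk_len:
        return "RDMA_NOMSG"    # reply conveyed in the Reply chunk
    return "ERR_CHUNK"         # client did not provide enough resources
```

For a SETATTR/GETATTR compound whose ACL pushes the reply past both limits, this returns "ERR_CHUNK" while the reply is still recorded in the DRC.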
>
>
> > As for the session case, the server will consider the request
> > executed but the client does not have a reply containing the slot and
> > sequence. To deal with that case, he would have to use rdma_xid in the
> > message with the ERR_CHUNK to get the needed context and so conclude that
> > the slot was available for reuse.
>
> This feels like something that belongs in 5667bis. I'm no real
> expert on session behavior. Can you sketch some text that can be
> added?
>
>
> > > What I propose is that if the first READ_PLUS returns
> > > NFS4_CONTENT_HOLE, the server would return an empty first
> > > Write chunk. Then the second READ_PLUS result always
> > > lines up with the second Write chunk, which IMO is much better for
> > > clients.
> >
> > I'm OK with this, but I think you will need to adjust the text to reflect
> > the fact that READ_PLUS can return an array of read_plus_content
> > elements, although, in practice, replies that contain more than one are
> > extremely rare.
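The empty-Write-chunk proposal can be sketched in Python as a positional mapping: the i-th READ_PLUS in a COMPOUND always uses the i-th Write chunk, and a hole leaves its chunk empty instead of letting later data slide into it. As the reply above points out, this models each READ_PLUS result as a single element, ("HOLE", length) or ("DATA", length), which is a simplification; write_chunk_usage is a hypothetical name.

```python
from typing import List, Tuple

# Each READ_PLUS result is modeled as a single element, ("HOLE", length) or
# ("DATA", length); real replies may carry an array of elements.
Result = Tuple[str, int]


def write_chunk_usage(results: List[Result], chunk_sizes: List[int]) -> List[int]:
    """Return the number of bytes the server writes into each Write chunk.

    The i-th READ_PLUS always maps to the i-th Write chunk. A hole result
    leaves its chunk empty (0 bytes) instead of shifting later data into it.
    """
    if len(chunk_sizes) < len(results):
        raise ValueError("client supplied fewer Write chunks than READ_PLUS ops")
    usage = []
    for (kind, length), size in zip(results, chunk_sizes):
        if kind == "HOLE":
            usage.append(0)            # empty Write chunk keeps positions aligned
        elif length > size:
            raise ValueError("DATA result does not fit its Write chunk")
        else:
            usage.append(length)
    return usage
```

With a hole followed by 4096 bytes of data and two 8192-byte chunks, the result is `[0, 4096]`: the data still lands in the second chunk, where the client expects it.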
>
> I hadn't realized READ_PLUS returned an array.
>
> If an NFS server is allowed to structure its reply in a way that
> the client cannot predict, then I think we'll have to limit the
> way READ_PLUS uses DDP. I propose these rules:
>
> - The client can provide no more than one Write chunk if it expects
> NFS4_CONTENT_DATA. (Following the previous rules, no Write chunk or an
> empty Write chunk would be used when the client predicts that the
> reply can go inline.)
>
> - If that Write chunk is non-empty, it MUST be large enough to
> receive all expected payload bytes in a single NFS4_CONTENT_DATA
> element.
>
> - The server uses that Write chunk for the first array element that
> has an NFS4_CONTENT_DATA arm.
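The three proposed rules can be checked mechanically. The Python sketch below folds them into one hypothetical function, element_for_write_chunk, that validates the client's Write list and picks which reply element is placed in the Write chunk. Reply elements are modeled as (kind, length) pairs. Where later DATA elements go (inline, Reply chunk, or a short result) is left open, as it is in the proposal.

```python
from typing import List, Optional, Tuple

# A READ_PLUS reply modeled as (kind, length) elements, kind "HOLE" or "DATA".
Content = Tuple[str, int]


def element_for_write_chunk(contents: List[Content], write_chunks: List[int],
                            expected_len: int) -> Optional[int]:
    """Apply the proposed READ_PLUS DDP rules; return the index of the
    element that is placed in the Write chunk, or None if none is."""
    # Rule 1: at most one Write chunk for a READ_PLUS expecting DATA.
    if len(write_chunks) > 1:
        raise ValueError("more than one Write chunk for READ_PLUS")
    chunk = write_chunks[0] if write_chunks else 0
    if chunk == 0:
        return None  # no or empty Write chunk: the client expects an inline reply
    # Rule 2: a non-empty Write chunk must hold all expected payload bytes.
    if chunk < expected_len:
        raise ValueError("Write chunk smaller than the expected payload")
    # Rule 3: the first NFS4_CONTENT_DATA element goes into the Write chunk.
    for index, (kind, length) in enumerate(contents):
        if kind == "DATA":
            if length > chunk:
                raise ValueError("DATA element larger than the Write chunk")
            return index
    return None  # the reply held only holes; the chunk stays empty
```

For a reply of hole, data, data with one 8192-byte chunk, the second element (index 1) is placed in the chunk; the trailing DATA element is exactly the case the choice below has to settle.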
>
>
> Then we have a choice, depending on whether it is more desirable
> to return data in a single round-trip, or more desirable to preserve
> holes. Either:
>
> - If the server finds that the array has grown larger than can be
> returned inline or via the supplied Reply chunk, it MUST return
> the payload data in a single NFS4_CONTENT_DATA element via the
> provided Write chunk.
>
> Or:
>
> - The server MUST return as much payload as it can fit within the
> resources provided by the client, and return it as a short READ
> result. The client is responsible for retrying the READ_PLUS to
> read the remaining payload.
>
> Somehow we have to deal with the case where the server cannot fit
> any of the payload in the client-provided resources.
>
>
> READ_PLUS is actually a poor fit for offloaded DDP anyway. The whole
> point of offload is that the client has to do no work; the payload
> arrives in its memory without any effort on its part.
>
> I would just as soon require that, on RDMA transports, READ_PLUS
> returns only a hole or exactly one contiguous piece of content.
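One way a server could honor that restriction, and also the first option above (all payload in a single NFS4_CONTENT_DATA element), is to zero-fill any holes inside the requested range. The Python sketch below assumes elements are modeled as ("DATA", offset, bytes) or ("HOLE", offset, length) tuples in offset order; flatten_for_rdma is a hypothetical name. It trades hole preservation for a single round trip.

```python
from typing import List, Tuple, Union

# Elements modeled as ("DATA", offset, payload_bytes) or ("HOLE", offset, length).
Element = Tuple[str, int, Union[bytes, int]]


def flatten_for_rdma(contents: List[Element]) -> List[Element]:
    """Reduce a READ_PLUS result to a single element: one hole or one
    contiguous piece of data, zero-filling any holes inside the range."""
    if len(contents) <= 1:
        return contents  # already a lone hole or a lone data segment
    start = contents[0][1]
    buf = bytearray()
    for kind, offset, value in contents:
        if offset != start + len(buf):
            raise ValueError("elements do not cover a contiguous range")
        if kind == "DATA":
            buf += value
        else:
            buf += bytes(value)  # bytes(n) is n zero bytes
    return [("DATA", start, bytes(buf))]
```

A result of data, a 3-byte hole, then more data collapses into one DATA element whose middle three bytes are zero. A lone hole passes through unchanged.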
>
>
> --
> Chuck Lever