Re: [nfsv4] rfc5667bis open issues

David Noveck <davenoveck@gmail.com> Sat, 24 September 2016 12:11 UTC

From: David Noveck <davenoveck@gmail.com>
Date: Sat, 24 Sep 2016 08:11:10 -0400
Message-ID: <CADaq8jcSxc6BQKJ1SZ=OrpRcEGpgpfdLDcPpBp=GfGQJwkbLEw@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/c0C6YUUv-LBy_agZXW0s4pDnVc4>
Cc: NFSv4 <nfsv4@ietf.org>

>The issue is that the language allows ERR_CHUNK to be
> returned after a server has processed the RPC request when
> a client has not provided adequate Write list or Reply chunk
> resources to convey the reply.

I forgot that we made ERR_CHUNK ambiguous, apparently out of reluctance
to add things to the XDR.  There is no pressing need to disambiguate it,
since the responder is aware of the difference and can manage its DRC
accordingly.

However, in the context of Version Two it would be better to avoid
such ambiguities, given that we have ample space for distinct error
codes.

> In that case, it makes sense for the server to have added
> the request to its DRC.

Agreed that in the case in which the server has executed the request, it
should add the request to the DRC.  In practical terms, there are
unlikely to be cases in which a non-idempotent request has a reply
longer than 1K.

As for the session case, the server will consider the request executed,
but the client does not have a reply containing the slot and sequence.  To
deal with that case, it would have to use the rdma_xid in the message
carrying ERR_CHUNK to recover the needed context and so conclude that the
slot was available for reuse.
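A toy sketch of that client-side recovery (class and method names are my own, purely illustrative): the client remembers which session slot each outstanding xid is using, so that when an ERR_CHUNK arrives without a SEQUENCE result, the rdma_xid alone is enough to release the slot.

```python
class SlotTable:
    def __init__(self, nslots):
        self.free = set(range(nslots))

    def acquire(self):
        return self.free.pop()

    def release(self, slot):
        self.free.add(slot)

class Client:
    def __init__(self, nslots=16):
        self.slots = SlotTable(nslots)
        self.pending = {}  # rdma_xid -> (slotid, sequenceid)

    def send_compound(self, xid, sequenceid):
        slot = self.slots.acquire()
        self.pending[xid] = (slot, sequenceid)
        return slot

    def on_err_chunk(self, rdma_xid):
        # No SEQUENCE result arrives, so the xid is the only way to
        # learn which slot the failed reply would have retired.
        slot, seq = self.pending.pop(rdma_xid)
        self.slots.release(slot)
        return slot, seq
```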

> What I propose is that if the first READ_PLUS returns
> NFS4_CONTENT_HOLE, the server would return an empty first
> Write chunk. Then the second READ_PLUS result always
> lines up with the second Write chunk, which IMO is much better
> for clients.

I'm OK with this, but I think you will need to adjust the text to reflect
the fact that READ_PLUS can return an array of read_plus_content items,
although, in practice, replies containing more than one are extremely rare.
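The proposed matching can be illustrated with a toy matcher (purely illustrative; the function name and the ('data'/'hole') encoding are mine, not from the drafts): the server consumes one Write chunk per READ-class operation and returns an empty chunk whenever the result carries no DDP-eligible data, so the Nth result always lines up with the Nth chunk.

```python
def build_write_list(results, chunk_sizes):
    """results: per-READ outcomes, ('data', nbytes), ('hole', nbytes),
    or ('error',).  chunk_sizes: Write chunks the client provided, in
    order.  READ operations beyond the chunk list return results inline."""
    write_list = []
    for result, chunk in zip(results, chunk_sizes):
        if result[0] == 'data':
            write_list.append(min(result[1], chunk))  # bytes placed via RDMA
        else:
            write_list.append(0)  # empty chunk keeps later results aligned
    return write_list
```

With two READ_PLUS operations and two 9000-byte chunks, a first-operation hole yields a Write list of [0, 9000]: the second result stays matched to the second chunk.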


On Fri, Sep 23, 2016 at 12:37 PM, Chuck Lever <chuck.lever@oracle.com>
wrote:

>
> > On Sep 22, 2016, at 5:51 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > Does this count as a dropped RPC reply, such that an NFSv4 server
> > > would have to drop the connection?
> >
> > I don't see any reason why it should.  Apart from RPC-over-RDMA, does
> this apply to other XDR decode issues on the server? After all, it is
> getting a reply, even if the content is essentially "I don't understand the
> request."
> >
> > > When an NFS server returns one of these responses, does it
> > > have to enter the reply in its DRC ?
> >
> > I should hope not.  The purpose of the DRC is to prevent repeated
> execution of non-idempotent requests.  A request that you can't decode does
> not change any state on the server.
>
> The issue is that the language allows ERR_CHUNK to be returned
> after a server has processed the RPC request when a client has
> not provided adequate Write list or Reply chunk resources to
> convey the reply.
>
> In that case, it makes sense for the server to have added the
> request to its DRC.
>
>
> > > What if any implications are there for an NFSv4.1 session
> > > (slot retired? ignored?)
> >
> > Why does this have to be addressed in 5667bis?  I don't see why
> XDR decode errors need to be addressed differently in the RPC-over-RDMA
> case.
>
> ERR_CHUNK is not necessarily an XDR decode error. See above.
>
> If returning ERR_CHUNK in the cases where a result has been
> generated but a reply can not be formed is not workable, the
> alternative is for the server to drop the connection without
> replying, IMO.
>
> Some text could be introduced that suggests that servers take
> care to ensure there are enough reply resources before they
> begin processing an RPC. That may be challenging for some
> server implementations, though.
>
>
> > > The current text of rfc5666bis (Section 5.3.2) suggests that when
> > > multiple Write chunks are provided for an RPC, and the responder
> > > doesn't use one of them, it should use that chunk for the next
> > > DDP-eligible XDR data item.
> >
> > It does say that and the treatment there is limited to the case of
> multiple write chunks.  The treatment for the analogous case when there is
> a single write chunk is addressed in 4.4.6.2.  The treatment is the same:
> each DDP-eligible item is matched with the first available write chunk
> until either there are no more write chunks or no more DDP-eligible items
> in the reply.
> >
> > > The problematic text is actually this part of rfc5666bis:
> >
> > I don't see why this is described as "problematic".  Is there a
> suggestion that we might change this text to something less problematic?  I
> don't see how to do that.  I believe we should leave the text as it is.
>
> However, we can say that the default behavior for RPC-over-RDMA
> is to skip to the next result, but that there is no prohibition
> for ULBs to amend that default. I took that approach below.
>
>
> > Within Version One, there is no way to tie write chunks to particular
> DDP-eligible items.  While one might think of that as a problem it is not
> one that can be addressed in Version One or in rfc5667bis.
>
> There is a way to tie these together. Here is the text I have
> adopted for rfc5667bis (revision not yet submitted). First:
>
>
> 2.2.1.  Empty Write Chunks
>
>    Section 4.4.6.2 of [I-D.ietf-nfsv4-rfc5666bis] defines the concept of
>    unused Write chunks.  An unused Write chunk is a Write chunk with
>    either zero segments or where all segments in the Write chunk have
>    zero length.  In this document these are referred to as "empty" Write
>    chunks.  A "non-empty" Write chunk has at least one segment of non-
>    zero length.
>
>    An NFS client might wish an NFS server to return a DDP-eligible
>    result inline.  If there is only one DDP-eligible result item in the
>    reply, the NFS client simply specifies an empty Write list to force
>    the NFS server to return that result inline.  If there are multiple
>    DDP-eligible results, the NFS client specifies empty Write chunks for
>    each DDP-eligible data item that it wishes to be returned inline.
>
>    An NFS server might encounter an XDR union result where there are
>    arms that have a DDP-eligible result, and arms that do not.  If the
>    NFS client has provided a non-empty Write chunk that matches with a
>    DDP-eligible result, but the response does not contain that result,
>    the NFS server MUST return an empty Write chunk in that position in
>    the Write list.
>
>
> And then, in Section 4.3:
>
>    The mechanism specified in Section 5.3.2 of
> [I-D.ietf-nfsv4-rfc5666bis] is applied here, with some additional
>    restrictions.  In the following list, a "READ" operation refers to
>    either a READ, READ_PLUS, or READLINK operation.
>
>    o  If an NFS client does not wish to use direct placement for any
>       DDP-eligible item in an NFS reply, it leaves the Write list empty.
>
>    o  The first chunk in the Write list MUST be used by the first READ
>       operation in an NFS version 4 COMPOUND procedure.  The next Write
>       chunk is used by the next READ operation, and so on.
>
>    o  If an NFS client has provided a matching non-empty Write chunk,
>       then the corresponding READ operation MUST return its data by
>       placing data into that chunk.
>
>    o  If an NFS client has provided an empty matching Write chunk, then
>       the corresponding READ operation MUST return its result inline.
>
>    o  If a READ operation returns a union arm which does not contain a
>       DDP-eligible result, and the NFS client has provided a matching
>       non-empty Write chunk, the NFS server MUST return an empty Write
>       chunk in that Write list position.
>
>    o  If there are more READ operations than Write chunks, then any
>       remaining READ operations in the COMPOUND MUST return their
>       results inline.
>
>
> This problem has not arisen before for two reasons:
>
> - So far NFSv4 clients do not build COMPOUNDs that contain more
> DDP-eligible results than NFSv3 clients use (ie, only one per RPC).
>
> - So far union results are used only in error cases; and when an
> operation returns an error, it is always the last one in the
> COMPOUND reply.
>
> But now we have NFSv4.2 READ_PLUS. READ_PLUS can return:
>
> - NFS4_CONTENT_DATA, which has an opaque DDP-eligible result
> - NFS4_CONTENT_HOLE, which has no DDP-eligible result
> - an error status, which has no DDP-eligible result
>
> So we now have a situation where there are operations that can follow
> a result that might or might not be returned in a Write chunk.
>
> Suppose you have a COMPOUND that looks like this:
>
>    { SEQUENCE, PUTFH, READ_PLUS(2048), READ_PLUS(9000), GETATTR }
>
> Using the currently proposed scheme, the client must set up two Write
> chunks that can receive 9000 bytes, to handle the case where the first
> READ_PLUS returns NFS4_CONTENT_HOLE. After the reply is received, the
> client has to ensure that the first returned Write chunk is matched to
> the second XDR result, which could be troublesome for some
> implementations.
>
> What I propose is that if the first READ_PLUS returns NFS4_CONTENT_HOLE,
> the server would return an empty first Write chunk. Then the second
> READ_PLUS result always lines up with the second Write chunk, which IMO
> is much better for clients.
>
>
> --
> Chuck Lever
>