Re: [nfsv4] rfc5667bis open issues
Chuck Lever <chuck.lever@oracle.com> Fri, 23 September 2016 16:37 UTC
From: Chuck Lever <chuck.lever@oracle.com>
Date: Fri, 23 Sep 2016 09:37:37 -0700
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/DyongnReUwBtYzYTD5Dc0Re1zNc>
Cc: NFSv4 <nfsv4@ietf.org>
> On Sep 22, 2016, at 5:51 PM, David Noveck <davenoveck@gmail.com> wrote:
>
>> Does this count as a dropped RPC reply, such that an NFSv4 server
>> would have to drop the connection?
>
> I don't see any reason why it should.

Apart from RPC-over-RDMA, does this apply to other XDR decode issues
on the server? After all, it is getting a reply, even if the content
is essentially "I don't understand the request."

>> When an NFS server returns one of these responses, does it
>> have to enter the reply in its DRC?
>
> I should hope not. The purpose of the DRC is to prevent repeated
> execution of non-idempotent requests. A request that you can't
> decode does not change any state on the server.

The issue is that the language allows ERR_CHUNK to be returned after
a server has processed the RPC request, when a client has not
provided adequate Write list or Reply chunk resources to convey the
reply. In that case, it makes sense for the server to have added the
request to its DRC.

>> What if any implications are there for an NFSv4.1 session
>> (slot retired? ignored?)
>
> Why does this have to be addressed in 5667bis? I don't see why XDR
> decode errors need to be addressed differently in the RPC-over-RDMA
> case.

ERR_CHUNK is not necessarily an XDR decode error. See above.

If returning ERR_CHUNK in the cases where a result has been generated
but a reply cannot be formed is not workable, the alternative is for
the server to drop the connection without replying, IMO.

Some text could be introduced that suggests that servers take care to
ensure there are enough reply resources before they begin processing
an RPC. That may be challenging for some server implementations,
though.

>> The current text of rfc5666bis (Section 5.3.2) suggests that when
>> multiple Write chunks are provided for an RPC, and the responder
>> doesn't use one of them, it should use that chunk for the next
>> DDP-eligible XDR data item.
> It does say that, and the treatment there is limited to the case of
> multiple write chunks. The treatment for the analogous case when
> there is a single write chunk is addressed in 4.4.6.2. The treatment
> is the same: each DDP-eligible item is matched with the first
> available write chunk until either there are no more write chunks or
> no more DDP-eligible items in the reply.

>> The problematic text is actually this part of rfc5666bis:
>
> I don't see why this is described as "problematic". Is there a
> suggestion that we might change this text to something less
> problematic? I don't see how to do that. I believe we should leave
> the text as it is.

However, we can say that the default behavior for RPC-over-RDMA is to
skip to the next result, but that there is no prohibition for ULBs to
amend that default. I took that approach below.

> Within Version One, there is no way to tie write chunks to
> particular DDP-eligible items. While one might think of that as a
> problem, it is not one that can be addressed in Version One or in
> rfc5667bis.

There is a way to tie these together. Here is the text I have adopted
for rfc5667bis (revision not yet submitted). First:

2.2.1.  Empty Write Chunks

   Section 4.4.6.2 of [I-D.ietf-nfsv4-rfc5666bis] defines the concept
   of unused Write chunks.  An unused Write chunk is a Write chunk
   with either zero segments or where all segments in the Write chunk
   have zero length.  In this document these are referred to as
   "empty" Write chunks.  A "non-empty" Write chunk has at least one
   segment of non-zero length.

   An NFS client might wish an NFS server to return a DDP-eligible
   result inline.  If there is only one DDP-eligible result item in
   the reply, the NFS client simply specifies an empty Write list to
   force the NFS server to return that result inline.  If there are
   multiple DDP-eligible results, the NFS client specifies empty
   Write chunks for each DDP-eligible data item that it wishes to be
   returned inline.
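To make the "empty" definition and the first-fit matching concrete, here is a minimal sketch. It is not from either draft: the representation (a Write chunk as a list of segment lengths) and the function names are mine.

```python
# Hypothetical model of Write chunk matching. A Write chunk is
# represented as a list of segment lengths; each DDP-eligible reply
# item is matched with the first available Write chunk, in order.

def is_empty(chunk):
    """Empty per the definition above: zero segments, or all
    segments in the chunk have zero length."""
    return all(length == 0 for length in chunk)

def match_chunks(ddp_items, write_list):
    """First-fit matching: pair each DDP-eligible item with the next
    Write chunk until either list runs out. Items with no chunk
    (None) are returned inline."""
    pairs = []
    for i, item in enumerate(ddp_items):
        chunk = write_list[i] if i < len(write_list) else None
        pairs.append((item, chunk))
    return pairs

print(is_empty([]))         # True: zero segments
print(is_empty([0, 0]))     # True: all segments have zero length
print(is_empty([0, 4096]))  # False: one segment has non-zero length
```

Note that under this model an empty Write list and a too-short Write list behave the same way: any unmatched DDP-eligible item falls back to inline placement.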
   An NFS server might encounter an XDR union result where there are
   arms that have a DDP-eligible result, and arms that do not.  If
   the NFS client has provided a non-empty Write chunk that matches
   with a DDP-eligible result, but the response does not contain that
   result, the NFS server MUST return an empty Write chunk in that
   position in the Write list.

And then, in Section 4.3:

   The mechanism specified in Section 5.3.2 of
   [I-D.ietf-nfsv4-rfc5666bis] is applied here, with some additional
   restrictions.  In the following list, a "READ" operation refers to
   either a READ, READ_PLUS, or READLINK operation.

   o  If an NFS client does not wish to use direct placement for any
      DDP-eligible item in an NFS reply, it leaves the Write list
      empty.

   o  The first chunk in the Write list MUST be used by the first
      READ operation in an NFS version 4 COMPOUND procedure.  The
      next Write chunk is used by the next READ operation, and so on.

   o  If an NFS client has provided a matching non-empty Write chunk,
      then the corresponding READ operation MUST return its data by
      placing data into that chunk.

   o  If an NFS client has provided an empty matching Write chunk,
      then the corresponding READ operation MUST return its result
      inline.

   o  If a READ operation returns a union arm which does not contain
      a DDP-eligible result, and the NFS client has provided a
      matching non-empty Write chunk, the NFS server MUST return an
      empty Write chunk in that Write list position.

   o  If there are more READ operations than Write chunks, then any
      remaining READ operations in the COMPOUND MUST return their
      results inline.

This problem has not arisen before for two reasons:

- So far NFSv4 clients do not build COMPOUNDs that contain more
  DDP-eligible results than NFSv3 clients use (i.e., only one per
  RPC).

- So far union results are used only in error cases; and when an
  operation returns an error, it is always the last one in the
  COMPOUND reply.

But now we have NFSv4.2 READ_PLUS.
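The rule list above can be sketched as a single decision function. This is an illustration only, under my own assumptions: a Write chunk is a list of segment lengths, each READ-class operation carries a flag saying whether the arm it returned actually has a DDP-eligible result, and the string labels are invented names, not protocol values.

```python
# Hypothetical sketch of the Section 4.3 rules quoted above: decide
# how each READ-class operation in a COMPOUND returns its data.

def reply_disposition(read_ops, write_list):
    """read_ops: one bool per READ operation, True if the returned
    arm carries a DDP-eligible result (e.g. file data), False if not
    (e.g. a union arm with no DDP-eligible result).
    write_list: one chunk (list of segment lengths) per position.
    Returns one decision string per READ operation."""
    decisions = []
    for i, has_ddp_result in enumerate(read_ops):
        chunk = write_list[i] if i < len(write_list) else None
        if chunk is None:
            # More READ operations than Write chunks: remaining
            # results MUST be returned inline.
            decisions.append("inline")
        elif all(seg == 0 for seg in chunk):
            # Empty matching chunk: the client asked for this result
            # to be returned inline.
            decisions.append("inline")
        elif not has_ddp_result:
            # Non-empty chunk, but the arm has no DDP-eligible
            # result: the server MUST return an empty Write chunk in
            # this Write list position.
            decisions.append("return-empty-chunk")
        else:
            # Non-empty matching chunk: data MUST be placed into it.
            decisions.append("write-chunk")
    return decisions
```

For example, `reply_disposition([False, True], [[9000], [9000]])` models a COMPOUND whose first READ-class result has no DDP-eligible arm: the server returns an empty chunk in position one and places the second result in chunk two.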
READ_PLUS can return:

- NFS4_CONTENT_DATA, which has an opaque DDP-eligible result
- NFS4_CONTENT_HOLE, which has no DDP-eligible result
- an error status, which has no DDP-eligible result

So we now have a situation where there are operations that can follow
a result that might or might not be returned in a Write chunk.
Suppose you have a COMPOUND that looks like this:

  { SEQUENCE, PUTFH, READ_PLUS(2048), READ_PLUS(9000), GETATTR }

Using the currently proposed scheme, the client must set up two Write
chunks that can receive 9000 bytes, to handle the case where the
first READ_PLUS returns NFS4_CONTENT_HOLE.  After the reply is
received, the client has to ensure that the first returned Write
chunk is matched to the second XDR result, which could be troublesome
for some implementations.

What I propose is that if the first READ_PLUS returns
NFS4_CONTENT_HOLE, the server would return an empty first Write
chunk.  Then the second READ_PLUS result always lines up with the
second Write chunk, which IMO is much better for clients.

--
Chuck Lever
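A short sketch of the proposed server behavior for that COMPOUND. The NFS4_CONTENT_* names follow NFSv4.2; everything else (the function, the chunk representation as segment-length lists) is my own illustration, not text from any draft.

```python
# Hypothetical sketch of the proposal: for the COMPOUND
# { SEQUENCE, PUTFH, READ_PLUS(2048), READ_PLUS(9000), GETATTR },
# a READ_PLUS that returns NFS4_CONTENT_HOLE causes the server to
# return an empty Write chunk in that position, so later READ_PLUS
# results stay aligned with their own Write chunks.

def build_reply_write_list(arms, client_write_list):
    """arms: the content type each READ_PLUS returned, in order.
    client_write_list: the non-empty Write chunks the client
    provided (as segment lengths), one per READ_PLUS.
    Returns the Write list the server sends back."""
    reply_chunks = []
    for i, arm in enumerate(arms):
        if arm == "NFS4_CONTENT_DATA":
            # DDP-eligible data was placed into the client's chunk.
            reply_chunks.append(client_write_list[i])
        else:
            # Hole or error: no DDP-eligible result, so return an
            # empty chunk to preserve positional alignment.
            reply_chunks.append([])
    return reply_chunks

# Client provisioned two 9000-byte chunks; the first READ_PLUS
# lands in a hole.
reply = build_reply_write_list(
    ["NFS4_CONTENT_HOLE", "NFS4_CONTENT_DATA"], [[9000], [9000]])
print(reply)  # [[], [9000]]: second result lines up with chunk two
```

The client-side win is visible in the result: position i in the returned Write list always corresponds to READ_PLUS number i, with no re-matching pass needed after the reply arrives.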