Re: [nfsv4] Clean up for rpcrdma-version-two Read chunks
Chuck Lever <chuck.lever@oracle.com> Tue, 05 May 2020 06:21 UTC
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CADaq8jfXo65s-nPP0eh_zwJUtZ194XQrth8f5RpmMvy_54urVA@mail.gmail.com>
Date: Tue, 05 May 2020 02:21:02 -0400
Cc: NFSv4 <nfsv4@ietf.org>
Message-Id: <97344C4C-E9A5-4230-B477-F5E2775BED85@oracle.com>
References: <A999AEE0-9201-4A73-AC9D-005500A32BCA@oracle.com> <CADaq8jfXo65s-nPP0eh_zwJUtZ194XQrth8f5RpmMvy_54urVA@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/c9OCq2NVukYjLsFGz4sMhd5oQE0>
Subject: Re: [nfsv4] Clean up for rpcrdma-version-two Read chunks
> On May 4, 2020, at 8:59 PM, David Noveck <davenoveck@gmail.com> wrote:
>
>> On Mon, May 4, 2020, 12:13 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>>
>>> RPC/RDMA v1 allows a position zero Read chunk to appear in an RDMA_MSG type Call.
>>> Where does a Responder put the inline portion of such a message?
>>
>> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
>> a Requester sends a Read list containing a position zero Read chunk as part of
>> a header type other than RDMA2_NOMSG.
>
> Agree.
>
>
>>> RPC/RDMA v1 does not explicitly require an RDMA_NOMSG type Call to have a position
>>> zero Read chunk. Does such a message have gaps? Are they zero-filled?
>>
>> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
>> a Requester sends an RDMA2_NOMSG header type whose Read list does not include a
>> position zero Read chunk.
>
> As stated, this would forbid NOMSG being used to send a long reply.

Nit: rpcrdma-version-two no longer uses the term "Long message"; see Section 4.4.3.

> I think the text to address this needs to be careful not to foreclose that. Your text above uses the word "requester", assuming this is sufficient, but the only way a peer receiving a message could determine whether it was sent by a requester or responder is by looking at the message, which, in this case, does not exist.

The RPC/RDMA version two header has the RDMA2_F_RESPONSE flag (Section 6.2.2.1), which was introduced to enable a receiver to distinguish the roles of the sender and receiver peers without sniffing the RPC-layer payload.

A while back I had envisioned using Read chunks in Responder-to-Requester messages, and even wrote an I-D about it. But now that we have both Reply chunks and Message chaining, it seems unnecessary to hold the door open for using a Read chunk for a Reply, especially given how arcane Read chunks are. Did you have a particular use case in mind?
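The first proposal (a Requester's position zero Read chunk is acceptable only in an RDMA2_NOMSG message), combined with the RDMA2_F_RESPONSE flag for telling which role the sender is playing, might be checked on receipt roughly as below. This is a Python sketch, not text from the draft: `HeaderType`, `ReadSegment`, and `check_position_zero` are hypothetical names, and the values standing in for RDMA2_MSG, RDMA2_NOMSG, and RDMA2_ERR_BAD_XDR are placeholders rather than the draft's XDR definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

# Hypothetical stand-ins: these are NOT the XDR values from the draft.
class HeaderType(Enum):
    RDMA2_MSG = "RDMA2_MSG"
    RDMA2_NOMSG = "RDMA2_NOMSG"

RDMA2_ERR_BAD_XDR = "RDMA2_ERR_BAD_XDR"

@dataclass(frozen=True)
class ReadSegment:
    position: int  # XDR position of the chunk in the reconstructed message
    handle: int    # RDMA steering tag (illustrative only)
    length: int    # number of bytes this segment contributes
    offset: int    # remote memory offset (illustrative only)

def check_position_zero(htype: HeaderType, f_response: bool,
                        read_list: List[ReadSegment]) -> Optional[str]:
    """Return an error name if a Requester's message breaks the
    proposed position zero rule, else None."""
    if f_response:
        # RDMA2_F_RESPONSE is set, so the sender is acting as a Responder;
        # this sketch only checks messages sent by a Requester.
        return None
    has_position_zero = any(seg.position == 0 for seg in read_list)
    if has_position_zero and htype is not HeaderType.RDMA2_NOMSG:
        # A position zero Read chunk conveys the RPC message itself, which
        # leaves no defined place for the inline portion of an RDMA2_MSG.
        return RDMA2_ERR_BAD_XDR
    return None

# Usage: rejected in RDMA2_MSG, accepted in RDMA2_NOMSG.
seg = ReadSegment(position=0, handle=0x1234, length=1024, offset=0)
print(check_position_zero(HeaderType.RDMA2_MSG, False, [seg]))    # RDMA2_ERR_BAD_XDR
print(check_position_zero(HeaderType.RDMA2_NOMSG, False, [seg]))  # None
```

Because the role comes from a header flag, the check never has to parse the RPC-layer payload to decide whether the rule applies.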

Also, I think the proposed text above would prevent the use of the RDMA2_NOMSG type for asynchronous credit grants (Section 4.2.1.2), so I ought to restate the requirement as:

>> A Responder MUST return RDMA2_ERR_BAD_XDR if a Requester sends an
>> RDMA2_NOMSG header type with a non-empty Read list that does not
>> include a position zero Read chunk.

>>> RPC/RDMA v1 does not prevent or prohibit overlapping Read chunks. Is the correct
>>> response ERR_CHUNK?
>>
>> A protocol change would be needed to totally prevent the expression of overlapping
>> Read chunks. Maybe it's a little too late to address that in RPC/RDMA version 2.
>
> I think you mean version 1.

No, I meant version 2. My sense of the virtual room two weeks ago was that no one had the stomach for major surgery on the RPC/RDMA version 2 data structures at this point. That's why I've limited the above proposals to simple requirements that the Responder recognize badly formed Read lists and respond to them as errors.

If we were to "go there," my thought about how to address the gap/overlap issue would be to eliminate Read chunks and structure the Read list the same way that the Write list is structured; i.e., as a list of arrays of RDMA segments, but each array would have a position field.

It might be interesting to use position fields in the Write list as well, filled in by the Responder, to help disambiguate the position of result data items in a Reply message. But at this point we have escaped the orbit of RPC/RDMA version one entirely.

> Nobody seems up to do rfc8166bis.
>
>> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
>> a Requester sends a Read list with chunks whose offsets and lengths result in the
>> same message byte position appearing in more than one Read chunk.
>
> This would require sorting by the receiver.

It would indeed. The alternative is kind of disastrous. Gaps would probably contain data from a previous use of the Read sink buffer.
The exact contents of overlap regions would depend on the order in which the RNIC completed the RDMA Read operations. Thus, IMO, a good-quality Responder has to perform some sanity checking on the position values and lengths in incoming Read segments. I'm not sure how it could avoid some sorting-like behavior to perform this check.

> It might be better to place responsibility on the sender to sort these.

IMO the goal of RPC/RDMA is to reduce host processing on the Requester. Sorting on the Responder follows that paradigm more closely than the converse.

For a given position value, RPC/RDMA already requires that Read segments be in the order in which they appear in the reconstructed RPC message. But there's currently no requirement that the Read list's position values appear in monotonically increasing order. In fact, I think RPC/RDMA permits a Requester to interleave Read segments at different positions, as long as they are in the Read list in the order they should be used to reconstruct the RPC Call.

I'm working on some changes to the Linux NFS/RDMA implementation that might perform Responder-side Read list sorting in order to deal properly with Read lists that contain segments with more than one Position value. For example, a Read list that contains Read segments with position zero and Read segments with a non-zero position could be re-sorted so that all of the segments are in byte order at position zero. This enables the Responder to set up the Read sink buffer pages so that, when the Reads complete, the message is already in proper segment and byte order.

--
Chuck Lever
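The Responder-side sanity check discussed above (reject overlapping Read chunks, but accept Read lists whose position values are interleaved or not monotonically increasing) can be sketched in Python as follows. This is not the Linux NFS/RDMA code. `ReadSegment`, `group_by_position`, and `check_and_sort` are hypothetical, and the sketch assumes, following the wording of the overlap proposal, that a chunk at position P with total length L covers bytes [P, P + L) of the reconstructed message. Position zero chunks and gap detection are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

RDMA2_ERR_BAD_XDR = "RDMA2_ERR_BAD_XDR"  # placeholder, not the draft's XDR value

@dataclass(frozen=True)
class ReadSegment:
    position: int  # XDR position of the chunk in the reconstructed message
    handle: int    # RDMA steering tag (illustrative only)
    length: int    # number of bytes this segment contributes
    offset: int    # remote memory offset (illustrative only)

def group_by_position(read_list: List[ReadSegment]) -> Dict[int, List[ReadSegment]]:
    """Collect segments into chunks keyed by position.

    Segments sharing a position keep their Read list order, which must
    already match their order in the reconstructed message. Segments
    with different positions may be interleaved in the list.
    """
    chunks: Dict[int, List[ReadSegment]] = {}
    for seg in read_list:
        chunks.setdefault(seg.position, []).append(seg)
    return chunks

def check_and_sort(read_list: List[ReadSegment]
                   ) -> Tuple[Optional[str], List[ReadSegment]]:
    """Return (error, segments in message byte order)."""
    chunks = group_by_position(read_list)
    if 0 in chunks:
        raise ValueError("position zero chunks are not handled by this sketch")
    ordered: List[ReadSegment] = []
    prev_end = 0
    for position in sorted(chunks):  # the receiver-side sort
        segs = chunks[position]
        if position < prev_end:
            # Some message byte would be written by two Read chunks.
            return RDMA2_ERR_BAD_XDR, []
        prev_end = position + sum(s.length for s in segs)
        ordered.extend(segs)
    return None, ordered

# Usage: interleaved positions are accepted and put into byte order.
a1 = ReadSegment(position=100, handle=1, length=50, offset=0)
b1 = ReadSegment(position=400, handle=2, length=10, offset=0)
a2 = ReadSegment(position=100, handle=3, length=50, offset=0)
err, ordered = check_and_sort([a1, b1, a2])
print(err, [s.handle for s in ordered])  # None [1, 3, 2]

# The chunk at 100 covers [100, 250), which overlaps the chunk at 200.
bad = [ReadSegment(100, 1, 150, 0), ReadSegment(200, 2, 10, 0)]
print(check_and_sort(bad)[0])  # RDMA2_ERR_BAD_XDR
```

Sorting by position is what lets a single pass suffice: if no adjacent pair of chunks overlaps in position order, no pair overlaps at all, so comparing each chunk only with its predecessor is enough.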