Re: [nfsv4] Clean up for rpcrdma-version-two Read chunks

Chuck Lever <chuck.lever@oracle.com> Tue, 05 May 2020 06:21 UTC

From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CADaq8jfXo65s-nPP0eh_zwJUtZ194XQrth8f5RpmMvy_54urVA@mail.gmail.com>
Date: Tue, 05 May 2020 02:21:02 -0400
Cc: NFSv4 <nfsv4@ietf.org>
Message-Id: <97344C4C-E9A5-4230-B477-F5E2775BED85@oracle.com>
References: <A999AEE0-9201-4A73-AC9D-005500A32BCA@oracle.com> <CADaq8jfXo65s-nPP0eh_zwJUtZ194XQrth8f5RpmMvy_54urVA@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/c9OCq2NVukYjLsFGz4sMhd5oQE0>
Subject: Re: [nfsv4] Clean up for rpcrdma-version-two Read chunks

> On May 4, 2020, at 8:59 PM, David Noveck <davenoveck@gmail.com> wrote:
> 
>> On Mon, May 4, 2020, 12:13 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>>> RPC/RDMA v1 allows a position zero Read chunk to appear in an RDMA_MSG type Call.
>>> Where does a Responder put the inline portion of such a message?
>> 
>> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
>> a Requester sends a Read list containing a position zero Read chunk as part of
>> a header type other than RDMA2_NOMSG.
> 
> Agree.
> 
> 
>>> RPC/RDMA v1 does not explicitly require an RDMA_NOMSG type Call to have a position
>>> zero Read chunk. Does such a message have gaps? Are they zero-filled?
>> 
>> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
>> a Requester sends an RDMA2_NOMSG header type whose Read list does not include a
>> position zero Read chunk.
> 
> As stated, this would forbid NOMSG being used to send a long reply.

Nit: rpcrdma-version-two no longer uses the term Long message, see
Section 4.4.3.


> I think the text to address this needs to be careful not to foreclose that. Your text above uses the word "requester" assuming this is sufficient, but the only way a peer receiving a message could determine whether it was sent by a requester or responder is by looking at the message, which, in this case, does not exist.

The RPC/RDMA version two header has the RDMA2_F_RESPONSE flag (Section
6.2.2.1) which was introduced to enable a receiver to distinguish the
roles of the sender and receiver peers without sniffing the RPC layer
payload.
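
As a rough illustration of how a receiver might use that flag, here is a
minimal sketch. The bit value given to RDMA2_F_RESPONSE is a placeholder
assumed for illustration (the draft's XDR is the authority), and
sent_by_responder() is a hypothetical helper name.

```python
# Placeholder bit value, assumed for illustration only; take the real
# value from the XDR definition in rpcrdma-version-two.
RDMA2_F_RESPONSE = 0x00000001


def sent_by_responder(header_flags):
    """Return True if the transport header flags mark a Responder-to-Requester message."""
    return bool(header_flags & RDMA2_F_RESPONSE)
```

The point is only that the role test needs the transport header's flags
field and nothing from the RPC layer payload.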

A while back I had envisioned using Read chunks in Responder-to-
Requester messages, and even wrote an I-D about it. But now that we
have both Reply chunks and Message chaining, it seems unnecessary to
hold the door open for using a Read chunk for a Reply, especially
given how arcane Read chunks are. Did you have a particular use case in
mind?

Also, I think the proposed text above would prevent the use of the
RDMA2_NOMSG type for asynchronous credit grants (Section 4.2.1.2) so I
ought to restate the requirement as:

>> A Responder MUST return RDMA2_ERR_BAD_XDR if a Requester sends an
>> RDMA2_NOMSG header type with a non-empty Read list that does not
>> include a position zero Read chunk.
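
The two position-zero requirements proposed in this thread could be
checked roughly as follows. This is a sketch only: header types and the
error code are modeled as strings so that no wire values are assumed,
RDMA2_MSG stands in for any header type other than RDMA2_NOMSG, and
check_read_list() is a hypothetical helper that takes just the Position
value of each Read chunk in a Call.

```python
# Modeled as strings so that no numeric wire values are assumed.
RDMA2_MSG = "RDMA2_MSG"
RDMA2_NOMSG = "RDMA2_NOMSG"
RDMA2_ERR_BAD_XDR = "RDMA2_ERR_BAD_XDR"


def check_read_list(header_type, chunk_positions):
    """Return RDMA2_ERR_BAD_XDR for a badly formed Read list, else None.

    chunk_positions holds the Position value of each Read chunk.
    """
    has_position_zero = 0 in chunk_positions

    # A position zero Read chunk is allowed only with RDMA2_NOMSG.
    if header_type != RDMA2_NOMSG and has_position_zero:
        return RDMA2_ERR_BAD_XDR

    # An RDMA2_NOMSG with a non-empty Read list needs a position zero
    # chunk. An empty Read list passes, which leaves room for
    # asynchronous credit grants.
    if header_type == RDMA2_NOMSG and chunk_positions and not has_position_zero:
        return RDMA2_ERR_BAD_XDR

    return None
```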



>>> RPC/RDMA v1 does not prevent or prohibit overlapping Read chunks. Is the correct
>>> response ERR_CHUNK?
>> 
>> A protocol change would be needed to totally prevent the expression of overlapping
>> Read chunks. Maybe it's a little too late to address that in RPC/RDMA version 2.
> 
> I think you mean version 1.

No, I meant version 2. My sense of the virtual room two weeks ago was
that no one had the stomach for major surgery on the RPC/RDMA version 2
data structures at this point. That's why I've limited the above proposals
to simple requirements that the Responder recognize badly formed Read
lists and respond to them as errors.

If we were to "go there," my thought about how to address the gap/overlap
issue would be to eliminate Read chunks and structure the Read list the
same way that the Write list is structured; i.e., as a list of arrays of
RDMA segments, but each array would have a position field.
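
To make the shape of that idea concrete, here is a hypothetical sketch.
None of these type or field names come from the draft; they only
illustrate "a list of arrays of RDMA segments, each array carrying a
position".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RdmaSegment:
    """One RDMA segment (hypothetical field names)."""
    handle: int
    length: int
    offset: int


@dataclass
class PositionedSegmentArray:
    """An array of segments sharing a single position in the message."""
    position: int
    segments: List[RdmaSegment] = field(default_factory=list)

    def length(self):
        """Total number of bytes conveyed by this array."""
        return sum(seg.length for seg in self.segments)


# Under this idea the Read list would simply be a list of such arrays.
ReadList = List[PositionedSegmentArray]
```

With one position per array, the extent of each array can be computed
without first grouping individual segments by position.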

It might be interesting to use position fields in the Write list as well,
filled in by the Responder, to help disambiguate the position of result
data items in a Reply message.

But at this point we have escaped the orbit of RPC/RDMA version one entirely.


> Nobody seems up to do rfc8166bis.
> 
>> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
>> a Requester sends a Read list with chunks whose offsets and lengths result in the
>> same message byte position appearing in more than one Read chunk.
> 
> This would require sorting by the receiver.

It would indeed. The alternative is kind of disastrous.

Gaps would probably contain data from a previous use of the Read sink
buffer.

The exact contents of overlap regions would depend on the order that the
RNIC completed the RDMA Read operations.

Thus IMO a good quality Responder has to perform some sanity checking on
the position values and lengths in incoming Read segments. I'm not sure
how it could avoid some sorting-like behavior to perform this check.
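
A minimal sketch of such a check, assuming each Read chunk has already
been reduced to a (position, length) pair, where length is the sum of
the lengths of the chunk's segments. The function name is hypothetical.

```python
def read_chunks_overlap(chunks):
    """Return True if any message byte position falls in more than one chunk.

    chunks is a list of (position, length) pairs, in any order.
    """
    highest_end = 0
    # Sorting by position means each chunk only needs to be compared
    # against the highest end offset seen so far.
    for position, length in sorted(chunks):
        if length == 0:
            continue  # an empty chunk covers no bytes
        if position < highest_end:
            return True
        highest_end = max(highest_end, position + length)
    return False
```

The sort is what makes this a single pass over the list; without it,
every chunk would have to be compared against every other chunk.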


> It might be better to place responsibility on the sender to sort these.

IMO the goal of RPC/RDMA is to reduce host processing on the Requester.
Sorting on the Responder follows that paradigm more closely than the
converse.

For a given position value, RPC/RDMA already requires that Read segments
appear in the order they occupy in the reconstructed RPC message. But
there's currently no requirement that the Read list's position values
appear in monotonically increasing order. In fact I think RPC/RDMA
permits a Requester to interleave Read segments at different positions, as
long as they are in the Read list in the order they should be used to
reconstruct the RPC Call.

I'm working on some changes to the Linux NFS/RDMA implementation that might
perform Responder-side Read list sorting in order to deal properly with
Read lists that contain segments with more than one Position value. For
example, a Read list that contains Read segments with position zero and
Read segments with a non-zero position could be re-sorted so that all of
the segments are in byte order at position zero. This enables the Responder
to set up the Read sink buffer pages so that at Read completion the message is
already in proper segment and byte order.
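
The re-sorting step might look roughly like this. It is a sketch, not
the Linux implementation: ReadSegment and its fields are hypothetical,
and the only property it relies on is that segments sharing a Position
value already appear in the order they should be used.

```python
from collections import namedtuple

# Hypothetical in-memory form of a Read segment.
ReadSegment = namedtuple("ReadSegment", ["position", "handle", "length"])


def sort_read_list(segments):
    """Return the segments ordered by Position value.

    sorted() is stable, so segments with the same Position keep the
    relative order they had in the incoming Read list.
    """
    return sorted(segments, key=lambda seg: seg.position)
```

For example, an interleaved list with Position values 4096, 0, 4096, 0
comes back with both position zero segments first, followed by both
segments at position 4096, each pair still in its original order.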


--
Chuck Lever