Re: [nfsv4] Clean up for rpcrdma-version-two Read chunks

Chuck Lever <chuck.lever@oracle.com> Tue, 05 May 2020 19:43 UTC

From: Chuck Lever <chuck.lever@oracle.com>
Date: Tue, 05 May 2020 15:43:27 -0400
Cc: NFSv4 <nfsv4@ietf.org>
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/kJIyenPdO4cS46T5EqCByx1W78U>


> On May 5, 2020, at 3:04 PM, David Noveck <davenoveck@gmail.com> wrote:
> 
>       > This would require sorting by the receiver.
> 
> > It would indeed. 
> 
> It would require that if they could arrive unsorted.

Even if the Requester is "REQUIRED to send these in order", there can always be Requester implementation bugs
that mis-order the segments. That's why the Responder implementation has to be defensive anyway.


> > The alternative is kind of disastrous.
> 
> I think a reasonable alternative is to ask the sender to send them in order.  I understand you don't agree with that, but I can't see how it is "disastrous".

I don't have a problem with suggesting or requiring the sender to organize the Read list in a particular order. I just don't
think it's a full solution. The possibility of Requester bugs or malicious intent means that Responders must be protective
anyway. A Read list ordering requirement doesn't make the need for Responder defensiveness go away. At the very
least, the Responder still has to recognize mis-sorted Read lists and reject them.

It's the difference between "the protocol requires receivers to recognize and reject badness" versus "the protocol makes
it impossible to express badness".
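To illustrate "recognize and reject badness": under a hypothetical v2 rule that Requesters send Read segments in nondecreasing position order, a Responder can detect a mis-sorted Read list in one linear pass, without sorting. This is a minimal sketch; the `ReadSegment` fields and the `read_list_is_ordered` name are assumptions for illustration, not the XDR definitions from any RPC/RDMA specification.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ReadSegment:
    # Illustrative model of a Read segment; field names are assumptions.
    position: int  # XDR position in the reconstructed RPC Call
    handle: int    # remote memory handle
    length: int    # bytes to pull with RDMA Read
    offset: int    # remote address


def read_list_is_ordered(segments: Sequence[ReadSegment]) -> bool:
    """Return True if position values never decrease along the Read list.

    Under the hypothetical ordering rule, a Responder rejects any list
    for which this returns False rather than trying to repair it.
    """
    return all(a.position <= b.position for a, b in zip(segments, segments[1:]))


good = [ReadSegment(0, 1, 512, 0x1000),
        ReadSegment(0, 2, 512, 0x2000),
        ReadSegment(128, 3, 4096, 0x3000)]
bad = [ReadSegment(128, 3, 4096, 0x3000), ReadSegment(0, 1, 512, 0x1000)]
```

Note that this check still has to run on every incoming Call; the ordering rule only makes the check cheaper, it does not make it unnecessary.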


> > Gaps would probably contain data from a previous use of the Read sink
> > buffer.
> 
> I don't see how.  If you have multiple read chunks, it is probably due to multiple operations, each of which contains a DDP-eligible data area, since there is no op with more than one DDP-eligible argument. In this case the non-DDP-eligible data appears inline.
> 
> I guess you are focused on the case of a mix of PZRCs and PNZRCs, which is kind of a special case.

For the foreseeable future, it is a common special case. This case arises frequently for krb5, for example.


> > The exact contents of overlap regions would depend on the order that the
> > RNIC completed the RDMA Read operations.
> 
> I think we agree about overlapping chunks being bad but only disagree about
> how best to prevent them.
> 
> > Thus IMO a good quality Responder has to perform some sanity checking on
> > the position values and lengths in incoming Read segments. 
> 
> I agree.
> 
> > I'm not sure
> > how it could avoid some sorting-like behavior to perform this check.
> 
> That's true only if it is valid to send them out-of-order, which I don't think
> should be the case.
> 
> > It might be better to place responsibility on the sender to sort these.
> 
> I should have said "send these in sorted order".  Realistically, requesters
> will send these in sorted order and no actual sort would be required.

True enough for Requesters we know of with sane implementations. However,
as we well know, that doesn't mean this will always be the case. We need to
somehow document our assumptions and requirements here.
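The position/length sanity check discussed above can be sketched as follows, assuming that each Read chunk's Position is its starting byte offset in the reconstructed RPC Call and that a chunk's length is the sum of its segments' lengths. The `ReadSegment` class and `overlapping_chunk_positions` function are hypothetical. The sort by position is the "sorting-like behavior" the check needs when segments may arrive in any order.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ReadSegment:
    position: int  # XDR position where this segment's chunk starts (assumed)
    length: int    # bytes conveyed by this segment


def overlapping_chunk_positions(segments: Sequence[ReadSegment]) -> List[int]:
    """Return positions of Read chunks that start inside an earlier chunk.

    Works on a Read list in any arrival order: segment lengths are summed
    per position, then chunks are visited in increasing position order.
    """
    chunk_len: Dict[int, int] = {}
    for seg in segments:
        chunk_len[seg.position] = chunk_len.get(seg.position, 0) + seg.length

    overlaps: List[int] = []
    furthest_end = 0  # end offset of the furthest-reaching chunk seen so far
    for position, length in sorted(chunk_len.items()):
        if position < furthest_end:
            overlaps.append(position)
        furthest_end = max(furthest_end, position + length)
    return overlaps
```

Gaps are not flagged here, because inline data legitimately fills the space between chunks. A Read list that mixes a position-zero chunk with a normal Read chunk may also be flagged by this naive check, which is exactly the special case under discussion and would need separate handling.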


> > IMO the goal of RPC/RDMA is to reduce host processing on the Requester.
> 
> That's an important goal, but I don't agree that it is *the* goal.
> 
> > Sorting on the Responder follows that paradigm more closely than the
> > converse.
> 
> I think requiring them to be in sorted order avoids a lot of complexity.
> I haven't seen any case where it makes sense to send them other than in sorted order.
> 
> With regard to this goal/paradigm, all I can say is that T. S. Kuhn has a lot to answer for :-)
> 
> > For a given position value, RPC/RDMA already requires that Read segments
> > have to be in the order they appear in the reconstructed RPC message. But
> > there's currently no requirement that the Read list's position values have
> > to appear in monotonically increasing order. In fact I think RPC/RDMA
> > permits a Requester to interleave Read segments at different positions, as
> > long as they are in the Read list in the order they should be used to
> > reconstruct the RPC Call.
> 
> True, but I don't see why any requester would find it necessary or helpful
> to interleave read segments in this way, except perhaps in the special case of
> a PZRC/PNZRC mix.

Again, that's the particular case that is likely to be frequent and impact RPC/RDMA v1
as well as v2.
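The ordering rules described in the quoted text (segments sharing a position must appear in the order they are used to rebuild the Call, but segments at different positions may be interleaved) can be handled with a single grouping pass. This is a minimal sketch; `ReadSegment`, `group_into_chunks`, and `is_interleaved` are hypothetical names, and `is_interleaved` models a possible stricter v2 rule rather than anything RPC/RDMA v1 requires.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ReadSegment:
    position: int  # XDR position of the chunk this segment belongs to
    length: int    # bytes conveyed by this segment
    handle: int    # remote memory handle (illustrative)


def group_into_chunks(segments: Sequence[ReadSegment]) -> Dict[int, List[ReadSegment]]:
    """Collect segments into chunks keyed by position, lowest position first.

    Appending in Read list order keeps segments that share a position in
    the order used to rebuild the RPC Call; interleaving is tolerated.
    """
    chunks: Dict[int, List[ReadSegment]] = {}
    for seg in segments:
        chunks.setdefault(seg.position, []).append(seg)
    return dict(sorted(chunks.items()))


def is_interleaved(segments: Sequence[ReadSegment]) -> bool:
    """True if some position's segments are not contiguous in the Read list."""
    finished = set()  # positions whose run of segments has already ended
    current = None
    for seg in segments:
        if seg.position != current:
            if seg.position in finished:
                return True
            if current is not None:
                finished.add(current)
            current = seg.position
    return False
```

A Responder that accepts what v1 permits would use only the grouping step; one enforcing a no-interleaving rule could reject any list for which `is_interleaved` returns True.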


> > I'm working on some changes to the Linux NFS/RDMA implementation that might
> > perform Responder-side Read list sorting in order to deal properly with
> > Read lists that contain segments with more than one Position value. For
> > example, a Read list that contains Read segments with position zero and
> > Read segments with a non-zero position could be re-sorted so that all of
> > the segments are in byte order at position zero. 
> 
> You are talking about doing something that goes beyond sorting, even though
> sorting is a part of it, and it appears to me that you are sorting by something
> approximating chunk position.
> You are reorganizing all the read chunks, including
> changing non-PZ chunks to position-zero chunks.  In addition, the sorting is not by
> the position field of the chunk but by expected position.  I can see why you might
> do that if you receive something like that, but I can't see why the requester is using a
> PZRC to send this, given that, in version two, you can avoid the PZRC and use
> message continuation to send the request.

A v2 Requester is now able to use Message chaining but it is currently not /required/ to use it.
A v1 Requester is still allowed to send a PZRC and a normal Read chunk.

RPC/RDMA v2 would have to explicitly forbid the use of PZRCs with normal Read chunks.
Is there a sane way to do that?

We could require that RDMA2_NOMSG includes only position zero Read segments.

For situations that call for normal Read chunks with an inline body that is larger than the
transport's inline threshold, the transport is then required to use RDMA2_MSG and Message
chaining.

If the transport's Message chaining limits can't accommodate that arrangement, then the
Requester has to use a PZRC (i.e., no normal Read chunks).
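The layout rules proposed above can be written down as a small decision function, which may help when turning them into spec text. This is a sketch under stated assumptions: `CallPlan`, `plan_call`, `nomsg_read_list_ok`, and the `chain_capacity` parameter (the largest inline body the transport can convey with Message chaining) are all hypothetical. For Calls without normal Read chunks the proposal leaves the choice open; this sketch assumes chaining is preferred whenever it fits.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CallPlan:
    procedure: str            # "RDMA2_MSG" or "RDMA2_NOMSG"
    chained: bool             # True if the inline body spans chained messages
    normal_read_chunks: bool  # False when DDP-eligible data rides in the PZRC


def plan_call(body_len: int, inline_threshold: int, chain_capacity: int,
              wants_normal_read_chunks: bool) -> CallPlan:
    """Pick a Call layout under the proposed rules (hypothetical encoding)."""
    if body_len <= inline_threshold:
        # Fits in a single inline message.
        return CallPlan("RDMA2_MSG", False, wants_normal_read_chunks)
    if body_len <= chain_capacity:
        # Too large for one message, but Message chaining can carry it,
        # so normal Read chunks remain usable with RDMA2_MSG.
        return CallPlan("RDMA2_MSG", True, wants_normal_read_chunks)
    # Chaining limits exceeded: send the whole Call in a PZRC with no
    # normal Read chunks, so RDMA2_NOMSG never mixes positions.
    return CallPlan("RDMA2_NOMSG", False, False)


def nomsg_read_list_ok(positions: Sequence[int]) -> bool:
    """Proposed rule: an RDMA2_NOMSG Read list holds only position-zero segments."""
    return bool(positions) and all(p == 0 for p in positions)
```

With this split, a Responder receiving RDMA2_NOMSG only has to check that every segment position is zero, and never has to merge a PZRC with normal Read chunks.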


--
Chuck Lever