Re: [nfsv4] Clean up for rpcrdma-version-two Read chunks
David Noveck <davenoveck@gmail.com> Tue, 05 May 2020 19:04 UTC
MIME-Version: 1.0
References: <A999AEE0-9201-4A73-AC9D-005500A32BCA@oracle.com> <CADaq8jfXo65s-nPP0eh_zwJUtZ194XQrth8f5RpmMvy_54urVA@mail.gmail.com> <97344C4C-E9A5-4230-B477-F5E2775BED85@oracle.com>
In-Reply-To: <97344C4C-E9A5-4230-B477-F5E2775BED85@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Tue, 05 May 2020 15:04:05 -0400
Message-ID: <CADaq8jfEKBOnQKDFvvLfd4jaKjtCZWOYZBzQb2V=REJKu_+=bA@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: NFSv4 <nfsv4@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000061922605a4eb50d9"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/OkTlc8JiBcW4tbMogyb45rUYXEo>
Subject: Re: [nfsv4] Clean up for rpcrdma-version-two Read chunks
> This would require sorting by the receiver.

> It would indeed.

It would require that if they could arrive unsorted.

> The alternative is kind of disastrous.

I think a reasonable alternative is to ask the sender to send them in order. I understand you don't agree with that, but I can't see how it is "disastrous".

> Gaps would probably contain data from a previous use of the Read sink
> buffer.

I don't see how. If you have multiple read chunks, it is probably due to multiple operations, each of which contains a DDP-eligible data area, since there is no op with more than one DDP-eligible argument. In this case the non-DDP-eligible data appears inline. I guess you are focused on the case of a mix of PZRCs and PNZRCs, which is kind of a special case.

> The exact contents of overlap regions would depend on the order that the
> RNIC completed the RDMA Read operations.

I think we agree about overlapping chunks being bad but only disagree about how best to prevent them.

> Thus IMO a good quality Responder has to perform some sanity checking on
> the position values and lengths in incoming Read segments.

I agree.

> I'm not sure
> how it could avoid some sorting-like behavior to perform this check.

That's true only if it is valid to send them out of order, which I don't think should be the case.

> It might be better to place responsibility on the sender to sort these.

I should have said "send these in sorted order". Realistically, requesters will send these in sorted order and no actual sort would be required.

> IMO the goal of RPC/RDMA is to reduce host processing on the Requester.

That's an important goal, but I don't agree that it is *the* goal.

> Sorting on the Responder follows that paradigm more closely than the
> converse.

I think requiring them to be in sorted order avoids a lot of complexity. I haven't seen any case where it makes sense to send them other than in sorted order. With regard to this goal/paradigm, all I can say is that T. S.
Kuhn has a lot to answer for :-)

> For a given position value, RPC/RDMA already requires that Read segments
> have to be in the order they appear in the reconstructed RPC message. But
> there's currently no requirement that the Read list's position values have
> to appear in monotonically increasing order. In fact I think RPC/RDMA
> permits a Requester to interleave Read segments at different positions, as
> long as they are in the Read list in the order they should be used to
> reconstruct the RPC Call.

True, but I don't see why any requester would find it necessary or helpful to interleave read segments in this way, except perhaps in the special case of a PZRC/PNZRC mix.

> I'm working on some changes to the Linux NFS/RDMA implementation that might
> perform Responder-side Read list sorting in order to deal properly with
> Read lists that contain segments with more than one Position value. For
> example, a Read list that contains Read segments with position zero and
> Read segments with a non-zero position could be re-sorted so that all of
> the segments are in byte order at position zero.

You are talking about doing something that goes beyond sorting, even though sorting is a part of it, and it appears to me that you are sorting by something approximating chunk position. You are reorganizing all the read chunks, including changing non-PZ chunks to position-zero chunks. In addition, the sorting is not by the position field of the chunk but by expected position. I can see why you might do this if you receive something like that, but I can't see why the requester would use a PZRC to send this, given that, in version two, you can avoid the PZRC and use message continuation to send the request.

> This enables the Responder
> to set up the Read sink buffer pages so at Read completion the message is
> already in proper segment and byte order.

The problem with this is that when some of the non-position-zero read chunks are to be written to file offsets that are not a multiple of the page size,
the alignment of the data in server memory is not what you would want it to be. I think it is better, when allocating the pages to be read into and then page-flipped as part of the WRITEs, to be aware of the within-file offsets of the WRITEs.

On Tue, May 5, 2020 at 2:21 AM Chuck Lever <chuck.lever@oracle.com> wrote:

> > On May 4, 2020, at 8:59 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> >> On Mon, May 4, 2020, 12:13 PM Chuck Lever <chuck.lever@oracle.com> wrote:
> >>
> >>> RPC/RDMA v1 allows a position zero Read chunk to appear in an RDMA_MSG type Call.
> >>> Where does a Responder put the inline portion of such a message?
> >>
> >> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
> >> a Requester sends a Read list containing a position zero Read chunk as part of
> >> header type other than RDMA2_NOMSG.
> >
> > Agree.
> >
> >>> RPC/RDMA v1 does not explicitly require an RDMA_NOMSG type Call to have a position
> >>> zero Read chunk. Does such a message have gaps? Are they zero-filled?
> >>
> >> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
> >> a Requester sends an RDMA2_NOMSG header type whose Read list does not include a
> >> position zero Read chunk.
> >
> > As stated, this would forbid NOMSG being used to send a long reply.
>
> Nit: rpcrdma-version-two no longer uses the term Long message, see
> Section 4.4.3.
>
> > I think the text to address this needs to be careful not to foreclose that. Your text above uses the word "requester" assuming this is sufficient but the only way a peer receiving message could determine whether it was sent by a requester or responder is by looking at the message, which, in this case, does not exist.
>
> The RPC/RDMA version two header has the RDMA2_F_RESPONSE flag (Section
> 6.2.2.1) which was introduced to enable a receiver to distinguish the
> roles of the sender and receiver peers without sniffing the RPC layer
> payload.
>
> A while back I had envisioned using Read chunks in Responder-to-Requester
> messages, and even wrote an I-D about it. But now that we
> have both Reply chunks and Message chaining, it seems unnecessary to
> hold the door open for using a Read chunk for a Reply, especially
> given how arcane Read chunks are. Did you have a particular use case in
> mind?
>
> Also, I think the proposed text above would prevent the use of the
> RDMA2_NOMSG type for asynchronous credit grants (Section 4.2.1.2) so I
> ought to restate the requirement as:
>
> >> A Responder MUST return RDMA2_ERR_BAD_XDR if a Requester sends an
> >> RDMA2_NOMSG header type with a non-empty Read list that does not
> >> include a position zero Read chunk.
>
> >>> RPC/RDMA v1 does not prevent or prohibit overlapping Read chunks. Is the correct
> >>> response ERR_CHUNK?
> >>
> >> A protocol change would be needed to totally prevent the expression of overlapping
> >> Read chunks. Maybe it's a little too late to address that in RPC/RDMA version 2.
> >
> > I think you mean version 1.
>
> No, I meant version 2. My sense of the virtual room two weeks ago was
> that no-one had the stomach for major surgery on the RPC/RDMA version 2
> data structures at this point. That's why I've limited the above proposals
> to simple requirements that the Responder recognize badly formed Read
> lists and respond to them as errors.
>
> If we were to "go there," my thought about how to address the gap/overlap
> issue would be to eliminate Read chunks and structure the Read list the
> same way that the Write list is structured; ie, as a list of arrays of
> RDMA segments, but each array would have a position field.
>
> It might be interesting to use position fields in the Write list as well,
> filled in by the Responder, to help disambiguate the position of result
> data items in a Reply message.
>
> But at this point we have escaped the orbit of RPC/RDMA version one
> entirely.
>
> > Nobody seems up to do rfc8166bis.
> >
> >> I propose that in RPC/RDMA version 2, a Responder MUST return RDMA2_ERR_BAD_XDR if
> >> a Requester sends a Read list with chunks whose offsets and lengths result in the
> >> same message byte position appearing in more than one Read chunk.
> >
> > This would require sorting by the receiver.
>
> It would indeed. The alternative is kind of disastrous.
>
> Gaps would probably contain data from a previous use of the Read sink
> buffer.
>
> The exact contents of overlap regions would depend on the order that the
> RNIC completed the RDMA Read operations.
>
> Thus IMO a good quality Responder has to perform some sanity checking on
> the position values and lengths in incoming Read segments. I'm not sure
> how it could avoid some sorting-like behavior to perform this check.
>
> > It might be better to place responsibility on the sender to sort these.
>
> IMO the goal of RPC/RDMA is to reduce host processing on the Requester.
> Sorting on the Responder follows that paradigm more closely than the
> converse.
>
> For a given position value, RPC/RDMA already requires that Read segments
> have to be in the order they appear in the reconstructed RPC message. But
> there's currently no requirement that the Read list's position values have
> to appear in monotonically increasing order. In fact I think RPC/RDMA
> permits a Requester to interleave Read segments at different positions, as
> long as they are in the Read list in the order they should be used to
> reconstruct the RPC Call.
>
> I'm working on some changes to the Linux NFS/RDMA implementation that might
> perform Responder-side Read list sorting in order to deal properly with
> Read lists that contain segments with more than one Position value. For
> example, a Read list that contains Read segments with position zero and
> Read segments with a non-zero position could be re-sorted so that all of
> the segments are in byte order at position zero. This enables the Responder
> to set up the Read sink buffer pages so at Read completion the message is
> already in proper segment and byte order.
>
> --
> Chuck Lever
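
The two positions argued in this thread (the sender lists chunks in order, versus the Responder sorts before checking) can be illustrated with a small sketch. This is not taken from any implementation or from the draft: `ReadChunk`, `first_bad_chunk`, and `has_overlap_unsorted` are hypothetical names, and each chunk is assumed to cover bytes [position, position + length) of the reconstructed message. The special case of a position-zero chunk mixed with non-zero-position chunks is not modeled here.

```python
from typing import List, NamedTuple, Optional


class ReadChunk(NamedTuple):
    """Hypothetical flattened view of one Read chunk (not the wire format)."""
    position: int  # assumed: byte offset of the chunk in the reconstructed message
    length: int    # assumed: total bytes covered by the chunk's segments


def first_bad_chunk(chunks: List[ReadChunk]) -> Optional[int]:
    """Return the index of the first chunk that starts before the end of its
    predecessor (out of order or overlapping), or None if there is none.

    Single pass, no sorting: sufficient only if the sender is required
    to list chunks in increasing position order.
    """
    end_of_previous = 0
    for index, chunk in enumerate(chunks):
        if chunk.position < end_of_previous:
            return index
        end_of_previous = chunk.position + chunk.length
    return None


def has_overlap_unsorted(chunks: List[ReadChunk]) -> bool:
    """Overlap check when chunks may arrive in any order: the receiver
    has to sort by position before it can run the same single pass."""
    ordered = sorted(chunks, key=lambda chunk: chunk.position)
    return first_bad_chunk(ordered) is not None
```

With a sender-side ordering rule, the Responder needs only `first_bad_chunk`; without it, something like the sort in `has_overlap_unsorted` seems hard to avoid.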