Re: [nfsv4] Credit management and one-way messages

Chuck Lever <> Tue, 08 August 2017 16:06 UTC


> On Aug 8, 2017, at 7:30 AM, David Noveck <> wrote:
> > A requester is responsible for posting enough receive buffers to
> > catch Replies for all outstanding RPC Calls. 
> Agree with that.  
> > So either:
> > A. It can post a receive buffer as part of preparing to send a Call.
> > If a Reply is missed, that receive is still posted. The requester
> > has to accommodate for that.
> Yes.
> > B. It can batch-post receive buffers (say, as part of receive
> > completion handling) in order to keep more than enough available.
> With that formulation, receives are still posted in the event of a missed reply but
> they are not considered a problem.
> What we seem to agree on, and what A and B agree on, is that the responder is free
> to respond to requests without obtaining credits to validate his right to do so.

An RPC-over-RDMA version 1 requester always gates RPC Calls based
on how many credits were granted. The responder depends on that
behavior to know how many Replies it may send. The responder
grants credits, so it is in control of any "rights" to send more
Replies. There is no purpose for sending more Replies than there
have been Calls.

> > The problem, as I see it, occurs when a responder wants to send a
> > one-way message to a requester. 
> I don't think you can send a one-way message to a "requester" or "responder".

Yes, you can. This is exactly the problem with RPC-over-RDMA version 1.

We have a fundamentally asymmetric arrangement here. One side always
requests credits, and one side always grants them. One side always
sends RPC Calls, and one side always sends RPC Replies.

The requester requests credits, and the responder grants them.

> A party is either a requester or responder only in the context of an RPC.

The transport implementation responsibilities are different depending
on whether the receiver is the requester or the responder. This is
one reason bidirectional RPC-over-RDMA is so challenging.

> If you are not in the process of sending or processing an RPC, you are not a
> requester or a responder, although you are either a client or a server.
> > In scenario A, that knocks down a
> > receive buffer that was intended to catch a Reply. 
> It knocks down a receive buffer, clearly.

We're talking only about RPC-over-RDMA version 1 at this moment, in
just one direction. So the only thing that can happen on a requester
is that a receive buffer is posted to catch a Reply.

> Specific receives are not dedicated to specific purposes but I think that one-way
> messages, unlike replies, should require a credit for the sender to send them,
> and should not deplete the pool of receives for which there are no associated
> credits (i.e. the ones posted to accommodate replies).

This is very confusing. First you say "specific receives are not
dedicated to specific purposes" (which is true) then you say that
"one-way messages should not deplete the pool of receives for which
there are no associated credits".

How can there be receives with no associated credits if receives
are not dedicated to a specific purpose?

So now we have three purposes for receive buffers:

 - Catching RPC Calls in either direction
 - Catching RPC Replies in either direction
 - Catching one-way messages

Another way to approach the problem of these "one-way" messages is
to treat them as a control plane and use a separate mechanism for
conveying them. For example, at connection setup, the two sides
can send R_key and offset for a small buffer that can be used
for these messages.
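
[A minimal sketch of that idea, not a proposal for wire format. All
names here (ControlBufferAd, ToyPeer, remote_write) are hypothetical,
memory is a plain bytearray, and remote_write() only stands in for an
RDMA Write. The point it illustrates is that a message delivered this
way does not consume one of the receiver's posted receives.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlBufferAd:
    """What one side would advertise at connection setup (hypothetical)."""
    rkey: int
    offset: int
    length: int


class ToyPeer:
    """Toy endpoint: a bytearray plays the role of registered memory."""

    def __init__(self, rkey, length, posted_receives):
        self.memory = bytearray(length)
        self.ad = ControlBufferAd(rkey=rkey, offset=0, length=length)
        self.posted_receives = posted_receives

    def remote_write(self, ad, payload):
        # Stand-in for an RDMA Write issued by the other side.
        if ad != self.ad:
            raise PermissionError("R_key/offset do not match the advertisement")
        if len(payload) > ad.length:
            raise ValueError("control message larger than advertised buffer")
        self.memory[ad.offset:ad.offset + len(payload)] = payload
        # Note: posted_receives is deliberately left untouched.


requester = ToyPeer(rkey=0x1234, length=64, posted_receives=8)
ad = requester.ad                      # exchanged once, at connection setup
requester.remote_write(ad, b"credit update")
```

In this toy model the eight receives posted for Replies are all still
available after the control message arrives.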

This doesn't help for things like message chaining.

> > The requester
> > is responsible for getting a replacement receive buffer posted
> I don't think the receiver is responsible for this, just as it is not responsible for
> replenishing receives taken up by requests.  It may choose to do so but it is
> not obliged  to do so.

In order to prevent RNR and connection termination, the receiver is
responsible for keeping enough receive WRs in the receive queue. I'm
not sure what you are trying to say here.

> > but there's a possibility that it could not do so fast enough
> > to catch other incoming messages.
> Preventing that is what the credit logic is for, and you can use that
> to prevent over-sending one-way messages, just as it prevents over-sending of
> requests.  I accept that one-way messages add some additional issues
> (see below) but the same basic logic is applicable. 
> > A responder is responsible for posting enough receive buffers to
> > catch "credit" RPC Calls. It relies on the fact that requesters
> > can't keep more than "credit" RPC Calls in flight, to know exactly
> > how many receive buffers it needs to keep posted.
> I agree.
> > So either:
> I don't see the point of this typology.
> > A. It posts "credit" receive buffers when accepting a connection,
> This formulation is confusing.  I would say that it posts some
> number of receives and then reports the number of such receives
> as "credit".

Since there are other resources that must be in place before a
connection is accepted, a responder typically knows the maximum
number of credits it will support for this transport instance,
and then prepares those resources, including posting receive
buffers, before completing the accept.

Posting receives before the accept completes prevents a race
where the first Send from the requester might not find a
posted Receive waiting for it on the responder.

> > and then it posts a receive buffer as part of preparing to send a
> > Reply. 
> It can post an additional receive at any time it wants.  It is not tied
> to sending a reply.

Here I'm describing one way of implementing the protocol.

To guarantee that enough receives are posted to handle the balance
of RPC Calls that might be sent, the responder ensures that one new
receive is posted _before_ it sends a Reply.
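
[A rough illustration of that ordering as a toy Python model. The
class and method names are invented for this sketch and are not a real
verbs API; a "posted receive" is just a counter.]

```python
class ToyResponder:
    """Toy model: counters only, no real RDMA verbs are involved."""

    def __init__(self, credits):
        self.credits = credits          # credit grant advertised to the requester
        self.posted = credits           # receives posted before the accept completes
        self.calls_being_processed = 0  # Calls received but not yet replied to

    def on_call(self):
        # An arriving Call consumes one posted receive.
        if self.posted == 0:
            raise RuntimeError("RNR: no receive posted for incoming Call")
        self.posted -= 1
        self.calls_being_processed += 1

    def send_reply(self):
        # Post the replacement receive first, then send the Reply, so the
        # requester never has a Call in flight without a receive to catch it.
        self.posted += 1
        self.calls_being_processed -= 1

    def invariant_holds(self):
        # Every credit is either backed by a posted receive or tied up in
        # a Call that has not been replied to yet.
        return self.posted + self.calls_being_processed == self.credits


responder = ToyResponder(credits=8)
for _ in range(8):
    responder.on_call()       # requester uses its whole credit grant
responder.send_reply()        # one receive is back before the Reply goes out
```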

> > If sending fails, that receive is still posted, and the
> > responder has to accommodate for that.
> I don't see anything to be accommodated.

The transport's Receive Queue and Receive Completion Queue are sized
based on the number of credits the responder supports for this
transport. The responder cannot post more Receives than there are
Receive WQEs and CQEs. In order to prevent a queue overrun, the
responder has to know when to stop posting Receives.
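
[To make the overrun concern concrete, a small sketch under stated
assumptions: ReceiveQueue is a hypothetical stand-in with a fixed
depth, not a real queue pair. It walks through the case where a
receive is posted for a Reply, the Send fails, and the responder then
tries to post again when retrying.]

```python
class ReceiveQueue:
    """Toy receive queue with a fixed depth; not a real verbs object."""

    def __init__(self, depth):
        self.depth = depth   # number of Receive WQEs the queue can hold
        self.posted = 0

    def post_receive(self):
        # Refuse to post past the queue depth instead of overrunning it.
        if self.posted >= self.depth:
            return False
        self.posted += 1
        return True

    def consume(self):
        # An incoming message uses up one posted receive.
        if self.posted == 0:
            raise RuntimeError("no receive posted")
        self.posted -= 1


rq = ReceiveQueue(depth=8)
for _ in range(8):
    rq.post_receive()              # posted before the accept completes

rq.consume()                       # one Call arrives
first_post = rq.post_receive()     # posted ahead of the Reply; the Send then fails
second_post = rq.post_receive()    # posting again on retry would overrun the queue
```

Here the first post succeeds and the second is refused, which is the
bookkeeping the responder has to do for itself after a failed Send.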

> The responder is
> required to report as "credit" the number of receives that have
> been posted.  This situation is unlike the other "A" above.
> > B. It can batch-post receive buffers in order to keep more than
> > enough available.
> There is no definition of "enough" or "more than enough".  The responder
> chooses how many receives to post and should report that as the number of
> credits.

If, as you say, there is no relationship between sending a Reply
and posting a Receive, how does the responder know how many credits
it should grant in each Reply?

The number of granted credits typically does not change. It doesn't
go down after the requester has sent an RPC Call. It doesn't go back
up after the responder has sent an RPC Reply.

The number of credits granted is the total number of RPC Calls that
may be in flight at once.

Suppose a responder grants 8 credits. After the requester has sent
8 RPC Calls, the responder's credit grant is still 8. But the
requester cannot send another RPC Call until the responder sends one
Reply with 8 or more in its rdma_credit field.

There is more than just the value in the rdma_credit field. Both
sides count the number of RPCs in flight, and subtract those from
the credit limit to determine how many new RPC transactions can be
sent.
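
[The 8-credit example above, worked as a toy Python model. ToyRequester
and its methods are made up for illustration; the only thing taken
from the protocol is that each Reply carries a credit value that
becomes the requester's new limit.]

```python
class ToyRequester:
    """Toy model of requester-side credit accounting."""

    def __init__(self, credit_limit):
        self.credit_limit = credit_limit  # most recent grant from the responder
        self.in_flight = 0                # Calls sent, Reply not yet received

    def available(self):
        # New Calls allowed right now: the grant minus what is in flight.
        return max(self.credit_limit - self.in_flight, 0)

    def send_call(self):
        if self.available() == 0:
            return False                  # gated: must wait for a Reply
        self.in_flight += 1
        return True

    def on_reply(self, rdma_credit):
        # A Reply retires one Call and refreshes the grant.
        self.in_flight -= 1
        self.credit_limit = rdma_credit


req = ToyRequester(credit_limit=8)
sent = [req.send_call() for _ in range(8)]   # all 8 Calls go out
blocked = req.send_call()                    # ninth Call is gated
req.on_reply(rdma_credit=8)                  # grant is still 8
after_reply = req.available()                # one Call may now be sent
```

The grant never moved off 8; what changed is the in-flight count.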

> > The problem, as I see it, occurs when a requester sends enough
> > one-way messages that it prevents the transmission of more
> > RPC Calls. 
> So any solution has to prevent that from happening.  I guess what I proposed was:
> 	• Some items considered one-way messages are used in pairs, even though they are not part of RPC's that it is the job of the transport.

The problem in that case is that both implementations of which I am
aware use RPC gating to prevent credit overflow. This is an
implementation problem, not a protocol problem, however.

> 	• You can send otherwise "empty" one-way messages if you need to send credit information.

This would be the same as an ACK, and essentially makes one-way
messages into two-way messages again. OK.

But it doesn't appear to help in cases where one half of a two-way
message is dropped.

> > The responder cannot send Replies to those one-way
> > messages, so the requester has no way to know when the responder
> > is ready to receive another RPC Call.
> He can send other one-way messages to get the credit information to the peer.
> > I think that credit management will have to take a different
> > form if we believe one-way messages are a necessary part of the
> > future of RPC-over-RDMA.
> I don't believe one-way messages per se are a necessary part of the future of RPC-over-RDMA.
> It just seems simpler to me to add them than it does to you.

IMO you are simplifying by

- ignoring the issues of backwards compatibility with
   RPC-over-RDMA version 1
- ignoring the need to address lost messages or dropped RPCs

> The thing that I think is necessary to the future of RPC-over-RDMA, if it has one,
> is that there be some provision for the transfer of control messages used by the
> transport itself.  If there is a need to pair them, I'm OK with that but I feel Version 1 is
> fundamentally handicapped by a structure in which the only messages to be sent are
> either a request to be sent on behalf of a RPC Requester or a reply to be sent on behalf of
> an RPC Responder.

I think that's what I've been saying all along.

> It may be that RPC-over-RDMA will not have a future and that 
> relying on new pNFS mapping types is adequate, but I'd prefer us to have multiple paths forward.

Chuck Lever