Re: [nfsv4] Credit management and one-way messages

Chuck Lever <chuck.lever@oracle.com> Tue, 01 August 2017 15:50 UTC

> On Jul 31, 2017, at 2:47 PM, David Noveck <davenoveck@gmail.com> wrote:
> 
> One issue that has been mentioned as a possible impediment to advancing RPC-over-RDMA to be a working group document, in its current form, concerns its use of one-way messages to convey transport properties.  These concerns seem to arise from the presentation of credit management as a "request-grant protocol".  If the credit logic in fact depended on that pairing, then there would be no room for one-way messages.
> 
> In fact, it doesn't. If you look at the actual operation of credit management, you see a different picture from what is stated in the introductory paragraph of Section 3.1.1 of RFC 8166:
> 
> Flow control for RDMA Send operations directed to the Responder is implemented as a simple request/grant protocol in the RPC-over-RDMA header associated with each RPC message.
> 
> What follows focuses on the role of the grant, giving extensive information about how it is computed and why it is valid for the requester to use it.
> 
> While the text says "Practically speaking, the critical value is the granted value", it is not clear what the exact role of the request is.  It appears to be a hint which the receiver is under no obligation to take any notice of.  So it appears that the presentation of the credit management approach as a request-grant protocol is a reflection of the fact that, when it is used to transport RPC Requests and Replies, this pairing always exists.  When grants are sent with one-way messages, there is no problem that arises from the fact that there is no corresponding credit request.  The sender simply informs the receiver about his own receive resources and thus his ability to accept further sends.

The request-grant protocol sets an upper bound for the number of
messages that can be in flight at once. You are addressing the
narrow issue of how a receiver should interpret the value in the
rdma_credits field to set the negotiated credit limit.

The purpose of this limit is to prevent a sender from transmitting
more messages than the receiver has available receive buffers.

For example, a requester cannot send an RPC Call without first
ensuring there is a receive buffer available to catch the RPC
Reply. A
responder cannot send an RPC Reply without first ensuring there
are an appropriate number of receive buffers ready for the granted
number of credits, minus the number of RPC transactions it is
currently processing. A requester computes the number of remote
receive buffers that are available, and thus how many more RPC
Calls it can send before waiting, by observing the number of
Replies it has received.
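
To make the requester-side accounting above concrete, here is a
minimal sketch. The class and method names are hypothetical; they
are not taken from RFC 8166 or from any implementation. It assumes
every Call is eventually matched by exactly one Reply.

```python
class RequesterCredits:
    """Hypothetical requester-side view of the credit limit."""

    def __init__(self, granted):
        self.granted = granted    # most recent grant seen from the responder
        self.outstanding = 0      # Calls sent whose Replies have not arrived

    def can_send(self):
        # Another Call may be sent only while we are under the limit.
        return self.outstanding < self.granted

    def send_call(self):
        if not self.can_send():
            raise RuntimeError("credit limit reached; wait for a Reply")
        self.outstanding += 1

    def receive_reply(self, new_grant):
        # Each Reply implies the responder has a receive buffer again,
        # and carries the responder's current grant.
        self.outstanding -= 1
        self.granted = new_grant
```

The only event that ever lowers "outstanding" is the arrival of a
Reply. That is the sense in which the limit depends on the two-way
interchange.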

In other words, the credit limit is _enforced_ via the two-way
interchange. The underlying and more significant issue is therefore
that, when unidirectional messages are introduced, neither side can
properly compute the number of new messages that may be sent
relative to the negotiated upper bound.
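
A small sketch of that problem, under the assumption that the
sender's only feedback is the count of Replies it has seen. The
function and parameter names are made up for illustration.

```python
def sends_believed_available(granted, calls_sent, replies_seen,
                             one_way_sent, count_one_way):
    """How many more messages the sender believes it may send.

    count_one_way selects one of two possible (hypothetical) policies:
    charge one-way messages against the limit, or do not.
    """
    in_flight = calls_sent - replies_seen
    if count_one_way:
        # One-way messages never produce a Reply, so nothing ever
        # removes them from this total.
        in_flight += one_way_sent
    return max(granted - in_flight, 0)


# Four one-way messages sent against a grant of four, no RPC traffic:
stalled = sends_believed_available(4, 0, 0, 4, count_one_way=True)
unchecked = sends_believed_available(4, 0, 0, 4, count_one_way=False)
```

With the first policy the sender stalls at zero with no event that
can release it. With the second it believes all four credits remain,
even though it has no information about whether the receiver has
reposted the buffers those four messages consumed. Neither answer is
reliably the true number.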

Another header field could be introduced that contains the exact
number of receive buffers available on the sender when that message
was sent. Or such a field could replace rdma_credits. Either would
be another step away from general compatibility with RPC-over-RDMA
version 1.
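
As a rough illustration of the compatibility cost, assume the new
field is simply appended as a fifth 32-bit word after the four fixed
words of the version 1 header (rdma_xid, rdma_vers, rdma_credit,
rdma_proc). The field name "rcv_bufs" and its placement are
assumptions made only for this sketch; nothing like it is specified
anywhere.

```python
import struct

# Fixed part of the version 1 header: four big-endian 32-bit XDR words.
V1_FIXED = struct.Struct(">4I")

# Hypothetical extended layout: one extra word reporting how many
# receive buffers the sender had posted when the message was sent.
EXT_FIXED = struct.Struct(">5I")


def pack_ext(xid, vers, credit, proc, rcv_bufs):
    """Build the hypothetical extended fixed header."""
    return EXT_FIXED.pack(xid, vers, credit, proc, rcv_bufs)


def parse_as_v1(buf):
    """Parse the fixed part the way a version 1 receiver would."""
    xid, vers, credit, proc = V1_FIXED.unpack_from(buf, 0)
    return {"xid": xid, "vers": vers, "credit": credit, "proc": proc,
            "rest": buf[V1_FIXED.size:]}
```

A receiver that knows only the four-word layout would treat the
extra word as the start of whatever follows the fixed header, so the
two layouts cannot be mixed on one connection without negotiation.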


> There is one potential issue connected with use of one-way messages to be addressed.  If the receiver has N credits and then N one-way messages are sent without any traffic in the opposite direction, then it is possible for a deadlock to result, since there would be no way for the sender to find out about the receive resources.  For most one-way messages, this is not a problem, since many one-way messages naturally give rise to messages in the opposite direction even if the relationship is not formalized within an RPC paradigm.  For example:
> 	• The RDMA2_CONNPROP sent by the client to the server is paired with an RDMA2_CONNPROP in the opposite direction.
> 	• An RDMA2_REQPROP results in an RDMA2_RESPROP sent in response.
> RDMA2_UPDPROP is an exception.  In the unlikely event that a large series of such messages is sent in one direction while there are no RPCs being sent to the receiver, it is possible for a deadlock to arise.  This would result in a situation in which the receiver would have to send some message back in the other direction to provide a credit grant.   The most likely way to do that is for it to choose to send an RDMA2_UPDPROP with an empty property set in the reverse direction.
> 
> One other problem with one-way messages as they stand in rpcrdma-version-two concerns section 7.2.2 in which the following item in the bullet list is problematic:
> 	• When the rdma_proc field has the value RDMA2_OPTIONAL and no RPC message payload is present, a Requester MUST set the value of the rdma_optdir field to CALL, and a Responder MUST set the value of the rdma_optdir field to REPLY.  The Requester chooses a value for the rdma_xid field from the XID space that matches the message's direction.  Requesters and Responders set the rdma_credit field in a similar fashion: a value is set that is appropriate for the direction of the message.
> This cannot be acted on, because, in the context of a one-way message, it is not clear which party is the requester and which is the responder.  While the roles of client and server are fixed and clear, the roles of requester and responder vary from RPC to RPC.  If you are not in an RPC context, then any decision as to who is the requester or responder is arbitrary.
> I think the best way to address these issues would be for rpcrdma-version-two to:
> 	• Provide an explanation of credit management not so tied to the RPC paradigm.

That is appropriate to explore, but is much easier said than
done.

But first, we need to define what a unidirectional message is,
and even that's harder than it looks. Here are some examples.

 + An RPC Call that does not expect a Reply is unidirectional.

 + A one-way control message is unidirectional.

 + An RPC Call that is dropped or ignored by a responder is
   unidirectional.

 + An RPC retransmission without a connection break makes
   either the original or the retransmitted message
   unidirectional.

 + An RPC Call becomes unidirectional when the matching RPC
   Reply is lost for some reason.

 + RDMA_DONE would be unidirectional. (RDMA_DONE is proposed
   as a mechanism to manage Read chunks in RPC Replies in
   draft-cel-nfsv4-rpcrdma-reliable-reply-00).

 + An extension might introduce some new form of unidirectional
   message if the problematic text in section 7.2.2 is modified
   or removed.

These are some of the ways a requester or responder can lose
synchronization of outstanding credits.


> 	• Add a new value for the direction field for one-way messages

In some of the above cases, the sender does not know that a
unidirectional message is being sent, and thus cannot properly
set the direction field. 


> 	• Provide that one-way messages always contain a credit grant rather than a credit request

It doesn't make sense to me to send a message from a requester to
a responder with a "grant" credit value. What value would the
requester put in the rdma_credits field in the case where the
responder and requester have a different number of receive
buffers available?

It might be stronger to have a special value for either the
direction or the rdma_credits field which means "the value in the
rdma_credits field should be ignored".
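
One way such a sentinel could be handled on the receiving side is
sketched below. The constant and its value are purely hypothetical;
no such value is defined in RFC 8166 or in rpcrdma-version-two.

```python
# Hypothetical sentinel meaning "this message carries no credit
# information". The value is chosen arbitrarily for illustration.
CREDITS_IGNORE = 0xFFFFFFFF


def apply_credit_field(current_limit, credit_field):
    """Return the credit limit to use after receiving a message."""
    if credit_field == CREDITS_IGNORE:
        # Keep whatever limit was already in effect.
        return current_limit
    return credit_field
```

This only tells the receiver not to update its limit. It does not by
itself solve the accounting of the message that carried the sentinel.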

But see above: sometimes a sender cannot know in advance whether
a message is unidirectional.


> 	• Explain how the potential deadlock with RDMA2_UPDPROP can be avoided.

In addition, to handle cases where a message becomes unidirectional
after it is sent, RPC-over-RDMA needs a reliable and non-destructive
mechanism for resynchronizing the number of outstanding credits at
the sender and receiver.

Today that mechanism is to drop and re-establish a connection (which
is 100% reliable but is not non-destructive).

We are introducing mechanisms that can build up state associated
with a connection (using unidirectional messages, possibly). That
state is lost and must be re-established if the connection is
dropped. Not to mention all the outstanding RPCs that have to be
retransmitted.


--
Chuck Lever