Re: [nfsv4] draft-ietf-nfsv4-rpcrdma-bidirection-03 review

Chuck Lever <chuck.lever@oracle.com> Fri, 03 June 2016 19:35 UTC

To: David Noveck <davenoveck@gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/nfsv4/2hVWuA3jescnHbokkmeV5pz8Xe4>
Cc: NFSv4 <nfsv4@ietf.org>

> On Jun 3, 2016, at 2:31 PM, David Noveck <davenoveck@gmail.com> wrote:
> 
> > I concur that rfc5666bis and rfc5666-implementation-experience
> > are ready to move forward.
> 
> I guess the implication is that bidirection is not ready and I'm unclear about the reason for that.
> 
> > However, I've been thinking about an earlier comment Dave made
> > in this thread:
> 
> > > My only concern is with regard to a possible future extension in which multiple RPC messages are carried in a single SEND (i.e., the precise opposite of message continuation).  
> > > When the language for the new field is drafted, we should make sure that it doesn't assume all the messages in a SEND are all related to a single direction of operation.
> 
> > There seem to be two related issues, and one of them has
> > ramifications for rpcrdma-bidirection.
> 
> A few things I'd like to note here:
> 	• As far as I'm concerned, I only raised a single issue.
> 	• My issue is only with regard to how to deal with a potential Version Two extension issue.
> 	• When I said it was my "only concern" (emphasis added), I hoped to indicate that, as a potential future extension, this should not hold us up now.
> 	• Although it may relate to bidirection in general, I don't think it affects use of it in the Version one case. 

Issue I relates to the Version One case because we may want
to rewrite parts of the rpcrdma-bidirection I-D, which
has a direct impact on NFSv4.1 on RPC-over-RDMA V1.

And recall that we stated earlier that rpcrdma-bidirection
was not meant to apply solely to V1. The current V2
specification requires support for RPC bidirection as
specified in rpcrdma-bidirection.


> > I. Ambiguity of the meaning of the credit field
> >
> > In the rfc5666bis world, the credit field in all messages
> > flowing on a connection had an unambiguous meaning. If
> > the message was going from requester to responder, the
> > credit field was a credit request. In the other direction,
> > it was a credit grant.
> 
> That remains the case, even with bidirection.  Your last
> two sentences are still true.

> The problem is that, when bidirection is in effect, it is not
> always clear who is the requester and who is the responder:
> The sender always knows but the receiver might not know and,
> in the case where a message is not decipherable, the question
> might not be answerable.
> 
> So,
> 	• With only forward direction operation, if the message was going from client to server, the credit field is a credit request. In the other direction, it is a credit grant.
> 	• With bidirectional operation, the corresponding statement with requester/responder is still true, but the code in the client and server, when it receives a message, does not know whether it is acting as requester or responder :-(
> > rpcrdma-bidirection introduced the idea of two separate
> > credit flows, which depend on whether the message was
> > part of forward RPC operation or backward RPC operation
> > on that connection.

There may be problematic language in rfc5666bis as well.

RFC 5666 Section 3.3 attaches credit requests to RPC Call
messages and credit grants to RPC Reply messages.

The updated language in rfc5666bis has this: credit
requests and grants are described in Sections 4.3.1 and
5.2.3 as being tied to requester and responder.

So, this is a problem not only for bidirectional operation
but also for any type of message where there are multiple
or no RPC messages associated with the RPC-over-RDMA
message.
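
As a rough illustration of that ambiguity, here is a minimal
sketch of a receiver that tries to derive the meaning of the
credit field from the RPC messages carried in a single
RPC-over-RDMA message. The function name and the "call"/"reply"
strings are hypothetical and come from neither RFC 5666 nor
rfc5666bis. The rule encoded is the RFC 5666 Section 3.3 one:
Calls carry requests, Replies carry grants.

```python
from typing import Optional, Sequence

CALL, REPLY = "call", "reply"  # hypothetical labels for RPC message types


def credit_meaning_from_payload(rpc_msgs: Sequence[str]) -> Optional[str]:
    """Return "request" or "grant", or None when the payload cannot decide.

    rpc_msgs lists the types of the RPC messages conveyed by one
    RPC-over-RDMA message (assumed already parsed by the receiver).
    """
    kinds = set(rpc_msgs)
    if kinds == {CALL}:
        return "request"   # only Calls: credit field is a request
    if kinds == {REPLY}:
        return "grant"     # only Replies: credit field is a grant
    # No RPC message at all (e.g. a control message), or a mix of
    # Calls and Replies: the payload gives no usable answer.
    return None


print(credit_meaning_from_payload([CALL]))         # one Call
print(credit_meaning_from_payload([]))             # no RPC message
print(credit_meaning_from_payload([CALL, REPLY]))  # mixed directions
```

The empty and mixed cases are the ones that leave the receiver
without an answer.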

My proposal is this:

The roles of client (active connector) and server (passive
accepter) are always the same for a given connection, and
are independent of direction of RPC operation.

Thus there is never any ambiguity about what the credit
field means: the active side always makes requests, and
the passive side always makes grants.
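
A minimal sketch of the proposal, assuming each endpoint records
at connection setup whether it was the active or the passive
side. The type and function names are hypothetical and are not
taken from any of the drafts. The point is only that the lookup
takes no RPC-level input.

```python
from enum import Enum


class ConnRole(Enum):
    ACTIVE = "active"    # side that initiated the connection (client)
    PASSIVE = "passive"  # side that accepted the connection (server)


class CreditMeaning(Enum):
    REQUEST = "request"
    GRANT = "grant"


def credit_meaning(sender_role: ConnRole) -> CreditMeaning:
    """Interpret the credit field from the sender's connection role alone."""
    if sender_role is ConnRole.ACTIVE:
        return CreditMeaning.REQUEST
    return CreditMeaning.GRANT


def received_credit_meaning(my_role: ConnRole) -> CreditMeaning:
    """What the credit field means in any message this endpoint receives."""
    peer = ConnRole.PASSIVE if my_role is ConnRole.ACTIVE else ConnRole.ACTIVE
    return credit_meaning(peer)


# The server (passive side) always receives requests, whether the
# message belongs to forward or backward direction RPC operation.
print(received_credit_meaning(ConnRole.PASSIVE).value)
# The client (active side) always receives grants.
print(received_credit_meaning(ConnRole.ACTIVE).value)
```

Because the role is fixed for the life of the connection, the
receiver can interpret the credit field even when the rest of
the message carries no RPC message or cannot be deciphered.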


> > This is the source of the problem.
> 
> It solved one problem, but it created another.  The question we have is
> what to do about it.

Agreed.


> > The meaning of the credit field depends on a field in
> > the upper layer, so it's a layering violation, to say
> > the least.
> 
> I agree that it is a layering violation and it may even be a layering misdemeanor.
> I don't think it is a layering felony, and I hope we can avoid any
> Draconian penalty.
> 
> > Also, the original concept of credit was to manage RDMA
> > receive buffers, but the rpcrdma-bidirection interpretation
> > of the credit value is one-credit-per-RPC.
> 
> If it switched to that, I think of it as a purely verbal mistake, which is understandable 
> given the context, i.e. that we are in a one-message-per-RPC environment.  In any case,
> I don't see how that shift is related to bidirectional operation.

Given that the one-credit-per-RPC language appears as
early as RFC 5666, perhaps you are correct.


> > Now if we want to add support for partial RPC message
> > per credit (message continuation) 
> 
> I think we do.  I discussed doing that in draft-dnoveck-nfsv4-rpcrdma-rtissues-00.
> 
> I will discuss the credit management issues in draft-dnoveck-nfsv4-rpcrdma-rtrext-00,
> but I'm not sure if I'll be able to submit that before IETF96. 
> 
> > or multiple RPC
> > messages per credit (as proposed above), 
> 
> I haven't actually proposed that but I agree that we don't want to foreclose this option.
> 
> > or NO RPC
> > message per credit (for RDMA2_OPTIONAL control messages)
> 
> I actually have proposed messages that don't carry RPC in draft-dnoveck-nfsv4-rpcrdma-xcharext-00.

Sorry, the above "if" was a rhetorical "if". I do
presume that all three proposals are interesting to
pursue.


> I assumed that they required a credit because receiving the message requires a buffer and there
> is no way to avoid the fact that a buffer (and thus a credit) is used up.  This is true whether the credit 
> field is a request or a grant or is ignored.
> 
> > the credit field is unusable.
> 
> My formulation would be "problematic for some classes of future extensions to the next RPC-over-RDMA
> version".  That's light-years (or even megaparsecs) away from "unusable".

Alright, then, "unreliable" might be closer to my
intended meaning.


> > Would we add an independent pool of credits for each of
> > these transmission mechanisms? Probably not.
> 
> I agree that that is a REAL BAD IDEA. 
> 
> > A logical course of action, then, would be to alter the
> > rpcrdma-bidirection I-D so that forward and backward
> > direction operation use the same pool of credits.
> 
> I think this is a possible course of action.  I don't think it is 
> logical because:
> 	• The potential ambiguity you are worried about is between credit and grant, and having a single pool wouldn't solve that.
> 	• There is no ambiguity about which credit pool you are dealing with, so this, not being broken, doesn't require a fix.

I disagree with that. Having a single credit flow, with
fixed roles of which side sends a request, and which
sends a grant, solves the problem simply and fully, for
all message types.


> 	• There are design/implementation issues that you note below that may make this infeasible.
> 
> > The question is whether it is feasible for the two
> > directions to share the credit pools without deadlocking.
> > Some prototyping and/or careful thought would be needed
> > to answer it.

It is also possible that this issue could be resolved
with language in rpcrdma-version-two, and allow the
one-credit-per-RPC concept to linger in V1 only.

I think that would mean rpcrdma-bidirection is turning
into a V1-only enhancement. Bidirection would have to
be partially or fully specified again in
rpcrdma-version-two.

Having multiple different ways to do backward direction
operation is onerous for implementations.

So I'd prefer to have this addressed in rfc5666bis and
rpcrdma-bidirection if we can muster it.


> > II. Ambiguity of the meaning of the XID field
> 
> I think this issue has no implications for rpcrdma-bidirection.

I agree. We can set this one aside for the moment.


--
Chuck Lever