Re: [nfsv4] draft-ietf-nfsv4-rpcrdma-bidirection-03 review

Chuck Lever <chuck.lever@oracle.com> Fri, 03 June 2016 15:05 UTC

From: Chuck Lever <chuck.lever@oracle.com>
Date: Fri, 03 Jun 2016 11:04:52 -0400
To: David Noveck <davenoveck@gmail.com>, Spencer Shepler <sshepler@microsoft.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/nfsv4/kCRzJmotPDAWpikBrpNYjyUf7kc>
Cc: NFSv4 <nfsv4@ietf.org>
Subject: Re: [nfsv4] draft-ietf-nfsv4-rpcrdma-bidirection-03 review

> On Jun 3, 2016, at 10:33 AM, David Noveck <davenoveck@gmail.com> wrote:
> 
> A week ago I wrote:
> > The problem is that, in a multi-direction context, a given implementation only knows whether it is a client or 
> > a server.  It only knows if it is a requester or a responder based on the message it is processing at any given 
> > instant and if it can't figure that out, it is unsure whether it is a requester or responder and may be neither at
> > particular times.
> 
> It's now been at least two weeks since last call was supposed to end for the Version One RDMA documents
> and a week since Chuck posted any necessary updates.  I'm not sure why the documents are still listed as in 
> WGLC, but I wanted to make it clear that my comments above should not be holding these up.
> 
> Those comments refer to a problem that is inherent in Version One and cannot be fixed within the constraints
> of a Version One no-XDR-change update.  Since the working group has decided to do a Version One no-XDR-change
> update, the three existing RDMA documents should now go forward.  I don't believe there is any disagreement on
> that point.

I concur that rfc5666bis and rfc5666-implementation-experience
are ready to move forward.

However, I've been thinking about an earlier comment Dave made
in this thread:

> My only concern is with regard to a possible future extension in which multiple RPC messages are carried in a single SEND (i.e., the precise opposite of message continuation).  When the language for the new field is drafted, we should make sure that it doesn't assume that the messages in a SEND are all related to a single direction of operation.


There seem to be two related issues, and one of them has
ramifications for rpcrdma-bidirection.


I. Ambiguity of the meaning of the credit field

In the rfc5666bis world, the credit field in all messages
flowing on a connection had an unambiguous meaning. If
the message was going from requester to responder, the
credit field was a credit request. In the other direction,
it was a credit grant.
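
As a rough illustration of that rule (a sketch only; the
names Role and credit_meaning are made up here and come
from no draft), the sender's role alone is enough to
interpret the field:

```python
from enum import Enum


class Role(Enum):
    """Hypothetical labels for the two ends of one RPC direction."""
    REQUESTER = "requester"
    RESPONDER = "responder"


def credit_meaning(sender: Role) -> str:
    """Interpret the credit field from the sender's role alone.

    With a single direction of operation, no other input is needed.
    """
    if sender is Role.REQUESTER:
        return "credit request"
    return "credit grant"
```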

rpcrdma-bidirection introduced the idea of two separate
credit flows. Which flow a credit value belongs to depends
on whether the message is part of forward RPC operation or
backward RPC operation on that connection.

This is the source of the problem.

The meaning of the credit field depends on a field in
the upper layer, so it's a layering violation, to say
the least.
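
To make the dependency concrete, here is a minimal sketch
of what a receiver has to do. It assumes an RDMA_MSG with
three empty chunk lists, so the RPC message starts at byte
offset 28. The helper names build_message and
classify_credit are hypothetical, and only the field layout
(four 32-bit header words, then the chunk lists, then the
RPC XID and msg_type) is taken from the protocol.

```python
import struct

RDMA_MSG = 0        # rdma_proc value: an RPC message follows the header
CALL, REPLY = 0, 1  # RPC msg_type values

RPC_OFFSET = 16 + 12  # fixed header words plus three empty chunk lists


def build_message(xid, credits, msg_type):
    """Build a toy RPC-over-RDMA Version One message with no chunks."""
    header = struct.pack(">4I", xid, 1, credits, RDMA_MSG)
    empty_chunk_lists = struct.pack(">3I", 0, 0, 0)
    rpc_start = struct.pack(">2I", xid, msg_type)
    return header + empty_chunk_lists + rpc_start


def classify_credit(message, receiver_is_client):
    """Return (flow, meaning, credits) for a received message.

    The transport header alone cannot answer this: the receiver
    must read msg_type from the RPC payload.
    """
    _xid, _vers, credits, proc = struct.unpack_from(">4I", message, 0)
    if proc != RDMA_MSG:
        raise ValueError("this sketch handles RDMA_MSG only")
    _rpc_xid, msg_type = struct.unpack_from(">2I", message, RPC_OFFSET)
    # A client receives forward replies and backward calls;
    # a server receives forward calls and backward replies.
    is_forward = (msg_type == REPLY) == receiver_is_client
    flow = "forward" if is_forward else "backward"
    meaning = "request" if msg_type == CALL else "grant"
    return flow, meaning, credits
```

For example, a client that receives a CALL carrying a credit
value of 32 has to treat it as a backward-direction credit
request, and it only learns that from the payload.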

Also, the original concept of credit was to manage RDMA
receive buffers, but rpcrdma-bidirection interprets the
credit value as one credit per RPC.

Now if we want to add support for a partial RPC message
per credit (message continuation), multiple RPC messages
per credit (as proposed above), or no RPC message per
credit (for RDMA2_OPTIONAL control messages), the credit
field is unusable.

Would we add an independent pool of credits for each of
these transmission mechanisms? Probably not.

A logical course of action, then, would be to alter the
rpcrdma-bidirection I-D so that forward and backward
direction operation use the same pool of credits.

The question is whether it is feasible for the two
directions to share a single credit pool without
deadlocking. Some prototyping and/or careful thought
would be needed to answer it.

(Spencer, I may be changing my mind about the readiness
of rpcrdma-bidirection.)
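
A toy model of the concern, not a prototype of any real
implementation: the class SharedCreditPool and its
backward_reserve parameter are invented for illustration.
It assumes each in-flight call holds one credit from a
single pool, and shows how forward calls can leave nothing
for the backward direction unless some credits are held
back.

```python
class SharedCreditPool:
    """Toy pool of credits shared by forward and backward calls."""

    def __init__(self, size, backward_reserve=0):
        self.size = size
        # Credits that forward calls are never allowed to consume.
        self.backward_reserve = backward_reserve
        self.in_use = {"forward": 0, "backward": 0}

    def try_acquire(self, direction):
        """Take one credit for a call; return False if none is available."""
        total = self.in_use["forward"] + self.in_use["backward"]
        if total >= self.size:
            return False
        if direction == "forward":
            forward_limit = self.size - self.backward_reserve
            if self.in_use["forward"] >= forward_limit:
                return False
        self.in_use[direction] += 1
        return True

    def release(self, direction):
        """Return one credit when the matching reply arrives."""
        if self.in_use[direction] == 0:
            raise ValueError("no credit held for " + direction)
        self.in_use[direction] -= 1
```

With SharedCreditPool(2), two forward calls exhaust the
pool and a backward call cannot be sent. If those forward
calls cannot complete until a backward call does, nothing
moves. With backward_reserve=1 the second forward call is
refused instead and the backward call still gets a credit.
Whether a fixed reserve is sufficient in practice is exactly
the open question.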


II. Ambiguity of the meaning of the XID field

Today we have a one-to-one match between the XID value
in the RPC-over-RDMA header and the XID associated with
the RPC payload in that message.

With an extension for message continuation, there is
still a one-to-one match: the RPC-over-RDMA XID can
match the XID of the partial RPC payload.

But for multiple RPC payloads per Send, or no RPC
payloads per Send (control messages), we no longer have
a match. How should the RPC-over-RDMA header's XID field
be set in those cases?
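
Put as a sketch (header_xid_for is a hypothetical helper,
not text from any draft), the existing rule only yields an
answer when a Send carries exactly one RPC payload, whole
or partial:

```python
def header_xid_for(payload_xids):
    """Pick the RPC-over-RDMA header XID under the one-to-one rule.

    payload_xids lists the XID of each RPC payload carried in one
    Send. Returns None when the rule gives no answer.
    """
    if len(payload_xids) == 1:
        # Whole message or one continuation segment: copy the XID.
        return payload_xids[0]
    # Zero payloads (control message) or several payloads:
    # there is no single payload XID to copy.
    return None
```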

RFC 5666 and rfc5666bis have similar language regarding
how the XID field in the RPC-over-RDMA header is to be
set. Here's rfc5666bis, Section 5.2.1:

>    The
>    receiver MAY perform its processing based solely on the XID in the
>    RPC-over-RDMA header, and thereby ignore the XID in the RPC message,
>    if it so chooses.


In other words, implementations are already allowed to
rely on the value of the RPC-over-RDMA header's XID
field, and ignore the XID field in the payload.

This issue has to be addressed for rpcrdma-version-two
to support extensions that enable multiple or no RPCs
per message.

Here I propose an independent XID space for
RDMA2_OPTIONAL messages. This would allow receivers
to match RDMA2_OPTIONAL call and reply messages, but
would not tie the header's XID value to the upper
layer payload in any way.
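
A minimal sketch of what that could look like, assuming the
receiver can tell from the header's procedure value which
space a message belongs to. XidSpace and Endpoint are
illustrative names only.

```python
import itertools


class XidSpace:
    """One XID allocator plus its table of calls awaiting replies."""

    def __init__(self, start=1):
        self._next = itertools.count(start)
        self._pending = {}

    def send_call(self, description):
        """Allocate a 32-bit XID and remember the outstanding call."""
        xid = next(self._next) & 0xFFFFFFFF
        self._pending[xid] = description
        return xid

    def match_reply(self, xid):
        """Return the call this reply answers, or None if unknown."""
        return self._pending.pop(xid, None)


class Endpoint:
    """Keeps RPC XIDs and RDMA2_OPTIONAL XIDs in separate spaces."""

    def __init__(self):
        self.rpc = XidSpace()
        self.optional = XidSpace()
```

The same numeric XID can then be outstanding in both
spaces at once without confusion, because replies are
looked up only in the space their message type selects.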


--
Chuck Lever