Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design

Chuck Lever <chuck.lever@oracle.com> Thu, 04 August 2016 16:35 UTC

Return-Path: <chuck.lever@oracle.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 2FA1C12D63E for <nfsv4@ietfa.amsl.com>; Thu, 4 Aug 2016 09:35:47 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.507
X-Spam-Level:
X-Spam-Status: No, score=-5.507 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-1.287, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id P0NfUTjXhBqn for <nfsv4@ietfa.amsl.com>; Thu, 4 Aug 2016 09:35:45 -0700 (PDT)
Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 3354412D178 for <nfsv4@ietf.org>; Thu, 4 Aug 2016 09:35:45 -0700 (PDT)
Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u74GZhsA019575 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 4 Aug 2016 16:35:43 GMT
Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserv0021.oracle.com (8.13.8/8.13.8) with ESMTP id u74GZgUp021221 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 4 Aug 2016 16:35:43 GMT
Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id u74GZfkZ026210; Thu, 4 Aug 2016 16:35:42 GMT
Received: from anon-dhcp-171.1015granger.net (/68.46.169.226) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 04 Aug 2016 09:35:41 -0700
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CADaq8jdor+Ju+F=ZBcV6zY3PWJerpM_sDtuTPy9EZTo6hymFPQ@mail.gmail.com>
Date: Thu, 04 Aug 2016 12:35:39 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <7E1F0B4A-EE1D-4686-BC9B-358AF8FCF095@oracle.com>
References: <CADaq8jdor+Ju+F=ZBcV6zY3PWJerpM_sDtuTPy9EZTo6hymFPQ@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
X-Mailer: Apple Mail (2.3124)
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/yor3xDzhtf5FYQAU6MENSp5uEdg>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 04 Aug 2016 16:35:47 -0000

Hi Dave-

Thanks for your review comments.


> On Aug 4, 2016, at 10:48 AM, David Noveck <davenoveck@gmail.com> wrote:
> 
> The first issue concerns the structure rpcrdma2_chunk_lists.  Although this structure is defined in draft-cel-nfsv4-rpcrdma-version-two-01, there is nothing that uses it.  The structure that RDMA_MSG and RDMA_NOMSG include is rpcrdma2_chunks, which is not defined in the XDR.  It appear that this issue has existed since draft-cel-nfsv4-rpcrdma-version-two-00. 
> 
> Although that issue can and should be fixed, I think we need to have a more complete discussion of the extension model for version two and for RPC-over-RDMA as a whole in the near term.
> 
> As I understand things, the original intention was that Version Two be a compatible extension of Version One.  Now, with the inclusion on the direction indication added in -01, and now the R-key proposed in reminv-design, Version One and Version Two messages will be OTW-incompatible, and I think are better off with a model in which Version Two consists of  subset which equal to Version One and a set of additions, primarily optional.  I think we should diverge from that model only if the benefits are sufficiently important to justify this discontinuity.

Yes, the original intention was to be as compatible as possible,
and any XDR divergence with Version One would not be undertaken
lightly. That preference is why we have elected to defer changes
like this one to extensions when it makes sense.

However, with a version number bump, we are no longer entirely
shackled by existing implementations or XDR definitions. It's
very easy to take that too far, of course, which is why the
default choice is to extend rather than alter the base XDR.

In this case, however, I think the benefits outweigh the costs
of altering the base XDR.


> With these changes, the implementation barrier to convert a Version One implementation to be compatible Version Two becomes significant.  I think we would be better off, if most Version One implementations could be made compatible with very easily, making the decision to do so more or less automatic.

In Linux, it's simply a matter of adding an "if vers == 2,
insert (or expect) two u32 fields here." With an rpcgen-based
implementation, the change is also fairly trivial (in fact most
of it is handled by machine-generated code).

Speaking as an implementer, having to support two new RDMA
message types (in addition to RDMA_OPT_INIT_XCHAR) is much more
effort than having to deal with one or two extra fields in
rpcrdma2_chunk_lists.


> To get back to remote invalidation, I would prefer your section 3.3 as a baseline to be made accessible with no XDR changes in the base Version Two.  The additional functionality provided by 3.4, while desirable, is not of sufficient benefit to justify a non-compatible XDR change.  I feel that this functionality should be available as an OPTIONAL extension.

The benefit of the new field can be described this way:

RPC-over-RDMA allows multiple handles per RPC.

The burden of selecting a handle to invalidate remotely should
be the client's. I believe one existing client implementation
does mix persistently registered handles with dynamically
registered handles in the same RPC. A client can't use RI at
all if it can't tolerate an arbitrary choice of which handle
in an RPC is invalidated remotely.

The "big switch" approach is simply not generic enough when
multiple handles are in play and we don't have control over
client implementation choices. It would exclude a portion of
implementations, limiting the appeal of RPC-over-RDMA Version
Two.

In the case of SMB Direct, all implementers decided that they
would go with all FRWR registration. A big switch works in
that scenario. In fact, the switch is always "on".

Providing rdma_inv_handle in each RPC-over-RDMA header goes
along with the design of having lists of segments, each with
their own handle.

Should the protocol be designed to discourage implementations
that need to communicate handles on a per-RPC basis in favor
of ones that can work with just an exchange of a transport
characteristic?


> When I say that it should be an "OPTIONAL Extension", I don't mean to imply:
> 	• That it needs to be implemented as a subcase of RDMA_OPTIONAL.
> 	• That it should be documented in a separate document, as opposed to being documented (eventually) in draft-ietf-nfsv4-rpcrdma-version-two.
> What I do mean is that we should define new message type for extensions (so that we maintain Version One as a subset of  Version Two) and that we should (not "SHOULD" :-) make these new message types "OPTIONAL" (in the RFC2119 sense).  I can see, if there is sufficient reason, making an extension REQUIRED, but I don't see a reason to change an existing message type in an incompatible way.
> 
> One way to do this is to define new message types RDMA_MESSAGEX and RDMA_NOMSGX which include direction and rdma_handle but there are other ways to do this.  To make it easier to determine whether support for OPTIONAL message types is present, we could define a transport characteristic/attribute that provides a bit mask of supported message types.

(An xchar that carries a bitmap of supported message types seems
appropriate in the initial set of supported characteristics.)

IIRC, earlier we had decided that message types RDMA_MSG and
RDMA_NOMSG would be REQUIRED. Thus to support Remote Invalidation
in full, an implementation would need to support RDMA_MSG,
RDMA_NOMSG, RDMA_OPT_INIT_XCHAR, RDMA_MESSAGEX, and RDMA_NOMSGX.

But now we are talking about significant XDR changes, and a
significant implementation effort.


--
Chuck Lever