Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design

karen deitke <karen.deitke@oracle.com> Wed, 10 August 2016 19:13 UTC

Return-Path: <karen.deitke@oracle.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id BB51412D841 for <nfsv4@ietfa.amsl.com>; Wed, 10 Aug 2016 12:13:01 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.467
X-Spam-Level:
X-Spam-Status: No, score=-5.467 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-1.247, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id QQFdVpiv0drD for <nfsv4@ietfa.amsl.com>; Wed, 10 Aug 2016 12:12:59 -0700 (PDT)
Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 2B58F12D830 for <nfsv4@ietf.org>; Wed, 10 Aug 2016 12:12:59 -0700 (PDT)
Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u7AJCwsj031508 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for <nfsv4@ietf.org>; Wed, 10 Aug 2016 19:12:58 GMT
Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id u7AJCw8A014730 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for <nfsv4@ietf.org>; Wed, 10 Aug 2016 19:12:58 GMT
Received: from abhmp0017.oracle.com (abhmp0017.oracle.com [141.146.116.23]) by aserv0121.oracle.com (8.13.8/8.13.8) with ESMTP id u7AJCvQ4011607 for <nfsv4@ietf.org>; Wed, 10 Aug 2016 19:12:57 GMT
Received: from [10.159.93.194] (/10.159.93.194) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 10 Aug 2016 12:12:57 -0700
To: nfsv4@ietf.org
References: <CADaq8jdor+Ju+F=ZBcV6zY3PWJerpM_sDtuTPy9EZTo6hymFPQ@mail.gmail.com> <7E1F0B4A-EE1D-4686-BC9B-358AF8FCF095@oracle.com> <CADaq8jecc4dqgcgOZ7yH+n-0tMGZ7XmamM_us7fjAn4+H=C=pQ@mail.gmail.com>
From: karen deitke <karen.deitke@oracle.com>
Organization: Oracle Corporation
Message-ID: <57AB7CB3.1020901@oracle.com>
Date: Wed, 10 Aug 2016 13:12:51 -0600
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <CADaq8jecc4dqgcgOZ7yH+n-0tMGZ7XmamM_us7fjAn4+H=C=pQ@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------050606000302030107090301"
X-Source-IP: aserv0022.oracle.com [141.146.126.234]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/86K0hZ419Q4vo4KjHgGrB-nZxwY>
Subject: Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 10 Aug 2016 19:13:01 -0000


On 8/4/2016 2:49 PM, David Noveck wrote:
> > In this case, however, I think the benefits outweigh the costs
> > of altering the base XDR.
>
> Whether this is the case depends on the proportion of implementations 
> that are
> adequately supported by a simpler approach that does not provide the 
> extended
> functionality of allowing per-request selection of invalidation approach.
>
> If 80% or 90% or 95% of implementations are OK with the simpler 
> implementation,
> the balance of costs and benefits are different than they would be in 
> 30%, 50%, or
> 60% need the'more flexible approach.  I think we need to hear from the 
> rest of the
> working group about what they think is important here.
>
> > A client can't use RI at
> > all if it can't tolerate an arbitrary choice of which handle
> > in an RPC is invalidated remotely.
>
> The question is how common such clients are.
Solaris can represent one such client.  Currently we do not implement 
frwr which is needed for RI, however when we do, we would not be able to 
utilize RI if the client can not identify which chunks should be 
invalidate.  Random invalidation by the server would be detrimental on 
solaris as it would potentially also invalidate other memory buffers 
sharing the same  stag. Additionally it could invalidate other memory we 
intend to keep registered.

>
> > In Linux, it's simply a matter of adding an "if vers == 2,
> > insert (or expect) two u32 fields here."
>
> In lots of places.
>
> > With an rpcgen-based
> > implementation, the change is also fairly trivial (in fact most
> > of it is handled by machine-generated code).
>
> Essentially you have two implementation rather than one extensible one.
>
> > Speaking as an implementer, having to support two new RDMA
> > message types (in addition to RDMA_OPT_INIT_XCHAR) is much more
> > effort than having to deal with one or two extra fields in
> > rpcrdma2_chunk_lists.e problems with the simpler approach.
>
> That would be the case if every implementer had to support those OPTIONAL
> messages types.  If only a few would, the situation would be different.
>
> Those who did not need the more extensive support would not have to 
> deal with any
> XDR change at all.
>
> > Thus to support Remote Invalidation
> > in full, an implementation would need to support RDMA_MSG,
> > RDMA_NOMSG, RDMA_OPT_INIT_XCHAR, RDMA_MESSAGEX, and
> > RDMA_NOMSGX.
>
> so this all boils down to the question of how many implementations 
> want or need to support
> this extended form of remote invalidation.
This is a need for solaris.

Karen
>
>
>
>
> On Thu, Aug 4, 2016 at 12:35 PM, Chuck Lever <chuck.lever@oracle.com 
> <mailto:chuck.lever@oracle.com>> wrote:
>
>     Hi Dave-
>
>     Thanks for your review comments.
>
>
>     > On Aug 4, 2016, at 10:48 AM, David Noveck <davenoveck@gmail.com
>     <mailto:davenoveck@gmail.com>> wrote:
>     >
>     > The first issue concerns the structure rpcrdma2_chunk_lists. 
>     Although this structure is defined in
>     draft-cel-nfsv4-rpcrdma-version-two-01, there is nothing that uses
>     it.  The structure that RDMA_MSG and RDMA_NOMSG include is
>     rpcrdma2_chunks, which is not defined in the XDR.  It appear that
>     this issue has existed since draft-cel-nfsv4-rpcrdma-version-two-00.
>     >
>     > Although that issue can and should be fixed, I think we need to
>     have a more complete discussion of the extension model for version
>     two and for RPC-over-RDMA as a whole in the near term.
>     >
>     > As I understand things, the original intention was that Version
>     Two be a compatible extension of Version One.  Now, with the
>     inclusion on the direction indication added in -01, and now the
>     R-key proposed in reminv-design, Version One and Version Two
>     messages will be OTW-incompatible, and I think are better off with
>     a model in which Version Two consists of  subset which equal to
>     Version One and a set of additions, primarily optional.  I think
>     we should diverge from that model only if the benefits are
>     sufficiently important to justify this discontinuity.
>
>     Yes, the original intention was to be as compatible as possible,
>     and any XDR divergence with Version One would not be undertaken
>     lightly. That preference is why we have elected to defer changes
>     like this one to extensions when it makes sense.
>
>     However, with a version number bump, we are no longer entirely
>     shackled by existing implementations or XDR definitions. It's
>     very easy to take that too far, of course, which is why the
>     default choice is to extend rather than alter the base XDR.
>
>     In this case, however, I think the benefits outweigh the costs
>     of altering the base XDR.
>
>
>     > With these changes, the implementation barrier to convert a
>     Version One implementation to be compatible Version Two becomes
>     significant.  I think we would be better off, if most Version One
>     implementations could be made compatible with very easily, making
>     the decision to do so more or less automatic.
>
>     In Linux, it's simply a matter of adding an "if vers == 2,
>     insert (or expect) two u32 fields here." With an rpcgen-based
>     implementation, the change is also fairly trivial (in fact most
>     of it is handled by machine-generated code).
>
>     Speaking as an implementer, having to support two new RDMA
>     message types (in addition to RDMA_OPT_INIT_XCHAR) is much more
>     effort than having to deal with one or two extra fields in
>     rpcrdma2_chunk_lists.
>
>
>     > To get back to remote invalidation, I would prefer your section
>     3.3 as a baseline to be made accessible with no XDR changes in the
>     base Version Two.  The additional functionality provided by 3.4,
>     while desirable, is not of sufficient benefit to justify a
>     non-compatible XDR change.  I feel that this functionality should
>     be available as an OPTIONAL extension.
>
>     The benefit of the new field can be described this way:
>
>     RPC-over-RDMA allows multiple handles per RPC.
>
>     The burden of selecting a handle to invalidate remotely should
>     be the client's. I believe one existing client implementation
>     does mix persistently registered handles with dynamically
>     registered handles in the same RPC. A client can't use RI at
>     all if it can't tolerate an arbitrary choice of which handle
>     in an RPC is invalidated remotely.
>
>     The "big switch" approach is simply not generic enough when
>     multiple handles are in play and we don't have control over
>     client implementation choices. It would exclude a portion of
>     implementations, limiting the appeal of RPC-over-RDMA Version
>     Two.
>
>     In the case of SMB Direct, all implementers decided that they
>     would go with all FRWR registration. A big switch works in
>     that scenario. In fact, the switch is always "on".
>
>     Providing rdma_inv_handle in each RPC-over-RDMA header goes
>     along with the design of having lists of segments, each with
>     their own handle.
>
>     Should the protocol be designed to discourage implementations
>     that need to communicate handles on a per-RPC basis in favor
>     of ones that can work with just an exchange of a transport
>     characteristic?
>
>
>     > When I say that it should be an "OPTIONAL Extension", I don't
>     mean to imply:
>     >       • That it needs to be implemented as a subcase of
>     RDMA_OPTIONAL.
>     >       • That it should be documented in a separate document, as
>     opposed to being documented (eventually) in
>     draft-ietf-nfsv4-rpcrdma-version-two.
>     > What I do mean is that we should define new message type for
>     extensions (so that we maintain Version One as a subset of 
>     Version Two) and that we should (not "SHOULD" :-) make these new
>     message types "OPTIONAL" (in the RFC2119 sense).  I can see, if
>     there is sufficient reason, making an extension REQUIRED, but I
>     don't see a reason to change an existing message type in an
>     incompatible way.
>     >
>     > One way to do this is to define new message types RDMA_MESSAGEX
>     and RDMA_NOMSGX which include direction and rdma_handle but there
>     are other ways to do this.  To make it easier to determine whether
>     support for OPTIONAL message types is present, we could define a
>     transport characteristic/attribute that provides a bit mask of
>     supported message types.
>
>     (An xchar that carries a bitmap of supported message types seems
>     appropriate in the initial set of supported characteristics.)
>
>     IIRC, earlier we had decided that message types RDMA_MSG and
>     RDMA_NOMSG would be REQUIRED. Thus to support Remote Invalidation
>     in full, an implementation would need to support RDMA_MSG,
>     RDMA_NOMSG, RDMA_OPT_INIT_XCHAR, RDMA_MESSAGEX, and RDMA_NOMSGX.
>
>     But now we are talking about significant XDR changes, and a
>     significant implementation effort.
>
>
>     --
>     Chuck Lever
>
>
>
>
>
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www.ietf.org/mailman/listinfo/nfsv4