Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design

karen deitke <karen.deitke@oracle.com> Thu, 11 August 2016 15:10 UTC

Return-Path: <karen.deitke@oracle.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 833C112D778 for <nfsv4@ietfa.amsl.com>; Thu, 11 Aug 2016 08:10:43 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.467
X-Spam-Level:
X-Spam-Status: No, score=-5.467 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-1.247, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id a-lfYGwU5pFH for <nfsv4@ietfa.amsl.com>; Thu, 11 Aug 2016 08:10:39 -0700 (PDT)
Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 587CF12D750 for <nfsv4@ietf.org>; Thu, 11 Aug 2016 08:10:39 -0700 (PDT)
Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u7BFAbi5026476 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 11 Aug 2016 15:10:37 GMT
Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id u7BFAaNp014884 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 11 Aug 2016 15:10:37 GMT
Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id u7BFAa8T018573; Thu, 11 Aug 2016 15:10:36 GMT
Received: from [10.159.93.194] (/10.159.93.194) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 11 Aug 2016 08:10:35 -0700
To: David Noveck <davenoveck@gmail.com>, nfsv4@ietf.org
References: <CADaq8jdor+Ju+F=ZBcV6zY3PWJerpM_sDtuTPy9EZTo6hymFPQ@mail.gmail.com> <7E1F0B4A-EE1D-4686-BC9B-358AF8FCF095@oracle.com> <CADaq8jecc4dqgcgOZ7yH+n-0tMGZ7XmamM_us7fjAn4+H=C=pQ@mail.gmail.com> <57AB7CB3.1020901@oracle.com> <CADaq8je23qLBbQ-SSrmwOrDv_CSA_hJ7zymLBsKw2ZV4uvLQ+g@mail.gmail.com>
From: karen deitke <karen.deitke@oracle.com>
Organization: Oracle Corporation
Message-ID: <57AC9570.9000905@oracle.com>
Date: Thu, 11 Aug 2016 09:10:40 -0600
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <CADaq8je23qLBbQ-SSrmwOrDv_CSA_hJ7zymLBsKw2ZV4uvLQ+g@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------070506010909050307080602"
X-Source-IP: aserv0022.oracle.com [141.146.126.234]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/XNWI57TLGwcUZrxrrJipAND0Bb4>
Subject: Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Aug 2016 15:10:43 -0000

Thanks.  The ietf email was dropped, adding it back so others can also 
reply.

Karen

On 8/11/2016 4:44 AM, David Noveck wrote:
> > This is a need for solaris.
>
> Msg received but I think we need to close the door at some point on 
> incompatible XDR changes in Version Two.  We already have two.  I 
> would have preferred zero but I can live with two that can be tested 
> for together, as described in Chuck's I-D.
>
> If there is anybody else who has a need for such an XDR change in 
> version two, I think it is time to raise the issue now and get issues 
> relating to potential XDR changes resolved before we get Version Two 
> ready to be a working group document, which I hope will be soon.
>
> On Wed, Aug 10, 2016 at 3:12 PM, karen deitke <karen.deitke@oracle.com 
> <mailto:karen.deitke@oracle.com>> wrote:
>
>
>
>     On 8/4/2016 2:49 PM, David Noveck wrote:
>>     > In this case, however, I think the benefits outweigh the costs
>>     > of altering the base XDR.
>>
>>     Whether this is the case depends on the proportion of
>>     implementations that are
>>     adequately supported by a simpler approach that does not provide
>>     the extended
>>     functionality of allowing per-request selection of invalidation
>>     approach.
>>
>>     If 80% or 90% or 95% of implementations are OK with the simpler
>>     implementation,
>>     the balance of costs and benefits are different than they would
>>     be in 30%, 50%, or
>>     60% need the'more flexible approach.  I think we need to hear
>>     from the rest of the
>>     working group about what they think is important here.
>>
>>     > A client can't use RI at
>>     > all if it can't tolerate an arbitrary choice of which handle
>>     > in an RPC is invalidated remotely.
>>
>>     The question is how common such clients are.
>     Solaris can represent one such client.  Currently we do not
>     implement frwr which is needed for RI, however when we do, we
>     would not be able to utilize RI if the client can not identify
>     which chunks should be invalidate.  Random invalidation by the
>     server would be detrimental on solaris as it would potentially
>     also invalidate other memory buffers sharing the same  stag.
>     Additionally it could invalidate other memory we intend to keep
>     registered.
>
>>
>>     > In Linux, it's simply a matter of adding an "if vers == 2,
>>     > insert (or expect) two u32 fields here."
>>
>>     In lots of places.
>>
>>     > With an rpcgen-based
>>     > implementation, the change is also fairly trivial (in fact most
>>     > of it is handled by machine-generated code).
>>
>>     Essentially you have two implementation rather than one
>>     extensible one.
>>
>>     > Speaking as an implementer, having to support two new RDMA
>>     > message types (in addition to RDMA_OPT_INIT_XCHAR) is much more
>>     > effort than having to deal with one or two extra fields in
>>     > rpcrdma2_chunk_lists.e problems with the simpler approach.
>>
>>     That would be the case if every implementer had to support those
>>     OPTIONAL
>>     messages types.  If only a few would, the situation would be
>>     different.
>>
>>     Those who did not need the more extensive support would not have
>>     to deal with any
>>     XDR change at all.
>>
>>     > Thus to support Remote Invalidation
>>     > in full, an implementation would need to support RDMA_MSG,
>>     > RDMA_NOMSG, RDMA_OPT_INIT_XCHAR, RDMA_MESSAGEX, and
>>     > RDMA_NOMSGX.
>>
>>     so this all boils down to the question of how many
>>     implementations want or need to support
>>     this extended form of remote invalidation.
>     This is a need for solaris.
>
>     Karen
>>
>>
>>
>>
>>     On Thu, Aug 4, 2016 at 12:35 PM, Chuck Lever
>>     <chuck.lever@oracle.com <mailto:chuck.lever@oracle.com>> wrote:
>>
>>         Hi Dave-
>>
>>         Thanks for your review comments.
>>
>>
>>         > On Aug 4, 2016, at 10:48 AM, David Noveck
>>         <davenoveck@gmail.com <mailto:davenoveck@gmail.com>> wrote:
>>         >
>>         > The first issue concerns the structure
>>         rpcrdma2_chunk_lists.  Although this structure is defined in
>>         draft-cel-nfsv4-rpcrdma-version-two-01, there is nothing that
>>         uses it.  The structure that RDMA_MSG and RDMA_NOMSG include
>>         is rpcrdma2_chunks, which is not defined in the XDR.  It
>>         appear that this issue has existed since
>>         draft-cel-nfsv4-rpcrdma-version-two-00.
>>         >
>>         > Although that issue can and should be fixed, I think we
>>         need to have a more complete discussion of the extension
>>         model for version two and for RPC-over-RDMA as a whole in the
>>         near term.
>>         >
>>         > As I understand things, the original intention was that
>>         Version Two be a compatible extension of Version One.  Now,
>>         with the inclusion on the direction indication added in -01,
>>         and now the R-key proposed in reminv-design, Version One and
>>         Version Two messages will be OTW-incompatible, and I think
>>         are better off with a model in which Version Two consists of 
>>         subset which equal to Version One and a set of additions,
>>         primarily optional.  I think we should diverge from that
>>         model only if the benefits are sufficiently important to
>>         justify this discontinuity.
>>
>>         Yes, the original intention was to be as compatible as possible,
>>         and any XDR divergence with Version One would not be undertaken
>>         lightly. That preference is why we have elected to defer changes
>>         like this one to extensions when it makes sense.
>>
>>         However, with a version number bump, we are no longer entirely
>>         shackled by existing implementations or XDR definitions. It's
>>         very easy to take that too far, of course, which is why the
>>         default choice is to extend rather than alter the base XDR.
>>
>>         In this case, however, I think the benefits outweigh the costs
>>         of altering the base XDR.
>>
>>
>>         > With these changes, the implementation barrier to convert a
>>         Version One implementation to be compatible Version Two
>>         becomes significant.  I think we would be better off, if most
>>         Version One implementations could be made compatible with
>>         very easily, making the decision to do so more or less automatic.
>>
>>         In Linux, it's simply a matter of adding an "if vers == 2,
>>         insert (or expect) two u32 fields here." With an rpcgen-based
>>         implementation, the change is also fairly trivial (in fact most
>>         of it is handled by machine-generated code).
>>
>>         Speaking as an implementer, having to support two new RDMA
>>         message types (in addition to RDMA_OPT_INIT_XCHAR) is much more
>>         effort than having to deal with one or two extra fields in
>>         rpcrdma2_chunk_lists.
>>
>>
>>         > To get back to remote invalidation, I would prefer your
>>         section 3.3 as a baseline to be made accessible with no XDR
>>         changes in the base Version Two.  The additional
>>         functionality provided by 3.4, while desirable, is not of
>>         sufficient benefit to justify a non-compatible XDR change.  I
>>         feel that this functionality should be available as an
>>         OPTIONAL extension.
>>
>>         The benefit of the new field can be described this way:
>>
>>         RPC-over-RDMA allows multiple handles per RPC.
>>
>>         The burden of selecting a handle to invalidate remotely should
>>         be the client's. I believe one existing client implementation
>>         does mix persistently registered handles with dynamically
>>         registered handles in the same RPC. A client can't use RI at
>>         all if it can't tolerate an arbitrary choice of which handle
>>         in an RPC is invalidated remotely.
>>
>>         The "big switch" approach is simply not generic enough when
>>         multiple handles are in play and we don't have control over
>>         client implementation choices. It would exclude a portion of
>>         implementations, limiting the appeal of RPC-over-RDMA Version
>>         Two.
>>
>>         In the case of SMB Direct, all implementers decided that they
>>         would go with all FRWR registration. A big switch works in
>>         that scenario. In fact, the switch is always "on".
>>
>>         Providing rdma_inv_handle in each RPC-over-RDMA header goes
>>         along with the design of having lists of segments, each with
>>         their own handle.
>>
>>         Should the protocol be designed to discourage implementations
>>         that need to communicate handles on a per-RPC basis in favor
>>         of ones that can work with just an exchange of a transport
>>         characteristic?
>>
>>
>>         > When I say that it should be an "OPTIONAL Extension", I
>>         don't mean to imply:
>>         >       • That it needs to be implemented as a subcase of
>>         RDMA_OPTIONAL.
>>         >       • That it should be documented in a separate
>>         document, as opposed to being documented (eventually) in
>>         draft-ietf-nfsv4-rpcrdma-version-two.
>>         > What I do mean is that we should define new message type
>>         for extensions (so that we maintain Version One as a subset
>>         of  Version Two) and that we should (not "SHOULD" :-) make
>>         these new message types "OPTIONAL" (in the RFC2119 sense).  I
>>         can see, if there is sufficient reason, making an extension
>>         REQUIRED, but I don't see a reason to change an existing
>>         message type in an incompatible way.
>>         >
>>         > One way to do this is to define new message types
>>         RDMA_MESSAGEX and RDMA_NOMSGX which include direction and
>>         rdma_handle but there are other ways to do this.  To make it
>>         easier to determine whether support for OPTIONAL message
>>         types is present, we could define a transport
>>         characteristic/attribute that provides a bit mask of
>>         supported message types.
>>
>>         (An xchar that carries a bitmap of supported message types seems
>>         appropriate in the initial set of supported characteristics.)
>>
>>         IIRC, earlier we had decided that message types RDMA_MSG and
>>         RDMA_NOMSG would be REQUIRED. Thus to support Remote Invalidation
>>         in full, an implementation would need to support RDMA_MSG,
>>         RDMA_NOMSG, RDMA_OPT_INIT_XCHAR, RDMA_MESSAGEX, and RDMA_NOMSGX.
>>
>>         But now we are talking about significant XDR changes, and a
>>         significant implementation effort.
>>
>>
>>         --
>>         Chuck Lever
>>
>>
>>
>>
>>
>>
>>     _______________________________________________
>>     nfsv4 mailing list
>>     nfsv4@ietf.org <mailto:nfsv4@ietf.org>
>>     https://www.ietf.org/mailman/listinfo/nfsv4
>>     <https://www.ietf.org/mailman/listinfo/nfsv4>
>     _______________________________________________ nfsv4 mailing list
>     nfsv4@ietf.org <mailto:nfsv4@ietf.org>
>     https://www.ietf.org/mailman/listinfo/nfsv4
>     <https://www.ietf.org/mailman/listinfo/nfsv4> 
>