Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design

David Noveck <davenoveck@gmail.com> Thu, 04 August 2016 20:49 UTC

In-Reply-To: <7E1F0B4A-EE1D-4686-BC9B-358AF8FCF095@oracle.com>
References: <CADaq8jdor+Ju+F=ZBcV6zY3PWJerpM_sDtuTPy9EZTo6hymFPQ@mail.gmail.com> <7E1F0B4A-EE1D-4686-BC9B-358AF8FCF095@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Thu, 04 Aug 2016 16:49:40 -0400
Message-ID: <CADaq8jecc4dqgcgOZ7yH+n-0tMGZ7XmamM_us7fjAn4+H=C=pQ@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/M_pg6FRFHK9IkJX8XeehL330vbM>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] preliminary review of draft-cel-nfsv4-reminv-design

> In this case, however, I think the benefits outweigh the costs
> of altering the base XDR.

Whether this is the case depends on the proportion of implementations that are
adequately supported by a simpler approach that does not provide the extended
functionality of allowing per-request selection of the invalidation approach.

If 80%, 90%, or 95% of implementations are OK with the simpler implementation,
the balance of costs and benefits is different than it would be if 30%, 50%,
or 60% need the more flexible approach.  I think we need to hear from the rest
of the working group about what they think is important here.

> A client can't use RI at
> all if it can't tolerate an arbitrary choice of which handle
> in an RPC is invalidated remotely.

The question is how common such clients are.
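
The dependency on client behavior can be made concrete with a small model. This is a minimal sketch, not taken from either draft: it assumes that a persistently registered handle must never be invalidated remotely (later RPCs reuse it), and that under a transport-wide switch the requester cannot predict which handle the responder will pick. The `Segment` type and the `big_switch_ok` and `pick_inv_handle` helpers are hypothetical names used only for illustration; `rdma_inv_handle` is the per-request field proposed in reminv-design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One registered memory region advertised in an RPC (hypothetical model)."""
    handle: int       # R_key the responder would invalidate
    persistent: bool  # True if registered once and reused across many RPCs


def big_switch_ok(segments: List[Segment]) -> bool:
    # With only a transport-wide on/off switch, the responder may pick any
    # handle in the RPC, so every handle must be safe to invalidate remotely.
    return all(not seg.persistent for seg in segments)


def pick_inv_handle(segments: List[Segment]) -> Optional[int]:
    # With a per-RPC rdma_inv_handle, the requester names one dynamically
    # registered handle, or None to request no Remote Invalidation at all.
    for seg in segments:
        if not seg.persistent:
            return seg.handle
    return None


# A client that mixes a persistent handle with a dynamic one in a single RPC.
mixed = [Segment(handle=0x10, persistent=True), Segment(handle=0x22, persistent=False)]
print(big_switch_ok(mixed))         # False: the big switch must stay off
print(hex(pick_inv_handle(mixed)))  # 0x22: only the dynamic handle is named
```

Under this model, the open question is how many clients look like `mixed` rather than like an all-dynamic (all-FRWR) client, for which `big_switch_ok` is always true.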

> In Linux, it's simply a matter of adding an "if vers == 2,
> insert (or expect) two u32 fields here."

In lots of places.

> With an rpcgen-based
> implementation, the change is also fairly trivial (in fact most
> of it is handled by machine-generated code).

Essentially you have two implementations rather than one extensible one.
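
The "if vers == 2" change quoted above can be sketched as a version-conditional XDR encoder and decoder. This is a minimal sketch under stated assumptions: the four fixed header words (XID, version, credits, message type) follow Version One, but placing the two extra u32 fields (a direction indicator and an invalidation handle) immediately after them is purely illustrative; the drafts define the real layout. `encode_header` and `decode_header` are hypothetical names.

```python
import struct
from typing import Dict, Tuple

RDMA_MSG = 0    # Version One message type values
RDMA_NOMSG = 1


def encode_header(xid: int, vers: int, credits: int, msg_type: int,
                  direction: int = 0, inv_handle: int = 0) -> bytes:
    # XDR encodes every unsigned int as four big-endian bytes.
    buf = struct.pack(">IIII", xid, vers, credits, msg_type)
    if vers == 2:
        # Assumed Version Two addition: two extra u32 fields.
        buf += struct.pack(">II", direction, inv_handle)
    return buf


def decode_header(buf: bytes) -> Tuple[Dict[str, int], int]:
    xid, vers, credits, msg_type = struct.unpack_from(">IIII", buf, 0)
    hdr = {"xid": xid, "vers": vers, "credits": credits, "type": msg_type}
    offset = 16
    if vers == 2:
        # The same branch has to be mirrored on the receive side.
        hdr["direction"], hdr["inv_handle"] = struct.unpack_from(">II", buf, offset)
        offset += 8
    return hdr, offset  # offset is where the chunk lists would begin
```

Every place that builds or parses a header needs the same branch, which is the concern behind "In lots of places": the Version One path and the Version Two path become two parallel encodings rather than one format with optional additions.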

> Speaking as an implementer, having to support two new RDMA
> message types (in addition to RDMA_OPT_INIT_XCHAR) is much more
> effort than having to deal with one or two extra fields in
> rpcrdma2_chunk_lists.

That would be the case if every implementer had to support those OPTIONAL
message types.  If only a few would, the situation would be different.

Those who did not need the more extensive support would not have to deal with
any XDR change at all.
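
The OPTIONAL-message-type alternative can be sketched the same way. This minimal sketch assumes the bitmap transport characteristic proposed earlier in this thread, with bit N set when message type N is supported. The numeric values chosen for RDMA_MESSAGEX and RDMA_NOMSGX are placeholders, since neither type has been assigned a value, and `supports` and `choose_message_type` are hypothetical helpers.

```python
RDMA_MSG = 0       # Version One message types, unchanged in Version Two
RDMA_NOMSG = 1
RDMA_MESSAGEX = 8  # placeholder values for the proposed OPTIONAL types
RDMA_NOMSGX = 9


def supports(bitmap: int, msg_type: int) -> bool:
    # Assumed xchar encoding: bit N is set when message type N is supported.
    return (bitmap >> msg_type) & 1 == 1


def choose_message_type(peer_bitmap: int, rpc_inline: bool,
                        want_inv_handle: bool) -> int:
    # RDMA_MSG carries the RPC message inline; RDMA_NOMSG does not.
    base, extended = ((RDMA_MSG, RDMA_MESSAGEX) if rpc_inline
                      else (RDMA_NOMSG, RDMA_NOMSGX))
    # Use an extended type only when it is needed and the peer advertises it;
    # otherwise send the unchanged Version One format.
    if want_inv_handle and supports(peer_bitmap, extended):
        return extended
    return base


peer_basic = (1 << RDMA_MSG) | (1 << RDMA_NOMSG)
peer_full = peer_basic | (1 << RDMA_MESSAGEX) | (1 << RDMA_NOMSGX)
print(choose_message_type(peer_basic, True, True))  # 0: falls back to RDMA_MSG
print(choose_message_type(peer_full, True, True))   # 8: RDMA_MESSAGEX
```

A peer that never advertises the extended types keeps exchanging plain RDMA_MSG and RDMA_NOMSG, so implementations that do not need per-request invalidation see no XDR change.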

> Thus to support Remote Invalidation
> in full, an implementation would need to support RDMA_MSG,
> RDMA_NOMSG, RDMA_OPT_INIT_XCHAR, RDMA_MESSAGEX, and
> RDMA_NOMSGX.

So this all boils down to the question of how many implementations want or
need to support this extended form of remote invalidation.




On Thu, Aug 4, 2016 at 12:35 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

> Hi Dave-
>
> Thanks for your review comments.
>
>
> > On Aug 4, 2016, at 10:48 AM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > The first issue concerns the structure rpcrdma2_chunk_lists.  Although
> > this structure is defined in draft-cel-nfsv4-rpcrdma-version-two-01,
> > nothing uses it.  The structure that RDMA_MSG and RDMA_NOMSG include is
> > rpcrdma2_chunks, which is not defined in the XDR.  It appears that this
> > issue has existed since draft-cel-nfsv4-rpcrdma-version-two-00.
> >
> > Although that issue can and should be fixed, I think we need to have a
> > more complete discussion of the extension model for Version Two and for
> > RPC-over-RDMA as a whole in the near term.
> >
> > As I understand things, the original intention was that Version Two be a
> > compatible extension of Version One.  Now, with the inclusion of the
> > direction indication added in -01, and now the R-key proposed in
> > reminv-design, Version One and Version Two messages will be incompatible
> > over the wire (OTW), and I think we are better off with a model in which
> > Version Two consists of a subset equal to Version One plus a set of
> > additions, primarily optional.  I think we should diverge from that model
> > only if the benefits are sufficiently important to justify this
> > discontinuity.
>
> Yes, the original intention was to be as compatible as possible,
> and any XDR divergence with Version One would not be undertaken
> lightly. That preference is why we have elected to defer changes
> like this one to extensions when it makes sense.
>
> However, with a version number bump, we are no longer entirely
> shackled by existing implementations or XDR definitions. It's
> very easy to take that too far, of course, which is why the
> default choice is to extend rather than alter the base XDR.
>
> In this case, however, I think the benefits outweigh the costs
> of altering the base XDR.
>
>
> > With these changes, the barrier to converting a Version One
> > implementation to be compatible with Version Two becomes significant.  I
> > think we would be better off if most Version One implementations could be
> > made compatible very easily, making the decision to do so more or less
> > automatic.
>
> In Linux, it's simply a matter of adding an "if vers == 2,
> insert (or expect) two u32 fields here." With an rpcgen-based
> implementation, the change is also fairly trivial (in fact most
> of it is handled by machine-generated code).
>
> Speaking as an implementer, having to support two new RDMA
> message types (in addition to RDMA_OPT_INIT_XCHAR) is much more
> effort than having to deal with one or two extra fields in
> rpcrdma2_chunk_lists.
>
>
> > To get back to remote invalidation, I would prefer that your section 3.3
> > serve as a baseline, made available with no XDR changes in base Version
> > Two.  The additional functionality provided by 3.4, while desirable, is
> > not of sufficient benefit to justify an incompatible XDR change.  I feel
> > that this functionality should be available as an OPTIONAL extension.
>
> The benefit of the new field can be described this way:
>
> RPC-over-RDMA allows multiple handles per RPC.
>
> The burden of selecting a handle to invalidate remotely should
> be the client's. I believe one existing client implementation
> does mix persistently registered handles with dynamically
> registered handles in the same RPC. A client can't use RI at
> all if it can't tolerate an arbitrary choice of which handle
> in an RPC is invalidated remotely.
>
> The "big switch" approach is simply not generic enough when
> multiple handles are in play and we don't have control over
> client implementation choices. It would exclude a portion of
> implementations, limiting the appeal of RPC-over-RDMA Version
> Two.
>
> In the case of SMB Direct, all implementers decided that they
> would go with all FRWR registration. A big switch works in
> that scenario. In fact, the switch is always "on".
>
> Providing rdma_inv_handle in each RPC-over-RDMA header goes
> along with the design of having lists of segments, each with
> their own handle.
>
> Should the protocol be designed to discourage implementations
> that need to communicate handles on a per-RPC basis in favor
> of ones that can work with just an exchange of a transport
> characteristic?
>
>
> > When I say that it should be an "OPTIONAL Extension", I don't mean to
> > imply:
> >       • That it needs to be implemented as a subcase of RDMA_OPTIONAL.
> >       • That it should be documented in a separate document, as opposed
> >         to being documented (eventually) in draft-ietf-nfsv4-rpcrdma-version-two.
> > What I do mean is that we should define new message types for extensions
> > (so that we maintain Version One as a subset of Version Two) and that we
> > should (not "SHOULD" :-) make these new message types "OPTIONAL" (in the
> > RFC 2119 sense).  I can see, if there is sufficient reason, making an
> > extension REQUIRED, but I don't see a reason to change an existing message
> > type in an incompatible way.
> >
> > One way to do this is to define new message types RDMA_MESSAGEX and
> > RDMA_NOMSGX, which include the direction and rdma_handle, but there are
> > other ways to do this.  To make it easier to determine whether support for
> > OPTIONAL message types is present, we could define a transport
> > characteristic/attribute that provides a bit mask of supported message
> > types.
>
> (An xchar that carries a bitmap of supported message types seems
> appropriate in the initial set of supported characteristics.)
>
> IIRC, earlier we had decided that message types RDMA_MSG and
> RDMA_NOMSG would be REQUIRED. Thus to support Remote Invalidation
> in full, an implementation would need to support RDMA_MSG,
> RDMA_NOMSG, RDMA_OPT_INIT_XCHAR, RDMA_MESSAGEX, and RDMA_NOMSGX.
>
> But now we are talking about significant XDR changes, and a
> significant implementation effort.
>
>
> --
> Chuck Lever
>
>
>
>