Re: [nfsv4] NFS/RDMA next steps

David Noveck <davenoveck@gmail.com> Mon, 31 July 2017 20:04 UTC

In-Reply-To: <53DF3636-D420-4FAA-B1B0-8824602CBB72@oracle.com>
References: <53DF3636-D420-4FAA-B1B0-8824602CBB72@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Mon, 31 Jul 2017 16:04:35 -0400
Message-ID: <CADaq8jeyxWEDkdWcRvaK-Vet0dCCXgJ0HcMywP3aXawV9KVPbg@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: NFSv4 <nfsv4@ietf.org>
Content-Type: multipart/alternative; boundary="001a113f92644403df0555a28ad0"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/P_A3ZZtYHoS1jhXjoAXyfyQ7oc8>
Subject: Re: [nfsv4] NFS/RDMA next steps

> Note that RFC 5667bis is now "Submitted to IESG for Publication". The work
> items in the slides assume that this document will be completed and
> published as planned.

It's been that way for seventeen days.  I expect it to be done in about six
months but RFC 8154 took about a year.  Sigh!

> The slides split the possibilities into three somewhat orthogonal
> groupings.

You had to divide these up to produce slides, but everyone will divide things
differently.  In any case, I don't see these divisions as all that helpful.
I think we need to decide which documents are ready to become working group
documents based on their maturity and promise, and I believe that is the
appropriate way to frame the discussion.

> Opinions are welcome as to what order, whether something is left out, or
> what might be removed from this list. Groupings Two and Three could
> introduce new Working Group documents, and thus have implications for our
> charter milestone count.

Note that Grouping One does as well.  See below.


> Grouping Zero:
> Focus on improving existing implementations of RPC-over-RDMA and NFS/RDMA.
> No IETF action needed, which is why I didn't include this on the slides.

For the same reason it might not be all that appropriate to this discussion.
The balance between implementation and specification is a decision for
individual companies to make.  In my case, the work is concerned with the
development of a new implementation but the same principle applies.  My
employer will have opinions about this question and I'm going to pay
attention to my company's needs.  It is not up to me how Oracle people spend
their time, nor up to Oracle how mine is spent.

> There are substantial improvements that can be made to existing base
> implementations, but these would be done by many of the same folks who
> would be working on new protocol.

True.  If people are so busy doing implementation that they have no time to
work on specs, we will have to adjust.

> Grouping One:
> Enable greater transport parallelism in NFS.

This strikes me as a highly unnatural grouping to use as a basis for
decision-making.  It might make sense in a slide presentation.

> This includes multipathing

I'm working on this.  I mentioned producing
draft-dnoveck-nfsv4-mv1-msns-update at the meeting.

My understanding is that Chuck will submit draft-cel-mv0-trunking-update,
which addresses trunking/multipathing in the v4.0 context.

> and use of pNFS with RDMA. No changes to RPC-over-RDMA or NFS/RDMA are
> necessary, and this would bring important performance capabilities to NFS,
> especially by enabling very low latency client access to Storage Class
> Memory.

I agree with doing this, but what we heard at IETF 99 was a very early
draft.  This is a long-term effort which should be pursued, but this
document is quite a ways from being ready to become a working group
document.

> Grouping Two:
> Incrementally improve RPC-over-RDMA version 1. The main idea here is to
> introduce a per-connection transport property negotiation mechanism to
> replace CCP. This would enable variable size (ie larger) inline thresholds
> and the use of Remote Invalidation in some instances with existing
> deployments.

Implementation of this is important and will be proceeding.  I intend to
implement it regardless of working group decisions about priorities or the
prospect of controversy.  Perhaps this belongs in Grouping Zero.  The work
needed to get this already-completed specification through the IETF process
is relatively small, although there may be controversies that need to be
resolved.  See below.


> Grouping Three:
> Pursue RPC-over-RDMA version 2. This would open a variety of avenues by
> which many of the perceived shortcomings of RPC-over-RDMA version 1 could
> be addressed.


> IMO Zero and One are where we can get the greatest bang for the buck in
> the near term.

Two issues with this:

   - I believe Zero is out of scope for this discussion for reasons already
   given.
   - The pNFS-RDMA layout type is not a near-term item.  I agree we should
   pursue it, but it is not going to happen soon, especially if much of the
   working group is busy doing implementation work.  In any case, when
   Christoph thinks this is ready to be a working group item, we should
   definitely consider this seriously.

> The current proposal for Grouping Two (draft-cel-nfsv4-rpcrdma-cm-pvt-msg)
> is controversial.

If Tom, or anyone else, has objections to proceeding with this, we need to
discuss them on the list and work to arrive at a reasonable resolution.  My
understanding is that Tom objects to this being standards-track.  While I
believe it should be standards-track, I can live with an Informational RFC.
I just don't want this to be an undocumented de facto standard.

> Grouping Three would be an immense amount of work to generalize
> some things for less gain than we might see with work in Grouping Zero or
> One.

I don't think "immense" is the right word but let's not argue about that.
In any case, we need relief from the major performance issues that were
somehow baked into Version One.  We can live with Version One if we have the
implementation corresponding to cm-pvt-msg.  I'm not sure whether this is
Grouping Two (as above) or Zero (because we are talking about existing
implementations).  In any case we have running code and should take
advantage of Chuck's good work in specifying this regardless of any
potential controversy.



On Mon, Jul 31, 2017 at 2:34 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

> Hi-
>
> During the nfsv4 WG meeting at IETF 99, I presented some slides on possible
> next steps for NFS/RDMA and related protocols.
>
> https://datatracker.ietf.org/meeting/99/materials/slides-99-nfsv4-nfsrdma-next-steps-chuck-lever
>
> There are many areas that could use some attention, yet only a handful of
> engineering resources are available. Slides 11 - 18 describe the directions
> that could IMO be fruitful individual next steps.
>
> I hoped my talk could frame a conversation about where we think the highest
> priority is, but it was decided that there were enough interested and
> informed people who were not present that such a conversation should be
> moved to this mailing list and continued after IETF 99.
>
> Note that RFC 5667bis is now "Submitted to IESG for Publication". The work
> items in the slides assume that this document will be completed and
> published as planned.
>
> The slides split the possibilities into three somewhat orthogonal
> groupings. A fourth grouping arose during discussion in the room, which
> I'll add as "Grouping Zero" below. We can choose any or all of these
> approaches. Opinions are welcome as to what order, whether something is
> left out, or what might be removed from this list. Groupings Two and Three
> could introduce new Working Group documents, and thus have implications for
> our charter milestone count.
>
>
> Grouping Zero:
> Focus on improving existing implementations of RPC-over-RDMA and NFS/RDMA.
> No IETF action needed, which is why I didn't include this on the slides.
> There are substantial improvements that can be made to existing base
> implementations, but these would be done by many of the same folks who
> would be working on new protocol.
>
>
> Grouping One:
> Enable greater transport parallelism in NFS. This includes multipathing and
> use of pNFS with RDMA. No changes to RPC-over-RDMA or NFS/RDMA are
> necessary, and this would bring important performance capabilities to NFS,
> especially by enabling very low latency client access to Storage Class
> Memory.
>
>
> Grouping Two:
> Incrementally improve RPC-over-RDMA version 1. The main idea here is to
> introduce a per-connection transport property negotiation mechanism to
> replace CCP. This would enable variable size (ie larger) inline thresholds
> and the use of Remote Invalidation in some instances with existing
> deployments.
>
>
> Grouping Three:
> Pursue RPC-over-RDMA version 2. This would open a variety of avenues by
> which many of the perceived shortcomings of RPC-over-RDMA version 1 could
> be addressed.
>
>
> IMO Zero and One are where we can get the greatest bang for the buck in the
> near term.
>
> Latency to access Storage Class Memory is substantially shorter than the
> latency of traversing the NFS and RPC stack on just the client. Thus
> bypassing RPC entirely (eg by using an RDMA layout type) seems like the
> best strategy we have for tapping the potential of this new variety of
> durable storage.
>
> The current proposal for Grouping Two (draft-cel-nfsv4-rpcrdma-cm-pvt-msg)
> is controversial. Grouping Three would be an immense amount of work to
> generalize some things for less gain than we might see with work in
> Grouping Zero or One.
>
>
> --
> Chuck Lever