[nfsv4] NFS/RDMA next steps

Chuck Lever <chuck.lever@oracle.com> Mon, 31 July 2017 18:34 UTC

From: Chuck Lever <chuck.lever@oracle.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Message-Id: <53DF3636-D420-4FAA-B1B0-8824602CBB72@oracle.com>
Date: Mon, 31 Jul 2017 14:34:17 -0400
To: NFSv4 <nfsv4@ietf.org>
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
X-Mailer: Apple Mail (2.3124)
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/Q9AEeUvqI_l_RWdRQ7ppzoHVLl8>
Subject: [nfsv4] NFS/RDMA next steps

Hi-

During the nfsv4 WG meeting at IETF 99, I presented some slides on possible
next steps for NFS/RDMA and related protocols.

https://datatracker.ietf.org/meeting/99/materials/slides-99-nfsv4-nfsrdma-next-steps-chuck-lever

There are many areas that could use some attention, yet only a handful of
engineering resources are available. Slides 11-18 describe the directions
that IMO could be fruitful individual next steps.

I hoped my talk could frame a conversation about where we think the highest
priority is, but it was decided that there were enough interested and
informed people who were not present that such a conversation should be
moved to this mailing list and continued after IETF 99.

Note that RFC 5667bis is now "Submitted to IESG for Publication". The work
items in the slides assume that this document will be completed and published
as planned.

The slides split the possibilities into three somewhat orthogonal groupings.
A fourth grouping arose during discussion in the room, which I'll add as
"Grouping Zero" below. We can choose any or all of these approaches. Opinions
are welcome on the order in which to pursue them, whether anything has been
left out, and what might be removed from this list. Groupings Two and Three
could introduce new Working Group documents, and thus have implications for
our charter milestone count.


Grouping Zero:
Focus on improving existing implementations of RPC-over-RDMA and NFS/RDMA. No
IETF action needed, which is why I didn't include this on the slides. There
are substantial improvements that can be made to existing base implementations,
but these would be done by many of the same folks who would be working on new
protocol.


Grouping One:
Enable greater transport parallelism in NFS. This includes multipathing and
use of pNFS with RDMA. No changes to RPC-over-RDMA or NFS/RDMA are necessary,
and this would bring important performance capabilities to NFS, especially
by enabling very low latency client access to Storage Class Memory.


Grouping Two:
Incrementally improve RPC-over-RDMA version 1. The main idea here is to
introduce a per-connection transport property negotiation mechanism to replace
CCP. This would enable variable-size (i.e., larger) inline thresholds and the
use of Remote Invalidation in some instances with existing deployments.


Grouping Three:
Pursue RPC-over-RDMA version 2. This would open a variety of avenues by which
many of the perceived shortcomings of RPC-over-RDMA version 1 could be
addressed.


IMO Zero and One are where we can get the greatest bang for the buck in the
near term.

Latency to access Storage Class Memory is substantially shorter than the
latency of traversing the NFS and RPC stack on just the client. Thus
bypassing RPC entirely (e.g., by using an RDMA layout type) seems like the best
strategy we have for tapping the potential of this new variety of durable
storage.

The current proposal for Grouping Two (draft-cel-nfsv4-rpcrdma-cm-pvt-msg) is
controversial. Grouping Three would be an immense amount of work to generalize
some things for less gain than we might see with work in Grouping Zero or One.


--
Chuck Lever