Re: [nfsv4] pNFS file and use of RPC-over-RDMA to access an NFSv4.1 data server

David Noveck <davenoveck@gmail.com> Sat, 28 April 2018 13:30 UTC

From: David Noveck <davenoveck@gmail.com>
Date: Sat, 28 Apr 2018 09:30:34 -0400
Message-ID: <CADaq8jfYXDF-_acmZ5W-0Hx=f=_UWuahhxFwskGz93v7hF8==Q@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Tom Talpey <ttalpey@microsoft.com>, NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/21ET5rjLGXMjrlEpfnmGnjThQK8>
Subject: Re: [nfsv4] pNFS file and use of RPC-over-RDMA to access an NFSv4.1 data server

> NFSv4.1 has a "gear shift" capability where, after establishment
> of a TCP connection, the client and server detect that an RDMA
> connection is possible. At this point, I'm not aware of any
> implementation of this capability. Are we as a community deciding
> to abandon this approach for NFS?

I don't see any reason to do so.  There are a bunch of things in RFC 5661
for which there is no current implementation.  In this case, it is not
surprising that this has not been implemented so far, given the delay in
getting RDMA-capable implementations in general.  The fact that this is an
iWARP-only capability is an impediment to investing resources in this
feature, so further delays are likely.  However, there is no time limit for
feature implementations, so I don't see any reason to abandon this.

>> I think that whatever is decided on these questions, this needs to be
>> addressed in draft-ietf-nfsv4-mv1-msns-update.  Given that this document
>> is presenting a major respecification of this area, it needs to do so in
>> light of the fact that multiple connection types are on the agenda now.
>> Similar logic might apply in the case of
>> draft-ietf-nfsv4-mv0-trunking-update as well.  I intend to add discussion
>> of these issues to draft-ietf-nfsv4-migration-issues-15.

>> The specific area of interoperability concern is probably limited
>> to exactly how a server advertises its RDMA capability, if we
>> conclude that is useful or necessary.

Those matters are fairly clear for v4.0 and v4.1:

For v4.0, there is no way to do this, and the issue is probably not so
serious that we should provide a correction (as provided for by RFC 8178).
So, while useful and desirable, it is pretty much impossible at this point.

For v4.1, there is a way to advertise RDMA capability (using
FSLI4TF_RDMA), specified in RFC 5661, and I certainly hope that it
is useful, necessary and adequate.
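
As an illustration of how a v4.1 client might test for that advertisement,
here is a minimal Python sketch.  It assumes the fls_info field of an
fs_locations_info server entry has already been decoded into a byte string,
and it uses the FSLI4BX_TFLAGS index and FSLI4TF_RDMA bit value as I read
them in the RFC 5661 XDR; both should be checked against the RFC before
being relied upon.  The function name advertises_rdma is hypothetical.

```python
# Constants as given in the fs_locations_info XDR of RFC 5661
# (assumed here; verify against the RFC before use).
FSLI4BX_TFLAGS = 1    # index of the transport-flags byte within fls_info
FSLI4TF_RDMA = 0x01   # bit set when the server offers RDMA at this address


def advertises_rdma(fls_info: bytes) -> bool:
    """Return True if the transport-flags byte has FSLI4TF_RDMA set.

    A server may send fewer info bytes than the client knows about;
    a missing transport-flags byte is treated as "no RDMA advertised".
    """
    if len(fls_info) <= FSLI4BX_TFLAGS:
        return False
    return bool(fls_info[FSLI4BX_TFLAGS] & FSLI4TF_RDMA)
```

A client would still have to attempt the RDMA connection; the flag is only
a hint that the attempt is worthwhile.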

It is possible that other, more extensive facilities, such as those Tom
mentioned that SMB3 has, might be desirable as potential extensions in
NFSv4.2.

Beyond the server capability, there needs to be discussion of possible
client responses.  In the context of trunking, a client TCP connection to
address X is distinct from an RDMA connection to that same address, so
the trunking descriptions, at least for v4.1, will need to be worked over
to reflect that fact.

For example, we have to cover the case in which the client discovers the
RDMA capability using fs_locations_info, finds out after using EXCHANGE_ID
that the connections are session-trunkable, and binds that new connection
to the existing session.
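
The decision step in that case can be sketched as follows.  This is a
simplified illustration, not a definitive implementation: it assumes the
usual RFC 5661 reading that session trunking requires the same client ID,
server scope, and both the major and minor IDs of the server owner to
match across the two EXCHANGE_ID results.  The ExchangeIdResult type and
the can_session_trunk function are hypothetical names introduced only for
this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical, trimmed-down view of an EXCHANGE_ID reply."""
    clientid: int
    so_major_id: bytes   # eir_server_owner.so_major_id
    so_minor_id: int     # eir_server_owner.so_minor_id
    server_scope: bytes  # eir_server_scope


def can_session_trunk(existing: ExchangeIdResult,
                      new: ExchangeIdResult) -> bool:
    """True if the new connection may be bound to the existing session.

    Assumes session trunking needs matching client ID, server scope,
    and both halves of the server owner.
    """
    return (existing.clientid == new.clientid
            and existing.server_scope == new.server_scope
            and existing.so_major_id == new.so_major_id
            and existing.so_minor_id == new.so_minor_id)
```

If the check succeeds for the result obtained over the new RDMA
connection, the client would go on to issue BIND_CONN_TO_SESSION on that
connection; otherwise it needs a separate session.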

In any case, draft-ietf-nfsv4-migration-issues-15 will mention areas that
might need to be addressed to reflect the fact that the combination of
client and server address does not identify a connection, and that the
connection type needs to be involved as well.
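
To make the identification point concrete, here is a small Python sketch
of a client-side connection table keyed on address pair plus connection
type.  All names (Transport, ConnectionKey, the connections table) and the
documentation-range addresses are hypothetical, chosen for illustration
only.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    TCP = "tcp"
    RDMA = "rdma"


@dataclass(frozen=True)
class ConnectionKey:
    """Identifies a connection: the address pair alone is not enough."""
    client_addr: str
    server_addr: str
    transport: Transport


# Two connections between the same pair of addresses, differing only
# in connection type, occupy separate entries in the table.
tcp = ConnectionKey("192.0.2.10", "192.0.2.20", Transport.TCP)
rdma = ConnectionKey("192.0.2.10", "192.0.2.20", Transport.RDMA)

connections = {tcp: "session-A", rdma: "session-A"}
```

Keying on the address pair alone would collapse these two entries into
one, which is exactly the ambiguity the trunking text needs to avoid.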



On Wed, Apr 25, 2018 at 3:30 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Apr 24, 2018, at 7:34 AM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > With regard to Chuck's comments about the analogous problems for
> > referrals, I have some questions:
> >       • I assume the same problem can arise with regard to migrations,
> >         although, for that case it is a reasonable guess, in the absence
> >         of any other indications that the destination is of the same
> >         connection type as the source.  Are migration and referral
> >         essentially the same in this regard?
>
> There are several scenarios that I will put in the category of
> "automounted," for lack of a better term. The distinctive
> characteristic of these scenarios is that a mount occurs, but
> there is no clear opportunity to provide parameters for the
> behavior of that mount; especially in this case, an option to
> determine which transport is to be used.
>
> + cd'ing into an automounted directory
> + crossing a server FSID boundary
> + following a referral
> + recovering from a migration event
> + failing over to a replica
>
> We might also place into this category any explicit mount
> request where the transport type was not specified by the
> administrator.
>
> Client implementations typically have a set of policies which
> determine the operating parameters of such mounts. On Linux
> for example, proto=tcp is the default. The Solaris client tries
> proto=rdma first, then falls back to proto=tcp.
>
> For automounts, Linux currently tries proto=rdma only if the
> parent directory uses proto=rdma (inheritance) and falls back
> to TCP if RDMA is not available. Solaris will always try
> proto=rdma first, then try proto=tcp.
>
> The salient difference between these two implementations seems
> to be that Linux supports RoCE and iWARP, where there are
> scenarios in which it would be difficult to automatically
> determine whether end-to-end RDMA is available; currently
> Solaris supports only InfiniBand, which technically can speak
> only to other IB hosts and thus determining whether end-to-end
> RDMA is available is expected to be reliable.
>
>
> >       • How useful is the FSLI4TF_RDMA flag in addressing this
> >         problem?  Do we need anything else for satisfactory support?
>
> Because NFS is IP-based, there necessarily is a translation from
> an IP address to a GUID for use by RoCE or InfiniBand. That is
> handled by the client's Connection Manager implementation and
> the HCA/NICs on the endpoints. iWARP goes through the same
> motions but I don't believe an address translation is required.
>
> Usually the CM can determine quickly that there is an
> end-to-end RDMA path available. The issue is how frequently
> there will be a combination of fabric and NIC that prevents
> quick determination of RDMA capability.
>
>
> >       • What is the situation with regard to NFSv4.0?  How can RDMA
> >         connection be established for migration/referrals in the absence
> >         of fs_locations_info? Can we regard this as a
> >         can't-get-there-from-here situation?  If not, how do we address
> >         this?
>
> See above: there are ad hoc implementations of transport class
> negotiation that work on NFSv4.0.
>
> NFSv4.1 has a "gear shift" capability where, after establishment
> of a TCP connection, the client and server detect that an RDMA
> connection is possible. At this point, I'm not aware of any
> implementation of this capability. Are we as a community deciding
> to abandon this approach for NFS?
>
>
> > I think that whatever is decided on these questions, this needs to be
> > addressed in draft-ietf-nfsv4-mv1-msns-update.  Given that this document
> > is presenting a major respecification of this area, it needs to do so in
> > light of the fact that multiple connection types are on the agenda now.
> > Similar logic might apply in the case of
> > draft-ietf-nfsv4-mv0-trunking-update as well.  I intend to add discussion
> > of these issues to draft-ietf-nfsv4-migration-issues-15.
>
> The specific area of interoperability concern is probably limited
> to exactly how a server advertises its RDMA capability, if we
> conclude that is useful or necessary.
>
>
> --
> Chuck Lever