Re: [nfsv4] pNFS file and use of RPC-over-RDMA to access an NFSv4.1 data server

David Noveck <davenoveck@gmail.com> Tue, 24 April 2018 13:34 UTC

In-Reply-To: <MWHPR2101MB0809103188F9DDFECCCB7A3CA0B60@MWHPR2101MB0809.namprd21.prod.outlook.com>
References: <CADaq8jfcEa3xz1UNwrWkcvLZAr=eHKzQKppR+Kuqbzyq8H2ueQ@mail.gmail.com> <57F19FA8-2491-4CAB-B3B0-09A194BF09F9@oracle.com> <MWHPR2101MB0809103188F9DDFECCCB7A3CA0B60@MWHPR2101MB0809.namprd21.prod.outlook.com>
From: David Noveck <davenoveck@gmail.com>
Date: Tue, 24 Apr 2018 09:34:02 -0400
Message-ID: <CADaq8jdqitBPXond=K3kZu7mzc0+uWuZ1BTn7pxN9TBsUgR9LA@mail.gmail.com>
To: Tom Talpey <ttalpey@microsoft.com>
Cc: Chuck Lever <chuck.lever@oracle.com>, NFSv4 <nfsv4@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000023fb40056a98351d"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/Q3NKymXiKDTsh4lNU3kgO0EcTMw>
Subject: Re: [nfsv4] pNFS file and use of RPC-over-RDMA to access an NFSv4.1 data server

Thanks to Tom and Chuck for their responses.  I'll respond to both below.

First of all, let me say how my server will behave on this issue at future
testing events.

I intend to specify both an rdma/rdma6 netaddr and a port number of 20049 on
the devinfos associated with layouts when the data server supports RDMA.
Clients can then key off either of those two indications or, if they
choose, use the fact that the MDS connection is RDMA.
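
A minimal sketch of how a client might key off those indications. The
NetAddr class, choose_transport(), and the mds_uses_rdma policy flag are
hypothetical names introduced only for illustration; the sketch assumes the
usual universal address encoding, in which the port is carried in the last
two dot-separated octets (20049 = 78 * 256 + 81).

```python
from dataclasses import dataclass

RDMA_NETIDS = {"rdma", "rdma6"}
NFS_RDMA_PORT = 20049  # port number mentioned above for RDMA-capable data servers


@dataclass
class NetAddr:
    """Hypothetical stand-in for one address taken from a devinfo."""
    netid: str  # e.g. "tcp", "tcp6", "rdma", "rdma6"
    uaddr: str  # universal address: host followed by ".p1.p2"


def port_of(uaddr: str) -> int:
    """Decode the port from the last two octets of a universal address."""
    _host, p1, p2 = uaddr.rsplit(".", 2)
    return int(p1) * 256 + int(p2)


def choose_transport(addr: NetAddr, mds_uses_rdma: bool = False) -> str:
    """Return "rdma" if any indication is present, otherwise "tcp".

    mds_uses_rdma models the optional client policy of following the
    transport already in use for the MDS connection.
    """
    if addr.netid in RDMA_NETIDS:
        return "rdma"
    if port_of(addr.uaddr) == NFS_RDMA_PORT:
        return "rdma"
    if mds_uses_rdma:
        return "rdma"
    return "tcp"
```

None of these checks says anything about whether the network path between
client and data server can actually carry RDMA traffic, so a real client
would still need a fallback to TCP.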

I also intend to support use of the FSLI4TF_RDMA flag as an indication
that an RDMA connection can be used, but I probably won't have that support
on my initial server.

Whatever the working group decides to do on this issue, we need to
document the choice in some way.  Maybe we will need a short RFC on this
specific issue rather than trying to update RFC8267 to include this.  This
short RFC would probably be marked as updating RFC5661, since it does
significantly revise chapter 13 of that document.  I have given some
thought to including this material in draft-ietf-nfsv4-mv1-msns-update but
so far it doesn't seem to fit very well.

With regard to Chuck's comments about the analogous problems for referrals,
I have some questions:

   - I assume the same problem can arise with regard to migrations,
   although for that case it is a reasonable guess, in the absence of any
   other indications, that the destination is of the same connection type as
   the source.  Are migration and referral essentially the same in this regard?
   - How useful is the FSLI4TF_RDMA flag in addressing this problem?  Do
   we need anything else for satisfactory support?
   - What is the situation with regard to NFSv4.0?  How can an RDMA
   connection be established for migration/referrals in the absence of
   fs_locations_info?  Can we regard this as a can't-get-there-from-here
   situation?  If not, how do we address this?

I think that whatever is decided on these questions, this needs to be
addressed in draft-ietf-nfsv4-mv1-msns-update.  Given that this document is
presenting a major respecification of this area, it needs to do so in light
of the fact that multiple connection types are on the agenda now.  Similar
logic might apply in the case of draft-ietf-nfsv4-mv0-trunking-update as
well.  I intend to add discussion of these issues to
draft-ietf-nfsv4-migration-issues-15.

I agree with Tom that human intervention is going to be required to deal
with switching/routing problems/incompatibilities, although I am not joining
in his "guarantee".  Even though no protocol additions seem likely, we
have to be aware of these difficulties as we draft and review documents in
this area.  As Chuck indicates, the server's and client's knowledge that
they both support RDMA is not sufficient to conclude that it is appropriate
to use it.  This probably means that we will need to avoid most uses
of RFC2119 terms as we describe this area.  It also seems to me that it
would be better if we avoided those terms' lower-case homonyms as well.

With regard to Tom's discussion of SMB3 facilities for presentation of
server interface characteristics, this has been discussed for NFSv4 but it
didn't seem to fit into NFSv4.1.   It is possible to provide such a
facility as an optional extension to NFSv4.2 if someone is interested in
working on this.  Unless there are objections, I will add a discussion of
this possibility to draft-ietf-nfsv4-migration-issues-15.
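
To make the idea concrete, here is a rough sketch of the client-side
selection Tom describes for SMB3 multichannel (sort by type, then by
speed).  Everything in it is hypothetical: ServerInterface, its fields, and
rank_interfaces() are illustration only and do not correspond to any
existing NFSv4 attribute or proposed extension.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerInterface:
    """Hypothetical description of one server interface."""
    address: str
    rdma_capable: bool
    link_speed_gbps: float


def rank_interfaces(interfaces: List[ServerInterface]) -> List[ServerInterface]:
    """Order candidates: RDMA-capable first, then by descending link speed."""
    return sorted(
        interfaces,
        key=lambda i: (not i.rdma_capable, -i.link_speed_gbps),
    )
```

The ranking only expresses a preference.  The client would still have to
attempt each connection in turn and fall back when one fails, since the
advertised capability does not establish that the path supports it.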

On Wed, Apr 18, 2018 at 12:07 PM, Tom Talpey <ttalpey@microsoft.com> wrote:

> > -----Original Message-----
> > From: nfsv4 <nfsv4-bounces@ietf.org> On Behalf Of Chuck Lever
> > Sent: Wednesday, April 18, 2018 11:05 AM
> > To: David Noveck <davenoveck@gmail.com>
> > Cc: NFSv4 <nfsv4@ietf.org>
> > Subject: Re: [nfsv4] pNFS file and use of RPC-over-RDMA to access an NFSv4.1
> > data server
> >
> > Hi Dave-
> >
> > Thanks for bringing this up. I agree it should be explored.
> > A few comments below.
> >
> >
> > > On Apr 18, 2018, at 9:51 AM, David Noveck <davenoveck@gmail.com> wrote:
> > >
> > > As far as I can determine, there has not been significant discussion of the
> > > question of how a client which receives a pNFS file layout is to determine
> > > whether the connection to the data server is to be established using TCP or
> > > RPC-over-RDMA.  As a result, the Linux NFS client, when it receives a file
> > > layout, unconditionally establishes a TCP connection to the data server.   I’ve
>
> <all good stuff, snipped>
>
> > >     • Expecting the client to connect, at first, using a tcp connection and
> > > then interrogating the fs_locations_info attribute and using the FSLI4TF_RDMA
> > > flag as an indication that the client is to access this fs using an RDMA-capable
> > > transport.
> >
> > This seems like a reliable long-term solution, and it has the
> > benefit of providing not one but a list of interfaces and
> > transport types.
>
> I just want to interject that this is what is done by the server side of SMB3
> multichannel, which additionally provides an address and link speed. The
> client then sorts by type and speed, chooses some preferences and performs
> a routing lookup to assess possible connectivity for each type. It then proceeds
> to attempt one or more additional connections, moving traffic to them if/when
> they succeed.
>
> > However, it falls prey to the issue above, where both ends could
> > support RDMA, but the end-to-end network path might not.
>
> Absolutely yes. And worse, if the connection succeeds but for some reason
> cannot sustain heavy traffic. We see this All The Time with RoCE protocols,
> which depend on accurate PFC and sometimes ECN behaviors to be
> performed by each and every switch hop in the path. The symptoms are
> maddening, and the solution even more so. We hope the RoCE vendors
> will step up to address this over time with diagnosability information, and
> further refinement of the RoCE protocol requirements. I don't think that
> pNFS or NFSv4 should attempt to address this.
>
> Bottom line, simple is best at the pNFS/NFSv4 layer. I guarantee human
> intervention will always be required, somewhere along the way.
>
> Tom.
>