[nfsv4] pNFS file and use of RPC-over-RDMA to access an NFSv4.1 data server
David Noveck <davenoveck@gmail.com> Wed, 18 April 2018 13:52 UTC
From: David Noveck <davenoveck@gmail.com>
Date: Wed, 18 Apr 2018 09:51:57 -0400
Message-ID: <CADaq8jfcEa3xz1UNwrWkcvLZAr=eHKzQKppR+Kuqbzyq8H2ueQ@mail.gmail.com>
To: NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/vH9nlPTk88uQTreCyBmKY4XOVgw>
As far as I can determine, there has not been significant discussion of how a client that receives a pNFS file layout is to determine whether the connection to the data server is to be established using TCP or RPC-over-RDMA. As a result, the Linux NFS client, when it receives a file layout, unconditionally establishes a TCP connection to the data server. I’ve had to disable pNFS support on the server for my testing, so that the NFS-over-RDMA data paths can get tested, but I’d prefer that we arrive at an approach which allows interoperable implementations supporting both pNFS file and RPC-over-RDMA to be developed and tested.

I have discussed this issue with some people and arrived at an initial list of ways in which this choice might be made. Additions are possible, but our principal task is arriving at a compatible subset which can serve as a basis for use in testing events and for eventual standardization and deployment.

One critical criterion governing our choice is the expected penalty for making the wrong choice. When the penalty is high, there is a strong need for a correct decision, while, if we could reduce the penalty, a reasonable basis for a guess could be an acceptable choice. In this connection, it is worth noting that currently Linux’s RoCE support for RPC-over-RDMA winds up waiting 60 seconds (i.e. about 10 million times the typical round-trip inter-node delay time) before deciding that RPC-over-RDMA service is not available. Until and unless that is changed, we would need a quite accurate and reliable means of determining when such support is available.

In any case, here is the list of approaches I have come up with so far. I’d like the working group to augment the list as new approaches arise and then decide on a way forward that we can use in future testing events and as a basis for standardization.

- Deciding that, when the metadata server connection is RDMA, data connections are, by default, presumed to be RDMA.

  This is a basis for a guess and would not be acceptable if the penalty for a wrong guess is high. There will certainly be cases in which RDMA support is present on the MDS but not available on one or more DSs.

- Using the port specified in the layout, with 20049 indicating that RDMA is to be attempted.

  I don’t see any practical issue with this, but it is kind of hacky.

- Using a value of “rdma” or “rdma6”, instead of “tcp” or “tcp6”, in the netaddr field of the device info.

  I’m not sure how hard or time-consuming it would be to have a new netaddr assigned in the case in which we use existing universal address formats. We need to investigate this.

- Expecting the client to connect, at first, using a TCP connection and then interrogate the fs_locations_info attribute, using the FSLI4TF_RDMA flag as an indication that the client is to access this fs using an RDMA-capable transport.

- Expecting the client to connect, at first, using a TCP connection and then do a CREATE_SESSION specifying CREATE_SESSION4_FLAG_CONN_RDMA or, to address transports in which a within-connection step-up is not possible (i.e. RoCE or InfiniBand), defining a new, similar CREATE_SESSION flag to support creation of a new connection, which would obtain RDMA support on the session using a BIND_CONN_TO_SESSION on the new connection.
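
To make the first three approaches in the list above concrete, here is a minimal Python sketch of how a client might pick a transport from a data server address. It is only an illustration under stated assumptions, not a description of any existing client: the names `DeviceAddr`, `uaddr_port`, `choose_transport`, `mds_is_rdma` and `trust_mds_hint` are hypothetical, and the sketch assumes the address arrives as a network identifier string plus a universal address whose last two dot-separated fields are the high and low octets of the port.

```python
from typing import NamedTuple

# Port named in the list above as the "attempt RDMA" indicator.
NFS_RDMA_PORT = 20049


class DeviceAddr(NamedTuple):
    """Hypothetical holder for one data server address from the device info."""
    netid: str  # e.g. "tcp", "tcp6", or the proposed "rdma" / "rdma6"
    uaddr: str  # universal address, e.g. "192.0.2.10.78.81"


def uaddr_port(uaddr: str) -> int:
    """Return the port encoded in the last two fields of a universal address."""
    parts = uaddr.rsplit(".", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed universal address: {uaddr!r}")
    hi, lo = int(parts[1]), int(parts[2])
    if not (0 <= hi <= 255 and 0 <= lo <= 255):
        raise ValueError(f"port octet out of range in: {uaddr!r}")
    return hi * 256 + lo


def choose_transport(addr: DeviceAddr,
                     mds_is_rdma: bool = False,
                     trust_mds_hint: bool = False) -> str:
    """Return "rdma" or "tcp" for a data server connection.

    Checks run from the most explicit signal to the weakest guess.
    """
    # Approach 3: an explicit RDMA network identifier in the device info.
    if addr.netid in ("rdma", "rdma6"):
        return "rdma"
    # Approach 2: port 20049 in the address means RDMA is to be attempted.
    if uaddr_port(addr.uaddr) == NFS_RDMA_PORT:
        return "rdma"
    # Approach 1: inherit the MDS transport. This is only a guess, so it is
    # opt-in here and should stay off while a wrong guess costs 60 seconds.
    if trust_mds_hint and mds_is_rdma:
        return "rdma"
    return "tcp"
```

With these definitions, `choose_transport(DeviceAddr("tcp", "192.0.2.10.78.81"))` selects RDMA because the trailing `78.81` encodes port 20049 (78 * 256 + 81), while the same host with a trailing `8.1` (port 2049) falls back to TCP unless the MDS hint is explicitly trusted. The last two approaches in the list are not covered, since they require a live TCP connection to the data server before the decision can be made.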