Re: [nfsv4] [internet-drafts@ietf.org: New Version Notification for draft-hellwig-nfsv4-rdma-layout-00.txt]

David Noveck <davenoveck@gmail.com> Mon, 17 July 2017 11:56 UTC

In-Reply-To: <26A3EDDF-200C-49F2-934B-CD9155AECE88@gmail.com>
References: <20170702231000.GA2564@lst.de> <26A3EDDF-200C-49F2-934B-CD9155AECE88@gmail.com>
From: David Noveck <davenoveck@gmail.com>
Date: Mon, 17 Jul 2017 07:56:31 -0400
Message-ID: <CADaq8jdK+nSDBy6xr=VU-eWX8LMvWuZZQdy5VMKrbBrh3RPVKA@mail.gmail.com>
To: Chuck Lever <chucklever@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/2ggo3nNfbxhW2h-y4HZZwiIfw1g>
Subject: Re: [nfsv4] [internet-drafts@ietf.org: New Version Notification for draft-hellwig-nfsv4-rdma-layout-00.txt]
List-Id: NFSv4 Working Group <nfsv4.ietf.org>

> - define the exact connection establishment model.  I'd really
>   like to rely on RDMA/CM for that

Connection establishment can use RDMA/CM to establish an RC
connection.  Unfortunately, as Chuck points out, establishing the
connection is only part of the problem.

> The connection model is critical, because the handles returned
> to clients can work only on one connection (QP).

True :-(

> For example,
> you can't assume that NFS/RDMA will be used to access the MDS,
> nor can you assume that the MDS and DS's are accessed through
> the same HCA/RNIC on the same connection.

Also, you can't assume there is only a single connection between any
client-DS pair.  Someone will have to choose one, and it isn't clear whether
the client or the MDS is better placed to do the choosing.

> To make it work, there will have to be some way of binding a
> layout (containing handles) to a particular connection to a
> particular storage device.

That sounds like it requires a (new) bind operation, separate from the
creation of the layout.  That is doable, but it increases the complexity,
since we would have to define both a protocol extension and a pNFS layout
type that work together.  Also, IIRC, RFC 8178 may require a new minor
version to do this.

An alternative is for the MDS to do the bind operation as part of
assigning the layout, which it can do as long as it knows the connection.
One possibility is for the client to pass a description of the chosen
connection as a layout hint.  Another is for the MDS to find out the set
of possible connections, choose one, and tell the client, in the layout,
which connection is associated with that layout.
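As a rough illustration of binding at layout-assignment time, here is a minimal Python sketch covering both possibilities, assuming an MDS that already knows every possible connection for each client-DS pair (how it learns them is left open). All names (`ConnectionId`, `RdmaLayout`, `MetadataServer.layoutget`, the `hint` parameter) are hypothetical and come neither from the draft nor from RFC 5661; `_bind` is only a placeholder for whatever mechanism would have the DS register memory for one specific connection.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ConnectionId:
    """Hypothetical description of one RC connection (QP) from a client to a DS."""
    client: str
    data_server: str
    qp_num: int


@dataclass
class RdmaLayout:
    """Hypothetical layout whose handle is usable only on `bound_to`."""
    file_offset: int
    length: int
    handle: int              # stand-in for a registered memory handle (e.g. an rkey)
    bound_to: ConnectionId   # the connection the MDS bound the handle to


class MetadataServer:
    def __init__(self, connections: Dict[Tuple[str, str], List[ConnectionId]]):
        # Assumption: the MDS knows every possible connection for each
        # (client, DS) pair; the discovery mechanism is not modeled.
        self.connections = connections
        self._next_handle = 1

    def _bind(self, conn: ConnectionId) -> int:
        # Placeholder for having the DS register the region for use on
        # `conn` and report back a handle; nothing is registered here.
        handle = self._next_handle
        self._next_handle += 1
        return handle

    def layoutget(self, client: str, ds: str, offset: int, length: int,
                  hint: Optional[ConnectionId] = None) -> RdmaLayout:
        candidates = self.connections.get((client, ds), [])
        if hint is not None:
            # Possibility 1: the client names its chosen connection in a layout hint.
            if hint not in candidates:
                raise ValueError("hinted connection is not known to the MDS")
            conn = hint
        elif candidates:
            # Possibility 2: the MDS chooses (here, simply the first candidate)
            # and reports its choice in the returned layout.
            conn = candidates[0]
        else:
            raise LookupError("no connection between this client and DS")
        return RdmaLayout(offset, length, self._bind(conn), conn)
```

Because the binding happens inside the layout request, no separate bind operation is needed in this sketch; the price is that the MDS must either be told the connection or be able to enumerate the candidates itself.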

> Also, if the connection to a DS is lost, more than a
> reconnect is necessary. The client will need to take steps to
> get the server to re-register the memory and send fresh
> handles.

The simplest way to deal with this, although maybe not the best, is to
treat the connection break as effectively revoking all associated
layouts.  The client then needs to get new layouts to replace the ones
it lost.
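A minimal client-side sketch of that policy, assuming the client tracks each layout by the connection its handles are bound to. `Layout`, `LayoutCache`, `recover` and the `layoutget` callable are hypothetical names; `layoutget` stands in for a real LAYOUTGET round trip, during which the server would re-register the memory and hand back fresh handles.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Layout:
    """Hypothetical client-side record of one layout."""
    stateid: str       # illustrative layout identifier
    file_offset: int
    length: int
    connection: str    # connection (QP) the layout's handles are bound to


class LayoutCache:
    """Tracks layouts by the connection their handles are bound to."""

    def __init__(self) -> None:
        self._by_conn: DefaultDict[str, List[Layout]] = defaultdict(list)

    def add(self, layout: Layout) -> None:
        self._by_conn[layout.connection].append(layout)

    def layouts_for(self, conn: str) -> List[Layout]:
        # .get() does not create an entry in the defaultdict.
        return list(self._by_conn.get(conn, []))

    def on_connection_lost(self, conn: str) -> List[Layout]:
        # Policy from the text: a connection break effectively revokes every
        # layout bound to it, since its handles cannot be reused after a reconnect.
        return self._by_conn.pop(conn, [])


def recover(cache: LayoutCache, lost_conn: str, new_conn: str,
            layoutget: Callable[[int, int, str], Layout]) -> int:
    """Drop layouts bound to `lost_conn` and re-request the same byte ranges
    bound to `new_conn`; returns how many layouts were replaced."""
    revoked = cache.on_connection_lost(lost_conn)
    for old in revoked:
        cache.add(layoutget(old.file_offset, old.length, new_conn))
    return len(revoked)
```

Layouts bound to other connections are untouched, so only the ranges served over the broken connection pay the cost of a new layout request.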

On Mon, Jul 17, 2017 at 5:49 AM, Chuck Lever <chucklever@gmail.com> wrote:

>
> > On Jul 3, 2017, at 01:10, Christoph Hellwig <hch@lst.de> wrote:
> >
> > As promised here is an initial cut at the RDMA layout before the
> > meeting in Prague.
> >
> > I have to admit it's not the highest quality draft, but I wanted to
> > get it out - it's probably unsuitable for readers without deep
> > knowledge of RDMA at this point.
> >
> > The idea of the layout is to provide RDMA READ / WRITE access to
> > remote memory regions - usually persistent memory in some form,
> > but to some extent it will also work with volatile caching of
> > data, e.g. in features like the NVMe controller memory buffer or
> > even host memory.  It is done by registering these regions on
> > the server and performing the RDMA READ / WRITE operations from
> > the client; that is, it inverts the model used by RDMA RPC or
> > other storage models.
> >
> > Besides improving the spec language there still is a lot left to
> > be done:
> >
> > - define the exact connection establishment model.  I'd really
> >   like to rely on RDMA/CM for that
>
> The connection model is critical, because the handles returned
> to clients can work only on one connection (QP). For example,
> you can't assume that NFS/RDMA will be used to access the MDS,
> nor can you assume that the MDS and DS's are accessed through
> the same HCA/RNIC on the same connection.
>
> To make it work, there will have to be some way of binding a
> layout (containing handles) to a particular connection to a
> particular storage device.
>
> Also, if the connection to a DS is lost, more than a
> reconnect is necessary. The client will need to take steps to
> get the server to re-register the memory and send fresh
> handles.
>
>
> > - figure out if we can get rid of the sub-layout extent inherited
> >   from the block layout.  This should be possible by providing two
> >   handles in the layout
> > - find a way to future-proof for the introduction of an RDMA FLUSH
> >   or COMMIT operation, where we don't have to do a LAYOUTCOMMIT
> >   for every write
> >
> > ----- Forwarded message from internet-drafts@ietf.org -----
> >
> > Date: Sun, 02 Jul 2017 16:04:45 -0700
> > From: internet-drafts@ietf.org
> > Subject: New Version Notification for draft-hellwig-nfsv4-rdma-layout-00.txt
> > To: Christoph Hellwig <hch@lst.de>
> >
> >
> > A new version of I-D, draft-hellwig-nfsv4-rdma-layout-00.txt
> > has been successfully submitted by Christoph Hellwig and posted to the
> > IETF repository.
> >
> > Name:           draft-hellwig-nfsv4-rdma-layout
> > Revision:       00
> > Title:          Parallel NFS (pNFS) RDMA Layout
> > Document date:  2017-07-02
> > Group:          Individual Submission
> > Pages:          18
> > URL:            https://www.ietf.org/internet-drafts/draft-hellwig-nfsv4-rdma-layout-00.txt
> > Status:         https://datatracker.ietf.org/doc/draft-hellwig-nfsv4-rdma-layout/
> > Htmlized:       https://tools.ietf.org/html/draft-hellwig-nfsv4-rdma-layout-00
> > Htmlized:       https://datatracker.ietf.org/doc/html/draft-hellwig-nfsv4-rdma-layout-00
> >
> >
> > Abstract:
> >   The Parallel Network File System (pNFS) allows a separation between
> >   the metadata (onto a metadata server) and data (onto a storage
> >   device) for a file.  The RDMA Layout Type is defined in this document
> >   as an extension to pNFS to allow the use of RDMA Verbs operations to
> >   access remote storage, with a special focus on accessing byte
> >   addressable persistent memory.
> >
> >
> >
> >
> > Please note that it may take a couple of minutes from the time of submission
> > until the htmlized version and diff are available at tools.ietf.org.
> >
> > The IETF Secretariat
> >
> > ----- End forwarded message -----
> >
> > _______________________________________________
> > nfsv4 mailing list
> > nfsv4@ietf.org
> > https://www.ietf.org/mailman/listinfo/nfsv4
>
> --
> Chuck Lever
> chucklever@gmail.com
>