Re: [nfsv4] Fwd: New Version Notification for draft-dnoveck-nfsv4-rpcrdma-rtrext-02.txt

David Noveck <davenoveck@gmail.com> Tue, 06 June 2017 14:33 UTC

In-Reply-To: <20170606064252.GA14844@lst.de>
References: <149667468294.3266.7785272769313517872.idtracker@ietfa.amsl.com> <CADaq8jcf1zM5LGJngjT5q-FKxqVGJa-U_yyCdG78NDETp7-k3w@mail.gmail.com> <20170606064252.GA14844@lst.de>
From: David Noveck <davenoveck@gmail.com>
Date: Tue, 06 Jun 2017 10:32:50 -0400
Message-ID: <CADaq8jckXDOXp9p3266OznMCSu9=VWV7FX7wJp=V54P8ONTOUQ@mail.gmail.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "Black, David" <David.Black@dell.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/bLMDYd0FOeIWN9zU20sGh5Ta8Ak>
Subject: Re: [nfsv4] Fwd: New Version Notification for draft-dnoveck-nfsv4-rpcrdma-rtrext-02.txt

> For transfers from the server to the client (e.g. NFSv4 READ) the client has to place
> the data into arbitrary buffers not controlled by the NFS client,
> so send-based data placement will lead to data copies on the client,
> which will kill performance.

I agree that copies will kill performance, but I think you are wrong about
this requiring copies.  The idea here is that the client will post, as read
buffers, aligned cache buffers that have not been recently used.  The RNIC is
free to pick one of those buffers when a send comes in.  So the buffers are
not precisely controlled, as they are with DDP, but they are not arbitrary
either.  They are buffers that the client has assigned to receiving
block-aligned data.  This is basically the same as the case in which the
client picks a less-recently-used buffer to use when reading from disk.  The
only difference is that, to accommodate the RNIC, the client is not in control
of which read goes with which buffer (among the set of pending reads).  The
client can find out which buffer got which block and map them to user
addresses the same way it does with data read from block devices.
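A minimal Python sketch of the receive-side scheme described above, assuming a hypothetical model in which each incoming send carries a small header naming the file and block it holds, and the RNIC consumes posted buffers in an order the client does not choose. `BufferPool`, `rnic_deliver`, `complete` and `BLOCK_SIZE` are illustrative names only, not part of any verbs API or of the draft.

```python
from collections import deque
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # hypothetical cache-block size


@dataclass
class CacheBuffer:
    """One block-aligned client cache buffer (modeled as a bytearray)."""
    index: int
    data: bytearray = field(default_factory=lambda: bytearray(BLOCK_SIZE))


class BufferPool:
    """Hypothetical client-side pool of cache buffers posted for receives."""

    def __init__(self, count):
        # Buffers not recently used are posted up front; the client does
        # not choose which one a particular incoming send will land in.
        self.posted = deque(CacheBuffer(i) for i in range(count))
        # (file_id, block_no) -> buffer, as in a block-device buffer cache.
        self.cache = {}

    def rnic_deliver(self, file_id, block_no, payload):
        """Model the RNIC placing a send's payload in the next posted buffer."""
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload larger than a cache block")
        buf = self.posted.popleft()
        buf.data[: len(payload)] = payload
        # The completion reports which buffer was consumed; the message
        # header (assumed here) says which block it now holds.
        return buf, (file_id, block_no)

    def complete(self, buf, key):
        """Client: record the buffer-to-block mapping; no second copy is made."""
        self.cache[key] = buf
        return buf


pool = BufferPool(count=4)
# Two READ replies arrive; the RNIC, not the client, picks the buffers.
for block_no, payload in [(7, b"seven"), (3, b"three")]:
    buf, key = pool.rnic_deliver("fileA", block_no, payload)
    pool.complete(buf, key)

print(pool.cache[("fileA", 3)].index, len(pool.posted))
```

In this model the only thing the client learns after the fact is which buffer holds which block, and it records that the same way it would for a block read from disk; the payload is placed once and never moved.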

> That being said I've started drafting and prototyping a Verbs extension
> for a tagged RECV WR.  Right now it's just a software implementation,
> and there is considerable hardware complexity if we want to implement
> it in HCA.  I could talk about this idea a bit in Prague, but I'm not
> sure about the venue -

Ultimately, you might wind up discussing this in a blue-sheet-less
environment.  For example, over a beer.

If you can send out the draft, I'd be interested in seeing it.  Others
probably would as well.
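The tagged RECV WR mentioned in the quoted text is not specified in this thread. As a purely hypothetical illustration, borrowing the tag-matching idea used by MPI receive queues, a receiver could post each buffer with a tag (here imagined as an RPC XID) so that an incoming send carrying that tag lands in exactly the buffer the client chose. `TaggedRecvQueue`, `post_recv` and `deliver` are invented names, not verbs calls, and this is not a description of Christoph's prototype.

```python
class TaggedRecvQueue:
    """Hypothetical model: receives are matched by tag, not taken in order."""

    def __init__(self):
        self.posted = {}      # tag -> caller-chosen buffer
        self.unexpected = []  # sends that arrived with no matching receive

    def post_recv(self, tag, buf):
        if tag in self.posted:
            raise ValueError(f"tag {tag!r} already posted")
        self.posted[tag] = buf

    def deliver(self, tag, payload):
        buf = self.posted.pop(tag, None)
        if buf is None:
            # A real implementation would need some policy here (drop, error,
            # or fall back to an untagged buffer); this model just records it.
            self.unexpected.append((tag, bytes(payload)))
            return None
        buf[: len(payload)] = payload
        return buf


q = TaggedRecvQueue()
user_page = bytearray(8)
q.post_recv(tag=0x1234, buf=user_page)  # e.g. the XID of an NFS READ
q.deliver(0x1234, b"data")
q.deliver(0x9999, b"late")              # no receive posted for this tag
print(bytes(user_page[:4]), len(q.unexpected))
```

Even in this toy model the receiver has to handle sends whose tag has no posted buffer, which hints at why a hardware implementation is non-trivial.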

> it's clearly not NFSv4 WG material,

Probably not, and given that the schedule is tight, there isn't much point in
trying that.  Still, I think some people in the working group would be interested.

> and there
> doesn't seem to be a follow on to the STORM WG.  Ccing Dave Black if
> he has any idea.

I think there should be some follow-on.  Maybe a BOF could be considered for a
future meeting.
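Christoph's argument, in the quoted message below, that registration (or unregistration) overhead matters mainly for small I/Os can be made concrete with a simple cost model. The sketch assumes a fixed per-I/O registration cost and a transfer time proportional to I/O size; `REG_COST_US` and `BANDWIDTH` are placeholder values chosen for easy arithmetic, not measurements.

```python
def registration_fraction(io_bytes, reg_cost_us, bandwidth_bytes_per_us):
    """Fraction of per-I/O time spent on a fixed registration cost."""
    transfer_us = io_bytes / bandwidth_bytes_per_us
    return reg_cost_us / (reg_cost_us + transfer_us)


# Placeholder parameters, not measured values.
REG_COST_US = 2.0   # assumed fixed cost per I/O, in microseconds
BANDWIDTH = 1000.0  # bytes per microsecond (about 1 GB/s)

for size in (4 * 1024, 1024 * 1024):
    share = registration_fraction(size, REG_COST_US, BANDWIDTH)
    print(f"{size:>8} bytes: {share:.1%} of time in registration")
```

Whatever the actual constants, the fixed cost is amortized over the transfer time, so its share shrinks as I/O size grows.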


On Tue, Jun 6, 2017 at 2:42 AM, Christoph Hellwig <hch@lst.de> wrote:

> Hi David,
>
> send based data placement in its current form only makes sense for
> transfers from the client to the server (e.g. NFSv4 WRITE).  For transfers
> from the server to the client (e.g. NFSv4 READ) the client has to place
> the data into arbitrary buffers not controlled by the NFS client,
> so send-based data placement will lead to data copies on the client,
> which will kill performance.  I'm speaking from experience because we
> actually implemented this to verify the performance overhead for copies
> in early NVMe over Fabrics prototypes.
>
> That being said I've started drafting and prototyping a Verbs extension
> for a tagged RECV WR.  Right now it's just a software implementation,
> and there is considerable hardware complexity if we want to implement
> it in HCA.  I could talk about this idea a bit in Prague, but I'm not
> sure about the venue - it's clearly not NFSv4 WG material, and there
> doesn't seem to be a follow on to the STORM WG.  Ccing Dave Black if
> he has any idea.
>
> Also I find the idea of splitting requests into multiple sends just to
> avoid RDMA ops very questionable - the latency advantage of SEND
> from the client to the server vs RDMA read from the server mostly
> matters to reduce the latency for small I/Os.  For large I/Os
> the registration (or in fact usually unregistration) overhead does
> not matter in the grand scheme.  Last but not least, modern RDMA
> protocols have shown us that we should simplify credit management
> instead of making it even more complex, as that has been one of the
> major sources of bugs.
>
> On Mon, Jun 05, 2017 at 11:15:04AM -0400, David Noveck wrote:
> > Submitted with a small set of revisions to avoid expiration:
> >
> >    - New contact information
> >    - Updated references
> >    - Changed some terminology.  "Send-based DDP" has become "send-based
> >    data placement", while "DDP" is limited to data placement using explicit
> >    RDMA operations.  Since there are corresponding XDR changes, this makes the
> >    diff large, but the actual extension is pretty much unchanged
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: <internet-drafts@ietf.org>
> > Date: Mon, Jun 5, 2017 at 10:58 AM
> > Subject: New Version Notification for draft-dnoveck-nfsv4-rpcrdma-rtrext-02.txt
> > To: David Noveck <davenoveck@gmail.com>
> >
> >
> >
> > A new version of I-D, draft-dnoveck-nfsv4-rpcrdma-rtrext-02.txt
> > has been successfully submitted by David Noveck and posted to the
> > IETF repository.
> >
> > Name:           draft-dnoveck-nfsv4-rpcrdma-rtrext
> > Revision:       02
> > Title:          RPC-over-RDMA Extensions to Reduce Internode Round-trips
> > Document date:  2017-06-05
> > Group:          Individual Submission
> > Pages:          44
> > URL:            https://www.ietf.org/internet-drafts/draft-dnoveck-nfsv4-rpcrdma-rtrext-02.txt
> > Status:         https://datatracker.ietf.org/doc/draft-dnoveck-nfsv4-rpcrdma-rtrext/
> > Htmlized:       https://tools.ietf.org/html/draft-dnoveck-nfsv4-rpcrdma-rtrext-02
> > Htmlized:       https://datatracker.ietf.org/doc/html/draft-dnoveck-nfsv4-rpcrdma-rtrext-02
> > Diff:           https://www.ietf.org/rfcdiff?url2=draft-dnoveck-nfsv4-rpcrdma-rtrext-02
> >
> > Abstract:
> >    It is expected that a future version of the RPC-over-RDMA transport
> >    will allow protocol extensions to be defined.  This would provide for
> >    the specification of OPTIONAL features allowing participants who
> >    implement such features to cooperate as specified by that extension,
> >    while still interoperating with participants who do not support that
> >    extension.
> >
> >    A particular extension is described herein, whose purpose is to
> >    reduce the latency due to inter-node round-trips needed to effect
> >    operations which involve direct data placement or which transfer RPC
> >    messages longer than the fixed inline buffer size limit.
> >
> >
> >
> >
> > Please note that it may take a couple of minutes from the time of submission
> > until the htmlized version and diff are available at tools.ietf.org.
> >
> > The IETF Secretariat
>
>
> ---end quoted text---
>