Re: [nfsv4] draft-ietf-nfsv4-rpcrdma-bidirection-03 review

David Noveck <davenoveck@gmail.com> Fri, 03 June 2016 18:31 UTC

In-Reply-To: <B338F553-B74E-49C2-A638-2691B2A8E86D@oracle.com>
References: <6da90b6b-bb58-d241-0d74-dc421358c97c@oracle.com> <9ECFBBD5-9359-46AB-B1CC-7FCBF06C40A8@oracle.com> <f609eba3-4294-37ae-3bb8-c7df8f648bb0@oracle.com> <0E79D0E4-9D53-4ED4-8D17-2D806C56648F@oracle.com> <567848d1-d854-ef70-8fba-33708e7e0601@oracle.com> <E2B31EDF-74D6-4CC8-8F6C-674C85498B56@oracle.com> <E0C18ECC-7D15-447F-9DA7-654E1EBF6C3B@oracle.com> <CADaq8jcgW316nLwA3LnCAmL7nAY3o6XjeQLCkV-S_Sps9g+LJw@mail.gmail.com> <4E8C421F-1A22-413C-AA2E-833C71AC6F71@oracle.com> <CADaq8jdVjt6x0MNgc7g0HCvQ9tR6yC01AdSPCiqHmEn8CMPscw@mail.gmail.com> <CADaq8jcHk4PzxhGLtsK7mBEuh0J3T54Bn3PFd==X5cdoB9pGSQ@mail.gmail.com> <46B6E3E0-4598-4A54-A091-D590BF084B7F@oracle.com> <CADaq8jeBFLT4QB9=jY0OywbDDon0eUSrafz3nDVxcORNvwpDbg@mail.gmail.com> <CADaq8jc2hEg+sk7ADPoaFKmvCQ_xWjAkP1amWz9PJDHRSnCvHw@mail.gmail.com> <B338F553-B74E-49C2-A638-2691B2A8E86D@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Fri, 03 Jun 2016 14:31:21 -0400
Message-ID: <CADaq8jeybG3fx4evUhKDxXkVfpbiVSm1td9mvrN=RwrqqFtiwg@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Content-Type: multipart/alternative; boundary="001a113ce256028eee053463eee6"
Archived-At: <http://mailarchive.ietf.org/arch/msg/nfsv4/v-t06TrPZmJuVuGYxChXjygjpN0>
Cc: NFSv4 <nfsv4@ietf.org>
Subject: Re: [nfsv4] draft-ietf-nfsv4-rpcrdma-bidirection-03 review

> I concur that rfc5666bis and rfc5666-implementation-experience
> are ready to move forward.

I guess the implication is that bidirection is not ready and I'm unclear
about the reason for that.

> However, I've been thinking about an earlier comment Dave made
> in this thread:

> > My only concern is with regard to a possible future extension in which
> > multiple RPC messages are carried in a single SEND (i.e., the precise
> > opposite of message continuation).
> > When the language for the new field is drafted, we should make sure
> > that it doesn't assume all the messages in a SEND are all related to a
> > single direction of operation.

> There seem to be two related issues, and one of them has
> ramifications for rpcrdma-bidirection.

A few things I'd like to note here:

   - As far as I'm concerned, I only raised a single issue.
   - My issue is only with regard to how to deal with a potential Version
   Two extension.
   - When I said it was my "only concern" (emphasis added), I hoped to
   indicate that, as a potential future extension, this should not hold us up
   now.
   - Although it may relate to bidirection in general, I don't think it
   affects use of it in the Version One case.



> I. Ambiguity of the meaning of the credit field
>
> In the rfc5666bis world, the credit field in all messages
> flowing on a connection had an unambiguous meaning. If
> the message was going from requester to responder, the
> credit field was a credit request. In the other direction,
> it was a credit grant.

That remains the case, even with bidirection.  Your last
two sentences are still true.

The problem is that, when bidirection is in effect, it is not
always clear who is the requester and who is the responder.
The sender always knows, but the receiver might not and,
in the case where a message is not decipherable, the question
might not be answerable.

So,

   - With only forward direction operation, if the message is going from
   client to server, the credit field is a credit request. In the other
   direction, it is a credit grant.
   - With bidirectional operation, the corresponding statement with
   requester/responder is still true, but the client and server code,
   when it receives a message, does not know whether it is acting as
   requester or responder :-(
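
To make the receiver's predicament concrete, here is a minimal Python sketch.
It assumes the only way to recover the sender's role is to peek at the
msg_type word of the embedded RPC message (0 = CALL, 1 = REPLY, as in ONC
RPC). The function name, the return strings, and the simplified framing (the
transport header and chunk lists are assumed to be already stripped) are
illustrative only and are not taken from any of the drafts.

```python
import struct

CALL, REPLY = 0, 1  # ONC RPC msg_type values


def credit_meaning(rpc_payload: bytes):
    """Say how the credit field must be read, or None if we cannot tell.

    Assumption: rpc_payload begins with the RPC XID followed by msg_type,
    i.e. the RPC-over-RDMA header and chunk lists were already removed.
    """
    if len(rpc_payload) < 8:
        return None  # undecipherable: the sender's role is unknown
    _xid, msg_type = struct.unpack(">II", rpc_payload[:8])
    if msg_type == CALL:
        return "request"  # sender is acting as requester
    if msg_type == REPLY:
        return "grant"    # sender is acting as responder
    return None           # unknown msg_type: again, no answer
```

Note that the function has to look inside the upper-layer payload to answer
at all, and returns None when the payload cannot be parsed.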

> rpcrdma-bidirection introduced the idea of two separate
> credit flows, which depend on whether the message was
> part of forward RPC operation or backward RPC operation
> on that connection.

> This is the source of the problem.

It solved one problem, but it created another.  The question we have is
what to do about it.

> The meaning of the credit field depends on a field in
> the upper layer, so it's a layering violation, to say
> the least.

I agree that it is a layering violation and it may even be a layering
misdemeanor.  I don't think it is a layering felony, and I hope we can
avoid any Draconian penalty.

> Also, the original concept of credit was to manage RDMA
> receive buffers, but rpcrdma-bidirection interpretation
> of the credit value is one-credit-per-RPC.

If it switched to that, I think of it as a purely verbal mistake, which is
understandable given the context, i.e., that we are in a
one-message-per-RPC environment.  In any case, I don't see how that shift
is related to bidirectional operation.

> Now if we want to add support for partial RPC message
> per credit (message continuation)

I think we do.  I discussed doing that in
draft-dnoveck-nfsv4-rpcrdma-rtissues-00.

I will discuss the credit management issues in
draft-dnoveck-nfsv4-rpcrdma-rtrext-00,
but I'm not sure if I'll be able to submit that before IETF96.

> or multiple RPC
> messages per credit (as proposed above),

I haven't actually proposed that but I agree that we don't want to
foreclose this option.

> or NO RPC
> message per credit (for RDMA2_OPTIONAL control messages)

I actually have proposed messages that don't carry RPC in
draft-dnoveck-nfsv4-rpcrdma-xcharext-00.

I assumed that they required a credit because receiving the message
requires a buffer, and there is no way to avoid the fact that a buffer
(and thus a credit) is used up.  This is true whether the credit *field*
is a request or a grant or is ignored.
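
A toy Python model of that accounting follows. The class and the
message-kind labels are hypothetical; the only point it illustrates is that
every incoming Send consumes one posted receive buffer, whether or not it
carries an RPC.

```python
class ReceiveQueue:
    """Toy model: one posted receive buffer is consumed per incoming Send."""

    def __init__(self, posted: int):
        self.posted = posted  # number of receive buffers currently posted

    def on_send_arrival(self, kind: str) -> None:
        # 'kind' is only a label ("rpc", "continuation", "control", ...);
        # it has no effect on buffer consumption.
        if self.posted == 0:
            raise RuntimeError("no receive buffer posted for incoming Send")
        self.posted -= 1

    def repost(self, n: int = 1) -> None:
        # The receiver makes n more buffers available.
        self.posted += n
```

With two buffers posted, one RPC message plus one control message exhausts
the queue exactly as two RPC messages would.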

> the credit field is unusable.

My formulation would be "problematic for some classes of future extensions
to the next RPC-over-RDMA
version".  That's light-years (or even megaparsecs) away from "unusable".

> Would we add an independent pool of credits for each of
> these transmission mechanisms? Probably not.

I agree that that is a REAL BAD IDEA.

> A logical course of action, then, would be to alter the
> rpcrdma-bidirection I-D so that forward and backward
> direction operation use the same pool of credits.

I think this is a possible course of action.  I don't think it is
logical because:

   - The potential ambiguity you are worried about is between credit request
   and credit grant, and having a single pool wouldn't solve that.
   - There is no ambiguity about which credit pool you are dealing with, so
   this, not being broken, doesn't require a fix.
   - There are design/implementation issues that you note below that may
   make this infeasible.


> The question is whether it is feasible for the two
> directions to share the credit pools without deadlocking.
> Some prototyping and/or careful thought would be needed
> to answer it.

> (Spencer, I may be changing my mind about the readiness
> of rpcrdma-bidirection).

You're definitely changing your mind.  The only question is whether that is
done an odd or an even number of times.


> II. Ambiguity of the meaning of the XID field

I think this issue has no implications for rpcrdma-bidirection.

However, you have convinced me that draft-cel-nfsv4-rpcrdma-version-two is
not ready for WGLC :-)


> Today we have a one-to-one match between the XID value
> in the RPC-over-RDMA header and the XID associated with
> the RPC payload in that message.

Right.
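
As a concrete illustration of that one-to-one match, here is a small Python
check. It relies on the RPC-over-RDMA header and the RPC message each
starting with a 32-bit big-endian XID. The function name is made up, and the
caller is assumed to have already located the start of the RPC payload.

```python
import struct


def xids_match(rdma_header: bytes, rpc_payload: bytes) -> bool:
    """True if the transport header XID equals the RPC message XID."""
    (hdr_xid,) = struct.unpack(">I", rdma_header[:4])
    (rpc_xid,) = struct.unpack(">I", rpc_payload[:4])
    return hdr_xid == rpc_xid
```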

> With an extension for message continuation, there is
> still a one-to-one match: the RPC-over-RDMA XID can
> match the XID of the partial RPC payload.

Yes.  We are OK for all currently proposed extensions and those likely to
be proposed in the next several months.

> But for multiple RPC payloads per Send, or no RPC
> payloads per Send (control messages), we no longer have
> a match. How should the RPC-over-RDMA header's XID field
> be set in those cases?

In general, I would leave that up to the specific extensions.

> RFC 5666 and rfc5666bis have similar language regarding
> how the XID field in the RPC-over-RDMA header is to be
> set. Here's rfc5666bis, Section 5.2.1:

> >    The
> >    receiver MAY perform its processing based solely on the XID in the
> >    RPC-over-RDMA header, and thereby ignore the XID in the RPC message,
> >    if it so chooses.


> In other words, implementations are already allowed to
> rely on the value of the RPC-over-RDMA header's XID
> field, and ignore the XID field in the payload.

Version One implementations may do that.  It makes sense
for Version Two implementations to have the same rules when they
are using the subset of Version Two that corresponds to Version One.

> This issue has to be addressed for rpcrdma-version-two
> to support extensions that enable multiple or no RPCs
> per message.

I think it should be addressed.  I don't think it has to be.

> I propose that here, an independent XID space for
> RDMA2_OPTIONAL messages makes sense.

I'm OK with that as long as individual extensions have enough
flexibility in how they use this field.

> This would allow
> receivers to match RDMA2_OPTIONAL call and reply messages,
> but would not tie the header's XID value to the upper
> layer payload in any way.

It is not clear to me that we necessarily have to have
RDMA2_OPTIONAL call and reply messages.  We might have an
RDMA2_OPTIONAL message that is used to transport a set of calls or
a set of replies or a mixture of the two.  An RDMA2_OPTIONAL message
that carries a set of calls is not necessarily a call itself and
doesn't require a reply, although each of the calls it sends may
require replies (which may or may not come via RDMA2_OPTIONAL messages).
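
To illustrate, here is a purely hypothetical Python sketch of a receiver
unpacking such a bundle. No draft defines this message; the
list-of-payloads representation and the function name are assumptions made
for the example. Each carried RPC message is dispatched by its own XID and
msg_type, so the bundle itself is neither a call nor a reply.

```python
import struct

CALL, REPLY = 0, 1  # ONC RPC msg_type values


def dispatch_bundle(payloads, pending):
    """Split a bundle into calls to serve and replies that were awaited.

    payloads: list of RPC messages, each starting with XID and msg_type.
    pending:  set of XIDs this endpoint has outstanding as a requester.
    """
    calls, replies = [], []
    for p in payloads:
        xid, msg_type = struct.unpack(">II", p[:8])
        if msg_type == CALL:
            calls.append(xid)
        elif msg_type == REPLY and xid in pending:
            pending.discard(xid)
            replies.append(xid)
        # Anything else (unknown type, unexpected reply) is dropped here.
    return calls, replies
```

Nothing in this sketch needs a transport-level XID for the bundle; whether
an extension wants one is up to that extension.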

Note that if we do this, we may wind up with two xids used to send an
NFSv4.1 COMPOUND, where the basic function of the xid is, ironically
enough, taken over by the slotid/sequence pair, which just happens to
do the job better.

On Fri, Jun 3, 2016 at 11:04 AM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Jun 3, 2016, at 10:33 AM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > A week ago I wrote:
> > > The problem is that, in a multi-direction context, a given
> > > implementation only knows whether it is a client or a server.  It
> > > only knows if it is a requester or a responder based on the message
> > > it is processing at any given instant and if it can't figure that
> > > out, it is unsure whether it is a requester or responder and may be
> > > neither at particular times.
> >
> > It's now been at least two weeks since last call was supposed to end
> > for the Version One RDMA documents and a week since Chuck posted any
> > necessary updates.  I'm not sure why the documents are still listed as
> > in WGLC, but I wanted to make it clear that my comments above should
> > not be holding these up.
> >
> > Those comments refer to a problem that is inherent in Version One and
> > cannot be fixed within the constraints of a Version One no-XDR-change
> > update.  Since the Working group has decided to do a Version One
> > no-XDR-change update, the three existing RDMA documents should now go
> > forward.  I don't believe there is any disagreement on that point.
>
> I concur that rfc5666bis and rfc5666-implementation-experience
> are ready to move forward.
>
> However, I've been thinking about an earlier comment Dave made
> in this thread:
>
> > My only concern is with regard to a possible future extension in which
> > multiple RPC messages are carried in a single SEND (i.e., the precise
> > opposite of message continuation).  When the language for the new field
> > is drafted, we should make sure that it doesn't assume all the messages
> > in a SEND are all related to a single direction of operation.
>
>
> There seem to be two related issues, and one of them has
> ramifications for rpcrdma-bidirection.
>
>
> I. Ambiguity of the meaning of the credit field
>
> In the rfc5666bis world, the credit field in all messages
> flowing on a connection had an unambiguous meaning. If
> the message was going from requester to responder, the
> credit field was a credit request. In the other direction,
> it was a credit grant.
>
> rpcrdma-bidirection introduced the idea of two separate
> credit flows, which depend on whether the message was
> part of forward RPC operation or backward RPC operation
> on that connection.
>
> This is the source of the problem.
>
> The meaning of the credit field depends on a field in
> the upper layer, so it's a layering violation, to say
> the least.
>
> Also, the original concept of credit was to manage RDMA
> receive buffers, but rpcrdma-bidirection interpretation
> of the credit value is one-credit-per-RPC.
>
> Now if we want to add support for partial RPC message
> per credit (message continuation) or multiple RPC
> messages per credit (as proposed above), or NO RPC
> message per credit (for RDMA2_OPTIONAL control messages)
> the credit field is unusable.
>
> Would we add an independent pool of credits for each of
> these transmission mechanisms? Probably not.
>
> A logical course of action, then, would be to alter the
> rpcrdma-bidirection I-D so that forward and backward
> direction operation use the same pool of credits.
>
> The question is whether it is feasible for the two
> directions to share the credit pools without deadlocking.
> Some prototyping and/or careful thought would be needed
> to answer it.
>
> (Spencer, I may be changing my mind about the readiness
> of rpcrdma-bidirection).
>
>
> II. Ambiguity of the meaning of the XID field
>
> Today we have a one-to-one match between the XID value
> in the RPC-over-RDMA header and the XID associated with
> the RPC payload in that message.
>
> With an extension for message continuation, there is
> still a one-to-one match: the RPC-over-RDMA XID can
> match the XID of the partial RPC payload.
>
> But for multiple RPC payloads per Send, or no RPC
> payloads per Send (control messages), we no longer have
> a match. How should the RPC-over-RDMA header's XID field
> be set in those cases?
>
> RFC 5666 and rfc5666bis have similar language regarding
> how the XID field in the RPC-over-RDMA header is to be
> set. Here's rfc5666bis, Section 5.2.1:
>
> >    The
> >    receiver MAY perform its processing based solely on the XID in the
> >    RPC-over-RDMA header, and thereby ignore the XID in the RPC message,
> >    if it so chooses.
>
>
> In other words, implementations are already allowed to
> rely on the value of the RPC-over-RDMA header's XID
> field, and ignore the XID field in the payload.
>
> This issue has to be addressed for rpcrdma-version-two
> to support extensions that enable multiple or no RPCs
> per message.
>
> I propose that here, an independent XID space for
> RDMA2_OPTIONAL messages makes sense. This would allow
> receivers to match RDMA2_OPTIONAL call and reply messages,
> but would not tie the header's XID value to the upper
> layer payload in any way.
>
>
> --
> Chuck Lever
>
>
>