Re: [nfsv4] draft-ietf-nfsv4-rpcrdma-bidirection-03 review

David Noveck <davenoveck@gmail.com> Sat, 04 June 2016 10:35 UTC

In-Reply-To: <46F55B8B-5266-4472-BD30-D9FAEF2265AA@oracle.com>
References: <6da90b6b-bb58-d241-0d74-dc421358c97c@oracle.com> <9ECFBBD5-9359-46AB-B1CC-7FCBF06C40A8@oracle.com> <f609eba3-4294-37ae-3bb8-c7df8f648bb0@oracle.com> <0E79D0E4-9D53-4ED4-8D17-2D806C56648F@oracle.com> <567848d1-d854-ef70-8fba-33708e7e0601@oracle.com> <E2B31EDF-74D6-4CC8-8F6C-674C85498B56@oracle.com> <E0C18ECC-7D15-447F-9DA7-654E1EBF6C3B@oracle.com> <CADaq8jcgW316nLwA3LnCAmL7nAY3o6XjeQLCkV-S_Sps9g+LJw@mail.gmail.com> <4E8C421F-1A22-413C-AA2E-833C71AC6F71@oracle.com> <CADaq8jdVjt6x0MNgc7g0HCvQ9tR6yC01AdSPCiqHmEn8CMPscw@mail.gmail.com> <CADaq8jcHk4PzxhGLtsK7mBEuh0J3T54Bn3PFd==X5cdoB9pGSQ@mail.gmail.com> <46B6E3E0-4598-4A54-A091-D590BF084B7F@oracle.com> <CADaq8jeBFLT4QB9=jY0OywbDDon0eUSrafz3nDVxcORNvwpDbg@mail.gmail.com> <CADaq8jc2hEg+sk7ADPoaFKmvCQ_xWjAkP1amWz9PJDHRSnCvHw@mail.gmail.com> <B338F553-B74E-49C2-A638-2691B2A8E86D@oracle.com> <CADaq8jeybG3fx4evUhKDxXkVfpbiVSm1td9mvrN=RwrqqFtiwg@mail.gmail.com> <46F55B8B-5266-4472-BD30-D9FAEF2265AA@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Sat, 04 Jun 2016 06:35:28 -0400
Message-ID: <CADaq8jf17P41ft2mnFhSEWi6oM0-D7KUEshdUof4SnQUUSbaWg@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Content-Type: multipart/alternative; boundary="001a113ce256ef339c0534716574"
Archived-At: <http://mailarchive.ietf.org/arch/msg/nfsv4/xuDvucTZQEyTb2-ZdT5Dnptw2Dg>
Cc: NFSv4 <nfsv4@ietf.org>
Subject: Re: [nfsv4] draft-ietf-nfsv4-rpcrdma-bidirection-03 review

> Alright, then, "unreliable" might be closer to my
> intended meaning.

I don't see any unreliability either.  I think we have
something which is difficult to generalize, because,
as you point out, it has an unfortunate layering issue,
which we are unable to fix in Version One because it
is embedded in the Version One XDR.  We can fix it in subsequent versions
where we can change the XDR but we have to take extra care because a lot of
the layering confusion is located in the four words that we have defined as
not subject to change :-(

> I disagree with that. Having a single credit flow, with
> fixed roles of which side sends a request, and which
> sends a grant, solves the problem simply and fully, for
> all message types.

It solves the ambiguity problem but it creates a bunch of
new ones.  I imagine that the first is interoperability with
existing implementations, which I assume are using the model
in rpcrdma-bidirection as it is today.

I don't see how one can implement bidirectional operation with a single
credit pool.  The reason for credits is to ensure that there are sufficient
preposted receives to correspond to sends that the peer makes.  Since there
is a set of receives on each peer node then there needs to be a
corresponding credit pool for each direction.  I don't see how you can have
a situation in which both client and server may be requesters
and have them share a single credit pool.
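
To make the point concrete, here is a minimal sketch of per-direction credit accounting. It is only an illustration, not text from any of the drafts: the names CreditPool and Connection are hypothetical, and the starting value of one credit before any grant arrives is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CreditPool:
    """Credits one peer has granted for requests flowing toward it."""
    granted: int = 1    # assumption: one request is allowed before any grant arrives
    in_flight: int = 0  # requests sent and not yet replied to

    def can_send(self) -> bool:
        # Each in-flight request consumes one receive posted by the peer.
        return self.in_flight < self.granted

    def send_request(self) -> None:
        if not self.can_send():
            raise RuntimeError("no credit: the peer may have no receive posted")
        self.in_flight += 1

    def receive_reply(self, new_grant: int) -> None:
        # The reply frees one slot and carries the responder's updated grant.
        self.in_flight -= 1
        self.granted = new_grant


@dataclass
class Connection:
    # Forward: client sends requests, server posts receives and grants.
    forward: CreditPool = field(default_factory=CreditPool)
    # Backward: server sends requests, client posts receives and grants.
    backward: CreditPool = field(default_factory=CreditPool)


conn = Connection()
conn.forward.send_request()      # uses the only forward credit
print(conn.forward.can_send())   # is the forward direction still usable?
print(conn.backward.can_send())  # is the backward direction affected?
```

Because each pool tracks the receives posted by a different peer, running out of credits in one direction says nothing about the other.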

> It is also possible that this issue could be resolved
> with language in rpcrdma-version-two, and allow the
> one-credit-per-RPC concept to linger in V1 only.

Or alternatively, it could apply to both V1 and those
parts of V2 that use the existing V1 message types.

> I think that would mean rpcrdma-bidirection is turning
> into a V1-only enhancement. Bidirection would have
> to be partially or fully specified again in rpcrdma-
> version-two.

I don't think you need to do that, if you structure/layer
things properly (see below for an approach).  Then any further
specification of bidirection for version two would only address
how the new classes of features we want to enable in version two
would interact with the existing bidirectional operation.  In
doing that we could use our freedom to define the needed new XDR
to avoid propagating the layering issue that you have identified.

> Having multiple different ways to do backward
> direction operation is onerous for implementations.

Agree.

> So I'd prefer to have this addressed in rfc5666bis and
> rpcrdma-bidirection if we can muster it.

So here's the proposed division of responsibility:

   - The responsibility for defining the basic structure of bidirectional
   operation belongs to rpcrdma-bidirection.  This includes the use of two
   separate credit pools and the fact that each message may contain either a
   request or a grant applying to the credit pool owned by the message
   receiver.
   - The responsibility for defining how to determine whether any
   particular message has a grant or a request in the credit field belongs
   to the specific RPC-over-RDMA version being used.
   - The responsibility for deciding whether bidirectional operation is
   OPTIONAL, REQUIRED, or not allowed belongs to the specific
   RPC-over-RDMA version being used.
   - The text that now defines how the credit field for specific Version One
   messages is interpreted is to be treated as illustrative of Version One
   only.
   - draft-ietf-nfsv4-rpcrdma-bidirection explicitly takes on the
   responsibility for defining this for Version Two.  We have some issues to
   resolve which I'll address in a later message.
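
As a rough illustration of that split, here is a sketch in which the version-independent layer keeps the two credit values and a version-specific function decides whether a received credit field is a request or a grant. All names (Message, v1_credit_meaning, BidirectionalEndpoint) are hypothetical. The Version One rule shown, Calls carry requests and Replies carry grants, is my reading of the existing text and not normative wording; a Version Two rule would be a different function and is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

# msg_type values from the ONC RPC message header (RFC 5531).
RPC_CALL = 0
RPC_REPLY = 1


class CreditMeaning(Enum):
    REQUEST = "request"
    GRANT = "grant"


@dataclass
class Message:
    credit: int        # value of the credit field
    rpc_msg_type: int  # direction field of the embedded RPC message


def v1_credit_meaning(msg: Message) -> CreditMeaning:
    """Version One rule (assumed here): Calls carry requests, Replies carry grants."""
    if msg.rpc_msg_type == RPC_CALL:
        return CreditMeaning.REQUEST
    if msg.rpc_msg_type == RPC_REPLY:
        return CreditMeaning.GRANT
    raise ValueError("cannot tell whether the credit field is a request or a grant")


class BidirectionalEndpoint:
    """Version-independent part: two credit values, one per direction."""

    def __init__(self, classify: Callable[[Message], CreditMeaning]):
        self.classify = classify    # supplied by the RPC-over-RDMA version in use
        self.requested_by_peer = 0  # peer's latest request, for requests it sends us
        self.granted_by_peer = 1    # peer's latest grant (assumed to start at one)

    def on_receive(self, msg: Message) -> CreditMeaning:
        meaning = self.classify(msg)
        if meaning is CreditMeaning.REQUEST:
            self.requested_by_peer = msg.credit
        else:
            self.granted_by_peer = msg.credit
        return meaning


endpoint = BidirectionalEndpoint(v1_credit_meaning)
endpoint.on_receive(Message(credit=32, rpc_msg_type=RPC_CALL))
endpoint.on_receive(Message(credit=16, rpc_msg_type=RPC_REPLY))
```

The layering issue shows up in v1_credit_meaning: it has to look at a field of the RPC message itself, which is what a later version with new XDR could avoid.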




On Fri, Jun 3, 2016 at 3:34 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Jun 3, 2016, at 2:31 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > I concur that rfc5666bis and rfc5666-implementation-experience
> > > are ready to move forward.
> >
> > I guess the implication is that bidirection is not ready and I'm unclear
> > about the reason for that.
> >
> > > However, I've been thinking about an earlier comment Dave made
> > > in this thread:
> >
> > > > My only concern is with regard to a possible future extension in
> > > > which multiple RPC messages are carried in a single SEND (i.e., the
> > > > precise opposite of message continuation).
> > > > When the language for the new field is drafted, we should make sure
> > > > that it doesn't assume all the messages in a SEND are all related to a
> > > > single direction of operation.
> >
> > > There seem to be two related issues, and one of them has
> > > ramifications for rpcrdma-bidirection.
> >
> > A few things I'd like to note here:
> >       • As far as I'm concerned, I only raised a single issue.
> >       • My issue is only with regard to how to deal with a potential
> >         Version Two extension issue.
> >       • When I said it was my "only concern" (emphasis added), I hoped
> >         to indicate that, as a potential future extension, this should
> >         not hold us up now.
> >       • Although it may relate to bidirection in general, I don't think
> >         it affects use of it in the Version One case.
>
> I. relates to the Version One case because we may want
> to rewrite parts of the rpcrdma-bidirection I-D, which
> has a direct impact on NFSv4.1 on RPC-over-RDMA V1.
>
> And recall that we stated earlier that rpcrdma-bidirection
> was not meant to apply solely to V1. The current V2
> specification requires support for RPC bidirection as
> specified in rpcrdma-bidirection.
>
>
> > > I. Ambiguity of the meaning of the credit field
> > >
> > > In the rfc5666bis world, the credit field in all messages
> > > flowing on a connection had an unambiguous meaning. If
> > > the message was going from requester to responder, the
> > > credit field was a credit request. In the other direction,
> > > it was a credit grant.
> >
> > That remains the case, even with bidirection.  Your last
> > two sentences are still true.
>
> > The problem is that, when bidirection is in effect, it is not
> > always clear who is the requester and who is the responder:
> > The sender always knows but the receiver might not know and,
> > in the case where a message is not decipherable, the question
> > might not be answerable.
> >
> > So,
> >       • With only forward direction operation, if the message was going
> >         from client to server, the credit field is a credit request. In
> >         the other direction, it is a credit grant.
> >       • With bidirectional operation, the corresponding statement with
> >         requester/responder is still true, but the code in the client and
> >         server, when it receives a message, does not know whether it is
> >         acting as requester or responder :-(
> >
> > > rpcrdma-bidirection introduced the idea of two separate
> > > credit flows, which depend on whether the message was
> > > part of forward RPC operation or backward RPC operation
> > > on that connection.
>
> There may be problematic language in rfc5666bis as well.
>
> RFC 5666 Section 3.3 attaches credit requests to RPC Call
> messages and credit grants to RPC Reply messages.
>
> The updated language in rfc5666bis has this: credit
> request and grant is described in Sections 4.3.1 and 5.2.3
> as being tied to requester and responder.
>
> So, this is a problem not only for bidirectional operation
> but also for any type of message where there are multiple
> or no RPC messages associated with the RPC-over-RDMA
> message.
>
> My proposal is this:
>
> The roles of client (active connector) and server (passive
> accepter) are always the same for a given connection, and
> are independent of direction of RPC operation.
>
> Thus there is never any ambiguity about what the credit
> field means: the active side always makes requests, and
> the passive side always makes grants.
>
>
> > > This is the source of the problem.
> >
> > It solved one problem, but it created another.  The question we have is
> > what to do about it.
>
> Agreed.
>
>
> > > The meaning of the credit field depends on a field in
> > > the upper layer, so it's a layering violation, to say
> > > the least.
> >
> > I agree that it is a layering violation and it may even be a layering
> > misdemeanor.  I don't think it is a layering felony, and I hope we can
> > avoid any Draconian penalty.
> >
> > > Also, the original concept of credit was to manage RDMA
> > > receive buffers, but rpcrdma-bidirection interpretation
> > > of the credit value is one-credit-per-RPC.
> >
> > If it switched to that, I think of it as a purely verbal mistake, which
> > is understandable given the context, i.e. that we are in a
> > one-message-per-RPC environment.  In any case, I don't see how that
> > shift is related to bidirectional operation.
>
> Given that the one-credit-per-RPC language appears as
> early as RFC 5666, perhaps you are correct.
>
>
> > > Now if we want to add support for partial RPC message
> > > per credit (message continuation)
> >
> > I think we do.  I discussed doing that in
> > draft-dnoveck-nfsv4-rpcrdma-rtissues-00.
> >
> > I will discuss the credit management issues in
> > draft-dnoveck-nfsv4-rpcrdma-rtrext-00,
> > but I'm not sure if I'll be able to submit that before IETF96.
> >
> > > or multiple RPC
> > > messages per credit (as proposed above),
> >
> > I haven't actually proposed that but I agree that we don't want to
> > foreclose this option.
> >
> > > or NO RPC
> > > message per credit (for RDMA2_OPTIONAL control messages)
> >
> > I actually have proposed messages that don't carry RPC in
> > draft-dnoveck-nfsv4-rpcrdma-xcharext-00.
>
> Sorry, the above "if" was a rhetorical "if". I do
> presume that all three proposals are interesting to
> pursue.
>
>
> > I assumed that they required a credit because receiving the message
> > requires a buffer and there is no way to avoid the fact that a buffer
> > (and thus a credit) is used up.  This is true whether the credit
> > field is a request or a grant or is ignored.
> >
> > > the credit field is unusable.
> >
> > My formulation would be "problematic for some classes of future
> > extensions to the next RPC-over-RDMA version".  That's light-years
> > (or even megaparsecs) away from "unusable".
>
> Alright, then, "unreliable" might be closer to my
> intended meaning.
>
>
> > > Would we add an independent pool of credits for each of
> > > these transmission mechanisms? Probably not.
> >
> > I agree that that is a REAL BAD IDEA.
> >
> > > A logical course of action, then, would be to alter the
> > > rpcrdma-bidirection I-D so that forward and backward
> > > direction operation use the same pool of credits.
> >
> > I think this is a possible course of action.  I don't think it is
> > logical because:
> >       • The potential ambiguity you are worried about is between credit
> >         and grant, and having a single pool wouldn't solve that.
> >       • There is no ambiguity about which credit pool you are dealing
> >         with, so this, not being broken, doesn't require a fix.
>
> I disagree with that. Having a single credit flow, with
> fixed roles of which side sends a request, and which
> sends a grant, solves the problem simply and fully, for
> all message types.
>
>
> >       • There are design/supplementation issues that you note below that
> >         may make this infeasible
> >
> > > The question is whether it is feasible for the two
> > > directions to share the credit pools without deadlocking.
> > > Some prototyping and/or careful thought would be needed
> > > to answer it.
>
> It is also possible that this issue could be resolved
> with language in rpcrdma-version-two, and allow the
> one-credit-per-RPC concept to linger in V1 only.
>
> I think that would mean rpcrdma-bidirection is turning
> into a V1-only enhancement. Bidirection would have to
> be partially or fully specified again in
> rpcrdma-version-two.
>
> Having multiple different ways to do backward direction
> operation is onerous for implementations.
>
> So I'd prefer to have this addressed in rfc5666bis and
> rpcrdma-bidirection if we can muster it.
>
>
> > > II. Ambiguity of the meaning of the XID field
> >
> > I think this issue has no implications for rpcrdma-bidirection.
>
> I agree. We can set this one aside for the moment.
>
>
> --
> Chuck Lever
>
>
>
>