Re: [nfsv4] Tsvart early review of draft-ietf-nfsv4-rpcrdma-version-two-05

David Noveck <davenoveck@gmail.com> Tue, 02 November 2021 15:50 UTC

Return-Path: <davenoveck@gmail.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id D95F03A03EA; Tue, 2 Nov 2021 08:50:28 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.097
X-Spam-Level:
X-Spam-Status: No, score=-2.097 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id w8qZIIZkLUqM; Tue, 2 Nov 2021 08:50:24 -0700 (PDT)
Received: from mail-ed1-x52a.google.com (mail-ed1-x52a.google.com [IPv6:2a00:1450:4864:20::52a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D3D553A0789; Tue, 2 Nov 2021 08:50:23 -0700 (PDT)
Received: by mail-ed1-x52a.google.com with SMTP id 5so76451830edw.7; Tue, 02 Nov 2021 08:50:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=pDlxexx3k3dle6XVXz0c0Tn1ed61MuYsTK7DQxo4UUQ=; b=b+N0+U1649jm0TZzYSQLKHlCfIqqd7UAxYfkhT7pz8eNbg9cSmIsHWsi15ki+dojhJ btYB7UgdpYoqlOZTEwP+llEZn+GzRnkX9rhlraF+rqy43BUveX2/PxxtFdu6Lfs4zZUw +PcJcyhCj6kjyOuYoaIM0LPUhiw/ym0zrlvDH3PeBe5II4pM8QuAIFWOFiANWvNYjpeS ZvI42laW9axiBW28b6r3H2Eauq7QgplwxMq4RVIiwTP5J4ffDmrCeWfz79/OnUsiKtQ4 siKi84iq+6W6lBA0+2ylsjOjS7iXrsgP9mNZV5grQTrmYLzAxJno4CSFzti1WItoF/cX 5zVQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=pDlxexx3k3dle6XVXz0c0Tn1ed61MuYsTK7DQxo4UUQ=; b=Qj6iTXcFyMG/g+5kwRP5BGJSdNN4hHpiiQwXSO30btvry/tpdFnRi9g0pckN8Tnj+B kL6+aEXHEp5RFbQJ7wo8u7xR9qKXtVRgbb9BJWJr6hC0x2hq/XKCrhpeoCne+CVc5e8h LyCOrN48PWeKDl6ERxCb6zHd2fIevmt/23lF49cSHquCgqDIAxn1MBgeskFqLgL1r31c RBo/HXp7LrCiqk3qkE6EqBWJlVjN3PFYAYbndysmJfsuALZhQYBKBkGwGQQ/t+XE2hwB Fe1l9eZQxF85MS1inFpT7rfyDtR8M6PN4r9noLDeMPSjcn2+lzHwAPfLKo7/+liqI7J7 Ia+A==
X-Gm-Message-State: AOAM533T3fTlU8AVgg8PekipvyTpCR7oHTgRxAVtKmDd1WIbDi25c7Zh 5WMyoRhrVJM+LjtpzT3VftTdgIdTmfgIr3LZcgm9BQT5
X-Google-Smtp-Source: ABdhPJw/Zt8dYnsn0YngOBQU+jYc21ZZncvUPEy8HgV2vp+gC5VELRWVWu6OrASB/MP7KI59+b95X9vbHY25RjK26hQ=
X-Received: by 2002:a05:6402:5190:: with SMTP id q16mr14180411edd.123.1635868220721; Tue, 02 Nov 2021 08:50:20 -0700 (PDT)
MIME-Version: 1.0
References: <163584491835.18636.4240783876111764396@ietfa.amsl.com> <C3890CF3-86F3-4E72-BB1A-522BF0AA4CA3@oracle.com>
In-Reply-To: <C3890CF3-86F3-4E72-BB1A-522BF0AA4CA3@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Tue, 02 Nov 2021 11:50:08 -0400
Message-ID: <CADaq8jc9Z2QUfTy-+7BVGb8t+opdejctaWyuBJz_5GMsuJNDvQ@mail.gmail.com>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Jana Iyengar <jri.ietf@gmail.com>, "tsv-art@ietf.org" <tsv-art@ietf.org>, "draft-ietf-nfsv4-rpcrdma-version-two.all@ietf.org" <draft-ietf-nfsv4-rpcrdma-version-two.all@ietf.org>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000320e2305cfd04068"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/J1_ZgrSduDXl2ER94Yf_u0Xzek4>
Subject: Re: [nfsv4] Tsvart early review of draft-ietf-nfsv4-rpcrdma-version-two-05
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 02 Nov 2021 15:50:29 -0000

.> This is not a
>  request to change anything, so perhaps all I'm looking for is that the
ADs are
>  ok with this.

I'm sure they would be since RFC4506 is an internet standard.

Thanks for the review.



On Tue, Nov 2, 2021 at 10:59 AM Chuck Lever III <chuck.lever@oracle.com>
wrote:

> Hi Jana, thanks for your expert comments. A few responses
> are inline below.
>
>
> > On Nov 2, 2021, at 5:21 AM, Jana Iyengar via Datatracker <
> noreply@ietf.org> wrote:
> >
> > Reviewer: Jana Iyengar
> > Review result: Not Ready
> >
> > I've read up through Section 4, specifically to understand and comment
> on the
> > Flow Control aspects of this draft, as requested by Chuck Lever.
> >
> > As I understand it, FC works as follows:
> >
> > - A receiver advertises the number of unacknowledged messages it is
> willing to
> >  receive (`credit window`), and also indicates the number of messages
> received so far
> >  (`received`). The underlying transport is assumed to provide ordered and
> >  reliable delivery.
> >
> > - A sender is required to not send more than receiver-advertized
> `received +
> >  credit window` messages.
> >
> > - These advertisements are typically piggy-backed on payload-bearing
> >  messages. Optionally, a receiver can send a message with a `Grants
> Credit`
> >  header type to advertise its window.
> >
> > - The receiver can increase or decrease the advertised window.
> >
> > - Each message has a max message size (`Inline Threshold`) which is
> indicated
> >  via `Transport Properties`, and defaults to 4096 bytes.
> >
> >
> > I have a few comments specific to Flow Control:
> >
> > - The document allows for advertised window to increase or decrease,
> meaning
> >  that a receiver can renege on credit it advertises. Specifically, this
> allows
> >  a receiver to say it will accept a particular message and then reduces
> its
> >  window to cause that message to be now out of its window.
>
> It was not my intention for a credit decrease to force messages
> out of the window. I hope this oversight can be addressed by
> simply adding a sentence or two restricting the Receiver's
> behavior.
>
>
> >  This choice is rife with pain. I would urge, in the strongest manner
> possible,
> >  to avoid allowing reneging. That is, do not allow a receiver to
> arbitrarily
> >  reduce advertised credit so as to allow reneging. If you really really
> need
> >  it, prepare for some blood and sweat, and eventually, inevitable tears.
> >
> >  To clarify, I do not mean that the window size must remain the same. A
> >  receiver can reduce its window size, but do it so that if a message was
> >  in-window before, it remains in-window after.
> >
> >  For example, here is a sequence of FC messages that reduces the window
> and
> >  reneges on a previously allowed message (this is bad):
> >
> >  received = 10, window = 5  (allowed = 11-15)
> >  received = 11, window = 2  (allowed = 12-13, reneged on 14, 15)
> >
> >  And here is a sequence of FC messages that reduces the window without
> >  reneging (this is good):
> >
> >  received = 10, window = 5  (allowed = 11-15)
> >  received = 11, window = 4  (allowed = 12-15)
> >  received = 12, window = 3  (allowed = 13-15)
> >
> > - The current mechanism has a receiver advertising both `received` and
> `credit
> >  window` and the sender computes the limit. It is more direct for a
> receiver to
> >  simply advertise the limit -- basically the maximum message that the
> sender
> >  can send. This makes for fewer fields in the exchange, is more
> intuitive, and
> >  importantly, it is easier to describe a mechanism that avoids reneging
> >  (advertise the total message limit, and never advertise a lower number).
>
> A single field is preferable to retain better compatibility
> with RPC/RDMA v1. I wasn't able to think of a way to combine
> the fields. We can discuss this offline.
>
>
> > - Are the expected limits at a receiver in bytes or in operations? I
> imagine
> >  that the answer is bytes.
>
> The second paragraph of Section 4.2.1 explains that the
> connection's credit limits are expressed in messages, not
> bytes.
>
>    An RPC-over-RDMA version 2 credit represents the capability to convey
>    exactly one RPC-over-RDMA version 2 message, regardless of its size,
>    via an RDMA Send/Receive pair.
>
>
> >  If so, the FC mechanisms should be tied to this
> >  resource. At the moment, a receiver has to figure out what to advertise
> based
> >  on available memory and the maximum message size it advertised. Note
> that
> >  there will be inefficiency because messages are likely to be smaller
> than the
> >  max limit. It is likely to be more direct and efficient if the unit is
> all
> >  bytes.
>
> True, messages are typically smaller than the connection's
> inline thresholds. Unfortunately message bytes are not
> fungible among RDMA Receive buffers.
>
> A Receiver with 32 4096-byte Receive buffers will use exactly
> the same resources to receive 32 400-byte messages as it will
> to receive 32 3500-byte messages. Sending small RDMA messages
> does not enable a Receiver to then receive additional messages.
>
> At one point the document text was clear that the inline
> threshold is actually the RDMA Receive buffer size, but that
> over time has been removed in favor of generality.
>
>
> Regarding efficiency: We had considered building a capability
> in RPC/RDMA of conveying multiple short RPC messages in a
> single RDMA Send/Receive, but holding up those RPC messages to
> coalesce them usually adds unwanted latency.
>
>
> > - When are transport properties exchanged? FC credit needs to be
> available at
> >  the beginning of a connection for any progress to be made at all, so
> either
> >  both endpoints need to advertise it right at the start, or some default
> needs
> >  to be assumed. I don't see either in the document.
>
> The second paragraph of Section 4.3 explains that both peers
> must send one-at-a-time until credit limits are established.
>
>
> > - Finally, `credit window` is a strange term. What the document
> currently uses
> >  is arguably a window, and the term `window` here applies to messages
> that can
> >  be received, not to flow control credit.
> >
> > General comments (I skimmed the rest of the document):
> >
> > - I understand the value of the XDR descriptions for the protocol, but I
> haven't
> >  encountered this in the past. Is this adequate for an IETF RFC? This is
> not a
> >  request to change anything, so perhaps all I'm looking for is that the
> ADs are
> >  ok with this.
>
> Fwiw, XDR describes RPC protocols throughout the RFCs pub-
> lished by the nfsv4 WG, starting with RFC 2623. Previous to
> the creation of the nfsv4 WG, XDR was originally defined in
> RFC 1014 and used to describe NFSv2 in RFC 1094 in 1989.
>
>
> > - It would be useful to have forward references from the earlier
> sections to the
> >  later wire format and detailed error sections. For example, when a
> connection
> >  might be closed due to FC violations.
> >
> > - I might recommend a little text in the intro section laying out the
> structure
> >  of the document.
> >
> > - I see lowercase and uppercase MUSTs and SHOULDs. For clarity, consider
> >  rewriting the text to avoid the lowercase keywords.
>
> Issues regarding readability, terminology, and document
> structure will be addressed in subsequent revisions. Thanks
> for pointing them out!
>
>
> --
> Chuck Lever
>
>
>
>