Re: [nfsv4] Review of draft-dnoveck-nfsv4-rpcrdma-xcharext-00

David Noveck <davenoveck@gmail.com> Fri, 22 July 2016 09:05 UTC

Return-Path: <davenoveck@gmail.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id B93EB12DA7D for <nfsv4@ietfa.amsl.com>; Fri, 22 Jul 2016 02:05:04 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.699
X-Spam-Level:
X-Spam-Status: No, score=-2.699 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8rkBvF1CYI0j for <nfsv4@ietfa.amsl.com>; Fri, 22 Jul 2016 02:04:58 -0700 (PDT)
Received: from mail-oi0-x22d.google.com (mail-oi0-x22d.google.com [IPv6:2607:f8b0:4003:c06::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id A3CE212DA5B for <nfsv4@ietf.org>; Fri, 22 Jul 2016 02:04:58 -0700 (PDT)
Received: by mail-oi0-x22d.google.com with SMTP id l72so154476072oig.2 for <nfsv4@ietf.org>; Fri, 22 Jul 2016 02:04:58 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=Lq2QW0dKs9gWuz6cO9xE/MqHgOr439rNT7b19CSEonA=; b=RyS17egBz6MgqViFPGcohZV2mYdX2QK6X608xi8XXTlqesrd9sTFs9v3uPF2VwCKAL w72Y/KKO03mGZ42lMJqRH3nNjYvd0Zp0t+OUOCqPuAhCf0xLdArRQt0EmaQmfksoCgPW Dqo0R1pUGqhh33bQYizCfJkKNhNcVH1tf/EZZLYLmYsgUq3pZFn9JTW+TW091UZnPukS DCfbqdoMbyUTtAI0BvcVbIjnfRrPjDnf6k2GON4lEveRdRZlsAyZGnZLB7IX8TPJ28bM aY2qYCBOVY7u96ZL9Na9ULYRcxfoLSSCFZTquyv/pMzT0hVslOhqsECjiBlsek4e1G/+ CiUQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=Lq2QW0dKs9gWuz6cO9xE/MqHgOr439rNT7b19CSEonA=; b=C1oICxvrzlLIfAUgRNLU3Z9dzxV+UUmct2iGJkB+4UHemeNtksqDnPFhT8fXunbLkF yRX4EDSv3+eDIX4Blr+qvQGqVoM9MCNQ3Pc8IrMrmY5hIyuU25OVGn4f16imEIAhSYeQ NnXRZlepcspgAOm9vp6PlmInN0U9X2i2oBVcOlZuIRSl8O98a23OAfmwRQ7/MbGJED6f owYanxfhtlWag183A89DlY2qtb/cZbyHI+v/WQ2UGzJbu+k+N+k0DcMJsJL7qUDHQF8C 9veKuSUBf2zpOsbfnZV+DPF/LMVDHeRN+NwMAxGG2Bzk4+Sc8E/NOPbF3U0n1djzNMIY QG8A==
X-Gm-Message-State: AEkoouuEIyisjK3dJTWEPXA4XpB/+QO+8yDzD4THdEBud2dblGoeqMkB8J8HfJEktPSrWcVRVZLxhP8JciPAyw==
X-Received: by 10.157.14.5 with SMTP id c5mr1529414otc.55.1469178297551; Fri, 22 Jul 2016 02:04:57 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.182.20.72 with HTTP; Fri, 22 Jul 2016 02:04:56 -0700 (PDT)
In-Reply-To: <9202AD2E-D708-46FF-A78D-B87AAAA102E7@oracle.com>
References: <9202AD2E-D708-46FF-A78D-B87AAAA102E7@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Fri, 22 Jul 2016 05:04:56 -0400
Message-ID: <CADaq8jfLMAAD6e8S1yCErF=MFwYASE3uJw0GwvQicZF_FpQ5Ag@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Content-Type: multipart/alternative; boundary="001a11414dce898f64053835ba75"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/YVw_XHtpAvsr-oowjHKfnaF5428>
X-Mailman-Approved-At: Fri, 22 Jul 2016 08:19:13 -0700
Cc: NFSv4 <nfsv4@ietf.org>
Subject: Re: [nfsv4] Review of draft-dnoveck-nfsv4-rpcrdma-xcharext-00
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Jul 2016 09:05:04 -0000

> This is a long-form full document review of draft-dnoveck-nfsv4-rpcrdma-
xcharext-00.

Thanks for the review.

> Please feel free to respond with clarifying questions or further
discussion where needed, and no response is necessary
> where the author(s) agree to a suggestion or plan to ignore it.

To be clearer, I will

   - If I'm not following a suggestion I will say why rather than simply
   ignoring it.
   - In cases in which I simply agree to all the suggestions in a given
   section, I will just say "OK" to that section.

> Comments applying to the whole document:

> Subsequent extensions might each define one or more new characteristics.
Since the extension specified in this document
> is a (necessary) extensibility enabler, should we consider merging it
into the base RPC-over-RDMA Version Two protocol?

Yes. As discussed at the meeting, this makes architectural sense.

> Two of the three initially specified transport characteristics need
further technical discussion outside the scope of the review of this
document.

I'm not sure as to best ordering of:

   - Those discussions
   - Production of a -01
   - Incorporation of xcharext into a new version-two document.

For now, I'll make the changes I'm sure about into the existing xml.  We
can decide what to
do later about things that might need further discussion.  I'll be keeping
a to-do/to-decide list.
It may be that some of the more disruptive of those changes might be done
as part of effecting
the document amalgamation.

> *** Abstract

> Editorial:


> NEW
> This document specifies an extension to RPC-over-RDMA Version Two and
newer.

I have problem with the phrase "and newer".  I'm not sure what you are
assuming Version Three will be like such that it is sufficiently different
from Version Two to require a new version number and still need this
extension.   Also, if this extension is an integral part of version Two,
then it will remain in subsequent versions until someone makes an explicit
decision to take it out.

> *** Section 1.1:

> NEW
> This document specifies an extension to RPC-over-RDMA Version Two and
newer.

Again, I have a problem with "and newer".  No problem with the rest of it.

> *** Section 2.1:
>
> Editorial:
>
> The first sentence can be stated more clearly.

OK.

> Opinion:

> The term "char" has a longstanding and well known meaning to C coders. I
found the overloaded use of "char" as an abbreviation of "characteristic"
to be confusing.

I can see that might be a problem, but I don't think I use the unadorned
abbreviation "char'.  It is always part of the string "_XCHAR".  I can
change this and probably will.

> I find "attribute" or "capability" to be more in line with the intended
usage and either of these seems to be preferred historically over
"characteristic".
> There are plenty of examples, including within NFS itself, of the use of
the term "attribute."

At the time, I was worried that the use of the same term might be
confusing.  "capability" seems like it is limited to binary choices.

I'm willing to change this but note that it would change the title and the
filename  document.  As a result i am not making this change now.

> In addition, I found the use of the prefix "x" to be confusing. Here I
presume that "x" means "transport", not "transmit" or "unknown value".

You presume correctly.

Also, note that if we change to "attribute", "XCHAR" would become "XATTR"
:-)

> Since this is an extension of RPC-over-RDMA, I was surprised not to find
"rpcrdma" in the data type names.

If we add that (or "rpcrdma2" or "rpcrpdma2plus") to everything then the
names start getting very long for np very goo reaspn, and I prefer shorter
names.

> The variable, type, and operation names used in this document are often
not particularly mnemonic.
> I suggest (somewhat) longer names to help the reader remember function.
Even an additional vowel or two might be helpful.

Adding a few characters to make it more mnemonic is OK with me, I have
problems if that is carried to an extreme.

Although I think we can come to agreement about this issue, I think it
makes most sense to address this as part the merger of the two documents.
Are you OK with that?

For now, this is on my to-do/to-decide list.

> Technical:

> I suggest that RFC 2119 terms are appropriate in the following sentence:

OK with these.

> OLD
> The receiver of a message containing an xcharval needs to report an XDR
error if the length of xchv_data is such that it extends beyond the bounds
of the message transferred.

> NEW
> If the length of xchv_data is such that it extends beyond the bounds of
the transferred message, the receiver MUST report an XDR error.

> Likewise for the following sentence: "the receiver also needs to report"
ought to be "the receiver MUST report".

I've used the MUSTs but I haven't adopted your specific rephrasings,


> When first reading this section, it was not clear to me what the
xcharsubset type is for. To provide some context for this type definition,
it would help to add a
> paragraph in Section 2.1 briefly summarizing the later discussion:

>        • The xcharsubset type is used for reporting completion of one or
more changes of characteristics, and that
>        • Completing a change request does not have to be synchronous

Will address.

> *** Table 1:

> Technical:

> The default values provided here are based on the current behavior of
RPC-over-RDMA Version Two.

True.

> Future versions of RPC-over-RDMA may prescribe different default behavior
for each of these characteristics.

They might, but given the fact that we now have a mechanism to pick and
communicate the value you want, I don't expect that changing the default
values will be all that important.

> I suggest it would be better to defer the presentation of default values
until subsections 3.1, 3.2, and 3.3, with explicit association of these
particular values with RPC-over-RDMA Version
> Two only. Consider a minor text rewrite to make the default values
dependent on the RPC-over-RDMA Version being deployed.

I'm Ok with this but note that, if we merge the documents, the problem will
come close to solving itself.  If these appear in  a document defining
Version Two, then obviously the default values are the default values
within Version Two.

In a section describing further extensability one could state:

   - New versions can add characteristics
   - New versions should not change the structure of an existing
   characteristic.
   - New versions can change the default value of a characteristic

For now, I am putting this on the to-do list

> BKRQSUP_UNKNOWN is in fact not the default behavior of RPC-over-RDMA
Version Two. Version Two REQUIRES support for backwards direction
operation. I believe the correct default value > for Version Two would be
BKRQSUP_SZLIM ? More on that later.

Ok on the default.  Will discuss further implications later.

> Should Table 1 include a column that indicates whether a characteristic
is Requester-only, Responder-only, or Both? Should the names of the
characteristics also reflect this?

I don't think this is an appropriate way to categorize characteristics.  A
endpoint is known to be either a client or a server but whether it is
requester or responder depends on
the direction of the particular message being processed.   Particular
characteristics may be relevant to particular roles, but I don't see that
as a basis for categorization.  For example the receive buffer size applies
to the endpoint in its role as receiver but that doesn't make it a
receiver-only characteristic, since all endpoints are receivers.  Similarly
for characteristics that only apply to the endpoint in the role of
requester.  Given bi-directional operation, these will apply to the client
or the server, depending on the direction of transfer.

> Regarding the remark following Table 1, implementations can choose not to
support updating some or all of these characteristics.
> I suggest it is appropriate to use an RFC 2119 term here to make it
explicit that implementations have a choice, and that they must
> be prepared for a peer to reject a change request. Here is a sample
additional paragraph:

> Peers MAY reject a request to change a characteristic for any reason.

OK.

> There are at least two ways to reject a characteristic change request.
One is to not implement the REQ_XCHAR, RESP_XCHAR, and UPD_XCHAR
operations.

I agree about REQ_XCHAR.  Not implementing the others doesn't result in
request rejection.  That just just means you can't find out about the other
endpoint's rejection of your request.

> The other is to reject change requests for individual characteristics.
See Section 4.3 for details.

OK.

> ** Section 3.1:

> Editorial:

> For clarity, I suggest capitalizing Request Buffer Size whenever
referring to the characteristic itself.

The characteristic is "receive buffer size".  I capitalized it in two
places.


> Technical:

> Please explicitly specify the units of the xchrbsiz value. Octets is
assumed, but in the context of XDR, the value could be in XDR quads, for
instance.
> The buffer size in octets is likely to always be a multiple for four.

OK.

> The data types defined here are:

> const uint32    XCHAR_RBSIZ = 1;
> typedef uint32  xchrbsiz;

> I find these choices less readable than, for example, XCHAR_RCVBUF_SIZE
and xchr_rcvbufsz.

I'm OK with these changes, but will hold off for a while in order to make
sure we make all the necessary changes consistently.  This is on my
to-do/to-decide list.

> These are not necessarily my most preferred names, they simply illustrate
that I believe we can afford a few more characters in these names to help
legibility.

I would have a problem with RPCRDMA2_TRANSPORT_CHAR_RECEIVE_BUFFER_SIZE and
rpcrpma2_transport_char_receive_buffer_size :-(

> The initial subset of characteristics are setting a precedent for future
naming choices.

Good point.

> The terms "Position-Zero Read chunk" and "Reply chunk" as defined in
rfc5666bis are capitalized.

OK

> *** Section 3.2:

> Editorial:

> Rather than the term "remote invalidation requests," I recommend the more
specific term "RDMA Send With Invalidate". Here and elsewhere in the
document, > the term "invalidate" is preferred over "deregister". (The term
is "Remote Invalidation" not "Remote Deregistration," after all).

OK.

Technical:

> The mention of the server's lack of use of memory registration is not
technically correct.

Will address.

> Finally, although there are no DDP-eligible data items in currently
defined backward-direction ULPs, an implementation can still choose to use
Position-Zero Read chunks or
> Reply chunks for larger messages.

Good point.

> I'm not sure what exactly to do about backward-direction use of Remote
Invalidation, other than to have the responder also provide this
characteristic via INIT_XCHAR, but
> always set to false for now,

What I feel is consistent with that but I'd distinguish between what the
spec  says to do and what we expect to implementations to do in the near
future.

The spec should state that this chracteristic applies to both directions of
operation.

You and I know that server implementations, for the forseeable future are
either going to not report this characteristic or report it as false. also
clients are unlikely to look at it for a while.  The document may provide
this information as interesting background but that doesn't affect the
protocol, which allows that situation to change.

> possibly stating that this characteristic is for forward-direction
requests only (ie, punting).

I don't think punting is appropriate.  This is a
second-down-and-inches-to-go situation.

> Therefore may I suggest replacing the first two paragraphs with the
following:
>
> OLD
> The requester remote invalidation characteristic indicates that the
requester is prepared for the responder to issue remote invalidation
requests, in order to unregister memory > regions established to support
RDMA Read and Write operations done by the responder into or out of the
requester’s memory.
>
> As RPC-over-RDMA is currently used, memory registration is not done on
the server and explicit RDMA operations are not done to satisfy
backward-direction requests. This
> makes it unlikely that servers will present non-default values of the
XCHAR_RQREMINV characteristic or that clients will take note of the value
presented by servers.

> NEW
> The Requester Remote Invalidation characteristic indicates that a
forward-direction requester is prepared for a forward-direction responder
to use RDMA Send With Invalidate
> when replying to an RPC-over-RDMA request containing non-empty chunk
lists.

I don't think we want to foreclose the option of using this in the reverse
direction.

PROPOSED:

The Requester Remote Invalidation characteristic indicates that the current
endpoint, when in the role of a requester, is prepared for the responder to
use RDMA Send With Invalidate when replying to an RPC-over-RDMA request
containing non-empty chuck lists.

As RPC-over-RDMA is currently used, memory registration exposed to peers
are not done by the server and explicit RDMA operations are not done to
satisfy backward direction requests.  This makes it unlikely that servers
will present non-default values of the XCHAR_RQREQ characteristic or that
clients will take note of such values when presented by servers.


> IMO, there is no need to include a discussion of the implementation of
Remote Invalidation in this document. After <CODE ENDS> in this section, I
suggest finishing the section with the following
> two replacement paragraphs:

> When the Requester Remote Invalidate characteristic is set to false, a
responder MUST use Send to convey
> RPC reply messages to the requester. When the Requester Remote Invalidate
characteristic is set to true, a
> responder MAY use Send With Invalidate instead of Send to convey RPC
replies to the requester.

> The value of the Requester Remote Invalidate characteristic is not likely
to change from the value reported by INIT_XCHAR (see Section 4.1).

OK.

 -- stuff about possible need for XDR change deleted ---

> We should discuss this further outside the scope of this review to come
to agreement on how to proceed.

OK.  I'll note this on the to-do/to-decde list.

I also note that if there is a change in the chunk XDR, I will need to
update the XDR In draft-dnoveck-nfsv4-rpcrdma-rtrext to match.


> *** Section 3.3:
>
> Technical:
>
> RPC-over-RDMA Version Two and newer implementations are required to have
support for backward-direction requests.

I didn't know that.  Now that you mention it, I'm not sure it is such a
good idea.  For example, I don't think that you want to
prevent an NFSv3 implementation, which has no need to use backward
direction requests, from upgrading from V1 to V2.  That
issue can be discussed further. For now, I'll keep this on the
to-do/to-decide list.

> I don't see a use for BKRQSUP_UNKNOWN or BKRQSUP_NONE here; V1, where
> NONE might be appropriate, isn't extensible, and thus will never have
transport characteristics.

Depends on the decision on whether backward direction support is required
in V2.

>The only unknown is whether the implementation can support the use of
chunks for backward-direction requests. This should not be a problem if the
ULP does not have any DDP-eligible data items, and has a mechanism for
indicating request size limits and it keeps the size of backward direction
requests under the inline threshold.

> Note that Version Two's inline threshold is at least 4KB, and
implementations are free to advertise an even larger threshold using the
Receive Buffer Size characteristic.

> This size limit is actively advertised by the only ULP that uses
backward-direction requests, NFSv4.x.


> The rpcrdma-bidirection specification purposely does not prohibit chunks
in the backward direction.

Exactly.  In version Two, this is optional behavior. The characteristic in
question provides a means to determine whether the peer supports transfer
of such chunks.

> When a backward direction ULP is proposed with DDP-eligible data items,
support for backward direction chunks will be introduced
> in RPC-over-RDMA implementations where that ULP is supported.

Yes, but my hope is that we will be in a situation in which the
implementation support can be introduced but no protocol change is
required.  If we don't provide the support before it is needed, it is
likely that it will be needed before it is available, which would be a real
drag.

> Outside the scope of this review, let's propose some use cases that
demonstrate how this characteristic can be used effectively, and then we
can consider adjusting what is currently proposed.

It has to be understood that any such use case will necessarily be
hypothetical, in that they have to assume that there will be a ULP that
wants this support, that Version Two, has, so far, been careful not to
prevent.


*** Section 4:

> Editorial:

> I suggest OPT_XCHAR_ or ROPT_XCHAR_ as a prefix rather than using an
_XCHAR suffix for the operation names.

I'm OK with this but am going to hold off a bit since a merger with Version
Two may supersede this.  This is now on my to-decide list.

After all, if this is merged fully into Version Two, we might have
RDMA_INIT_XCHAR, etc.

> It might be useful to have a separate table which states what the
receiver's expected response is (if I understand correctly):

Not sure a table is useful.  Nevertheless the discussion of what would be
in it would be helpful.

> Sender:            response:
> INIT_XCHAR    INIT_XCHAR

As stated, this would result infinite response-to-response loop :-(

I think of each of these as independently offered by each endpoint.

> REQ_XCHAR   RESP_XCHAR or UPD_XCHAR

RESP_XCHAR is the response to REQ_XCHAR.  In addition, one or more
UPD_XCHARs might be generate but these are
asynchronous and not readily connected with the REQ_XCHAR That caused them
to be generated.

> RESP_XCHAR none
> UPD_XCHAR   none

Yes.


> *** Section 4.1:

> Technical:

> Paragraph ending:

> can be received by partners who use the default minimum receive buffer
size.

> I suggest the word "minimum" be dropped from that sentence. Afterwards,
add something to the effect of:

> The connection's initial receive buffer size is typically 1KB, but it
depends on the initial connection state of the RPC-over-RDMA version in
use. See [rpcrmdav2] for details.

OK.


*** Section 4.2:

OK..


> *** Section 4.3:

OK

*** Section 4.4:

> Editorial:

> The first paragraph has a superfluous comma:

> The UPD_XCHAR message type allows an RPC-over-RDMA participant >>,<< to
notify the other participant that a change to the transport characteristics
has occurred.

Will fix.

> The following paragraph:

> Note that there no way to tie a request changed to the specfic request
which asked for it.  In particular, the xid associated with this message is
independent of that for an earlier REQ_XCHAR message.

> Seems more important than just a "Note".

It is not "just a 'Note'", even though the verb "Note" is used.

> I suggest dropping "Note that,"

I don't see that raises the prominence or perceived importance of this fact.

One possibility is to say:

It is important to note that ...


another possibility is to replace the verb "note" by its definition,
resulting in:

One should pay particular attention to the fact that there is no way

> and clarifying the phrase "there no way to tie a request changed".

I think that is in the following sentence.

> Technical:

> I suggest something like ROPT_XCHAR_NOTIFY rather than UPD_XCHAR.

"NOTIFY" seems too generic to me.  How about "ROPT_XCHAR_CHANGED"?  I
supposed that this would be batched with other similar name changes.

> I suggest that it be made clear that "xid" in this document means the
value in the RPC-over-RDMA header's rdma_xid field.

I've done that.

> In addition, the call-response relationships of the new operations
proposed in this document are perhaps important enough
> (and unidirectional, not like normal RPC) to move discussion into a
separate subsection of Section 4, or perhaps earlier in the document.

I will do that.

> This section finishes with the following text:

> optuxc_pendclr, if true, indicates that a previous request to update the
characteristic specified by optuxc_now.xchv_which
> is no longer to be considered pending. This may be set true even if the
characteristic value is not changed from the previous value.

> This seems to imply that, after receiving an UPD_XCHAR message, the
receiver must query the sender to see if the change actually took effect.
Did I misunderstand this?

It doesn't.  The receiver gets the new current value of the characteristic
in any case.

> Would it be better if UPD_XCHAR reported whether a characteristic change
took effect or was rejected?

rejection is reported by RESP_XCHAR.

> Under what circumstances would optucx_pendclr be set to false?

Two possibilities;

   - change not requested.  i.e. the sender is notifying you of a change
   you didn't request.
   - you requested a change but the request is still pending. this could
   happen if you requested that something be changed and instead of waiting
   untuil the change was effected, the sender decides to update you on partial
   completion, while continuing to keep the original request pending.


*** Section 6.1:

Editorial:

> The title of this section is "Additional Operations" but the two
paragraphs here discuss only transport characteristics. Perhaps this
subsection could be deleted?

I will delete it.

> The second paragraph is missing its ending period.

Will fix if I don't delete this.


*** Section 6.2:

Editorial:

> I suggest breaking the first sentence into two or three sentences and
restating.

Will do.

*** Section 6.3:

OK.

*** Section 7:

OK.

On Wed, Jul 20, 2016 at 12:03 PM, Chuck Lever <chuck.lever@oracle.com>
wrote:

> This is a long-form full document review of
> draft-dnoveck-nfsv4-rpcrdma-xcharext-00. Please feel free to respond with
> clarifying questions or further discussion where needed, and no response is
> necessary where the author(s) agree to a suggestion or plan to ignore it.
>
>
> Comments applying to the whole document:
>
> Subsequent extensions might each define one or more new characteristics.
> Since the extension specified in this document is a (necessary)
> extensibility enabler, should we consider merging it into the base
> RPC-over-RDMA Version Two protocol?
>
> Two of the three initially specified transport characteristics need
> further technical discussion outside the scope of the review of this
> document.
>
>
> *** Abstract
>
> Editorial:
>
> IMO it's safe to assume in the abstract that RPC-over-RDMA Version Two
> will have a mechanism that permits extension of the base RPC-over-RDMA
> protocol. Explaining that in this document's abstract seems unnecessary,
> and the use of subjunctive mood and future tense dulls the impact of both
> the abstract and the introduction section. Also, the term RPC-over-RDMA is
> preferred over the term RPC/RDMA.
>
> I suggest tightening the text as follows:
>
> OLD
> It is expected that the RPC-over-RDMA transport will, at some point, allow
> protocol extensions to be defined. This would provide for the specification
> of OPTIONAL features, allowing participants who implement the OPTIONAL
> features, to cooperate as specified by that    extension, while still
> interoperating with participants who do not support that extension.
>
> A particular extension is described herein, whose purpose is to allow
> RPC/RDMA Endpoints to specify and manage their transport characteristics in
> order to allow the other participant to optimize message transfer in light
> of the characteristics communicated by the initial sender.
>
> NEW
> This document specifies an extension to RPC-over-RDMA Version Two and
> newer. The extension enables endpoints of an RPC-over-RDMA connection to
> exchange information which can optimize message transfer.
>
>
> *** Section 1.1:
>
> Editorial:
>
> Likewise, the introduction can be written to assume that the extension
> mechanism is in the base protocol already. I suggest replacing the first
> two paragraphs of Section 1.1.
>
> OLD
> This document describes a potential extension to the RPC-over-RDMA
> protocol, which would allow participating implementations to communicate
> the transport characteristics of their implementation, to request changes
> in those of the other participant, and to effect changes and notify the
> other participant of the changes.
>
> Although this document specifies the new OPTIONAL message header types to
> implement these functions, the precise means by which the presence of
> support for these OPTIONAL functions will be ascertained is not described
> here, as would be done more appropriately by the RFC defining a version of
> RPC-over-RDMA which supports protocol extension.
>
> NEW
> This document specifies an extension to RPC-over-RDMA Version Two and
> newer. It allows each participating endpoint on a single connection to
> communicate various characteristics of its implementation, to request
> changes in characteristics of the other endpoint, and to effect changes and
> notify the other endpoint of changes to these characteristics during
> operation.
>
> The extension described herein specifies OPTIONAL message header types to
> implement this mechanism. The means by which the implementation support
> status of these OPTIONAL types is ascertained is described in [rpcrmdav2].
>
>
> *** Section 2.1:
>
> Editorial:
>
> The first sentence can be stated more clearly.
>
> OLD
> Receiver and sender characteristics are specified using an extensible
> approach, that allows new characteristics to be defined, in addition to the
> small set of initial transport characteristics specified herein.
>
> NEW
> An initial set of receiver and sender characteristics are specified in
> this document. An extensible approach is used, allowing new characteristics
> to be defined in future standards track documents.
>
>
> Opinion:
>
> The term "char" has a longstanding and well known meaning to C coders. I
> found the overloaded use of "char" as an abbreviation of "characteristic"
> to be confusing. I find "attribute" or "capability" to be more in line with
> the intended usage and either of these seems to be preferred historically
> over "characteristic". There are plenty of examples, including within NFS
> itself, of the use of the term "attribute."
>
> In addition, I found the use of the prefix "x" to be confusing. Here I
> presume that "x" means "transport", not "transmit" or "unknown value".
>
> Since this is an extension of RPC-over-RDMA, I was surprised not to find
> "rpcrdma" in the data type names.
>
> The variable, type, and operation names used in this document are often
> not particularly mnemonic. I suggest (somewhat) longer names to help the
> reader remember function. Even an additional vowel or two might be helpful.
>
>
> Technical:
>
> I suggest that RFC 2119 terms are appropriate in the following sentence:
>
> OLD
> The receiver of a message containing an xcharval needs to report an XDR
> error if the length of xchv_data is such that it extends beyond the bounds
> of the message transferred.
>
> NEW
> If the length of xchv_data is such that it extends beyond the bounds of
> the transferred message, the receiver MUST report an XDR error.
>
> Likewise for the following sentence: "the receiver also needs to report"
> ought to be "the receiver MUST report".
>
> When first reading this section, it was not clear to me what the
> xcharsubset type is for. To provide some context for this type definition,
> it would help to add a paragraph in Section 2.1 briefly summarizing the
> later discussion:
>
>         • The xcharsubset type is used for reporting completion of one or
> more changes of characteristics, and that
>         • Completing a change request does not have to be synchronous
>
>
> *** Table 1:
>
> Technical:
>
> The default values provided here are based on the current behavior of
> RPC-over-RDMA Version Two. Future versions of RPC-over-RDMA may prescribe
> different default behavior for each of these characteristics. I suggest it
> would be better to defer the presentation of default values until
> subsections 3.1, 3.2, and 3.3, with explicit association of these
> particular values with RPC-over-RDMA Version Two only. Consider a minor
> text rewrite to make the default values dependent on the RPC-over-RDMA
> Version being deployed.
>
> BKRQSUP_UNKNOWN is in fact not the default behavior of RPC-over-RDMA
> Version Two. Version Two REQUIRES support for backwards direction
> operation. I believe the correct default value for Version Two would be
> BKRQSUP_SZLIM ? More on that later.
>
> Should Table 1 include a column that indicates whether a characteristic is
> Requester-only, Responder-only, or Both? Should the names of the
> characteristics also reflect this?
>
> Regarding the remark following Table 1, implementations can choose not to
> support updating some or all of these characteristics. I suggest it is
> appropriate to use an RFC 2119 term here to make it explicit that
> implementations have a choice, and that they must be prepared for a peer to
> reject a change request. Here is a sample additional paragraph:
>
> Peers MAY reject a request to change a characteristic for any reason.
> There are at least two ways to reject a characteristic change request. One
> is to not implement the REQ_XCHAR, RESP_XCHAR, and UPD_XCHAR operations.
> The other is to reject change requests for individual characteristics. See
> Section 4.3 for details.
>
>
> *** Section 3.1:
>
> Editorial:
>
> For clarity, I suggest capitalizing Request Buffer Size whenever referring
> to the characteristic itself.
>
>
> Technical:
>
> Please explicitly specify the units of the xchrbsiz value. Octets is
> assumed, but in the context of XDR, the value could be in XDR quads, for
> instance. The buffer size in octets is likely to always be a multiple for
> four.
>
> The data types defined here are:
>
> const uint32    XCHAR_RBSIZ = 1;
> typedef uint32  xchrbsiz;
>
> I find these choices less readable than, for example, XCHAR_RCVBUF_SIZE
> and xchr_rcvbufsz. These are not necessarily my most preferred names, they
> simply illustrate that I believe we can afford a few more characters in
> these names to help legibility. The initial subset of characteristics are
> setting a precedent for future naming choices.
>
> The terms "Position-Zero Read chunk" and "Reply chunk" as defined in
> rfc5666bis are capitalized.
>
>
> *** Section 3.2:
>
> Editorial:
>
> Rather than the term "remote invalidation requests," I recommend the more
> specific term "RDMA Send With Invalidate". Here and elsewhere in the
> document, the term "invalidate" is preferred over "deregister". (The term
> is "Remote Invalidation" not "Remote Deregistration," after all).
>
>
> Technical:
>
> The mention of the server's lack of use of memory registration is not
> technically correct. A server often does register memory regions that are
> used as RDMA Read sink buffers. (the Linux server registers these buffers
> using FRWR; Solaris does not register these buffers, relying on physical
> addressing and a smaller scatter/gather limit per Read WR). These
> registered sink buffers are not exposed to other network nodes.
>
> Finally, although there are no DDP-eligible data items in currently
> defined backward-direction ULPs, an implementation can still choose to use
> Position-Zero Read chunks or Reply chunks for larger messages. I'm not sure
> what exactly to do about backward-direction use of Remote Invalidation,
> other than to have the responder also provide this characteristic via
> INIT_XCHAR, but always set to false for now, or possibly stating that this
> characteristic is for forward-direction requests only (ie, punting).
>
> Therefore may I suggest replacing the first two paragraphs with the
> following:
>
> OLD
> The requester remote invalidation characteristic indicates that the
> requester is prepared for the responder to issue remote invalidation
> requests, in order to unregister memory regions established to support RDMA
> Read and Write operations done by the responder into or out of the
> requester’s memory.
>
> As RPC-over-RDMA is currently used, memory registration is not done on the
> server and explicit RDMA operations are not done to satisfy
> backward-direction requests. This makes it unlikely that servers will
> present non-default values of the XCHAR_RQREMINV characteristic or that
> clients will take note of the value presented by servers.
>
> NEW
> The Requester Remote Invalidation characteristic indicates that a
> forward-direction requester is prepared for a forward-direction responder
> to use RDMA Send With Invalidate when replying to an RPC-over-RDMA request
> containing non-empty chunk lists.
>
>
> IMO, there is no need to include a discussion of the implementation of
> Remote Invalidation in this document. After <CODE ENDS> in this section, I
> suggest finishing the section with the following two replacement paragraphs:
>
> When the Requester Remote Invalidate characteristic is set to false, a
> responder MUST use Send to convey RPC reply messages to the requester. When
> the Requester Remote Invalidate characteristic is set to true, a responder
> MAY use Send With Invalidate instead of Send to convey RPC replies to the
> requester.
>
> The value of the Requester Remote Invalidate characteristic is not likely
> to change from the value reported by INIT_XCHAR (see Section 4.1).
>
>
> After some research, I've found that full support for RI is done in other
> protocols by having the client mark the chunk in each request it wants the
> server to invalidate. This allows servers to clearly distinguish chunks
> that are persistently registered (where an attempted invalidation might be
> terminal to the connection) and ones that have a short-lived tag (where
> invalidation should work); and of course it also indicates when the client
> cannot tolerate any remove invalidation.
>
> This is not a mechanism that can be provided with a transport
> characteristic, but would require an XDR change in the RPC-over-RDMA
> header. In particular, it would require changes to the chunk list data
> structures. For example, the set of three chunk lists in each Call could be
> followed by a field containing the tag that the client wishes to be
> invalidated. Once such a data structure change is in place, a Requester
> Remote Invalidation characteristic might be superfluous.
>
> We should discuss this further outside the scope of this review to come to
> agreement on how to proceed.
>
>
> *** Section 3.3:
>
> Technical:
>
> RPC-over-RDMA Version Two and newer implementations are required to have
> support for backward-direction requests. I don't see a use for
> BKRQSUP_UNKNOWN or BKRQSUP_NONE here; V1, where NONE might be appropriate,
> isn't extensible, and thus will never have transport characteristics.
>
> The only unknown is whether the implementation can support the use of
> chunks for backward-direction requests. This should not be a problem if the
> ULP does not have any DDP-eligible data items, and has a mechanism for
> indicating request size limits and it keeps the size of backward direction
> requests under the inline threshold.
>
> Note that Version Two's inline threshold is at least 4KB, and
> implementations are free to advertise an even larger threshold using the
> Receive Buffer Size characteristic. This size limit is actively advertised
> by the only ULP that uses backward-direction requests, NFSv4.x.
>
> The rpcrdma-bidirection specification purposely does not prohibit chunks
> in the backward direction. When a backward direction ULP is proposed with
> DDP-eligible data items, support for backward direction chunks will be
> introduced in RPC-over-RDMA implementations where that ULP is supported.
>
> Outside the scope of this review, let's propose some use cases that
> demonstrate how this characteristic can be used effectively, and then we
> can consider adjusting what is currently proposed.
>
>
> *** Section 4:
>
> Editorial:
>
> I suggest OPT_XCHAR_ or ROPT_XCHAR_ as a prefix rather than using an
> _XCHAR suffix for the operation names.
>
> It might be useful to have a separate table which states what the
> receiver's expected response is (if I understand correctly):
>
> Sender:            response:
> INIT_XCHAR    INIT_XCHAR
> REQ_XCHAR   RESP_XCHAR or UPD_XCHAR
> RESP_XCHAR none
> UPD_XCHAR   none
>
>
> *** Section 4.1:
>
> Technical:
>
> Paragraph ending:
>
> can be received by partners who use the default minimum receive buffer
> size.
>
> I suggest the word "minimum" be dropped from that sentence. Afterwards,
> add something to the effect of:
>
> The connection's initial receive buffer size is typically 1KB, but it
> depends on the initial connection state of the RPC-over-RDMA version in
> use. See [rpcrmdav2] for details.
>
>
> *** Section 4.2:
>
> Editorial:
>
> The first paragraph has a superfluous comma:
>
> The REQ_XCHAR message type allows an RPC-over-RDMA participant, whether
> client or server, to request of its partner >>,<< that relevant transport
> characteristics be changed.
>
> The second paragraph is missing its ending period.
>
>
> *** Section 4.3:
>
> Editorial:
>
> OLD
> future transmissions may be made based on that assumption.
>
> NEW
> future transmissions may be made based on the new value.
>
>
> *** Section 4.4:
>
> Editorial:
>
> The first paragraph has a superfluous comma:
>
> The UPD_XCHAR message type allows an RPC-over-RDMA participant >>,<< to
> notify the other participant that a change to the transport characteristics
> has occurred.
>
> The following paragraph:
>
> Note that there no way to tie a request changed to the specific request
> which asked for it.  In particular, the xid associated with this message is
> independent of that for an earlier REQ_XCHAR message.
>
> Seems more important than just a "Note". I suggest dropping "Note that,"
> and clarifying the phrase "there no way to tie a request changed".
>
>
> Technical:
>
> I suggest something like ROPT_XCHAR_NOTIFY rather than UPD_XCHAR.
>
> I suggest that it be made clear that "xid" in this document means the
> value in the RPC-over-RDMA header's rdma_xid field. In addition, the
> call-response relationships of the new operations proposed in this document
> are perhaps important enough (and unidirectional, not like normal RPC) to
> move discussion into a separate subsection of Section 4, or perhaps earlier
> in the document.
>
> This section finishes with the following text:
>
> optuxc_pendclr, if true, indicates that a previous request to update the
> characteristic specified by optuxc_now.xchv_which is no longer to be
> considered pending. This may be set true even if the characteristic value
> is not changed from the previous value.
>
>
> This seems to imply that, after receiving an UPD_XCHAR message, the
> receiver must query the sender to see if the change actually took effect.
> Did I misunderstand this? Would it be better if UPD_XCHAR reported whether
> a characteristic change took effect or was rejected? Under what
> circumstances would optucx_pendclr be set to false?
>
>
> *** Section 6.1:
>
> Editorial:
>
> The title of this section is "Additional Operations" but the two
> paragraphs here discuss only transport characteristics. Perhaps this
> subsection could be deleted?
>
> The second paragraph is missing its ending period.
>
>
> *** Section 6.2:
>
> Editorial:
>
> I suggest breaking the first sentence into two or three sentences and
> restating.
>
>
> *** Section 6.3:
>
> Technical:
>
> I suggest adding an explicit statement here that use of xcharid values in
> the experimental range is not guaranteed to interoperate among independent
> implementations.
>
>
> *** Section 7:
>
> Editorial:
>
> I suggest moving the second paragraph to section 6.2.
>
>
> Technical:
>
> I suggest replacing the first paragraph with the following two:
>
> Like other fields that appear in each RPC-over-RDMA header, characteristic
> information is sent in the clear on the fabric with no integrity
> protection, making it vulnerable to man-in-the-middle attacks.
>
> For example, if a man-in-the-middle changes the value of the Receive
> buffer size or the Requester Remote Invalidation boolean, it can reduce
> connection performance or trigger loss of connection. Repeated connection
> loss can impact performance or even prevent a new connection from being
> established. Recourse is to deploy on a private network or use link-layer
> encryption.
>
> --
> Chuck Lever
>
>
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www.ietf.org/mailman/listinfo/nfsv4
>