Re: [nfsv4] Review of draft-dnoveck-nfsv4-rpcrdma-xcharext-00

Chuck Lever <chuck.lever@oracle.com> Wed, 27 July 2016 18:21 UTC

Return-Path: <chuck.lever@oracle.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 19F8D12D1BC for <nfsv4@ietfa.amsl.com>; Wed, 27 Jul 2016 11:21:37 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.507
X-Spam-Level:
X-Spam-Status: No, score=-5.507 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-1.287, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id tSJfazAlcWRh for <nfsv4@ietfa.amsl.com>; Wed, 27 Jul 2016 11:21:34 -0700 (PDT)
Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C2E1412D187 for <nfsv4@ietf.org>; Wed, 27 Jul 2016 11:21:34 -0700 (PDT)
Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u6RILXeL022137 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 27 Jul 2016 18:21:33 GMT
Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userv0022.oracle.com (8.14.4/8.13.8) with ESMTP id u6RILXjE026447 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 27 Jul 2016 18:21:33 GMT
Received: from abhmp0002.oracle.com (abhmp0002.oracle.com [141.146.116.8]) by userv0121.oracle.com (8.13.8/8.13.8) with ESMTP id u6RILPiS028955; Wed, 27 Jul 2016 18:21:30 GMT
Received: from anon-dhcp-171.1015granger.net (/68.46.169.226) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 27 Jul 2016 11:21:25 -0700
Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CADaq8jfLMAAD6e8S1yCErF=MFwYASE3uJw0GwvQicZF_FpQ5Ag@mail.gmail.com>
Date: Wed, 27 Jul 2016 14:21:24 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <4D1DC503-68A8-49CC-B471-354B4903DA83@oracle.com>
References: <9202AD2E-D708-46FF-A78D-B87AAAA102E7@oracle.com> <CADaq8jfLMAAD6e8S1yCErF=MFwYASE3uJw0GwvQicZF_FpQ5Ag@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
X-Mailer: Apple Mail (2.3124)
X-Source-IP: userv0022.oracle.com [156.151.31.74]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/wVoSWoctfgZc_w2xv9tKEQp8lBg>
Cc: NFSv4 <nfsv4@ietf.org>
Subject: Re: [nfsv4] Review of draft-dnoveck-nfsv4-rpcrdma-xcharext-00
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Jul 2016 18:21:37 -0000

> On Jul 22, 2016, at 5:04 AM, David Noveck <davenoveck@gmail.com> wrote:
> 
> > This is a long-form full document review of draft-dnoveck-nfsv4-rpcrdma-xcharext-00. 
> 
> Thanks for the review.
> 
> > Please feel free to respond with clarifying questions or further discussion where needed, and no response is necessary 
> > where the author(s) agree to a suggestion or plan to ignore it.
> 
> To be clearer, I will
> 	• If I'm not following a suggestion I will say why rather than simply ignoring it.
> 	• In cases in which I simply agree to all the suggestions in a given section, I will just say "OK" to that section.
> > Comments applying to the whole document:
> 
> > Subsequent extensions might each define one or more new characteristics. Since the extension specified in this document 
> > is a (necessary) extensibility enabler, should we consider merging it into the base RPC-over-RDMA Version Two protocol?
> 
> Yes. As discussed at the meeting, this makes architectural sense. 
> 
> > Two of the three initially specified transport characteristics need further technical discussion outside the scope of the review of this document.
> 
> I'm not sure as to best ordering of:
> 	• Those discussions
> 	• Production of a -01
> 	• Incorporation of xcharext into a new version-two document.
> For now, I'll make the changes I'm sure about into the existing xml.  We can decide what to 
> do later about things that might need further discussion.  I'll be keeping a to-do/to-decide list. 
> It may be that some of the more disruptive of those changes might be done as part of effecting
> the document amalgamation.

I'll wait until you feel it is ready.

Someone might point out that the sooner the better, just to
have the most diff history attached to the particular document
that is going to be moved forward. I don't have a strong
preference.


> > *** Abstract
> 
> > Editorial:
> 
> 
> > NEW
> > This document specifies an extension to RPC-over-RDMA Version Two and newer.
> 
> I have problem with the phrase "and newer".  I'm not sure what you are assuming Version Three will be like such that it is sufficiently different from Version Two to require a new version number and still need this extension.   Also, if this extension is an integral part of version Two, then it will remain in subsequent versions until someone makes an explicit decision to take it out.
> 
> > *** Section 1.1:
> 
> > NEW
> > This document specifies an extension to RPC-over-RDMA Version Two and newer. 
> 
> Again, I have a problem with "and newer".  No problem with the rest of it.
> 
> > *** Section 2.1:
> > 
> > Editorial:
> >
> > The first sentence can be stated more clearly.
> 
> OK.
> 
> > Opinion:
> 
> > The term "char" has a longstanding and well known meaning to C coders. I found the overloaded use of "char" as an abbreviation of "characteristic" to be confusing. 
> 
> I can see that might be a problem, but I don't think I use the unadorned abbreviation "char'.  It is always part of the string "_XCHAR".  I can change this and probably will.
> 
> > I find "attribute" or "capability" to be more in line with the intended usage and either of these seems to be preferred historically over "characteristic". 
> > There are plenty of examples, including within NFS itself, of the use of the term "attribute."
> 
> At the time, I was worried that the use of the same term might be confusing.  "capability" seems like it is limited to binary choices.
> 
> I'm willing to change this but note that it would change the title and the filename  document.  As a result i am not making this change now.
> 
> > In addition, I found the use of the prefix "x" to be confusing. Here I presume that "x" means "transport", not "transmit" or "unknown value".
> 
> You presume correctly.
> 
> Also, note that if we change to "attribute", "XCHAR" would become "XATTR" :-)
> 
> > Since this is an extension of RPC-over-RDMA, I was surprised not to find "rpcrdma" in the data type names.
> 
> If we add that (or "rpcrdma2" or "rpcrpdma2plus") to everything then the names start getting very long for np very goo reaspn, and I prefer shorter names.
> 
> > The variable, type, and operation names used in this document are often not particularly mnemonic. 
> > I suggest (somewhat) longer names to help the reader remember function. Even an additional vowel or two might be helpful.
> 
> Adding a few characters to make it more mnemonic is OK with me, I have problems if that is carried to an extreme.
> 
> Although I think we can come to agreement about this issue, I think it makes most sense to address this as part the merger of the two documents.  Are you OK with that?

That's fine.


> For now, this is on my to-do/to-decide list.
> 
> > Technical:
> 
> > I suggest that RFC 2119 terms are appropriate in the following sentence:
> 
> OK with these.
> 
> > OLD
> > The receiver of a message containing an xcharval needs to report an XDR error if the length of xchv_data is such that it extends beyond the bounds of the message transferred.
> 
> > NEW
> > If the length of xchv_data is such that it extends beyond the bounds of the transferred message, the receiver MUST report an XDR error.
> 
> > Likewise for the following sentence: "the receiver also needs to report" ought to be "the receiver MUST report".
> 
> I've used the MUSTs but I haven't adopted your specific rephrasings,
>  
> 
> > When first reading this section, it was not clear to me what the xcharsubset type is for. To provide some context for this type definition, it would help to add a 
> > paragraph in Section 2.1 briefly summarizing the later discussion:
> 
> >        • The xcharsubset type is used for reporting completion of one or more changes of characteristics, and that
> >        • Completing a change request does not have to be synchronous
> 
> Will address.

[ ... snipped ... ]


> > *** Section 3.2:
> 
> > Editorial:
> 
> > Rather than the term "remote invalidation requests," I recommend the more specific term "RDMA Send With Invalidate". Here and elsewhere in the document, > the term "invalidate" is preferred over "deregister". (The term is "Remote Invalidation" not "Remote Deregistration," after all).
> 
> OK.
> 
> Technical:
> 
> > The mention of the server's lack of use of memory registration is not technically correct. 
> 
> Will address.
> 
> > Finally, although there are no DDP-eligible data items in currently defined backward-direction ULPs, an implementation can still choose to use Position-Zero Read chunks or 
> > Reply chunks for larger messages. 
> 
> Good point.
> 
> > I'm not sure what exactly to do about backward-direction use of Remote Invalidation, other than to have the responder also provide this characteristic via INIT_XCHAR, but
> > always set to false for now, 
> 
> What I feel is consistent with that but I'd distinguish between what the spec  says to do and what we expect to implementations to do in the near future. 
> 
> The spec should state that this chracteristic applies to both directions of operation.
> 
> You and I know that server implementations, for the forseeable future are either going to not report this characteristic or report it as false. also clients are unlikely to look at it for a while.  The document may provide this information as interesting background but that doesn't affect the protocol, which allows that situation to change. 
> 
> > possibly stating that this characteristic is for forward-direction requests only (ie, punting).
> 
> I don't think punting is appropriate.  This is a second-down-and-inches-to-go situation.
> 
> > Therefore may I suggest replacing the first two paragraphs with the following:
> > 
> > OLD
> > The requester remote invalidation characteristic indicates that the requester is prepared for the responder to issue remote invalidation requests, in order to unregister memory > regions established to support RDMA Read and Write operations done by the responder into or out of the requester’s memory.
> >
> > As RPC-over-RDMA is currently used, memory registration is not done on the server and explicit RDMA operations are not done to satisfy backward-direction requests. This 
> > makes it unlikely that servers will present non-default values of the XCHAR_RQREMINV characteristic or that clients will take note of the value presented by servers.
> 
> > NEW
> > The Requester Remote Invalidation characteristic indicates that a forward-direction requester is prepared for a forward-direction responder to use RDMA Send With Invalidate 
> > when replying to an RPC-over-RDMA request containing non-empty chunk lists.
> 
> I don't think we want to foreclose the option of using this in the reverse direction.
> 
> PROPOSED:
> The Requester Remote Invalidation characteristic indicates that the current endpoint, when in the role of a requester, is prepared for the responder to use RDMA Send With Invalidate when replying to an RPC-over-RDMA request containing non-empty chuck lists.
> 
> As RPC-over-RDMA is currently used, memory registration exposed to peers are not done by the server and explicit RDMA operations are not done to satisfy backward direction requests.  This makes it unlikely that servers will present non-default values of the XCHAR_RQREQ characteristic or that clients will take note of such values when presented by servers.

The first paragraph is OK. The second seems unnecessary to me.


> > IMO, there is no need to include a discussion of the implementation of Remote Invalidation in this document. After <CODE ENDS> in this section, I suggest finishing the section with the following 
> > two replacement paragraphs:
> 
> > When the Requester Remote Invalidate characteristic is set to false, a responder MUST use Send to convey 
> > RPC reply messages to the requester. When the Requester Remote Invalidate characteristic is set to true, a 
> > responder MAY use Send With Invalidate instead of Send to convey RPC replies to the requester.
> 
> > The value of the Requester Remote Invalidate characteristic is not likely to change from the value reported by INIT_XCHAR (see Section 4.1).
> 
> OK.
> 
>  -- stuff about possible need for XDR change deleted ---
> 
> > We should discuss this further outside the scope of this review to come to agreement on how to proceed.
> 
> OK.  I'll note this on the to-do/to-decde list.

I'll wait for you to bring this up on the mailing list.


> I also note that if there is a change in the chunk XDR, I will need to update the XDR In draft-dnoveck-nfsv4-rpcrdma-rtrext to match. 
> 
> 
> > *** Section 3.3:
> > 
> > Technical:
> > 
> > RPC-over-RDMA Version Two and newer implementations are required to have support for backward-direction requests. 
> 
> I didn't know that.  Now that you mention it, I'm not sure it is such a good idea.  For example, I don't think that you want to 
> prevent an NFSv3 implementation, which has no need to use backward direction requests, from upgrading from V1 to V2.

Fair point. Also, I may have overstated the requirement. I'll
make a note to revisit that language.


> That
> issue can be discussed further. For now, I'll keep this on the to-do/to-decide list.

> > I don't see a use for BKRQSUP_UNKNOWN or BKRQSUP_NONE here; V1, where 
> > NONE might be appropriate, isn't extensible, and thus will never have transport characteristics.
> 
> Depends on the decision on whether backward direction support is required in V2.
> 
> >The only unknown is whether the implementation can support the use of chunks for backward-direction requests. This should not be a problem if the ULP does not have any DDP-eligible data items, and has a mechanism for indicating request size limits and it keeps the size of backward direction requests under the inline threshold.
> 
> > Note that Version Two's inline threshold is at least 4KB, and implementations are free to advertise an even larger threshold using the Receive Buffer Size characteristic. 
> 
> > This size limit is actively advertised by the only ULP that uses backward-direction requests, NFSv4.x.
> 
> 
> > The rpcrdma-bidirection specification purposely does not prohibit chunks in the backward direction. 
> 
> Exactly.  In version Two, this is optional behavior.

It is also an option in Version One.


> The characteristic in question provides a means to determine whether the peer supports transfer of such chunks.

That's not at all clear from the description and discussion
in the draft. But, OK, I'll go with that.


> > When a backward direction ULP is proposed with DDP-eligible data items, support for backward direction chunks will be introduced 
> > in RPC-over-RDMA implementations where that ULP is supported.
> 
> Yes, but my hope is that we will be in a situation in which the implementation support can be introduced but no protocol change is required.  If we don't provide the support before it is needed, it is likely that it will be needed before it is available, which would be a real drag.

So then:

UNKNOWN - The upper layer handles reporting support levels.

NONE - No backwards-direction support at all.

SZLIM - Implementation does not support chunks.

GENL - implementation supports backward-direction chunks


I don't understand why UNKNOWN is necessary. The transport
layer either has the support or it doesn't, and it can report
that.

What should be reported when, say, NFSv4.0 and NFSv4.1 are
operating on the same connection? How does the transport
decide whether it should report UNKNOWN or SZLIM ?

I suggest that SZLIM and GENL might be more functionally named.
For example, INLINE and CHUNKS might be better. GENL is a bit
broad.

Instead of a discriminator, a bitmap might be more extensible.
That would allow an implementation that supported only
RDMA_NOMSG in the backward direction.


> > Outside the scope of this review, let's propose some use cases that demonstrate how this characteristic can be used effectively, and then we can consider adjusting what is currently proposed.
> 
> It has to be understood that any such use case will necessarily be hypothetical, in that they have to assume that there will be a ULP that wants this support, that Version Two, has, so far, been careful not to prevent.

Still struggling to understand why this one needs to be an
initially defined characteristic. We don't have any ULP that
needs this functionality. The other two characteristics have
strong justifications.

Even if this characteristic might become necessary sometime
down the line, RPC-over-RDMA Version Two is extensible:
it can be introduced at any time. It seems like there ought to
be a real use case or two to understand how it needs to work.

If we look at TCP implementations:

- there's no way at the transport level for peer implementations
to report whether they support backward-direction operation at
all.

- clients that connect via NFSv3 quite likely use an RPC layer
that does support backward-direction operation. It is completely
interoperable with an implementation that has no backward-
direction support.

There is no danger of sending a backward direction request to
a client that doesn't support backward direction operation. The
request will just be silently discarded.

Without chunk support, the behavior is supposed to be similar.

Thus, what is the purpose of reporting a backward-direction
support level? The Upper Layer can report either "no backward
direction support" or can report the largest size backward-
direction operation it can receive, already; and it's more or
less required to do that when operating on other transports.


> *** Section 4:
> 
> > Editorial:
> 
> > I suggest OPT_XCHAR_ or ROPT_XCHAR_ as a prefix rather than using an _XCHAR suffix for the operation names.
> 
> I'm OK with this but am going to hold off a bit since a merger with Version Two may supersede this.  This is now on my to-decide list.
> 
> After all, if this is merged fully into Version Two, we might have RDMA_INIT_XCHAR, etc.

> > It might be useful to have a separate table which states what the receiver's expected response is (if I understand correctly):
> 
> Not sure a table is useful.  Nevertheless the discussion of what would be in it would be helpful.
> 
> > Sender:            response:
> > INIT_XCHAR    INIT_XCHAR
> 
> As stated, this would result infinite response-to-response loop :-(
> 
> I think of each of these as independently offered by each endpoint.
> 
> > REQ_XCHAR   RESP_XCHAR or UPD_XCHAR
> 
> RESP_XCHAR is the response to REQ_XCHAR.  In addition, one or more UPD_XCHARs might be generate but these are 
> asynchronous and not readily connected with the REQ_XCHAR That caused them to be generated.
> 
> > RESP_XCHAR none
> > UPD_XCHAR   none
> 
> Yes.

Agreed, a table oversimplifies it. But the discussion here seems
useful to include.


> *** Section 4.4:
> 
> > Editorial:
> 
> > The first paragraph has a superfluous comma:
> 
> > The UPD_XCHAR message type allows an RPC-over-RDMA participant >>,<< to notify the other participant that a change to the transport characteristics has occurred.
> 
> Will fix.
> 
> > The following paragraph:
> 
> > Note that there no way to tie a request changed to the specfic request which asked for it.  In particular, the xid associated with this message is independent of that for an earlier REQ_XCHAR message.
> 
> > Seems more important than just a "Note". 
> 
> It is not "just a 'Note'", even though the verb "Note" is used.
> 
> > I suggest dropping "Note that," 
> 
> I don't see that raises the prominence or perceived importance of this fact.
> 
> One possibility is to say:
> 
> It is important to note that ...
> 
> another possibility is to replace the verb "note" by its definition, resulting in:
> 
> One should pay particular attention to the fact that there is no way
> 
> > and clarifying the phrase "there no way to tie a request changed".
> 
> I think that is in the following sentence.

The phrase I mentioned is not grammatical. Can it be cleaned up?


> > Technical:
> 
> > I suggest something like ROPT_XCHAR_NOTIFY rather than UPD_XCHAR.
> 
> "NOTIFY" seems too generic to me.  How about "ROPT_XCHAR_CHANGED"?  I supposed that this would be batched with other similar name changes.
> 
> > I suggest that it be made clear that "xid" in this document means the value in the RPC-over-RDMA header's rdma_xid field. 
> 
> I've done that.
> 
> > In addition, the call-response relationships of the new operations proposed in this document are perhaps important enough 
> > (and unidirectional, not like normal RPC) to move discussion into a separate subsection of Section 4, or perhaps earlier in the document.
> 
> I will do that.
> 
> > This section finishes with the following text:
> 
> > optuxc_pendclr, if true, indicates that a previous request to update the characteristic specified by optuxc_now.xchv_which 
> > is no longer to be considered pending. This may be set true even if the characteristic value is not changed from the previous value.
> 
> > This seems to imply that, after receiving an UPD_XCHAR message, the receiver must query the sender to see if the change actually took effect. Did I misunderstand this? 
> 
> It doesn't.  The receiver gets the new current value of the characteristic in any case.
> 
> > Would it be better if UPD_XCHAR reported whether a characteristic change took effect or was rejected? 
> 
> rejection is reported by RESP_XCHAR.
> 
> > Under what circumstances would optucx_pendclr be set to false?
> 
> Two possibilities;
> 	• change not requested.  i.e. the sender is notifying you of a change you didn't request.

The usual term is "unsolicited."

> 	• you requested a change but the request is still pending. this could happen if you requested that something be changed and instead of waiting untuil the change was effected, the sender decides to update you on partial completion, while continuing to keep the original request pending.

I'm still not clear on what "partial completion" means. Can you
give an example?


> *** Section 6.1:
> 
> Editorial:
> 
> > The title of this section is "Additional Operations" but the two paragraphs here discuss only transport characteristics. Perhaps this subsection could be deleted?
> 
> I will delete it.
> 
> > The second paragraph is missing its ending period.
> 
> Will fix if I don't delete this.
> 
> 
> *** Section 6.2:
> 
> Editorial:
> 
> > I suggest breaking the first sentence into two or three sentences and restating.
> 
> Will do.
> 
> *** Section 6.3:
> 
> OK.
> 
> *** Section 7:
> 
> OK.

--
Chuck Lever