Re: [nfsv4] Review of draft-dnoveck-nfsv4-rpcrdma-xcharext-00

karen deitke <karen.deitke@oracle.com> Wed, 10 August 2016 18:16 UTC

Return-Path: <karen.deitke@oracle.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 53FE612B01B for <nfsv4@ietfa.amsl.com>; Wed, 10 Aug 2016 11:16:59 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.467
X-Spam-Level:
X-Spam-Status: No, score=-5.467 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-1.247, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JpI61Ns6lkQq for <nfsv4@ietfa.amsl.com>; Wed, 10 Aug 2016 11:16:54 -0700 (PDT)
Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 390ED12B042 for <nfsv4@ietf.org>; Wed, 10 Aug 2016 11:16:54 -0700 (PDT)
Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u7AIGqDA022877 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for <nfsv4@ietf.org>; Wed, 10 Aug 2016 18:16:53 GMT
Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userv0022.oracle.com (8.14.4/8.13.8) with ESMTP id u7AIGqRw010271 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for <nfsv4@ietf.org>; Wed, 10 Aug 2016 18:16:52 GMT
Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id u7AIGpl3024927 for <nfsv4@ietf.org>; Wed, 10 Aug 2016 18:16:52 GMT
Received: from [10.159.93.194] (/10.159.93.194) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 10 Aug 2016 11:16:51 -0700
To: nfsv4@ietf.org
References: <9202AD2E-D708-46FF-A78D-B87AAAA102E7@oracle.com> <CADaq8jfLMAAD6e8S1yCErF=MFwYASE3uJw0GwvQicZF_FpQ5Ag@mail.gmail.com>
From: karen deitke <karen.deitke@oracle.com>
Organization: Oracle Corporation
Message-ID: <57AB6F8F.8040808@oracle.com>
Date: Wed, 10 Aug 2016 12:16:47 -0600
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <CADaq8jfLMAAD6e8S1yCErF=MFwYASE3uJw0GwvQicZF_FpQ5Ag@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------090103000900050603040801"
X-Source-IP: userv0022.oracle.com [156.151.31.74]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/O7fZArfAbzDlu3_lPIrLYVBjLA8>
X-Mailman-Approved-At: Wed, 10 Aug 2016 11:26:23 -0700
Subject: Re: [nfsv4] Review of draft-dnoveck-nfsv4-rpcrdma-xcharext-00
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 10 Aug 2016 18:16:59 -0000


On 7/22/2016 3:04 AM, David Noveck wrote:
> > This is a long-form full document review of 
> draft-dnoveck-nfsv4-rpcrdma-xcharext-00.
>
> Thanks for the review.
>
> > Please feel free to respond with clarifying questions or further discussion where needed, 
> and no response is necessary
> > where the author(s) agree to a suggestion or plan to ignore it.
>
> To be clearer, I will
>
>   * If I'm not following a suggestion I will say why rather than
>     simply ignoring it.
>   * In cases in which I simply agree to all the suggestions in a given
>     section, I will just say "OK" to that section.
>
> > Comments applying to the whole document:
>
> > Subsequent extensions might each define one or more new characteristics. Since the 
> extension specified in this document
> > is a (necessary) extensibility enabler, should we consider merging it into the base 
> RPC-over-RDMA Version Two protocol?
>
> Yes. As discussed at the meeting, this makes architectural sense.
>
> > Two of the three initially specified transport characteristics need further technical discussion 
> outside the scope of the review of this document.
>
> I'm not sure as to best ordering of:
>
>   * Those discussions
>   * Production of a -01
>   * Incorporation of xcharext into a new version-two document.
>
> For now, I'll make the changes I'm sure about into the existing xml.  
> We can decide what to
> do later about things that might need further discussion.  I'll be 
> keeping a to-do/to-decide list.
> It may be that some of the more disruptive of those changes might be 
> done as part of effecting
> the document amalgamation.
>
> > *** Abstract
>
> > Editorial:
>
>
> > NEW
> > This document specifies an extension to RPC-over-RDMA Version Two and newer.
>
> I have problem with the phrase "and newer".  I'm not sure what you are 
> assuming Version Three will be like such that it is sufficiently 
> different from Version Two to require a new version number and still 
> need this extension.   Also, if this extension is an integral part of 
> version Two, then it will remain in subsequent versions until someone 
> makes an explicit decision to take it out.
>
> > *** Section 1.1:
>
> > NEW
> > This document specifies an extension to RPC-over-RDMA Version Two and newer.
>
> Again, I have a problem with "and newer".  No problem with the rest of it.
>
> > *** Section 2.1:
> >
> > Editorial:
> >
> > The first sentence can be stated more clearly.
>
> OK.
>
> > Opinion:
>
> > The term "char" has a longstanding and well known meaning to C coders. I found the 
> overloaded use of "char" as an abbreviation of "characteristic" to be 
> confusing.
>
> I can see that might be a problem, but I don't think I use the 
> unadorned abbreviation "char'.  It is always part of the string 
> "_XCHAR".  I can change this and probably will.
>
> > I find "attribute" or "capability" to be more in line with the intended usage and either of 
> these seems to be preferred historically over "characteristic".
> > There are plenty of examples, including within NFS itself, of the use of the term 
> "attribute."
>
> At the time, I was worried that the use of the same term might be 
> confusing.  "capability" seems like it is limited to binary choices.
>
> I'm willing to change this but note that it would change the title and 
> the filename  document.  As a result i am not making this change now.
>
> > In addition, I found the use of the prefix "x" to be confusing. Here I presume that "x" means 
> "transport", not "transmit" or "unknown value".
>
> You presume correctly.
>
> Also, note that if we change to "attribute", "XCHAR" would become 
> "XATTR" :-)

Honestly the term XCHAR I did find confusing.

>
> > Since this is an extension of RPC-over-RDMA, I was surprised not to find "rpcrdma" in the data 
> type names.
>
> If we add that (or "rpcrdma2" or "rpcrpdma2plus") to everything then 
> the names start getting very long for np very goo reaspn, and I prefer 
> shorter names.
>
> > The variable, type, and operation names used in this document are often not particularly 
> mnemonic.
> > I suggest (somewhat) longer names to help the reader remember function. Even an additional 
> vowel or two might be helpful.
/
I agree with extra vowels.

/
>
> Adding a few characters to make it more mnemonic is OK with me, I have 
> problems if that is carried to an extreme.
>
> Although I think we can come to agreement about this issue, I think it 
> makes most sense to address this as part the merger of the two 
> documents.  Are you OK with that?
>
> For now, this is on my to-do/to-decide list.
>
> > Technical:
>
> > I suggest that RFC 2119 terms are appropriate in the following sentence:
>
> OK with these.
>
> > OLD
> > The receiver of a message containing an xcharval needs to report an XDR error if the length of 
> xchv_data is such that it extends beyond the bounds of the message 
> transferred.
>
> > NEW
> > If the length of xchv_data is such that it extends beyond the bounds of the transferred message, 
> the receiver MUST report an XDR error.
>
> > Likewise for the following sentence: "the receiver also needs to report" ought to be "the 
> receiver MUST report".
>
> I've used the MUSTs but I haven't adopted your specific rephrasings,
>
>
> > When first reading this section, it was not clear to me what the xcharsubset type is for. To 
> provide some context for this type definition, it would help to add a
> > paragraph in Section 2.1 briefly summarizing the later discussion:
>
> >        • The xcharsubset type is used for reporting completion of one or more changes of 
> characteristics, and that
> >        • Completing a change request does not have to be synchronous
>
> Will address.
>
> > *** Table 1:
>
> > Technical:
>
> > The default values provided here are based on the current behavior of RPC-over-RDMA 
> Version Two.
>
> True.
>
> > Future versions of RPC-over-RDMA may prescribe different default behavior for each of 
> these characteristics.
>
> They might, but given the fact that we now have a mechanism to pick 
> and communicate the value you want, I don't expect that changing the 
> default values will be all that important.
>
> > I suggest it would be better to defer the presentation of default values until subsections 
> 3.1, 3.2, and 3.3, with explicit association of these particular 
> values with RPC-over-RDMA Version
> > Two only. Consider a minor text rewrite to make the default values dependent on the 
> RPC-over-RDMA Version being deployed.
>
> I'm Ok with this but note that, if we merge the documents, the problem 
> will come close to solving itself.  If these appear in  a document 
> defining Version Two, then obviously the default values are the 
> default values within Version Two.
>
> In a section describing further extensability one could state:
>
>   * New versions can add characteristics
>   * New versions should not change the structure of an existing
>     characteristic.
>   * New versions can change the default value of a characteristic
>
> For now, I am putting this on the to-do list
>
> > BKRQSUP_UNKNOWN is in fact not the default behavior of RPC-over-RDMA Version Two. Version Two 
> REQUIRES support for backwards direction operation. I believe the 
> correct default value > for Version Two would be BKRQSUP_SZLIM ? More 
> on that later.
> Ok on the default.  Will discuss further implications later.
>
> > Should Table 1 include a column that indicates whether a characteristic is Requester-only, 
> Responder-only, or Both? Should the names of the characteristics also 
> reflect this?
>
> I don't think this is an appropriate way to categorize 
> characteristics.  A endpoint is known to be either a client or a 
> server but whether it is requester or responder depends on
> the direction of the particular message being processed. Particular 
> characteristics may be relevant to particular roles, but I don't see 
> that as a basis for categorization. For example the receive buffer 
> size applies to the endpoint in its role as receiver but that doesn't 
> make it a receiver-only characteristic, since all endpoints are 
> receivers.  Similarly for characteristics that only apply to the 
> endpoint in the role of requester.  Given bi-directional operation, 
> these will apply to the client or the server, depending on the 
> direction of transfer.
>
> > Regarding the remark following Table 1, implementations can choose not to support updating 
> some or all of these characteristics.
> > I suggest it is appropriate to use an RFC 2119 term here to make it explicit that 
> implementations have a choice, and that they must
> > be prepared for a peer to reject a change request. Here is a sample additional paragraph:
>
> > Peers MAY reject a request to change a characteristic for any reason.
>
> OK.
>
> > There are at least two ways to reject a characteristic change request. One is to not 
> implement the REQ_XCHAR, RESP_XCHAR, and UPD_XCHAR operations.
>
> I agree about REQ_XCHAR. Not implementing the others doesn't result in 
> request rejection.  That just just means you can't find out about the 
> other endpoint's rejection of your request.
>
> > The other is to reject change requests for individual characteristics. See Section 4.3 for 
> details.
>
> OK.
>
> > ** Section 3.1:
>
> > Editorial:
>
> > For clarity, I suggest capitalizing Request Buffer Size whenever referring to the 
> characteristic itself.
>
> The characteristic is "receive buffer size".  I capitalized it in two 
> places.
>
>
> > Technical:
>
> > Please explicitly specify the units of the xchrbsiz value. Octets is assumed, but in the 
> context of XDR, the value could be in XDR quads, for instance.
> > The buffer size in octets is likely to always be a multiple for four.
>
> OK.
>
> > The data types defined here are:
>
> > const uint32XCHAR_RBSIZ = 1;
> > typedef uint32  xchrbsiz;
>
> > I find these choices less readable than, for example, XCHAR_RCVBUF_SIZE and xchr_rcvbufsz.
>
> I'm OK with these changes, but will hold off for a while in order to 
> make sure we make all the necessary changes consistently.  This is on 
> my to-do/to-decide list.
>
> > These are not necessarily my most preferred names, they simply illustrate that I 
> believe we can afford a few more characters in these names to help 
> legibility.
>
> I would have a problem with 
> RPCRDMA2_TRANSPORT_CHAR_RECEIVE_BUFFER_SIZE and 
> rpcrpma2_transport_char_receive_buffer_size :-(
>
> > The initial subset of characteristics are setting a precedent for future naming choices.
>
> Good point.
>
> > The terms "Position-Zero Read chunk" and "Reply chunk" as defined in rfc5666bis are capitalized.
>
> OK
>
> > *** Section 3.2:
>
> > Editorial:
>
> > Rather than the term "remote invalidation requests," I recommend the more specific term 
> "RDMA Send With Invalidate". Here and elsewhere in the document, > the 
> term "invalidate" is preferred over "deregister". (The term is "Remote 
> Invalidation" not "Remote Deregistration," after all).
>
> OK.
>
> Technical:
>
> > The mention of the server's lack of use of memory registration is not technically correct.
>
> Will address.
>
> > Finally, although there are no DDP-eligible data items in currently defined 
> backward-direction ULPs, an implementation can still choose to use 
> Position-Zero Read chunks or
> > Reply chunks for larger messages.
>
> Good point.
>
> > I'm not sure what exactly to do about backward-direction use of Remote Invalidation, 
> other than to have the responder also provide this characteristic via 
> INIT_XCHAR, but
> > always set to false for now,
>
> What I feel is consistent with that but I'd distinguish between what 
> the spec  says to do and what we expect to implementations to do in 
> the near future.
>
> The spec should state that this chracteristic applies to both 
> directions of operation.
>
> You and I know that server implementations, for the forseeable future 
> are either going to not report this characteristic or report it as 
> false. also clients are unlikely to look at it for a while.  The 
> document may provide this information as interesting background but 
> that doesn't affect the protocol, which allows that situation to change.
>
> > possibly stating that this characteristic is for forward-direction requests only (ie, punting).
>
> I don't think punting is appropriate.  This is a 
> second-down-and-inches-to-go situation.
>
> > Therefore may I suggest replacing the first two paragraphs with the following:
> >
> > OLD
> > The requester remote invalidation characteristic indicates that the requester is prepared 
> for the responder to issue remote invalidation requests, in order to 
> unregister memory > regions established to support RDMA Read and Write 
> operations done by the responder into or out of the requester’s memory.
> >
> > As RPC-over-RDMA is currently used, memory registration is not done on the server and 
> explicit RDMA operations are not done to satisfy backward-direction 
> requests. This
> > makes it unlikely that servers will present non-default values of the XCHAR_RQREMINV 
> characteristic or that clients will take note of the value presented 
> by servers.
>
> > NEW
> > The Requester Remote Invalidation characteristic indicates that a forward-direction 
> requester is prepared for a forward-direction responder to use RDMA 
> Send With Invalidate
> > when replying to an RPC-over-RDMA request containing non-empty chunk lists.
>
> I don't think we want to foreclose the option of using this in the 
> reverse direction.
>
> PROPOSED:
>
>     The Requester Remote Invalidation characteristic indicates that
>     the current endpoint, when in the role of a requester, is prepared
>     for the responder to use RDMA Send With Invalidate when replying
>     to an RPC-over-RDMA request containing non-empty chuck lists.
>
>     As RPC-over-RDMA is currently used, memory registration exposed to
>     peers are not done by the server and explicit RDMA operations are
>     not done to satisfy backward direction requests.  This makes it
>     unlikely that servers will present non-default values of the
>     XCHAR_RQREQ characteristic or that clients will take note of such
>     values when presented by servers.
>
>
> > IMO, there is no need to include a discussion of the implementation of Remote Invalidation in 
> this document. After <CODE ENDS> in this section, I suggest finishing 
> the section with the following
> > two replacement paragraphs:
>
> > When the Requester Remote Invalidate characteristic is set to false, a responder MUST use Send 
> to convey
> > RPC reply messages to the requester. When the Requester Remote Invalidate characteristic is 
> set to true, a
> > responder MAY use Send With Invalidate instead of Send to convey RPC replies to the requester.
>
> > The value of the Requester Remote Invalidate characteristic is not likely to change from the 
> value reported by INIT_XCHAR (see Section 4.1).
>
> OK.
>
>  -- stuff about possible need for XDR change deleted ---
>
> > We should discuss this further outside the scope of this review to come to agreement on how 
> to proceed.
>
> OK.  I'll note this on the to-do/to-decde list.
>
> I also note that if there is a change in the chunk XDR, I will need to 
> update the XDR In draft-dnoveck-nfsv4-rpcrdma-rtrext to match.
>
>
> > *** Section 3.3:
> >
> > Technical:
> >
> > RPC-over-RDMA Version Two and newer implementations are required to have support for 
> backward-direction requests.
>
> I didn't know that.  Now that you mention it, I'm not sure it is such 
> a good idea. For example, I don't think that you want to
> prevent an NFSv3 implementation, which has no need to use backward 
> direction requests, from upgrading from V1 to V2.  That
> issue can be discussed further. For now, I'll keep this on the 
> to-do/to-decide list.
>
> > I don't see a use for BKRQSUP_UNKNOWN or BKRQSUP_NONE here; V1, where
> > NONE might be appropriate, isn't extensible, and thus will never have transport 
> characteristics.
>
> Depends on the decision on whether backward direction support is 
> required in V2.
>
> >The only unknown is whether the implementation can support the use of chunks for 
> backward-direction requests. This should not be a problem if the ULP 
> does not have any DDP-eligible data items, and has a mechanism for 
> indicating request size limits and it keeps the size of backward 
> direction requests under the inline threshold.
>
> > Note that Version Two's inline threshold is at least 4KB, and implementations are free to 
> advertise an even larger threshold using the Receive Buffer Size 
> characteristic.
>
> > This size limit is actively advertised by the only ULP that uses backward-direction 
> requests, NFSv4.x.
>
>
> > The rpcrdma-bidirection specification purposely does not prohibit chunks in the backward 
> direction.
>
> Exactly.  In version Two, this is optional behavior. The 
> characteristic in question provides a means to determine whether the 
> peer supports transfer of such chunks.
>
> > When a backward direction ULP is proposed with DDP-eligible data items, support for 
> backward direction chunks will be introduced
> > in RPC-over-RDMA implementations where that ULP is supported.
>
> Yes, but my hope is that we will be in a situation in which the 
> implementation support can be introduced but no protocol change is 
> required.  If we don't provide the support before it is needed, it is 
> likely that it will be needed before it is available, which would be a 
> real drag.
>
> > Outside the scope of this review, let's propose some use cases that demonstrate how this 
> characteristic can be used effectively, and then we can consider 
> adjusting what is currently proposed.
>
> It has to be understood that any such use case will necessarily be 
> hypothetical, in that they have to assume that there will be a ULP 
> that wants this support, that Version Two, has, so far, been careful 
> not to prevent.
>
>
> *** Section 4:
>
> > Editorial:
>
> > I suggest OPT_XCHAR_ or ROPT_XCHAR_ as a prefix rather than using an _XCHAR suffix for the 
> operation names.
>
> I'm OK with this but am going to hold off a bit since a merger with 
> Version Two may supersede this.  This is now on my to-decide list.
>
> After all, if this is merged fully into Version Two, we might have 
> RDMA_INIT_XCHAR, etc.
>
> > It might be useful to have a separate table which states what the receiver's expected response 
> is (if I understand correctly):
>
> Not sure a table is useful.  Nevertheless the discussion of what would 
> be in it would be helpful.
>
> > Sender:response:
> > INIT_XCHAR    INIT_XCHAR
>
> As stated, this would result infinite response-to-response loop :-(
>
> I think of each of these as independently offered by each endpoint.
>
> > REQ_XCHAR   RESP_XCHAR or UPD_XCHAR
>
> RESP_XCHAR is the response to REQ_XCHAR.  In addition, one or more 
> UPD_XCHARs might be generate but these are
> asynchronous and not readily connected with the REQ_XCHAR That caused 
> them to be generated.
>
> > RESP_XCHAR none
> > UPD_XCHAR   none
>
> Yes.
>
>
> > *** Section 4.1:
>
> > Technical:
>
> > Paragraph ending:
>
> > can be received by partners who use the default minimum receive buffer size.
>
> > I suggest the word "minimum" be dropped from that sentence. Afterwards, add something to 
> the effect of:
>
> > The connection's initial receive buffer size is typically 1KB, but it depends on the initial 
> connection state of the RPC-over-RDMA version in use. See [rpcrmdav2] 
> for details.
>
> OK.
>
>
> *** Section 4.2:
>
> OK..
>
>
> > *** Section 4.3:
>
> OK
>
> *** Section 4.4:
>
> > Editorial:
>
> > The first paragraph has a superfluous comma:
>
> > The UPD_XCHAR message type allows an RPC-over-RDMA participant >>,<< to notify the other 
> participant that a change to the transport characteristics has occurred.
>
> Will fix.
>
> > The following paragraph:
>
> > Notethat there no way to tie a request changed to the specfic request 
> which asked for it.  In particular, the xid associated with this 
> message is independent of that for an earlier REQ_XCHAR message.
>
> > Seems more important than just a "Note".
>
> It is not "just a 'Note'", even though the verb "Note" is used.
>
> > I suggest dropping "Note that,"
>
> I don't see that raises the prominence or perceived importance of this 
> fact.
>
> One possibility is to say:
>
>     It is important to note that ...
>
>
> another possibility is to replace the verb "note" by its definition, 
> resulting in:
>
>     One should pay particular attention to the fact that there is no way
>
> > and clarifying the phrase "there no way to tie a request changed".
>
> I think that is in the following sentence.
>
> > Technical:
>
> > I suggest something like ROPT_XCHAR_NOTIFY rather than UPD_XCHAR.
>
> "NOTIFY" seems too generic to me.  How about "ROPT_XCHAR_CHANGED"?  I 
> supposed that this would be batched with other similar name changes.
>
> > I suggest that it be made clear that "xid" in this document means the value in the 
> RPC-over-RDMA header's rdma_xid field.
>
> I've done that.
>
> > In addition, the call-response relationships of the new operations proposed in this 
> document are perhaps important enough
> > (and unidirectional, not like normal RPC) to move discussion into a separate subsection of 
> Section 4, or perhaps earlier in the document.
>
> I will do that.
>
> > This section finishes with the following text:
>
> > optuxc_pendclr, if true, indicates that a previous request to update the characteristic 
> specified by optuxc_now.xchv_which
> > is no longer to be considered pending. This may be set true even if the characteristic 
> value is not changed from the previous value.
>
> > This seems to imply that, after receiving an UPD_XCHAR message, the receiver must query the 
> sender to see if the change actually took effect. Did I misunderstand 
> this?
>
> It doesn't.  The receiver gets the new current value of the 
> characteristic in any case.
>
> > Would it be better if UPD_XCHAR reported whether a characteristic change took effect or was 
> rejected?
>
> rejection is reported by RESP_XCHAR.
>
> >Under what circumstances would optucx_pendclr be set to false?
>
> Two possibilities;
>
>   * change not requested.  i.e. the sender is notifying you of a
>     change you didn't request.
>   * you requested a change but the request is still pending. this
>     could happen if you requested that something be changed and
>     instead of waiting untuil the change was effected, the sender
>     decides to update you on partial completion, while continuing to
>     keep the original request pending.
>
Why would the responder do the second above?   If the requestor gets 
optucx_pendclr how would it know which one of the above conditions 
occurred?  Should it bother expecting a change at all?  For example if 
it requested that the receive buffer size be changed, seems like it 
would wait for the repsonse indicating that either it was change, or no 
it won't be, before sending any requests.  So, how would it distinguish 
the difference?

>
> *** Section 6.1:
>
> Editorial:
>
> > The title of this section is "Additional Operations" but the two paragraphs here discuss only 
> transport characteristics. Perhaps this subsection could be deleted?
>
> I will delete it.
>
> > The second paragraph is missing its ending period.
>
> Will fix if I don't delete this.
>
>
> *** Section 6.2:
>
> Editorial:
>
> > I suggest breaking the first sentence into two or three sentences and restating.
>
> Will do.
>
> *** Section 6.3:
>
> OK.
>
> *** Section 7:
>
> OK.
>
> On Wed, Jul 20, 2016 at 12:03 PM, Chuck Lever <chuck.lever@oracle.com 
> <mailto:chuck.lever@oracle.com>> wrote:
>
>     This is a long-form full document review of
>     draft-dnoveck-nfsv4-rpcrdma-xcharext-00. Please feel free to
>     respond with clarifying questions or further discussion where
>     needed, and no response is necessary where the author(s) agree to
>     a suggestion or plan to ignore it.
>
>
>     Comments applying to the whole document:
>
>     Subsequent extensions might each define one or more new
>     characteristics. Since the extension specified in this document is
>     a (necessary) extensibility enabler, should we consider merging it
>     into the base RPC-over-RDMA Version Two protocol?
>
>     Two of the three initially specified transport characteristics
>     need further technical discussion outside the scope of the review
>     of this document.
>
>
>     *** Abstract
>
>     Editorial:
>
>     IMO it's safe to assume in the abstract that RPC-over-RDMA Version
>     Two will have a mechanism that permits extension of the base
>     RPC-over-RDMA protocol. Explaining that in this document's
>     abstract seems unnecessary, and the use of subjunctive mood and
>     future tense dulls the impact of both the abstract and the
>     introduction section. Also, the term RPC-over-RDMA is preferred
>     over the term RPC/RDMA.
>
>     I suggest tightening the text as follows:
>
>     OLD
>     It is expected that the RPC-over-RDMA transport will, at some
>     point, allow protocol extensions to be defined. This would provide
>     for the specification of OPTIONAL features, allowing participants
>     who implement the OPTIONAL features, to cooperate as specified by
>     that    extension, while still interoperating with participants
>     who do not support that extension.
>
>     A particular extension is described herein, whose purpose is to
>     allow RPC/RDMA Endpoints to specify and manage their transport
>     characteristics in order to allow the other participant to
>     optimize message transfer in light of the characteristics
>     communicated by the initial sender.
>
>     NEW
>     This document specifies an extension to RPC-over-RDMA Version Two
>     and newer. The extension enables endpoints of an RPC-over-RDMA
>     connection to exchange information which can optimize message
>     transfer.
>
>
>     *** Section 1.1:
>
>     Editorial:
>
>     Likewise, the introduction can be written to assume that the
>     extension mechanism is in the base protocol already. I suggest
>     replacing the first two paragraphs of Section 1.1.
>
>     OLD
>     This document describes a potential extension to the RPC-over-RDMA
>     protocol, which would allow participating implementations to
>     communicate the transport characteristics of their implementation,
>     to request changes in those of the other participant, and to
>     effect changes and notify the other participant of the changes.
>
>     Although this document specifies the new OPTIONAL message header
>     types to implement these functions, the precise means by which the
>     presence of support for these OPTIONAL functions will be
>     ascertained is not described here, as would be done more
>     appropriately by the RFC defining a version of RPC-over-RDMA which
>     supports protocol extension.
>
>     NEW
>     This document specifies an extension to RPC-over-RDMA Version Two
>     and newer. It allows each participating endpoint on a single
>     connection to communicate various characteristics of its
>     implementation, to request changes in characteristics of the other
>     endpoint, and to effect changes and notify the other endpoint of
>     changes to these characteristics during operation.
>
>     The extension described herein specifies OPTIONAL message header
>     types to implement this mechanism. The means by which the
>     implementation support status of these OPTIONAL types is
>     ascertained is described in [rpcrmdav2].
>
>
>     *** Section 2.1:
>
>     Editorial:
>
>     The first sentence can be stated more clearly.
>
>     OLD
>     Receiver and sender characteristics are specified using an
>     extensible approach, that allows new characteristics to be
>     defined, in addition to the small set of initial transport
>     characteristics specified herein.
>
>     NEW
>     An initial set of receiver and sender characteristics are
>     specified in this document. An extensible approach is used,
>     allowing new characteristics to be defined in future standards
>     track documents.
>
>
>     Opinion:
>
>     The term "char" has a longstanding and well known meaning to C
>     coders. I found the overloaded use of "char" as an abbreviation of
>     "characteristic" to be confusing. I find "attribute" or
>     "capability" to be more in line with the intended usage and either
>     of these seems to be preferred historically over "characteristic".
>     There are plenty of examples, including within NFS itself, of the
>     use of the term "attribute."
>
>     In addition, I found the use of the prefix "x" to be confusing.
>     Here I presume that "x" means "transport", not "transmit" or
>     "unknown value".
>
>     Since this is an extension of RPC-over-RDMA, I was surprised not
>     to find "rpcrdma" in the data type names.
>
>     The variable, type, and operation names used in this document are
>     often not particularly mnemonic. I suggest (somewhat) longer names
>     to help the reader remember function. Even an additional vowel or
>     two might be helpful.
>
>
>     Technical:
>
>     I suggest that RFC 2119 terms are appropriate in the following
>     sentence:
>
>     OLD
>     The receiver of a message containing an xcharval needs to report
>     an XDR error if the length of xchv_data is such that it extends
>     beyond the bounds of the message transferred.
>
>     NEW
>     If the length of xchv_data is such that it extends beyond the
>     bounds of the transferred message, the receiver MUST report an XDR
>     error.
>
>     Likewise for the following sentence: "the receiver also needs to
>     report" ought to be "the receiver MUST report".
>
>     When first reading this section, it was not clear to me what the
>     xcharsubset type is for. To provide some context for this type
>     definition, it would help to add a paragraph in Section 2.1
>     briefly summarizing the later discussion:
>
>             • The xcharsubset type is used for reporting completion of
>     one or more changes of characteristics, and that
>             • Completing a change request does not have to be synchronous
>
>
>     *** Table 1:
>
>     Technical:
>
>     The default values provided here are based on the current behavior
>     of RPC-over-RDMA Version Two. Future versions of RPC-over-RDMA may
>     prescribe different default behavior for each of these
>     characteristics. I suggest it would be better to defer the
>     presentation of default values until subsections 3.1, 3.2, and
>     3.3, with explicit association of these particular values with
>     RPC-over-RDMA Version Two only. Consider a minor text rewrite to
>     make the default values dependent on the RPC-over-RDMA Version
>     being deployed.
>
>     BKRQSUP_UNKNOWN is in fact not the default behavior of
>     RPC-over-RDMA Version Two. Version Two REQUIRES support for
>     backwards direction operation. I believe the correct default value
>     for Version Two would be BKRQSUP_SZLIM ? More on that later.
>
>     Should Table 1 include a column that indicates whether a
>     characteristic is Requester-only, Responder-only, or Both? Should
>     the names of the characteristics also reflect this?
>
>     Regarding the remark following Table 1, implementations can choose
>     not to support updating some or all of these characteristics. I
>     suggest it is appropriate to use an RFC 2119 term here to make it
>     explicit that implementations have a choice, and that they must be
>     prepared for a peer to reject a change request. Here is a sample
>     additional paragraph:
>
>     Peers MAY reject a request to change a characteristic for any
>     reason. There are at least two ways to reject a characteristic
>     change request. One is to not implement the REQ_XCHAR, RESP_XCHAR,
>     and UPD_XCHAR operations. The other is to reject change requests
>     for individual characteristics. See Section 4.3 for details.
>
>
>     *** Section 3.1:
>
>     Editorial:
>
>     For clarity, I suggest capitalizing Request Buffer Size whenever
>     referring to the characteristic itself.
>
>
>     Technical:
>
>     Please explicitly specify the units of the xchrbsiz value. Octets
>     is assumed, but in the context of XDR, the value could be in XDR
>     quads, for instance. The buffer size in octets is likely to always
>     be a multiple for four.
>
>     The data types defined here are:
>
>     const uint32    XCHAR_RBSIZ = 1;
>     typedef uint32  xchrbsiz;
>
>     I find these choices less readable than, for example,
>     XCHAR_RCVBUF_SIZE and xchr_rcvbufsz. These are not necessarily my
>     most preferred names, they simply illustrate that I believe we can
>     afford a few more characters in these names to help legibility.
>     The initial subset of characteristics are setting a precedent for
>     future naming choices.
>
>     The terms "Position-Zero Read chunk" and "Reply chunk" as defined
>     in rfc5666bis are capitalized.
>
>
>     *** Section 3.2:
>
>     Editorial:
>
>     Rather than the term "remote invalidation requests," I recommend
>     the more specific term "RDMA Send With Invalidate". Here and
>     elsewhere in the document, the term "invalidate" is preferred over
>     "deregister". (The term is "Remote Invalidation" not "Remote
>     Deregistration," after all).
>
>
>     Technical:
>
>     The mention of the server's lack of use of memory registration is
>     not technically correct. A server often does register memory
>     regions that are used as RDMA Read sink buffers. (the Linux server
>     registers these buffers using FRWR; Solaris does not register
>     these buffers, relying on physical addressing and a smaller
>     scatter/gather limit per Read WR). These registered sink buffers
>     are not exposed to other network nodes.
>
>     Finally, although there are no DDP-eligible data items in
>     currently defined backward-direction ULPs, an implementation can
>     still choose to use Position-Zero Read chunks or Reply chunks for
>     larger messages. I'm not sure what exactly to do about
>     backward-direction use of Remote Invalidation, other than to have
>     the responder also provide this characteristic via INIT_XCHAR, but
>     always set to false for now, or possibly stating that this
>     characteristic is for forward-direction requests only (ie, punting).
>
>     Therefore may I suggest replacing the first two paragraphs with
>     the following:
>
>     OLD
>     The requester remote invalidation characteristic indicates that
>     the requester is prepared for the responder to issue remote
>     invalidation requests, in order to unregister memory regions
>     established to support RDMA Read and Write operations done by the
>     responder into or out of the requester’s memory.
>
>     As RPC-over-RDMA is currently used, memory registration is not
>     done on the server and explicit RDMA operations are not done to
>     satisfy backward-direction requests. This makes it unlikely that
>     servers will present non-default values of the XCHAR_RQREMINV
>     characteristic or that clients will take note of the value
>     presented by servers.
>
>     NEW
>     The Requester Remote Invalidation characteristic indicates that a
>     forward-direction requester is prepared for a forward-direction
>     responder to use RDMA Send With Invalidate when replying to an
>     RPC-over-RDMA request containing non-empty chunk lists.
>
>
>     IMO, there is no need to include a discussion of the
>     implementation of Remote Invalidation in this document. After
>     <CODE ENDS> in this section, I suggest finishing the section with
>     the following two replacement paragraphs:
>
>     When the Requester Remote Invalidate characteristic is set to
>     false, a responder MUST use Send to convey RPC reply messages to
>     the requester. When the Requester Remote Invalidate characteristic
>     is set to true, a responder MAY use Send With Invalidate instead
>     of Send to convey RPC replies to the requester.
>
>     The value of the Requester Remote Invalidate characteristic is not
>     likely to change from the value reported by INIT_XCHAR (see
>     Section 4.1).
>
>
>     After some research, I've found that full support for RI is done
>     in other protocols by having the client mark the chunk in each
>     request it wants the server to invalidate. This allows servers to
>     clearly distinguish chunks that are persistently registered (where
>     an attempted invalidation might be terminal to the connection) and
>     ones that have a short-lived tag (where invalidation should work);
>     and of course it also indicates when the client cannot tolerate
>     any remove invalidation.
>
>     This is not a mechanism that can be provided with a transport
>     characteristic, but would require an XDR change in the
>     RPC-over-RDMA header. In particular, it would require changes to
>     the chunk list data structures. For example, the set of three
>     chunk lists in each Call could be followed by a field containing
>     the tag that the client wishes to be invalidated. Once such a data
>     structure change is in place, a Requester Remote Invalidation
>     characteristic might be superfluous.
>
>     We should discuss this further outside the scope of this review to
>     come to agreement on how to proceed.
>
>
>     *** Section 3.3:
>
>     Technical:
>
>     RPC-over-RDMA Version Two and newer implementations are required
>     to have support for backward-direction requests. I don't see a use
>     for BKRQSUP_UNKNOWN or BKRQSUP_NONE here; V1, where NONE might be
>     appropriate, isn't extensible, and thus will never have transport
>     characteristics.
>
>     The only unknown is whether the implementation can support the use
>     of chunks for backward-direction requests. This should not be a
>     problem if the ULP does not have any DDP-eligible data items, and
>     has a mechanism for indicating request size limits and it keeps
>     the size of backward direction requests under the inline threshold.
>
>     Note that Version Two's inline threshold is at least 4KB, and
>     implementations are free to advertise an even larger threshold
>     using the Receive Buffer Size characteristic. This size limit is
>     actively advertised by the only ULP that uses backward-direction
>     requests, NFSv4.x.
>
>     The rpcrdma-bidirection specification purposely does not prohibit
>     chunks in the backward direction. When a backward direction ULP is
>     proposed with DDP-eligible data items, support for backward
>     direction chunks will be introduced in RPC-over-RDMA
>     implementations where that ULP is supported.
>
>     Outside the scope of this review, let's propose some use cases
>     that demonstrate how this characteristic can be used effectively,
>     and then we can consider adjusting what is currently proposed.
>
>
>     *** Section 4:
>
>     Editorial:
>
>     I suggest OPT_XCHAR_ or ROPT_XCHAR_ as a prefix rather than using
>     an _XCHAR suffix for the operation names.
>
>     It might be useful to have a separate table which states what the
>     receiver's expected response is (if I understand correctly):
>
>     Sender:            response:
>     INIT_XCHAR    INIT_XCHAR
>     REQ_XCHAR   RESP_XCHAR or UPD_XCHAR
>     RESP_XCHAR none
>     UPD_XCHAR   none
>
>
>     *** Section 4.1:
>
>     Technical:
>
>     Paragraph ending:
>
>     can be received by partners who use the default minimum receive
>     buffer size.
>
>     I suggest the word "minimum" be dropped from that sentence.
>     Afterwards, add something to the effect of:
>
>     The connection's initial receive buffer size is typically 1KB, but
>     it depends on the initial connection state of the RPC-over-RDMA
>     version in use. See [rpcrmdav2] for details.
>
>
>     *** Section 4.2:
>
>     Editorial:
>
>     The first paragraph has a superfluous comma:
>
>     The REQ_XCHAR message type allows an RPC-over-RDMA participant,
>     whether client or server, to request of its partner >>,<< that
>     relevant transport characteristics be changed.
>
>     The second paragraph is missing its ending period.
>
>
>     *** Section 4.3:
>
>     Editorial:
>
>     OLD
>     future transmissions may be made based on that assumption.
>
>     NEW
>     future transmissions may be made based on the new value.
>
>
>     *** Section 4.4:
>
>     Editorial:
>
>     The first paragraph has a superfluous comma:
>
>     The UPD_XCHAR message type allows an RPC-over-RDMA participant
>     >>,<< to notify the other participant that a change to the
>     transport characteristics has occurred.
>
>     The following paragraph:
>
>     Note that there no way to tie a request changed to the specific
>     request which asked for it.  In particular, the xid associated
>     with this message is independent of that for an earlier REQ_XCHAR
>     message.
>
>     Seems more important than just a "Note". I suggest dropping "Note
>     that," and clarifying the phrase "there no way to tie a request
>     changed".
>
>
>     Technical:
>
>     I suggest something like ROPT_XCHAR_NOTIFY rather than UPD_XCHAR.
>
>     I suggest that it be made clear that "xid" in this document means
>     the value in the RPC-over-RDMA header's rdma_xid field. In
>     addition, the call-response relationships of the new operations
>     proposed in this document are perhaps important enough (and
>     unidirectional, not like normal RPC) to move discussion into a
>     separate subsection of Section 4, or perhaps earlier in the document.
>
>     This section finishes with the following text:
>
>     optuxc_pendclr, if true, indicates that a previous request to
>     update the characteristic specified by optuxc_now.xchv_which is no
>     longer to be considered pending. This may be set true even if the
>     characteristic value is not changed from the previous value.
>
>
>     This seems to imply that, after receiving an UPD_XCHAR message,
>     the receiver must query the sender to see if the change actually
>     took effect. Did I misunderstand this? Would it be better if
>     UPD_XCHAR reported whether a characteristic change took effect or
>     was rejected? Under what circumstances would optucx_pendclr be set
>     to false?
>
>
>     *** Section 6.1:
>
>     Editorial:
>
>     The title of this section is "Additional Operations" but the two
>     paragraphs here discuss only transport characteristics. Perhaps
>     this subsection could be deleted?
>
>     The second paragraph is missing its ending period.
>
>
>     *** Section 6.2:
>
>     Editorial:
>
>     I suggest breaking the first sentence into two or three sentences
>     and restating.
>
>
>     *** Section 6.3:
>
>     Technical:
>
>     I suggest adding an explicit statement here that use of xcharid
>     values in the experimental range is not guaranteed to interoperate
>     among independent implementations.
>
>
>     *** Section 7:
>
>     Editorial:
>
>     I suggest moving the second paragraph to section 6.2.
>
>
>     Technical:
>
>     I suggest replacing the first paragraph with the following two:
>
>     Like other fields that appear in each RPC-over-RDMA header,
>     characteristic information is sent in the clear on the fabric with
>     no integrity protection, making it vulnerable to man-in-the-middle
>     attacks.
>
>     For example, if a man-in-the-middle changes the value of the
>     Receive buffer size or the Requester Remote Invalidation boolean,
>     it can reduce connection performance or trigger loss of
>     connection. Repeated connection loss can impact performance or
>     even prevent a new connection from being established. Recourse is
>     to deploy on a private network or use link-layer encryption.
>
>     --
>     Chuck Lever
>
>
>
>     _______________________________________________
>     nfsv4 mailing list
>     nfsv4@ietf.org <mailto:nfsv4@ietf.org>
>     https://www.ietf.org/mailman/listinfo/nfsv4
>
>
>
>
> _______________________________________________
> nfsv4 mailing list
> nfsv4@ietf.org
> https://www.ietf.org/mailman/listinfo/nfsv4