Re: [nfsv4] AD Review of draft-ietf-nfsv4-rfc5666bis-08

Chuck Lever <chuck.lever@oracle.com> Tue, 10 January 2017 21:35 UTC

Return-Path: <chuck.lever@oracle.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 818251295BE; Tue, 10 Jan 2017 13:35:38 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -8.555
X-Spam-Level:
X-Spam-Status: No, score=-8.555 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H2=-1.156, RP_MATCHES_RCVD=-3.199, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id mOzzx4uykVoz; Tue, 10 Jan 2017 13:35:35 -0800 (PST)
Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id AA9F31288B8; Tue, 10 Jan 2017 13:35:35 -0800 (PST)
Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id v0ALZXpr012144 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 10 Jan 2017 21:35:33 GMT
Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id v0ALZWNf005259 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 10 Jan 2017 21:35:33 GMT
Received: from abhmp0010.oracle.com (abhmp0010.oracle.com [141.146.116.16]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id v0ALZWOh015624; Tue, 10 Jan 2017 21:35:32 GMT
Received: from anon-dhcp-171.1015granger.net (/68.46.169.226) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 10 Jan 2017 13:35:32 -0800
Content-Type: text/plain; charset="us-ascii"
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CAKKJt-fz3HLNSD2Y3vYBT2YU+Zsxa5d3R6fBTHDYZRPU69=Cow@mail.gmail.com>
Date: Tue, 10 Jan 2017 16:35:30 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <796338D6-3097-4128-B3FA-2BD709BE112F@oracle.com>
References: <CAKKJt-fz3HLNSD2Y3vYBT2YU+Zsxa5d3R6fBTHDYZRPU69=Cow@mail.gmail.com>
To: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>
X-Mailer: Apple Mail (2.3124)
X-Source-IP: aserv0022.oracle.com [141.146.126.234]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/tIoCEex_8q-4ev36NcukUk2s4kU>
Cc: NFSv4 <nfsv4@ietf.org>, draft-ietf-nfsv4-rfc5666bis@ietf.org
Subject: Re: [nfsv4] AD Review of draft-ietf-nfsv4-rfc5666bis-08
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 10 Jan 2017 21:35:38 -0000

Spencer, thanks for your time and your remarks. I've indicated
below where I can immediately merge your suggestions, but
there remain open questions.


> On Jan 10, 2017, at 3:29 PM, Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com> wrote:
> 
> Hi, Spencer (S, where the "S" stands for "Shepherd),
> 
> My apologies for a slow AD evaluation for this draft. I've been down with something flu-ish since December 24, and am only now surfacing. 
> 
> I found this specification to be pretty clear, but I did have some questions and suggested wording changes that I'd like for people to consider before I request Last Call. 
> 
> Could you let me know when you think the document is good to go?
> 
> In the meantime, I'll do the AD Evaluation for draft-ietf-nfsv4-rpcrdma-bidirection-05.
> 
> Thanks!
> 
> Spencer (D)
> 
> AD Evaluation follows ---
> 
> This could certainly be OK, but I note that RFC 5666 uses the pre-2008 text IPR version, while the bis draft does not. Is that intentional? (If you folks were able to obtain releases that allow you to use the default IPR version, I applaud your initiative. It's never going to be easier to do that, than now).

You mean this text from RFC 5666:

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

I didn't obtain any permission to delete this material, just simply
used the default boilerplate provided by xml2rfc. I see now that was
an oversight on my part.

Because we are continuing to use the XDR specification from RFC 5666
nearly in whole and it was originally submitted in 2006, I think we
can say that there remains some pre-2008 contribution in rfc5666bis,
and therefore should continue to include the pre-2008 IPR boilerplate.

I'm looking at

  https://xml2rfc.tools.ietf.org/rfc7749.html#attribute-ipr

Which value should I supply as the document's ipr attribute? Possibly
one of these:

  https://xml2rfc.tools.ietf.org/rfc7749.html#attribute-ipr-200811


> Just so that I understand what's being said, is 
> 
>    This document obsoletes RFC 5666.  However, the protocol specified by
>    this document is based on existing interoperating implementations of
>    the RPC-over-RDMA Version One protocol.
>    
> saying 
> 
>    This document obsoletes RFC 5666.  However, the protocol specified by
>    this document is based on experience gained from existing 
>                              ^^^^^^^^^^^^^^^^^^^^^^
>    interoperating implementations of the RPC-over-RDMA Version One protocol.
>    
> ? (If so, again, I applaud that!)

rfc5666bis does benefit from experience gained, but it is also an
effort to document the behavior of existing implementations, which
may not exactly match what was specified in RFC 5666. All existing
implementations continue to interoperate.

Is there a way to say that more explicitly here?


> I would be a bit more comfortable with 
> 
>    Open Network Computing Remote Procedure Call (ONC RPC, or simply,
>    RPC)
>    
> if the document said
> 
>    Open Network Computing Remote Procedure Call (ONC RPC, often shortened
>    in this document as RPC)
>    
> (or, more broadly, "often shortened in NFSv4 documents", if that's correct), because I'm thinking that "RPC" doesn't even refer to ONC RPC in all IETF-stream RFCs.

I can make that editorial change (unless my co-authors object,
and ditto for subsequent responses).


> Is this text
> 
>    Although the RDMA transport described herein can provide relatively
>    transparent support for any RPC application, this document also
>    describes mechanisms that can optimize data transfer even further,
>    given more active participation by RPC applications.
>    
> saying 
> 
>    Although the RDMA transport described herein can provide relatively
>    transparent support for any RPC application, this document also
>    describes mechanisms that can optimize data transfer even further,
>    for RPC applications that are willing to exploit awareness of
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    RDMA transport.
>    ^^^^^^^^^^^^^^
>    
> ?

Yes, that makes more sense. I will use your replacement text.


> In this text,
> 
>    o  Section 2 has been expanded to introduce and explain key RPC, XDR,
>       and RDMA terminology.  These terms are now used consistently
>       throughout the specification.
>       
> I wonder if it's helpful to add a reference for XDR.

I will add this reference.


> I found myself reading 
> 
>    o  The specification of the optional Connection Configuration
>       Protocol has been removed from the specification.
>       
> and wondering "where did it go, and why?". Is this going to appear in a separate document, or was this so optional that it wasn't much implemented or deployed? For most of the items in the bullet list in Section 2.1, I don't think you need any explanation, but if you're removing something that was in the obsoleted RFC, it might be nice to explain what's behind that.

A) CCP was never implemented, and
B) A better design would provide a similar information in-band for
each new connection, rather than as a separate protocol.

I recall we had some of this explanation in earlier revisions, but
it was removed for brevity. I can add something here.


> In this text,
> 
>    o  A subsection describing the use of RPCSEC_GSS with RPC-over-RDMA
>       Version One has been added.
>       
> it might be helpful to add a reference for RPCSEC_GSS.

I will add this reference.


> In this text,
> 
>    o  Support for the Read-Read transfer model has been removed.  Read-
>       Read is a slower transfer model than Read-Write.  As a result,
>       implementers have chosen not to support it.  Removal simplifies
>       explanatory text, and support for the RDMA_DONE procedure is no
>       longer necessary.
>       
> "is no longer necessary" says, to me, that it can still be present, and used. Is that about right, or is "the RDMA_DONE procedure is no longer supported" more accurate? I'm just asking because the description of the RDMA_MSGP message type in the next bullet sounds much blunter, and both of those seem weaker than the descriptions in Section 5.6. If you're able to use the same language in all these places, I'd be clearer on what your intent is.

No implementation of Read-Read exists, thus no implementation sends
RDMA_DONE. I recall we carefully attempted to avoid the use of the
term "deprecated" here.

What tone or element of the text in Section 5.6 would you prefer we
use throughout?


> It seems to me that
> 
>    The protocol version number has not been changed because the protocol
>    specified in this document fully interoperates with implementations
>    of the RPC-over-RDMA Version One protocol specified in [RFC5666].
>    
> belongs a lot earlier in the document - at least in the Introduction, and possibly the Abstract as well. It's nearly five pages into the current draft.

I will move this text earlier in the document.


> I'm not clear on the relationship between RDMA and DDP, and I'm a bit surprised that Section 3.2.1 is called Direct Data Placement, but most of the contents are describing RDMA, only one paragraph mentions DDP, and that paragraph says that RDMA isn't necessary or sufficient for DDP, unless I'm misreading it. I suspect that my hangup is that it's not clear where RDMA and DDP are moving the RPC message, but I'm not sure of that. Could you take a look at this section and make sure it's as clear for an implementer as it needs to be?

Direct Data Placement is the most abstract way of defining what this
transport does. Remote Direct Memory Access is one mechanism that
implements DDP. We are attempting to write this document in terms of
data placement, rather than in terms of a specific placement mechanism.

Dave Noveck is spearheading an effort to use Send and Receive, rather
than RDMA, to achieve Direct Data Placement, for example.

As one author of this document, I might not be in the best position
to know exactly what is and is not clear to implementers. ;-) I will
have a look at this, and I'm open to other suggestions.

Though the individual changes will be surgical, this might take one
or two iterations to find consistency throughout the entire document.


> I'm noticing that "pre-posted" is used for the first time without a reference or definition, and "pinned" is used only once, without a reference or definition.

I will investigate replacement terminology.


> In this text,
> 
>    	 	RDMA Read	
>  	      The RDMA provider supports an RDMA Read operation to directly	
>  	      place peer source data in the read initiator's memory.  The local	
>  	      host initiates an RDMA Read, and completion is signaled there; no	
>  	      completion is signaled on the remote. 
>           
> is "remote" short for "remote peer"? And I'm wondering how this is different from the next paragraph, which begins,
> 
>       The remote peer receives no notification of RDMA Read completion.

"remote peer" is correct, and I agree that is redundant. Removing the text
that discusses completion handling from the first paragraph is probably
best.


> On 
> 
>  	   [RFC5666] specifies the use of both the Read-Read and the Read-Write	
>  	   Transfer Model.  All current RPC-over-RDMA Version One	
>  	   implementations use only the Read-Write Transfer Model.  Therefore	
>  	   the use of the Read-Read Transfer Model within RPC-over-RDMA Version	
>  	   One implementations is no longer supported.  Transfer Models other	
>  	   than the Read-Write model may be used in future versions of RPC-over 
>        RDMA.
>        
> I'm still stumbling over my previous point about Read-Read, but this time, it's "no longer supported", which seems different from "no longer necessary" in my previous comment.

"No longer necessary" refers to the use of a particular protocol element
(RDMA_DONE). The text here is discussing a particular mode of operation
which was never implemented. I'm open to suggestion.


> In this text,
> 
>  	4.4.2.  DDP-Eligibility	
>  		
>  	   Only an XDR data item that might benefit from Direct Data Placement	
>  	   may be reduced.  The eligibility of particular XDR data items to be	
>  	   reduced is independent of RPC-over-RDMA, and thus is not specified by
>  	   this document.
>        
> is there a pointer/reference you can give, for where this is specified? Is this describing the same "not specified in this document" as the text in 5.4.2. "Communicating DDP-Eligibility"? I had the same question there.

I will add text that explains that the eligibility of particular
XDR data items is spelled out in a protocol's Upper Layer Binding
specification, which is a separate document.


> For this text, 
> 
>    Except in special cases, a chunk contains exactly one XDR data item.
>    
> is there a pointer/reference/clue you can provide about chunks with more than one XDR data item?

I will add a reference to the discussion of Position-Zero Read chunks
and Reply chunks.


> I struggled with this text a bit (and similar text appears in Section 4.4.6.1. Write Chunk Round-up):
> 
> 3.5.  XDR Decoding with Read Chunks	 	
>        XDR requires each encoded data item to start on four-byte alignment.	
>  	   When an odd-length data item is encoded, its length is encoded	
>  	   literally, while the data is padded so the next data item in the XDR	
>  	   stream can start on a four-byte boundary. 
>        
> I would think "odd-length" covers one-byte and three-byte fields, but not two-byte fields. If you mean "a data item with a length that's not a multiple of four bytes", that would be more accurate.

I will replace the term "odd-length" with more precise language.


> I might have said "When an odd-length data item is encoded, its unpadded length is encoded". I think I understand what you meant by "encoded literally", but I was guessing.

I agree this is confusing. The actual length of said items precedes them in
the XDR stream. This length never includes XDR padding bytes.

If the chunk does not include padding, the chunk length is the same as the
actual length of the data item. If the chunk includes padding bytes, the
chunk length must be increased to include those bytes, but the actual
encoded length remains unchanged.


> Thanks for getting the code component licensing right (I think) - this is my first RFC to shepherd with a code component, so I'm reading https://trustee.ietf.org/license-info/IETF-TLP-4.htm in parallel.

I believe this was copied from recent NFS-related RFCs that
have adopted similar clauses. Let me know if we need something
different/newer here, IANAL ;-)


> In this text:
> 
> 	   The details of reporting and recovery from RDMA link layer errors are
>  	   outside the scope of this protocol specification. 
>        
> are these details described anywhere? If so, a reference would be nice.

They would be spelled out in the link layer's API and operational
specifications. There's no one reference. Would it be better to
remove this text?


> My apologies for asking about text that's unchanged from RFC 5666, but I wasn't sure what you meant here:
> 
>    RPC services normally register with a portmap or rpcbind [RFC1833]
>    service, which associates an RPC Program number with a service
>    address.  (In the case of UDP or TCP, the service address for NFS is
>    normally port 2049.)  This policy is no different with RDMA
>    transports, although it may require the allocation of port numbers
>    appropriate to each Upper Layer Protocol that uses the RPC framing
>    defined here.
>    
> Is this saying that you need multiple port numbers if you use multiple ULPs, or that specific ULPs have different port conventions, or something else? I'm guessing a couple of other possibilities, but I'm guessing.

I think this means that sometimes, the Upper Layer Protocol might choose
to allocate a separate port number for use when running on RPC-over-RDMA.
For example, NFS on traditional transports such as TCP uses port 2049,
but NFS/RDMA uses port 20049.

I will clarify this text.


> It didn't jump out at me, that this paragraph was introducing two cases that result in similar problems, until I read the rest of the section:
> 
> 	   A DDP-eligibility violation occurs when a requester forms a Call	
>  	   message with a non-DDP-eligible data item in a Read chunk.  A	
>  	   violation occurs when a responder forms a Reply message without	
>  	   reducing a DDP-eligible data item when there is a Write list provided
>  	   by the requester.
>        
> I was at least partially confused because the first case is for a "DDP-eligibility violation", and the second is for "a violation" - is the second also for a "DDP-eligibility violation"?

Yes, those are both DDP-eligibility violations.


> If I'm getting that right, this text might be clearer as
> 
> 	   A DDP-eligibility violation occurs when a requester forms a Call	
>  	   message with a non-DDP-eligible data item in a Read chunk, or
>  	   when a responder forms a Reply message without	
>  	   reducing a DDP-eligible data item when there is a Write list provided
>  	   by the requester.

I will use your replacement.


> Nice work on Section 8. You may note that Mirja and I have been including migration and coexistence discussions at TSVAREA, the past couple of IETFs.

Dave Noveck provided the structure and most of the text for this section.


> In Section 8.1, I wonder if you'll get pushback on naming a theoretically short-lived working group (yeah, I know the nfsv4 working group is older than 1998, and that's version 4, for the oldest charter in the datatracker) in a "forever RFC":
> 
> 	   Such extensions must be specified in a Standards Track document with	
>  	   appropriate review by the nfsv4 Working Group and the IESG.
>        
> Let's leave this for now, and see if anyone objects.


--
Chuck Lever