Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-09

Chuck Lever <chuck.lever@oracle.com> Tue, 02 May 2017 16:50 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/Cph_rjy74ilVwvW-medlHJabFPc>

> On May 1, 2017, at 5:02 PM, karen deitke <karen.deitke@oracle.com> wrote:
> 
> 
> 
> On 4/28/2017 11:32 AM, Chuck Lever wrote:
>>> On Apr 27, 2017, at 10:24 AM, David Noveck <davenoveck@gmail.com> wrote:
>>> 
>>>> Correct. But since the protocol creates the problem, the protocol
>>>> definition needs to say something about dealing with it.
>>> The protocol did not create this problem.
>> I respectfully disagree.
>> 
>> The transport protocol design allows arbitrarily complex Read
>> and Write lists. This is a common practice in protocol design
>> to permit wide latitude for innovation by implementers.
>> 
>> NFSv4 also allows arbitrarily complex COMPOUNDs for similar
>> reasons, but a) clients do not today make use of this, and b)
>> newer minor versions recognize that implementation limits have
>> to be communicated.
>> 
>> The protocol problem we are trying to address is that there
>> is no mechanism (either via a specified limit or via a run-time
>> negotiation) that allows implementations to choose limits that
>> are less than infinity while allowing acceptable interoperability.
>> 
>> 
>>> Up until about a month ago,
>>> the protocol allowed multiple chunks and we believed that the
>>> implementations did as well.
>> It's true that the transport protocol hasn't changed, but
>> I think up until a month ago, we simply didn't realize that
>> permissive chunk list complexity limits were an
>> interoperability issue.
>> 
>> The Linux implementations take a lot of short cuts because
>> they are really advanced prototypes, not fully mature
>> implementations of the transport. One of the short cuts
>> has been that they implement just the minimum number of
>> chunk combinations needed for most NFS operations. They do
>> not implement all possible chunk combinations, nor do they
>> support arbitrarily long chunk lists, because they never
>> had to.
>> 
>> I'm aware that other implementers have taken a
>> philosophically similar approach to shorten the time it takes to
>> get a working client and server, and I'm sure that will be
>> the case for implementations in the future. As protocol
>> designers I don't think we should ignore this kind of
>> expediency.
>> 
>> In practice, for now chunk list complexity limits really
>> aren't a problem at all, because of a) above, except for the
>> desire to have servers accept a Read list that contains a
>> Position Zero Read chunk and a normal Read chunk at once.
>> 
>> So this is a situation that is hazardous, but might not be
>> encountered in practice for years. I don't want to spill a
>> lot more electrons or brain cells on fixing something that
>> has minimal consequences for the set of implementations and
>> ULPs we have today.
> Agreed.  The Solaris server can currently handle an offset 0 read chunk, and another read chunk.  What happens if we receive more than that?  It's uncertain.  Our client doesn't implement this, nor have we seen it from other clients.  That is, it's never been tested. The same is true for more than one write chunk.
> 
>> 
>> 
>>> Then we found out that a lot of implementations have these restrictions and we are
>>> trying to deal with that situation.  The protocol has stayed the same but we now
>>> know that some implementations have these restrictions.
>>> 
>>> Even though the protocol did not create this situation, this document is the only
>>> opportunity we have to tell clients how to deal with these restrictions.
>> Or we could give implementers a base set of chunk list
>> capabilities that must be observed. For all versions of NFS on
>> RPC-over-RDMA Version One, make the limit one normal Read chunk
>> plus one PZRC (both with multiple Read segments), and one Write
>> chunk (with multiple segments). Replace discussion of handling
>> NFSv4 COMPOUNDs with more than one DDP-eligible element with
>> a few rules that determine which single operation in a multiple
>> READ or WRITE COMPOUND gets to use DDP.
>> 
>> Real support for multiple chunks with NFSv4 COMPOUNDs will have
>> to wait until another version of RPC-over-RDMA is available.
>> 
>> Not sure what to do about segment count limits.
> Agreed.  Currently the Solaris server does not have a limit on the segment count, other than what will usually fit in the receive buffer that holds the RDMA header itself.

Here's an expansion of S5.4.2 to include most of Tom's proposed text
and a description of current implementation behavior. Please don't
hesitate to argue in favor of anything I left out, or anything I
should remove.


5.4.2.  Complexity Considerations

   The RPC-over-RDMA Version One protocol does not place any limit on
   the number of chunks or segments that may appear in the Read or Write
   lists.  However, for various reasons NFS version 4 server
   implementations often have practical limits on the number of chunks
   or segments they are prepared to process in one message.

   These implementation limits are especially important when Kerberos
   integrity or privacy is in use [RFC7861].  GSS services increase the
   size of credential material in RPC headers, forcing more frequent use
   of Position-Zero Read chunks and Reply chunks.  This can increase the
   complexity of chunk lists independent of the NFS version 4 COMPOUND
   being conveyed.

   To avoid encountering server chunk list complexity limits, NFS
   version 4 clients SHOULD restrict their RPC-over-RDMA Version One
   messages to simple combinations of chunks:

   o  The Read list contains no more than one Position-Zero Read chunk
      and one Read chunk with a non-zero Position.

   o  The Write list contains no more than one chunk.

   o  The inline threshold restricts the number of segments that may
      appear in either list.

   NFS version 4 clients wishing to send more complex chunk lists can
   provide configuration interfaces to bound the complexity of NFS
   version 4 COMPOUNDs, limit the number of elements in scatter-gather
   operations, and avoid other sources of RPC-over-RDMA chunk overruns
   at the peer.
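
As an illustration only (not proposed draft text), the simple
combinations listed above could be checked on the client roughly as
follows. This is a minimal sketch: the ReadSegment tuple and the
is_simple_chunk_list() helper are hypothetical names, and it assumes
that Read segments sharing the same Position value make up one Read
chunk. It does not model the inline threshold limit on segment counts.

```python
from collections import namedtuple

# Hypothetical model of one Read segment. Segments that share the same
# Position are assumed to belong to the same Read chunk.
ReadSegment = namedtuple("ReadSegment", ["position", "handle", "length", "offset"])


def is_simple_chunk_list(read_list, write_list):
    """Return True if the lists stay within the simple combinations.

    read_list:  iterable of ReadSegment
    write_list: list of Write chunks, each chunk being a list of segments
    """
    # All Position-Zero segments form a single chunk, so only the number
    # of distinct non-zero Positions needs to be bounded.
    nonzero_positions = {seg.position for seg in read_list if seg.position != 0}
    return len(nonzero_positions) <= 1 and len(write_list) <= 1
```

A client that finds this check failing could fall back to sending the
extra data items inline, or split the COMPOUND, before posting the
request.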

   An NFS version 4 server has some flexibility in how it indicates that
   an RPC-over-RDMA Version One message constructed by an NFS version 4
   client is valid but cannot be processed.  Examples include:

   o  A problem is detected at the transport layer (i.e., during
      transport header processing).  The server returns an RDMA_ERROR
      message with the err field set to ERR_CHUNK.

   o  A problem is detected during XDR decoding of the request (e.g.,
      during re-assembly of the RPC Call message by the RPC layer).  The
      server returns an RPC reply with its "reply_stat" field set to
      MSG_ACCEPTED and its "accept_stat" field set to GARBAGE_ARGS.

   o  A problem is detected in the Upper Layer (i.e., by the NFS version
      4 implementation).  The server sends an NFS reply with a status of
      NFS4ERR_RESOURCE.
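
To make the three cases above concrete, here is a rough sketch of how
a server might pick its rejection based on the layer that detected the
problem. The Layer enum and rejection_reply() are hypothetical names
used only for illustration, the replies are modeled as plain
dictionaries of symbolic names rather than encoded messages, and the
accept_stat of SUCCESS in the upper-layer case is my assumption about
how an NFS status is normally carried.

```python
from enum import Enum


class Layer(Enum):
    """Layer at which the server noticed it cannot process the chunk list."""
    TRANSPORT = "transport"
    RPC = "rpc"
    UPPER = "upper"


def rejection_reply(layer):
    """Return a symbolic description of the reply for the detecting layer."""
    if layer is Layer.TRANSPORT:
        # Detected during transport header processing.
        return {"message": "RDMA_ERROR", "err": "ERR_CHUNK"}
    if layer is Layer.RPC:
        # Detected during XDR decoding of the RPC Call.
        return {"reply_stat": "MSG_ACCEPTED", "accept_stat": "GARBAGE_ARGS"}
    if layer is Layer.UPPER:
        # Detected by the NFS version 4 implementation itself; the RPC is
        # accepted (assumed SUCCESS) and the NFS status carries the error.
        return {"reply_stat": "MSG_ACCEPTED", "accept_stat": "SUCCESS",
                "nfs_status": "NFS4ERR_RESOURCE"}
    raise ValueError("unknown layer: %r" % (layer,))
```

Whichever branch is taken, the client sees a permanent failure for
that XID.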

   After receiving one of these errors, an NFS version 4 client SHOULD
   NOT retransmit the failing request, as the result would be the same
   error.  It SHOULD immediately terminate the RPC transaction
   associated with the XID in the reply.
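
For what it's worth, the client behavior described in that last
paragraph can be sketched as below. PendingCalls and its methods are
hypothetical names, not taken from any existing client, and mapping
the failure to EIO is just the POSIX example mentioned earlier in this
thread.

```python
import errno

# Statuses that are treated as permanent: retransmitting the same
# request would only produce the same error.
PERMANENT_ERRORS = {"ERR_CHUNK", "GARBAGE_ARGS", "NFS4ERR_RESOURCE"}


class PendingCalls:
    """Hypothetical table of outstanding RPCs, keyed by XID."""

    def __init__(self):
        self.pending = {}

    def send(self, xid, request):
        self.pending[xid] = request

    def on_reply(self, xid, status):
        """Terminate the RPC matching this XID; return 0 or an errno value."""
        request = self.pending.pop(xid, None)
        if request is None:
            return None  # unknown or already-terminated XID
        if status in PERMANENT_ERRORS:
            # The request is dropped, never queued for retransmission.
            return errno.EIO
        return 0
```

The point is only that the XID is retired immediately in every error
case, so the failed RPC cannot block other RPCs on the transport.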



>>> On Thu, Apr 27, 2017 at 12:27 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>> 
>>>> On Apr 27, 2017, at 8:54 AM, Tom Talpey <tom@talpey.com> wrote:
>>>> 
>>>> On 4/27/2017 11:44 AM, Chuck Lever wrote:
>>>>>> On Apr 27, 2017, at 7:20 AM, Tom Talpey <tom@talpey.com> wrote:
>>>>>> 
>>>>>>> snip:
>>>>>>>   Such implementation limits can constrain the complexity of NFS
>>>>>>>   version 4 COMPOUNDs, limit the number of elements in scatter-gather
>>>>>>>   operations, or prevent the use of Kerberos integrity or privacy
>>>>>>>   services.
>>>>>> I like the approach, and the lead-in language looks good. The text
>>>>>> quoted above is just a little bit dark, especially that bit about
>>>>>> preventing krb5i/krb5p. I'd suggest a more active statement to replace
>>>>>> the above, including the more prescriptive SHOULD rather than "can".
>>>>>> How about:
>>>>>> 
>>>>>> "Client implementations SHOULD be prepared to provide mechanisms for
>>>>>> reporting the above errors, and optionally provide configuration to
>>>>>> limit the complexity of NFS version 4 COMPOUNDs, limit the number
>>>>>> of elements in scatter-gather operations, and to avoid other possible
>>>>>> sources of RPC-over-RDMA chunk overruns at the peer.
>>>>>> 
>>>>>> These become especially important when Kerberos integrity or privacy
>>>>>> is in place for the RPC connection. These facilities add payload to
>>>>>> the RPC headers, potentially increasing the complexity of the chunk
>>>>>> manipulation, independent of the upper layer NFS operation. The
>>>>>> implementation SHOULD consider such RPC payload requirements in
>>>>>> addition to the NFS considerations."
>>>>> Sure, I can work this in.
>>>>> 
>>>>> When you say "Client implementations SHOULD ... [report] the above
>>>>> errors" you are talking about reporting them to administrators
>>>>> and/or RPC consumers? I don't think we can use SHOULD in that case.
>>>> I am agnostic about who to inform. The important thing is that some
>>>> visibility of the error be surfaced. I absolutely don't think an
>>>> arbitrary GARBAGE_ARGS returned to an application that may simply
>>>> choke on it, qualifies.
>>> Right, no client surfaces a protocol element like GARBAGE_ARGS
>>> as it is. The failure of the RPC on the server is reported to
>>> consumers in a way that conforms with the API they use to drive
>>> the requests. So an application running on a POSIX client might
>>> use read(2) or write(2) and get back EIO in this case, because
>>> POSIX allows only a narrow range of error status codes for these
>>> APIs.
>>> 
>>> That translation is a normal part of RPC client implementations,
>>> so I didn't think I needed to state that explicitly, and thus
>>> was confused about why "Client implementations" was mentioned
>>> above.
>>> 
>>> 
>>>>> This feels like implementation advice, not protocol.
>>>> Correct. But since the protocol creates the problem, the protocol
>>>> definition needs to say something about dealing with it.
>>> Philosophical agreement with that.
>>> 
>>>> So I believe SHOULD is best.
>>>>> Would "recommend" be enough for this section?
>>>> The RFC2119 term RECOMMENDED is basically a synonym for SHOULD. It's
>>>> perfectly permissible.
>>> To be clear, I was suggesting lower-case "recommended". IMO
>>> RFC 2119 terminology isn't appropriate for describing
>>> internal APIs or mechanisms that do not appear on the wire.
>>> 
>>> The typical approach has been to say "The RPC client then
>>> terminates the RPC with an appropriate error status." Which
>>> crisply yet generically describes desired behavior while
>>> avoiding the use of MUST or SHOULD.
>>> 
>>> 
>>>> Tom.
>>>> 
>>>>> 
>>>>>> Feel free to wordsmith further.
>>>>>> 
>>>>>> Tom.
>>>>>> 
>>>>>> 
>>>>>> On 4/26/2017 12:18 PM, Chuck Lever wrote:
>>>>>>>> On Apr 24, 2017, at 7:30 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>>>>>> 
>>>>>>>>> On Apr 24, 2017, at 7:56 AM, Tom Talpey <tom@talpey.com> wrote:
>>>>>>>>> 
>>>>>>>>> On 4/21/2017 10:43 AM, Chuck Lever wrote:
>>>>>>>>>> I agree that SHOULD/MAY makes things cloudier, and does not
>>>>>>>>>> seem to align with well-defined RFC2119 usage.
>>>>>>>>>> 
>>>>>>>>>> Another way we've dealt with similar disagreements between
>>>>>>>>>> specification and implementation is to decide that one of
>>>>>>>>>> the implementations is incorrect.
>>>>>>>>>> 
>>>>>>>>>> Can we agree that:
>>>>>>>>>> 
>>>>>>>>>> - GARBAGE_ARGS is a bit of a layering violation, though it's
>>>>>>>>>> understandable why it might be returned
>>>>>>>>>> 
>>>>>>>>>> - RPC clients are already prepared for GARBAGE_ARGS
>>>>>>>>> Are you certain of this?
>>>>>>>> GARBAGE_ARGS has been part of the RPC protocol for decades.
>>>>>>>> The two Unix-flavored clients that have NFS/RDMA support can
>>>>>>>> both handle this error.
>>>>>>> I've confirmed that the only other known NFS/RDMA client
>>>>>>> (Oracle dNFS) properly recognizes GARBAGE_ARGS.
>>>>>>> 
>>>>>>> 
>>>>>>>>> And out of curiosity, what is returned
>>>>>>>>> to the consumer for GARBAGE_ARGS versus ERR_CHUNK?
>>>>>>>> RFC 5531:
>>>>>>>>> GARBAGE_ARGS  = 4, /* procedure can’t decode params   */
>>>>>>>> 
>>>>>>>> GARBAGE_ARGS is an RPC-level error. The reply is "accepted"
>>>>>>>> with accept_stat GARBAGE_ARGS. An XID is available in the
>>>>>>>> header.
>>>>>>>> 
>>>>>>>> rfc5666bis:
>>>>>>>>> If the rdma_vers field contains a recognized value, but an
>>>>>>>>> XDR parsing error occurs, the responder MUST reply with an
>>>>>>>>> RDMA_ERROR procedure and set the rdma_err value to ERR_CHUNK.
>>>>>>>> 
>>>>>>>> ERR_CHUNK is a transport level error. An XID is available
>>>>>>>> in the header.
>>>>>>>> 
>>>>>>>> The difference is that the RPC layer v. the transport layer
>>>>>>>> are reporting they don't understand the contents of the
>>>>>>>> message (Call). There is nothing more in either type of
>>>>>>>> message.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> - In RPC-over-RDMA Version One, we are not trying to recover
>>>>>>>>>> (in the sense of resending a simpler COMPOUND) but are rather
>>>>>>>>>> trying to ensure the offending RPC is properly terminated on
>>>>>>>>>> the client, and does not further block other RPCs or deadlock
>>>>>>>>>> the transport
>>>>>>>>>> 
>>>>>>>>>> Thus I claim it is harmless if a server returns GARBAGE_ARGS
>>>>>>>>>> instead of ERR_CHUNK.
>>>>>>>>> "Harmless" is a bit relative. The operation fails, through no fault
>>>>>>>>> of the consumer. And, frankly, in a very mysterious way.
>>>>>>>> We have no richer way of communicating failure in RPC-over-RDMA
>>>>>>>> Version One. We are not looking for recovery here, so I don't
>>>>>>>> believe any more information would be useful. If the server
>>>>>>>> wishes, it can log the failure with a message explaining what
>>>>>>>> went wrong.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Again, I think there is more to say here. It's a limitation of the
>>>>>>>>> protocol whose implications should be made clear (constraining the
>>>>>>>>> complexity of COMPOUNDs, limiting scatter/gather lengths, etc).
>>>>>>>> I'd welcome any suggested text.
>>>>>>>> 
>>>>>>>> Honestly, I'm not sure what can be said. Neither NFSv4.0 nor
>>>>>>>> RPC-over-RDMA have a sophisticated mechanism to communicate this
>>>>>>>> kind of limitation. The best an NFSv4 server can do is return
>>>>>>>> NFS4ERR_RESOURCE, which also carries little extra information
>>>>>>>> about what a client should do to recover.
>>>>>>>> 
>>>>>>>> So are you comfortable with eliminating GARBAGE_ARGS if we can
>>>>>>>> come up with more detail about the impact of not knowing how
>>>>>>>> complex a COMPOUND can be?
>>>>>>> I've come up with some possible replacement text for
>>>>>>> the final two paragraphs of S5.4.1 in an attempt to
>>>>>>> address comments from Tom, David, and Karen. The
>>>>>>> normative requirements have been removed, and a (brief)
>>>>>>> discussion of the consequences of not handling complex
>>>>>>> COMPOUNDs was introduced.
>>>>>>> 
>>>>>>> 
>>>>>>> 5.4.2.  Complexity Considerations
>>>>>>> 
>>>>>>>  As mentioned above, an NFS version 4 COMPOUND procedure can contain
>>>>>>>  more than one operation that carries a DDP-eligible data item.  The
>>>>>>>  RPC-over-RDMA Version One protocol does not place any limit on the
>>>>>>>  number of chunks that may appear in the Read or Write lists.
>>>>>>>  Therefore an NFS version 4 client MAY construct an RPC-over-RDMA
>>>>>>>  Version One message containing more than one Read chunk or Write
>>>>>>>  chunk.
>>>>>>> 
>>>>>>>  However, implementations have practical limits on the number of
>>>>>>>  chunks or segments they are prepared to process in one of these
>>>>>>>  lists.  There are several ways an NFS Version 4 server might indicate
>>>>>>>  that an RPC Call message constructed by a client is valid but cannot
>>>>>>>  be processed because of implementation limitations:
>>>>>>> 
>>>>>>>  o  If the problem is detected in the upper layer (i.e., by the NFS
>>>>>>>     version 4 implementation), the server returns an NFS status of
>>>>>>>     NFS4ERR_RESOURCE.
>>>>>>> 
>>>>>>>  o  If the problem is detected during XDR decoding of the request
>>>>>>>     (e.g., during re-assembly of the Call message by the RPC layer),
>>>>>>>     the server returns an RPC accept_stat of GARBAGE_ARGS.
>>>>>>> 
>>>>>>>  o  If the problem is detected at the transport layer (i.e., during
>>>>>>>     transport header processing), the server returns an RDMA_ERROR
>>>>>>>     message with the err_code field set to ERR_CHUNK.
>>>>>>> 
>>>>>>>  Such implementation limits can constrain the complexity of NFS
>>>>>>>  version 4 COMPOUNDs, limit the number of elements in scatter-gather
>>>>>>>  operations, or prevent the use of Kerberos integrity or privacy
>>>>>>>  services.
>>>>>>> 
>>>>>>> 
>>>>>>> Comments, opinions on this approach?
>>>>>>> 
>>>>>>> 
>>>>>>>>> Tom.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> As a result, I can change the Read list text in S5.4.1 to be
>>>>>>>>>> the same as the Write list text, removing the mention of
>>>>>>>>>> GARBAGE_ARGS.
>>>>>>>>>> 
>>>>>>>>>> Would that sit comfortably with everyone?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Apr 20, 2017, at 7:21 PM, David Noveck <davenoveck@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> The "or" is a similar situation, it prescribes a choice, which
>>>>>>>>>>>> does not define a protocol.
>>>>>>>>>>> Fair enough, but the point that needs to be made is that, with
>>>>>>>>>>> regard to Version One, Chuck and the working group are not
>>>>>>>>>>> free to define a protocol.  As a result we have the kind of
>>>>>>>>>>> ugliness you object to, but it is inherent in the choice to try to
>>>>>>>>>>> revive Version One as-is.
>>>>>>>>>>> 
>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read list that
>>>>>>>>>>>> contains more chunks than an NFS version 4 server is prepared to
>>>>>>>>>>>> process, the server SHOULD reject the request by responding with an
>>>>>>>>>>>> RDMA_ERROR message with the rdma_err value set to ERR_CHUNK. The
>>>>>>>>>>>> server MAY reject the RPC with an RDMA_MSG message containing an RPC
>>>>>>>>>>>> Reply with an accept status of GARBAGE_ARGS.
>>>>>>>>>>> I think I know what you intend here and I've seen stuff like this in RFCs but I don't
>>>>>>>>>>> think we can do this because this is not in line with the definitions of "SHOULD"
>>>>>>>>>>> and "MAY" that appear in RFC2119.
>>>>>>>>>>> 
>>>>>>>>>>> "SHOULD" means that you are supposed to do something but can avoid it if
>>>>>>>>>>> you have a good reason and are aware of the consequences of not doing it.
>>>>>>>>>>> In this case the "good" reason is that someone coded the implementation
>>>>>>>>>>> to do something else, which is not all that good a reason.  The consequences of
>>>>>>>>>>> returning GARBAGE_ARGS are exactly zero, since the client has to be prepared
>>>>>>>>>>> for either it or ERR_CHUNK.
>>>>>>>>>>> 
>>>>>>>>>>> "MAY" means the implementation can choose to do the action or not, which is in line
>>>>>>>>>>> with the reality here but essentially contradicts the SHOULD.
>>>>>>>>>>> 
>>>>>>>>>>>> This at least makes it clear which response is "preferred".
>>>>>>>>>>> But it isn't really the job of the RFC2119 terms to say which is "preferred" or
>>>>>>>>>>> "'preferred'".  These terms are supposed to describe interoperability and the
>>>>>>>>>>> interoperability situation is that the server MUST return ERR_CHUNK or
>>>>>>>>>>> GARBAGE_ARGS and the client needs to be prepared for either.  That is the
>>>>>>>>>>> unpleasant reality.  If you want to indicate a preference, you can say something
>>>>>>>>>>> like:
>>>>>>>>>>>        • Returning ERR_CHUNK is preferable.
>>>>>>>>>>>        • Returning ERR_CHUNK is more in line with the appropriate protocol layering since this issue relates to a limitation of the transport implementation.
>>>>>>>>>>>        • Use of GARBAGE_ARGS is an unfortunate artifact of inappropriately layered implementations and is only allowed for reasons of compatibility with existing implementations.  It is desirable to avoid it.
>>>>>>>>>>>> And one would hope a future draft would decide.
>>>>>>>>>>> Not sure what draft you are thinking of.  I don't see us doing an rfc5667bisbis (rfc5667tris).
>>>>>>>>>>> 
>>>>>>>>>>> By the time we did that, the implementations with these restrictions will probably be gone.
>>>>>>>>>>> 
>>>>>>>>>>>> I have a second question though. How does the client determine what is
>>>>>>>>>>>> the actual error? As in, how many chunks were allowed?
>>>>>>>>>>> This is not fixable in Version One.  It would be in Version Two, but by then
>>>>>>>>>>> the need will probably be gone.
>>>>>>>>>>> 
>>>>>>>>>>>> Does the upper
>>>>>>>>>>>> layer have to recover, and if so what are the implications?
>>>>>>>>>>> I think something could be put in to indicate that clients should break up COMPOUNDS
>>>>>>>>>>> so they only have a single chunk each.
>>>>>>>>>>> 
>>>>>>>>>>>> Yes, I know 5667 did not explore this very well.
>>>>>>>>>>> It didn't explore it at all.  And 5666's error reporting facilities were extremely limited.
>>>>>>>>>>> 
>>>>>>>>>>>> Mea culpa.
>>>>>>>>>>> I don't think you have anything to apologize for.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Apr 20, 2017 at 5:28 PM, Tom Talpey <tom@talpey.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 4/19/2017 11:14 AM, Chuck Lever wrote:
>>>>>>>>>>> Hi Tom-
>>>>>>>>>>> 
>>>>>>>>>>> On Apr 18, 2017, at 11:08 PM, Tom Talpey <tom@talpey.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I noticed the same thing, and I'll add that the MUST reject condition
>>>>>>>>>>> is very confusing because it allows an "or". In my opinion a MUST is
>>>>>>>>>>> always a single requirement, never ambiguous.
>>>>>>>>>>> 
>>>>>>>>>>> I agree this kind of thing is tricky. I wrote it as "the server MUST
>>>>>>>>>>> reject the RPC". That's the single requirement. The choice is how the
>>>>>>>>>>> rejection is conveyed to the client.
>>>>>>>>>>> 
>>>>>>>>>>> The statement "MUST reject" is not testable. So, while it may be
>>>>>>>>>>> understood what is intended, there is nothing implementable in the
>>>>>>>>>>> MUST. The "or" is a similar situation, it prescribes a choice, which
>>>>>>>>>>> does not define a protocol.
>>>>>>>>>>> 
>>>>>>>>>>> Is there some reason you want to allow such a choice? I think you'll
>>>>>>>>>>> find that, worded properly, it becomes actually much less implementable
>>>>>>>>>>> and interoperable than you may think.
>>>>>>>>>>> 
>>>>>>>>>>> The Solaris server can return an RPC-level error in cases like this.
>>>>>>>>>>> 
>>>>>>>>>>> Well, this is happening because the Solaris server is (probably) just
>>>>>>>>>>> handing the chunk list up to the RPC layer, and it's the RPC (XDR)
>>>>>>>>>>> processing that detects any problem.
>>>>>>>>>>> 
>>>>>>>>>>> On the other hand, an implementation could do the opposite, it could
>>>>>>>>>>> process the chunks at the lower layer, before ever invoking RPC
>>>>>>>>>>> processing. This would naturally lead to a non-RPC error.
>>>>>>>>>>> 
>>>>>>>>>>> The challenge in defining the protocol is to hide these possibilities.
>>>>>>>>>>> 
>>>>>>>>>>> I think there are similar choices allowed in rfc5666bis. Let's say
>>>>>>>>>>> that in a perfect world, I would go with only ERR_CHUNK, but I'm
>>>>>>>>>>> documenting existing implementation behavior here.
>>>>>>>>>>> 
>>>>>>>>>>> I'm not sure it matters to the client: both errors are permanent and
>>>>>>>>>>> the RPC is terminated on the client.
>>>>>>>>>>> 
>>>>>>>>>>> I'm open to alternatives.
>>>>>>>>>>> 
>>>>>>>>>>> The icky way to do this is to split into two weak requirements.
>>>>>>>>>>> 
>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read list that
>>>>>>>>>>> contains more chunks than an NFS version 4 server is prepared to
>>>>>>>>>>> process, the server SHOULD reject the request by responding with an
>>>>>>>>>>> RDMA_ERROR message with the rdma_err value set to ERR_CHUNK. The
>>>>>>>>>>> server MAY reject the RPC with an RDMA_MSG message containing an RPC
>>>>>>>>>>> Reply with an accept status of GARBAGE_ARGS.
>>>>>>>>>>> 
>>>>>>>>>>> This at least makes it clear which response is "preferred". And one
>>>>>>>>>>> would hope a future draft would decide.
>>>>>>>>>>> 
>>>>>>>>>>> I have a second question though. How does the client determine what is
>>>>>>>>>>> the actual error? As in, how many chunks were allowed? Does the upper
>>>>>>>>>>> layer have to recover, and if so what are the implications?
>>>>>>>>>>> 
>>>>>>>>>>> Yes, I know 5667 did not explore this very well. Mea culpa.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Tom.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 4/18/2017 6:32 PM, karen deitke wrote:
>>>>>>>>>>> Hi Chuck, its unclear what you mean by "is prepared to process" in the text below.
>>>>>>>>>>> Other than that, looks good.
>>>>>>>>>>> 
>>>>>>>>>>> Karen
>>>>>>>>>>> 
>>>>>>>>>>> 5.4.1
>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Write list that
>>>>>>>>>>> contains more chunks than an NFS version 4 server is prepared to
>>>>>>>>>>> process, the server MUST reject the RPC by responding with an
>>>>>>>>>>> RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read list that
>>>>>>>>>>> contains more chunks than an NFS version 4 server is prepared to
>>>>>>>>>>> process, the server MUST reject the RPC by responding with an
>>>>>>>>>>> RDMA_MSG message containing an RPC Reply with an accept status of
>>>>>>>>>>> GARBAGE_ARGS, or with an RDMA_ERROR message with the rdma_err value
>>>>>>>>>>> set to ERR_CHUNK.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 4/18/2017 1:21 PM, David Noveck wrote:
>>>>>>>>>>> *Overall Evaluation*
>>>>>>>>>>> 
>>>>>>>>>>> Major improvement over RFC5667.  Almost ready to ship.  No technical
>>>>>>>>>>> issues.
>>>>>>>>>>> 
>>>>>>>>>>> A lot of my comments are basically editorial and are offered on a
>>>>>>>>>>> take-it-or-leave-it basis.
>>>>>>>>>>> 
>>>>>>>>>>> I think some clarification in Section 5.4.1 is needed although not
>>>>>>>>>>> necessarily in the ways suggested below.
>>>>>>>>>>> 
>>>>>>>>>>> *Comments by Section*
>>>>>>>>>>> *5.4.1. Multiple DDP-eligible Data Items*
>>>>>>>>>>> Given that READ_PLUS no longer has any DDP-eligible data items, the
>>>>>>>>>>> situation described in the fifth bullet can no longer arise. I suggest
>>>>>>>>>>> deleting the bullet.
>>>>>>>>>>> The penultimate paragraph can be read as applying to some situations
>>>>>>>>>>> in which it shouldn't and where the extra chunks would very naturally be
>>>>>>>>>>> ignored. For example, if you had one write chunk together with a READ
>>>>>>>>>>> operation which failed, the server would have more chunks (i.e. one)
>>>>>>>>>>> than the number it is prepared to process (i.e. zero). Suggest, as a
>>>>>>>>>>> possible replacement:
>>>>>>>>>>> 
>>>>>>>>>>> Normally, when an NFS version 4 client sends an RPC Call with a
>>>>>>>>>>> Write list that contains multiple chunks, each such, when matched
>>>>>>>>>>> with a DDP-eligible data item in the response, directs the
>>>>>>>>>>> placement of the data item as specified by
>>>>>>>>>>> [I.D.-nfsv4-rfc5666bis]. When there are DDP-eligible data items
>>>>>>>>>>> matched to write chunks that an NFS version 4 server is not
>>>>>>>>>>> prepared to process, the server MUST reject the RPC by responding
>>>>>>>>>>> with an RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>>>>>>>>>> 
>>>>>>>>>>> With regard to the last paragraph, I am curious that this paragraph,
>>>>>>>>>>> unlike the previous one, allows GARBAGE_ARGS. Is this so because that
>>>>>>>>>>> would be allowed if the chunks in question had offsets other than
>>>>>>>>>>> those that correspond to DDP-eligible data items? If so, please
>>>>>>>>>>> consider the following possible replacement.
>>>>>>>>>>> 
>>>>>>>>>>> Normally, when an NFS version 4 client sends an RPC Call with a
>>>>>>>>>>> Read list that contains multiple chunks, each such, when properly
>>>>>>>>>>> matched with a DDP-eligible data item in the request, directs the
>>>>>>>>>>> fetching of the data item as specified by
>>>>>>>>>>> [I.D.-nfsv4-rfc5666bis]. When there are DDP-eligible data items
>>>>>>>>>>> matched to read chunks that an NFS version 4 server is not
>>>>>>>>>>> prepared to process, the server MUST reject the RPC by responding
>>>>>>>>>>> with an RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>>>>>>>>>> 
>>>>>>>>>>> *5.6. Session-Related Considerations*
>>>>>>>>>>> In the third sentence of the second paragraph, suggest replacing "no
>>>>>>>>>>> different" by "not different".
>>>>>>>>>>> In the last sentence of the last paragraph, suggest replacing "is not"
>>>>>>>>>>> by "were not".
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> nfsv4 mailing list
>>>>>>>>>>> nfsv4@ietf.org
>>>>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4

--
Chuck Lever