Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-09

Chuck Lever <chuck.lever@oracle.com> Wed, 26 April 2017 16:16 UTC

From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <A7BB8A22-53E3-4910-A6DE-C6103343D309@oracle.com>
Date: Wed, 26 Apr 2017 09:18:23 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <6974E7E7-051B-4F28-A61A-DF6F841B248B@oracle.com>
References: <CADaq8jdkGgL+H-yoO-+bTNbSYiE_1us9cN5SXY8QV0gfYfK0Ng@mail.gmail.com> <ce42960d-d1e9-8fa6-e98e-3e9b1a2af7d6@oracle.com> <f66e8e66-ba54-ff57-945a-7951eab2f8b1@talpey.com> <BB65A737-BDBD-4A23-9CEE-2EA153293842@oracle.com> <33468014-6695-a2da-1af8-f1f355fbe986@talpey.com> <CADaq8jcJJQ3TiVX6fFURg22YgNg=Cd7ezNQewjt6fgNK4LrPVg@mail.gmail.com> <F417EA11-D49F-420D-A64F-AE6A382B920C@oracle.com> <7213a956-6157-d0a6-432d-1da8d555d8e9@talpey.com> <A7BB8A22-53E3-4910-A6DE-C6103343D309@oracle.com>
To: NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/Lkus37YsRUhoQ4UTUbLnCNMpF-A>
Subject: Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-09

> On Apr 24, 2017, at 7:30 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> 
>> On Apr 24, 2017, at 7:56 AM, Tom Talpey <tom@talpey.com> wrote:
>> 
>> On 4/21/2017 10:43 AM, Chuck Lever wrote:
>>> I agree that SHOULD/MAY makes things cloudier, and does not
>>> seem to align with well-defined RFC2119 usage.
>>> 
>>> Another way we've dealt with similar disagreements between
>>> specification and implementation is to decide that one of
>>> the implementations is incorrect.
>>> 
>>> Can we agree that:
>>> 
>>> - GARBAGE_ARGS is a bit of a layering violation, though it's
>>> understandable why it might be returned
>>> 
>>> - RPC clients are already prepared for GARBAGE_ARGS
>> 
>> Are you certain of this?
> 
> GARBAGE_ARGS has been part of the RPC protocol for decades.
> The two Unix-flavored clients that have NFS/RDMA support can
> both handle this error.

I've confirmed that the only other known NFS/RDMA client
(Oracle dNFS) properly recognizes GARBAGE_ARGS.
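
For illustration only, here is a minimal Python sketch of what
"handling" either error amounts to on a Version One client: the XID
in the header is used to find the pending call, and the call is
terminated rather than retried. The names PendingCalls, Reply, and
Outcome are hypothetical and do not come from any real client. Only
the accept_stat values (RFC 5531) and the ERR_VERS/ERR_CHUNK codes
(rfc5666bis) are taken from the specifications. Treating every
non-SUCCESS accept_stat as permanent is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Dict, Optional


class AcceptStat(IntEnum):
    """RPC accept_stat values, as defined in RFC 5531."""
    SUCCESS = 0
    PROG_UNAVAIL = 1
    PROG_MISMATCH = 2
    PROC_UNAVAIL = 3
    GARBAGE_ARGS = 4
    SYSTEM_ERR = 5


class RdmaErr(IntEnum):
    """RPC-over-RDMA Version One error codes."""
    ERR_VERS = 1
    ERR_CHUNK = 2


class Outcome(Enum):
    COMPLETED = "completed"
    FAILED_PERMANENTLY = "failed permanently"


@dataclass
class Reply:
    """Hypothetical, already-decoded view of an incoming message."""
    xid: int
    accept_stat: Optional[AcceptStat] = None  # set for an RPC Reply in RDMA_MSG
    rdma_err: Optional[RdmaErr] = None        # set for an RDMA_ERROR message


class PendingCalls:
    """Hypothetical table of outstanding RPCs, keyed by XID."""

    def __init__(self) -> None:
        self._pending: Dict[int, str] = {}

    def add(self, xid: int, description: str) -> None:
        self._pending[xid] = description

    def outstanding(self) -> int:
        return len(self._pending)

    def handle_reply(self, reply: Reply) -> Outcome:
        # Both error forms carry an XID, so the call can always be
        # matched and retired; it no longer blocks other RPCs.
        # An unknown XID raises KeyError.
        self._pending.pop(reply.xid)
        if reply.rdma_err is not None:
            # Transport-level rejection (for example ERR_CHUNK).
            return Outcome.FAILED_PERMANENTLY
        if reply.accept_stat is AcceptStat.SUCCESS:
            return Outcome.COMPLETED
        # RPC-level rejection (for example GARBAGE_ARGS). Assumption:
        # no resend is attempted for any non-SUCCESS accept_stat.
        return Outcome.FAILED_PERMANENTLY
```

In this sketch the consumer sees the same outcome for both errors,
which is the sense in which the choice of error does not matter to
the client.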


>> And out of curiosity, what is returned
>> to the consumer for GARBAGE_ARGS versus ERR_CHUNK?
> 
> RFC 5531:
>> GARBAGE_ARGS  = 4, /* procedure can’t decode params   */
> 
> 
> GARBAGE_ARGS is an RPC-level error. The reply is "accepted"
> with accept_stat GARBAGE_ARGS. An XID is available in the
> header.
> 
> rfc5666bis:
>> If the rdma_vers field contains a recognized value, but an
>> XDR parsing error occurs, the responder MUST reply with an
>> RDMA_ERROR procedure and set the rdma_err value to ERR_CHUNK.
> 
> 
> ERR_CHUNK is a transport level error. An XID is available
> in the header.
> 
> The difference is which layer reports that it does not
> understand the contents of the message (Call): the RPC layer
> or the transport layer. Neither type of message carries any
> further information.
> 
> 
>>> - In RPC-over-RDMA Version One, we are not trying to recover
>>> (in the sense of resending a simpler COMPOUND) but are rather
>>> trying to ensure the offending RPC is properly terminated on
>>> the client, and does not further block other RPCs or deadlock
>>> the transport
>>> 
>>> Thus I claim it is harmless if a server returns GARBAGE_ARGS
>>> instead of ERR_CHUNK.
>> 
>> "Harmless" is a bit relative. The operation fails, through no fault
>> of the consumer. And, frankly, in a very mysterious way.
> 
> We have no richer way of communicating failure in RPC-over-RDMA
> Version One. We are not looking for recovery here, so I don't
> believe any more information would be useful. If the server
> wishes, it can log the failure with a message explaining what
> went wrong.
> 
> 
>> Again, I think there is more to say here. It's a limitation of the
>> protocol whose implications should be made clear (constraining the
>> complexity of COMPOUNDs, limiting scatter/gather lengths, etc).
> 
> I'd welcome any suggested text.
> 
> Honestly, I'm not sure what can be said. Neither NFSv4.0 nor
> RPC-over-RDMA have a sophisticated mechanism to communicate this
> kind of limitation. The best an NFSv4 server can do is return
> NFS4ERR_RESOURCE, which also carries little extra information
> about what a client should do to recover.
> 
> So are you comfortable with eliminating GARBAGE_ARGS if we can
> come up with more detail about the impact of not knowing how
> complex a COMPOUND can be?

I've come up with some possible replacement text for
the final two paragraphs of S5.4.1 in an attempt to
address comments from Tom, David, and Karen. The
normative requirements have been removed, and a (brief)
discussion of the consequences of not handling complex
COMPOUNDs has been added.


5.4.2.  Complexity Considerations

   As mentioned above, an NFS version 4 COMPOUND procedure can contain
   more than one operation that carries a DDP-eligible data item.  The
   RPC-over-RDMA Version One protocol does not place any limit on the
   number of chunks that may appear in the Read or Write lists.
   Therefore an NFS version 4 client MAY construct an RPC-over-RDMA
   Version One message containing more than one Read chunk or Write
   chunk.

   However, implementations have practical limits on the number of
   chunks or segments they are prepared to process in one of these
   lists.  There are several ways an NFS version 4 server might indicate
   that an RPC Call message constructed by a client is valid but cannot
   be processed because of implementation limitations:

   o  If the problem is detected in the upper layer (i.e., by the NFS
      version 4 implementation), the server returns an NFS status of
      NFS4ERR_RESOURCE.

   o  If the problem is detected during XDR decoding of the request
      (e.g., during re-assembly of the Call message by the RPC layer),
      the server returns an RPC accept_stat of GARBAGE_ARGS.

   o  If the problem is detected at the transport layer (i.e., during
      transport header processing), the server returns an RDMA_ERROR
      message with the err_code field set to ERR_CHUNK.

   Such implementation limits can constrain the complexity of NFS
   version 4 COMPOUNDs, limit the number of elements in scatter-gather
   operations, or prevent the use of Kerberos integrity or privacy
   services.
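
As a rough illustration of the three cases above (not part of the
proposed text), here is a minimal Python sketch of a server that
enforces a chunk limit and picks the error indication according to
the layer that noticed the problem. The names ReadSegment, Layer,
count_read_chunks, and check_read_list, and the max_chunks limit, are
hypothetical. The numeric values are the ones defined for
NFS4ERR_RESOURCE (RFC 7530), GARBAGE_ARGS (RFC 5531), and ERR_CHUNK
(rfc5666bis). The sketch assumes that read segments sharing the same
Position value belong to one Read chunk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

NFS4ERR_RESOURCE = 10018  # NFS version 4 status (RFC 7530)
GARBAGE_ARGS = 4          # RPC accept_stat (RFC 5531)
ERR_CHUNK = 2             # RPC-over-RDMA Version One error code


class Layer(Enum):
    """Where the implementation limit was detected (hypothetical)."""
    TRANSPORT = "transport header processing"
    RPC = "XDR decoding of the request"
    NFS = "NFS version 4 implementation"


@dataclass(frozen=True)
class ReadSegment:
    """One entry in the Read list; field names are illustrative."""
    position: int
    handle: int
    length: int
    offset: int


def count_read_chunks(read_list: List[ReadSegment]) -> int:
    # Assumption: segments with the same Position form one Read chunk.
    return len({segment.position for segment in read_list})


def check_read_list(read_list: List[ReadSegment],
                    max_chunks: int,
                    detected_by: Layer) -> Optional[Tuple[str, int]]:
    """Return None if the Call can be processed, else (kind, code)."""
    if count_read_chunks(read_list) <= max_chunks:
        return None
    if detected_by is Layer.TRANSPORT:
        return ("RDMA_ERROR", ERR_CHUNK)
    if detected_by is Layer.RPC:
        return ("accept_stat", GARBAGE_ARGS)
    return ("NFS status", NFS4ERR_RESOURCE)
```

For example, a Read list with three segments at two distinct
Positions counts as two chunks, so a server limited to one chunk
would reject it with whichever indication matches its layering.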


Comments, opinions on this approach?


>> Tom.
>> 
>> 
>>> 
>>> As a result, I can change the Read list text in S5.4.1 to be
>>> the same as the Write list text, removing the mention of
>>> GARBAGE_ARGS.
>>> 
>>> Would that sit comfortably with everyone?
>>> 
>>> 
>>>> On Apr 20, 2017, at 7:21 PM, David Noveck <davenoveck@gmail.com> wrote:
>>>> 
>>>>> The "or" is a similar situation, it prescribes a choice, which
>>>>> does not define a protocol.
>>>> 
>>>> Fair enough, but the point that needs to be made is that, with
>>>> regard to Version One, Chuck and the working group are not
>>>> free to define a protocol.  As a result we have the kind of
>>>> ugliness you object to, but it is inherent in the choice to try to
>>>> revive Version One as-is.
>>>> 
>>>>> If an NFS version 4 client sends an RPC Call with a Read list that
>>>>> contains more chunks than an NFS version 4 server is prepared to
>>>>> process, the server SHOULD reject the request by responding with an
>>>>> RDMA_ERROR message with the rdma_err value set to ERR_CHUNK. The
>>>>> server MAY reject the RPC with an RDMA_MSG message containing an RPC
>>>>> Reply with an accept status of GARBAGE_ARGS.
>>>> 
>>>> I think I know what you intend here and I've seen stuff like this in RFCs but I don't
>>>> think we can do this because this is not in line with the definitions of "SHOULD"
>>>> and "MAY" that appear in RFC2119.
>>>> 
>>>> "SHOULD" means that you are supposed to do something but can avoid it if
>>>> you have a good reason and are aware of the consequences of not doing it.
>>>> In this case the "good" reason is that someone coded the implementation
>>>> to do something else, which is not all that good a reason.  The consequences of
>>>> returning GARBAGE_ARGS are exactly zero, since the client has to be prepared
>>>> for either it or ERR_CHUNK.
>>>> 
>>>> "MAY" means the implementation can choose to do the action or not, which is in line
>>>> with the reality here but essentially contradicts the SHOULD.
>>>> 
>>>>> This at least makes it clear which response is "preferred".
>>>> 
>>>> But it isn't really the job of the RFC2119 terms to say which is "preferred" or
>>>> "'preferred'".  These terms are supposed to describe interoperability and the
>>>> interoperability situation is that the server MUST return ERR_CHUNK or
>>>> GARBAGE_ARGS and the client needs to be prepared for either.  That is the
>>>> unpleasant reality.  If you want to indicate a preference, you can say something
>>>> like:
>>>> 	• Returning ERR_CHUNK is preferable.
>>>> 	• Returning ERR_CHUNK is more in line with the appropriate protocol layering since this issue relates to a limitation of the transport implementation.
>>>> 	• Use of GARBAGE_ARGS is an unfortunate artifact of inappropriately layered implementations and is only allowed for reasons of compatibility with existing implementations.  It is desirable to avoid it.
>>>>> And one would hope a future draft would decide.
>>>> 
>>>> Not sure what draft you are thinking of.  I don't see us doing an rfc5667bisbis (rfc5667tris).
>>>> 
>>>> By the time we did that, the implementations with these restrictions will probably be gone.
>>>> 
>>>>> I have a second question though. How does the client determine what is
>>>>> the actual error? As in, how many chunks were allowed?
>>>> 
>>>> This is not fixable in Version One.  It would be in Version Two, but by then
>>>> the need will probably be gone.
>>>> 
>>>>> Does the upper
>>>>> layer have to recover, and if so what are the implications?
>>>> 
>>>> I think something could be put in to indicate that clients should break up COMPOUNDS
>>>> so they only have a single chunk each.
>>>> 
>>>>> Yes, I know 5667 did not explore this very well.
>>>> 
>>>> It didn't explore it at all.  And 5666's error reporting facilities were extremely limited.
>>>> 
>>>>> Mea culpa.
>>>> 
>>>> I don't think you have anything to apologize for.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Thu, Apr 20, 2017 at 5:28 PM, Tom Talpey <tom@talpey.com> wrote:
>>>> 
>>>> 
>>>> On 4/19/2017 11:14 AM, Chuck Lever wrote:
>>>> Hi Tom-
>>>> 
>>>> On Apr 18, 2017, at 11:08 PM, Tom Talpey <tom@talpey.com> wrote:
>>>> 
>>>> I noticed the same thing, and I'll add that the MUST reject condition
>>>> is very confusing because it allows an "or". In my opinion a MUST is
>>>> always a single requirement, never ambiguous.
>>>> 
>>>> I agree this kind of thing is tricky. I wrote it as "the server MUST
>>>> reject the RPC". That's the single requirement. The choice is how the
>>>> rejection is conveyed to the client.
>>>> 
>>>> The statement "MUST reject" is not testable. So, while it may be
>>>> understood what is intended, there is nothing implementable in the
>>>> MUST. The "or" is a similar situation, it prescribes a choice, which
>>>> does not define a protocol.
>>>> 
>>>> Is there some reason you want to allow such a choice? I think you'll
>>>> find that, worded properly, it becomes actually much less implementable
>>>> and interoperable than you may think.
>>>> 
>>>> The Solaris server can return an RPC-level error in cases like this.
>>>> 
>>>> Well, this is happening because the Solaris server is (probably) just
>>>> handing the chunk list up to the RPC layer, and it's the RPC (XDR)
>>>> processing that detects any problem.
>>>> 
>>>> On the other hand, an implementation could do the opposite, it could
>>>> process the chunks at the lower layer, before ever invoking RPC
>>>> processing. This would naturally lead to a non-RPC error.
>>>> 
>>>> The challenge in defining the protocol is to hide these possibilities.
>>>> 
>>>> I think there are similar choices allowed in rfc5666bis. Let's say
>>>> that in a perfect world, I would go with only ERR_CHUNK, but I'm
>>>> documenting existing implementation behavior here.
>>>> 
>>>> I'm not sure it matters to the client: both errors are permanent and
>>>> the RPC is terminated on the client.
>>>> 
>>>> I'm open to alternatives.
>>>> 
>>>> The icky way to do this is to split into two weak requirements.
>>>> 
>>>>  If an NFS version 4 client sends an RPC Call with a Read list that
>>>>  contains more chunks than an NFS version 4 server is prepared to
>>>>  process, the server SHOULD reject the request by responding with an
>>>>  RDMA_ERROR message with the rdma_err value set to ERR_CHUNK. The
>>>>  server MAY reject the RPC with an RDMA_MSG message containing an RPC
>>>>  Reply with an accept status of GARBAGE_ARGS.
>>>> 
>>>> This at least makes it clear which response is "preferred". And one
>>>> would hope a future draft would decide.
>>>> 
>>>> I have a second question though. How does the client determine what is
>>>> the actual error? As in, how many chunks were allowed? Does the upper
>>>> layer have to recover, and if so what are the implications?
>>>> 
>>>> Yes, I know 5667 did not explore this very well. Mea culpa.
>>>> 
>>>> 
>>>> 
>>>> Tom.
>>>> 
>>>> 
>>>> 
>>>> On 4/18/2017 6:32 PM, karen deitke wrote:
>>>> Hi Chuck, it's unclear what you mean by "is prepared to process" in the text below.
>>>> Other than that, looks good.
>>>> 
>>>> Karen
>>>> 
>>>> 5.4.1
>>>> If an NFS version 4 client sends an RPC Call with a Write list that
>>>> contains more chunks than an NFS version 4 server is prepared to
>>>> process, the server MUST reject the RPC by responding with an
>>>> RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>>> 
>>>> 
>>>> If an NFS version 4 client sends an RPC Call with a Read list that
>>>> contains more chunks than an NFS version 4 server is prepared to
>>>> process, the server MUST reject the RPC by responding with an
>>>> RDMA_MSG message containing an RPC Reply with an accept status of
>>>> GARBAGE_ARGS, or with an RDMA_ERROR message with the rdma_err value
>>>> set to ERR_CHUNK.
>>>> 
>>>> 
>>>> On 4/18/2017 1:21 PM, David Noveck wrote:
>>>> *Overall Evaluation*
>>>> Major improvement over RFC5667.  Almost ready to ship.  No technical
>>>> issues.
>>>> 
>>>> A lot of my comments are basically editorial and are offered on a
>>>> take-it-or-leave-it basis.
>>>> 
>>>> I think some clarification in Section 5.4.1 is needed although not
>>>> necessarily in the ways suggested below.
>>>> 
>>>> *Comments by Section*
>>>> *5.4.1. Multiple DDP-eligible Data Items*
>>>> Given that READ_PLUS no longer has any DDP-eligible data items, the
>>>> situation described in the fifth bullet can no longer arise. I suggest
>>>> deleting the bullet.
>>>> The penultimate paragraph can be read as applying to some situations
>>>> in which it shouldn't and where the extra chunks would very naturally
>>>> be ignored. For example, if you had one write chunk together with a READ
>>>> operation which failed, the server would have more chunks (i.e. one)
>>>> than the number it is prepared to process (i.e. zero). Suggest, as a
>>>> possible replacement:
>>>> 
>>>>  Normally, when an NFS version 4 client sends an RPC Call with a
>>>>  Write list that contains multiple chunks, each such, when matched
>>>>  with a DDP-eligible data item in the response, directs the
>>>>  placement of the data item as specified by
>>>>  [I.D.-nfsv4-rfc5666bis]. When there are DDP-eligible data items
>>>>  matched to write chunks that an NFS version 4 server is not
>>>>  prepared to process, the server MUST reject the RPC by responding
>>>>  with an RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>>> 
>>>> With regard to the last paragraph, I am curious that this paragraph,
>>>> unlike the previous one, allows GARBAGE_ARGS. Is this so because that
>>>> would be allowed if the chunks in question had offsets other than
>>>> those that correspond to DDP-eligible data items? If so, please
>>>> consider the following possible replacement.
>>>> 
>>>>  Normally, when an NFS version 4 client sends an RPC Call with a
>>>>  Read list that contains multiple chunks, each such, when properly
>>>>  matched with a DDP-eligible data item in the request, directs the
>>>>  fetching of the data item as specified by
>>>>  [I.D.-nfsv4-rfc5666bis]. When there are DDP-eligible data items
>>>>  matched to read chunks that an NFS version 4 server is not
>>>>  prepared to process, the server MUST reject the RPC by responding
>>>>  with an RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>>> 
>>>> *5.6. Session-Related Considerations*
>>>> In the third sentence of the second paragraph, suggest replacing "no
>>>> different" by "not different".
>>>> In the last sentence of the last paragraph, suggest replacing "is not"
>>>> by "were not"
>>>> 
>>>> 
>>>> _______________________________________________
>>>> nfsv4 mailing list
>>>> nfsv4@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/nfsv4
>>>> --
>>>> Chuck Lever
>>> 
>>> --
>>> Chuck Lever
>> 
> 
> --
> Chuck Lever

--
Chuck Lever