Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-09

Chuck Lever <chuck.lever@oracle.com> Mon, 24 April 2017 14:30 UTC

To: Tom Talpey <tom@talpey.com>
Cc: nfsv4@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/5Pty-pwVwh6KNGiP04NlatLxmzM>

> On Apr 24, 2017, at 7:56 AM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 4/21/2017 10:43 AM, Chuck Lever wrote:
>> I agree that SHOULD/MAY makes things cloudier, and does not
>> seem to align with well-defined RFC2119 usage.
>> 
>> Another way we've dealt with similar disagreements between
>> specification and implementation is to decide that one of
>> the implementations is incorrect.
>> 
>> Can we agree that:
>> 
>> - GARBAGE_ARGS is a bit of a layering violation, though it's
>> understandable why it might be returned
>> 
>> - RPC clients are already prepared for GARBAGE_ARGS
> 
> Are you certain of this?

GARBAGE_ARGS has been part of the RPC protocol for decades.
The two Unix-flavored clients that have NFS/RDMA support can
both handle this error.


> And out of curiosity, what is returned
> to the consumer for GARBAGE_ARGS versus ERR_CHUNK?

RFC 5531:
> GARBAGE_ARGS  = 4, /* procedure can’t decode params   */


GARBAGE_ARGS is an RPC-level error. The reply is "accepted"
with accept_stat GARBAGE_ARGS. An XID is available in the
header.

rfc5666bis:
> If the rdma_vers field contains a recognized value, but an
> XDR parsing error occurs, the responder MUST reply with an
> RDMA_ERROR procedure and set the rdma_err value to ERR_CHUNK.


ERR_CHUNK is a transport level error. An XID is available
in the header.

The difference is which layer is reporting that it doesn't
understand the contents of the message (Call): the RPC layer
or the transport layer. Neither type of message carries any
further information.
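
A rough sketch of how a client might tell the two apart, in
Python. The Reply class and both function names are made up for
illustration and are not taken from any implementation; the
numeric values are the ones assigned in RFC 5531 and RFC 5666.

```python
from dataclasses import dataclass
from typing import Optional

# RPC-over-RDMA Version One procedure and error codes (RFC 5666)
RDMA_MSG = 0
RDMA_ERROR = 4
ERR_VERS = 1
ERR_CHUNK = 2

# ONC RPC accept_stat values (RFC 5531)
SUCCESS = 0
GARBAGE_ARGS = 4


@dataclass
class Reply:
    """Hypothetical, already-decoded view of one reply; not a real API."""
    xid: int
    rdma_proc: int
    rdma_err: Optional[int] = None      # meaningful only for RDMA_ERROR
    accept_stat: Optional[int] = None   # meaningful only for an accepted RPC Reply


def classify(reply: Reply) -> str:
    """Name the layer that rejected the Call, or "other" if neither did."""
    if reply.rdma_proc == RDMA_ERROR and reply.rdma_err == ERR_CHUNK:
        return "transport"
    if reply.rdma_proc == RDMA_MSG and reply.accept_stat == GARBAGE_ARGS:
        return "rpc"
    return "other"


def is_permanent_failure(reply: Reply) -> bool:
    """Both rejections end the RPC identified by reply.xid; neither is retried."""
    return classify(reply) in ("transport", "rpc")
```

In both cases the XID is enough to find the waiting RPC and
complete it with an error, which is all Version One can offer.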


>> - In RPC-over-RDMA Version One, we are not trying to recover
>> (in the sense of resending a simpler COMPOUND) but are rather
>> trying to ensure the offending RPC is properly terminated on
>> the client, and does not further block other RPCs or deadlock
>> the transport
>> 
>> Thus I claim it is harmless if a server returns GARBAGE_ARGS
>> instead of ERR_CHUNK.
> 
> "Harmless" is a bit relative. The operation fails, through no fault
> of the consumer. And, frankly, in a very mysterious way.

We have no richer way of communicating failure in RPC-over-RDMA
Version One. We are not looking for recovery here, so I don't
believe any more information would be useful. If the server
wishes, it can log the failure with a message explaining what
went wrong.
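
For illustration only, here is roughly what that could look like
on the server side, in Python. The limit, the logger name, and
check_read_list() are all assumptions invented for this sketch;
only the RDMA_ERROR and ERR_CHUNK values come from RFC 5666.

```python
import logging
from typing import Optional, Tuple

# RPC-over-RDMA Version One values (RFC 5666)
RDMA_ERROR = 4
ERR_CHUNK = 2

# Hypothetical limit: how many Read chunks this server is prepared to process.
MAX_READ_CHUNKS = 1

log = logging.getLogger("rpcrdma.sketch")  # hypothetical logger name


def check_read_list(xid: int, n_read_chunks: int,
                    limit: int = MAX_READ_CHUNKS) -> Optional[Tuple[int, int, int]]:
    """Return (xid, rdma_proc, rdma_err) for a rejection, or None to proceed."""
    if n_read_chunks <= limit:
        return None
    # The wire reply cannot say why the Call was rejected, so the
    # explanation goes into the server's own log instead.
    log.warning("xid %#x: Read list has %d chunks, limit is %d; replying ERR_CHUNK",
                xid, n_read_chunks, limit)
    return (xid, RDMA_ERROR, ERR_CHUNK)
```

The client still sees only ERR_CHUNK; the detail stays in the
server log for an administrator to find.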


> Again, I think there is more to say here. It's a limitation of the
> protocol whose implications should be made clear (constraining the
> complexity of COMPOUNDs, limiting scatter/gather lengths, etc).

I'd welcome any suggested text.

Honestly, I'm not sure what can be said. Neither NFSv4.0 nor
RPC-over-RDMA has a sophisticated mechanism to communicate this
kind of limitation. The best an NFSv4 server can do is return
NFS4ERR_RESOURCE, which also carries little extra information
about what a client should do to recover.

So are you comfortable with eliminating GARBAGE_ARGS if we can
come up with more detail about the impact of not knowing how
complex a COMPOUND can be?


> Tom.
> 
> 
>> 
>> As a result, I can change the Read list text in S5.4.1 to be
>> the same as the Write list text, removing the mention of
>> GARBAGE_ARGS.
>> 
>> Would that sit comfortably with everyone?
>> 
>> 
>>> On Apr 20, 2017, at 7:21 PM, David Noveck <davenoveck@gmail.com> wrote:
>>> 
>>>> The "or" is a similar situation, it prescribes a choice, which
>>>> does not define a protocol.
>>> 
>>> Fair enough, but the point that needs to be made is that, with
>>> regard to Version One, Chuck and the working group are not
>>> free to define a protocol.  As a result we have the kind of
>>> ugliness you object to, but it is inherent in the choice to try to
>>> revive Version One as-is.
>>> 
>>>>  If an NFS version 4 client sends an RPC Call with a Read list that
>>>>  contains more chunks than an NFS version 4 server is prepared to
>>>>  process, the server SHOULD reject the request by responding with an
>>>>  RDMA_ERROR message with the rdma_err value set to ERR_CHUNK. The
>>>>  server MAY reject the RPC with an RDMA_MSG message containing an RPC
>>>>  Reply with an accept status of GARBAGE_ARGS.
>>> 
>>> I think I know what you intend here and I've seen stuff like this in RFCs but I don't
>>> think we can do this because this is not in line with the definitions of "SHOULD"
>>> and "MAY" that appear in RFC2119.
>>> 
>>> "SHOULD" means that you are supposed to do something but can avoid it if
>>> you have a good reason and are aware of the consequences of not doing it.
>>> In this case the "good" reason is that someone coded the implementation
>>> to do something else, which is not all that good a reason.  The consequences of
>>> returning GARBAGE_ARGS are exactly zero, since the client has to be prepared
>>> for either it or ERR_CHUNK.
>>> 
>>> "MAY" means the implementation can choose to do the action or not, which is in line
>>> with the reality here but essentially contradicts the SHOULD.
>>> 
>>>> This at least makes it clear which response is "preferred".
>>> 
>>> But it isn't really the job of the RFC2119 terms to say which is "preferred" or
>>> "'preferred'".  These terms are supposed to describe interoperability and the
>>> interoperability situation is that the server MUST return ERR_CHUNK or
>>> GARBAGE_ARGS and the client needs to be prepared for either.  That is the
>>> unpleasant reality.  If you want to indicate a preference, you can say something
>>> like:
>>> 	• Returning ERR_CHUNK is preferable.
>>> 	• Returning ERR_CHUNK is more in line with the appropriate protocol layering since this issue relates to a limitation of the transport implementation.
>>> 	• Use of GARBAGE_ARGS is an unfortunate artifact of inappropriately layered implementations and is only allowed for reasons of compatibility with existing implementations.  It is desirable to avoid it.
>>>> And one would hope a future draft would decide.
>>> 
>>> Not sure what draft you are thinking of.  I don't see us doing an rfc5667bisbis (rfc5667tris).
>>> 
>>> By the time we did that, the implementations with these restrictions will probably be gone.
>>> 
>>>> I have a second question though. How does the client determine what is
>>>> the actual error? As in, how many chunks were allowed?
>>> 
>>> This is not fixable in Version One.  It would be in Version Two, but by then
>>> the need will probably be gone.
>>> 
>>>> Does the upper
>>>> layer have to recover, and if so what are the implications?
>>> 
>>> I think something could be put in to indicate that clients should break up COMPOUNDS
>>> so they only have a single chunk each.
>>> 
>>>> Yes, I know 5667 did not explore this very well.
>>> 
>>> It didn't explore it at all.  And 5666's error reporting facilities were extremely limited.
>>> 
>>>> Mea culpa.
>>> 
>>> I don't think you have anything to apologize for.
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Apr 20, 2017 at 5:28 PM, Tom Talpey <tom@talpey.com> wrote:
>>> 
>>> 
>>> On 4/19/2017 11:14 AM, Chuck Lever wrote:
>>> Hi Tom-
>>> 
>>> On Apr 18, 2017, at 11:08 PM, Tom Talpey <tom@talpey.com> wrote:
>>> 
>>> I noticed the same thing, and I'll add that the MUST reject condition
>>> is very confusing because it allows an "or". In my opinion a MUST is
>>> always a single requirement, never ambiguous.
>>> 
>>> I agree this kind of thing is tricky. I wrote it as "the server MUST
>>> reject the RPC". That's the single requirement. The choice is how the
>>> rejection is conveyed to the client.
>>> 
>>> The statement "MUST reject" is not testable. So, while it may be
>>> understood what is intended, there is nothing implementable in the
>>> MUST. The "or" is a similar situation, it prescribes a choice, which
>>> does not define a protocol.
>>> 
>>> Is there some reason you want to allow such a choice? I think you'll
>>> find that, worded properly, it becomes actually much less implementable
>>> and interoperable than you may think.
>>> 
>>> The Solaris server can return an RPC-level error in cases like this.
>>> 
>>> Well, this is happening because the Solaris server is (probably) just
>>> handing the chunk list up to the RPC layer, and it's the RPC (XDR)
>>> processing that detects any problem.
>>> 
>>> On the other hand, an implementation could do the opposite, it could
>>> process the chunks at the lower layer, before ever invoking RPC
>>> processing. This would naturally lead to a non-RPC error.
>>> 
>>> The challenge in defining the protocol is to hide these possibilities.
>>> 
>>> I think there are similar choices allowed in rfc5666bis. Let's say
>>> that in a perfect world, I would go with only ERR_CHUNK, but I'm
>>> documenting existing implementation behavior here.
>>> 
>>> I'm not sure it matters to the client: both errors are permanent and
>>> the RPC is terminated on the client.
>>> 
>>> I'm open to alternatives.
>>> 
>>> The icky way to do this is to split into two weak requirements.
>>> 
>>>   If an NFS version 4 client sends an RPC Call with a Read list that
>>>   contains more chunks than an NFS version 4 server is prepared to
>>>   process, the server SHOULD reject the request by responding with an
>>>   RDMA_ERROR message with the rdma_err value set to ERR_CHUNK. The
>>>   server MAY reject the RPC with an RDMA_MSG message containing an RPC
>>>   Reply with an accept status of GARBAGE_ARGS.
>>> 
>>> This at least makes it clear which response is "preferred". And one
>>> would hope a future draft would decide.
>>> 
>>> I have a second question though. How does the client determine what is
>>> the actual error? As in, how many chunks were allowed? Does the upper
>>> layer have to recover, and if so what are the implications?
>>> 
>>> Yes, I know 5667 did not explore this very well. Mea culpa.
>>> 
>>> 
>>> 
>>> Tom.
>>> 
>>> 
>>> 
>>> On 4/18/2017 6:32 PM, karen deitke wrote:
>>> Hi Chuck, it's unclear what you mean by "is prepared to process" in the text below.
>>> Other than that, looks good.
>>> 
>>> Karen
>>> 
>>> 5.4.1
>>>  If an NFS version 4 client sends an RPC Call with a Write list that
>>>  contains more chunks than an NFS version 4 server is prepared to
>>>  process, the server MUST reject the RPC by responding with an
>>>  RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>> 
>>> 
>>>  If an NFS version 4 client sends an RPC Call with a Read list that
>>>  contains more chunks than an NFS version 4 server is prepared to
>>>  process, the server MUST reject the RPC by responding with an
>>>  RDMA_MSG message containing an RPC Reply with an accept status of
>>>  GARBAGE_ARGS, or with an RDMA_ERROR message with the rdma_err value
>>>  set to ERR_CHUNK.
>>> 
>>> 
>>> On 4/18/2017 1:21 PM, David Noveck wrote:
>>> *Overall Evaluation*
>>> Major improvement over RFC5667.  Almost ready to ship.  No technical
>>> issues.
>>> 
>>> A lot of my comments are basically editorial and are offered on a
>>> take-it-or-leave-it basis.
>>> 
>>> I think some clarification in Section 5.4.1 is needed although not
>>> necessarily in the ways suggested below.
>>> 
>>> *Comments by Section*
>>> *5.4.1. Multiple DDP-eligible Data Items*
>>> Given that READ_PLUS no longer has any DDP-eligible data items, the
>>> situation described in the fifth bullet can no longer arise. I suggest
>>> deleting the bullet.
>>> The penultimate paragraph can be read as applying to some situations
>>> in which it shouldn't and where the extra chunks would very naturally
>>> be ignored. For example, if you had one write chunk together with a READ
>>> operation which failed, the server would have more chunks (i.e. one)
>>> than the number it is prepared to process (i.e. zero). Suggest, as a
>>> possible replacement:
>>> 
>>>   Normally, when an NFS version 4 client sends an RPC Call with a
>>>   Write list that contains multiple chunks, each such chunk, when matched
>>>   with a DDP-eligible data item in the response, directs the
>>>   placement of the data item as specified by
>>>   [I.D.-nfsv4-rfc5666bis]. When there are DDP-eligible data items
>>>   matched to write chunks that an NFS version 4 server is not
>>>   prepared to process, the server MUST reject the RPC by responding
>>>   with an RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>> 
>>> With regard to the last paragraph, I am curious that this paragraph,
>>> unlike the previous one, allows GARBAGE_ARGS. Is this so because that
>>> would be allowed if the chunks in question had offsets other than
>>> those that correspond to DDP-eligible data items? If so, please
>>> consider the following possible replacement.
>>> 
>>>   Normally, when an NFS version 4 client sends an RPC Call with a
>>>   Read list that contains multiple chunks, each such chunk, when properly
>>>   matched with a DDP-eligible data item in the request, directs the
>>>   fetching of the data item as specified by
>>>   [I.D.-nfsv4-rfc5666bis]. When there are DDP-eligible data items
>>>   matched to read chunks that an NFS version 4 server is not
>>>   prepared to process, the server MUST reject the RPC by responding
>>>   with an RDMA_ERROR message with the rdma_err value set to ERR_CHUNK.
>>> 
>>> *5.6. Session-Related Considerations*
>>> In the third sentence of the second paragraph, suggest replacing "no
>>> different" by "not different".
>>> In the last sentence of the last paragraph, suggest replacing "is not"
>>> by "were not"
>>> 
>>> 
>>> _______________________________________________
>>> nfsv4 mailing list
>>> nfsv4@ietf.org
>>> https://www.ietf.org/mailman/listinfo/nfsv4
>>> 
>>> --
>>> Chuck Lever
>>> 
>> 
>> --
>> Chuck Lever
>> 
> 

--
Chuck Lever