Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-09

Chuck Lever <chuck.lever@oracle.com> Wed, 03 May 2017 18:08 UTC

> On May 3, 2017, at 12:34 PM, David Noveck <davenoveck@gmail.com> wrote:
> 
> > Note that NFS4ERR_RESOURCE is deprecated in NFSv4.1.
> 
> Actually, it is not mentioned at all in RFC5661.  Not sure why that happened.
> 
> There is a define for it in RFC5662 but the comment says it is not a valid error
> in v4.1.
> 
> The table of valid errors for operations does not include it so it is, in some sense,
> not valid to return it for v4.1+.  
> 
> > Will ponder this.
> 
> My suggestion is that we might as well just drop this from the ULB unless someone sees
> a strong need for this.  This error is only valid in v4.0 and not in v3 or v4.1+.

Dropping mention of this specific error seems reasonable.


> On Wed, May 3, 2017 at 11:23 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> > On May 3, 2017, at 2:00 AM, Tom Talpey <ttalpey@microsoft.com> wrote:
>> >
>> >> -----Original Message-----
>> >> From: nfsv4 [mailto:nfsv4-bounces@ietf.org] On Behalf Of Chuck Lever
>> >> Sent: Tuesday, May 2, 2017 6:48 PM
>> >> To: NFSv4 <nfsv4@ietf.org>
>> >> Subject: Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-09
>> >>
>> >>
>> >>> On May 1, 2017, at 5:02 PM, karen deitke <karen.deitke@oracle.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On 4/28/2017 11:32 AM, Chuck Lever wrote:
>> >>>>> On Apr 27, 2017, at 10:24 AM, David Noveck <davenoveck@gmail.com>
>> >> wrote:
>> >>>>>
>> >>>>>> Correct. But since the protocol creates the problem, the protocol
>> >>>>>> definition needs to say something about dealing with it.
>> >>>>> The protocol did not create this problem.
>> >>>> I respectfully disagree.
>> >>>>
>> >>>> The transport protocol design allows arbitrarily complex Read and
>> >>>> Write lists. This is a common practice in protocol design to permit
>> >>>> wide latitude for innovation by implementers.
>> >>>>
>> >>>> NFSv4 also allows arbitrarily complex COMPOUNDs for similar reasons,
>> >>>> but a) clients do not today make use of this, and b) newer minor
>> >>>> versions recognize that implementation limits have to be
>> >>>> communicated.
>> >>>>
>> >>>> The protocol problem we are trying to address is that there is no
>> >>>> mechanism (either via a specified limit or via a run-time
>> >>>> negotiation) that allows implementations to choose limits that are
>> >>>> less than infinity while allowing acceptable interoperability.
>> >>>>
>> >>>>
>> >>>>> Up until about a month ago,
>> >>>>> the protocol allowed multiple chunks and we believed that the
>> >>>>> implementations did as well.
>> >>>> It's true that the transport protocol hasn't changed, but I think up
>> >>>> until a month ago, we simply didn't realize that permissive chunk
>> >>>> list complexity limits were an interoperability issue.
>> >>>>
>> >>>> The Linux implementations take a lot of short cuts because they are
>> >>>> really advanced prototypes, not fully mature implementations of the
>> >>>> transport. One of the short cuts has been that they implement just
>> >>>> the minimum number of chunk combinations needed for most NFS
>> >>>> operations. They do not implement all possible chunk combinations,
>> >>>> nor do they support arbitrarily long chunk lists, because they never
>> >>>> had to.
>> >>>>
>> >>>> I'm aware that other implementers have taken a philosophically
>> >>>> similar approach to shorten the time it takes to get a working client
>> >>>> and server, and I'm sure that will be the case for implementations in
>> >>>> the future. As protocol designers I don't think we should ignore this
>> >>>> kind of expediency.
>> >>>>
>> >>>> In practice, for now chunk list complexity limits really aren't a
>> >>>> problem at all, because of a) above, except for the desire to have
>> >>>> servers accept a Read list that contains a Position Zero Read chunk
>> >>>> and a normal Read chunk at once.
>> >>>>
>> >>>> So this is a situation that is hazardous, but might not be
>> >>>> encountered in practice for years. I don't want to spill a lot more
>> >>>> electrons or brain cells on fixing something that has minimal
>> >>>> consequences for the set of implementations and ULPs we have today.
>> >>> Agreed.  The Solaris server can currently handle an offset 0 read chunk, and
>> >>> another read chunk.  What happens if we receive more than that?  It's
>> >>> uncertain.  Our client doesn't implement this, nor have we seen it from other
>> >>> clients.  That is, it's never been tested. The same is true for more than one write chunk.
>> >>>
>> >>>>
>> >>>>
>> >>>>> Then we found out that a lot of implementations have these restrictions
>> >>>>> and we are trying to deal with that situation.  The protocol has
>> >>>>> stayed the same but we now know that some implementations have these
>> >>>>> restrictions.
>> >>>>>
>> >>>>> Even though the protocol did not create this situation, this
>> >>>>> document is the only opportunity we have to tell clients how to deal with
>> >> these restrictions.
>> >>>> Or we could give implementers a base set of chunk list capabilities
>> >>>> that must be observed. For all versions of NFS on RPC-over-RDMA
>> >>>> Version One, make the limit one normal Read chunk plus one PZRC (both
>> >>>> with multiple Read segments), and one Write chunk (with multiple
>> >>>> segments). Replace discussion of handling
>> >>>> NFSv4 COMPOUNDs with more than one DDP-eligible element with a few
>> >>>> rules that determine which single operation in a multiple READ or
>> >>>> WRITE COMPOUND gets to use DDP.
>> >>>>
>> >>>> Real support for multiple chunks with NFSv4 COMPOUNDs will have to
>> >>>> wait until another version of RPC-over-RDMA is available.
>> >>>>
>> >>>> Not sure what to do about segment count limits.
>> >>> Agreed.  Currently the Solaris server does not have a limit on the segment
>> >>> count, usually only what will fit in the recv buffer that holds the RDMA header
>> >>> itself.
>> >>
>> >> Here's an expansion of S5.4.2 to include most of Tom's proposed text and a
>> >> description of current implementation behavior. Please don't hesitate to argue
>> >> in favor of anything I left out, or anything I should remove.
>> >>
>> >
>> > Thanks Chuck, great improvement. Some comments on the update:
>> >
>> >> 5.4.2.  Complexity Considerations
>> >>
>> >>   The RPC-over-RDMA Version One protocol does not place any limit on
>> >>   the number of chunks or segments that may appear in the Read or Write
>> >>   lists.  However, for various reasons NFS version 4 server
>> >>   implementations often have practical limits on the number of chunks
>> >>   or segments they are prepared to process in one message.
>> >>
>> >>   These implementation limits are especially important when Kerberos
>> >>   integrity or privacy is in use [RFC7861].  GSS services increase the
>> >>   size of credential material in RPC headers, forcing more frequent use
>> >
>> > GSS payloads don't always force this, I'd suggest "potentially requiring" or similar
>> > softened text.
>> >
>> >>   of Position-Zero Read chunks and Reply chunks.  This can increase the
>> >>   complexity of chunk lists independent of the NFS version 4 COMPOUND
>> >>   being conveyed.
>> >>
>> >>   To avoid encountering server chunk list complexity limits, NFS
>> >>   version 4 clients SHOULD restrict their RPC-over-RDMA Version One
>> >>   messages to simple combinations of chunks:
>> >
>> > Again, "restrict" may be too strong. If the client is certain that a certain chunk list
>> > is acceptable, it's perfectly OK for it to send it. Suggest the following rules being
>> > cast as "safe" or at least "following the Internet Principles" as "conservative in
>> > what you send".
>> 
>> Agree, I'll make adjustments.
>> 
>> 
>> >>   o  The Read list contains no more than one Position-Zero Read chunk
>> >>      and one Read chunk with a non-zero Position.
>> >>
>> >>   o  The Write list contains no more than one chunk.
>> >>
>> >>   o  The inline threshold restricts the number of segments that may
>> >>      appear in either list.
>> >>
>> >>   NFS version 4 clients wishing to send more complex chunk lists can
>> >>   provide configuration interfaces to bound the complexity of NFS
>> >>   version 4 COMPOUNDs, limit the number of elements in scatter-gather
>> >>   operations, and avoid other sources of RPC-over-RDMA chunk overruns
>> >>   at the peer.
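A rough illustration of the three rules above, written as a check a client might run before sending a message. This is only a sketch: the ReadChunk and WriteChunk types, the is_conservative function, and the max_segments parameter are hypothetical names. In particular, max_segments merely stands in for whatever per-list segment bound a given inline threshold implies; the draft text gives no formula for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadChunk:
    position: int        # XDR stream offset; 0 marks a Position-Zero Read chunk
    segment_count: int   # number of Read segments sharing this Position


@dataclass
class WriteChunk:
    segment_count: int   # number of segments in this Write chunk


def is_conservative(read_list: List[ReadChunk],
                    write_list: List[WriteChunk],
                    max_segments: int) -> bool:
    """Return True if both lists stay within the simple combinations.

    max_segments is an assumed per-list bound standing in for the limit
    that the inline threshold places on segment counts.
    """
    # Rule 1: at most one Position-Zero Read chunk and one normal Read chunk.
    pzrc_count = sum(1 for c in read_list if c.position == 0)
    normal_count = sum(1 for c in read_list if c.position != 0)
    if pzrc_count > 1 or normal_count > 1:
        return False
    # Rule 2: at most one chunk in the Write list.
    if len(write_list) > 1:
        return False
    # Rule 3: the segment count of each list stays under the assumed bound.
    read_segments = sum(c.segment_count for c in read_list)
    write_segments = sum(c.segment_count for c in write_list)
    return read_segments <= max_segments and write_segments <= max_segments
```

A client that already knows its server accepts more complex lists could skip such a check; the rules describe what is safe to send, not everything that is permitted.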
>> >
>> > Good.
>> >
>> >>   An NFS Version 4 server has some flexibility in how it indicates that
>> >>   an RPC-over-RDMA Version One message constructed by an NFS Version 4
>> >>   client is valid but cannot be processed.  Examples include:
>> >>
>> >>   o  A problem is detected at the transport layer (i.e., during
>> >>      transport header processing).  The server returns an RDMA_ERROR
>> >>      message with the err field set to ERR_CHUNK.
>> >>
>> >>   o  A problem is detected during XDR decoding of the request (e.g.,
>> >>      during re-assembly of the RPC Call message by the RPC layer).  The
>> >>      server returns an RPC reply with its "reply_stat" field set to
>> >>      MSG_ACCEPTED and its "accept_stat" field set to GARBAGE_ARGS.
>> >>
>> >>   o  A problem is detected in the Upper Layer (i.e., by the NFS version
>> >>      4 implementation).  The server sends an NFS reply with a status of
>> >
>> > The two previous bullets used "returns" while this uses "sends". Intentional?
>> 
>> Was trying to avoid repetitive wording. They can all use "sends".
>> 
>> 
>> >>      NFS4ERR_RESOURCE.
>> >>
>> >>   After receiving one of these errors, an NFS version 4 client SHOULD
>> >>   NOT retransmit the failing request, as the result would be the same
>> >>   error.  It SHOULD immediately terminate the RPC transaction
>> >>   associated with the XID in the reply.
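To make the three cases concrete, here is a sketch of how a client might work out which layer rejected a request. The numeric values are the ones in the published specifications (RDMA_ERROR = 4 and ERR_CHUNK = 2 in RFC 5666, MSG_ACCEPTED = 0 and GARBAGE_ARGS = 4 in RFC 5531, NFS4ERR_RESOURCE = 10018 in RFC 7530). The Reply record, its field names, and the rejecting_layer function are hypothetical and not taken from any implementation.

```python
from dataclasses import dataclass
from typing import Optional

RDMA_MSG = 0              # rdma_proc values (RFC 5666)
RDMA_ERROR = 4
ERR_CHUNK = 2             # rpc_rdma_errcode (RFC 5666)
MSG_ACCEPTED = 0          # reply_stat (RFC 5531)
SUCCESS = 0               # accept_stat (RFC 5531)
GARBAGE_ARGS = 4
NFS4ERR_RESOURCE = 10018  # NFSv4.0 status (RFC 7530)


@dataclass
class Reply:
    """Hypothetical, already-decoded view of one server response."""
    xid: int
    rdma_proc: int
    rdma_err: Optional[int] = None
    reply_stat: Optional[int] = None
    accept_stat: Optional[int] = None
    nfs_status: Optional[int] = None


def rejecting_layer(reply: Reply) -> Optional[str]:
    """Name the layer that refused the request, or None if no case applies."""
    # Transport layer: RDMA_ERROR message carrying ERR_CHUNK.
    if reply.rdma_proc == RDMA_ERROR:
        return "transport" if reply.rdma_err == ERR_CHUNK else None
    # RPC layer: accepted reply whose accept_stat is GARBAGE_ARGS.
    if reply.reply_stat == MSG_ACCEPTED and reply.accept_stat == GARBAGE_ARGS:
        return "rpc"
    # Upper layer: successful RPC carrying an NFS status of NFS4ERR_RESOURCE.
    if (reply.reply_stat == MSG_ACCEPTED and reply.accept_stat == SUCCESS
            and reply.nfs_status == NFS4ERR_RESOURCE):
        return "upper-layer"
    return None
```

In each case the response carries an XID, so the client can match it to the outstanding request and terminate that transaction instead of retransmitting it.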
>> >
>> > Now this is interesting. ERR_CHUNK and GARBAGE_ARGS are clear, but I
>> > thought NFS4ERR_RESOURCE is generally retryable. So, how does the client
>> > determine this error is actually from the RDMA encoding, and not retry?
>> 
>> RFC7530 says:
>> 
>> 13.1.3.4.  NFS4ERR_RESOURCE (Error Code 10018)
>> 
>>    For the processing of the COMPOUND procedure, the server may exhaust
>>    available resources and cannot continue processing operations within
>>    the COMPOUND procedure.  This error will be returned from the server
>>    in those instances of resource exhaustion related to the processing
>>    of the COMPOUND procedure.
>> 
>> 
>> And
>> 
>> 
>> 15.2.5.  IMPLEMENTATION
>> 
>>    Since an error of any type may occur after only a portion of the
>>    operations have been evaluated, the client must be prepared to
>>    recover from any failure.  If the source of an NFS4ERR_RESOURCE error
>>    was a complex or lengthy set of operations, it is likely that if the
>>    number of operations were reduced the server would be able to
>>    evaluate them successfully.  Therefore, the client is responsible for
>>    dealing with this type of complexity in recovery.
>> 
>> 
>> The specification is not clear whether a COMPOUND that receives an
>> NFS4ERR_RESOURCE status may be retried or whether the required
>> client recovery is to break up the COMPOUND and try it again --
>> with a fresh XID, of course -- which was my intended meaning
>> here (don't send exactly the same COMPOUND again).
>> 
>> However, I agree that NFS4ERR_RESOURCE is in a layer above the
>> layer that would deal with terminating the RPC transaction, so
>> those requirements are awkward when applied to RESOURCE.
>> 
>> Note that NFS4ERR_RESOURCE is deprecated in NFSv4.1.
>> 
>> Will ponder this.
>> 
>> 
>> > This protocol should not require another protocol to change existing behavior.
>> 
>> > Tom.
>> >
>> >
>> >>
>> >>
>> >>
>> >>>>> On Thu, Apr 27, 2017 at 12:27 PM, Chuck Lever <chuck.lever@oracle.com>
>> >> wrote:
>> >>>>>
>> >>>>>> On Apr 27, 2017, at 8:54 AM, Tom Talpey <tom@talpey.com> wrote:
>> >>>>>>
>> >>>>>> On 4/27/2017 11:44 AM, Chuck Lever wrote:
>> >>>>>>>> On Apr 27, 2017, at 7:20 AM, Tom Talpey <tom@talpey.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> snip:
>> >>>>>>>>>  Such implementation limits can constrain the complexity of NFS
>> >>>>>>>>>  version 4 COMPOUNDs, limit the number of elements in scatter-
>> >> gather
>> >>>>>>>>>  operations, or prevent the use of Kerberos integrity or privacy
>> >>>>>>>>>  services.
>> >>>>>>>> I like the approach, and the lead-in language looks good. The
>> >>>>>>>> text quoted above is just a little bit dark, especially that bit
>> >>>>>>>> about preventing krb5i/krb5p. I'd suggest a more active statement
>> >>>>>>>> to replace the above, including the more prescriptive SHOULD rather
>> >> than "can".
>> >>>>>>>> How about:
>> >>>>>>>>
>> >>>>>>>> "Client implementations SHOULD be prepared to provide mechanisms
>> >>>>>>>> for reporting the above errors, and optionally provide
>> >>>>>>>> configuration to limit the complexity of NFS version 4 COMPOUNDs,
>> >>>>>>>> limit the number of elements in scatter-gather operations, and to
>> >>>>>>>> avoid other possible sources of RPC-over-RDMA chunk overruns at the
>> >> peer.
>> >>>>>>>>
>> >>>>>>>> These become especially important when Kerberos integrity or
>> >>>>>>>> privacy is in place for the RPC connection. These facilities add
>> >>>>>>>> payload to the RPC headers, potentially increasing the complexity
>> >>>>>>>> of the chunk manipulation, independent of the upper layer NFS
>> >>>>>>>> operation. The implementation SHOULD consider such RPC payload
>> >>>>>>>> requirements in addition to the NFS considerations."
>> >>>>>>> Sure, I can work this in.
>> >>>>>>>
>> >>>>>>> When you say "Client implementations SHOULD ... [report] the above
>> >>>>>>> errors" you are talking about reporting them to administrators
>> >>>>>>> and/or RPC consumers? I don't think we can use SHOULD in that case.
>> >>>>>> I am agnostic about who to inform. The important thing is that some
>> >>>>>> visibility of the error be surfaced. I absolutely don't think an
>> >>>>>> arbitrary GARBAGE_ARGS returned to an application that may simply
>> >>>>>> choke on it, qualifies.
>> >>>>> Right, no client surfaces a protocol element like GARBAGE_ARGS as it
>> >>>>> is. The failure of the RPC on the server is reported to consumers in
>> >>>>> a way that conforms with the API they use to drive the requests. So
>> >>>>> an application running on a POSIX client might use read(2) or
>> >>>>> write(2) and get back EIO in this case, because POSIX allows only a
>> >>>>> narrow range of error status codes for these APIs.
>> >>>>>
>> >>>>> That translation is a normal part of RPC client implementations, so
>> >>>>> I didn't think I needed to state that explicitly, and thus was
>> >>>>> confused about why "Client implementations" was mentioned above.
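As an illustration of that translation, here is a sketch of how a POSIX client might collapse these rejections into an errno value for read(2) or write(2). The function name, the layer labels, and the choice to map both cases to EIO are assumptions made for illustration; a real client may distinguish more cases.

```python
import errno

GARBAGE_ARGS = 4   # accept_stat value from RFC 5531
ERR_CHUNK = 2      # rpc_rdma_errcode value from RFC 5666


def errno_for_rejected_call(layer: str, code: int) -> int:
    """Map a server rejection to the errno reported to the application.

    layer is "rpc" or "transport" (hypothetical labels). The application
    never sees the protocol-level code itself, only the errno.
    """
    if layer == "rpc" and code == GARBAGE_ARGS:
        return errno.EIO
    if layer == "transport" and code == ERR_CHUNK:
        return errno.EIO
    raise ValueError(f"not a rejection this sketch covers: {layer}/{code}")
```

The point of the sketch is that the consumer gets the same generic I/O error either way, which is why the distinction between the two protocol errors is invisible above the RPC client.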
>> >>>>>
>> >>>>>
>> >>>>>>> This feels like implementation advice, not protocol.
>> >>>>>> Correct. But since the protocol creates the problem, the protocol
>> >>>>>> definition needs to say something about dealing with it.
>> >>>>> Philosophical agreement with that.
>> >>>>>
>> >>>>>> So I believe SHOULD is best.
>> >>>>>>> Would "recommend" be enough for this section?
>> >>>>>> The RFC2119 term RECOMMENDED is basically a synonym for SHOULD.
>> >>>>>> It's perfectly permissible.
>> >>>>> To be clear, I was suggesting lower-case "recommended". IMO RFC 2119
>> >>>>> terminology isn't appropriate for describing internal APIs or
>> >>>>> mechanisms that do not appear on the wire.
>> >>>>>
>> >>>>> The typical approach has been to say "The RPC client then terminates
>> >>>>> the RPC with an appropriate error status." Which crisply yet
>> >>>>> generically describes desired behavior while avoiding the use of
>> >>>>> MUST or SHOULD.
>> >>>>>
>> >>>>>
>> >>>>>> Tom.
>> >>>>>>
>> >>>>>>>
>> >>>>>>>> Feel free to wordsmith further.
>> >>>>>>>>
>> >>>>>>>> Tom.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On 4/26/2017 12:18 PM, Chuck Lever wrote:
>> >>>>>>>>>> On Apr 24, 2017, at 7:30 AM, Chuck Lever <chuck.lever@oracle.com>
>> >> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> On Apr 24, 2017, at 7:56 AM, Tom Talpey <tom@talpey.com>
>> >> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> On 4/21/2017 10:43 AM, Chuck Lever wrote:
>> >>>>>>>>>>>> I agree that SHOULD/MAY makes things cloudier, and does not
>> >>>>>>>>>>>> seem to align with well-defined RFC2119 usage.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Another way we've dealt with similar disagreements between
>> >>>>>>>>>>>> specification and implementation is to decide that one of the
>> >>>>>>>>>>>> implementations is incorrect.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Can we agree that:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> - GARBAGE_ARGS is a bit of a layering violation, though it's
>> >>>>>>>>>>>> understandable why it might be returned
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> - RPC clients are already prepared for GARBAGE_ARGS
>> >>>>>>>>>>> Are you certain of this?
>> >>>>>>>>>> GARBAGE_ARGS has been part of the RPC protocol for decades.
>> >>>>>>>>>> The two Unix-flavored clients that have NFS/RDMA support can
>> >>>>>>>>>> both handle this error.
>> >>>>>>>>> I've confirmed that the only other known NFS/RDMA client (Oracle
>> >>>>>>>>> dNFS) properly recognizes GARBAGE_ARGS.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>>> And out of curiosity, what is returned to the consumer for
>> >>>>>>>>>>> GARBAGE_ARGS versus ERR_CHUNK?
>> >>>>>>>>>> RFC 5531:
>> >>>>>>>>>>> GARBAGE_ARGS  = 4, /* procedure can’t decode params   */
>> >>>>>>>>>>
>> >>>>>>>>>> GARBAGE_ARGS is an RPC-level error. The reply is "accepted"
>> >>>>>>>>>> with accept_stat GARBAGE_ARGS. An XID is available in the
>> >>>>>>>>>> header.
>> >>>>>>>>>>
>> >>>>>>>>>> rfc5666bis:
>> >>>>>>>>>>> If the rdma_vers field contains a recognized value, but an XDR
>> >>>>>>>>>>> parsing error occurs, the responder MUST reply with an
>> >>>>>>>>>>> RDMA_ERROR procedure and set the rdma_err value to
>> >> ERR_CHUNK.
>> >>>>>>>>>>
>> >>>>>>>>>> ERR_CHUNK is a transport level error. An XID is available in
>> >>>>>>>>>> the header.
>> >>>>>>>>>>
>> >>>>>>>>>> The difference is that the RPC layer v. the transport layer are
>> >>>>>>>>>> reporting they don't understand the contents of the message
>> >>>>>>>>>> (Call). There is nothing more in either type of message.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>>> - In RPC-over-RDMA Version One, we are not trying to recover
>> >>>>>>>>>>>> (in the sense of resending a simpler COMPOUND) but are rather
>> >>>>>>>>>>>> trying to ensure the offending RPC is properly terminated on
>> >>>>>>>>>>>> the client, and does not further block other RPCs or deadlock
>> >>>>>>>>>>>> the transport
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Thus I claim it is harmless if a server returns GARBAGE_ARGS
>> >>>>>>>>>>>> instead of ERR_CHUNK.
>> >>>>>>>>>>> "Harmless" is a bit relative. The operation fails, through no
>> >>>>>>>>>>> fault of the consumer. And, frankly, in a very mysterious way.
>> >>>>>>>>>> We have no richer way of communicating failure in RPC-over-RDMA
>> >>>>>>>>>> Version One. We are not looking for recovery here, so I don't
>> >>>>>>>>>> believe any more information would be useful. If the server
>> >>>>>>>>>> wishes, it can log the failure with a message explaining what
>> >>>>>>>>>> went wrong.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> Again, I think there is more to say here. It's a limitation of
>> >>>>>>>>>>> the protocol whose implications should be made clear
>> >>>>>>>>>>> (constraining the complexity of COMPOUNDs, limiting scatter/gather
>> >>>>>>>>>>> lengths, etc).
>> >>>>>>>>>> I'd welcome any suggested text.
>> >>>>>>>>>>
>> >>>>>>>>>> Honestly, I'm not sure what can be said. Neither NFSv4.0 nor
>> >>>>>>>>>> RPC-over-RDMA have a sophisticated mechanism to communicate
>> >>>>>>>>>> this kind of limitation. The best an NFSv4 server can do is
>> >>>>>>>>>> return NFS4ERR_RESOURCE, which also carries little extra
>> >>>>>>>>>> information about what a client should do to recover.
>> >>>>>>>>>>
>> >>>>>>>>>> So are you comfortable with eliminating GARBAGE_ARGS if we can
>> >>>>>>>>>> come up with more detail about the impact of not knowing how
>> >>>>>>>>>> complex a COMPOUND can be?
>> >>>>>>>>> I've come up with some possible replacement text for the final
>> >>>>>>>>> two paragraphs of S5.4.1 in an attempt to address comments from
>> >>>>>>>>> Tom, David, and Karen. The normative requirements have been
>> >>>>>>>>> removed, and a (brief) discussion of the consequences of not
>> >>>>>>>>> handling complex COMPOUNDs was introduced.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> 5.4.2.  Complexity Considerations
>> >>>>>>>>>
>> >>>>>>>>> As mentioned above, an NFS version 4 COMPOUND procedure can
>> >>>>>>>>> contain  more than one operation that carries a DDP-eligible
>> >>>>>>>>> data item.  The  RPC-over-RDMA Version One protocol does not
>> >>>>>>>>> place any limit on the  number of chunks that may appear in the Read
>> >> or Write lists.
>> >>>>>>>>> Therefore an NFS version 4 client MAY construct an
>> >>>>>>>>> RPC-over-RDMA  Version One message containing more than one
>> >> Read
>> >>>>>>>>> chunk or Write  chunk.
>> >>>>>>>>>
>> >>>>>>>>> However, implementations have practical limits on the number of
>> >>>>>>>>> chunks or segments they are prepared to process in one of these
>> >>>>>>>>> lists.  There are several ways an NFS Version 4 server might
>> >>>>>>>>> indicate  that an RPC Call message constructed by a client is
>> >>>>>>>>> valid but cannot  be processed because of implementation limitations:
>> >>>>>>>>>
>> >>>>>>>>> o  If the problem is detected in the upper layer (i.e., by the NFS
>> >>>>>>>>>    version 4 implementation), the server returns an NFS status of
>> >>>>>>>>>    NFS4ERR_RESOURCE.
>> >>>>>>>>>
>> >>>>>>>>> o  If the problem is detected during XDR decoding of the request
>> >>>>>>>>>    (e.g., during re-assembly of the Call message by the RPC layer),
>> >>>>>>>>>    the server returns an RPC accept_stat of GARBAGE_ARGS.
>> >>>>>>>>>
>> >>>>>>>>> o  If the problem is detected at the transport layer (i.e., during
>> >>>>>>>>>    transport header processing), the server returns an RDMA_ERROR
>> >>>>>>>>>    message with the err_code field set to ERR_CHUNK.
>> >>>>>>>>>
>> >>>>>>>>> Such implementation limits can constrain the complexity of NFS
>> >>>>>>>>> version 4 COMPOUNDs, limit the number of elements in
>> >>>>>>>>> scatter-gather  operations, or prevent the use of Kerberos
>> >>>>>>>>> integrity or privacy  services.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Comments, opinions on this approach?
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>>> Tom.
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>> As a result, I can change the Read list text in S5.4.1 to be
>> >>>>>>>>>>>> the same as the Write list text, removing the mention of
>> >>>>>>>>>>>> GARBAGE_ARGS.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Would that sit comfortably with everyone?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> On Apr 20, 2017, at 7:21 PM, David Noveck
>> >> <davenoveck@gmail.com> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> The "or" is a similar situation, it prescribes a choice,
>> >>>>>>>>>>>>>> which does not define a protocol.
>> >>>>>>>>>>>>> Fair enough, but the point that needs to be made is that,
>> >>>>>>>>>>>>> with regard to Version One, Chuck and the working group are
>> >>>>>>>>>>>>> not free to define a protocol.  As a result we have the kind
>> >>>>>>>>>>>>> of ugliness you object to, but it is inherent in the choice
>> >>>>>>>>>>>>> to try to revive Version One as-is.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read
>> >>>>>>>>>>>>>> list that contains more chunks than an NFS version 4 server
>> >>>>>>>>>>>>>> is prepared to process, the server SHOULD reject the
>> >>>>>>>>>>>>>> request by responding with an RDMA_ERROR message with the
>> >>>>>>>>>>>>>> rdma_err value set to ERR_CHUNK. The server MAY reject the
>> >>>>>>>>>>>>>> RPC with an RDMA_MSG message containing an RPC Reply with
>> >> an accept status of GARBAGE_ARGS.
>> >>>>>>>>>>>>> I think I know what you intend here and I've seen stuff like
>> >>>>>>>>>>>>> this in RFCs but I don't think we can do this because this is not in
>> >>>>>>>>>>>>> line with the definitions of "SHOULD"
>> >>>>>>>>>>>>> and "MAY" that appear in RFC2119.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> "SHOULD" means that you are supposed to do something but can
>> >>>>>>>>>>>>> avoid it if you have a good reason and are aware of the
>> >> consequences of not doing it.
>> >>>>>>>>>>>>> In this case the "good" reason is that someone coded the
>> >>>>>>>>>>>>> implementation to do something else, which is not all that
>> >>>>>>>>>>>>> good a reason.  The consequences of returning the
>> >>>>>>>>>>>>> GARBAGE_ARGS are exactly zero, since the client has to be
>> >>>>>>>>>>>>> prepared for either it or ERR_CHUNK.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> "MAY" means the implementation can choose to do the action
>> >>>>>>>>>>>>> or not, which is in line with the reality here but essentially
>> >>>>>>>>>>>>> contradicts the SHOULD.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> This at least makes it clear which response is "preferred".
>> >>>>>>>>>>>>> But it isn't really the job of the RFC2119 terms to say
>> >>>>>>>>>>>>> which is "preferred" or "'preferred'".  These terms are
>> >>>>>>>>>>>>> supposed to describe interoperability and the
>> >>>>>>>>>>>>> interoperability situation is that the server MUST return
>> >>>>>>>>>>>>> ERR_CHUNK or GARBAGE_ARGS and the client needs to be prepared
>> >>>>>>>>>>>>> for either.  That is the unpleasant reality.  If you want to
>> >>>>>>>>>>>>> indicate a preference, you can say something
>> >>>>>>>>>>>>> like:
>> >>>>>>>>>>>>>       • Returning ERR_CHUNK is preferable.
>> >>>>>>>>>>>>>       • Returning ERR_CHUNK is more in line with the appropriate
>> >>>>>>>>>>>>>         protocol layering since this issue relates to a limitation of the
>> >>>>>>>>>>>>>         transport implementation.
>> >>>>>>>>>>>>>       • Use of GARBAGE_ARGS is an unfortunate artifact of
>> >>>>>>>>>>>>>         inappropriately layered implementations and is only allowed for
>> >>>>>>>>>>>>>         reasons of compatibility with existing implementations.  It is
>> >>>>>>>>>>>>>         desirable to avoid it.
>> >>>>>>>>>>>>>> And one would hope a future draft would decide.
>> >>>>>>>>>>>>> Not sure what draft you are thinking of.  I don't see us doing an
>> >> rfc5667bisbis (rfc5667tris).
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> By the time we did that, the implementations with these
>> >> restrictions will probably be gone.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I have a second question though. How does the client
>> >>>>>>>>>>>>>> determine what is the actual error? As in, how many chunks
>> >> were allowed?
>> >>>>>>>>>>>>> This is not fixable in Version One.  It would be in Version
>> >>>>>>>>>>>>> Two, but by then the need will probably be gone.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Does the upper
>> >>>>>>>>>>>>>> layer have to recover, and if so what are the implications?
>> >>>>>>>>>>>>> I think something could be put in to indicate that clients
>> >>>>>>>>>>>>> should break up COMPOUNDs so they only have a single chunk
>> >>>>>>>>>>>>> each.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Yes, I know 5667 did not explore this very well.
>> >>>>>>>>>>>>> It didn't explore it at all.  And 5666's error reporting facilities
>> >> were extremely limited.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Mea culpa.
>> >>>>>>>>>>>>> I don't think you have anything to apologize for.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Thu, Apr 20, 2017 at 5:28 PM, Tom Talpey <tom@talpey.com>
>> >> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 4/19/2017 11:14 AM, Chuck Lever wrote:
>> >>>>>>>>>>>>> Hi Tom-
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Apr 18, 2017, at 11:08 PM, Tom Talpey <tom@talpey.com>
>> >> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I noticed the same thing, and I'll add that the MUST reject
>> >>>>>>>>>>>>> condition is very confusing because it allows an "or". In my
>> >>>>>>>>>>>>> opinion a MUST is always a single requirement, never ambiguous.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I agree this kind of thing is tricky. I wrote it as "the
>> >>>>>>>>>>>>> server MUST reject the RPC". That's the single requirement.
>> >>>>>>>>>>>>> The choice is how the rejection is conveyed to the client.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The statement "MUST reject" is not testable. So, while it
>> >>>>>>>>>>>>> may be understood what is intended, there is nothing
>> >>>>>>>>>>>>> implementable in the MUST. The "or" is a similar situation,
>> >>>>>>>>>>>>> it prescribes a choice, which does not define a protocol.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Is there some reason you want to allow such a choice? I
>> >>>>>>>>>>>>> think you'll find that, worded properly, it becomes actually
>> >>>>>>>>>>>>> much less implementable and interoperable than you may think.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The Solaris server can return an RPC-level error in cases like this.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Well, this is happening because the Solaris server is
>> >>>>>>>>>>>>> (probably) just handing the chunk list up to the RPC layer,
>> >>>>>>>>>>>>> and it's the RPC (XDR) processing that detects any problem.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On the other hand, an implementation could do the opposite,
>> >>>>>>>>>>>>> it could process the chunks at the lower layer, before ever
>> >>>>>>>>>>>>> invoking RPC processing. This would naturally lead to a non-RPC
>> >> error.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The challenge in defining the protocol is to hide these
>> >> possibilities.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I think there are similar choices allowed in rfc5666bis.
>> >>>>>>>>>>>>> Let's say that in a perfect world, I would go with only
>> >>>>>>>>>>>>> ERR_CHUNK, but I'm documenting existing implementation
>> >> behavior here.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I'm not sure it matters to the client: both errors are
>> >>>>>>>>>>>>> permanent and the RPC is terminated on the client.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I'm open to alternatives.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The icky way to do this is to split into two weak requirements.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read
>> >>>>>>>>>>>>> list that contains more chunks than an NFS version 4 server
>> >>>>>>>>>>>>> is prepared to process, the server SHOULD reject the request
>> >>>>>>>>>>>>> by responding with an RDMA_ERROR message with the rdma_err
>> >>>>>>>>>>>>> value set to ERR_CHUNK. The server MAY reject the RPC with
>> >>>>>>>>>>>>> an RDMA_MSG message containing an RPC Reply with an accept
>> >> status of GARBAGE_ARGS.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> This at least makes it clear which response is "preferred".
>> >>>>>>>>>>>>> And one would hope a future draft would decide.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I have a second question though. How does the client
>> >>>>>>>>>>>>> determine what is the actual error? As in, how many chunks
>> >>>>>>>>>>>>> were allowed? Does the upper layer have to recover, and if so
>> >> what are the implications?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Yes, I know 5667 did not explore this very well. Mea culpa.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Tom.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 4/18/2017 6:32 PM, karen deitke wrote:
>> >>>>>>>>>>>>> Hi Chuck, it's unclear what you mean by "is prepared to process"
>> >> in the text below.
>> >>>>>>>>>>>>> Other than that, looks good.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Karen
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> 5.4.1
>> >>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Write
>> >>>>>>>>>>>>> list that contains more chunks than an NFS version 4 server
>> >>>>>>>>>>>>> is prepared to process, the server MUST reject the RPC by
>> >>>>>>>>>>>>> responding with an RDMA_ERROR message with the rdma_err
>> >> value set to ERR_CHUNK.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read
>> >>>>>>>>>>>>> list that contains more chunks than an NFS version 4 server
>> >>>>>>>>>>>>> is prepared to process, the server MUST reject the RPC by
>> >>>>>>>>>>>>> responding with an RDMA_MSG message containing an RPC
>> >> Reply
>> >>>>>>>>>>>>> with an accept status of GARBAGE_ARGS, or with an
>> >> RDMA_ERROR
>> >>>>>>>>>>>>> message with the rdma_err value set to ERR_CHUNK.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 4/18/2017 1:21 PM, David Noveck wrote:
>> >>>>>>>>>>>>> *Overall Evaluation*
>> >>>>>>>>>>>>> *
>> >>>>>>>>>>>>> *
>> >>>>>>>>>>>>> Major improvement over RFC5667.  Almost ready to ship.  No
>> >>>>>>>>>>>>> technical issues.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> A lot of my comments are basically editorial and are offered
>> >>>>>>>>>>>>> on a take-it-or-leave-it basis.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I think some clarification in Section 5.4.1 is needed
>> >>>>>>>>>>>>> although not necessarily in the ways suggested below.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> *Comments by Section*
>> >>>>>>>>>>>>> *5.4.1. Multiple DDP-eligible Data Items* Given that
>> >>>>>>>>>>>>> READ_PLUS no longer has any DDP-eligible data items, the
>> >>>>>>>>>>>>> situation described in the fifth bullet can no longer arise.
>> >>>>>>>>>>>>> I suggest deleting the bullet.
>> >>>>>>>>>>>>> The penultimate paragraph can be read as applying to some
>> >>>>>>>>>>>>> situations in which it shouldn't and where the extra chunks
>> >>>>>>>>>>>>> would very naturally be ignored. For example, if you had one
>> >>>>>>>>>>>>> write chunk together with a READ operation which failed, the
>> >>>>>>>>>>>>> server would have more chunks (i.e. one) than the number it
>> >>>>>>>>>>>>> is prepared to process (i.e. zero). Suggest, as a possible
>> >> replacement:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Normally, when an NFS version 4 client sends an RPC Call
>> >>>>>>>>>>>>> with a Write list that contains multiple chunks, each such,
>> >>>>>>>>>>>>> when matched with a DDP-eligible data item in the response,
>> >>>>>>>>>>>>> directs the placement of the data item as specified by
>> >>>>>>>>>>>>> [I.D.-nfsv4-rfc5666bis]. When there are DDP-eligible data
>> >>>>>>>>>>>>> items matched to write chunks that an NFS version 4 server
>> >>>>>>>>>>>>> is not prepared to process, the server MUST reject the RPC
>> >>>>>>>>>>>>> by responding with an RDMA_ERROR message with the rdma_err
>> >> value set to ERR_CHUNK.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> With regard to the last paragraph, I am curious that this
>> >>>>>>>>>>>>> paragraph, unlike the previous one, allows GARBAGE_ARGS. Is
>> >>>>>>>>>>>>> this so because that would be allowed if the chunks in
>> >>>>>>>>>>>>> question had offsets other than those that correspond to
>> >>>>>>>>>>>>> DDP-eligible data items? If so, please consider the following
>> >> possible replacement.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Normally, when an NFS version 4 client sends an RPC Call
>> >>>>>>>>>>>>> with a Read list that contains multiple chunks, each such,
>> >>>>>>>>>>>>> when properly matched with a DDP-eligible data item in the
>> >>>>>>>>>>>>> request, directs the fetching of the data item as
>> >>>>>>>>>>>>> specified by [I.D.-nfsv4-rfc5666bis]. When there are
>> >>>>>>>>>>>>> DDP-eligible data items matched to read chunks that an NFS
>> >>>>>>>>>>>>> version 4 server is not prepared to process, the server MUST
>> >>>>>>>>>>>>> reject the RPC by responding with an RDMA_ERROR message
>> >> with the rdma_err value set to ERR_CHUNK.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> *5.6. Session-Related Considerations* In the third sentence
>> >>>>>>>>>>>>> of the second paragraph, suggest replacing "no different" by
>> >>>>>>>>>>>>> "not different".
>> >>>>>>>>>>>>> In the last sentence of the last paragraph, suggest replacing "is
>> >> not"
>> >>>>>>>>>>>>> by "were not"
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> _______________________________________________
>> >>>>>>>>>>>>> nfsv4 mailing list
>> >>>>>>>>>>>>> nfsv4@ietf.org
>> >>>>>>>>>>>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%
>> >>>>>>>>>>>>>
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
>> _______________________________________________
>> nfsv4 mailing list
>> nfsv4@ietf.org
>> https://www.ietf.org/mailman/listinfo/nfsv4
>
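
To make the two rejection paths discussed in the quoted thread concrete, here
is a minimal sketch of the SHOULD/MAY split for an over-long Read list. It is
an illustration only, not text from the draft or code from any server. The
function name, the max_chunks limit, and the xdr_layer_detects flag are all
hypothetical; the flag stands in for an implementation that hands the chunk
list up to RPC/XDR processing instead of checking it in the transport. Wire
encodings are deliberately left out.

```python
from enum import Enum
from typing import Optional


class Rejection(Enum):
    """The two ways of conveying the rejection that the thread discusses."""
    # Transport-level error: the response the thread calls "preferred".
    RDMA_ERROR_ERR_CHUNK = "RDMA_ERROR message, rdma_err = ERR_CHUNK"
    # RPC-level error: tolerated for compatibility with existing servers.
    RDMA_MSG_GARBAGE_ARGS = "RDMA_MSG message, RPC Reply accept status = GARBAGE_ARGS"


def reject_for_read_list(chunk_count: int, max_chunks: int,
                         xdr_layer_detects: bool = False) -> Optional[Rejection]:
    """Return how a (hypothetical) server would reject the call, or None.

    chunk_count: number of chunks in the Read list of the incoming RPC Call.
    max_chunks: hypothetical limit the server is prepared to process.
    xdr_layer_detects: assumption modelling a server where the problem is
        only noticed during RPC/XDR processing rather than in the transport.
    """
    if chunk_count <= max_chunks:
        return None  # nothing to reject; normal processing continues
    if xdr_layer_detects:
        return Rejection.RDMA_MSG_GARBAGE_ARGS
    return Rejection.RDMA_ERROR_ERR_CHUNK


if __name__ == "__main__":
    print(reject_for_read_list(2, 1))
    print(reject_for_read_list(2, 1, xdr_layer_detects=True))
```

Either result is a permanent failure from the client's point of view, and
neither tells the client how many chunks would have been accepted.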