Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-09

Chuck Lever <chuck.lever@oracle.com> Wed, 03 May 2017 15:26 UTC

From: Chuck Lever <chuck.lever@oracle.com>
Date: Wed, 03 May 2017 11:23:20 -0400
To: NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/oNwT9VnVQwT2hfgYtFmO-US9nK4>

> On May 3, 2017, at 2:00 AM, Tom Talpey <ttalpey@microsoft.com> wrote:
> 
>> -----Original Message-----
>> From: nfsv4 [mailto:nfsv4-bounces@ietf.org] On Behalf Of Chuck Lever
>> Sent: Tuesday, May 2, 2017 6:48 PM
>> To: NFSv4 <nfsv4@ietf.org>
>> Subject: Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-09
>> 
>> 
>>> On May 1, 2017, at 5:02 PM, karen deitke <karen.deitke@oracle.com> wrote:
>>> 
>>> 
>>> 
>>> On 4/28/2017 11:32 AM, Chuck Lever wrote:
>>>>> On Apr 27, 2017, at 10:24 AM, David Noveck <davenoveck@gmail.com> wrote:
>>>>> 
>>>>>> Correct. But since the protocol creates the problem, the protocol
>>>>>> definition needs to say something about dealing with it.
>>>>> The protocol did not create this problem.
>>>> I respectfully disagree.
>>>> 
>>>> The transport protocol design allows arbitrarily complex Read and
>>>> Write lists. This is a common practice in protocol design to permit
>>>> wide latitude for innovation by implementers.
>>>> 
>>>> NFSv4 also allows arbitrarily complex COMPOUNDs for similar reasons,
>>>> but a) clients do not today make use of this, and b) newer minor
>>>> versions recognize that implementation limits have to be
>>>> communicated.
>>>> 
>>>> The protocol problem we are trying to address is that there is no
>>>> mechanism (either via a specified limit or via a run-time
>>>> negotiation) that allows implementations to choose limits that are
>>>> less than infinity while allowing acceptable interoperability.
>>>> 
>>>> 
>>>>> Up until about a month ago,
>>>>> the protocol allowed multiple chunks and we believed that the
>>>>> implementations did as well.
>>>> It's true that the transport protocol hasn't changed, but I think up
>>>> until a month ago, we simply didn't realize that permissive chunk
>>>> list complexity limits were an interoperability issue.
>>>> 
>>>> The Linux implementations take a lot of short cuts because they are
>>>> really advanced prototypes, not fully mature implementations of the
>>>> transport. One of the short cuts has been that they implement just
>>>> the minimum number of chunk combinations needed for most NFS
>>>> operations. They do not implement all possible chunk combinations,
>>>> nor do they support arbitrarily long chunk lists, because they never
>>>> had to.
>>>> 
>>>> I'm aware that other implementers have taken a philosophically
>>>> similar approach to shorten the time it takes to get a working client
>>>> and server, and I'm sure that will be the case for implementations in
>>>> the future. As protocol designers I don't think we should ignore this
>>>> kind of expediency.
>>>> 
>>>> In practice, for now chunk list complexity limits really aren't a
>>>> problem at all, because of a) above, except for the desire to have
>>>> servers accept a Read list that contains a Position Zero Read chunk
>>>> and a normal Read chunk at once.
>>>> 
>>>> So this is a situation that is hazardous, but might not be
>>>> encountered in practice for years. I don't want to spill a lot more
>>>> electrons or brain cells on fixing something that has minimal
>>>> consequences for the set of implementations and ULPs we have today.
>>> Agreed.  The Solaris server can currently handle an offset 0 read chunk and
>>> another read chunk.  What happens if we receive more than that?  It's
>>> uncertain.  Our client doesn't implement this, nor have we seen it from other
>>> clients.  That is, it's never been tested.  The same is true for more than
>>> one write chunk.
>>> 
>>>> 
>>>> 
>>>>> Then we found out that a lot of implementations have these restrictions
>>>>> and we are trying to deal with that situation.  The protocol has
>>>>> stayed the same but we now know that some implementations have these
>>>>> restrictions.
>>>>> 
>>>>> Even though the protocol did not create this situation, this
>>>>> document is the only opportunity we have to tell clients how to deal with
>>>>> these restrictions.
>>>> Or we could give implementers a base set of chunk list capabilities
>>>> that must be observed. For all versions of NFS on RPC-over-RDMA
>>>> Version One, make the limit one normal Read chunk plus one PZRC (both
>>>> with multiple Read segments), and one Write chunk (with multiple
>>>> segments). Replace discussion of handling
>>>> NFSv4 COMPOUNDs with more than one DDP-eligible element with a few
>>>> rules that determine which single operation in a multiple READ or
>>>> WRITE COMPOUND gets to use DDP.
>>>> 
>>>> Real support for multiple chunks with NFSv4 COMPOUNDs will have to
>>>> wait until another version of RPC-over-RDMA is available.
>>>> 
>>>> Not sure what to do about segment count limits.
>>> Agreed.  Currently the Solaris server does not have a limit on the segment
>>> count, usually only what will fit in the recv buffer that holds the RDMA header
>>> itself.
>> 
>> Here's an expansion of S5.4.2 to include most of Tom's proposed text and a
>> description of current implementation behavior. Please don't hesitate to argue
>> in favor of anything I left out, or anything I should remove.
>> 
> 
> Thanks Chuck, great improvement. Some comments on the update:
> 
>> 5.4.2.  Complexity Considerations
>> 
>>   The RPC-over-RDMA Version One protocol does not place any limit on
>>   the number of chunks or segments that may appear in the Read or Write
>>   lists.  However, for various reasons NFS version 4 server
>>   implementations often have practical limits on the number of chunks
>>   or segments they are prepared to process in one message.
>> 
>>   These implementation limits are especially important when Kerberos
>>   integrity or privacy is in use [RFC7861].  GSS services increase the
>>   size of credential material in RPC headers, forcing more frequent use
> 
> GSS payloads don't always force this, I'd suggest "potentially requiring" or similar
> softened text.
> 
>>   of Position-Zero Read chunks and Reply chunks.  This can increase the
>>   complexity of chunk lists independent of the NFS version 4 COMPOUND
>>   being conveyed.
>> 
>>   To avoid encountering server chunk list complexity limits, NFS
>>   version 4 clients SHOULD restrict their RPC-over-RDMA Version One
>>   messages to simple combinations of chunks:
> 
> Again, "restrict" may be too strong. If the client is certain that a certain chunk list
> is acceptable, it's perfectly OK for it to send it. Suggest the following rules being
> cast as "safe" or at least "following the Internet Principles" as "conservative in
> what you send".

Agree, I'll make adjustments.


>>   o  The Read list contains no more than one Position-Zero Read chunk
>>      and one Read chunk with a non-zero Position.
>> 
>>   o  The Write list contains no more than one chunk.
>> 
>>   o  The inline threshold restricts the number of segments that may
>>      appear in either list.
>> 
>>   NFS version 4 clients wishing to send more complex chunk lists can
>>   provide configuration interfaces to bound the complexity of NFS
>>   version 4 COMPOUNDs, limit the number of elements in scatter-gather
>>   operations, and avoid other sources of RPC-over-RDMA chunk overruns
>>   at the peer.
> 
> Good.
> 
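
To make the three rules above concrete, here is a rough Python sketch of
how a client might test a chunk list before sending it. This is only an
illustration, not text for the draft: the ReadChunk and WriteChunk types,
the is_simple_chunk_list name, and the max_segments budget are all
hypothetical, and max_segments merely stands in for whatever room the
inline threshold leaves (the draft gives no number).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadChunk:
    position: int      # 0 marks a Position-Zero Read chunk
    segments: int = 1  # Read segments that make up this chunk


@dataclass
class WriteChunk:
    segments: int = 1  # segments in this Write chunk


def is_simple_chunk_list(read_list: List[ReadChunk],
                         write_list: List[WriteChunk],
                         max_segments: int) -> bool:
    """Return True if the lists stay within the simple combinations.

    max_segments is an assumed stand-in for the segment budget implied
    by the inline threshold; it is not defined by the protocol.
    """
    pzrc_count = sum(1 for c in read_list if c.position == 0)
    normal_count = sum(1 for c in read_list if c.position != 0)
    total_segments = (sum(c.segments for c in read_list)
                      + sum(c.segments for c in write_list))
    return (pzrc_count <= 1               # at most one Position-Zero Read chunk
            and normal_count <= 1         # at most one Read chunk elsewhere
            and len(write_list) <= 1      # at most one Write chunk
            and total_segments <= max_segments)
```

A client that knows its server accepts more could skip the check; the
point is only that the "conservative in what you send" shape is cheap to
test on the sending side.
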
>>   An NFS Version 4 server has some flexibility in how it indicates that
>>   an RPC-over-RDMA Version One message constructed by an NFS Version 4
>>   client is valid but cannot be processed.  Examples include:
>> 
>>   o  A problem is detected at the transport layer (i.e., during
>>      transport header processing).  The server returns an RDMA_ERROR
>>      message with the err field set to ERR_CHUNK.
>> 
>>   o  A problem is detected during XDR decoding of the request (e.g.,
>>      during re-assembly of the RPC Call message by the RPC layer).  The
>>      server returns an RPC reply with its "reply_stat" field set to
>>      MSG_ACCEPTED and its "accept_stat" field set to GARBAGE_ARGS.
>> 
>>   o  A problem is detected in the Upper Layer (i.e., by the NFS version
>>      4 implementation).  The server sends an NFS reply with a status of
> 
> The two previous bullets used "returns" while this uses "sends". Intentional?

Was trying to avoid repetitive wording. They can all use "sends".


>>      NFS4ERR_RESOURCE.
>> 
>>   After receiving one of these errors, an NFS version 4 client SHOULD
>>   NOT retransmit the failing request, as the result would be the same
>>   error.  It SHOULD immediately terminate the RPC transaction
>>   associated with the XID in the reply.
> 
> Now this is interesting. ERR_CHUNK and GARBAGE_ARGS are clear, but I
> thought NFS4ERR_RESOURCE is generally retryable. So, how does the client
> determine this error is actually from the RDMA encoding, and not retry?

RFC7530 says:

13.1.3.4.  NFS4ERR_RESOURCE (Error Code 10018)

   For the processing of the COMPOUND procedure, the server may exhaust
   available resources and cannot continue processing operations within
   the COMPOUND procedure.  This error will be returned from the server
   in those instances of resource exhaustion related to the processing
   of the COMPOUND procedure.


And


15.2.5.  IMPLEMENTATION

   Since an error of any type may occur after only a portion of the
   operations have been evaluated, the client must be prepared to
   recover from any failure.  If the source of an NFS4ERR_RESOURCE error
   was a complex or lengthy set of operations, it is likely that if the
   number of operations were reduced the server would be able to
   evaluate them successfully.  Therefore, the client is responsible for
   dealing with this type of complexity in recovery.


The specification is not clear whether a COMPOUND that receives an
NFS4ERR_RESOURCE status may be retried or whether the required
client recovery is to break up the COMPOUND and try it again --
with a fresh XID, of course -- which was my intended meaning
here (don't send exactly the same COMPOUND again).

However, I agree that NFS4ERR_RESOURCE is in a layer above the
layer that would deal with terminating the RPC transaction, so
those requirements are awkward when applied to RESOURCE.

Note that NFS4ERR_RESOURCE is deprecated in NFSv4.1.

Will ponder this.
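
For discussion, one possible reading of the recovery behavior sketched
in Python. Everything here is an assumption made for illustration: the
Failure enum, the plan_recovery name, and the split-in-half policy are
hypothetical, and neither RFC 7530 nor the draft prescribes how a
COMPOUND should be broken up.

```python
from enum import Enum
from typing import List


class Failure(Enum):
    ERR_CHUNK = "transport"     # RDMA_ERROR message, err set to ERR_CHUNK
    GARBAGE_ARGS = "rpc"        # accepted RPC reply, accept_stat GARBAGE_ARGS
    NFS4ERR_RESOURCE = "nfs"    # NFS version 4 status in the COMPOUND reply


def plan_recovery(failure: Failure, ops: List[str]) -> List[List[str]]:
    """Return the COMPOUNDs to send next; an empty list means give up.

    The identical request is never returned, since resending it would
    presumably produce the same error.
    """
    if failure in (Failure.ERR_CHUNK, Failure.GARBAGE_ARGS):
        # Transport and RPC level failures: terminate the RPC transaction.
        return []
    if len(ops) < 2:
        # Nothing left to simplify.
        return []
    # Assumed policy: split the operation list in half. A real client
    # would also have to re-establish the current filehandle in the
    # second piece and send each piece with a fresh XID.
    mid = len(ops) // 2
    return [ops[:mid], ops[mid:]]
```

Note that only the NFS4ERR_RESOURCE branch needs knowledge of the
COMPOUND, which is part of why applying the "terminate the RPC"
requirement to it feels like a layering problem.
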


> This protocol should not require another protocol to change existing behavior.

> Tom.
> 
> 
>> 
>> 
>> 
>>>>> On Thu, Apr 27, 2017 at 12:27 PM, Chuck Lever <chuck.lever@oracle.com>
>> wrote:
>>>>> 
>>>>>> On Apr 27, 2017, at 8:54 AM, Tom Talpey <tom@talpey.com> wrote:
>>>>>> 
>>>>>> On 4/27/2017 11:44 AM, Chuck Lever wrote:
>>>>>>>> On Apr 27, 2017, at 7:20 AM, Tom Talpey <tom@talpey.com> wrote:
>>>>>>>> 
>>>>>>>>> snip:
>>>>>>>>>  Such implementation limits can constrain the complexity of NFS
>>>>>>>>>  version 4 COMPOUNDs, limit the number of elements in scatter-gather
>>>>>>>>>  operations, or prevent the use of Kerberos integrity or privacy
>>>>>>>>>  services.
>>>>>>>> I like the approach, and the lead-in language looks good. The
>>>>>>>> text quoted above is just a little bit dark, especially that bit
>>>>>>>> about preventing krb5i/krb5p. I'd suggest a more active statement
>>>>>>>> to replace the above, including the more prescriptive SHOULD rather
>>>>>>>> than "can".
>>>>>>>> How about:
>>>>>>>> 
>>>>>>>> "Client implementations SHOULD be prepared to provide mechanisms
>>>>>>>> for reporting the above errors, and optionally provide
>>>>>>>> configuration to limit the complexity of NFS version 4 COMPOUNDs,
>>>>>>>> limit the number of elements in scatter-gather operations, and to
>>>>>>>> avoid other possible sources of RPC-over-RDMA chunk overruns at the
>>>>>>>> peer.
>>>>>>>> 
>>>>>>>> These become especially important when Kerberos integrity or
>>>>>>>> privacy is in place for the RPC connection. These facilities add
>>>>>>>> payload to the RPC headers, potentially increasing the complexity
>>>>>>>> of the chunk manipulation, independent of the upper layer NFS
>>>>>>>> operation. The implementation SHOULD consider such RPC payload
>>>>>>>> requirements in addition to the NFS considerations."
>>>>>>> Sure, I can work this in.
>>>>>>> 
>>>>>>> When you say "Client implementations SHOULD ... [report] the above
>>>>>>> errors" you are talking about reporting them to administrators
>>>>>>> and/or RPC consumers? I don't think we can use SHOULD in that case.
>>>>>> I am agnostic about who to inform. The important thing is that some
>>>>>> visibility of the error be surfaced. I absolutely don't think an
>>>>>> arbitrary GARBAGE_ARGS returned to an application that may simply
>>>>>> choke on it, qualifies.
>>>>> Right, no client surfaces a protocol element like GARBAGE_ARGS as it
>>>>> is. The failure of the RPC on the server is reported to consumers in
>>>>> a way that conforms with the API they use to drive the requests. So
>>>>> an application running on a POSIX client might use read(2) or
>>>>> write(2) and get back EIO in this case, because POSIX allows only a
>>>>> narrow range of error status codes for these APIs.
>>>>> 
>>>>> That translation is a normal part of RPC client implementations, so
>>>>> I didn't think I needed to state that explicitly, and thus was
>>>>> confused about why "Client implementations" was mentioned above.
>>>>> 
>>>>> 
>>>>>>> This feels like implementation advice, not protocol.
>>>>>> Correct. But since the protocol creates the problem, the protocol
>>>>>> definition needs to say something about dealing with it.
>>>>> Philosophical agreement with that.
>>>>> 
>>>>>> So I believe SHOULD is best.
>>>>>>> Would "recommend" be enough for this section?
>>>>>> The RFC2119 term RECOMMENDED is basically a synonym for SHOULD.
>>>>>> It's perfectly permissible.
>>>>> To be clear, I was suggesting lower-case "recommended". IMO RFC 2119
>>>>> terminology isn't appropriate for describing internal APIs or
>>>>> mechanisms that do not appear on the wire.
>>>>> 
>>>>> The typical approach has been to say "The RPC client then terminates
>>>>> the RPC with an appropriate error status." Which crisply yet
>>>>> generically describes desired behavior while avoiding the use of
>>>>> MUST or SHOULD.
>>>>> 
>>>>> 
>>>>>> Tom.
>>>>>> 
>>>>>>> 
>>>>>>>> Feel free to wordsmith further.
>>>>>>>> 
>>>>>>>> Tom.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 4/26/2017 12:18 PM, Chuck Lever wrote:
>>>>>>>>>> On Apr 24, 2017, at 7:30 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On Apr 24, 2017, at 7:56 AM, Tom Talpey <tom@talpey.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 4/21/2017 10:43 AM, Chuck Lever wrote:
>>>>>>>>>>>> I agree that SHOULD/MAY makes things cloudier, and does not
>>>>>>>>>>>> seem to align with well-defined RFC2119 usage.
>>>>>>>>>>>> 
>>>>>>>>>>>> Another way we've dealt with similar disagreements between
>>>>>>>>>>>> specification and implementation is to decide that one of the
>>>>>>>>>>>> implementations is incorrect.
>>>>>>>>>>>> 
>>>>>>>>>>>> Can we agree that:
>>>>>>>>>>>> 
>>>>>>>>>>>> - GARBAGE_ARGS is a bit of a layering violation, though it's
>>>>>>>>>>>> understandable why it might be returned
>>>>>>>>>>>> 
>>>>>>>>>>>> - RPC clients are already prepared for GARBAGE_ARGS
>>>>>>>>>>> Are you certain of this?
>>>>>>>>>> GARBAGE_ARGS has been part of the RPC protocol for decades.
>>>>>>>>>> The two Unix-flavored clients that have NFS/RDMA support can
>>>>>>>>>> both handle this error.
>>>>>>>>> I've confirmed that the only other known NFS/RDMA client (Oracle
>>>>>>>>> dNFS) properly recognizes GARBAGE_ARGS.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> And out of curiosity, what is returned to the consumer for
>>>>>>>>>>> GARBAGE_ARGS versus ERR_CHUNK?
>>>>>>>>>> RFC 5531:
>>>>>>>>>>> GARBAGE_ARGS  = 4, /* procedure can’t decode params   */
>>>>>>>>>> 
>>>>>>>>>> GARBAGE_ARGS is an RPC-level error. The reply is "accepted"
>>>>>>>>>> with accept_stat GARBAGE_ARGS. An XID is available in the
>>>>>>>>>> header.
>>>>>>>>>> 
>>>>>>>>>> rfc5666bis:
>>>>>>>>>>> If the rdma_vers field contains a recognized value, but an XDR
>>>>>>>>>>> parsing error occurs, the responder MUST reply with an
>>>>>>>>>>> RDMA_ERROR procedure and set the rdma_err value to ERR_CHUNK.
>>>>>>>>>> 
>>>>>>>>>> ERR_CHUNK is a transport level error. An XID is available in
>>>>>>>>>> the header.
>>>>>>>>>> 
>>>>>>>>>> The difference is that the RPC layer v. the transport layer are
>>>>>>>>>> reporting they don't understand the contents of the message
>>>>>>>>>> (Call). There is nothing more in either type of message.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> - In RPC-over-RDMA Version One, we are not trying to recover
>>>>>>>>>>>> (in the sense of resending a simpler COMPOUND) but are rather
>>>>>>>>>>>> trying to ensure the offending RPC is properly terminated on
>>>>>>>>>>>> the client, and does not further block other RPCs or deadlock
>>>>>>>>>>>> the transport
>>>>>>>>>>>> 
>>>>>>>>>>>> Thus I claim it is harmless if a server returns GARBAGE_ARGS
>>>>>>>>>>>> instead of ERR_CHUNK.
>>>>>>>>>>> "Harmless" is a bit relative. The operation fails, through no
>>>>>>>>>>> fault of the consumer. And, frankly, in a very mysterious way.
>>>>>>>>>> We have no richer way of communicating failure in RPC-over-RDMA
>>>>>>>>>> Version One. We are not looking for recovery here, so I don't
>>>>>>>>>> believe any more information would be useful. If the server
>>>>>>>>>> wishes, it can log the failure with a message explaining what
>>>>>>>>>> went wrong.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Again, I think there is more to say here. It's a limitation of
>>>>>>>>>>> the protocol whose implications should be made clear
>>>>>>>>>>> (constraining the complexity of COMPOUNDs, limiting scatter/gather
>>>>>>>>>>> lengths, etc).
>>>>>>>>>> I'd welcome any suggested text.
>>>>>>>>>> 
>>>>>>>>>> Honestly, I'm not sure what can be said. Neither NFSv4.0 nor
>>>>>>>>>> RPC-over-RDMA have a sophisticated mechanism to communicate
>>>>>>>>>> this kind of limitation. The best an NFSv4 server can do is
>>>>>>>>>> return NFS4ERR_RESOURCE, which also carries little extra
>>>>>>>>>> information about what a client should do to recover.
>>>>>>>>>> 
>>>>>>>>>> So are you comfortable with eliminating GARBAGE_ARGS if we can
>>>>>>>>>> come up with more detail about the impact of not knowing how
>>>>>>>>>> complex a COMPOUND can be?
>>>>>>>>> I've come up with some possible replacement text for the final
>>>>>>>>> two paragraphs of S5.4.1 in an attempt to address comments from
>>>>>>>>> Tom, David, and Karen. The normative requirements have been
>>>>>>>>> removed, and a (brief) discussion of the consequences of not
>>>>>>>>> handling complex COMPOUNDs was introduced.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 5.4.2.  Complexity Considerations
>>>>>>>>> 
>>>>>>>>> As mentioned above, an NFS version 4 COMPOUND procedure can
>>>>>>>>> contain more than one operation that carries a DDP-eligible
>>>>>>>>> data item.  The RPC-over-RDMA Version One protocol does not
>>>>>>>>> place any limit on the number of chunks that may appear in the
>>>>>>>>> Read or Write lists.  Therefore an NFS version 4 client MAY
>>>>>>>>> construct an RPC-over-RDMA Version One message containing more
>>>>>>>>> than one Read chunk or Write chunk.
>>>>>>>>> 
>>>>>>>>> However, implementations have practical limits on the number of
>>>>>>>>> chunks or segments they are prepared to process in one of these
>>>>>>>>> lists.  There are several ways an NFS Version 4 server might
>>>>>>>>> indicate that an RPC Call message constructed by a client is
>>>>>>>>> valid but cannot be processed because of implementation limitations:
>>>>>>>>> 
>>>>>>>>> o  If the problem is detected in the upper layer (i.e., by the NFS
>>>>>>>>>    version 4 implementation), the server returns an NFS status of
>>>>>>>>>    NFS4ERR_RESOURCE.
>>>>>>>>> 
>>>>>>>>> o  If the problem is detected during XDR decoding of the request
>>>>>>>>>    (e.g., during re-assembly of the Call message by the RPC layer),
>>>>>>>>>    the server returns an RPC accept_stat of GARBAGE_ARGS.
>>>>>>>>> 
>>>>>>>>> o  If the problem is detected at the transport layer (i.e., during
>>>>>>>>>    transport header processing), the server returns an RDMA_ERROR
>>>>>>>>>    message with the err_code field set to ERR_CHUNK.
>>>>>>>>> 
>>>>>>>>> Such implementation limits can constrain the complexity of NFS
>>>>>>>>> version 4 COMPOUNDs, limit the number of elements in
>>>>>>>>> scatter-gather operations, or prevent the use of Kerberos
>>>>>>>>> integrity or privacy services.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Comments, opinions on this approach?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>>> Tom.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> As a result, I can change the Read list text in S5.4.1 to be
>>>>>>>>>>>> the same as the Write list text, removing the mention of
>>>>>>>>>>>> GARBAGE_ARGS.
>>>>>>>>>>>> 
>>>>>>>>>>>> Would that sit comfortably with everyone?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Apr 20, 2017, at 7:21 PM, David Noveck <davenoveck@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The "or" is a similar situation, it prescribes a choice,
>>>>>>>>>>>>>> which does not define a protocol.
>>>>>>>>>>>>> Fair enough, but the point that needs to be made is that,
>>>>>>>>>>>>> with regard to Version One, Chuck and the working group are
>>>>>>>>>>>>> not free to define a protocol.  As a result we have the kind
>>>>>>>>>>>>> of ugliness you object to, but it is inherent in the choice
>>>>>>>>>>>>> to try to revive Version One as-is.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read
>>>>>>>>>>>>>> list that contains more chunks than an NFS version 4 server
>>>>>>>>>>>>>> is prepared to process, the server SHOULD reject the
>>>>>>>>>>>>>> request by responding with an RDMA_ERROR message with the
>>>>>>>>>>>>>> rdma_err value set to ERR_CHUNK. The server MAY reject the
>>>>>>>>>>>>>> RPC with an RDMA_MSG message containing an RPC Reply with
>>>>>>>>>>>>>> an accept status of GARBAGE_ARGS.
>>>>>>>>>>>>> I think I know what you intend here and I've seen stuff like
>>>>>>>>>>>>> this in RFCs, but I don't think we can do this because this is not in
>>>>>>>>>>>>> line with the definitions of "SHOULD" and "MAY" that appear in RFC2119.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> "SHOULD" means that you are supposed to do something but can
>>>>>>>>>>>>> avoid it if you have a good reason and are aware of the
>>>>>>>>>>>>> consequences of not doing it.
>>>>>>>>>>>>> In this case the "good" reason is that someone coded the
>>>>>>>>>>>>> implementation to do something else, which is not all that
>>>>>>>>>>>>> good a reason.  The consequences of returning
>>>>>>>>>>>>> GARBAGE_ARGS are exactly zero, since the client has to be
>>>>>>>>>>>>> prepared for either it or ERR_CHUNK.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> "MAY" means the implementation can choose to do the action
>>>>>>>>>>>>> or not, which is in line with the reality here but essentially
>>>>>>>>>>>>> contradicts the SHOULD.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This at least makes it clear which response is "preferred".
>>>>>>>>>>>>> But it isn't really the job of the RFC2119 terms to say
>>>>>>>>>>>>> which is "preferred" or "'preferred'".  These terms are
>>>>>>>>>>>>> supposed to describe interoperability and the
>>>>>>>>>>>>> interoperability situation is that the server MUST return
>>>>>>>>>>>>> ERR_CHUNK or GARBAGE_ARGS and the client needs to be prepared
>>>>>>>>>>>>> for either.  That is the unpleasant reality.  If you want to
>>>>>>>>>>>>> indicate a preference, you can say something
>>>>>>>>>>>>> like:
>>>>>>>>>>>>>       • Returning ERR_CHUNK is preferable.
>>>>>>>>>>>>>       • Returning ERR_CHUNK is more in line with the appropriate
>>>>>>>>>>>>>         protocol layering since this issue relates to a limitation of
>>>>>>>>>>>>>         the transport implementation.
>>>>>>>>>>>>>       • Use of GARBAGE_ARGS is an unfortunate artifact of
>>>>>>>>>>>>>         inappropriately layered implementations and is only allowed
>>>>>>>>>>>>>         for reasons of compatibility with existing implementations.
>>>>>>>>>>>>>         It is desirable to avoid it.
>>>>>>>>>>>>>> And one would hope a future draft would decide.
>>>>>>>>>>>>> Not sure what draft you are thinking of.  I don't see us doing an
>>>>>>>>>>>>> rfc5667bisbis (rfc5667tris).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> By the time we did that, the implementations with these
>>>>>>>>>>>>> restrictions will probably be gone.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I have a second question though. How does the client
>>>>>>>>>>>>>> determine what is the actual error? As in, how many chunks
>>>>>>>>>>>>>> were allowed?
>>>>>>>>>>>>> This is not fixable in Version One.  It would be in Version
>>>>>>>>>>>>> Two, but by then the need will probably be gone.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Does the upper
>>>>>>>>>>>>>> layer have to recover, and if so what are the implications?
>>>>>>>>>>>>> I think something could be put in to indicate that clients
>>>>>>>>>>>>> should break up COMPOUNDs so they only have a single chunk
>>>>>>>>>>>>> each.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, I know 5667 did not explore this very well.
>>>>>>>>>>>>> It didn't explore it at all.  And 5666's error reporting facilities
>>>>>>>>>>>>> were extremely limited.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Mea culpa.
>>>>>>>>>>>>> I don't think you have anything to apologize for.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Apr 20, 2017 at 5:28 PM, Tom Talpey <tom@talpey.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4/19/2017 11:14 AM, Chuck Lever wrote:
>>>>>>>>>>>>> Hi Tom-
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Apr 18, 2017, at 11:08 PM, Tom Talpey <tom@talpey.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I noticed the same thing, and I'll add that the MUST reject
>>>>>>>>>>>>> condition is very confusing because it allows an "or". In my
>>>>>>>>>>>>> opinion a MUST is always a single requirement, never ambiguous.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I agree this kind of thing is tricky. I wrote it as "the
>>>>>>>>>>>>> server MUST reject the RPC". That's the single requirement.
>>>>>>>>>>>>> The choice is how the rejection is conveyed to the client.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The statement "MUST reject" is not testable. So, while it
>>>>>>>>>>>>> may be understood what is intended, there is nothing
>>>>>>>>>>>>> implementable in the MUST. The "or" is a similar situation,
>>>>>>>>>>>>> it prescribes a choice, which does not define a protocol.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is there some reason you want to allow such a choice? I
>>>>>>>>>>>>> think you'll find that, worded properly, it becomes actually
>>>>>>>>>>>>> much less implementable and interoperable than you may think.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The Solaris server can return an RPC-level error in cases like this.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Well, this is happening because the Solaris server is
>>>>>>>>>>>>> (probably) just handing the chunk list up to the RPC layer,
>>>>>>>>>>>>> and it's the RPC (XDR) processing that detects any problem.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On the other hand, an implementation could do the opposite,
>>>>>>>>>>>>> it could process the chunks at the lower layer, before ever
>>>>>>>>>>>>> invoking RPC processing. This would naturally lead to a non-RPC
>>>>>>>>>>>>> error.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The challenge in defining the protocol is to hide these
>>>>>>>>>>>>> possibilities.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think there are similar choices allowed in rfc5666bis.
>>>>>>>>>>>>> Let's say that in a perfect world, I would go with only
>>>>>>>>>>>>> ERR_CHUNK, but I'm documenting existing implementation
>>>>>>>>>>>>> behavior here.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm not sure it matters to the client: both errors are
>>>>>>>>>>>>> permanent and the RPC is terminated on the client.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm open to alternatives.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The icky way to do this is to split into two weak requirements.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read
>>>>>>>>>>>>> list that contains more chunks than an NFS version 4 server
>>>>>>>>>>>>> is prepared to process, the server SHOULD reject the request
>>>>>>>>>>>>> by responding with an RDMA_ERROR message with the rdma_err
>>>>>>>>>>>>> value set to ERR_CHUNK. The server MAY reject the RPC with
>>>>>>>>>>>>> an RDMA_MSG message containing an RPC Reply with an accept
>>>>>>>>>>>>> status of GARBAGE_ARGS.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This at least makes it clear which response is "preferred".
>>>>>>>>>>>>> And one would hope a future draft would decide.
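
The SHOULD/MAY split above can be sketched as a server-side decision. This is only an illustration under assumptions: the function reject_read_list, the max_chunks limit, and the prefer_transport_error switch are all hypothetical, and a real server would build the actual reply messages rather than return a tuple of names.

```python
def reject_read_list(num_chunks, max_chunks, prefer_transport_error=True):
    """Decide how to reject a Call whose Read list is too long.

    Returns None when the Read list is acceptable, otherwise a tuple of
    (message type, error value) naming the rejection to send.
    """
    if num_chunks <= max_chunks:
        return None  # nothing to reject; process the Call normally
    if prefer_transport_error:
        # The SHOULD case: RDMA_ERROR with rdma_err set to ERR_CHUNK.
        return ("RDMA_ERROR", "ERR_CHUNK")
    # The MAY case: RDMA_MSG carrying an RPC Reply with GARBAGE_ARGS.
    return ("RDMA_MSG", "GARBAGE_ARGS")
```

For example, reject_read_list(3, 1) names the preferred ERR_CHUNK response, while a server that only detects the problem during XDR decoding would effectively take the other branch.
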
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have a second question though. How does the client
>>>>>>>>>>>>> determine what the actual error is? As in, how many chunks
>>>>>>>>>>>>> were allowed? Does the upper layer have to recover, and if so,
>>>>>>>>>>>>> what are the implications?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, I know 5667 did not explore this very well. Mea culpa.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Tom.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4/18/2017 6:32 PM, karen deitke wrote:
>>>>>>>>>>>>> Hi Chuck, it's unclear what you mean by "is prepared to process"
>>>>>>>>>>>>> in the text below.
>>>>>>>>>>>>> Other than that, looks good.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Karen
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 5.4.1
>>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Write
>>>>>>>>>>>>> list that contains more chunks than an NFS version 4 server
>>>>>>>>>>>>> is prepared to process, the server MUST reject the RPC by
>>>>>>>>>>>>> responding with an RDMA_ERROR message with the rdma_err
>>>>>>>>>>>>> value set to ERR_CHUNK.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If an NFS version 4 client sends an RPC Call with a Read
>>>>>>>>>>>>> list that contains more chunks than an NFS version 4 server
>>>>>>>>>>>>> is prepared to process, the server MUST reject the RPC by
>>>>>>>>>>>>> responding with an RDMA_MSG message containing an RPC Reply
>>>>>>>>>>>>> with an accept status of GARBAGE_ARGS, or with an RDMA_ERROR
>>>>>>>>>>>>> message with the rdma_err value set to ERR_CHUNK.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 4/18/2017 1:21 PM, David Noveck wrote:
>>>>>>>>>>>>> *Overall Evaluation*
>>>>>>>>>>>>> Major improvement over RFC5667.  Almost ready to ship.  No
>>>>>>>>>>>>> technical issues.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> A lot of my comments are basically editorial and are offered
>>>>>>>>>>>>> on a take-it-or-leave-it basis.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think some clarification in Section 5.4.1 is needed,
>>>>>>>>>>>>> although not necessarily in the ways suggested below.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *Comments by Section*
>>>>>>>>>>>>> *5.4.1. Multiple DDP-eligible Data Items*
>>>>>>>>>>>>> Given that READ_PLUS no longer has any DDP-eligible data
>>>>>>>>>>>>> items, the situation described in the fifth bullet can no
>>>>>>>>>>>>> longer arise. I suggest deleting the bullet.
>>>>>>>>>>>>> The penultimate paragraph can be read as applying to some
>>>>>>>>>>>>> situations in which it shouldn't and where the extra chunks
>>>>>>>>>>>>> would very naturally be ignored. For example, if you had one
>>>>>>>>>>>>> write chunk together with a READ operation which failed, the
>>>>>>>>>>>>> server would have more chunks (i.e. one) than the number it
>>>>>>>>>>>>> is prepared to process (i.e. zero). Suggest, as a possible
>>>>>>>>>>>>> replacement:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Normally, when an NFS version 4 client sends an RPC Call
>>>>>>>>>>>>> with a Write list that contains multiple chunks, each such,
>>>>>>>>>>>>> when matched with a DDP-eligible data item in the response,
>>>>>>>>>>>>> directs the placement of the data item as specified by
>>>>>>>>>>>>> [I.D.-nfsv4-rfc5666bis]. When there are DDP-eligible data
>>>>>>>>>>>>> items matched to write chunks that an NFS version 4 server
>>>>>>>>>>>>> is not prepared to process, the server MUST reject the RPC
>>>>>>>>>>>>> by responding with an RDMA_ERROR message with the rdma_err
>>>>>>>>>>>>> value set to ERR_CHUNK.
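
One way to read the replacement text above is that only Write chunks actually matched to a DDP-eligible data item count against the server's limit. A minimal sketch of that reading follows; check_write_list, max_supported, and the string results are hypothetical names for illustration, not anything defined by the draft.

```python
def check_write_list(num_write_chunks, num_ddp_results, max_supported):
    """Apply the 'matched chunks' reading of the proposed wording.

    num_write_chunks: chunks the client provided in the Write list
    num_ddp_results:  DDP-eligible data items the reply actually carries
    max_supported:    matched chunks this server is prepared to process
    """
    # Chunks with no matching data item are simply left unused.
    matched = min(num_write_chunks, num_ddp_results)
    if matched > max_supported:
        return "ERR_CHUNK"
    return "OK"
```

Under this reading, the failed-READ example (one Write chunk, zero data items to place) is not rejected, since the unmatched chunk is ignored.
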
>>>>>>>>>>>>> 
>>>>>>>>>>>>> With regard to the last paragraph, I am curious that this
>>>>>>>>>>>>> paragraph, unlike the previous one, allows GARBAGE_ARGS. Is
>>>>>>>>>>>>> this so because that would be allowed if the chunks in
>>>>>>>>>>>>> question had offsets other than those that correspond to
>>>>>>>>>>>>> DDP-eligible data items? If so, please consider the following
>>>>>>>>>>>>> possible replacement.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Normally, when an NFS version 4 client sends an RPC Call
>>>>>>>>>>>>> with a Read list that contains multiple chunks, each such,
>>>>>>>>>>>>> when properly matched with a DDP-eligible data item in the
>>>>>>>>>>>>> request, directs the fetching of the data item as
>>>>>>>>>>>>> specified by [I.D.-nfsv4-rfc5666bis]. When there are
>>>>>>>>>>>>> DDP-eligible data items matched to read chunks that an NFS
>>>>>>>>>>>>> version 4 server is not prepared to process, the server MUST
>>>>>>>>>>>>> reject the RPC by responding with an RDMA_ERROR message
>>>>>>>>>>>>> with the rdma_err value set to ERR_CHUNK.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *5.6. Session-Related Considerations*
>>>>>>>>>>>>> In the third sentence of the second paragraph, suggest
>>>>>>>>>>>>> replacing "no different" by "not different".
>>>>>>>>>>>>> In the last sentence of the last paragraph, suggest replacing
>>>>>>>>>>>>> "is not" by "were not".
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 

--
Chuck Lever