Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-06

Chuck Lever <chuck.lever@oracle.com> Thu, 02 March 2017 18:59 UTC

Content-Type: text/plain; charset="utf-8"
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
From: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <CADaq8jdgdO1k3iW9yo7n2N1Yo6cAjvXznaWk-tN3ChftmzMJfQ@mail.gmail.com>
Date: Thu, 02 Mar 2017 10:59:14 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <8B939A70-8875-495E-811C-80047B37B335@oracle.com>
References: <CADaq8je8zfRN5R11LxJw=0st-u-XOoKosGbZDBajOTiChzpS5Q@mail.gmail.com> <93F476D6-57F8-44AB-94C9-545608396F51@oracle.com> <CADaq8jcJ3WkpmPJVVec5aJc0ekKgdHPUok=S5_ofGVJnbqrrjA@mail.gmail.com> <5538FD5E-A71B-4F91-AC3A-CBD2F54AF9E3@oracle.com> <de109940-7de1-1a09-51f3-d3be44d98c60@talpey.com> <CADaq8jf5zU0y=v4gaUxVd4scQQwyAEcgWtp11Ddcn=U4jB17pA@mail.gmail.com> <CADaq8jea99i8L=tYKM=6T-Mu78n_qzmMwrKGSsWhmgpBytZMiQ@mail.gmail.com> <D2083198-E667-4B71-AAC5-D26318BE52D6@oracle.com> <CADaq8jeegoga-kB+a4e6QQEdLSCrTOmpbkSTk+4SmbqzCAfXgw@mail.gmail.com> <ACE665A3-0859-47E8-BBD6-E98A401B7656@oracle.com> <CADaq8jdgdO1k3iW9yo7n2N1Yo6cAjvXznaWk-tN3ChftmzMJfQ@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
X-Mailer: Apple Mail (2.3124)
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/JqxHCCkBPGP33wfQo7juSEOnUWI>
Cc: Tom Talpey <tom@talpey.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-06

> On Mar 1, 2017, at 11:57 AM, David Noveck <davenoveck@gmail.com> wrote:
> 
> > It's not a question of whether or not I liked a particular
> > proposed mechanism. My job as editor of rfc5667bis is to
> > keep this document on track and limit creeping scope.
> 
> Fair enough.  Instead of saying:
> 
> I made some suggestions in that regard which you didn't like
> 
> I should have said:
> 
> I made some suggestions in that regard that you felt would
> lead the document off track and might result in an undesirable
> expansion of scope.
> 
> > I'm having trouble considering more work and more special
> > casing in rfc5667bis to detect support for features that
> > do not yet have a real world application, and for which
> > there is little ability to prototype and test.
> 
> I hear you.
> 
> > Given your publicly-stated desire to see this document
> > published soon, I'm surprised you would consider
> > introducing new mechanisms at this point.
> 
> I wanted to explore the ways this could be done.   I guess
> the question regarding "at this point" is exactly where we are.
> A few days ago, I thought the document was quite close to
> WGLC.  Now it appears it is not and I'm not sure exactly where
> this document is. 
> 
> > Further, I do not like the implication that we should do
> > something only because no-one has thought of a better
> > approach. 
> 
> When did I say/imply that?
> 
> > We always have the option of not doing
> > something that adds complexity with little or no actual
> > gain.
> 
> Right, but right now we don't have the option of doing nothing.
> Either we do a fairly major surgery on an existing document,
> or we do something else.  In any case, you are not comfortable
> with any of these alternatives and so we might as well drop
> discussion of them.
> 
> However, given how ingrained the idea of a list of chunks is, I
> don't think that this can be thought of as a minimally invasive
> featurectomy.  That's why I thought of alternatives that you
> have trouble with.  If you and the rest of the working group
> think the surgery to remove support for multi-chunk operation
> is the most expeditious way to proceed, I don't have a problem
> with it.
> 
> > I'm interested right now in hearing other people's opinions
> > on whether it is worth completing rfc5667bis as strictly a
> > document of existing implementations, or whether it should
> > continue to include what amounts to a speculative feature.
> 
> I'm also interested in hearing other people's opinions.
> 
> I'm having trouble with the idea that a major part of the
> current rfc5667bis, which I thought was pretty close to
> WGLC, has suddenly become "speculative".  
> 
> That does not mean that I think we need to keep things 
> as they are, but I think it has to be understood that to 
> remove this, we would have to do some pretty substantial 
> surgery on the current document.
> 
> > I could make do with permitting only single chunks in
> > Version One, and explore support for multiple chunks
> > (and/or more complex COMPOUNDs) in Version Two.
> 
> If that's what you want to do, and the rest of the working group is
> OK with it, I don't have a problem.  
> 
> But note that the following documents are written to support multiple
> chunks in a request:
> 	• RFC5666
> 	• RFC5667
> 	• rfc5666bis
> 	• rfc5667bis (at least up until -06)
> 	• draft-cel-rpcrdma-version-two
> so the exploration involved is going to require prototype implementation, 
> if anyone is interested in doing that.

For the moment, let's try to address the Long message
issue from RFC 5667. Basically, the original NFS ULB
specification does not match implementations when it
comes to supporting Long messages with reduced Payload
streams.

Tom has suggested to me privately that the RFC 5667
language was not intended to exclude the "reduced
Payload stream in a Long message" form.

So yesterday at Connectathon, we confirmed that:

- The Solaris client definitely sends Write + Reply
with NFSv4.0 and krb5. Therefore the Solaris server
must support it too. This matches the intent of RFC
5667.

- The Solaris client almost certainly sends PZRC +
normal Read in similar cases (for example, with a
writev(2) on NFSv4 while using krb5). Code exists,
but we haven't seen an actual request on the wire.
The Solaris server must also therefore support
this. This matches the intent of RFC 5667.

- The Linux server does not support Write + Reply,
but a confirmed fix is available for v4.12. I regard
this as an implementation bug: the Linux server
hasn't officially supported NFS/RDMA with Kerberos
until a few months ago, so it never needed to
support this type of reply.

- The Linux server does not support PZRC + normal
Read. Likewise, this is an implementation bug for
the same reason.

- The Linux client does not send Write + Reply;
it uses Reply only. This is less efficient than it
could be, but should not be an interop problem.

- The Linux client does not send PZRC + normal
Read. Instead it will send a multi-segment PZRC.
The Solaris server does not support multi-segment
PZRCs, even though the spec allows it. We regard
this as a server implementation bug, and it is
being addressed.
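
The findings above can be summarized as a small table of which
forms each client sends and which forms each server is known
not to handle. The Python sketch below is only an illustration:
the dictionary names and the known_mismatches helper are
hypothetical, the Solaris "PZRC + normal Read" entry rests on
code inspection rather than an observed request, and server
support that was not discussed is deliberately left out.

```python
# Forms each client sends, per the Connectathon findings above.
# The Solaris "PZRC + normal Read" entry is inferred from code;
# no such request has been seen on the wire yet.
SENDS = {
    "Solaris client": {"Write + Reply", "PZRC + normal Read"},
    "Linux client": {"Reply only", "multi-segment PZRC"},
}

# Forms each server is stated NOT to support today.
UNSUPPORTED = {
    "Linux server": {"Write + Reply", "PZRC + normal Read"},
    "Solaris server": {"multi-segment PZRC"},
}


def known_mismatches(sends, unsupported):
    """Return sorted (client, server, form) triples where a client
    sends a form that the server is stated not to support."""
    return sorted(
        (client, server, form)
        for client, forms in sends.items()
        for server, missing in unsupported.items()
        for form in forms & missing
    )


for client, server, form in known_mismatches(SENDS, UNSUPPORTED):
    print(f"{client} -> {server}: {form}")
```

Each triple it reports corresponds to one of the server
implementation bugs listed above.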

It's possible that a writev(2) done on NFSv3 with
krb5 could build a substantial Read segment list,
and thus could be sent via a PZRC + normal Read
chunk.

I recognize that implementations are free to use
Long messages to send reduced Payload streams any
time they want. However, the main user of this
form seems to be NFS/RDMA on krb5. (krb5i and
krb5p always use Long messages without payload
reduction).

Therefore I propose the following:

All of the extra language added to -06 to deal
with reduced Payload streams in Long messages is
unnecessary and will be removed from -07. This
should permit the use of Long messages to convey
reduced Payload streams with all versions of NFS.


Now back to your previously scheduled thread topic.


--
Chuck Lever