Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-06

David Noveck <davenoveck@gmail.com> Mon, 06 March 2017 11:22 UTC

From: David Noveck <davenoveck@gmail.com>
Date: Mon, 06 Mar 2017 06:22:35 -0500
Message-ID: <CADaq8jeheLhSHhn9w+fiPXMQARGJAc0665NWpcwWC2QP-NQgJQ@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/XVsQC8IZTP_ffauX7FZ_oPrsoMc>
Cc: Tom Talpey <tom@talpey.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-06

>> However, given how ingrained the idea of list of chunks is, I
> don't think that this can be thought of as a minimally invasive
>> featurectomy.

> I'd like to understand this more. What makes this harder
> than just setting a limit?

Perhaps I'm carrying the surgical analogy too far, but I want
to make it clear that I'm not saying this would be anything like
Ben Carson separating conjoined twins.

What I'm basically saying is that this is a lot more than wart
removal and is probably harder than you think.

> The problem with "just setting a limit" is that it results in a
> document that doesn't make sense.

> IMO, a ULB is allowed to set some limits on the use of
> the facilities in the underlying transport.

I agree.

> The approach I'm thinking of is defining those limits in rfc5667bis,
> possibly replacing the description of how to use multiple
> chunks.

You've just described multiple approaches and need to
decide on one before you scrub up.

If you just define those limits, you will wind up with a
document that doesn't make sense.  rfc5666bis defines
sensible rules for matching write chunks with data items
and rfc5667bis-06 sensibly modifies those to account for
the peculiarities of READ_PLUS.

As a result, simply adding those limits will result in a
confusing document.

If you remove those matching rule modifications, you will
have to deal with the restrictions on what READ_PLUS
can return.  Those are still valid, but would be stated
differently in an environment in which there are no requests
with multiple chunks.

I had an easier task with nfsulb.  In that case, the existing text
makes sense as the generic NFS ULB description, and adding
a version-one-specific restriction is not incongruous.

For rfc5667bis, this is surgery and the patient is likely to survive.
Unfortunately, there is no way to properly compensate the
surgeon :-(

> You might be remembering that I was leery of a CM-private-
> data-based approach, originally.

I didn't remember that.

> Christoph and others in the Linux community convinced me

Sounds like one or more acknowledgements are in order.

> this mechanism would be an appropriate platform for
> experimenting with the features you listed above.

It appears that the experiment has been successful :-)

> Please remember this was conceived of as an enabler for
> experimentation.

I'm not sure exactly what that means.  Any proposal is an
experiment in that it needs to be implemented.  If it can't,
the experiment has failed.  In this case, the experiment has
succeeded.

There was certainly nothing slapdash about this document.

> It is a naive design.

I would not use the word "naive" regarding the design.
It is simple and accomplishes what needs to be done,
which is enough for me and I hope it will be enough for
the working group.

There is one small element that does seem naive to me
and I cover it in my review, which should be out soon.

> However, as simple
> as the CM-private-data approach is, it does have a limited
> ability to be extended.

I noticed that but I wasn't particularly concerned about
extensibility.

> It would not be difficult to add
> fields that report how many of each chunk type an
> implementation supports per RPC, for example.

I thought about adding a flag indicating that the server
could support more than one read chunk or write
chunk, but I didn't think you wanted to make rfc5667bis
dependent on a CM private data RFC.
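
For illustration only, here is a minimal sketch of what such a flag
plus per-RPC chunk counts might look like on the wire. The field
layout, the names, and the fallback to a single chunk of each type
are assumptions made up for this example; none of it is taken from
the CM private data draft or from rfc5667bis.

```python
import struct

# Hypothetical layout, NOT taken from draft-cel-nfsv4-rpcrdma-cm-pvt-msg:
# one flags octet followed by two per-RPC chunk-count limits.
FLAG_MULTI_CHUNK = 0x01  # assumed bit: peer accepts more than one chunk per RPC
_FMT = "!BBB"            # network byte order: flags, max read, max write


def encode_limits(max_read, max_write):
    """Pack the advertised limits into the hypothetical 3-octet field."""
    flags = FLAG_MULTI_CHUNK if max(max_read, max_write) > 1 else 0
    return struct.pack(_FMT, flags, max_read, max_write)


def decode_limits(data):
    """Return (max_read, max_write); assume one of each if nothing usable."""
    if len(data) < struct.calcsize(_FMT):
        # Peer sent no such field: fall back to single-chunk operation.
        return (1, 1)
    flags, max_read, max_write = struct.unpack_from(_FMT, data)
    if not flags & FLAG_MULTI_CHUNK:
        return (1, 1)
    return (max(1, max_read), max(1, max_write))
```

A peer that sends nothing, or sends the field with the flag clear,
is treated as single-chunk only, which is the conservative reading
of the limit being discussed in this thread.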

> I have heard a rumor that the Solaris team is interested
> in this idea, at least to allow somewhat larger inline
> thresholds, both on the Solaris client and server.

Now the whole working group has heard that rumour.

If we go forward with this, as I expect we will, then it
would be good if we could hear from the Solaris team directly.



On Sun, Mar 5, 2017 at 12:51 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Mar 1, 2017, at 11:57 AM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > It's not a question of whether or not I liked a particular
> > > proposed mechanism. My job as editor of rfc5667bis is to
> > > keep this document on track and limit creeping scope.
> >
> > Fair enough.  Instead of saying:
> >
> > I made some suggestions in that regard which you didn't like
> >
> > I should have said:
> >
> > I made some suggestions in that regard that you felt would
> > lead the document off track and might result in an undesirable
> > expansion of scope.
> >
> > > I'm having trouble considering more work and more special
> > > casing in rfc5667bis to detect support for features that
> > > do not yet have a real world application, and for which
> > > there is little ability to prototype and test.
> >
> > I hear you.
> >
> > > Given your publicly-stated desire to see this document
> > > published soon, I'm surprised you would consider
> > > introducing new mechanisms at this point.
> >
> > I wanted to explore the ways this could be done.   I guess
> > the question regarding "at this point" is exactly where we are.
> > A few days ago, I thought the document was quite close to
> > WGLC.  Now it appears it is not and I'm not sure exactly where
> > this document is.
> >
> > > Further, I do not like the implication that we should do
> > > something only because no-one has thought of a better
> > > approach.
> >
> > When did I say/imply that?
> >
> > > We always have the option of not doing
> > > something that adds complexity with little or no actual
> > > gain.
> >
> > Right, but right now we don't have the option of doing nothing.
> > Either we do a fairly major surgery on an existing document,
> > or we do something else.  In any case, you are not comfortable
> > with any of these alternatives and so we might as well drop
> > discussion of them.
>
> To be clear, it's a general discomfort, not something
> specific to any of the particular alternatives. I don't
> have an appetite for a lot of work on something that is
> not immediately useful here, especially because our
> mission is writing down how this stuff works right now.
>
>
> > However, given how ingrained the idea of list of chunks is, I
> > don't think that this can be thought of as a minimally invasive
> > featurectomy.
>
> I'd like to understand this more. What makes this harder
> than just setting a limit?
>
>
> > That's why I thought of alternatives that you
> > have trouble with.  If you and the rest of the working group,
> > think the surgery to remove support for multi-chunk operation
> > is the most expeditious way to proceed, I don't have a problem
> > with it.
>
> IMO, a ULB is allowed to set some limits on the use of
> the facilities in the underlying transport. The approach
> I'm thinking of is defining those limits in rfc5667bis,
> possibly replacing the description of how to use multiple
> chunks.
>
>
> > > I'm interested right now in hearing other people's opinions
> > > on whether it is worth completing rfc5667bis as strictly a
> > > document of existing implementations, or whether it should
> > > continue to include what amounts to a speculative feature.
> >
> > I'm also interested in hearing other people's opinions.
> >
> > I'm having trouble with the idea that, a major part of the
> > current rfc5667bis, which I thought was pretty close to
> > WGLC, has suddenly become "speculative".
> >
> > That does not mean that I think we need to keep things
> > as they are, but I think it has to be understood that to
> > remove this, we would have to do some pretty substantial
> > surgery on the current document.
> >
> > > I could make do with permitting only single chunks in
> > > Version One, and explore support for multiple chunks
> > > (and/or more complex COMPOUNDs) in Version Two.
> >
> > If that's what you want to do, and the rest of the working group is
> > OK with it, I don't have a problem.
> >
> > But note that the following documents are written to support multiple
> > chunks in a request:
> >       • RFC5666
> >       • RFC5667
> >       • rfc5666bis
> >       • rfc5667bis (at least up until -06)
> >       • draft-cel-rpcrdma-version-two
> > so the exploration involved is going to require prototype implementation,
> > if anyone is interested in doing that.
> >
> > > > > and Version Two is years away from
> > > > > appearing in storage products in a robust form.
> > >>
> > >> Probably so, but to me, the fact that something is going to take a
> > >> while to do makes it more appropriate to push forward, rather than
> > >> less.  And whatever you think about Version One, it lacks:
> > >>       • General remote invalidation support
> > >>       • A default inline threshold size that is appropriate to a
> > >>         protocol that has neither remote invalidation support nor
> > >>         message continuation.
> > >>       • The ability to decide on and use a threshold bigger than the
> > >>         default.
> >
> > > Right now Linux has Remote Invalidation support, a
> > > 4KB default inline threshold (when interoperating
> > > with another Linux system), and the ability to decide
> > > on and use a larger threshold (up to 64KB). In other
> > > words, everything you've named here, minus "general
> > > Remote Invalidation".
> >
> > I hadn't known that.
>
> You might be remembering that I was leery of a CM-private-
> data-based approach, originally.
>
> Christoph and others in the Linux community convinced me
> this mechanism would be an appropriate platform for
> experimenting with the features you listed above.
>
>
> > > The mechanism it uses to do these things is documented
> > > in a published personal draft:
> > >
> > >   draft-cel-nfsv4-rpcrdma-cm-pvt-msg
> >
> > If, as I believe, you are suggesting that this is an alternative to
> > accelerating work on Version Two, then I would be OK with that,
> > provided that we push forward on this work instead.  I will be looking
> > at this document with a view to seeing what barriers exist to making
> > it a working group document.
>
> Please remember this was conceived of as an enabler for
> experimentation. It is a naive design. However, as simple
> as the CM-private-data approach is, it does have a limited
> ability to be extended. It would not be difficult to add
> fields that report how many of each chunk type an
> implementation supports per RPC, for example.
>
>
> > This shouldn't be too hard, given that
> > prototypes exist and that it provides some of the performance
> > help we need.
>
> > > Linux client happens to support responder's choice
> > > Remote Invalidation. Proper generic support for Remote
> > > Invalidation will be needed to include other clients,
> > > though the only other current client implementation
> > > would need significant internal re-architecture to use
> > > Remote Invalidation of any kind.
> >
> > OK.  If expanding the client set is blocked, perhaps it would
> > be best if this document were made a working group item with a
> > view toward encouraging other servers to support this mechanism
> > interoperably.
>
> I have heard a rumor that the Solaris team is interested
> in this idea, at least to allow somewhat larger inline
> thresholds, both on the Solaris client and server.
>
>
> > > Given the existence of the CM private data mechanism,
> > > IMO we are safe focusing on adding real value to
> > > Version Two rather than rushing it forward.
> >
> > I don't think that's what I was proposing, but there isn't
> > much point in arguing about it.  I'm OK with putting
> > additional focus on the CM private data mechanism
> > instead. I don't think I'm proposing "rushing" that forward
> > either, but you will have a chance to object to any particular
> > steps you feel are imprudent.
> > On Wed, Mar 1, 2017 at 11:40 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> --
> Chuck Lever
>
>
>
>