Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-06

David Noveck <davenoveck@gmail.com> Tue, 28 February 2017 20:41 UTC

In-Reply-To: <D2083198-E667-4B71-AAC5-D26318BE52D6@oracle.com>
References: <CADaq8je8zfRN5R11LxJw=0st-u-XOoKosGbZDBajOTiChzpS5Q@mail.gmail.com> <93F476D6-57F8-44AB-94C9-545608396F51@oracle.com> <CADaq8jcJ3WkpmPJVVec5aJc0ekKgdHPUok=S5_ofGVJnbqrrjA@mail.gmail.com> <5538FD5E-A71B-4F91-AC3A-CBD2F54AF9E3@oracle.com> <de109940-7de1-1a09-51f3-d3be44d98c60@talpey.com> <CADaq8jf5zU0y=v4gaUxVd4scQQwyAEcgWtp11Ddcn=U4jB17pA@mail.gmail.com> <CADaq8jea99i8L=tYKM=6T-Mu78n_qzmMwrKGSsWhmgpBytZMiQ@mail.gmail.com> <D2083198-E667-4B71-AAC5-D26318BE52D6@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Tue, 28 Feb 2017 15:41:03 -0500
Message-ID: <CADaq8jeegoga-kB+a4e6QQEdLSCrTOmpbkSTk+4SmbqzCAfXgw@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/cMsTejR0znxpuscG-3bZUEnDE78>
Cc: Tom Talpey <tom@talpey.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-06

> Again, the WG still has one author of that document
> who is available to us to confirm the intent of
> that text.

Tom could weigh in here, but this is text that was written over ten
years ago.  I'm sure Tom would have interesting things to say about this,
but, if he is anything like me, he wouldn't remember his precise intent
in writing any particular sentence.  He would look at the same evidence
as we have available to us, and draw conclusions about intent in the
same way we would, albeit with more knowledge of the circumstances
of composition.

In particular, I expect he would look at the following facts in determining
intent, just as you or I might:

   - That there is a reference to a write list and that the XDR thus is
   specifically designed to accommodate a variable number of items, rather
   than being an optional item (a max-length-one variable-length array).  True,
   this is in RFC5666, but given the centrality of NFS to this effort, it is
   hard to make the case that this was intended to support protocols other
   than NFS.
   - That the fourth paragraph of Section 5 of RFC5667 specifies a matching
   procedure which would make no sense if the intent was other than to support
   multiple write chunks.
   - That Section 5 has an example that includes a write list of three elements.
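
As a rough illustration of the distinction drawn in the list above (a sketch,
not text from either RFC): a write list modeled as a variable-length sequence
of chunks can be matched, in order, against the READ and READLINK operations
in a COMPOUND, whereas an optional item could hold at most one chunk.  The
names Segment, WriteChunk, and match_write_chunks are hypothetical, and the
in-order matching rule is an assumption about the procedure the fourth
paragraph of Section 5 describes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One RDMA segment; field names are illustrative, not taken from the XDR."""
    handle: int
    length: int
    offset: int


# Assumption: a Write chunk is a counted array of segments, and the
# write list is a variable-length list of such chunks.
WriteChunk = List[Segment]


def match_write_chunks(
    ops: List[str], write_list: List[WriteChunk]
) -> List[Optional[WriteChunk]]:
    """Pair each READ/READLINK in a COMPOUND with the next unused chunk.

    Operations that return no bulk data, and READs left over once the
    write list is exhausted, are paired with None.
    """
    chunks = iter(write_list)
    return [
        next(chunks, None) if op in ("READ", "READLINK") else None
        for op in ops
    ]
```

With a three-element write list, as in the Section 5 example, a COMPOUND
containing three READs would consume one chunk per READ.  Under the
optional-item reading, only the first READ could ever be matched.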

Presumably those items and similar evidence led you to write Section 5.4.2
in rfc5667bis.  I will listen to what Tom has to say, but it would be very
hard for him to convince me that you somehow misunderstood the intent of
RFCs 5666 and 5667.

> > > However, it might be interesting to some to leave
> > > some flexibility for future use.
> >
> > Do you mean future use within Version One?  Or we
> > could limit Version One to what it is now, in the expectation
> > that it will not exist for very long.

> Again I am at odds with your view. Version One
> has plenty of capability to last for quite a
> while,

There is no point in our arguing "very long" vs. "quite a while".
Regardless of how long one expects Version One to be useful, the question
is whether we anticipate sticking strictly with the capabilities of the
existing servers or provide some way, within the scope of Version One,
to support the features that I believe to have been intended
in RFCs 5666 and 5667.

I'm not sure where you stand on this issue.  On the one hand, you say,
"However, it might be interesting to some to leave some flexibility for
future use".  Given your preference for the per-version ULB model, I
assumed that meant we would have to have some way of distinguishing servers
that provided the current level of support from other servers.  I made some
suggestions in that regard which you didn't like.  So the question is
whether you have some other way of doing this in mind or whether you
have decided that these limitations will endure for the entirety of
Version One, however one might characterize its duration.

> and Version Two is years away from
> appearing in storage products in a robust form.

Probably so, but to me, the fact that something is going to take a
while to do makes it more appropriate to push forward, rather than
less.  And whatever you think about Version One, it lacks:

   - General remote invalidation support
   - A default inline threshold size that is appropriate to a protocol that
   has neither remote invalidation support nor message continuation.
   - The ability to decide on and use a threshold bigger than the default.



On Tue, Feb 28, 2017 at 12:46 PM, Chuck Lever <chuck.lever@oracle.com>
wrote:

>
> > On Feb 27, 2017, at 6:09 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > Earlier I had believed we had some historical reasons
> > > for matching the intent of RFC 5667.
> >
> > I think we do, where that intent is clear.
> >
> > > But if we are
> > > interested only in documenting the behavior of
> > > existing implementations,
> >
> > For some of the cases you mention, there is no possible
> > ambiguity.  The clear intent of RFC5666 and RFC5667
> > is that multiple READs and WRITEs were supposed to be
> > allowed and considerable effort was expended in explaining
> > how they would work.
>
> If you're referring to handling multiple Read or
> Write chunks per RPC, there is no clear intent.
> That text in RFC 5667 is clearly unfinished, and
> its state is a major reason the WG embarked on
> this update.
>
>
> > So it appears that some of the things on your list are bugs.  We
> > may be forced, for practical reasons, to turn a blind eye to these
> > bugs for some period of time, but this is not the same sort of
> > situation as the PZRC-read-chunk issue, which is a fairly minor nit
> > based on text within RFC5667 that defies clear interpretation.  Sigh!
>
> Again, the WG still has one author of that document
> who is available to us to confirm the intent of
> that text.
>
> The problem is existing implementations do not
> match the capabilities described in RFC 5667.
>
>
> > > note that:
> >
> > interesting/depressing list deleted
>
> Not depressing. Basically this list means that the
> world has survived adequately with incomplete
> implementations of RPC-over-RDMA and NFSv4 for quite
> some time. There has been no need to support multiple
> NFSv4 READ and WRITE operations in a COMPOUND on RDMA
> or on any other transport.
>
> The issue is whether the WG believes rectifying that
> situation is an immediate need, or something that
> would be nice to allow for the future, or something
> that is not necessary for the remaining future of
> RPC-over-RDMA Version One.
>
>
> > > This could be an argument to, in addition, remove a
> > > large piece of text that discusses multiple Write chunks.
> >
> > Perhaps it could.
> >
> > > However, it might be interesting to some to leave
> > > some flexibility for future use.
> >
> > Do you mean future use within Version One?  Or we
> > could limit Version One to what it is now, in the expectation
> > that it will not exist for very long.
>
> Again I am at odds with your view. Version One
> has plenty of capability to last for quite a
> while, and Version Two is years away from
> appearing in storage products in a robust form.
>
>
> > Then we
> > could use base Version Two to implement what
> > Version One was supposed to be and get a small set
> > of performance goodies at the same time.
> >
> > If we don't want to write off Version One in that way,
> > we need a reliable means of distinguishing servers
> > with full COMPOUND support  from those that you
> > describe.
>
> > For example, when you say a server only supports a single READ
> > operation, what does it do if it gets a COMPOUND with more than
> > one?  We could have something to work with if:
> >       • All such servers did the same thing.
> >       • It didn't result in disconnection or memory corruption.
> > I thought about an extension adding a full_rdma_compound_support
> > attribute but that doesn't work for V4.0.
> >
> > BTW, do any of these old-fashioned servers with these bugs recognize
> > and report DDP-eligibility violations?  If not, this could be a foolproof
> > way to distinguish servers that may have these implementation gaps from
> > newer ones that should have full COMPOUND support for RDMA.
>
> Given the experience we had trying to detect
> the completeness of protocol features with
> RPC-over-RDMA, I really don't relish going
> down that path.
>
> As far as I am aware, no client
> sends such COMPOUNDs. There is some support in
> the Linux NFS server to handle such COMPOUNDs,
> but as you might guess, there's been no testing
> of this facility with real clients.
>
>
> > On Mon, Feb 27, 2017 at 11:38 AM, David Noveck <davenoveck@gmail.com> wrote:
> > One of the issues that Chuck has to deal with is the need
> > to make new implementations interoperable with existing
> > implementations.  That has been the source of some new
> > MUSTs.
> >
> > Regarding the use of MUSTs, I think these terms are overused
> > in general and that RFC2119's suggestion that these be used
> > sparingly (which ironically uses a "MUST") is too often ignored,
> > including by the IESG.
> >
> > Regarding your suggestion that 5667bis is using "MUST" too
> > much, a comparison with RFC5667 is instructive.  Including
> > "MUST NOT"s, the RFC2119 term "MUST" is used:
> >       • 22 times in RFC5667, a 10-page spec.
> >       • 14 times in rfc5667bis-06, an 18-page spec.
> > Part of Chuck's advantage here is that he deleted a lot of the
> > duplication in which RFC5667 either repeated what was in
> > RFC5666 specialized to NFS (useless) or contradicted it (which
> > really had to go).
> >
> >
> > On Mon, Feb 27, 2017 at 8:16 AM, Tom Talpey <tom@talpey.com> wrote:
> > On 2/26/2017 3:29 PM, Chuck Lever wrote:
> > On Feb 25, 2017, at 3:54 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > RFC 5667 Section 4 says:
> >
> > Similarly, a single RDMA Read list entry MAY be posted by the client
> > to supply the opaque file data for a WRITE request or the pathname
> > for a SYMLINK request.
> >
> > Part of the problem here is that, as you discuss later, this statement is
> > ambiguous, as the meaning of "read list entry" is not clear.
> >
> > The server MUST ignore any Read list for
> > other NFS procedures,
> >
> > As I understand it, this statement cannot apply to PZRCs, and rfc5666bis
> > has already dealt with that issue.  So, if one tried to maintain this
> > paragraph in something like the RFC5667 form, some modification would have
> > been necessary to avoid essentially preventing any use of PZRCs.
> >
> > as well as additional Read list entries beyond
> > the first in the list.
> >
> > I take "Read list entry" to mean Read chunk, composed of
> > multiple list entries that share the same XDR position.
> > This comports with similar language describing Write
> > chunks where a single list entry is indeed allowed to
> > have multiple segments.
> >
> > Makes sense to me.
> >
> > However, the original intent might have been "single
> > Read segment".
> >
> > It might have been but there is no way to be sure.
> >
> > We can ask Tom Talpey. If he does not recall, then
> > we have no way to be sure.
> >
> > I agree the paragraph in question could have been more clear. I'll
> > hazard a guess that it should have been written as "Read list" instead
> > of "Read list entry", meaning, an entire scatter list is provided.
> > This would certainly match the semantic for the result of an ordinary
> > NFS Read.
> >
> > I will also observe that the statement is a MAY. That is, it prescribes
> > no behavior, and offers a choice to the implementer. It does not rule
> > out the option of posting a list.
> >
> > I think you guys need to stop worrying about writing these "rules"
> > down so literally. The only goal of RFC5667 was to isolate the tidbits
> > of NFS behaviors separate from the core rpcrdma transport. The
> > document makes relatively few MUST requirements.
> >
> > Tom.
> >
> > _______________________________________________
> > nfsv4 mailing list
> > nfsv4@ietf.org
> > https://www.ietf.org/mailman/listinfo/nfsv4
> >
> >
>
> --
> Chuck Lever
>
>
>
>