Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-06

David Noveck <davenoveck@gmail.com> Mon, 27 February 2017 00:54 UTC

In-Reply-To: <5538FD5E-A71B-4F91-AC3A-CBD2F54AF9E3@oracle.com>
References: <CADaq8je8zfRN5R11LxJw=0st-u-XOoKosGbZDBajOTiChzpS5Q@mail.gmail.com> <93F476D6-57F8-44AB-94C9-545608396F51@oracle.com> <CADaq8jcJ3WkpmPJVVec5aJc0ekKgdHPUok=S5_ofGVJnbqrrjA@mail.gmail.com> <5538FD5E-A71B-4F91-AC3A-CBD2F54AF9E3@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Sun, 26 Feb 2017 19:54:38 -0500
Message-ID: <CADaq8jeX4LiFeeq7ruPKQgwBhBHze6=bfvUOzRMcriLm-Zrn+Q@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/_Skz1lgDB_Ee5uZbQtsUFK5F1_g>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Review of draft-ietf-nfsv4-rfc5667bis-06

>  As far as I know the extant implementations do
> not support this construction, and we have to
> preserve interoperation with them.

I can see that the relevant factor is the need
to preserve interoperability, which is just as well,
since, whenever we try to interpret what RFC5667
really meant, my head starts to hurt.

If something is forbidden because of interoperability
concerns, the spec would be clearer if it said so
explicitly.

> That reasoning could also apply in the NFSv4
> case, where it is almost a certainty that typical
> NFSv4 WRITE requests can be larger than the
> inline threshold.

Perhaps so, but note that Section 5.4.1 of -06 says:

Therefore, an NFS version 4 client MAY send a reduced Payload stream
in a Long Call.  An NFS version 4 client MAY enable an NFS version 4
server to send a reduced Payload stream in a Long Reply.

This wasn't in -05. It doesn't seem to be needed, but if it is wrong,
simply deleting it would not be enough. You might have to put in
something forbidding it, as is done in the legacy case.
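
To make the construction under discussion concrete, here is a minimal
sketch of how a receiver might group Read list entries into Read chunks
and recognize a reduced Payload stream in a Long Call. It assumes the
"segments sharing an XDR position form one Read chunk" reading discussed
below. The names ReadSegment, group_into_chunks, is_reduced_long_call and
legacy_single_chunk_ok are hypothetical, chosen only for illustration;
they come from neither the draft nor any implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReadSegment:
    """One Read list entry (field names are illustrative)."""
    position: int  # XDR position in the Payload stream; 0 marks a PZRC
    handle: int    # memory handle advertised by the requester
    length: int    # length of the registered region in bytes
    offset: int    # offset of the registered region


def group_into_chunks(read_list: List[ReadSegment]) -> Dict[int, List[ReadSegment]]:
    """Group segments that share an XDR position into one Read chunk."""
    chunks: Dict[int, List[ReadSegment]] = {}
    for seg in read_list:
        chunks.setdefault(seg.position, []).append(seg)
    return chunks


def is_reduced_long_call(read_list: List[ReadSegment]) -> bool:
    """True if a PZRC and a non-zero position Read chunk appear together."""
    positions = group_into_chunks(read_list).keys()
    return 0 in positions and any(p != 0 for p in positions)


def legacy_single_chunk_ok(read_list: List[ReadSegment]) -> bool:
    """Assumed legacy rule: at most one Read chunk, possibly multi-segment."""
    return len(group_into_chunks(read_list)) <= 1
```

Under this reading, a WRITE payload split across several segments at the
same position still counts as a single Read chunk, while a PZRC combined
with a non-zero position chunk counts as two.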




On Sun, Feb 26, 2017 at 3:29 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Feb 25, 2017, at 3:54 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > RFC 5667 Section 4 says:
> >
> > > > Similarly, a single RDMA Read list entry MAY be posted by the client
> > > > to supply the opaque file data for a WRITE request or the pathname
> > > > for a SYMLINK request.
> >
> > Part of the problem here is that, as you discuss later, this statement is
> > ambiguous, as the meaning of "read list entry" is not clear.
> >
> > > > The server MUST ignore any Read list for
> > > > other NFS procedures,
> >
> > As I understand it, this statement cannot apply to PZRCs, and rfc5666bis
> > has already dealt with that issue.  So, if one tried to maintain this
> > paragraph, in something like the RFC5667-form, some modification would
> > have been necessary to avoid essentially preventing any use of PZRCs.
> >
> > > > as well as additional Read list entries beyond
> > > > the first in the list.
> >
> > > I take "Read list entry" to mean Read chunk, composed of
> > > multiple list entries that share the same XDR position.
> > > This comports with similar language describing Write
> > > chunks where a single list entry is indeed allowed to
> > > have multiple segments.
> >
> > Makes sense to me.
> >
> > > However, the original intent might have been "single
> > > Read segment".
> >
> > It might have been but there is no way to be sure.
>
> We can ask Tom Talpey. If he does not recall, then
> we have no way to be sure.
>
>
> > In some
> > cases, one might look to how existing implementations behave,
> > to come up with an interpretation even though it might not
> > have been what the author intended.
> >
> > > In that case, this language clearly states
> > > that Legacy clients can send only one contiguous memory
> > > region
> >
> > I think you mean that, if that were the case, this language would
> > mean that.
> >
> > > in Legacy NFS READ and SYMLINK requests.
> >
> > I think we are talking about WRITE and SYMLINK, given
> > that read chunks are mentioned.
> >
> > While I can easily believe that existing implementations
> > limit SYMLINKs to a single contiguous buffer, I have trouble
> > believing the same about WRITEs.
>
> All server implementations I'm aware of allow WRITE
> payloads with multiple segments. That suggests the
> former interpretation is correct.
>
>
> > > At your behest I replaced the RFC 5667 language with
> > > just the definitions of which XDR data items are DDP
> > > eligible, and removed the discussion about Read list
> > > entries.
> >
> > One benefit of that is that we now know what the spec says, whereas,
> > with the section 4 text you quoted, we did not.
> >
> > > This new language permits a Legacy client to reduce the
> > > Payload stream of an NFS READ or SYMLINK request and put
> > > that stream into a Position-Zero Read chunk. The client
> > > would then send the PZRC and the non-zero position chunk:
> > > two distinct Read chunks, which is not allowed by the
> > > RFC 5667 language discussing NFS READ or SYMLINK.
> >
> > It is not clear that section 4 would disallow that.  Beyond the
> > uncertainty about what "read list entry" means and a possible need
> > to treat PZRCs separately as done in rfc5666bis, the issue is that
> > you are only supposed to ignore "additional Read list entries beyond
> > the first in the list."  To me, this would allow a PZRC, if it is the
> > first in the list, which it might well be.
>
> A PZRC is allowed. A non-zero position Read chunk is
> allowed. What might not be allowed is both at the
> same time.
>
>
> > Anyway, your view and my view of this obscure text don't really
> > count.  We have to work off the behavior of implementations.  If existing
> > implementations did or didn't do something because of this presumed
> > restriction, that would provide an answer.  Unfortunately, I don't think
> > this provides an answer either.
>
> At least Linux does not currently support a reduced
> Payload stream in a PZRC. Solaris is not likely to
> either.
>
> Either the original implementers interpreted the
> original specification as not allowing this
> construction, or so far there hasn't been a need
> to support it.
>
>
> > > To remain interoperable with RFC 5667-compliant
> > > implementations, only one Read chunk is allowed, which
> > > excludes the use of reduced Payload streams in a PZRC,
> > > even though RPC-over-RDMA Version One allows this
> > > construction.
> >
> > True, but it is not clear to me how this situation could ever
> > arise given:
> >       • The XDR of legacy NFS protocols
> >       • The definition of what is DDP-eligible in the ULB
>
> The situation MAY arise because it is not specifically
> excluded by the new language.
>
> Unless the Transport header is exceptionally large
> or there is exceptional authentication material,
> a reduced NFS WRITE or SYMLINK can always fit inline.
>
> However, a client MAY still choose to reduce the
> Payload stream and send it in a PZRC, even if it
> could be sent inline. There is nothing that says
> an implementation cannot use RDMA_NOMSG for all
> transactions.
>
> The point here is that if an NFSv3 WRITE is large,
> there are only two legal ways to send it:
>
>  - Reduce the payload into a Read chunk and send
> the rest inline
>
>  - Send the whole request in a PZRC (RDMA_NOMSG)
>
>
> > > There isn't a need to support sending a reduced Payload
> > > stream this way in these NFS versions, so I added the new
> > > paragraph to forbid it explicitly.
> >
> > If there is no way for this situation to arise, there is no need to
> > forbid it explicitly.
>
> There is a need to forbid it explicitly:
>
> - The new language definitely does not forbid this
> construction.
>
> - It is possible a client might use this
> construction for reasons we can't predict.
>
> - As far as I know the extant implementations do
> not support this construction, and we have to
> preserve interoperation with them.
>
> That reasoning could also apply in the NFSv4
> case, where it is almost a certainty that typical
> NFSv4 WRITE requests can be larger than the
> inline threshold.
>
> --
> Chuck Lever
>
>
>
>
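
For reference, the two legal ways to send a large NFSv3 WRITE that Chuck
lists above can be sketched as a small decision function. This is only an
illustration under simplifying assumptions: all lengths are in bytes,
non_data_len stands for everything in the message except the opaque file
data, and the names CallForm and choose_legacy_write_form are
hypothetical. As noted above, a client MAY also choose RDMA_NOMSG even
when the request would fit inline; the sketch simply prefers the smallest
form that fits.

```python
from enum import Enum


class CallForm(Enum):
    INLINE = "RDMA_MSG, no chunks"
    REDUCED = "RDMA_MSG, file data moved into one Read chunk"
    LONG_CALL = "RDMA_NOMSG, whole request in a PZRC"


def choose_legacy_write_form(non_data_len: int,
                             write_data_len: int,
                             inline_threshold: int) -> CallForm:
    """Pick a call form for a legacy (NFSv2/NFSv3) WRITE.

    The combination of a PZRC with a non-zero position Read chunk is
    never returned, matching the restriction discussed in this thread.
    """
    if non_data_len + write_data_len <= inline_threshold:
        return CallForm.INLINE
    if non_data_len <= inline_threshold:
        # Reduce the payload into a Read chunk; send the rest inline.
        return CallForm.REDUCED
    # Even the reduced request is too large: send everything in a PZRC.
    return CallForm.LONG_CALL
```

For example, with a 1024-byte inline threshold, a 64 KB WRITE with 200
bytes of non-data content would be sent in the reduced form.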