Re: [nfsv4] Server-side copy question

"J. Bruce Fields" <bfields@fieldses.org> Thu, 23 February 2017 15:03 UTC

Return-Path: <bfields@fieldses.org>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id EFD6B129893 for <nfsv4@ietfa.amsl.com>; Thu, 23 Feb 2017 07:03:39 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.902
X-Spam-Level:
X-Spam-Status: No, score=-1.902 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RP_MATCHES_RCVD=-0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id aZdwXNvCqOFc for <nfsv4@ietfa.amsl.com>; Thu, 23 Feb 2017 07:03:37 -0800 (PST)
Received: from fieldses.org (fieldses.org [173.255.197.46]) by ietfa.amsl.com (Postfix) with ESMTP id 711B91298C6 for <nfsv4@ietf.org>; Thu, 23 Feb 2017 07:03:37 -0800 (PST)
Received: by fieldses.org (Postfix, from userid 2815) id CE0BF3ED; Thu, 23 Feb 2017 10:03:06 -0500 (EST)
Date: Thu, 23 Feb 2017 10:03:06 -0500
From: "J. Bruce Fields" <bfields@fieldses.org>
To: David Noveck <davenoveck@gmail.com>
Message-ID: <20170223150306.GA11882@fieldses.org>
References: <B65A07BE-E379-4507-A9B8-6927DF61A0A5@netapp.com> <CADaq8jdhK2NbJrAa0UHacvBS1w0ucoVpc1LJA3=mxH+_iBfMPQ@mail.gmail.com> <20170217195149.GH10894@fieldses.org> <CADaq8jc4XRaFSXz7mberFi0Dr-f+FaQcZFi=gKn9OjUifrNSyw@mail.gmail.com> <3889A2C2-9261-4809-92F9-9CF2F00A894D@gmail.com> <CADaq8jeo_r8jqj_WK3khX38LvwE0bG+5xdEwmkAwq-N6pCTeAg@mail.gmail.com> <CAABAsM6z59o91jv4rqq3-g1+6Lx4LAxOUr02Vd9frN6Mqw+vcQ@mail.gmail.com> <CADaq8jf5S55eBmuu7v+CAVFd+yZeSsU+aKwyL1F5rtr_xvnkMA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CADaq8jf5S55eBmuu7v+CAVFd+yZeSsU+aKwyL1F5rtr_xvnkMA@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/wciCk0diwtV0jLw6QFwXU_z_EUQ>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Server-side copy question
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Feb 2017 15:03:40 -0000

On Thu, Feb 23, 2017 at 08:00:09AM -0500, David Noveck wrote:
> > The real problem is not so much that the size can change between a client
> check and COPY. It is that, unlike CLONE, the COPY operation itself is not
> required to be
> > atomic by the protocol.
> 
> And while the protocol could define it as atomic, there isn't a lot of
> point in doing that.  There are not many servers that could implement it
> that way, and a non-atomic copy that avoids a lot of OTW traffic is better
> than a non-atomic copy that wastes network bandwidth.  In any case, if you
> don't want the file changed while it is copied, you can open it with
> DENY=WRITE before doing the COPY.
> 
> Let's get back to the issue of the erroneous specification of COPY
> semantics in RFC 7862.
> 
> Let's call the behavior currently specified as a MUST in ETC 7862,
> returning INVAL if the area to be copied spans the file's EOF, Behavior A.
> 
> The more sensible behavior, allowing the copy to proceed, terminating the
> range to be copied at the EOF, will be called behavior B.
> 
> I don't have any doubt that RFC 7862 is, as Bruce says, "just wrong" and
> that behavior B is the right choice.  The question is what to do about the
> issue.

I think what we really want is to change the MUST to something weaker
(MAY?).

Maybe it doesn't work, but I the intention was to allow COPY to be
implemented by either a copy-like or clone-like backend.

> Apart from the question of any possible errata one might or might not
> submit, the important point is that servers that manifest behavior A
> actually exist and will continue to exist for some time.
> 
> In light of this fact, clients might need to adapt to the situation that
> such servers exist.  Whether they do so, depends on how serious they
> consider the issue that Jorge has brought to our attention.  I can see some
> clients ignoring this issue since copying a range, as opposed to a whole
> file, is an unusual case.
> 
> Despite that, it is reasonable for a client to deal with the possibility of
> behavior-A servers by checking for a copy ending point past EOF and
> shortening the range to be copied.  Behavior-B servers would not be
> affected.
> 
> With regard to Bruce's concerns about potential bad consequences resulting
> from changes in EOF after the check and before the copy, I haven't been
> able to come up with a case where anything bad happens. You can't make the
> check-copy combination atomic, because copy itself isn't atomic.
> 
> With regard to Trond's suggestion to use VERIFY, it does narrow the
> window but the fundamental problem of the non-atomicity of COPY remains.
> Also, I'd be reluctant to implement things this way because it is hard to
> figure out what to do if the VERIFY fails.  You may need to retry for a
> while and then try to do the copy using READs and WRITEs.  You wind up with
> a big bunch of code that is almost never tested.  I wouldn't want to go
> there.
> 
> So this brings up back to the errata question.  The goal of an errata would
> be to prevent more behavior-A servers from being written, since we can't be
> sure that any that exist are fixed/decommissioned.

So, my only worry would be to avoid clients that are unprepared to deal
with a short read.  Thankfully I think short reads are already allowed
for other reasons, so I think the risk of a major interoperability
problem here is low?

--b.

> 
> The simplest and most obvious errata is one that simply switches from
> mandating behavior A to mandating behavior B
> 
> 
> I am worried about the time/difficulty of getting such an errata approved.
> The existing text, by using "MUST" really says two things:
> 
>    1. Servers have to implement behavior A.
>    2. Clients depend on behavior A.
> 
> We know that these two statements are wrong, and, in other situations, the
> editor  (Tom) would quickly issue a correction.
> 
> I don't know the details of the errata process but I'm pretty sure it will
> be nothing like the editor simply considering the matter and making the
> necessary correction.  It will involve one or more cycles of  review,
> inquiry, and reassurance, to make sure that existing clients and servers
> will not be adversely affected.  As a result we will have to tell people:
> 
> 
> a) That all (or most) existing servers will be considered non-compliant
> 
> b) That the clients who have been interacting successfully with behavior-A
> servers will have no problems interacting with behavior-B servers, despite
> statement 2 above.
> 
> Given that, approval should not be considered a sure thing and is not going
> to happen quickly.
> 
> Other possible errata texts have been proposed but they don't really
> address the issue of preventing new Behavior-A servers from being created.
> 
> Given the potential difficulties of the errata process and the fact that
> behavior-A servers will continue to exist for a while no matter what
> happens, I think the focus has to be on client changes to work around
> behavior A, if this issue is important enough to do anything about.
> 
> 
> 
> 
> 
> 
> On Feb 18, 2017 11:23 PM, "Trond Myklebust" <trondmy@gmail.com> wrote:
> 
> 
> 
> On 18 February 2017 at 15:57, David Noveck <davenoveck@gmail.com> wrote:
> 
> > > Could we please make that message to implementors stronger, but saying “the server SHOULD limit the are to be copied to reflect the size of the source file, but it MAY fail the operation with NFS4ERR_INVAL”?
> >
> > I think we can do something in this regard but your text is not in accord with RFC2119, although I recall seeing things like this in other documents.  SHOULD means a client is allowed to do something else under some fairly restrictive conditions while MAY says he can choose to do something simply because he wants to.  While it is reasonable to say you "should do X by may do Y", saying you "SHOULD do X but MAY do Y" is almost always self-contradictory.
> >
> > How about leaving the proposed text and adding the following sentence?
> >
> > The latter is generally preferable since the former might  force clients implementing primitives for copying byte ranges to check the size of the file before issuing the copy, which in turn raises issues because such a check and the copy are done at different times and the size might change between the two operations.
> >
> >
> The real problem is not so much that the size can change between a client
> check and COPY. It is that, unlike CLONE, the COPY operation itself is not
> required to be atomic by the protocol. That means that most implementations
> will have to accept one of either two options:
> 
>    1. The server on which the source file resides must somehow cooperate to
>    prevent truncation of the file while the copy is in progress. Note that
>    since the client MAY be holding a file byte range or share lock, it is
>    impossible for either one of the servers to make use of file locks.
>    2. Accept the inevitability that a truncate on the source may cut the
>    COPY short at any time. i.e. that the TOCTOU race condition lasts for the
>    entire duration of the (synchronous or asynchronous) copy.
> 
> Note also that if the _client_ wants to enforce this EINVAL semantic, it
> can easily throw in a VERIFY operation in order to check the file length
> before attempting the COPY operation. Granted that is also non-atomic, but
> that's about as good as it gets here.
> 
> On Sat, Feb 18, 2017 at 1:29 PM, Trond Myklebust <trondmy@gmail.com> wrote:
> >
> >>
> >> So here is my proposed corrected text:
> >>
> >> When the source offset is greater than the size of the source file, the
> >> error
> >>
> >> is reported by failing the operation with NFS4ERR_INVAL.  Otherwise, If
> >>
> >> the source offset plus count is greater than the size of the source
> >> file, the
> >>
> >> server MAY fail the operation with NFS4ERR_INVAL.or limit the area to
> >>
> >> be copied to reflect the size of the source file.
> >>
> >> Could we please make that message to implementors stronger, but saying
> >> “the server SHOULD limit the are to be copied to reflect the size of the
> >> source file, but it MAY fail the operation with NFS4ERR_INVAL”?
> >>
> >> Cheers
> >>   Trond
> >>
> >>
> >>
> >> On Feb 18, 2017, at 07:45, David Noveck <davenoveck@gmail.com> wrote:
> >>
> >> > > I agree but think that the most expedient way of addressing this
> >> problem is
> >> > > in the Linux client,  It can check against the file size before
> >> issuing the
> >> > > COPY request  to prevent an erroneous INVAL error being returned.
> >>
> >> > That doesn't work, due to races with concurrent modifications to the
> >> > file.
> >>
> >> I'm not sure whether it works or not.  I can't find an obvious breakage
> >> but I can't prove that none exists.
> >>
> >> In any case, we can't consider this an easy fix for the issue.
> >>
> >> > The server itself would seem susceptible to such races: if it implements
> >> > the copy as a read-write loop then blocking concurrent truncates
> >> > probably isn't reasonable, so its only choice is a short copy.  (Too
> >> > late to return INVAL once you've written to the destination....).
> >>
> >> That's an implementation problem that would exist no matter
> >> what we do to the spec text under discussion.
> >>
> >> > I really think the spec's just wrong here.'
> >>
> >> I agree, as does Tom and Jorge as well I suppose.   Nobody has said it
> >> was a good choice.
> >>
> >> The question is what to do to fix it.  This is particularly difficult
> >> given
> >> the misuse of "MUST".  The problem I'm worried about is that
> >> the existing approach is presented as required for interoperability.
> >> We know of cases where that approach has been followed and
> >> the problem is that the existing spec essentially says that clients
> >> might break if servers do something else. :-(
> >>
> >> We know that isn't true but sombody (Spencer D?) is going to have
> >> to convince people in the IESG of that.
> >>
> >> So I think the focus of any correction has to be on the fact that the
> >> "MUST"
> >> is wrong rather than on the fact (also true) that the wrong behavior was
> >> chosen.
> >>
> >> So here is my proposed corrected text:
> >>
> >> When the source offset is greater than the size of the source file, the
> >> error
> >>
> >> is reported by failing the operation with NFS4ERR_INVAL.  Otherwise, If
> >>
> >> the source offset plus count is greater than the size of the source
> >> file, the
> >>
> >> server MAY fail the operation with NFS4ERR_INVAL.or limit the area to
> >>
> >> be copied to reflect the size of the source file.
> >>
> >>
> >>
> >>
> >> On Fri, Feb 17, 2017 at 2:51 PM, J. Bruce Fields <bfields@fieldses.org>
> >> wrote:
> >>
> >>> On Thu, Feb 16, 2017 at 08:00:36AM -0500, David Noveck wrote:
> >>> > I agree but think that the most expedient way of addressing this
> >>> problem is
> >>> > in the Linux client,  It can check against the file size before
> >>> issuing the
> >>> > COPY request  to prevent an erroneous INVAL error being returned.
> >>>
> >>> That doesn't work, due to races with concurrent modifications to the
> >>> file.
> >>>
> >>> The server itself would seem succeptible to such races: if it implements
> >>> the copy as a read-write loop then blocking concurrent truncates
> >>> probably isn't reasonable, so its only choice is a short copy.  (Too
> >>> late to return INVAL once you've written to the destination....).
> >>>
> >>> I really think the spec's just wrong here.
> >>>
> >>> --b.
> >>>
> >>
> >> _______________________________________________
> >> nfsv4 mailing list
> >> nfsv4@ietf.org
> >> https://www.ietf.org/mailman/listinfo/nfsv4
> >>
> >>
> >>
> >