Re: [nfsv4] Server-side copy question

David Noveck <davenoveck@gmail.com> Thu, 23 February 2017 21:46 UTC

Return-Path: <davenoveck@gmail.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 65AB8129AE2 for <nfsv4@ietfa.amsl.com>; Thu, 23 Feb 2017 13:46:40 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.998
X-Spam-Level:
X-Spam-Status: No, score=-1.998 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BDInxWyQWmoN for <nfsv4@ietfa.amsl.com>; Thu, 23 Feb 2017 13:46:38 -0800 (PST)
Received: from mail-ot0-x230.google.com (mail-ot0-x230.google.com [IPv6:2607:f8b0:4003:c0f::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 866C1129B0D for <nfsv4@ietf.org>; Thu, 23 Feb 2017 13:46:34 -0800 (PST)
Received: by mail-ot0-x230.google.com with SMTP id j38so2958689otb.3 for <nfsv4@ietf.org>; Thu, 23 Feb 2017 13:46:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=Wre3GjMhx6oF60lEceQRW0A4+1r9fbmmlcMOx4hWG6Q=; b=TlUwnnBFnTYUFfLIqz/aswGK3+U9S1pf54kV5ylbAI9Z/kKkDQqIMRoYYWOh15Uwpl 56TVE4CGpc6P5Co7LIrw4GvNvjCwPqnSFNu1dZq/qjEyUTfi+Z7xwGkCrEdV1wp9+wPH yeZp4coQ/AoQyLUm9PZbSuSKHirQBVQ93NGLgSMKhyt5bpETUEyA2AuNOqjIysn1QDQE 36PFpabjFent7feTvmYfpivizVJ5a7cNWddoeKh1tMTUCEZYWMvVKxvuBRaB4Tuqxvs9 qe2KEDuWCuhT28MLYX/SvloFBjUbb4U1BXOy8zx2Hsyj5mHsx2QSWELQp0dd2JCxTokO DWsg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=Wre3GjMhx6oF60lEceQRW0A4+1r9fbmmlcMOx4hWG6Q=; b=gxjlQ3dIfEHk5CHaGzZ1hXnU/zzpL9f3iPCY+k3IpvpuSxstv/u2R/wtUa1KmAmneA zLY5Cy65seg++ndJsiaMPalcfXLIMnjOkqmdQXnM+8iDGtAEtcAJ5Aiq/bTYq1mqFay3 htwtq2O/mf0qV0Vh7QWP2aoWl17uUpj2+UtYnzl7D0eauUO+hCORlwqtEtoayBfGhf7H VosPagUFFLjP/3rXAbwYTozLrMjTZ4SCGd91YF3wXDLyZijjnjJZjLWAXy2bQPvInku2 tHXJ4qB0fXWf3dKpELPF0YZF3E4OB97NLSUyggI5X/26XjpynmmXK9JpmoRxCm0ZDDrc uTjg==
X-Gm-Message-State: AMke39mvdEbjEAeHJEVNW4I7I3BLiYnRidYiD8XFCwgAf6o8z1APD9nbMH9f0Hva1mAuVcmTHJbEi3Y/0nsqGg==
X-Received: by 10.157.60.241 with SMTP id t46mr23794320otf.224.1487886393717; Thu, 23 Feb 2017 13:46:33 -0800 (PST)
MIME-Version: 1.0
Received: by 10.182.137.200 with HTTP; Thu, 23 Feb 2017 13:46:33 -0800 (PST)
In-Reply-To: <20170223150306.GA11882@fieldses.org>
References: <B65A07BE-E379-4507-A9B8-6927DF61A0A5@netapp.com> <CADaq8jdhK2NbJrAa0UHacvBS1w0ucoVpc1LJA3=mxH+_iBfMPQ@mail.gmail.com> <20170217195149.GH10894@fieldses.org> <CADaq8jc4XRaFSXz7mberFi0Dr-f+FaQcZFi=gKn9OjUifrNSyw@mail.gmail.com> <3889A2C2-9261-4809-92F9-9CF2F00A894D@gmail.com> <CADaq8jeo_r8jqj_WK3khX38LvwE0bG+5xdEwmkAwq-N6pCTeAg@mail.gmail.com> <CAABAsM6z59o91jv4rqq3-g1+6Lx4LAxOUr02Vd9frN6Mqw+vcQ@mail.gmail.com> <CADaq8jf5S55eBmuu7v+CAVFd+yZeSsU+aKwyL1F5rtr_xvnkMA@mail.gmail.com> <20170223150306.GA11882@fieldses.org>
From: David Noveck <davenoveck@gmail.com>
Date: Thu, 23 Feb 2017 16:46:33 -0500
Message-ID: <CADaq8jc6vWdXonTs3795QWO8SWJTudo=x9=gR4uKN+tnfBk+dw@mail.gmail.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Content-Type: multipart/alternative; boundary="94eb2c1924e4f6bcc30549398bda"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/rcGJFnkId0bVbzoTzsI7nEPJZlM>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Server-side copy question
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Feb 2017 21:46:40 -0000

> I think what we really want is to change the MUST to something weaker
> (MAY?).

I think we should stay away from the question of what we really want,
since everybody's preferences are different.

Fortunately, we don't need agreement/consensus on that.  Anyone,
can submit an errata.  All you need is a suggested corrected text that
people will not object strongly to and will regard as an improvement.
Given how broken the existing text is, that is a pretty low bar.

I have already suggested using MAY so I'm on board.  Trond
had some objections but if he can't come up with something better,
I don't think he'll object to your approach.

> Maybe it doesn't work, but I the intention was to allow COPY to be
> implemented by either a copy-like or clone-like backend.

That's fine but the question is going to be about how such a client
deals with server's who manifest behavior A, mandated by RFC 7862
and allowed by the corrected text we are discussing.

> So, my only worry would be to avoid clients that are unprepared to deal
> with a short read.  Thankfully I think short reads are already allowed
> for other reasons, so I think the risk of a major interoperability
> problem here is low?

Given that this only occurs in the range case, which is quite rarely used,
the chance of a "major" interoperability problem is not low or
infinitesimal.
It is zero.

As far significant interoperability problems. we already have the one
pointed out by Jorge, which we know exists, and the chance of any
other hypothetical problem has to be lower than that.

If you want this change, go for  it and submit the errata.  I don't expect
opposition from within the working group but this is not going to be a
walkover.

On Thu, Feb 23, 2017 at 10:03 AM, J. Bruce Fields <bfields@fieldses.org>
wrote:

> On Thu, Feb 23, 2017 at 08:00:09AM -0500, David Noveck wrote:
> > > The real problem is not so much that the size can change between a
> client
> > check and COPY. It is that, unlike CLONE, the COPY operation itself is
> not
> > required to be
> > > atomic by the protocol.
> >
> > And while the protocol could define it as atomic, there isn't a lot of
> > point in doing that.  There are not many servers that could implement it
> > that way, and a non-atomic copy that avoids a lot of OTW traffic is
> better
> > than a non-atomic copy that wastes network bandwidth.  In any case, if
> you
> > don't want the file changed while it is copied, you can open it with
> > DENY=WRITE before doing the COPY.
> >
> > Let's get back to the issue of the erroneous specification of COPY
> > semantics in RFC 7862.
> >
> > Let's call the behavior currently specified as a MUST in ETC 7862,
> > returning INVAL if the area to be copied spans the file's EOF, Behavior
> A.
> >
> > The more sensible behavior, allowing the copy to proceed, terminating the
> > range to be copied at the EOF, will be called behavior B.
> >
> > I don't have any doubt that RFC 7862 is, as Bruce says, "just wrong" and
> > that behavior B is the right choice.  The question is what to do about
> the
> > issue.
>
> I think what we really want is to change the MUST to something weaker
> (MAY?).
>
> Maybe it doesn't work, but I the intention was to allow COPY to be
> implemented by either a copy-like or clone-like backend.
>
> > Apart from the question of any possible errata one might or might not
> > submit, the important point is that servers that manifest behavior A
> > actually exist and will continue to exist for some time.
> >
> > In light of this fact, clients might need to adapt to the situation that
> > such servers exist.  Whether they do so, depends on how serious they
> > consider the issue that Jorge has brought to our attention.  I can see
> some
> > clients ignoring this issue since copying a range, as opposed to a whole
> > file, is an unusual case.
> >
> > Despite that, it is reasonable for a client to deal with the possibility
> of
> > behavior-A servers by checking for a copy ending point past EOF and
> > shortening the range to be copied.  Behavior-B servers would not be
> > affected.
> >
> > With regard to Bruce's concerns about potential bad consequences
> resulting
> > from changes in EOF after the check and before the copy, I haven't been
> > able to come up with a case where anything bad happens. You can't make
> the
> > check-copy combination atomic, because copy itself isn't atomic.
> >
> > With regard to Trond's suggestion to use VERIFY, it does narrow the
> > window but the fundamental problem of the non-atomicity of COPY remains.
> > Also, I'd be reluctant to implement things this way because it is hard to
> > figure out what to do if the VERIFY fails.  You may need to retry for a
> > while and then try to do the copy using READs and WRITEs.  You wind up
> with
> > a big bunch of code that is almost never tested.  I wouldn't want to go
> > there.
> >
> > So this brings up back to the errata question.  The goal of an errata
> would
> > be to prevent more behavior-A servers from being written, since we can't
> be
> > sure that any that exist are fixed/decommissioned.
>
> So, my only worry would be to avoid clients that are unprepared to deal
> with a short read.  Thankfully I think short reads are already allowed
> for other reasons, so I think the risk of a major interoperability
> problem here is low?
>
> --b.
>
> >
> > The simplest and most obvious errata is one that simply switches from
> > mandating behavior A to mandating behavior B
> >
> >
> > I am worried about the time/difficulty of getting such an errata
> approved.
> > The existing text, by using "MUST" really says two things:
> >
> >    1. Servers have to implement behavior A.
> >    2. Clients depend on behavior A.
> >
> > We know that these two statements are wrong, and, in other situations,
> the
> > editor  (Tom) would quickly issue a correction.
> >
> > I don't know the details of the errata process but I'm pretty sure it
> will
> > be nothing like the editor simply considering the matter and making the
> > necessary correction.  It will involve one or more cycles of  review,
> > inquiry, and reassurance, to make sure that existing clients and servers
> > will not be adversely affected.  As a result we will have to tell people:
> >
> >
> > a) That all (or most) existing servers will be considered non-compliant
> >
> > b) That the clients who have been interacting successfully with
> behavior-A
> > servers will have no problems interacting with behavior-B servers,
> despite
> > statement 2 above.
> >
> > Given that, approval should not be considered a sure thing and is not
> going
> > to happen quickly.
> >
> > Other possible errata texts have been proposed but they don't really
> > address the issue of preventing new Behavior-A servers from being
> created.
> >
> > Given the potential difficulties of the errata process and the fact that
> > behavior-A servers will continue to exist for a while no matter what
> > happens, I think the focus has to be on client changes to work around
> > behavior A, if this issue is important enough to do anything about.
> >
> >
> >
> >
> >
> >
> > On Feb 18, 2017 11:23 PM, "Trond Myklebust" <trondmy@gmail.com> wrote:
> >
> >
> >
> > On 18 February 2017 at 15:57, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > > Could we please make that message to implementors stronger, but
> saying “the server SHOULD limit the are to be copied to reflect the size of
> the source file, but it MAY fail the operation with NFS4ERR_INVAL”?
> > >
> > > I think we can do something in this regard but your text is not in
> accord with RFC2119, although I recall seeing things like this in other
> documents.  SHOULD means a client is allowed to do something else under
> some fairly restrictive conditions while MAY says he can choose to do
> something simply because he wants to.  While it is reasonable to say you
> "should do X by may do Y", saying you "SHOULD do X but MAY do Y" is almost
> always self-contradictory.
> > >
> > > How about leaving the proposed text and adding the following sentence?
> > >
> > > The latter is generally preferable since the former might  force
> clients implementing primitives for copying byte ranges to check the size
> of the file before issuing the copy, which in turn raises issues because
> such a check and the copy are done at different times and the size might
> change between the two operations.
> > >
> > >
> > The real problem is not so much that the size can change between a client
> > check and COPY. It is that, unlike CLONE, the COPY operation itself is
> not
> > required to be atomic by the protocol. That means that most
> implementations
> > will have to accept one of either two options:
> >
> >    1. The server on which the source file resides must somehow cooperate
> to
> >    prevent truncation of the file while the copy is in progress. Note
> that
> >    since the client MAY be holding a file byte range or share lock, it is
> >    impossible for either one of the servers to make use of file locks.
> >    2. Accept the inevitability that a truncate on the source may cut the
> >    COPY short at any time. i.e. that the TOCTOU race condition lasts for
> the
> >    entire duration of the (synchronous or asynchronous) copy.
> >
> > Note also that if the _client_ wants to enforce this EINVAL semantic, it
> > can easily throw in a VERIFY operation in order to check the file length
> > before attempting the COPY operation. Granted that is also non-atomic,
> but
> > that's about as good as it gets here.
> >
> > On Sat, Feb 18, 2017 at 1:29 PM, Trond Myklebust <trondmy@gmail.com>
> wrote:
> > >
> > >>
> > >> So here is my proposed corrected text:
> > >>
> > >> When the source offset is greater than the size of the source file,
> the
> > >> error
> > >>
> > >> is reported by failing the operation with NFS4ERR_INVAL.  Otherwise,
> If
> > >>
> > >> the source offset plus count is greater than the size of the source
> > >> file, the
> > >>
> > >> server MAY fail the operation with NFS4ERR_INVAL.or limit the area to
> > >>
> > >> be copied to reflect the size of the source file.
> > >>
> > >> Could we please make that message to implementors stronger, but saying
> > >> “the server SHOULD limit the are to be copied to reflect the size of
> the
> > >> source file, but it MAY fail the operation with NFS4ERR_INVAL”?
> > >>
> > >> Cheers
> > >>   Trond
> > >>
> > >>
> > >>
> > >> On Feb 18, 2017, at 07:45, David Noveck <davenoveck@gmail.com> wrote:
> > >>
> > >> > > I agree but think that the most expedient way of addressing this
> > >> problem is
> > >> > > in the Linux client,  It can check against the file size before
> > >> issuing the
> > >> > > COPY request  to prevent an erroneous INVAL error being returned.
> > >>
> > >> > That doesn't work, due to races with concurrent modifications to the
> > >> > file.
> > >>
> > >> I'm not sure whether it works or not.  I can't find an obvious
> breakage
> > >> but I can't prove that none exists.
> > >>
> > >> In any case, we can't consider this an easy fix for the issue.
> > >>
> > >> > The server itself would seem susceptible to such races: if it
> implements
> > >> > the copy as a read-write loop then blocking concurrent truncates
> > >> > probably isn't reasonable, so its only choice is a short copy.  (Too
> > >> > late to return INVAL once you've written to the destination....).
> > >>
> > >> That's an implementation problem that would exist no matter
> > >> what we do to the spec text under discussion.
> > >>
> > >> > I really think the spec's just wrong here.'
> > >>
> > >> I agree, as does Tom and Jorge as well I suppose.   Nobody has said it
> > >> was a good choice.
> > >>
> > >> The question is what to do to fix it.  This is particularly difficult
> > >> given
> > >> the misuse of "MUST".  The problem I'm worried about is that
> > >> the existing approach is presented as required for interoperability.
> > >> We know of cases where that approach has been followed and
> > >> the problem is that the existing spec essentially says that clients
> > >> might break if servers do something else. :-(
> > >>
> > >> We know that isn't true but sombody (Spencer D?) is going to have
> > >> to convince people in the IESG of that.
> > >>
> > >> So I think the focus of any correction has to be on the fact that the
> > >> "MUST"
> > >> is wrong rather than on the fact (also true) that the wrong behavior
> was
> > >> chosen.
> > >>
> > >> So here is my proposed corrected text:
> > >>
> > >> When the source offset is greater than the size of the source file,
> the
> > >> error
> > >>
> > >> is reported by failing the operation with NFS4ERR_INVAL.  Otherwise,
> If
> > >>
> > >> the source offset plus count is greater than the size of the source
> > >> file, the
> > >>
> > >> server MAY fail the operation with NFS4ERR_INVAL.or limit the area to
> > >>
> > >> be copied to reflect the size of the source file.
> > >>
> > >>
> > >>
> > >>
> > >> On Fri, Feb 17, 2017 at 2:51 PM, J. Bruce Fields <
> bfields@fieldses.org>
> > >> wrote:
> > >>
> > >>> On Thu, Feb 16, 2017 at 08:00:36AM -0500, David Noveck wrote:
> > >>> > I agree but think that the most expedient way of addressing this
> > >>> problem is
> > >>> > in the Linux client,  It can check against the file size before
> > >>> issuing the
> > >>> > COPY request  to prevent an erroneous INVAL error being returned.
> > >>>
> > >>> That doesn't work, due to races with concurrent modifications to the
> > >>> file.
> > >>>
> > >>> The server itself would seem succeptible to such races: if it
> implements
> > >>> the copy as a read-write loop then blocking concurrent truncates
> > >>> probably isn't reasonable, so its only choice is a short copy.  (Too
> > >>> late to return INVAL once you've written to the destination....).
> > >>>
> > >>> I really think the spec's just wrong here.
> > >>>
> > >>> --b.
> > >>>
> > >>
> > >> _______________________________________________
> > >> nfsv4 mailing list
> > >> nfsv4@ietf.org
> > >> https://www.ietf.org/mailman/listinfo/nfsv4
> > >>
> > >>
> > >>
> > >
>