Re: [nfsv4] Server-side copy question

David Noveck <davenoveck@gmail.com> Thu, 23 February 2017 13:00 UTC

Return-Path: <davenoveck@gmail.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4840A12A196 for <nfsv4@ietfa.amsl.com>; Thu, 23 Feb 2017 05:00:14 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.698
X-Spam-Level:
X-Spam-Status: No, score=-2.698 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id V65PF_RCHdKK for <nfsv4@ietfa.amsl.com>; Thu, 23 Feb 2017 05:00:12 -0800 (PST)
Received: from mail-oi0-x22f.google.com (mail-oi0-x22f.google.com [IPv6:2607:f8b0:4003:c06::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C7717129736 for <nfsv4@ietf.org>; Thu, 23 Feb 2017 05:00:11 -0800 (PST)
Received: by mail-oi0-x22f.google.com with SMTP id 2so16961390oif.0 for <nfsv4@ietf.org>; Thu, 23 Feb 2017 05:00:11 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=iYxS2VC5VNWbIW98WaNK2oPm+FXJGoDQtF33pOnxGfs=; b=QnU+YC54eG1GFpN7h0MbSFkRdSAButicU5PaexLDGzidIwWbf5C7cSIcw4GgZwNqZ3 nKPafGcGY/I+IC2YU2Pm1Y+YqPNcOHuu7JvwwGvH6fgVFu/4qPmOiH6p/tkChKO9Bl/S PJxe2q6zMcQD9/vmw+z87UjB7XzM0NMJ/zFn8d8egDicpNtUqR07PTWmYSx+k82P7KEc Ei2l1KPsyrhyblGogZ5R2C6rzwUaTUa/KCKYPgNbw27NSB1TPhzjEk55YSqzJYou+Ov0 wMO0I7/60BsMmcXw8VvKgln8uVAZG/N0sGGqoQTFTs5E9SWg72v5TvzjC8wi00d66Viv xgMA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=iYxS2VC5VNWbIW98WaNK2oPm+FXJGoDQtF33pOnxGfs=; b=bK1tcNCMY05TaImpkVtqxynk9OR7wHG4nYXzdGGSMA/vvCTP3idOYBKZ9yV/Bktcrk R2diPokxZmAsXrNtcidxHdiCBQtrt0AI3rjjCGf8W9j1ccLoJ4ksKyhIIdQbWr5W51sG T+ErEP0xFBd3YW3XjUy433yO2uFSEhcUIYDOQmOrWmbJx7YfySSJQ67+/E89Hy6FY3ED dVgFeA1a3QbyE1uoq+4Ei2GlM5l8EZtIdwwRg+T3bnJ/5EsMqSwQzeXqUHTwB9jinaYY cMDHOP7Rspoh3JCiJrlNs4nei+KxILtUWhbeWcf69i9aS3AbnqcQx97sZdYzuYS5tzbB bB7Q==
X-Gm-Message-State: AMke39nJRBeDrg94vJn4V3NYNBB6oqvgKNSK4srohAojRaKBgiCjFuNqq6HTU2KA7uDMwxz3RudOWKmCT1Hp8w==
X-Received: by 10.202.169.80 with SMTP id s77mr11080895oie.165.1487854811023; Thu, 23 Feb 2017 05:00:11 -0800 (PST)
MIME-Version: 1.0
Received: by 10.182.137.200 with HTTP; Thu, 23 Feb 2017 05:00:09 -0800 (PST)
In-Reply-To: <CAABAsM6z59o91jv4rqq3-g1+6Lx4LAxOUr02Vd9frN6Mqw+vcQ@mail.gmail.com>
References: <B65A07BE-E379-4507-A9B8-6927DF61A0A5@netapp.com> <CADaq8jdhK2NbJrAa0UHacvBS1w0ucoVpc1LJA3=mxH+_iBfMPQ@mail.gmail.com> <20170217195149.GH10894@fieldses.org> <CADaq8jc4XRaFSXz7mberFi0Dr-f+FaQcZFi=gKn9OjUifrNSyw@mail.gmail.com> <3889A2C2-9261-4809-92F9-9CF2F00A894D@gmail.com> <CADaq8jeo_r8jqj_WK3khX38LvwE0bG+5xdEwmkAwq-N6pCTeAg@mail.gmail.com> <CAABAsM6z59o91jv4rqq3-g1+6Lx4LAxOUr02Vd9frN6Mqw+vcQ@mail.gmail.com>
From: David Noveck <davenoveck@gmail.com>
Date: Thu, 23 Feb 2017 08:00:09 -0500
Message-ID: <CADaq8jf5S55eBmuu7v+CAVFd+yZeSsU+aKwyL1F5rtr_xvnkMA@mail.gmail.com>
To: Trond Myklebust <trondmy@gmail.com>
Content-Type: multipart/alternative; boundary="001a113cd7da7d1cae05493231c5"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/LSZFa3Exbv5_R2xcqcE69jp808w>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] Server-side copy question
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Feb 2017 13:00:14 -0000

> The real problem is not so much that the size can change between a client
check and COPY. It is that, unlike CLONE, the COPY operation itself is not
required to be
> atomic by the protocol.

And while the protocol could define it as atomic, there isn't a lot of
point in doing that.  There are not many servers that could implement it
that way, and a non-atomic copy that avoids a lot of OTW traffic is better
than a non-atomic copy that wastes network bandwidth.  In any case, if you
don't want the file changed while it is copied, you can open it with
DENY=WRITE before doing the COPY.

Let's get back to the issue of the erroneous specification of COPY
semantics in RFC 7862.

Let's call the behavior currently specified as a MUST in ETC 7862,
returning INVAL if the area to be copied spans the file's EOF, Behavior A.

The more sensible behavior, allowing the copy to proceed, terminating the
range to be copied at the EOF, will be called behavior B.

I don't have any doubt that RFC 7862 is, as Bruce says, "just wrong" and
that behavior B is the right choice.  The question is what to do about the
issue.

Apart from the question of any possible errata one might or might not
submit, the important point is that servers that manifest behavior A
actually exist and will continue to exist for some time.

In light of this fact, clients might need to adapt to the situation that
such servers exist.  Whether they do so, depends on how serious they
consider the issue that Jorge has brought to our attention.  I can see some
clients ignoring this issue since copying a range, as opposed to a whole
file, is an unusual case.

Despite that, it is reasonable for a client to deal with the possibility of
behavior-A servers by checking for a copy ending point past EOF and
shortening the range to be copied.  Behavior-B servers would not be
affected.

With regard to Bruce's concerns about potential bad consequences resulting
from changes in EOF after the check and before the copy, I haven't been
able to come up with a case where anything bad happens. You can't make the
check-copy combination atomic, because copy itself isn't atomic.

With regard to Trond's suggestion to use VERIFY, it does narrow the
window but the fundamental problem of the non-atomicity of COPY remains.
Also, I'd be reluctant to implement things this way because it is hard to
figure out what to do if the VERIFY fails.  You may need to retry for a
while and then try to do the copy using READs and WRITEs.  You wind up with
a big bunch of code that is almost never tested.  I wouldn't want to go
there.

So this brings up back to the errata question.  The goal of an errata would
be to prevent more behavior-A servers from being written, since we can't be
sure that any that exist are fixed/decommissioned.

The simplest and most obvious errata is one that simply switches from
mandating behavior A to mandating behavior B


I am worried about the time/difficulty of getting such an errata approved.
The existing text, by using "MUST" really says two things:

   1. Servers have to implement behavior A.
   2. Clients depend on behavior A.

We know that these two statements are wrong, and, in other situations, the
editor  (Tom) would quickly issue a correction.

I don't know the details of the errata process but I'm pretty sure it will
be nothing like the editor simply considering the matter and making the
necessary correction.  It will involve one or more cycles of  review,
inquiry, and reassurance, to make sure that existing clients and servers
will not be adversely affected.  As a result we will have to tell people:


a) That all (or most) existing servers will be considered non-compliant

b) That the clients who have been interacting successfully with behavior-A
servers will have no problems interacting with behavior-B servers, despite
statement 2 above.

Given that, approval should not be considered a sure thing and is not going
to happen quickly.

Other possible errata texts have been proposed but they don't really
address the issue of preventing new Behavior-A servers from being created.

Given the potential difficulties of the errata process and the fact that
behavior-A servers will continue to exist for a while no matter what
happens, I think the focus has to be on client changes to work around
behavior A, if this issue is important enough to do anything about.






On Feb 18, 2017 11:23 PM, "Trond Myklebust" <trondmy@gmail.com> wrote:



On 18 February 2017 at 15:57, David Noveck <davenoveck@gmail.com> wrote:

> > Could we please make that message to implementors stronger, but saying “the server SHOULD limit the are to be copied to reflect the size of the source file, but it MAY fail the operation with NFS4ERR_INVAL”?
>
> I think we can do something in this regard but your text is not in accord with RFC2119, although I recall seeing things like this in other documents.  SHOULD means a client is allowed to do something else under some fairly restrictive conditions while MAY says he can choose to do something simply because he wants to.  While it is reasonable to say you "should do X by may do Y", saying you "SHOULD do X but MAY do Y" is almost always self-contradictory.
>
> How about leaving the proposed text and adding the following sentence?
>
> The latter is generally preferable since the former might  force clients implementing primitives for copying byte ranges to check the size of the file before issuing the copy, which in turn raises issues because such a check and the copy are done at different times and the size might change between the two operations.
>
>
The real problem is not so much that the size can change between a client
check and COPY. It is that, unlike CLONE, the COPY operation itself is not
required to be atomic by the protocol. That means that most implementations
will have to accept one of either two options:

   1. The server on which the source file resides must somehow cooperate to
   prevent truncation of the file while the copy is in progress. Note that
   since the client MAY be holding a file byte range or share lock, it is
   impossible for either one of the servers to make use of file locks.
   2. Accept the inevitability that a truncate on the source may cut the
   COPY short at any time. i.e. that the TOCTOU race condition lasts for the
   entire duration of the (synchronous or asynchronous) copy.

Note also that if the _client_ wants to enforce this EINVAL semantic, it
can easily throw in a VERIFY operation in order to check the file length
before attempting the COPY operation. Granted that is also non-atomic, but
that's about as good as it gets here.

On Sat, Feb 18, 2017 at 1:29 PM, Trond Myklebust <trondmy@gmail.com> wrote:
>
>>
>> So here is my proposed corrected text:
>>
>> When the source offset is greater than the size of the source file, the
>> error
>>
>> is reported by failing the operation with NFS4ERR_INVAL.  Otherwise, If
>>
>> the source offset plus count is greater than the size of the source
>> file, the
>>
>> server MAY fail the operation with NFS4ERR_INVAL.or limit the area to
>>
>> be copied to reflect the size of the source file.
>>
>> Could we please make that message to implementors stronger, but saying
>> “the server SHOULD limit the are to be copied to reflect the size of the
>> source file, but it MAY fail the operation with NFS4ERR_INVAL”?
>>
>> Cheers
>>   Trond
>>
>>
>>
>> On Feb 18, 2017, at 07:45, David Noveck <davenoveck@gmail.com> wrote:
>>
>> > > I agree but think that the most expedient way of addressing this
>> problem is
>> > > in the Linux client,  It can check against the file size before
>> issuing the
>> > > COPY request  to prevent an erroneous INVAL error being returned.
>>
>> > That doesn't work, due to races with concurrent modifications to the
>> > file.
>>
>> I'm not sure whether it works or not.  I can't find an obvious breakage
>> but I can't prove that none exists.
>>
>> In any case, we can't consider this an easy fix for the issue.
>>
>> > The server itself would seem susceptible to such races: if it implements
>> > the copy as a read-write loop then blocking concurrent truncates
>> > probably isn't reasonable, so its only choice is a short copy.  (Too
>> > late to return INVAL once you've written to the destination....).
>>
>> That's an implementation problem that would exist no matter
>> what we do to the spec text under discussion.
>>
>> > I really think the spec's just wrong here.'
>>
>> I agree, as does Tom and Jorge as well I suppose.   Nobody has said it
>> was a good choice.
>>
>> The question is what to do to fix it.  This is particularly difficult
>> given
>> the misuse of "MUST".  The problem I'm worried about is that
>> the existing approach is presented as required for interoperability.
>> We know of cases where that approach has been followed and
>> the problem is that the existing spec essentially says that clients
>> might break if servers do something else. :-(
>>
>> We know that isn't true but sombody (Spencer D?) is going to have
>> to convince people in the IESG of that.
>>
>> So I think the focus of any correction has to be on the fact that the
>> "MUST"
>> is wrong rather than on the fact (also true) that the wrong behavior was
>> chosen.
>>
>> So here is my proposed corrected text:
>>
>> When the source offset is greater than the size of the source file, the
>> error
>>
>> is reported by failing the operation with NFS4ERR_INVAL.  Otherwise, If
>>
>> the source offset plus count is greater than the size of the source
>> file, the
>>
>> server MAY fail the operation with NFS4ERR_INVAL.or limit the area to
>>
>> be copied to reflect the size of the source file.
>>
>>
>>
>>
>> On Fri, Feb 17, 2017 at 2:51 PM, J. Bruce Fields <bfields@fieldses.org>
>> wrote:
>>
>>> On Thu, Feb 16, 2017 at 08:00:36AM -0500, David Noveck wrote:
>>> > I agree but think that the most expedient way of addressing this
>>> problem is
>>> > in the Linux client,  It can check against the file size before
>>> issuing the
>>> > COPY request  to prevent an erroneous INVAL error being returned.
>>>
>>> That doesn't work, due to races with concurrent modifications to the
>>> file.
>>>
>>> The server itself would seem succeptible to such races: if it implements
>>> the copy as a read-write loop then blocking concurrent truncates
>>> probably isn't reasonable, so its only choice is a short copy.  (Too
>>> late to return INVAL once you've written to the destination....).
>>>
>>> I really think the spec's just wrong here.
>>>
>>> --b.
>>>
>>
>> _______________________________________________
>> nfsv4 mailing list
>> nfsv4@ietf.org
>> https://www.ietf.org/mailman/listinfo/nfsv4
>>
>>
>>
>