Re: [nfsv4] Server-side copy question

Trond Myklebust <trondmy@gmail.com> Sun, 19 February 2017 04:23 UTC

In-Reply-To: <CADaq8jeo_r8jqj_WK3khX38LvwE0bG+5xdEwmkAwq-N6pCTeAg@mail.gmail.com>
References: <B65A07BE-E379-4507-A9B8-6927DF61A0A5@netapp.com> <CADaq8jdhK2NbJrAa0UHacvBS1w0ucoVpc1LJA3=mxH+_iBfMPQ@mail.gmail.com> <20170217195149.GH10894@fieldses.org> <CADaq8jc4XRaFSXz7mberFi0Dr-f+FaQcZFi=gKn9OjUifrNSyw@mail.gmail.com> <3889A2C2-9261-4809-92F9-9CF2F00A894D@gmail.com> <CADaq8jeo_r8jqj_WK3khX38LvwE0bG+5xdEwmkAwq-N6pCTeAg@mail.gmail.com>
From: Trond Myklebust <trondmy@gmail.com>
Date: Sat, 18 Feb 2017 23:23:45 -0500
Message-ID: <CAABAsM6z59o91jv4rqq3-g1+6Lx4LAxOUr02Vd9frN6Mqw+vcQ@mail.gmail.com>
To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/3P2uKwjK7wbGJZQyAkbEGSvwDGE>
Cc: Bruce James Fields <bfields@fieldses.org>, IETF NFSv4 WG Mailing List <nfsv4@ietf.org>
Subject: Re: [nfsv4] Server-side copy question

On 18 February 2017 at 15:57, David Noveck <davenoveck@gmail.com> wrote:

> > Could we please make that message to implementors stronger, by saying “the server SHOULD limit the area to be copied to reflect the size of the source file, but it MAY fail the operation with NFS4ERR_INVAL”?
>
> I think we can do something in this regard, but your text is not in accord with RFC 2119, although I recall seeing things like this in other documents.  SHOULD means a client is allowed to do something else only under some fairly restrictive conditions, while MAY says he can choose to do something simply because he wants to.  While it is reasonable to say you "should do X but may do Y", saying you "SHOULD do X but MAY do Y" is almost always self-contradictory.
>
> How about leaving the proposed text and adding the following sentence?
>
> The latter is generally preferable since the former might force clients implementing primitives for copying byte ranges to check the size of the file before issuing the copy, which in turn raises issues because such a check and the copy are done at different times and the size might change between the two operations.
>
>
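A minimal sketch of the range check in the proposed text, with the server's choice between the two permitted behaviours (limit the copy to the size of the source file, or fail with NFS4ERR_INVAL) exposed as a flag. The function name, the `clamp` flag and the string status values are hypothetical placeholders, not the protocol's XDR definitions, and any other special cases in the full COPY definition are ignored.

```python
# Placeholder status values; the real NFSv4.2 codes are numeric XDR constants.
NFS4_OK = "NFS4_OK"
NFS4ERR_INVAL = "NFS4ERR_INVAL"


def check_copy_range(src_offset, count, src_size, clamp=True):
    """Apply the proposed COPY range rules to a single request.

    Returns (status, effective_count). `clamp` selects which of the two
    permitted behaviours the server implements when the range runs past EOF.
    """
    if src_offset > src_size:
        # Offset beyond EOF: always an error under the proposed text.
        return NFS4ERR_INVAL, 0
    if src_offset + count > src_size:
        if clamp:
            # Limit the area to be copied to what the source actually holds.
            return NFS4_OK, src_size - src_offset
        return NFS4ERR_INVAL, 0
    return NFS4_OK, count
```

For example, `check_copy_range(0, 100, 40)` returns `("NFS4_OK", 40)`, while the same call with `clamp=False` returns `("NFS4ERR_INVAL", 0)`. Either way, the check only sees the size at the moment it runs.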
The real problem is not so much that the size can change between a client
check and COPY. It is that, unlike CLONE, the COPY operation itself is not
required to be atomic by the protocol. That means that most implementations
will have to accept one of two options:

   1. The server on which the source file resides must somehow cooperate to
   prevent truncation of the file while the copy is in progress. Note that
   since the client MAY be holding a byte-range or share lock on the file, it
   is impossible for either one of the servers to make use of file locks.
   2. Accept the inevitability that a truncate on the source may cut the
   COPY short at any time, i.e. that the time-of-check-to-time-of-use
   (TOCTOU) race condition lasts for the entire duration of the
   (synchronous or asynchronous) copy.
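To make option 2 concrete, here is a toy model of a server that implements COPY as a read-write loop while another client truncates the source right after the first read. `ShrinkingSource`, `server_copy` and the 4-byte chunk size are illustrative assumptions, not taken from any real NFS server.

```python
class ShrinkingSource:
    """Toy source file that is truncated to `new_size` right after the
    first read, standing in for a concurrent truncate by another client."""

    def __init__(self, data, new_size):
        self.buf = bytearray(data)
        self.new_size = new_size
        self.reads = 0

    def pread(self, offset, length):
        chunk = bytes(self.buf[offset:offset + length])
        self.reads += 1
        if self.reads == 1:
            del self.buf[self.new_size:]  # the "concurrent" truncate
        return chunk


def server_copy(src, dst, offset, count, chunk=4):
    """Copy up to `count` bytes from `src` at `offset` into `dst` in a loop.

    Returns the number of bytes actually copied. If the source shrinks
    mid-loop, the copy comes up short; bytes have already been written to
    `dst` by then, so failing with NFS4ERR_INVAL is no longer an option.
    """
    copied = 0
    while copied < count:
        data = src.pread(offset + copied, min(chunk, count - copied))
        if not data:
            break  # reached the (possibly new) EOF of the source
        dst.extend(data)
        copied += len(data)
    return copied
```

With a 16-byte source truncated to 6 bytes after the first 4-byte read, `server_copy(ShrinkingSource(b"0123456789abcdef", 6), bytearray(), 0, 16)` returns 6: the request for 16 bytes silently becomes a short copy.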

Note also that if the _client_ wants to enforce this EINVAL semantic, it
can easily throw in a VERIFY operation in order to check the file length
before attempting the COPY operation. Granted, that is also non-atomic, but
that's about as good as it gets here.
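The client-side check can be sketched with a toy COMPOUND evaluator: the client first rejects ranges that exceed the size it last observed, then sends PUTFH(source), VERIFY(size), SAVEFH, PUTFH(destination), COPY, so the COPY is skipped if VERIFY returns NFS4ERR_NOT_SAME. This is a sketch under stated assumptions: only the size attribute is modelled, stateids and the other COPY arguments are omitted, and `run_compound` and `copy_with_size_check` are hypothetical helpers. As noted above, the source can still change between the VERIFY and the end of the COPY.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_NOT_SAME = "NFS4ERR_NOT_SAME"


def run_compound(files, ops):
    """Evaluate a toy COMPOUND: ops run in order and processing stops at
    the first op whose status is not NFS4_OK."""
    cfh = sfh = None  # current and saved filehandles
    results = []
    for op, *args in ops:
        if op == "PUTFH":
            cfh = args[0]
            status = NFS4_OK
        elif op == "SAVEFH":
            sfh = cfh
            status = NFS4_OK
        elif op == "VERIFY":
            # Only the size attribute is modelled here.
            size_ok = len(files[cfh]) == args[0]
            status = NFS4_OK if size_ok else NFS4ERR_NOT_SAME
        elif op == "COPY":
            # Source is the saved FH, destination the current FH.
            # Toy model: assumes dst_off lies within the destination.
            src_off, dst_off, count = args
            data = files[sfh][src_off:src_off + count]
            files[cfh][dst_off:dst_off + len(data)] = data
            status = NFS4_OK
        else:
            raise ValueError(f"unsupported op {op}")
        results.append((op, status))
        if status != NFS4_OK:
            break
    return results


def copy_with_size_check(files, src, dst, offset, count, observed_size):
    """Client-side EINVAL enforcement against a previously observed size."""
    if offset + count > observed_size:
        return [("CLIENT", "EINVAL")]  # never sent to the server
    ops = [("PUTFH", src), ("VERIFY", observed_size), ("SAVEFH",),
           ("PUTFH", dst), ("COPY", offset, 0, count)]
    return run_compound(files, ops)
```

If the source has been truncated since the client observed its size, the compound stops at VERIFY with NFS4ERR_NOT_SAME and no data is copied.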

On Sat, Feb 18, 2017 at 1:29 PM, Trond Myklebust <trondmy@gmail.com> wrote:
>
>>
>> So here is my proposed corrected text:
>>
>> When the source offset is greater than the size of the source file, the
>> error is reported by failing the operation with NFS4ERR_INVAL.  Otherwise,
>> if the source offset plus count is greater than the size of the source
>> file, the server MAY fail the operation with NFS4ERR_INVAL or limit the
>> area to be copied to reflect the size of the source file.
>>
>> Could we please make that message to implementors stronger, by saying
>> “the server SHOULD limit the area to be copied to reflect the size of the
>> source file, but it MAY fail the operation with NFS4ERR_INVAL”?
>>
>> Cheers
>>   Trond
>>
>>
>>
>> On Feb 18, 2017, at 07:45, David Noveck <davenoveck@gmail.com> wrote:
>>
>> > > I agree but think that the most expedient way of addressing this
>> > > problem is in the Linux client.  It can check against the file size
>> > > before issuing the COPY request to prevent an erroneous INVAL error
>> > > being returned.
>>
>> > That doesn't work, due to races with concurrent modifications to the
>> > file.
>>
>> I'm not sure whether it works or not.  I can't find an obvious breakage
>> but I can't prove that none exists.
>>
>> In any case, we can't consider this an easy fix for the issue.
>>
>> > The server itself would seem susceptible to such races: if it implements
>> > the copy as a read-write loop then blocking concurrent truncates
>> > probably isn't reasonable, so its only choice is a short copy.  (Too
>> > late to return INVAL once you've written to the destination....).
>>
>> That's an implementation problem that would exist no matter
>> what we do to the spec text under discussion.
>>
>> > I really think the spec's just wrong here.
>>
>> I agree, as do Tom and Jorge, I suppose.   Nobody has said it
>> was a good choice.
>>
>> The question is what to do to fix it.  This is particularly difficult
>> given the misuse of "MUST".  The problem I'm worried about is that
>> the existing approach is presented as required for interoperability.
>> We know of cases where that approach has been followed, and
>> the problem is that the existing spec essentially says that clients
>> might break if servers do something else. :-(
>>
>> We know that isn't true, but somebody (Spencer D?) is going to have
>> to convince people in the IESG of that.
>>
>> So I think the focus of any correction has to be on the fact that the
>> "MUST" is wrong rather than on the fact (also true) that the wrong
>> behavior was chosen.
>>
>> So here is my proposed corrected text:
>>
>> When the source offset is greater than the size of the source file, the
>> error is reported by failing the operation with NFS4ERR_INVAL.  Otherwise,
>> if the source offset plus count is greater than the size of the source
>> file, the server MAY fail the operation with NFS4ERR_INVAL or limit the
>> area to be copied to reflect the size of the source file.
>>
>>
>>
>>
>> On Fri, Feb 17, 2017 at 2:51 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>
>>> On Thu, Feb 16, 2017 at 08:00:36AM -0500, David Noveck wrote:
>>> > I agree but think that the most expedient way of addressing this
>>> > problem is in the Linux client.  It can check against the file size
>>> > before issuing the COPY request to prevent an erroneous INVAL error
>>> > being returned.
>>>
>>> That doesn't work, due to races with concurrent modifications to the
>>> file.
>>>
>>> The server itself would seem susceptible to such races: if it implements
>>> the copy as a read-write loop then blocking concurrent truncates
>>> probably isn't reasonable, so its only choice is a short copy.  (Too
>>> late to return INVAL once you've written to the destination....).
>>>
>>> I really think the spec's just wrong here.
>>>
>>> --b.
>>>
>>
>> _______________________________________________
>> nfsv4 mailing list
>> nfsv4@ietf.org
>> https://www.ietf.org/mailman/listinfo/nfsv4
>>
>>
>>
>