Re: [nfsv4] Server-side copy question

David Noveck <davenoveck@gmail.com> Wed, 25 October 2017 10:30 UTC

In-Reply-To: <CAN-5tyHM5suS=iqP0XL3pD2i2YYtU2k1Wd1jnHb+FT52sXzsHw@mail.gmail.com>
References: <CADaq8jc6vWdXonTs3795QWO8SWJTudo=x9=gR4uKN+tnfBk+dw@mail.gmail.com> <20170224222135.GJ26378@fieldses.org> <CAN-5tyHJb=ercWh_W2uzPfKR86j=dGWx=zYH1y+TjbNTzYh38A@mail.gmail.com> <20170228164420.GA28845@fieldses.org> <CADaq8jdivTUVNx2LXCiecAKY-nfoZP_XoAoNLwc3V1TUD6X8vg@mail.gmail.com> <CC84C4DB-F00F-45FC-8307-BDA5424DA480@netapp.com> <20170228173215.GA29700@fieldses.org> <CAN-5tyF8EZjYNX3cE43BPM9CbxVLygRdPuTJf_RgC7ntDCf9Kg@mail.gmail.com> <20171020193306.GF15211@fieldses.org> <CAN-5tyHaJeis=_f9h9u1St76Og3A_TtBUxZx_Z=ZouG7sa2=6w@mail.gmail.com> <20171024013646.GA22943@fieldses.org> <CADaq8jdWkTfQB-Y55ow1A23AGCVM12LtwjXYZ6E7zU7M=Aw89g@mail.gmail.com> <CAN-5tyHM5suS=iqP0XL3pD2i2YYtU2k1Wd1jnHb+FT52sXzsHw@mail.gmail.com>
From: David Noveck <davenoveck@gmail.com>
Date: Wed, 25 Oct 2017 06:30:33 -0400
Message-ID: <CADaq8jcASncJWSR23VWw_6REy89DUDdyAt=F0RiAD-XFjGc_tA@mail.gmail.com>
To: Olga Kornievskaia <aglo@citi.umich.edu>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Content-Type: multipart/alternative; boundary="001a113fb712af6214055c5c8b62"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/-FQA8GjqGd6QIsvjH5uupfron1k>
Subject: Re: [nfsv4] Server-side copy question

>I would like to understand the comment by Bruce and you about not
> knowing which bytes were copied. I don't understand why it would not
> be the bytes from the offset to offset+bytes_copied? And Dave,
> wouldn't the "rest of the bytes" be the request length - bytes_copied?

It might be, and it would be reasonable if it were, but RFC 7862 says, as
Bruce noted, "The coa_bytes_copied value indicates the number of bytes
copied but not which specific bytes have been copied."  I don't know why
it says that or how those words came to be in the spec.  If the spec did
not say that, it would be reasonable to assume that the copied bytes are
those from offset to offset+bytes_copied.  But given RFC 7862 as it is, I
think clients cannot assume that.  It may be that these words are just
plain wrong, but assuming that is risky, and fixing the spec to remove
them is very difficult, even if everyone were to agree that they
shouldn't be there.
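
To make the two readings concrete, here is a minimal Python sketch of
what a client could re-request after a CB_OFFLOAD that reports an error
together with a nonzero coa_bytes_copied.  It is not taken from RFC 7862
or from any implementation; the function name range_to_retry and the
assume_prefix flag are hypothetical, chosen only for illustration.

```python
from typing import Tuple


def range_to_retry(offset: int, requested: int, coa_bytes_copied: int,
                   assume_prefix: bool = False) -> Tuple[int, int]:
    """Return (offset, length) of the range a client would copy again.

    assume_prefix=False follows the RFC 7862 wording literally: the count
    says how many bytes were copied but not which ones, so the whole
    requested range has to be treated as not known to be copied.

    assume_prefix=True is the unstated assumption that the copied bytes
    are exactly offset .. offset+coa_bytes_copied.
    """
    if not 0 <= coa_bytes_copied <= requested:
        raise ValueError("coa_bytes_copied must be within the requested length")
    if assume_prefix:
        # Resume right after the bytes presumed to have been copied.
        return offset + coa_bytes_copied, requested - coa_bytes_copied
    # Nothing is known about which bytes landed: redo the full range.
    return offset, requested
```

With a request for 1000 bytes at offset 4096 and 600 bytes reported as
copied, the literal reading retries (4096, 1000), while the prefix
assumption retries only (4696, 400).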

On Tue, Oct 24, 2017 at 11:52 AM, Olga Kornievskaia <aglo@citi.umich.edu>
wrote:

> On Tue, Oct 24, 2017 at 7:43 AM, David Noveck <davenoveck@gmail.com>
> wrote:
> >> So in the short write case it sounds like it *is* the server's
> >> responsibility to commit.
> >
> > Perhaps it should be the server's responsibility, but it is not,
> > according to RFC7862
> >
> >> That's weird,
> >
> > I agree that it is weird.  I feel it is well-intentioned but not
> > viable, since you are essentially creating your own protocol if
> > clients depend on this behavior, and there is no point in doing it
> > if clients do not.
> >
> >> but as long as the short write
> >> case is rare, maybe it's not a big deal?
> >
> > From a performance point of view, it is not a big deal.  However it is
> > a big deal that the spec is broken. I hate when that happens :-(
> >
> >>Hm: "The coa_bytes_copied value indicates the number of bytes copied but
> >> not which specific bytes have been copied."  That makes coa_bytes_copied
> >> pretty much worthless.
> >
> > Any non-zero value of coa_bytes_copied is worthless because the client
> > does not know:
> >
> > - Which particular bytes were copied
> > - Whether those bytes were copied to the destination stably
> > - How to commit any non-stably-written bytes.
> >
> >>I guess I'd be in favor of using the NFS_OK branch of CB_OFFLOAD
> >> whenever a nonzero number of bytes is copied.
> >
> > That's likely to create an interoperability problem, since there are
> > almost certainly clients who assume that when there is no error, all
> > requested bytes have been copied.
>
> I don't think it's true that clients would assume this. I think the
> client can only assume the server copied as many bytes as were specified
> in the CB_OFFLOAD bytes_copied, regardless of how many bytes the client
> had asked for. At least that's how I read the spec.
>
> > Basically, the server is pretending there is no error and
>
> Yes, pretending that there was no error is the part I'm not totally
> comfortable with. And I agree with the comments that the client can
> try again to get the same error.
>
> > then relying on the client to figure out that there was one, which
> > could only be done if the spec tells the clients about this.  If
> > RFC7862 doesn't do this, you wind up essentially creating a private
> > protocol.
> >
> >> If less than the
> >> requested number of bytes were copied, the client can always find out
> >> the error (if any) by attempting to copy the rest.
> >
> > Unfortunately there is no way of determining which bytes were copied,
> > so there is no way of knowing "the rest", i.e. the bytes not copied.
>
> I would like to understand the comment by Bruce and you about not
> knowing which bytes were copied. I don't understand why it would not
> be the bytes from the offset to offset+bytes_copied? And Dave,
> wouldn't the "rest of the bytes" be the request length - bytes_copied?
> What am I missing?
>
> If you are suggesting that the server is allowed to copy a requested
> range in any order of smaller chunks, then there is a problem with the
> non-error case too, because COPY/CB_OFFLOAD does not return offset and
> bytes_copied (just bytes_copied).
>
> I think we know which bytes are copied we just don't know how they
> were copied: stable or not (if the server chooses to specify an error
> in CB_OFFLOAD).
>
> > On Mon, Oct 23, 2017 at 9:36 PM, J. Bruce Fields <bfields@fieldses.org>
> > wrote:
> >>
> >> On Mon, Oct 23, 2017 at 05:43:42PM -0400, Olga Kornievskaia wrote:
> >> > On Fri, Oct 20, 2017 at 3:33 PM, J. Bruce Fields
> >> > <bfields@fieldses.org> wrote:
> >> > > On Wed, Oct 18, 2017 at 05:22:08PM -0400, Olga Kornievskaia wrote:
> >> > >> On Tue, Feb 28, 2017 at 12:32 PM, J. Bruce Fields
> >> > >> <bfields@fieldses.org> wrote:
> >> > >> > On Tue, Feb 28, 2017 at 05:04:27PM +0000, Adamson, Andy wrote:
> >> > >> >> I also thought that bytes copied is only included in the success
> >> > >> >> case – but as Olga pointed out, not true. The coa_status of
> >> > >> >> NFS4_OK has the coa_resok4->wr_count showing the bytes copied,
> >> > >> >> and the default (e.g. failure cases) has the coa_bytes_copied.
> >> > >> >>
> >> > >> >> So the server can send a CB_OFFLOAD with an error (such as the
> >> > >> >> -EINVAL in the example) and send the bytes copied in
> >> > >> >> coa_bytes_copied.
> >> > >> >>
> >> > >> >>    union offload_info4 switch (nfsstat4 coa_status) {
> >> > >> >>    case NFS4_OK:
> >> > >> >>            write_response4 coa_resok4;
> >> > >> >>    default:
> >> > >> >>            length4         coa_bytes_copied;
> >> > >> >>    };
> >> > >> >
> >> > >> > Gah, I'm sorry, Olga said CB_OFFLOAD but for some reason I was
> >> > >> > looking at COPY.
> >> > >> >
> >> > >> > I have no idea what to think of that.  I don't see why it should
> >> > >> > be different from COPY.
> >> > >>
> >> > >> I'd like to resurrect this thread...
> >> > >>
> >> > >> This discussion was initiated to allow for the server to do a
> >> > >> partial copy instead of failing the copy with EINVAL when copy is
> >> > >> requested beyond the end of the file.
> >> > >>
> >> > >> However, the problem I'm running into is that CB_OFFLOAD only
> >> > >> returns "coa_bytes_copied" in the error case and does not include
> >> > >> stable_how and verifier. So even if the client were to then send a
> >> > >> COMMIT for the bytes that were copied, it has no verifier to match
> >> > >> the COMMIT's verifier against.
> >> > >>
> >> > >> Should the client then ignore the lack of the matching verifier and
> >> > >> just assume the data was committed for the partial bytes?
> >> > >
> >> > > Yes, sounds like in the asynchronous case the server has to wait
> >> > > for the writes to reach disk before calling CB_OFFLOAD.  (Do we
> >> > > currently do that?)
> >> >
> >> > Why? No, the spec does not require the server to do any such thing.
> >> > The server reports in CB_OFFLOAD whether the bytes were written as
> >> > UNSTABLE or FILE_SYNC.
> >>
> >> Look back at what you said before....  So the problem you were
> >> explaining (missing stable_how and verifier) was only for the short
> >> write case?
>
> Yes.
>
> >> So in the short write case it sounds like it *is* the server's
> >> responsibility to commit.
>
> I disagree. I don't see anywhere in the spec that requires the server
> to do that.
>
> >>   That's weird, but as long as the short write
> >> case is rare, maybe it's not a big deal?
> >>
> >> Hm: "The coa_bytes_copied value indicates the number of bytes copied but
> >> not which specific bytes have been copied."  That makes coa_bytes_copied
> >> pretty much worthless.
>
> Yep that's what I thought.
>
> >> I guess I'd be in favor of using the NFS_OK branch of CB_OFFLOAD
> >> whenever a nonzero number of bytes is copied.  If less than the
> >> requested number of bytes were copied, the client can always find out
> >> the error (if any) by attempting to copy the rest.
>
> Ok
>
> >>
> >> --b.
> >>
> >> >
> >> > > In theory that shouldn't be too expensive because asynchronous COPY
> >> > > should normally be done with a very large range.
> >> > >
> >> > > (I wonder if the server should just do a synchronous copy if the
> >> > > copy would be less than a few megs?)
> >> >
> >> > Yes the server could determine to convert an asynchronous copy into a
> >> > synchronous one based on size.
> >> >
> >> > I guess I'm still confused how to handle the implementation so that it
> >> > doesn't kill interoperability with other possible (future)
> >> > implementations...
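
The verifier problem raised in the quoted thread follows directly from
the shape of the offload_info4 union quoted above: only the NFS4_OK arm
carries a write_response4.  The Python sketch below models the two arms
to show that.  It is an illustration under stated assumptions, not an
implementation: the class names and verifier_for_commit are
hypothetical, and the wr_callback_id field of write_response4 is left
out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Union

NFS4_OK = 0          # nfsstat4 success
NFS4ERR_INVAL = 22   # nfsstat4 value for NFS4ERR_INVAL
UNSTABLE4 = 0        # stable_how4 value for unstable writes


@dataclass
class WriteResponse4:
    """Subset of write_response4 (wr_callback_id omitted)."""
    wr_count: int
    wr_committed: int
    wr_writeverf: bytes


@dataclass
class OffloadOk:
    """NFS4_OK arm: full write response, including the verifier."""
    coa_resok4: WriteResponse4


@dataclass
class OffloadFailed:
    """Default arm: an error status plus a bare byte count."""
    coa_status: int
    coa_bytes_copied: int


OffloadInfo = Union[OffloadOk, OffloadFailed]


def verifier_for_commit(info: OffloadInfo) -> Optional[bytes]:
    """Return the verifier a later COMMIT reply can be compared with."""
    if isinstance(info, OffloadOk):
        return info.coa_resok4.wr_writeverf
    # The error arm has no stable_how and no verifier, so there is
    # nothing to compare a COMMIT verifier against.
    return None
```

For an OffloadFailed(NFS4ERR_INVAL, 4096) result, verifier_for_commit
returns None even though 4096 bytes are reported as copied, which is the
case discussed above.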