Re: [nfsv4] seeking clarifications in server-side copy CB_OFFLOAD feature

David Noveck <davenoveck@gmail.com> Fri, 03 March 2017 20:20 UTC

From: David Noveck <davenoveck@gmail.com>
Date: Fri, 03 Mar 2017 15:20:54 -0500
To: Olga Kornievskaia <aglo@umich.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/iGn1ou3iudmPjE3uHmsucKJwfXo>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] seeking clarifications in server-side copy CB_OFFLOAD feature

> I'm seeking some clarification for doing the server-side copy feature.
> EINVAL in read beyond the end of the file has been already raised in
> another thread so I'll leave that aside.

Good.  I'm tired of talking about that.

> So here are a few more :

uh-oh.

> -- possible inconsistent behavior in presence of an error from doing
> an async copy vs sync. CB_OFFLOAD can return an error and a partial
> copy result but there is no mechanism to do so in synchronous COPY.

Definitely inconsistent behavior.

The only questions are how/why it got that way and what to do about it.
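To make the inconsistency concrete, here is a minimal Python sketch of the two result shapes. It assumes a simplified reading of the RFC 7862 XDR: COPY4res carries a write_response4 (and so wr_count) only when cr_status is NFS4_OK, while offload_info4 in CB_OFFLOAD4args still carries coa_bytes_copied when coa_status is an error. The class and function names are hypothetical, and stateids, verifiers, and the NFS4ERR_OFFLOAD_NO_REQS arm are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class WriteResponse:
    wr_count: int  # bytes copied (other write_response4 fields omitted)


@dataclass
class CopyResult:
    # Simplified COPY4res: only the NFS4_OK arm carries a byte count.
    cr_status: str
    resok: Optional[WriteResponse] = None


@dataclass
class OffloadInfo:
    # Simplified offload_info4: the error arm still carries coa_bytes_copied.
    coa_status: str
    coa_resok4: Optional[WriteResponse] = None
    coa_bytes_copied: Optional[int] = None


def bytes_known_to_client(result: Union[CopyResult, OffloadInfo]) -> Optional[int]:
    """How many copied bytes the client can learn about, or None if unknown."""
    if isinstance(result, CopyResult):
        if result.cr_status == "NFS4_OK":
            return result.resok.wr_count
        return None  # a failed synchronous COPY reports no partial progress
    if result.coa_status == "NFS4_OK":
        return result.coa_resok4.wr_count
    return result.coa_bytes_copied  # error plus partial progress


print(bytes_known_to_client(CopyResult("NFS4ERR_NOSPC")))                         # None
print(bytes_known_to_client(OffloadInfo("NFS4ERR_NOSPC", coa_bytes_copied=4096)))  # 4096
```

The same out-of-space condition loses the partial count in the synchronous case but keeps it in the asynchronous one.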

> If
> CB_OFFLOAD returned an error and a partial copy, is that really an
> error or should it be a short copy?

It's really both but, in many cases, the client will have to make a choice
to treat it as one or the other.

> I can see a case where an error
> should not be ignore -- if error is ENOSPC.

Agree.

> Should the client if
> received an error in CB_OFFLOAD propagate the error to the
> application?

In general it should.  The only exception should be cases where the
client can do recovery to complete the requested copy.
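One way a client might apply that rule is sketched below: report success when the copy succeeded, try recovery for other errors, never mask an out-of-space error, and otherwise surface the error together with the partial count. The CopyFailed exception and the try_recover hook are hypothetical, not part of any real client API.

```python
from typing import Callable, Optional


class CopyFailed(Exception):
    """Hypothetical client-side error that keeps the partial byte count."""

    def __init__(self, status: str, bytes_copied: int):
        super().__init__(f"{status} after {bytes_copied} bytes")
        self.status = status
        self.bytes_copied = bytes_copied


def complete_copy(status: str, bytes_copied: int,
                  try_recover: Callable[[str, int], Optional[int]]) -> int:
    """Turn a CB_OFFLOAD outcome into what the application sees.

    try_recover(status, bytes_copied) is a hypothetical hook that returns the
    total number of bytes copied if the client finished the copy itself, or
    None if it could not.
    """
    if status == "NFS4_OK":
        return bytes_copied
    if status != "NFS4ERR_NOSPC":  # out-of-space is never silently recovered
        total = try_recover(status, bytes_copied)
        if total is not None:
            return total
    raise CopyFailed(status, bytes_copied)
```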

> Should it be server's responsibility to then if some
> error occurred but a partial copy succeeded to then not return the
> error to the client (that leads to the 2nd question).

I don't think so, in the CB_OFFLOAD case.  Since the server
can indicate both the error and that the partial copy was done,
there doesn't seem to be any point in suppressing that information.

> -- related to errors returned in CB_OFFLOAD, there is no guidance with
> regards to the reboot recovery.

There should be some :-(

> If the destination server unable to
> finish the copy due to the source server rebooting, should it reply
> back with a successful short copy or should it return some error (and
> short copy of possibly 0bytes).

I think it should return some error.  In the case of enormous copies, the
client could want to continue the operation, although I'm not sure what it
could do if the sequential flag is not set.  Given that the open is to be
done by the client, he has to have some indication that he can use to
decide to get the lock back within the grace period.  If this is the only
stateid for the server, and he does not have a clear indication of the
reason for the failure, the grace period might expire before he decides to
try to get his open back.
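As a sketch of why the consecutive/sequential property matters for continuing a large copy, the hypothetical helper below computes the range for a follow-up COPY. It assumes that the "sequential flag" corresponds to the consecutive copy requirement, that a consecutive copy makes the first bytes_copied bytes a known-good prefix, and that count is an explicit length (the RFC 7862 convention that a count of 0 means "copy to end of file" is not handled here). Without the consecutive property it simply redoes the whole range, which is one possible answer to the open question above, not a rule from the spec.

```python
from typing import Optional, Tuple


def resume_range(src_offset: int, dst_offset: int, count: int,
                 bytes_copied: int, consecutive: bool
                 ) -> Optional[Tuple[int, int, int]]:
    """Return (src_offset, dst_offset, count) for a follow-up COPY, or None.

    Assumes count > 0 is an explicit length, not the "copy to EOF" value 0.
    """
    if not consecutive:
        # Unknown which bytes landed, so redo the whole requested range.
        return (src_offset, dst_offset, count)
    remaining = count - bytes_copied
    if remaining <= 0:
        return None  # nothing left to copy
    # The copied prefix is known-good; continue right after it.
    return (src_offset + bytes_copied, dst_offset + bytes_copied, remaining)
```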

> To address these two questions I find some wording in the spec that
> confuse me (section 15.2.3):

Me too.  I sometimes feel that, if you are not confused by this, it
is a sign that you haven't read it carefully enough.

> "If a failure does occur for a synchronous copy, wr_count will be set
> to the number of bytes copied to the destination file before the error
> occurred."

This sentence needs to be clarified, given that wr_count is only returned
in the success case.  I'll discuss possible replacements below.

> it also talks about consecutive bits there but then it
> proceeds to also say that
> "If the failure occurred for an asynchronous
> copy, .... It will be able to determine the bytes copied from the
> coa_bytes_copied in the CB_OFFLOAD argument."

To me, this is the most troubling part of this whole thing.  At some
point, someone was aware that CB_OFFLOAD had the ability to inform
the client of a partially successful copy together with a later failure,
while the result of COPY did not.  However, nobody bothered to either:

   - Explain why this discrepancy was valid.
   - Seek to correct/eliminate the discrepancy.


> Since it talks about "wr_count", then it must mean that COPY's
> nfsstat4 status is not an error.

That's the implication.

> Does this then mean that if copy
> "failed" then the error must be ignored on the server and a partial
> copy results propagated to the client.

Yes, as you recognize in putting the word *failed* in quotation marks,
perhaps there was no true failure here.

If the curious lead sentence were rewritten to say:

   If a copy proceeds for a while and a situation develops that makes it
   impossible to continue with the copy, wr_count will be set to the
   number of bytes copied to the destination file before it became
   impossible to continue the operation further.  In this case the
   operation is treated as a success, even though not all bytes
   requested were copied.

Would that make things clearer?  If so, is it important enough to file an
errata?
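A minimal sketch of how a server might build a synchronous COPY reply under that proposed wording. The names are hypothetical. The handling of a copy that made no progress at all is an assumption, since the proposed text does not say what happens when zero bytes were copied; here that case returns the error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncCopyReply:
    cr_status: str
    wr_count: Optional[int] = None  # present only when cr_status == "NFS4_OK"


def finish_sync_copy(bytes_done: int, stop_error: Optional[str]) -> SyncCopyReply:
    """Build a synchronous COPY reply under the proposed reading.

    stop_error is the condition that made it impossible to continue, or None
    if the copy ran to completion.  Any progress yields a success with a
    short wr_count, and stop_error is not reported to the client.
    """
    if stop_error is None or bytes_done > 0:
        return SyncCopyReply("NFS4_OK", wr_count=bytes_done)
    # Assumption: a copy that made no progress reports the error itself.
    return SyncCopyReply(stop_error)
```

Note that under this reading an NFS4ERR_NOSPC encountered after some progress is dropped entirely, which is exactly the information CB_OFFLOAD can still deliver.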




> But for the async copy,
> "coa_bytes_copied" means that CB_OFFLOAD also set nfsstat4 to some
> error. So error is not ignored in the async case?

Yes.  CB_OFFLOAD has a design that allows a partially successful
copy together with a later error to be reported to the client.

Unfortunately, the synchronous case does not allow this.

> -- the spec lists the kind of errors CB_OFFLOAD reply can return but
> there is no wording for what kind of errors be included in
> CB_OFFLOAD's call.

True, but given the fact that this is an asynchronous way of indicating
completion of the operation, I believe that the errors returned by COPY are
the place to start.

> For instance, I can see that a source server reboot
> can translate to ERR_BAD_HANDLE or it could be BAD_SESSION or
> STALE_CLIENTID or BAD_STATEID. It is not clear what errors to check
> for and recover by restarting the copy and what errors shouldn't
> happen.

I think that is primarily driven by implementer preference and the
semantics of the client-side APIs.
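As an illustration of that kind of implementer choice, here is a hypothetical client policy that treats the errors Olga lists (using the protocol names NFS4ERR_BADHANDLE, NFS4ERR_BADSESSION, NFS4ERR_STALE_CLIENTID, and NFS4ERR_BAD_STATEID) as signs of a source-server restart worth recovering from, and reports everything else to the application. The set membership is an assumption for the sketch, not guidance from RFC 7862.

```python
from enum import Enum


class Action(Enum):
    REPORT = "report to application"
    RECOVER_AND_RETRY = "recover state, then restart the copy"


# Hypothetical policy: nfsstat4 values in CB_OFFLOAD that this client
# interprets as "the source server restarted" rather than a final failure.
RESTART_LIKE = frozenset({
    "NFS4ERR_BADHANDLE",
    "NFS4ERR_BADSESSION",
    "NFS4ERR_STALE_CLIENTID",
    "NFS4ERR_BAD_STATEID",
})


def classify(coa_status: str) -> Action:
    """Decide what to do with a CB_OFFLOAD error status."""
    if coa_status in RESTART_LIKE:
        return Action.RECOVER_AND_RETRY
    return Action.REPORT
```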

>Something like BAD_SESSION implies that server-to-server
> protocol was NFS4.x x>=1.

I suppose so.

> Can the client receive an NFSv3 error in CB_OFFLOAD?

Given that this is an nfsstat4, I think it can't.
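Since coa_status is an nfsstat4, a destination server that talks NFSv3 to the source would have to translate any NFSv3 status before reporting it. A minimal sketch of such a translation follows; the table covers a few NFSv3/NFSv4 pairs with matching meanings, and both the table contents and the NFS4ERR_IO fallback are assumptions for illustration only.

```python
# Hypothetical translation a destination server might apply when its
# source-server traffic is NFSv3, so CB_OFFLOAD still carries an nfsstat4.
NFS3_TO_NFS4 = {
    "NFS3ERR_NOSPC": "NFS4ERR_NOSPC",
    "NFS3ERR_STALE": "NFS4ERR_STALE",
    "NFS3ERR_BADHANDLE": "NFS4ERR_BADHANDLE",
    "NFS3ERR_ACCES": "NFS4ERR_ACCESS",
    "NFS3ERR_JUKEBOX": "NFS4ERR_DELAY",
}


def to_nfsstat4(nfs3_status: str) -> str:
    """Map an NFSv3 status name to the nfsstat4 name reported in CB_OFFLOAD."""
    if nfs3_status == "NFS3_OK":
        return "NFS4_OK"
    # Fallback is an assumption: report a generic I/O error for anything else.
    return NFS3_TO_NFS4.get(nfs3_status, "NFS4ERR_IO")
```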

> While I realize that server-to-server copy details is not a part of
> the RFC7862, but what the client receives in CB_OFFLOAD and how to
> interpret that is unclear.

I think there are some lapses in clarity but some of the issues are not
questions about how to interpret the results of CB_OFFLOAD but deal
with what to do about the results.

On Thu, Mar 2, 2017 at 4:07 PM, Olga Kornievskaia <aglo@umich.edu> wrote:

> Hi folks,
>
> I'm seeking some clarification for doing the server-side copy feature.
> EINVAL in read beyond the end of the file has been already raised in
> another thread so I'll leave that aside. So here are a few more :
> -- possible inconsistent behavior in presence of an error from doing
> an async copy vs sync. CB_OFFLOAD can return an error and a partial
> copy result but there is no mechanism to do so in synchronous COPY. If
> CB_OFFLOAD returned an error and a partial copy, is that really an
> error or should it be a short copy? I can see a case where an error
> should not be ignore -- if error is ENOSPC. Should the client if
> received an error in CB_OFFLOAD propagate the error to the
> application? Should it be server's responsibility to then if some
> error occurred but a partial copy succeeded to then not return the
> error to the client (that leads to the 2nd question).
>
> -- related to errors returned in CB_OFFLOAD, there is no guidance with
> regards to the reboot recovery. If the destination server unable to
> finish the copy due to the source server rebooting, should it reply
> back with a successful short copy or should it return some error (and
> short copy of possibly 0bytes).
>
> To address these two questions I find some wording in the spec that
> confuse me (section 15.2.3):
>
> "If a failure does occur for a synchronous copy, wr_count will be set
> to the number of bytes copied to the destination file before the error
> occurred." ... it also talks about consecutive bits there but then it
> proceeds to also say that "If the failure occurred for an asynchronous
> copy, .... It will be able to determine the bytes copied from the
> coa_bytes_copied in the CB_OFFLOAD argument."
>
> Since it talks about "wr_count", then it must means that COPY's
> nfsstat4 status is not an error. Does this then mean that if copy
> "failed" then the error must be ignored on the server and a partial
> copy results propagated to the client. But for the async copy,
> "coa_bytes_copied" means that CB_OFFLOAD also set nfsstat4 to some
> error. So error is not ignored in the async case?
>
> -- the spec lists the kind of errors CB_OFFLOAD reply can return but
> there is no wording for what kind of errors be included in
> CB_OFFLOAD's call. For instance, I can see that a source server reboot
> can translate to ERR_BAD_HANDLE or it could be BAD_SESSION or
> STALE_CLIENTID or BAD_STATEID. It is not clear what errors to check
> for and recover by restarting the copy and what errors shouldn't
> happen. Something like BAD_SESSION implies that server-to-server
> protocol was NFS4.x x>=1. Can the client receive an NFSv3 error in
> CB_OFFLOAD?
>
> While I realize that server-to-server copy details is not a part of
> the RFC7862, but what the client receives in CB_OFFLOAD and how to
> interpret that is unclear.
>
> Thank you.