Re: [nfsv4] What error to return if destination server fails to READ within cnr_lease_time

"Adamson, Andy" <William.Adamson@netapp.com> Fri, 18 December 2015 19:14 UTC

> On Dec 18, 2015, at 1:14 PM, David Noveck <davenoveck@gmail.com> wrote:
> 
> > ?? The destination server is a client (referred to as dst-client)
> 
> Yes, it is a client, but there is another client as well. I was referring to that one as "the client", which didn't help clarify things.
> 
> > it loads the client module, and the dst-client mounts and reads the data from the source server with _no_ change to the client code. 
> 
> I can see why one might want that, but the situation is one that typical client code does not deal with, particularly where lease considerations are concerned. Normally, one issues a READ with a stateid that was obtained in the context of the client issuing the request. If lease expiration occurs, it is clear which lease has expired. Here we have two clients, and the stateid was obtained by one and is used by the other. If NFS4ERR_EXPIRED means a lease has expired, the obvious question is "which one?", and RFC5661 is not very clear about that since the situation doesn't occur in v4.1. I was assuming that the lease expiration would be for the original client, while you seem to be assuming it would be for the dst-client.
> 
> > The READ is a normal READ. 
> 
> It's normal in form, but it is done with a stateid that the dst-client didn't get from an OPEN of its own, which makes it kind of "non-standard", if "abnormal" seems inappropriate.
> 
> > So if the dst-client gets an NFS4ERR_EXPIRED on the READ, it must assume that the stateid is bad, and try to recover it. 
> 
> But there would be no way for it to do that. The normal way to do that is to do an OPEN, but the dst-client doesn't have the information to do that.
> 
> > But the stateid is not bad, the clientID is not expired. This has nothing to do with the cnr_lease_time and in my opinion, should not be returned.
> 
> OK, but I want to explain my logic, which I think still works if one assumes, as I did, that the client code, like the server code, will have to be aware of the non-standard nature of what is going on. Since there are two leases involved, I was assuming NFS4ERR_EXPIRED would mean that one or more of the following had occurred:
> 	• The original client's lease has expired.  In this case, the best thing to do is to fail the copy immediately and let the original client do any necessary lease recovery on his own.
> 	• The dst-client's lease has expired.  As you point out, this case is unlikely to arise, since the clientid has just been established, but it is still (just barely) possible that a communication break at just the wrong time could cause this to occur.  The dst-client can test any stateids that it got itself to see if this is the case that occurred.

Um - the stateid in question did not come from the dest-client; it came from the client. This means that the stateid used by the READ to the source server is not associated with the clientID established by the mount of the dest-client. We already have special code on the source server to allow for a lookup of a stateid against a different clientID than the one the SEQUENCE operation resolves to - special for the COPY READ. How can the dest-client test the stateid?! We would need more special code on the server. This is IMHO not the direction to go.

>  If it didn't, it can just fail the copy, since it doesn't matter whether case 1 or case 3 occurred.  If case 2 has occurred, the dst-client can only recover stateids it got for itself.  It has no ability to recover the others, and so in this case as well it should fail the copy.  In this case, the original client might see that its stateid is usable and so reissue the copy immediately.
> 	• The cnr_lease_time has expired.  In this case also, the best thing to do is to fail the copy immediately and let the original client do any necessary lease recovery on his own

All of the above complexity goes away if a READ for the COPY is able to return NFS4ERR_PARTNER_NO_AUTH. 
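
To make that concrete, here is a minimal sketch of the behavior I have in mind. It is illustrative Python, not draft text and not anyone's implementation: the names CopyNotifyGrant, source_read_status, and destination_copy_status are hypothetical, and the only things taken from this thread are the error names and the rule that the destination server must begin reading before cnr_lease_time expires. It models only that one condition.

```python
import enum
from dataclasses import dataclass


class Nfsstat(enum.Enum):
    # Only the names matter for this sketch; numeric values are omitted on purpose.
    NFS4_OK = "NFS4_OK"
    NFS4ERR_EXPIRED = "NFS4ERR_EXPIRED"
    NFS4ERR_PARTNER_NO_AUTH = "NFS4ERR_PARTNER_NO_AUTH"


@dataclass
class CopyNotifyGrant:
    """Hypothetical per-COPY_NOTIFY record kept by the source server."""
    granted_at: float        # time the COPY_NOTIFY reply was sent (seconds)
    cnr_lease_time: float    # window in which the first READ must arrive (seconds)
    read_started: bool = False


def source_read_status(grant: CopyNotifyGrant, now: float) -> Nfsstat:
    """Source server: status for a READ issued on behalf of a COPY."""
    if not grant.read_started and now - grant.granted_at > grant.cnr_lease_time:
        # The copy authorization lapsed. This is not a client lease or
        # stateid problem, so NFS4ERR_EXPIRED is deliberately not used.
        return Nfsstat.NFS4ERR_PARTNER_NO_AUTH
    grant.read_started = True
    return Nfsstat.NFS4_OK


def destination_copy_status(read_status: Nfsstat) -> Nfsstat:
    """Destination server: fail the COPY with the same error the READ returned."""
    # No stateid or clientID recovery is attempted by the dst-client.
    return read_status


late = CopyNotifyGrant(granted_at=100.0, cnr_lease_time=90.0)
print(destination_copy_status(source_read_status(late, now=200.0)).name)
```

The point of the sketch is that the dst-client never has to decide which lease an NFS4ERR_EXPIRED refers to: the READ failure is unambiguous and is simply passed back on the COPY.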

—>Andy

> 
> 
> On Fri, Dec 18, 2015 at 10:08 AM, Adamson, Andy <William.Adamson@netapp.com> wrote:
> 
> > On Dec 17, 2015, at 10:46 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > 1) What is the error returned by the source server on the READ?
> >
> > > It’s not NFS4ERR_EXPIRED
> >
> > I would not be so quick to dismiss this.  See below.
> >
> > > as this refers to the client's lease
> >
> > Normally it does but since there is no NFS4ERR_CNR_LEASE_EXPIRED, I think it is a reasonable accommodation.
> 
> What if the stateid has actually expired, and the source server is saying ‘recover the stateid’ when it returns NFS4ERR_EXPIRED on the READ? In other words, there is no way to convey ‘this NFS4ERR_EXPIRED refers to the cnr_lease_time, not the stateid’.
> 
> >
> > > and will prompt the client to recover the stateid.
> >
> > It would if the client got it, but the client is not going to get it in this case.
> 
> ?? The destination server is a client (referred to as dst-client) - it loads the client module, and the dst-client mounts and reads the data from the source server with _no_ change to the client code. The READ is a normal READ. So if the dst-client gets an NFS4ERR_EXPIRED on the READ, it must assume that the stateid is bad, and try to recover it. But the stateid is not bad, the clientID is not expired. This has nothing to do with the cnr_lease_time and in my opinion, should not be returned.
> 
> >  The destination server is going to get it and he could reasonably conclude that either:
> >       • The client's lease has expired:
> 
>         in which case the dst-client will try to recover the stateid, and if that indicates that the clientID has expired, then recover the clientID, which is nuts, as the dst-client has _just_ established the clientID; all the dst-client does is mount, READ, umount.
> 
> >       • cnr_lease_time has expired
> 
>         in which case the destination server would just fail the COPY.
> 
> > It would be nice if he could know which of these two occurred but it is not essential.
> > In either case, the COPY has to be failed, and the client will find out soon enough whether his lease for the source server has expired or not.
> 
> Really? How does the client know if the lease for the source server has expired?
> 
> 
> >  If it has, he is in a position to re-establish it.
> 
> Well, a new COPY needs to be started.
> 
> >
> > Another possibility is NFS4ERR_ADMIN_REVOKED if you want to distinguish this from a true lease expiration.
> 
> This also implies a stateid problem, not a cnr_lease_time problem and will prompt stateid recovery.
> >
> >
> > > 2) What is the error returned by the destination server on the COPY?
> >
> > One possibility is NFS4ERR_PARTNER_NO_AUTH.  That's an exact fit if you know cnr_lease_time expired.  It's kind of a rough fit if either lease could have expired.
> 
> 
> Ah! This is the error I was looking for.
> 
> Why not add NFS4ERR_PARTNER_NO_AUTH as an error on a READ? This seems a simple addition to the protocol and would be
> 
> —>Andy
> 
> 
> 
> >
> > If you are unsure of the specific lease expiration causing the failure, NFS4ERR_OFFLOAD_DENIED seems like it would do what you want.
> >
> > On Thu, Dec 17, 2015 at 3:42 PM, Adamson, Andy <William.Adamson@netapp.com> wrote:
> > From draft-ietf-nfsv4-minorversion2-39 Section 15.3.3.  DESCRIPTION of COPY_NOTIFY:
> >
> >    If this operation succeeds, the source server will allow the
> >    cna_destination_server to copy the specified file on behalf of the
> >    given user as long as both of the following conditions are met:
> >
> >
> >       The destination server begins reading the source file before the
> >       cnr_lease_time expires.
> >
> >
> >
> > So on an inter-SSC the source server starts the cnr_lease_time upon the reply to COPY_NOTIFY, and
> > if the cnr_lease_time expires prior to the beginning of the READ from the source
> >  server, the source server fails the READ.
> >
> > 1) What is the error returned by the source server on the READ?
> >
> > It’s not NFS4ERR_EXPIRED, as this refers to the client's lease and will prompt the client to recover the stateid.
> >
> >
> > 2) What is the error returned by the destination server on the COPY?
> >
> > I would hope it is the same error as returned by READ.
> >
> > Do we need a new error code?
> >
> > Suggestions?
> >
> > —>Andy
> >
> >
> 
>