Re: [nfsv4] 3530bis Issue 39: Clarification on renewing sequence IDs

Trond Myklebust <Trond.Myklebust@netapp.com> Thu, 18 November 2010 17:08 UTC

From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: david.noveck@emc.com
In-Reply-To: <BF3BB6D12298F54B89C8DCC1E4073D8002945154@CORPUSMX50A.corp.emc.com>
References: <4CACD9BF.2010809@oracle.com> <4CD32C31.3040909@oracle.com> <1288965554.3975.27.camel@heimdal.trondhjem.org> <BF3BB6D12298F54B89C8DCC1E4073D8002945154@CORPUSMX50A.corp.emc.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Organization: NetApp Inc
Date: Thu, 18 Nov 2010 12:09:03 -0500
Message-ID: <1290100143.7245.9.camel@heimdal.trondhjem.org>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.0 (2.32.0-2.fc14)
X-OriginalArrivalTime: 18 Nov 2010 17:09:36.0459 (UTC) FILETIME=[60AB51B0:01CB8743]
Cc: Robert.Thurlow@oracle.com, nfsv4@ietf.org
Subject: Re: [nfsv4] 3530bis Issue 39: Clarification on renewing sequence IDs

So, I don't think that SEQ_STATUS_LEASE_MOVED should be the last word on
this process.

SEQ_STATUS_LEASE_MOVED has the same scalability problems that
NFS4ERR_LEASE_MOVED does: if the server is exporting 1000 or so volumes,
and one of them moves, it has to check every time a client sends a
SEQUENCE op (i.e. pretty much every COMPOUND) whether that client has
completed its search for the absent volume. Furthermore, it will be
receiving a bunch of new RPC requests as the clients all probe 1000
volumes for the absent one.
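
To put rough numbers on the probing cost described above, here is a minimal back-of-the-envelope sketch. It is not taken from any implementation: the function names, the client count, and the assumption that a client probes volumes one at a time and stops at the first absent one are all illustrative.

```python
# Illustrative model only: counts probe requests, nothing NFS-specific.

def probes_with_flag(volumes, moved):
    """Client is told only that *some* lease moved, so it probes
    volumes in order until it reaches the absent one (assumed strategy)."""
    probes = 0
    for fsid in volumes:
        probes += 1
        if fsid == moved:
            break
    return probes


def probes_with_named_fsid(volumes, moved):
    """Client is told which filesystem moved, so it probes only that one."""
    return 1


volumes = list(range(1000))   # 1000 exported volumes, as in the example above
moved = 999                   # worst case: the last volume probed is the absent one
clients = 50                  # hypothetical number of clients holding leases

flag_total = clients * probes_with_flag(volumes, moved)
named_total = clients * probes_with_named_fsid(volumes, moved)
print(flag_total, named_total)
```

In this worst case every client issues one probe per exported volume, so the extra load grows with clients times volumes rather than with clients alone.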

NFSv4.1 has solved the problem of callbacks by means of the session
backchannel. Why can't we now solve the problem of LEASE_MOVED by adding
a CB_LEASE_MOVED operation which reports not only that a filesystem is
missing, but also identifies which filesystem (e.g. by including the
fsid)?
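
As a rough sketch of what such a callback might carry, the following models a CB_LEASE_MOVED argument holding an fsid (major/minor pair) and a client that marks only the named filesystem for migration recovery. CB_LEASE_MOVED is only a proposal here and is not defined in RFC 5661; the class names, fields, and handler below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fsid:
    """Filesystem identifier, modelled as a major/minor pair."""
    major: int
    minor: int


@dataclass
class CbLeaseMovedArgs:
    """Hypothetical arguments for the proposed CB_LEASE_MOVED callback."""
    fsid: Fsid


@dataclass
class Client:
    """Toy client state: filesystems in use and those awaiting recovery."""
    mounted: set = field(default_factory=set)
    needs_recovery: set = field(default_factory=set)

    def handle_cb_lease_moved(self, args):
        """Mark only the named filesystem; return False if it is unknown."""
        if args.fsid in self.mounted:
            self.needs_recovery.add(args.fsid)
            return True
        return False


client = Client(mounted={Fsid(0, 1), Fsid(0, 2), Fsid(0, 3)})
handled = client.handle_cb_lease_moved(CbLeaseMovedArgs(Fsid(0, 2)))
print(handled, client.needs_recovery)
```

With the fsid in hand, the client touches one filesystem instead of probing all of them, and the server no longer has to track each client's search progress on every SEQUENCE.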

Cheers
  Trond

On Sun, 2010-11-07 at 01:04 -0500, david.noveck@emc.com wrote:
> I agree with Trond's argument as to seqid, i.e. that you should
> increment the seqid in the case of NFS4ERR_LEASE_MOVED.
> 
> NFS4ERR_LEASE_MOVED was buried at a cross-roads in RFC5661, but there we
> have SEQ_STATUS_LEASE_MOVED so we don't need it.
> 
> If you bury it in RFC3530bis, you do have to have some way to deal with
> the issue of letting the client find out that there is a migrated lease.
> Otherwise, there is an unbounded period in which the new server will not
> hear from the client and the client's open files could be lost.
> 
> The alternative to the monstrous hack would be to require the server to
> simulate a reboot.  The client would see a STALE client or stateid error
> and then he would go through the reclaim sequence for both any migrated
> and non-migrated fs's.   That seems harder to make happen than
> LEASE_MOVED, as monstrous as it is.
>  
> 
> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf
> Of Trond Myklebust
> Sent: Friday, November 05, 2010 9:59 AM
> To: Robert Thurlow
> Cc: NFSv4
> Subject: Re: [nfsv4] 3530bis Issue 39: Clarification on renewing
> sequence IDs
> 
> On Thu, 2010-11-04 at 15:57 -0600, Robert Thurlow wrote:
> > Robert Thurlow wrote:
> > > Hi folks,
> > > 
> > > This is issue 39 from 
> > > http://github.com/loghyr/3530bis/blob/master/tasklist.txt.
> > > 
> > > In implementing NFSv4 migration support, we believe that
> > > MOVED and LEASE_MOVED need to be added to the list of errors
> > > in 8.1.5 which do NOT result in incrementing the open owner
> > > or lock owner sequence ID.  The goal is to make the sequence
> > > ID readily calculable for both the client and the destination
> > > server after the migration has occurred.
> > > 
> > > On the 3530bis call, this appeared exactly backwards to some
> > > others - that since a completely gross error had not occurred,
> > > we should increment the sequence ID and the client and the
> > > destination server should know to expect that when they interact
> > > after a migration.  I do not know this issue well enough to
> > > properly defend a position, so please reply with your reasoned
> > > opinion :-)
> > 
> > I don't think this has had a response.  If you disagree with
> > the wording change, now is the time to say so.
> 
> As stated on the confcall, I strongly disagree with this change w.r.t.
> NFS4ERR_LEASE_MOVED. The operation that resulted in a
> NFS4ERR_LEASE_MOVED cannot be safely replayed if the sequence id has not
> been bumped.
> 
> The point is that NFS4ERR_LEASE_MOVED is an error that depends on the
> state of a _different_ filesystem. It does not even pertain to the
> actual state you are trying to modify (and is a monstrous hack). Worse
> yet, that error condition can be cleared at any time with no
> consequences for the stateids held by the client, so unlike
> NFS4ERR_BAD_STATEID or NFS4ERR_BAD_SEQID, there is no ordering w.r.t.
> the operation that you are retrying.
> 
> IOW: if the error condition happens to get cleared between two replays
> of the operation, the client may end up getting 2 conflicting replies
> (one NFS4ERR_LEASE_MOVED, the other being a change of state on the
> server). Which one does it choose?
> 
> So how about a counter-proposal: we bury NFS4ERR_LEASE_MOVED at a
> cross-roads with a stake through its heart, and promise never to mention
> it again except when the kids need scaring to bed...
> 
> Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com