Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE

David Noveck <davenoveck@gmail.com> Sun, 01 January 2017 13:13 UTC

In-Reply-To: <20161229074830.GA3002@lst.de>
References: <20161213171902.Horde.MkS1YMOM6VpxA0Z7rSMTe7P@mail.telka.sk> <CAABAsM5L0xdKodxk1dRSugLyROzn2JzgDkq6kdHE0LuGcfh++A@mail.gmail.com> <20161213181734.Horde.EqgB09El8rupnkesIQaBwJ3@mail.telka.sk> <CADaq8jcq2C0o8EWXoGjxDn58sV_J+-SP-=rj934Se-DV69b-pw@mail.gmail.com> <20161214112112.Horde.aPh8AjT6iWRl37CULwihyV7@mail.telka.sk> <CAABAsM7v6y0bsb0jKzfvobkUjniTLhM3uv8FYjo07HcLD2004w@mail.gmail.com> <20161227144414.GA32002@fieldses.org> <CADaq8jck14SKL6Ua9QxbqPyX1=1aaA7+76wv-__EWFvh7ZcEJA@mail.gmail.com> <C496AE44-0F27-4B66-A1F6-A76AEAFD7A90@gmail.com> <20161229024703.GA21325@fieldses.org> <20161229074830.GA3002@lst.de>
From: David Noveck <davenoveck@gmail.com>
Date: Sun, 1 Jan 2017 08:13:55 -0500
Message-ID: <CADaq8jd__SJHP-4aJPbW9GscKRc6cwSe26VYt3w_GPcpuN3QHQ@mail.gmail.com>
To: Christoph Hellwig <hch@lst.de>
Content-Type: multipart/alternative; boundary=001a1141ba08229b3c05450835dc
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/Z51tVaYHnj94XBdd_kb7kGm8ibY>
Cc: Bruce James Fields <bfields@fieldses.org>, IETF NFSv4 WG Mailing List <nfsv4@ietf.org>
Subject: Re: [nfsv4] RFC 7530: Filehandle of opened file after the REMOVE

> We could still persist that information
> somewhere, or use a flag to delay the deletion of unlinked inodes until
> NFSD runs.

You need to delay the deletion until after the grace period is over.  There
may be technical issues with this, but the biggest hurdle may be
architectural/political.  I expect that people involved in file system
recovery may have trouble with the idea that the NFS server is telling them
when they can and cannot do various pieces of file system recovery.
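
To make the timing constraint concrete, here is a minimal sketch of recovery
that holds unlinked inodes until the grace period has ended.  It is a toy
model, not how any file system or NFSD actually does it: the class name, the
method names, and the injectable clock are all assumptions made for
illustration.

```python
import time


class UnlinkedInodeRecovery:
    """Toy model: keep unlinked inodes until the NFS grace period ends."""

    def __init__(self, unlinked_inodes, grace_seconds, now=time.monotonic):
        # Inodes found on the unlinked list after an unclean shutdown.
        self.pending = set(unlinked_inodes)
        # Inodes whose opens were reclaimed by a client during grace.
        self.reclaimed = set()
        self._now = now  # injectable clock (hypothetical, eases testing)
        self.grace_ends = now() + grace_seconds

    def reclaim(self, inode):
        """A client reclaims its open on an unlinked inode during grace."""
        if self._now() >= self.grace_ends:
            raise RuntimeError("grace period is over")
        if inode not in self.pending:
            raise KeyError(inode)
        self.pending.discard(inode)
        self.reclaimed.add(inode)

    def purge(self):
        """Return the inodes that may be deleted; none before grace ends."""
        if self._now() < self.grace_ends:
            return set()
        doomed, self.pending = self.pending, set()
        return doomed
```

The point of the sketch is only the ordering: purge() refuses to hand back
anything until the grace deadline has passed, so a reclaimed inode is never
deleted out from under its client.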

It might be more feasible to ask them to complete recovery but move the
unlinked inodes to a directory like /unlinked-inodes and consider the
recovery complete before NFS actually runs.  If that is OK, and I believe
it is, the result would be almost the same as the NFS server moving the
file to that directory in the first place, using a server-based silly
rename.
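
A minimal sketch of what a server-based silly rename could look like,
assuming the server knows whether the file is still open and that the hidden
directory lives on the same file system as the export.  The directory name
and both function names are hypothetical, and ordinary os calls stand in for
whatever the server would really use.

```python
import os
import uuid

HIDDEN_DIR = ".unlinked-inodes"  # hypothetical name for the hidden directory


def server_remove(export_root, rel_path, is_open):
    """Handle REMOVE: unlink an unopened file, otherwise park it.

    Returns the parked path for a still-open file, or None if the file
    was deleted outright.
    """
    src = os.path.join(export_root, rel_path)
    if not is_open:
        os.unlink(src)
        return None
    hidden = os.path.join(export_root, HIDDEN_DIR)
    os.makedirs(hidden, exist_ok=True)
    # A random name avoids collisions between parked files.
    parked = os.path.join(hidden, uuid.uuid4().hex)
    # A rename within one file system keeps the same inode, so a file
    # handle derived from the inode stays valid.
    os.rename(src, parked)
    return parked


def server_last_close(parked):
    """On the last CLOSE of a parked file, delete it for real."""
    os.unlink(parked)
```

Because the file survives as an ordinary directory entry, nothing special is
asked of file system recovery; the cost is that the directory is visible
unless the file system helps hide it.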

> Personally I'd love to see sillyrename die.

I think you really want to see it dead.  Actually watching it die is
boring, sort of like watching paint dry.

> It's a major pain for getting sensible semantics out of NFS.

Please elaborate.  Would a server-based silly rename allow the sensible
semantics you are looking for?

The best way to get silly rename to die is to provide an alternative, and I
believe an extension to NFSv4.2 could be defined to provide that.

> The unlinked inode list is almost a directory, except that it doesn't
> have names for the entries, you can only find inodes on it by the inode
> number and generation (aka NFS file handle).

The existence of those names could be considered a flaw, but if we define
an extension to allow silly rename to be eliminated, we should not REQUIRE
or RECOMMEND that they not be visible.  I'm thinking of some non-RFC2119
characterization such as "It is most desirable that these objects not be
accessible other than through use of their file handles".

Btw, my preferred title for such a document, almost certain to be rejected,
is "Silly Rename Must Die" :-)


On Thu, Dec 29, 2016 at 2:48 AM, Christoph Hellwig <hch@lst.de> wrote:

> On Wed, Dec 28, 2016 at 09:47:03PM -0500, Bruce James Fields wrote:
> > I never seriously worked on it, but for a while I was in the habit of
> > running it by people.  Christoph Hellwig thought it was doable (I think
> > he suggested some sort of callback from the filesystem during the
> > garbage collection, possibly because he had in mind some other
> > application for that--but my memory may be wrong).  Chris Mason didn't
> > like the idea at all.  He asked what we expect to happen on fsck, or if
> > the filesystem gets mounted without nfs getting started, or... some
> > other scenarios I forget.
>
> The way open but unlinked files are handled by modern transactional
> file systems is that the file system has a list of those inodes
> (in XFS this is the unlinked inode list in the allocation group header;
> other file systems use different terminology and slightly different
> techniques, e.g. in ext4 the list is global for the whole file system).
>
> After an unclean shutdown when file system recovery is run we'll perform
> the deferred delete for all the inodes on the unlinked inode list.
> At that point the file system could in theory inform NFSD about that
> fact.  But at least as far as the current Linux kernel is concerned
> (sorry for delving into implementation details, but I guess this is still
> easier to understand than an abstract discussion), at the point where
> the file system performs recovery NFSD has not been started, or at least
> doesn't know about the file system yet.  We could still persist that
> information somewhere, or use a flag to delay the deletion of unlinked
> inodes until NFSD runs.
>
> > We could do the same silly rename tricks on the server side.  Something
> > like: create a directory with an unlikely name in the root of the
> > export, rename files there on REMOVE.  Possible problems:
>
> Personally I'd love to see sillyrename die.  It's a major pain for
> getting sensible semantics out of NFS.
>
> >       - you'll never be able to completely hide that directory.  But
> >         maybe we could get some sort of filesystem support for a
> >         hidden directory.
>
>
> The unlinked inode list is almost a directory, except that it doesn't
> have names for the entries, you can only find inodes on it by the inode
> number and generation (aka NFS file handle).