[nfsv4] Notes regarding discussion of directory scalability issues

David Noveck <davenoveck@gmail.com> Fri, 26 June 2020 17:54 UTC

From: David Noveck <davenoveck@gmail.com>
Date: Fri, 26 Jun 2020 13:54:20 -0400
Message-ID: <CADaq8jev+tUs=mrGDMnZMpfmQXL=KLwDKW5S-CbBLpL-54RJTA@mail.gmail.com>
To: NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/ak4ADoj_-xRa5TL-fnSAIHrYQZU>
Subject: [nfsv4] Notes regarding discussion of directory scalability issues


On 6/22, Chuck and I met to try to resolve some issues that arose in our
discussion of scalability issues for directory operations.  After our
original presentations on this topic at the post-IETF107 virtual interim,
it was anticipated that we would discuss those on the wg mailing list.
Since that didn't work out, Chuck and I decided to clarify and possibly
resolve our differences of approach in a short phone call, which worked
well.

We were able to clarify but not resolve two issues.  We hope to be able to
resolve these eventually, but not necessarily during the July 9th meeting.

   - We did agree that we will need to better understand, and try to
   resolve, issues regarding the compatibility of directory notifications
   with client handling of directory cookies and their caching. See
   *Directory Delegation Issues* for details.

   - We also explored issues raised by the possible addition of ops to aid
   recursive directory traversals, building on Chuck's suggestion, at the
   earlier meeting, of possible protocol aid for "rm -r". See *Ops to Add
   Aiding Recursive Directory Traversal* for details.

*Directory Delegation Issues*

Implementations of directory delegations are quite limited, making it not
worth investing in server-side implementations.  Useful client-side
implementation would require the ability to efficiently cache large
slowly-changing directories.  Unfortunately, that is not currently
possible, at least for the Linux client, since the creation or removal of
a file from a directory requires that the entire directory (which can be
very large) be refetched.  This now occurs for directory changes that the
client makes itself but would presumably also apply in the case of
directory notifications, making them not very useful in the important case
of large slowly-changing directories.

As Chuck related to me, this requirement for directory refetching is
predicated on the possibility that any change to a directory (e.g. remove,
create, rename) could potentially invalidate directory entry cookies for
all the cached directory entries.  While this is unlikely to happen in
practice, I can see that clients might be unwilling to rely on filesystems
not changing these.  The thing I don't understand, and Chuck was unable to
clarify for me, is why such cookies need to be cached at all, and
essentially treated as if they were attributes.  While the directory
notification feature has the ability to propagate attribute changes to
clients with delegations, there is no such ability in the case of cookie
changes.  It appears that the feature was designed assuming these would not
be necessary.

As a result, it appeared that one of the following would have to be done:

   1. Clients might be modified so as not to depend on a supposed fixity of
   such cookies. That's clearly my favorite as I believe the client returning
   a cached directory could synthesize its own cookies to allow users to fetch
   directory information across multiple requests.  However, I'm not sure
   client implementers would agree and this is a matter on which consensus is
   2. Provide a way in which the server could communicate that the
   theoretical possibility that a remove, create, or rename could cause a
   revision of cookies for uninvolved directory entries does not occur for a
   given fs. This could be done by adding a new fs-scope attribute providing
   information about an fs's directory entry cookie management.  The downside
   is that this would be v4.2-only and would require an additional RFC to
   make a v4.1 feature effectively usable 😞
   3. Provide another way to exclude the possibility of the client being
   blind-sided by a server-side fs prone to directory cookie reassignment.
   For example, any change to the cookie of a directory entry that is not
   being removed or renamed could require delegation recall.  Unfortunately,
   this is a big change to an existing feature 😖
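
As an illustration of the first choice only, here is a minimal Python
sketch of a client paging through a cached directory using cookies it
makes up itself.  It is not taken from any existing client; the class
and method names (CachedDirectory, add, remove, readdir) are hypothetical.
It assumes each cached entry gets a client-assigned sequence number that
is never reused, so that removing or adding an entry leaves the cookies
of all other entries alone.

```python
import bisect


class CachedDirectory:
    """Hypothetical client-side directory cache with client-made cookies."""

    def __init__(self, names=()):
        self._next_seq = 1   # cookie 0 is reserved for "start of directory"
        self._seqs = []      # client-made cookies, always ascending
        self._by_seq = {}    # cookie -> entry name
        self._by_name = {}   # entry name -> cookie
        for name in names:
            self.add(name)

    def add(self, name):
        """Cache a new entry; it gets a fresh cookie after all existing ones."""
        seq = self._next_seq
        self._next_seq += 1
        self._seqs.append(seq)
        self._by_seq[seq] = name
        self._by_name[name] = seq
        return seq

    def remove(self, name):
        """Drop an entry; cookies of the remaining entries are unchanged."""
        seq = self._by_name.pop(name)
        del self._by_seq[seq]
        self._seqs.pop(bisect.bisect_left(self._seqs, seq))

    def readdir(self, cookie=0, count=2):
        """Return (entries, eof) for up to `count` entries after `cookie`."""
        start = bisect.bisect_right(self._seqs, cookie)
        chunk = self._seqs[start:start + count]
        entries = [(seq, self._by_seq[seq]) for seq in chunk]
        eof = start + len(chunk) >= len(self._seqs)
        return entries, eof
```

A caller resumes by passing back the cookie of the last entry it was
given.  Because lookup is "first cookie greater than the one supplied",
this keeps working even if that entry has been removed in the meantime.
Whether client implementers would find such a scheme acceptable is exactly
the open question noted above.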

As Chuck and I finished the discussion we anticipated a long-term process
aimed at securing a consensus about which of these choices the working
group would adopt to make directory delegations useful.  This could start
at the 7/9 meeting, but the need to make sure we had the active
participation of client implementers meant that initial discussion there
wasn't a sure thing.

Lately, I've been looking at directory notifications in more detail and
have come to the conclusion that the way the notifications use cookies to
update cached directories is really predicated on the expectation that
cookies for directory entries not involved in the specific update will not
change.  For example, insert and rename notifications include the cookie of
the entry before the insertion point, which is not really useful if the
server-side fs is free to change cookies for entries not involved in the
directory

I now believe that it is possible to rework the description of this feature
so that the server, by supporting directory notifications, is effectively
providing an assurance that the wholesale revision of cookies as a result
of directory modification operations, about which clients are concerned,
cannot happen.  This could not be done as a consequence of an errata
report, but it is the sort of clarification/revision that I feel could be
done as a part of rfc5661bis, assuming that we can reach a working group
consensus on the matter.  We intend to make changes of this scale for some
REJECTED errata reports.  I will take about 5-10 minutes for a presentation
of this issue at the 7/9 meeting, hoping to stimulate later discussion of
this issue and future directory delegation/notification
implementation possibilities on the mailing list.
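
To make the dependence on stable cookies concrete, here is a small Python
sketch of how a client might apply an insert notification to a cached
directory.  It is a simplified model rather than the wire format: the
function name apply_add and its parameters are hypothetical, and the cache
is assumed to be a plain list of (cookie, name) pairs in directory order.

```python
def apply_add(cache, name, cookie, prev_cookie):
    """Insert (cookie, name) right after the entry whose cookie is prev_cookie.

    Returns True if the notification could be applied.  Returns False, and
    leaves the cache untouched, if no cached entry has prev_cookie.
    """
    for index, (cached_cookie, _) in enumerate(cache):
        if cached_cookie == prev_cookie:
            cache.insert(index + 1, (cookie, name))
            return True
    # The previous entry's cookie is unknown to the client, e.g. because
    # the server-side fs changed cookies for entries not involved in the
    # update.  The notification cannot be placed, so the client would have
    # to fall back to refetching the directory.
    return False
```

In this model the notification is only usable when the previous entry's
cookie still matches what the client has cached, which is why the
mechanism appears to assume that uninvolved entries keep their cookies.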

*Ops to Add Aiding Recursive Directory Traversal*

In Chuck's presentation he alluded to the possibility of the protocol
giving "rm -r" more help, and there are a number of useful extensions that
could be defined.  As I considered other applications in which recursive
directory traversal occurs, there appeared to be a number of possible
READDIR extensions that could be sensibly proposed and would be useful in
software build workloads.

The problem with such extensions is that they are unlikely to be used
unless complementary work is done to provide useful APIs giving access
to any helpful protocol extensions.  Although such work is out-of-scope for
the working group, the working group has to avoid investing in protocol
extensions which, realistically, will never be used.  As a result I will
not present regarding possible extensions at the 7/9 meeting but may do
this later if there is interest in compatible APIs for important clients.

*Miscellaneous Items.*

We also had occasion to discuss some other issues regarding the agenda of
the forthcoming meeting:

   - Chuck reminded me of my previously mentioned intention to use github
   for the handling of the writing and review of wg documents associated with
   rfc5661bis (i.e. draft-ietf-nfsv4-internationalization,
   draft-ietf-nfsv4-security-needs, draft-ietf-nfsv4-security,
   draft-ietf-nfsv4-rfc5661bis, and possibly others).  Chuck wasn't clear
   what exactly I might need from him to help the process and, it turned
   out, I wasn't sure either.  We agreed that I would mention the issues in
   my general slides on the rfc5661bis process and that I'd give Chuck an
   early opportunity to respond to those.
   - Chuck pointed out the issue of the length of rfc5661 and the need to
   address concerns about that.  I was concerned that previous suggestions in
   this regard (with regard to pNFS file) might result in multiple documents
   that don't fit together all that well.  I agreed to look at other
   approaches to the issue and present those to the working group at the 7/9