Return-Path: <davenoveck@gmail.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1])
 by ietfa.amsl.com (Postfix) with ESMTP id EDAAB3A0C27
 for <nfsv4@ietfa.amsl.com>; Fri,  3 Jul 2020 03:43:58 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.097
X-Spam-Level: 
X-Spam-Status: No, score=-2.097 tagged_above=-999 required=5
 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
 DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001,
 URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key)
 header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44])
 by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 3mK1sHkFdsDw for <nfsv4@ietfa.amsl.com>;
 Fri,  3 Jul 2020 03:43:56 -0700 (PDT)
Received: from mail-ed1-x533.google.com (mail-ed1-x533.google.com
 [IPv6:2a00:1450:4864:20::533])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by ietfa.amsl.com (Postfix) with ESMTPS id 824AE3A0C25
 for <nfsv4@ietf.org>; Fri,  3 Jul 2020 03:43:55 -0700 (PDT)
Received: by mail-ed1-x533.google.com with SMTP id dm19so21016303edb.13
 for <nfsv4@ietf.org>; Fri, 03 Jul 2020 03:43:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=2I5Gjwnoj92wLA6TMo9sFBwcInUYq8l+dHUg1BKqOKo=;
 b=qeRFBfHS5anaJSj02Z6EXpHK4zzb/nVcD8OYbtfLJ0+uIyVDMQbM6Skjgxm2pfZn/8
 gUR1buGabA6a9/hCb1FbfvVflOsaP3KejYh4EmFd4zWlMVozPTeYfW0jCMUnRWaMUyOz
 M++0VIToo526+fdadW2ipb2qNDANM15wlxxA5bSOaFwb85xKZFTvk+CNdkAfMYnCZVF9
 73kzVvUwweOm7QFk/toRU87S3S1oLYJpZj5pCfBvtd0klpWZrkJAvE9MkqsLi+H/+b6f
 ybWRb+dyE9PHAL5bZyTquTiFfnEvjjTEIk0Qc+MTVy3rF9DQQP4umYBSdXhiP1FeuQ1u
 OtmQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=2I5Gjwnoj92wLA6TMo9sFBwcInUYq8l+dHUg1BKqOKo=;
 b=djrOPvEtr/irdzK9vEWwocs67vGuL9aJgK5IGaGiesqmOd64irlUYweTcHRZEMK2wk
 +HBts2ZxL2XgM6G7GVkFLt5UahDE811pM6PjvF+R8N8pHKMBbHqTTny/lXtm02Hx6vAv
 KNadLAOn4G/NJAcfqvfeiY3RIqRjfRviIAWmQDnE7uI7lPAqUvySE1hlpu4FNHnMT4eO
 R2Es3bBaf2CfP3ozPCF3+lvbnXLO9hCX18vWNWBNIj0jmFXat1qrG2q4Jaiyhf8j9AuR
 6as5d4qz4xAHgDYroApQp/NU0mKiWIxGX4/u75IiVIun6tyKvT5YV328wAa64VMBJxQd
 IzAw==
X-Gm-Message-State: AOAM533GjTstcojgyERDSVZL2bD+EfK2MDgAXAyjcClZzsxu7VnMzy2Z
 SWpWNLGUdNq2EetnfTIOb72phDIFg53IYdgGk2p9dQ==
X-Google-Smtp-Source: ABdhPJxsBs5MqHQZxBYcvCcAcWvCWxbDC0dAeAbVGYeLwJL62i3xa+YEoMVMMNptZq/I1X3b5vwvNy8FYB0Sjvki8VE=
X-Received: by 2002:a50:f702:: with SMTP id g2mr40202926edn.348.1593773033891; 
 Fri, 03 Jul 2020 03:43:53 -0700 (PDT)
MIME-Version: 1.0
References: <CADaq8jev+tUs=mrGDMnZMpfmQXL=KLwDKW5S-CbBLpL-54RJTA@mail.gmail.com>
 <3fc0af37d7d870eeb6ab854a75d6eeb5aae61a0d.camel@hammerspace.com>
 <CADaq8jcb5BLiE49SyS3wxbbDmz88GJNWeLgeh5XU4oGkdBuJmw@mail.gmail.com>
 <5d9b4f697ec698a7f07e8168f56826dbb52e234b.camel@hammerspace.com>
 <CADaq8jenX2Su1MpMGpcvMuJncoDU+JMnMmdXaED9eYXiKrnGcA@mail.gmail.com>
 <QB1PR01MB3364D6132B8D515B7766A606DD6A0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <QB1PR01MB3364D6132B8D515B7766A606DD6A0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
From: David Noveck <davenoveck@gmail.com>
Date: Fri, 3 Jul 2020 06:43:42 -0400
Message-ID: <CADaq8jdSYD5n6roZ8gZh7RHS3A0SapFuPb9s2Aq05W++d32wEA@mail.gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Trond Myklebust <trondmy@hammerspace.com>, NFSv4 <nfsv4@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000089a94a05a9873313"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/hxK17tDgAYztIYeJQyOI3y12juE>
Subject: Re: [nfsv4] Notes regarding discussion of directory scalabiliy
 issues
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>,
 <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>,
 <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jul 2020 10:43:59 -0000

--00000000000089a94a05a9873313
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Jul 2, 2020, 9:03 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Well, I'll throw out a simple suggestion w.r.t. directory cookies.
>

I'm also working on a suggestion about helping clients maintain directory
cookies. Slides will be presented 7/9.


> What if the cookies generated by the server were required to have the
> following properties:
>

I don't think we can add new requirements for directory cookies.
Requirements for the granting or retention of directory delegations are
possible but I'd like to derive them from inherent aspects of the existing
notification scheme. Adding wholly new requirements on an existing feature
is not possible.

- Monotonically increasing numeric values.
>

Not possible to add this requirement. Also don't see the need/value of
this.  In my view, the existing directory notification scheme provides
adequate information for subscribed clients to maintain the structure of
cached directories being modified, including cookie values.

- Sparse enough that new entries can be created with numeric values
>   between them.
>

Again, a new requirement, with no clear justification.

- Guaranteed to not change and remain valid (and referring to the same
> entry)
>   despite additions/deletions. (Until a deletion callback is received by
> the client.)
>

Should also be allowed if the directory delegation is recalled.

I'm unclear about the callbacks you are proposing. Normally, notification
callbacks are not sent to the client making the change. It might be
possible to change that in a v4.2 extension but it adds to operation
latency and creates sequencing difficulties ☹️

I considered returning the necessary info in a bunch of new ops but the
whole thing got waaay too complicated.  I think the best choice would be a
GET_NOTIFICATION op to be issued after a CREATE, OPEN, LINK, REMOVE, or
RENAME.


> For example, for a simple case of a UFS file system on a server, the UFS
> directory consists of blocks of directory entries.
> - When an entry is added, it goes in a gap within the directory or grows a
>    new block at the end (if I vaguely recall this correctly;-).
> - When an entry is removed, the entry is erased without moving other
>    entries.
> For this example, the directory offset cookie can simply be the byte offset
> of the entry.
> (I haven't looked at other file systems, but hopefully offset cookies with
>  the above properties can be created for most of them?)
>

Possibly, but given the need for this, I'm not inclined to rely on hope.
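
To make the quoted UFS example concrete, here is a minimal sketch of a
directory whose cookie is simply the entry's byte offset. It is an
illustration only: SlotDirectory, SLOT_SIZE and the fixed-size slots are
assumptions made for the sketch (real UFS entries are variable-length), and
nothing here is taken from an actual server.

```python
from typing import List, Optional, Tuple

SLOT_SIZE = 32  # hypothetical fixed entry size in bytes; real UFS entries vary


class SlotDirectory:
    """Toy directory in which an entry's cookie is its byte offset."""

    def __init__(self) -> None:
        # None marks an erased entry, i.e. a gap that a later add may reuse.
        self.slots: List[Optional[str]] = []

    def add(self, name: str) -> int:
        """Place name in the first gap, or grow the directory; return its cookie."""
        for index, existing in enumerate(self.slots):
            if existing is None:
                self.slots[index] = name
                return index * SLOT_SIZE
        self.slots.append(name)
        return (len(self.slots) - 1) * SLOT_SIZE

    def remove(self, name: str) -> int:
        """Erase name without moving any other entry; return the freed cookie."""
        index = self.slots.index(name)
        self.slots[index] = None
        return index * SLOT_SIZE

    def listing(self) -> List[Tuple[int, str]]:
        """Return (cookie, name) pairs in directory order, skipping gaps."""
        return [(i * SLOT_SIZE, n) for i, n in enumerate(self.slots) if n is not None]
```

With entries a, b and c added in that order, removing b leaves the cookies
of a and c untouched, and a later add reuses b's offset, which is the reuse
case noted further down as being allowed for telldir()/seekdir().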

>
> When an entry is added/deleted, the server issues a callback to the client
> with the new directory cookie offset (and the directory entry for the add
> case).
>

I'd prefer that the same information be returned as in the existing
directory notification scheme, which allows you to maintain directory order
and cookies without tying one to the other.


> The client can maintain this structure in any number of ways (and it could
> be fun figuring out what works well), but a trivial version could be:
> - The reply to a readdir is kept as a list head (with the directory offset
>    of the first entry) and a linked list of the entries in order, with their
>    cookie.
>   (Remember that the cookies are in the same ordering as the entries
>

Not in my scheme.

- The next readdir reply creates the next head/list.
>
> readdir() just works through each list, following each head in order.
> telldir() returns the cookie for the entry.
> seekdir() just finds the correct list and then searches down that list for
> a match.
>
> The remove/add entry callbacks just insert/delete entrie(s) in the
> appropriate list.
> (I'd probably keep these lists in the kernel client under the VFS for
>  FreeBSD, as malloc'd data structures, but that is simply an implementation
>  choice.)
>
> You would require the monotonically increasing property for directory
> delegations to be issued. (Without that you don't know where to insert
> additions.)


The insert notification gives you enough information to do this without the
monotonically increasing property.
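
A minimal sketch of that point, assuming each add notification carries the
new entry's name and cookie together with the name of the entry it follows
(the existing NFSv4.1 add notification can carry previous-entry information
of this kind). The class and field names below are made up for illustration
and are not the protocol's XDR definitions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    name: str
    cookie: int  # opaque server cookie; never compared for ordering


@dataclass
class AddNotification:  # hypothetical shape, not the XDR definition
    name: str
    cookie: int
    prev_name: Optional[str]  # None means the new entry is first in the directory


class CachedDirectory:
    """Client-side cache of one directory, kept in server order."""

    def __init__(self, entries: List[Entry]) -> None:
        self.entries = list(entries)

    def apply_add(self, note: AddNotification) -> None:
        """Insert the new entry right after the entry named in the notification."""
        new_entry = Entry(note.name, note.cookie)
        if note.prev_name is None:
            self.entries.insert(0, new_entry)
            return
        for index, entry in enumerate(self.entries):
            if entry.name == note.prev_name:
                self.entries.insert(index + 1, new_entry)
                return
        # The predecessor is unknown, so the cache cannot be trusted any more.
        raise LookupError("predecessor not cached; the directory must be re-read")

    def apply_remove(self, name: str) -> None:
        """Drop the named entry; all other entries keep their position and cookie."""
        self.entries = [e for e in self.entries if e.name != name]

    def names(self) -> List[str]:
        return [e.name for e in self.entries]
```

The cookies in the cache are treated as opaque: the position of a new entry
comes from the notification, not from comparing cookie values.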

You would also require that extant directory entry cookies
> remain valid and unchanged when additions/deletions occur.
> (Note that removing an entry and then adding an entry at the same offset
>  is allowed under POSIX telldir()/seekdir() as I understand it.)
>

I think the requirement goes away when the delegation does.


> This avoids any need for the client to synthesize cookies and just use the
> ones returned by the server, I think?
>

Yes


> Just a simple idea that may be worth considering?
> (If this has already been discussed,


Don't think it has.

I apologize for not seeing it.)
>

Even if it had, no apology would be necessary.


> rick
>
> ________________________________________
> From: nfsv4 <nfsv4-bounces@ietf.org> on behalf of David Noveck <
> davenoveck@gmail.com>
> Sent: Thursday, July 2, 2020 6:06 AM
> To: Trond Myklebust
> Cc: nfsv4@ietf.org
> Subject: Re: [nfsv4] Notes regarding discussion of directory scalabiliy
> issues
>
>
> On Tuesday, June 30, 2020, Trond Myklebust <trondmy@hammerspace.com> wrote:
> On Tue, 2020-06-30 at 08:40 -0400, David Noveck wrote:
> Thanks for your helpful comments.
>
> On Mon, Jun 29, 2020 at 2:43 PM Trond Myklebust <trondmy@hammerspace.com>
> wrote:
> Like it or not, the readdir cookie is an attribute of the directory.
>
>
> If the protocol treated them as such, then the attribute notifications
> feature could provide updates to the client.   Given that it doesn't, we
> could add a cookie update feature to the directory notification feature as a
> v4.2 extension to the protocol.  However, I'm reluctant to start work on
> the necessary protocol additions until we are sure they are needed to
> provide better directory cacheability.
>
> Actually, they are attributes of directory streams.   The difference is
> not all that important given that client implementations are unlikely to be
> aware of the specific stream associated with any particular request.
> However, there are a few cases in which the difference is important in
> determining whether various approaches to client handling of cookies might
> or might not work, and will be important in the discussion below:
>
>   *   Two requests made on different clients necessarily are made on
> distinct streams.
>   *   Two requests made on different instances of the same client (with an
> intervening restart/reboot) also have to arise on different streams.
>
> If I want to support the POSIX telldir() and seekdir() operations (
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html
> ), then I need to ensure that when the application calls seekdir(), I
> return to the exact same cursor location in the stream that I was at when I
> called telldir().
>
>
> Agreed.
>
> Without a server side cookie on which to anchor my telldir() cookies,
>
>
> Every client has these available but it is not clear to me how useful such
> anchoring is.   I think the flexibility that each client has to assign
> cookies to streams it is responsible for is valuable and could be
> compromised if anchoring to the server cookies is made the focus of the
> implementation.
>
> then all I have is a list of filenames that can and will change every time
> a file is created, deleted or renamed.
>
>
> Clearly it will change.  However, the directory notifications feature
> makes some assumptions, currently implicit, about how the list will change.
>  Once these are made explicit, the wg could decide that server/fs pairs
> incapable of staying within these reasonable restrictions (if they are, in
> fact, reasonable), cannot support the directory notifications feature.
>
> Both the length and ordering of that list may change whenever the
> directory is modified,
>
>
> Clearly the length will change, but the reasonable expectation is that
> creating a file will increase the length by one and deleting one will
> decrease it by one.   I don't see the value of supporting directory
> notifications on server fs that do something else.
>
> With regard to ordering, I suppose the spec allows an fs to shuffle the
> directory order every time a change is made, but I'm unaware of any actual
> file systems that do this.   Do we need to support directory notifications
> for such fs's?
>
>
> touch foo; touch bar; ln foo baz; rm foo; mv baz foo
>
> There... Most filesystems will end up reordering 'foo' and 'bar' in the
> directory stream given the above sequence of commands. How does the client
> figure out what happened if the above sequence of commands is performed on
> the server?
> Now let's say that is a directory of a million files, and something like
> the above is made to happen regularly. How do I maintain a stable list of
> synthetic cookies on the client?
>
> I think you are right about there being cases in which it is impossible,
> but we either disagree or are simply talking past one another about other
> cases.
>
> If the caching client is making the directory changes, then I agree this
> cannot be done and you are stuck having to refetch potentially large
> directories to deal with new READDIR requests☹️
>
> Where we might disagree is the case in which another client is making the
> change.  In that case directory notifications would allow you to avoid
> repeated READDIR ops, whether you are providing the user synthetic or
> server-based cookies.
>
> My talk on directory caching will discuss the possibility of v4.2
> extensions to address the same-client directory caching issue, as well as
> possible clarifications regarding directory delegation/notification in v4.1.
>
>
> meaning that a naive implementation
>
>
> OK.   I'll plead guilty to one misdemeanor count of directory naivety.
>
> of synthetic cookies as an offset is not compatible with the
> telldir()/seekdir() requirements.
>
>
> It's not clear to me how this incompatibility would manifest itself.  I
> think I need to understand what would break.
>
> To make matters worse, the list size is for all intents and purposes
> unbounded, because there is no hard limit on the size of a directory. That
> makes it also impossible to create a cached mapping between a synthetic
> cookie and a filename; such a mapping would be unbounded both in size and
> in duration (since we don't know a priori how long the application will
> keep the directory open, or for that matter, which exact set of cookies it
> may have cached).
>
>
> Such a mapping would, in essence, be part of the cached directory.   So,
> if it is too big to keep in client memory, then it is too big to cache and
> you might as well decide not to cache it.
>
> I expect there is an issue that is a worry in the case in which a
> reasonably sized directory grows over time to be too big to cache while an
> open directory stream retains some directory cookies which might be
> incompatible with the client dropping caching of directories and switching
> to server-based cookies.😖
> I feel it is reasonable to treat this situation as one might a
> cookie-verifier failure, particularly if this is the only worrisome failure
> mode.   However, this possibility means that I would not ask clients to
> implement such local cookies. To enable that, we would have to make
> explicit the same sort of reasonableness requirement for cookie changes
> that we have already discussed for ordering changes.  RFC7530 already
> alludes to the need to avoid spurious cookie invalidations although not in
> as explicit or strict a way as we would need to support directory
> notifications:
>
>    As there is no way for the client to indicate that a cookie value,
>    once received, will not be subsequently used, server implementations
>    should avoid schemes that allocate memory corresponding to a returned
>    cookie.  Such allocation can be avoided if the server bases cookie
>    values on a value such as the offset within the directory where the
>    scan is to be resumed.
>
>    Cookies generated by such techniques should be designed to remain
>    valid despite modification of the associated directory.  If a server
>    were to invalidate a cookie because of a directory modification,
>    READDIRs of large directories might never finish.
>
>
> So in order to make this work the client would basically have to create
> its own B-tree and persist it in storage somewhere.
>
>
> I don't see the need to make this persistent.  If the client restarts, all
> directory streams have ceased to exist and we know a posteriori that
> there are no outstanding directory cookies to which the client would have
> to respond.
>
> --
>
> --
>
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>
>
>

--00000000000089a94a05a9873313
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div><br><br><div class=3D"gmail_quote"><div dir=3D"ltr" =
class=3D"gmail_attr">On Thu, Jul 2, 2020, 9:03 PM Rick Macklem &lt;<a href=
=3D"mailto:rmacklem@uoguelph.ca" rel=3D"noreferrer noreferrer noreferrer no=
referrer noreferrer noreferrer" target=3D"_blank">rmacklem@uoguelph.ca</a>&=
gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0=
 .8ex;border-left:1px #ccc solid;padding-left:1ex">Well, I&#39;ll throw out=
 a simple suggestion w.r.t. directory cookies.<br></blockquote></div></div>=
<div dir=3D"auto"><br></div><div dir=3D"auto">I&#39;m also working on a sug=
gestion about helping clients maintain directory cookies. Slides will be pr=
esented 7/9.</div><div dir=3D"auto"><br></div><div dir=3D"auto"><div class=
=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8=
ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
What if the cookies generated by the server were required to have the<br>
following properties:<br></blockquote></div></div><div dir=3D"auto"><br></d=
iv><div dir=3D"auto">I don&#39;t think we can add new requirements for dire=
ctory cookies.=C2=A0 Requirements for the granting or retention of director=
y delegations are possible but I&#39;d like=C2=A0 to derive them from inher=
ent aspects of the existing notification scheme. Adding wholly new requirem=
ents on an existing feature is not possible.</div><div dir=3D"auto"><br></d=
iv><div dir=3D"auto"><div class=3D"gmail_quote"><blockquote class=3D"gmail_=
quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1=
ex">
- Monotonically increasing numeric values.<br></blockquote></div></div><div=
 dir=3D"auto"><br></div><div dir=3D"auto">Not possible to add this requireme=
nt. Also don&#39;t see the need/value of this.=C2=A0 In my view, the existi=
ng directory notification scheme provides adequate information for subscrib=
ed clients to maintain the structure of cached directories being modified, =
including cookie values.</div><div dir=3D"auto"><br></div><div dir=3D"auto"=
><div class=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"marg=
in:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
- Sparse enough that new entries can be created with numeric values<br>
=C2=A0 between them.<br></blockquote></div></div><div dir=3D"auto"><br></di=
v><div dir=3D"auto">Again, a new requirement, with no clear justification.<=
/div><div dir=3D"auto"><br></div><div dir=3D"auto"><div class=3D"gmail_quot=
e"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left=
:1px #ccc solid;padding-left:1ex">
- Guaranteed to not change and remain valid (and referring to the same entr=
y)<br>
=C2=A0 despite additions/deletions. (Until a deletion callback is received =
by the client.)<br></blockquote></div></div><div dir=3D"auto"><br></div><di=
v dir=3D"auto">Should also be allowed if the directory delegation is recall=
ed.</div><div dir=3D"auto"><br></div><div dir=3D"auto">I&#39;m unclear abou=
t the callbacks you are proposing. Normally, notification callbacks are not=
 sent to the client making the change. It might be possible to change that =
in a v4.2 extension but it adds to operation latency and creates sequencing=
 difficulties =E2=98=B9=EF=B8=8F</div><div dir=3D"auto"><br></div><div dir=
=3D"auto">I considered returning the necessary info in a bunch of new ops b=
ut the whole thing got waaay too complicated.=C2=A0 I think the best choice=
 would be a GET_NOTIFICATION op to be issued after the CREATE, OPEN, LINK, =
REMOVE, RENAME.</div><div dir=3D"auto"><br></div><div dir=3D"auto"><div cla=
ss=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 =
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
For example, for a simple case of a UFS file system on a server, the UFS<br=
>
directory consists of blocks of directory entries.<br>
- When an entry is added, it goes in a gap within the directory or grows a<=
br>
=C2=A0 =C2=A0new block at the end (if I vaguely recall this correctly;-).<b=
r>
- When an entry is removed, the entry is erased without moving other<br>
=C2=A0 =C2=A0entries.<br>
For this example, the directory offset cookie can simply be the byte offset=
<br>
of the entry.<br>
(I haven&#39;t looked at other file systems, but hopefully offset cookies w=
ith the<br>
=C2=A0above properties can be created for most of them?)<br></blockquote></=
div></div><div dir=3D"auto"><br></div><div dir=3D"auto">Possibly, but given=
 the need for this, I&#39;m not inclined to rely on hope.</div><div dir=3D"=
auto"><div class=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D=
"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
When an entry is added/deleted, the server issues a callback to the client<=
br>
with the new directory cookie offset (and the directory entry for the add c=
ase).<br></blockquote></div></div><div dir=3D"auto"><br></div><div dir=3D"a=
uto">I&#39;d prefer that the same information be returned as in the ex=
isting directory notification scheme, which allows you to maintain director=
y order and cookies without tying one to the other.</div><div dir=3D"auto=
"><br></div><div dir=3D"auto"><div class=3D"gmail_quote"><blockquote class=
=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padd=
ing-left:1ex">
<br>
The client can maintain this structure in any number of ways (and it could =
be<br>
fun figuring out what works well), but a trivial version could be:<br>
- The reply to a readdir is kept as a list head (with the directory offset =
of the<br>
=C2=A0 =C2=A0first entry) and a linked list of the entries in order, with t=
heir cookie.<br>
=C2=A0 (Remember that the cookies are in the same ordering as the entries<b=
r></blockquote></div></div><div dir=3D"auto"><br></div><div dir=3D"auto">No=
t in my scheme.</div><div dir=3D"auto"><br></div><div dir=3D"auto"><div cla=
ss=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 =
.8ex;border-left:1px #ccc solid;padding-left:1ex">
- The next readdir reply creates the next head/list.<br>
<br>
readdir() just works through each list, following each head in order.<br>
telldir() returns the cookie for the entry.<br>
seekdir() just finds the correct list and then searches down that list for =
a match.<br>
<br>
The remove/add entry callbacks just insert/delete entrie(s) in the appropri=
ate<br>
list.<br>
(I&#39;d probably keep these lists in the kernel client under the VFS for F=
reeBSD,<br>
=C2=A0as malloc&#39;d data structures, but that is simply an implementation=
 choice.)<br>
<br>
You would require the monotonically increasing property for directory<br>
delegations to be issued. (Without that you don&#39;t know where to insert<=
br>
additions.) </blockquote></div></div><div dir=3D"auto"><br></div><div dir=
=3D"auto">The insert notification gives you enough information to do this w=
ithout the monotonically increasing property.</div><div dir=3D"auto"><br></=
div><div dir=3D"auto"><div class=3D"gmail_quote"><blockquote class=3D"gmail=
_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:=
1ex">You would also require that extant directory entry cookies<br>
remain valid and unchanged when additions/deletions occur.<br>
(Note that removing an entry and then adding an entry at the same offset<br=
>
=C2=A0is allowed under POSIX telldir()/seekdir() as I understand it.)<br></=
blockquote></div></div><div dir=3D"auto"><br></div><div dir=3D"auto">I thin=
k the requirement goes away when the delegation does.</div><div dir=3D"auto=
"><br></div><div dir=3D"auto"><div class=3D"gmail_quote"><blockquote class=
=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padd=
ing-left:1ex">
<br>
This avoids any need for the client to synthesize cookies and just use the =
ones<br>
returned by the server, I think?<br></blockquote></div></div><div dir=3D"au=
to"><br></div><div dir=3D"auto">Yes</div><div dir=3D"auto"><br></div><div d=
ir=3D"auto"><div class=3D"gmail_quote"><blockquote class=3D"gmail_quote" st=
yle=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Just a simple idea that may be worth considering?<br>
(If this has already been discussed, </blockquote></div></div><div dir=3D"a=
uto"><br></div><div dir=3D"auto">Don&#39;t think it has.</div><div dir=3D"a=
uto"><br></div><div dir=3D"auto"><div class=3D"gmail_quote"><blockquote cla=
ss=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;pa=
dding-left:1ex">I apologize for not seeing it.)<br></blockquote></div></div=
><div dir=3D"auto"><br></div><div dir=3D"auto">Even if it had, no apology w=
ould be necessary</div><div dir=3D"auto"><br></div><div dir=3D"auto"><div c=
lass=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 =
0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
rick<br>
<br>
________________________________________<br>
From: nfsv4 &lt;<a href=3D"mailto:nfsv4-bounces@ietf.org" rel=3D"noreferrer=
 noreferrer noreferrer noreferrer noreferrer noreferrer noreferrer" target=
=3D"_blank">nfsv4-bounces@ietf.org</a>&gt; on behalf of David Noveck &lt;<a=
 href=3D"mailto:davenoveck@gmail.com" rel=3D"noreferrer noreferrer noreferr=
er noreferrer noreferrer noreferrer noreferrer" target=3D"_blank">davenovec=
k@gmail.com</a>&gt;<br>
Sent: Thursday, July 2, 2020 6:06 AM<br>
To: Trond Myklebust<br>
Cc: <a href=3D"mailto:nfsv4@ietf.org" rel=3D"noreferrer noreferrer noreferr=
er noreferrer noreferrer noreferrer noreferrer" target=3D"_blank">nfsv4@iet=
f.org</a><br>
Subject: Re: [nfsv4] Notes regarding discussion of directory scalabiliy iss=
ues<br>
<br>
CAUTION: This email originated from outside of the University of Guelph. Do=
 not click links or open attachments unless you recognize the sender and kn=
ow the content is safe. If in doubt, forward suspicious emails to <a href=
=3D"mailto:IThelp@uoguelph.ca" rel=3D"noreferrer noreferrer noreferrer nore=
ferrer noreferrer noreferrer noreferrer" target=3D"_blank">IThelp@uoguelph.=
ca</a><br>
<br>
<br>
<br>
On Tuesday, June 30, 2020, Trond Myklebust &lt;<a href=3D"mailto:trondmy@ha=
mmerspace.com" rel=3D"noreferrer noreferrer noreferrer noreferrer noreferre=
r noreferrer noreferrer" target=3D"_blank">trondmy@hammerspace.com</a>&lt;m=
ailto:<a href=3D"mailto:trondmy@hammerspace.com" rel=3D"noreferrer noreferr=
er noreferrer noreferrer noreferrer noreferrer noreferrer" target=3D"_blank=
">trondmy@hammerspace.com</a>&gt;&gt; wrote:<br>
On Tue, 2020-06-30 at 08:40 -0400, David Noveck wrote:<br>
Thanks for your helpful comments.<br>
<br>
On Mon, Jun 29, 2020 at 2:43 PM Trond Myklebust &lt;<a href=3D"mailto:trond=
my@hammerspace.com" rel=3D"noreferrer noreferrer noreferrer noreferrer nore=
ferrer noreferrer noreferrer" target=3D"_blank">trondmy@hammerspace.com</a>=
&lt;mailto:<a href=3D"mailto:trondmy@hammerspace.com" rel=3D"noreferrer nor=
eferrer noreferrer noreferrer noreferrer noreferrer noreferrer" target=3D"_=
blank">trondmy@hammerspace.com</a>&gt;&gt; wrote:<br>
Like it or not, the readdir cookie is an attribute of the directory.<br>
<br>
<br>
If the protocol treated them as such, then the attribute notifications feat=
ure could provide updates to the client.=C2=A0 =C2=A0Given that it doesn&#3=
9;t, we could add a cookie update feature to the directory notification feature =
as a v4.2 extension to the protocol.=C2=A0 However, I&#39;m reluctant to st=
art work on the necessary protocol additions until we are sure they are nee=
ded to provide better directory cacheability.<br>
<br>
Actually, they are attributes of directory streams.=C2=A0 =C2=A0The differe=
nce is not all that important given that client implementations are unlikel=
y to be aware of the specific stream associated with any particular request.=
=C2=A0 However, there are a few cases in which the difference is important =
in determining whether various approaches to client handling of cookies mig=
ht or might not work, and will be important in the discussion below:<br>
<br>
=C2=A0 *=C2=A0 =C2=A0Two requests made on different clients necessarily are=
 made on distinct streams.<br>
=C2=A0 *=C2=A0 =C2=A0Two requests made on different instances of the same c=
lient (with an intervening restart/reboot) also have to arise on different =
streams.<br>
<br>
If I want to support the POSIX telldir() and seekdir() operations ( <a href=
=3D"https://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html=
" rel=3D"noreferrer noreferrer noreferrer noreferrer noreferrer noreferrer =
noreferrer noreferrer" target=3D"_blank">https://pubs.opengroup.org/onlinep=
ubs/9699919799/functions/seekdir.html</a> ), then I need to ensure that whe=
n the application calls seekdir(), I return to the exact same cursor locati=
on in the stream that I was at when I called telldir().<br>
<br>
<br>
Agreed.<br>
<br>
Without a server side cookie on which to anchor my telldir() cookies,<br>
<br>
<br>
Every client has these available but it is not clear to me how useful such anch=
oring is.=C2=A0 =C2=A0I think the flexibility that each client has to assig=
n cookies to streams it is responsible for is valuable and could be comprom=
ised if anchoring to the server cookies is made the focus of the implementa=
tion.<br>
<br>
then all I have is a list of filenames that can and will change every time =
a file is created, deleted or renamed.<br>
<br>
<br>
Clearly it will change.=C2=A0 However, the directory notifications =
feature makes some assumptions, currently implicit, about how the list =
will change.=C2=A0 =C2=A0Once these are made explicit, the wg could =
decide that server/fs pairs incapable of staying within these reasonable =
restrictions (if they are, in fact, reasonable) cannot support the =
directory notifications feature.<br>
<br>
Both the length and ordering of that list may change whenever the directory=
 is modified,<br>
<br>
<br>
Clearly the length will change, but the reasonable expectation is that crea=
ting a file will increase the length by one and deleting one will decrease =
it by one.=C2=A0 =C2=A0I don&#39;t see the value of supporting directory no=
tifications on server fs that do something else.<br>
<br>
With regard to ordering,=C2=A0 I suppose the spec allows an fs to shuffle =
the directory order every time a change is made, but I&#39;m unaware of =
any actual file systems that do this.=C2=A0 =C2=A0Do we need to support =
directory notifications for such fs&#39;s?<br>
<br>
<br>
touch foo; touch bar; ln foo baz; rm foo; mv baz foo<br>
<br>
There... Most filesystems will end up reordering &#39;foo&#39; and &#39;bar=
&#39; in the directory stream given the above sequence of commands. How doe=
s the client figure out what happened if the above sequence of commands is =
performed on the server?<br>
Now let&#39;s say that is a directory of a million files, and something lik=
e the above is made to happen regularly. How do I maintain a stable list of=
 synthetic cookies on the client?<br>
<br>
I think you are right about there being cases in which it is impossible, bu=
t we either disagree or are simply talking past one another about other cas=
es.<br>
<br>
If the caching client is making the directory changes, then I agree this ca=
nnot be done and you are stuck having to refetch potentially large director=
ies to deal with new READDIR requests=E2=98=B9=EF=B8=8F<br>
<br>
Where we might disagree is the case in which another client is making the c=
hange.=C2=A0 In that case directory notifications would allow you to avoid =
repeated READDIR ops, whether you are providing the user synthetic or serve=
r-based cookies.<br>
<br>
My talk on directory caching will discuss the possibility of v4.2 extension=
s to address the same-client directory caching issue, as well as possible c=
larifications regarding directory delegation/notification in v4.1.<br>
<br>
<br>
meaning that a naive implementation<br>
<br>
<br>
OK.=C2=A0 =C2=A0I&#39;ll plead guilty to one misdemeanor count of directory=
 naivety.<br>
<br>
of synthetic cookies as an offset is not compatible with the telldir()/seek=
dir() requirements.<br>
<br>
<br>
It&#39;s not clear to me how this incompatibility would manifest itself.=C2=
=A0 I think I need to understand what would break.<br>
<br>
To make matters worse, the list size is for all intents and purposes unboun=
ded, because there is no hard limit on the size of a directory. That makes =
it also impossible to create a cached mapping between a synthetic cookie an=
d a filename; such a mapping would be unbounded both in size and in duratio=
n (since we don&#39;t know a priori how long the application will keep the =
directory open, or for that matter, which exact set of cookies it may have =
cached).<br>
<br>
<br>
Such a mapping would, in essence, be part of the cached directory.=C2=A0 =
=C2=A0So, if it is too big to keep in client memory, then it is too big =
to cache and you might as well decide not to cache it.<br>
<br>
I expect there is a worry in the case in which a reasonably sized =
directory grows over time to be too big to cache while an open =
directory stream retains some directory cookies which might be =
incompatible with the client dropping caching of directories and =
switching to server-based cookies.=F0=9F=98=96<br>
I feel it is reasonable to treat this situation as one might a =
cookie-verifier failure, particularly if this is the only worrisome =
failure mode.=C2=A0 =C2=A0However, this possibility means that I would =
not ask clients to implement such local cookies. To enable that, we =
would have to make explicit the same sort of reasonableness =
requirement for cookie changes that we have already discussed for =
ordering changes.=C2=A0 RFC7530 already alludes to the need to avoid =
spurious cookie invalidations, although not in as explicit or strict a =
way as we would need to support directory notifications:<br>
<br>
=C2=A0 =C2=A0As there is no way for the client to indicate that a cookie va=
lue,<br>
<br>
=C2=A0 =C2=A0once received, will not be subsequently used, server implement=
ations<br>
<br>
=C2=A0 =C2=A0should avoid schemes that allocate memory corresponding to a r=
eturned<br>
<br>
=C2=A0 =C2=A0cookie.=C2=A0 Such allocation can be avoided if the server bas=
es cookie<br>
<br>
=C2=A0 =C2=A0values on a value such as the offset within the directory wher=
e the<br>
<br>
=C2=A0 =C2=A0scan is to be resumed.<br>
<br>
<br>
=C2=A0 =C2=A0Cookies generated by such techniques should be designed to rem=
ain<br>
<br>
=C2=A0 =C2=A0valid despite modification of the associated directory.=C2=A0 =
If a server<br>
<br>
=C2=A0 =C2=A0were to invalidate a cookie because of a directory modificatio=
n,<br>
<br>
=C2=A0 =C2=A0READDIRs of large directories might never finish.<br>
<br>
<br>
So in order to make this work the client would basically have to create its=
 own B-tree and persist it in storage somewhere.<br>
<br>
<br>
I don&#39;t see the need to make this persistent.=C2=A0 If the client resta=
rts, all directory streams have ceased to exist and we know=C2=A0 a posteri=
ori=C2=A0 that there are no outstanding directory cookies to which the clie=
nt would have to respond.<br>
<br>
--<br>
<br>
--<br>
<br>
Trond Myklebust<br>
Linux NFS client maintainer, Hammerspace<br>
<a href=3D"mailto:trond.myklebust@hammerspace.com" rel=3D"noreferrer norefe=
rrer noreferrer noreferrer noreferrer noreferrer noreferrer" target=3D"_bla=
nk">trond.myklebust@hammerspace.com</a>&lt;mailto:<a href=3D"mailto:trond.m=
yklebust@primarydata.com" rel=3D"noreferrer noreferrer noreferrer noreferre=
r noreferrer noreferrer noreferrer" target=3D"_blank">trond.myklebust@prima=
rydata.com</a>&gt;<br>
<br>
<br>
</blockquote><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;b=
order-left:1px #ccc solid;padding-left:1ex"><br></blockquote></div></div></=
div>

--00000000000089a94a05a9873313--

