[nfsv4] Re: question around CB_GETATTR handling

Rick Macklem <rick.macklem@gmail.com> Thu, 05 September 2024 00:33 UTC

Return-Path: <rick.macklem@gmail.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3899DC1CAE8E for <nfsv4@ietfa.amsl.com>; Wed, 4 Sep 2024 17:33:42 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -7.108
X-Spam-Level:
X-Spam-Status: No, score=-7.108 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_HI=-5, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Pb-C5ot9MToh for <nfsv4@ietfa.amsl.com>; Wed, 4 Sep 2024 17:33:41 -0700 (PDT)
Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature ECDSA (P-256) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 9FB3FC1CAE7F for <nfsv4@ietf.org>; Wed, 4 Sep 2024 17:33:41 -0700 (PDT)
Received: by mail-pl1-x62f.google.com with SMTP id d9443c01a7336-206aee40676so2288505ad.0 for <nfsv4@ietf.org>; Wed, 04 Sep 2024 17:33:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725496421; x=1726101221; darn=ietf.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=fxc22B7hcRq3NNfWPicdzf88h09KcAs4gEFv1A+DUog=; b=ePzcg+gOKRSk08ETE0miLDIjjVIzxqpFJLUit/QR55yoLBTo4qf350cdywh5NwrJMO DjHG2LcxDmqSyZYonNHEZ+O5G/Vu8kcaAKTPaj9zKlkHtETWg8dD+DU5omGSA5t4Mpsc 3xqqm7m3IA9sKw4a/vIlkbwun0s1jLhxnPoXiGSVur9KLsSPwnM9dHJYaCSIFQ1jjT0w FEgZ4f0fAWW0FqupSoZdqXRq3OGAjG+dnd9ABZseUZUhjxzLOtYX4qyk/G9I4B2tDIBC PkJKyC2EuK2D0cobEGC38WKDV4Z8mAudB6O4RUYZUMl7D8Gym8kf8RbEjyIY5Yx2BGfT /1qQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725496421; x=1726101221; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fxc22B7hcRq3NNfWPicdzf88h09KcAs4gEFv1A+DUog=; b=vDSuze0NmvK7Nuyo4gH5HB7pKDQcdcf6qvKorX0CdjS4cQF3dGWRKU8p+XDEb98L+1 DeXKuiQ7tPrCrxeJ/wkMGx9MYZKYQLp/sl9goRg9/FfZKNyytlHnzihgtF5NdCD84EQp pzmJom5wL/XX8HaAPlKQgQflKXtrTxxe/utwTICwu8EDaHbHr3/hp/dezCxl8vyqawDd Sb4LZ3NFuB6iYBElFMg7b5rLMtyFZSpuBz2rbVfVa19dixisaaFDCOnvMYNFeYWTWudk 4iGMoHahT9ko1JB3kTxjfdSUNCB/IMKeWSC8d+to5F2t9ISs33eOFenTKCtTg4sphL6F sJaQ==
X-Forwarded-Encrypted: i=1; AJvYcCX+bjLRW9xFY+5e7bsBByJrwNBRAAWY976bC3Ts+bdjDFf+F3kLcQj3TOPYGwHvZzhSWetzKQ==@ietf.org
X-Gm-Message-State: AOJu0YyLXhedxhEve+aOy9LfGJXe8D/heMRxri6itwMPQTVHwt3t+KZv cwSar3qL12z9iwpDobpLGaoh/RhGUisrqpjfF1Cp97zWcvZZcxCc8tootQuNrG0wsUWtkszUOLM qHjS0T18l4kNHrSDLIas7AP4408K5oP0=
X-Google-Smtp-Source: AGHT+IHia32AqwePdgIwqbV4Iq7O5oDbyum6dTdcvquu5OEC2lQcEV6o1rdH68T9SW9ODQbx4qU/TQNNB6e2i3Cvdgo=
X-Received: by 2002:a17:902:f547:b0:206:b618:1d91 with SMTP id d9443c01a7336-206b618252amr42575775ad.5.1725496420697; Wed, 04 Sep 2024 17:33:40 -0700 (PDT)
MIME-Version: 1.0
References: <6e6e3d11d0d48c88cc4cefc28b66d9cfb5874723.camel@poochiereds.net> <e584a74b46853af247212e8dc9bee1949c676cea.camel@hammerspace.com> <28b4c2851c73b518023f2e78be53874c64b1cff3.camel@poochiereds.net> <2074c10c3de50525f30089fd1634c421b3eddca6.camel@hammerspace.com> <880007b35526b41f41ce1263cd96813e7e5faf92.camel@poochiereds.net> <CAM5tNy5GJm+BXNiLJUZ8aS5kT5S0uLwUiBBF1MKQQ-0gGmTwQQ@mail.gmail.com> <CAM5tNy6aZM8RkuvycuNLaNhP6fB4O1TuGx3SvjW2Hd6ZE1Yhuw@mail.gmail.com> <1a3b67d2655f2298907a4c3473160428ad2e556f.camel@poochiereds.net>
In-Reply-To: <1a3b67d2655f2298907a4c3473160428ad2e556f.camel@poochiereds.net>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Wed, 04 Sep 2024 17:33:30 -0700
Message-ID: <CAM5tNy4cppF6vC4_gA5nHEewM2KNeCeOwCe96OhFHBA3z-0EkA@mail.gmail.com>
To: Jeff Layton <jlayton@poochiereds.net>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Message-ID-Hash: EEF4ZW2VV6BJ7ATHCJEKD22VNAIUAKZM
X-Message-ID-Hash: EEF4ZW2VV6BJ7ATHCJEKD22VNAIUAKZM
X-MailFrom: rick.macklem@gmail.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-nfsv4.ietf.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: Trond Myklebust <trondmy@hammerspace.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
X-Mailman-Version: 3.3.9rc4
Precedence: list
Subject: [nfsv4] Re: question around CB_GETATTR handling
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/wXmz043eIxAmhKETQAorOSU5OS4>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Owner: <mailto:nfsv4-owner@ietf.org>
List-Post: <mailto:nfsv4@ietf.org>
List-Subscribe: <mailto:nfsv4-join@ietf.org>
List-Unsubscribe: <mailto:nfsv4-leave@ietf.org>

On Wed, Sep 4, 2024 at 4:54 PM Jeff Layton <jlayton@poochiereds.net> wrote:
>
> On Wed, 2024-09-04 at 14:15 -0700, Rick Macklem wrote:
> > On Wed, Sep 4, 2024 at 2:05 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > >
> > > On Wed, Sep 4, 2024 at 1:09 PM Jeff Layton <jlayton@poochiereds.net> wrote:
> > > >
> > > > On Wed, 2024-09-04 at 19:47 +0000, Trond Myklebust wrote:
> > > > > On Wed, 2024-09-04 at 15:33 -0400, Jeff Layton wrote:
> > > > > > On Wed, 2024-09-04 at 19:21 +0000, Trond Myklebust wrote:
> > > > > > > On Wed, 2024-09-04 at 12:23 -0400, Jeff Layton wrote:
> > > > > > > > RFC 8881, section 10.4.3 has this pseudo code to describe how to
> > > > > > > > handle
> > > > > > > > the size and change attr:
> > > > > > > >
> > > > > > > > ------------------8<-------------------
> > > > > > > >     if (!modified) {
> > > > > > > >         do CB_GETATTR for change and size;
> > > > > > > >
> > > > > > > >         if (cc != sc)
> > > > > > > >             modified = TRUE;
> > > > > > > >     } else {
> > > > > > > >         do CB_GETATTR for size;
> > > > > > > >     }
> > > > > > > >
> > > > > > > >     if (modified) {
> > > > > > > >         sc = sc + 1;
> > > > > > > >         time_modify = time_metadata = current_time;
> > > > > > > >         update sc, time_modify, time_metadata into file's
> > > > > > > > metadata;
> > > > > > > >     }
> > > > > > > > ------------------8<-------------------
> > > > > > > >
> > > > > > > >
> > > > > > > > The line that shows sc = sc + 1 seems to indicate that we need to
> > > > > > > > increment "sc" on every CB_GETATTR, even if nothing changed since
> > > > > > > > the
> > > > > > > > last time.
> > > > > > > >
> > > > > > > > Is that correct or should we only increment it if it wasn't
> > > > > > > > incremented
> > > > > > > > before? The latter would make more sense, given that it doesn't
> > > > > > > > query
> > > > > > > > for change if the file is already considered to be modified.
> > > > > > >
> > > > > > > I believe RFC5661 says
> > > > > > >
> > > > > > >    For simplicity of implementation, the client MAY for each
> > > > > > > CB_GETATTR
> > > > > > >    return the same value d.  This is true even if, between
> > > > > > > successive
> > > > > > >    CB_GETATTR operations, the client again modifies the file's data
> > > > > > > or
> > > > > > >    metadata in its cache.  The client can return the same value
> > > > > > > because
> > > > > > >    the only requirement is that the client be able to indicate to
> > > > > > > the
> > > > > > >    server that the client holds modified data.  Therefore, the
> > > > > > > value of
> > > > > > >    d may always be c + 1.
> > > > > > >
> > > > > > > Which is how the Linux client tries to implement it.
> > > > > > >
> > > > > >
> > > > > > Right, 8881 is similar.
> > > > >
> > > > > Yeah, but nobody has coded to that.
> > > > >
> > > > > >
> > > > > > In this case though, I'm asking about the server-side implementation.
> > > > > > That has its own algorithm (illustrated above in the pseudocode), and
> > > > > > it seems to indicate that once the server detects that the client has
> > > > > > made a single modification to the file, that it needs to increment
> > > > > > "sc"
> > > > > > on every subsequent CB_GETATTR.
> > > > > >
> > > > > > That makes no sense on its face, so I suspect this is a mistake in
> > > > > > the
> > > > > > pseudocode above. Or am I wrong and this is actually necessary?
> > > > > > --
> > > > >
> > > > > Oh... No, as far as the server is concerned, I think it is indeed
> > > > > supposed to increment the change attribute and timestamps on every
> > > > > reply. That is reinforced here:
> > > > >
> > > > >    As discussed earlier in this section, the client MAY return the same
> > > > >    cc value on subsequent CB_GETATTR calls, even if the file was
> > > > >    modified in the client's cache yet again between successive
> > > > >    CB_GETATTR calls.  Therefore, the server must assume that the file
> > > > >    has been modified yet again, and MUST take care to ensure that the
> > > > >    new nsc it constructs and returns is greater than the previous nsc it
> > > > >    returned.
> > > > >
> > > > > The spec also says that "committing the constructed attribute values to
> > > > > stable storage is OPTIONAL". That seems misleading or possibly just
> > > > > incorrect. The fact that the client declares that it is caching data
> > > > > doesn't necessarily mean that it will actually manage to write that
> > > > > data back.
> > > I think the server does have the option of not saving the change attribute
> > > on stable storage and then, after a server reboot, if the client does a
> > > OPEN/CLAIM_PREVIOUS for the delegation, the server can require
> > > an immediate recall of the delegation. (By setting recall == true in the
> > > open_read_delegation4/open_write_delegation4.)
> > >
> > > At least I think that works?
> > Hmm. I now think it takes more than the above. I think it does need to
> > commit nsc to stable storage, just as it would for changes done via
> > write(s) to the server, as an updated change.
> > (It is the sc value it can forget on a reboot, if
> > it does the immediate recall for CLAIM_REVIOUS for a write delegation.)
> >
> > Does this sound correct to others? rick
> >
>
> The Linux server doesn't always have ultimate control over the change
> attribute, and we can't necessarily commit a specific value to the
> metadata that the client gives us. The change attr is sort of like the
> ctime that way.
If there is something "harmless" that the nfsd can do to the file which
changes change, you might be able to do that every time other clients
do a GETATTR. (Such as setting ctime, maybe?)

That would presumably also get the change onto stable storage?

>
> > >
> > > >
> > > > Huh, ok.
> > > >
> > > > So imagine that the client gets a write deleg and makes a single small
> > > > change to the file, but then holds on to the write deleg for a long
> > > > time. Other clients are issuing CB_GETATTRs against the server.
> > > > Eventually the delegation is returned.
> > > >
> > > > Could you end up in a change attr reuse situation due to all of this
> > > > faux change attr incrementing by the server?
> > > Since change is 64bits, I think this will take a very very long time?
> > >
>
> The scenario I was thinking of is:
>
> Client gets a write delegation and the change attr is 1. Client does a
> (cached) modification and sends a response of 2 to the server's first
> CB_GETATTR (cc + 1).
>
> Next, we get GETATTR requests from more clients. The server increments
> "sc" for every one, and we end up with a change attribute of 5 in the
> delegation record. The exported file still has a change attr of 1 as
> there has been no writeback from the client yet.
>
> Finally, the client writes the data back and does a DELEGRETURN. The
> local file lands at a value of 3. More data is written from another
> client, and we eventually get to a change attribute of 5. The client
> that saw a value of 5 earlier does another GETATTR and is now confused.
> It doesn't realize that there have been more writes done by another
> client and the size is completely different from what it expects.
Since you have 64bits to work with, maybe the first increment of sc
could be by a large amount, so that the normal updates will "never" get
there. (Or something like putting the file system's change value in the
low order 63 bits with the high order bit always 0 vs always set the
high order bit in sc,
so you have bisected the change number space.)
I'm sure there are others. The increment by 1 in the RFC algorithm
is just an example of making it change each GETATTR.

rick

>
> As a practical matter, other clients can't do much with the file while
> there is a write delegation outstanding (aside from GETATTR), so that
> may mean that the above doesn't make any difference. The client doing
> the GETATTR can't do a READ or anything, and since we did at least one
> change attr increment, other clients will need to invalidate their
> caches.
> --
> Jeff Layton <jlayton@poochiereds.net>