[nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations

Rick Macklem <rick.macklem@gmail.com> Tue, 20 August 2024 20:29 UTC

References: <CAM5tNy7g+YCiiZQD7G6Ryv_Mo8N5BeRiqMP=224zPpEXa+Yi+A@mail.gmail.com> <a8b69cf0e7d33aed66ae125f40e377bccb4b6918.camel@poochiereds.net>
In-Reply-To: <a8b69cf0e7d33aed66ae125f40e377bccb4b6918.camel@poochiereds.net>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tue, 20 Aug 2024 13:29:03 -0700
Message-ID: <CAM5tNy4D1OB2f5B3h9M_56zWpC7+a3q_Y=c-kvNNnOG2c3QZxw@mail.gmail.com>
To: Jeff Layton <jlayton@poochiereds.net>
CC: NFSv4 <nfsv4@ietf.org>
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/_nOBDF8eTbMNb2Ss4VjgLzzQTlA>

On Tue, Aug 20, 2024 at 4:49 AM Jeff Layton <jlayton@poochiereds.net> wrote:

> On Thu, 2024-08-08 at 13:36 -0700, Rick Macklem wrote:
> > Hi,
> >
> > Over the years, I've run into cases where it would be really
> > nice to be able to perform multiple NFSv4 operations on a
> > file without other operations done by other clients
> > "gumming up the works" by changing the file's data/metadata
> > between the operations in the compound.
> >
> > So, what do others think about an extension to NFSv4.2 that
> > adds 2 new operations:
> >   MUTEX_BEGIN(CFH)
> >   MUTEX_END(CFH)
> > Both would use the CFH as argument, and no other client would
> > be allowed to perform operations on the CFH between the MUTEX_BEGIN
> > and MUTEX_END.
> >
> > I think there would need to be a couple of properties for these:
> > - There would need to be an "implicit" MUTEX_END when any operation
> >   between MUTEX_BEGIN and MUTEX_END returns a status other than NFS_OK.
> > - I think you would want a restriction of only one mutex for one CFH
> >   at a time in a compound. Without that, there could easily be deadlocks
> >   caused by other compounds acquiring mutexes on the same CFHs in a
> >   different order.
> > - Only one compound can hold a mutex on a given CFH at any time.
> > - MUTEX_BEGIN/MUTEX_END can only be used in compounds where SEQUENCE
> >   is the first operation.
> > - All mutexes are discarded by a server when it crashes/recovers.
> >   (Any time a client receives a NFS4ERR_STALE_CLIENTID.)
> >   That way RPC retries after a server reboot should work ok, I think?
> >
> > I am not sure what the semantics for reading data/metadata should be,
> > but I was thinking that would be allowed to be done by compounds for
> > other clients for the CFH. If a client wanted to serialize against
> > other compounds for the CFH, it could do a MUTEX_BEGIN/MUTEX_END.
> >
> > I see this as useful in a variety of ways:
> > - The example in the previous email of:
> >   MUTEX_BEGIN
> >   NVERIFY acl_trueform ACL_MODEL_NFS4
> >   SETATTR posix_access_acl
> >   MUTEX_END
> > - Append writing:
> >   MUTEX_BEGIN
> >   VERIFY size "offset in WRITE that follows"
> >   WRITE "offset" etc
> >   MUTEX_END
> > - A bunch of cases where NFSv4 lacks the postop_attributes
> >   that were in NFSv3.
> >   MUTEX_BEGIN
> >   WRITE
> >   GETATTR size, change,..
> >   MUTEX_END
> >
> > So, what do others think?
> > (This was obviously not possible without sessions.)
> >
>
> It's an interesting idea. Quite a bit different than LOCK/LOCKU for
> sure.
>
> The difficulty here will be in defining what activity this mutex
> blocks. Just other MUTEX_BEGINs, or is this a mandatory lock for all
> access to that filehandle?

I think that blocking only other MUTEX_BEGINs would not be particularly
useful, since there will be clients doing RPCs without any MUTEX_BEGINs
for a long time.

My original thinking was that all operations in other compounds that use
the FH (either CFH or SAVEFH) would block until MUTEX_END.
(I'd even like to include NFSv3 RPCs, but that might be a bridge too far?)

> Assuming the latter, would READ or GETATTR
> activity also be blocked? If not, then we'll have to have a list of
> operations that are blocked by the mutex, and new operations will have
> to have their semantics spelled out clearly vs. MUTEX_BEGIN.
>
I suppose blocking only the operations that modify the file (those that
change the change attribute) is possible. I do think this makes it more
complex to implement (a lot more difficult for FreeBSD, at least), and I
am not sure which would be preferred.
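
A rough sketch of the two options, in Python and purely illustrative.
The names Policy, MODIFYING_OPS and must_wait are made up for this sketch,
and the list of modifying operations is an assumption, not a complete or
agreed list:

```python
from enum import Enum


class Policy(Enum):
    BLOCK_ALL = 1        # every op on the FH from another compound waits
    BLOCK_MODIFYING = 2  # only ops that change the change attribute wait


# Assumption: an illustrative, non-exhaustive set of operations that
# change the change attribute of the (destination) file.
MODIFYING_OPS = {"WRITE", "SETATTR", "ALLOCATE", "DEALLOCATE", "CLONE", "COPY"}


def must_wait(op: str, mutex_held_by_other: bool, policy: Policy) -> bool:
    """Return True if 'op' has to block until the other compound's MUTEX_END."""
    if not mutex_held_by_other:
        return False
    if policy is Policy.BLOCK_ALL:
        return True
    return op in MODIFYING_OPS
```

With BLOCK_MODIFYING, every new operation added to the protocol would
need an explicit decision about which side of that set it falls on.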

For example, suppose a compound does:
MUTEX_BEGIN
SETATTR (size of 0)
WRITE
GETATTR (size, change,...)
MUTEX_END
And then another compound does:
READ
GETATTR (size, change,...)
--> If the READ is not blocked until MUTEX_END, it might see EOF
    at the file size of 0, and then the GETATTR might see a
    non-zero size.
    --> Could happen now (without MUTEX_BEGIN/MUTEX_END),
        so not a big deal, but it *might* be
        better to block the READ in this case?
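
The interleaving above can be played out with a small deterministic
model. This is only a sketch: the schedule, the initial size of 100 and
the 4096-byte WRITE are made-up values, and "blocking" is modelled by
parking compound B's operations until MUTEX_END:

```python
def run_schedule(block_other_compounds: bool):
    """Return what compound B observes as (operation, file size) pairs."""
    size = 100          # assumed initial file size
    mutex_held = False
    seen = []           # B's observations, in order
    deferred = []       # B's operations parked while the mutex is held

    # One interleaving of compound A (mutex holder) and compound B.
    schedule = [
        ("A", "MUTEX_BEGIN"), ("A", "SETATTR"), ("B", "READ"),
        ("A", "WRITE"), ("B", "GETATTR"), ("A", "MUTEX_END"),
    ]
    for who, op in schedule:
        if who == "A":
            if op == "MUTEX_BEGIN":
                mutex_held = True
            elif op == "SETATTR":
                size = 0            # truncate to size 0
            elif op == "WRITE":
                size = 4096         # assumed WRITE of 4096 bytes at offset 0
            elif op == "MUTEX_END":
                mutex_held = False
                seen.extend((name, size) for name in deferred)
                deferred.clear()
        elif mutex_held and block_other_compounds:
            deferred.append(op)     # B waits until MUTEX_END
        else:
            seen.append((op, size))
    return seen
```

Without blocking, B's READ sees size 0 and its GETATTR sees 4096. With
blocking, both see 4096.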

As far as implementing it, I do not know how other servers are
set up, but for FreeBSD, doing the "block all operations
on the FH by other RPCs" is by far the easiest.
- FreeBSD uses a shared/exclusive lock on vnodes (think inodes)
  during each operation in a compound for CFH (and SAVEFH, if it
  exists). All the MUTEX_BEGIN needs to do is acquire the exclusive
  vnode lock for CFH and keep it until MUTEX_END (normally, it is
  unlocked after the operation).
  --> Using a shared lock on the vnode would allow READ/GETATTR etc
      to be done by other compounds when between MUTEX_BEGIN/MUTEX_END
      but, at least for FreeBSD, a safe upgrade to an exclusive lock
      (required by operations like WRITE) cannot be done.
      --> So, implementing the "only block operations that change the change attribute"
          would require some other (rather odd) locking mechanism layered
          on top of the vnode locking. Doable, but not fun.
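
To make the "keep the exclusive lock until MUTEX_END" idea concrete,
here is a toy Python model. It is not FreeBSD code: Vnode and
run_compound are hypothetical names, each operation's status is supplied
by the caller, and releasing the lock when a compound ends without a
MUTEX_END is my assumption, not something settled above:

```python
class Vnode:
    """Toy stand-in for a vnode with a single exclusive lock."""

    def __init__(self):
        self.owner = None
        self.log = []

    def lock(self, owner):
        assert self.owner is None, "exclusive lock already held"
        self.owner = owner
        self.log.append("lock")

    def unlock(self, owner):
        assert self.owner == owner, "unlock by non-owner"
        self.owner = None
        self.log.append("unlock")


def run_compound(vp, ops, owner="compound-1"):
    """ops is a list of (op_name, status) pairs; returns the results so far."""
    in_mutex = False
    results = []
    for name, status in ops:
        if name == "MUTEX_BEGIN":
            vp.lock(owner)          # acquire and keep the exclusive lock
            in_mutex = True
        elif name == "MUTEX_END":
            if in_mutex:
                vp.unlock(owner)
                in_mutex = False
        else:
            if not in_mutex:
                vp.lock(owner)      # normal case: lock per operation
            vp.log.append(name)
            if not in_mutex:
                vp.unlock(owner)
        results.append((name, status))
        if status != "NFS_OK":
            break                   # the compound stops at the first error
    if in_mutex:
        # Implicit MUTEX_END on error (or, by assumption, at end of compound).
        vp.unlock(owner)
    return results
```

For the append case, a VERIFY that fails with NFS4ERR_NOT_SAME ends the
compound and drops the lock before the WRITE is ever attempted.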

Then there is the issue of avoiding deadlock...
I think there are operations, like RENAME, which cannot be put between
MUTEX_BEGIN/MUTEX_END without risk of deadlock (if the RENAME is using
two different directories).
I'll admit I have not worked through too many of these yet, but it looks
like the simplest way to avoid deadlock problems is to not allow
MUTEX_BEGIN to be done on directory FHs.
Does this sound like too severe a restriction?
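
As a sketch of the RENAME problem: if one compound holds a mutex on
dir1 and its RENAME needs dir2, while another holds dir2 and needs dir1,
each waits on the other. The Python below just looks for such a cycle in
a wait-for graph. find_deadlock and the "holds"/"wants" maps are
hypothetical names for illustration, not a proposed server mechanism:

```python
def find_deadlock(holds, wants):
    """holds: compound -> set of locked FHs; wants: compound -> FH it waits for.

    Returns a list of compounds forming a wait-for cycle, or None.
    """
    owner_of = {fh: who for who, fhs in holds.items() for fh in fhs}
    # Edge who -> holder of the FH that 'who' is waiting for.
    waits_for = {
        who: owner_of[fh]
        for who, fh in wants.items()
        if fh in owner_of and owner_of[fh] != who
    }
    for start in waits_for:
        path = []
        cur = start
        while cur is not None and cur not in path:
            path.append(cur)
            cur = waits_for.get(cur)
        if cur is not None:
            return path[path.index(cur):]   # the cycle itself
    return None
```

Usage: find_deadlock({"A": {"dir1"}, "B": {"dir2"}}, {"A": "dir2", "B": "dir1"})
reports a cycle involving A and B, whereas a single compound waiting on
a mutex held by another that is not waiting on anything reports none.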

rick
