[nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations
Rick Macklem <rick.macklem@gmail.com> Tue, 20 August 2024 22:43 UTC
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tue, 20 Aug 2024 15:43:04 -0700
Message-ID: <CAM5tNy7UaVroWQwq5N3LH+3zg_1vwFCLzAi+LpXa9_Yme_jeVA@mail.gmail.com>
To: Jeff Layton <jlayton@poochiereds.net>
CC: NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/wYa2om_SglIg1mm7ZyH0HH_560E>

On Tue, Aug 20, 2024 at 2:57 PM Jeff Layton <jlayton@poochiereds.net> wrote:
> On Tue, 2024-08-20 at 13:35 -0700, Rick Macklem wrote:
> >
> > On Tue, Aug 20, 2024 at 1:29 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > >
> > > On Tue, Aug 20, 2024 at 4:49 AM Jeff Layton <jlayton@poochiereds.net> wrote:
> > > > On Thu, 2024-08-08 at 13:36 -0700, Rick Macklem wrote:
> > > > > Hi,
> > > > >
> > > > > Over the years, I've run into cases where it would be really
> > > > > nice to be able to perform multiple NFSv4 operations on a
> > > > > file without other operations done by other clients
> > > > > "gumming up the works" by changing the file's data/metadata
> > > > > between the operations in the compound.
> > > > >
> > > > > So, what do others think about an extension to NFSv4.2 that
> > > > > adds 2 new operations:
> > > > >     MUTEX_BEGIN(CFH)
> > > > >     MUTEX_END(CFH)
> > > > > Both would use the CFH as argument, and no other client would
> > > > > be allowed to perform operations on the CFH between the MUTEX_BEGIN
> > > > > and MUTEX_END.
> > > > >
> > > > > I think there would need to be a couple of properties for these:
> > > > > - There would need to be an "implicit" MUTEX_END when any operation
> > > > >   between MUTEX_BEGIN and MUTEX_END returns a status other than NFS_OK.
> > > > > - I think you would want a restriction of only one mutex for one CFH
> > > > >   at a time in a compound. Without that, there could easily be deadlocks
> > > > >   caused by other compounds acquiring mutexes on the same CFHs in a
> > > > >   different order.
> > > > > - Only one compound can hold a mutex on a given CFH at any time.
> > > > > - MUTEX_BEGIN/MUTEX_END can only be used in compounds where SEQUENCE
> > > > >   is the first operation.
> > > > > - All mutexes are discarded by a server when it crashes/recovers.
> > > > >   (Any time a client receives a NFS4ERR_STALE_CLIENTID.)
> > > > >   That way RPC retries after a server reboot should work ok, I think?
> > > > >
> > > > > I am not sure what the semantics for reading data/metadata should be,
> > > > > but I was thinking that would be allowed to be done by compounds for
> > > > > other clients for the CFH. If a client wanted to serialize against
> > > > > other compounds for the CFH, it could do a MUTEX_BEGIN/MUTEX_END.
> > > > >
> > > > > I see this as useful in a variety of ways:
> > > > > - The example in the previous email of:
> > > > >     MUTEX_BEGIN
> > > > >     NVERIFY acl_truform ACL_MODEL_NFS4
> > > > >     SETATTR posix_access_acl
> > > > >     MUTEX_END
> > > > > - Append writing:
> > > > >     MUTEX_BEGIN
> > > > >     VERIFY size "offset in WRITE that follows"
> > > > >     WRITE "offset" etc
> > > > >     MUTEX_END
> > > > > - A bunch of cases where NFSv4 lacks the postop_attributes
> > > > >   that were in NFSv3.
> > > > >     MUTEX_BEGIN
> > > > >     WRITE
> > > > >     GETATTR size, change,..
> > > > >     MUTEX_END
> > > > >
> > > > > So, what do others think?
> > > > > (This was obviously not possible without sessions.)
> > > > >
> > > > >
> > > > It's an interesting idea. Quite a bit different than LOCK/LOCKU for
> > > > sure.
> > > >
> > > > The difficulty here will be in defining what activity this mutex
> > > > blocks. Just other MUTEX_BEGINs, or is this a mandatory lock for all
> > > > access to that filehandle?
> > > >
> > >
> > > I think that only other MUTEX_BEGINs would not be particularly
> > > useful, since there will be clients doing RPCs without any MUTEX_BEGINs
> > > for a long time.
> > >
> > > My original thinking was all operations in other compounds that use
> > > the FH (either CFH or SAVEFH) would block until MUTEX_END.
> > > (I'd even like to include NFSv3 RPCs, but that might be a bridge too far?)
> > >
> > > > Assuming the latter, would READ or GETATTR
> > > > activity also be blocked? If not, then we'll have to have a list of
> > > > operations that are blocked by the mutex, and new operations will have
> > > > to have their semantics spelled out clearly vs. MUTEX_BEGIN.
> > > >
> > >
> > > I suppose only blocking operations that modify the file (where the change
> > > attribute changes) is possible. I do think this makes it more complex
> > > to implement (a lot more difficult to implement for FreeBSD, at least)
> > > and I am not sure which would be preferred?
> > >
> > > For example, suppose a compound does:
> > >     MUTEX_BEGIN
> > >     SETATTR (size of 0)
> > >     WRITE
> > >     GETATTR (size, change,...)
> > >     MUTEX_END
> > > And then another compound does:
> > >     READ
> > >     GETATTR (size, change,...)
> > > --> If the READ is not blocked until MUTEX_END, it might see EOF
> > >     for the file size of 0 and then see a size of non-zero for
> > >     the GETATTR.
> > > --> Could happen now (without MUTEX_BEGIN/MUTEX_END),
> > >     so not a big deal, but it *might* be
> > >     better to block the READ in this case?
> > >
> > > As far as implementing it, I do not know what other server
> > > setups are, but for FreeBSD, doing the "block all operations
> > > on the FH by other RPCs" is by far the easiest.
> > > - FreeBSD uses a shared/exclusive lock on vnodes (think inodes)
> > >   during each operation in a compound for CFH (and SAVEFH, if it
> > >   exists). All the MUTEX_BEGIN needs to do is acquire the exclusive
> > >   vnode lock for CFH and keep it until MUTEX_END (normally, it is
> > >   unlocked after the operation).
> > >   --> Using a shared lock on the vnode would allow READ/GETATTR etc
> > >       to be done by other compounds when between MUTEX_BEGIN/MUTEX_END
> > >       but, at least for FreeBSD, a safe upgrade to an exclusive lock
> > >       (required by operations like WRITE) cannot be done.
> > >   --> So, implementing the "only block operations that change change"
> > >       would require some other (rather odd) locking mechanism layered
> > >       on top of the vnode locking. Doable, but not fun.
> > >
> > > Then there is the issue of avoiding deadlock...
> > > I think there are operations, like RENAME, which cannot be put between
> > > MUTEX_BEGIN/MUTEX_END without risk of deadlock (if the RENAME is using
> > > two different directories).
> > > I'll admit I have not worked through too many of these yet, but it looks
> > > like the simplest way to avoid deadlock problems is to not allow
> > > MUTEX_BEGIN to be done on directory FHs.
> > > Does this sound like too severe a restriction?
> > >
> >
> >
> > Oh, and I think operations like PUTFH, which changes the CFH need to be
> > avoided between MUTEX_BEGIN/MUTEX_END to avoid deadlock, as well.
> > If a compound does:
> >     MUTEX_BEGIN (FH for foo)
> >     PUTFH (FH for bar)
> >     GETATTR
> >     MUTEX_END
> > and another compound does:
> >     MUTEX_BEGIN (FH for bar)
> >     PUTFH (FH for foo)
> >     GETATTR
> >     MUTEX_END
> > --> They could deadlock.
> > Does not allowing PUTFH/PUTROOTFH/PUTPUBFH/RESTOREFH between MUTEX_BEGIN
> > and MUTEX_END sound like too severe a restriction?
>
> I don't think so, but there are other operations that can change the
> current_fh too -- OPEN, for instance. If I do a MUTEX_BEGIN on a
> directory and then do an OPEN (by name) in it, what happens to the
> mutex when the current_fh changes?
>

Yes, I think that any operation that changes CFH or uses both CFH and
SAVED_FH as arguments introduces the risk of deadlock.

> As far as deadlocks go, the simplest way to avoid them would be to just
> deny any nesting whatsoever, and only allow locking a single mutex in
> the compound.
>

If operations as above are not allowed between MUTEX_BEGIN/MUTEX_END,
plus no nesting (I think I have mentioned that before), then I believe
deadlocks are avoided.

> We'd probably have to implement this as a per-fh thing in the Linux
> server too. One thing I've been wanting to build is a way to do a
> gated write using the change attr. This could make that possible:
>
> COMPOUND1:
>     GETATTR (get the change attr)
>     READ
>
> (modify the data in memory)
>
> COMPOUND2:
>     MUTEX_BEGIN
>     VERIFY
>     WRITE
>     MUTEX_END
>
> ...if the write fails, do the whole thing again.
>
> Synchronized, multi-host I/O without using file locks!
>

Yep, and you can do an append write using the same COMPOUND2, but checking
size == write-offset in the VERIFY.

Another one I am interested in is...
I really like Thomas's Layout_WCC, but I do not like the fact that it only
works for NFSv3 DSs. This would allow a NFSv4 DS (assuming the DS supports
MUTEX_BEGIN/MUTEX_END) to do the same thing, I think?

rick

> --
> Jeff Layton <jlayton@poochiereds.net>
>
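
For illustration only, the gated write in COMPOUND1/COMPOUND2 above can be sketched as a client-side retry loop against an in-memory stand-in for the server. Nothing here is a real NFS API: FakeServer, compound1, compound2 and gated_update are hypothetical names, the change attribute is modeled as a simple counter, and a threading.Lock stands in for the proposed per-FH mutex under the "block all operations on the FH" semantics discussed in the thread.

```python
import threading


class FakeServer:
    """Hypothetical in-memory model of a single file on a server."""

    def __init__(self, data=b""):
        self.data = data
        self.change = 0                 # stands in for the change attribute
        self._mutex = threading.Lock()  # stands in for the per-FH mutex

    def compound1(self):
        """GETATTR (change) then READ, as two separately locked operations."""
        with self._mutex:               # GETATTR
            change = self.change
        with self._mutex:               # READ
            data = self.data
        return change, data

    def compound2(self, expected_change, new_data):
        """MUTEX_BEGIN, VERIFY change, WRITE, MUTEX_END."""
        with self._mutex:               # held from MUTEX_BEGIN to MUTEX_END
            if self.change != expected_change:
                return False            # VERIFY failed: implicit MUTEX_END
            self.data = new_data        # WRITE
            self.change += 1            # a write bumps the change attribute
            return True


def gated_update(server, modify, max_attempts=100):
    """Retry read-modify-write until the VERIFY in compound2 succeeds."""
    for _ in range(max_attempts):
        change, data = server.compound1()
        if server.compound2(change, modify(data)):
            return True
    return False


if __name__ == "__main__":
    srv = FakeServer(b"0")
    gated_update(srv, lambda d: str(int(d) + 1).encode())
    print(srv.data, srv.change)
```

If another writer changes the file between compound1 and compound2, the change value no longer matches, compound2 returns False without writing, and the loop starts over with fresh data. The append-write variant would compare the file size against the write offset instead of comparing the change attribute.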