[nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations
Rick Macklem <rick.macklem@gmail.com> Tue, 20 August 2024 20:35 UTC
References: <CAM5tNy7g+YCiiZQD7G6Ryv_Mo8N5BeRiqMP=224zPpEXa+Yi+A@mail.gmail.com> <a8b69cf0e7d33aed66ae125f40e377bccb4b6918.camel@poochiereds.net> <CAM5tNy4D1OB2f5B3h9M_56zWpC7+a3q_Y=c-kvNNnOG2c3QZxw@mail.gmail.com>
In-Reply-To: <CAM5tNy4D1OB2f5B3h9M_56zWpC7+a3q_Y=c-kvNNnOG2c3QZxw@mail.gmail.com>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tue, 20 Aug 2024 13:35:31 -0700
Message-ID: <CAM5tNy6psWjCCHiFF2v2NNZxF9FiSOOsE4xyLeWi2DSCPfvqew@mail.gmail.com>
To: Jeff Layton <jlayton@poochiereds.net>
CC: NFSv4 <nfsv4@ietf.org>
Subject: [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/mjyQ8tV9sLN-vBS46BXdGZ2fLqk>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4>
On Tue, Aug 20, 2024 at 1:29 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>
>
> On Tue, Aug 20, 2024 at 4:49 AM Jeff Layton <jlayton@poochiereds.net>
> wrote:
>
>> On Thu, 2024-08-08 at 13:36 -0700, Rick Macklem wrote:
>> > Hi,
>> >
>> > Over the years, I've run into cases where it would be really
>> > nice to be able to perform multiple NFSv4 operations on a
>> > file without other operations done by other clients
>> > "gumming up the works" by changing the file's data/metadata
>> > between the operations in the compound.
>> >
>> > So, what do others think about an extension to NFSv4.2 that
>> > adds 2 new operations:
>> > MUTEX_BEGIN(CFH)
>> > MUTEX_END(CFH)
>> > Both would use the CFH as argument, and no other client would
>> > be allowed to perform operations on the CFH between the MUTEX_BEGIN
>> > and MUTEX_END.
>> >
>> > I think there would need to be a couple of properties for these:
>> > - There would need to be an "implicit" MUTEX_END when any operation
>> >   between MUTEX_BEGIN and MUTEX_END returns a status other than NFS_OK.
>> > - I think you would want a restriction of only one mutex for one CFH
>> >   at a time in a compound. Without that, there could easily be deadlocks
>> >   caused by other compounds acquiring mutexes on the same CFHs in a
>> >   different order.
>> > - Only one compound can hold a mutex on a given CFH at any time.
>> > - MUTEX_BEGIN/MUTEX_END can only be used in compounds where SEQUENCE
>> >   is the first operation.
>> > - All mutexes are discarded by a server when it crashes/recovers.
>> >   (Any time a client receives a NFS4ERR_STALE_CLIENTID.)
>> >   That way RPC retries after a server reboot should work ok, I think?
>> >
>> > I am not sure what the semantics for reading data/metadata should be,
>> > but I was thinking that would be allowed to be done by compounds for
>> > other clients for the CFH. If a client wanted to serialize against
>> > other compounds for the CFH, it could do a MUTEX_BEGIN/MUTEX_END.
>> >
>> > I see this as useful in a variety of ways:
>> > - The example in the previous email of:
>> >   MUTEX_BEGIN
>> >   NVERIFY acl_truform ACL_MODEL_NFS4
>> >   SETATTR posix_access_acl
>> >   MUTEX_END
>> > - Append writing:
>> >   MUTEX_BEGIN
>> >   VERIFY size "offset in WRITE that follows"
>> >   WRITE "offset" etc
>> >   MUTEX_END
>> > - A bunch of cases where NFSv4 lacks the postop_attributes
>> >   that were in NFSv3.
>> >   MUTEX_BEGIN
>> >   WRITE
>> >   GETATTR size, change,..
>> >   MUTEX_END
>> >
>> > So, what do others think?
>> > (This was obviously not possible without sessions.)
>> >
>>
>> It's an interesting idea. Quite a bit different than LOCK/LOCKU for
>> sure.
>>
>> The difficulty here will be in defining what activity this mutex
>> blocks. Just other MUTEX_BEGINs, or is this a mandatory lock for all
>> access to that filehandle?
>
> I think that only other MUTEX_BEGINs would not be particularly
> useful, since there will be clients doing RPCs without any MUTEX_BEGINs
> for a long time.
>
> My original thinking was all operations in other compounds that use
> the FH (either CFH or SAVEFH) would block until MUTEX_END.
> (I'd even like to include NFSv3 RPCs, but that might be a bridge too far?)
>
> Assuming the latter, would READ or GETATTR
>> activity also be blocked? If not, then we'll have to have a list of
>> operations that are blocked by the mutex, and new operations will have
>> to have their semantics spelled out clearly vs. MUTEX_BEGIN.
>>
> I suppose only blocking operations that modify the file (where the change
> attribute changes) is possible. I do think this makes it more complex
> to implement (a lot more difficult to implement for FreeBSD, at least)
> and I am not sure which would be preferred?
>
> For example, suppose a compound does:
>   MUTEX_BEGIN
>   SETATTR (size of 0)
>   WRITE
>   GETATTR (size, change,...)
>   MUTEX_END
> And then another compound does:
>   READ
>   GETATTR (size, change,...)
> --> If the READ is not blocked until MUTEX_END, it might see EOF
>     for the file size of 0 and then see a size of non-zero for
>     the GETATTR.
> --> Could happen now (without MUTEX_BEGIN/MUTEX_END),
>     so not a big deal, but it *might* be
>     better to block the READ in this case?
>
> As far as implementing it, I do not know what other server
> setups are, but for FreeBSD, doing the "block all operations
> on the FH by other RPCs" is by far the easiest.
> - FreeBSD uses a shared/exclusive lock on vnodes (think inodes)
>   during each operation in a compound for CFH (and SAVEFH, if it
>   exists). All the MUTEX_BEGIN needs to do is acquire the exclusive
>   vnode lock for CFH and keep it until MUTEX_END (normally, it is
>   unlocked after the operation).
> --> Using a shared lock on the vnode would allow READ/GETATTR etc
>     to be done by other compounds when between MUTEX_BEGIN/MUTEX_END
>     but, at least for FreeBSD, a safe upgrade to an exclusive lock
>     (required by operations like WRITE) cannot be done.
> --> So, implementing the "only block operations that change change"
>     would require some other (rather odd) locking mechanism layered
>     on top of the vnode locking. Doable, but not fun.
>
> Then there is the issue of avoiding deadlock...
> I think there are operations, like RENAME, which cannot be put between
> MUTEX_BEGIN/MUTEX_END without risk of deadlock (if the RENAME is using
> two different directories).
> I'll admit I have not worked through too many of these yet, but it looks
> like the simplest way to avoid deadlock problems is to not allow
> MUTEX_BEGIN to be done on directory FHs.
> Does this sound like too severe a restriction?
>
Oh, and I think operations like PUTFH, which changes the CFH need to be
avoided between MUTEX_BEGIN/MUTEX_END to avoid deadlock, as well.
If a compound does:
  MUTEX_BEGIN (FH for foo)
  PUTFH (FH for bar)
  GETATTR
  MUTEX_END
and another compound does:
  MUTEX_BEGIN (FH for bar)
  PUTFH (FH for foo)
  GETATTR
  MUTEX_END
--> They could deadlock.
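[Editor's sketch] The two-compound case above can be checked mechanically by treating it as a wait-for cycle. The Python below is only an illustration under the "block all operations on the FH by other compounds" semantics; the function name `find_deadlock`, the compound labels "A" and "B", and the dictionary layout are all hypothetical and not part of the proposal.

```python
def find_deadlock(holds, wants):
    """Return a list of compounds that wait on each other, or None.

    holds: compound -> FH it currently holds a mutex on (hypothetical model)
    wants: compound -> FH its next operation is blocked on
    """
    # Map each locked FH back to the compound holding it.
    owner = {fh: compound for compound, fh in holds.items()}
    for start in wants:
        seen = []
        cur = start
        # Follow "cur waits for the owner of the FH it wants" edges.
        while cur in wants and wants[cur] in owner:
            if cur in seen:
                return seen[seen.index(cur):]  # the cycle itself
            seen.append(cur)
            cur = owner[wants[cur]]
    return None


# Compound A: MUTEX_BEGIN(foo), then PUTFH(bar) + GETATTR blocks on bar.
# Compound B: MUTEX_BEGIN(bar), then PUTFH(foo) + GETATTR blocks on foo.
cycle = find_deadlock(holds={"A": "foo", "B": "bar"},
                      wants={"A": "bar", "B": "foo"})
```

With the two compounds from the example the function reports both as part of one cycle. If only one compound holds a mutex and the other merely waits for it, no cycle is found.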
Does not allowing PUTFH/PUTROOTFH/PUTPUBFH/RESTOREFH between MUTEX_BEGIN
and MUTEX_END sound like too severe a restriction?

rick

> rick
>
>
>
>
>> --
>> Jeff Layton <jlayton@poochiereds.net>
>>
>
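[Editor's sketch] One way a server might enforce the restrictions discussed in this thread is a pass over the compound's operation list before executing it. The Python below is a rough illustration, not a proposed implementation: `check_compound` and `FH_CHANGING_OPS` are hypothetical names, a compound is modelled as a plain list of operation names, and "only one mutex at a time in a compound" is interpreted here as "no MUTEX_BEGIN while one is already held", which is an assumption.

```python
# Operations that replace the current filehandle (CFH), per the message above.
FH_CHANGING_OPS = {"PUTFH", "PUTROOTFH", "PUTPUBFH", "RESTOREFH"}


def check_compound(ops):
    """Return a list of rule violations for a compound (list of op names)."""
    problems = []
    uses_mutex = "MUTEX_BEGIN" in ops or "MUTEX_END" in ops
    # Rule from the original proposal: SEQUENCE must lead the compound.
    if uses_mutex and ops[0] != "SEQUENCE":
        problems.append("SEQUENCE must be the first operation")
    held = False
    for i, op in enumerate(ops):
        if op == "MUTEX_BEGIN":
            if held:
                # Assumed reading of "only one mutex at a time".
                problems.append(f"op {i}: nested MUTEX_BEGIN")
            held = True
        elif op == "MUTEX_END":
            if not held:
                problems.append(f"op {i}: MUTEX_END without MUTEX_BEGIN")
            held = False
        elif held and op in FH_CHANGING_OPS:
            problems.append(f"op {i}: {op} changes the CFH while a mutex is held")
    return problems


ok = check_compound(
    ["SEQUENCE", "PUTFH", "MUTEX_BEGIN", "WRITE", "GETATTR", "MUTEX_END"])
bad = check_compound(
    ["SEQUENCE", "PUTFH", "MUTEX_BEGIN", "PUTFH", "GETATTR", "MUTEX_END"])
```

The first compound passes with no violations; the second is flagged because PUTFH appears between MUTEX_BEGIN and MUTEX_END. The sketch does not cover the directory-FH restriction, since that needs the file type of the CFH rather than just the operation names.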
- [nfsv4] RFC: new MUTEX_BEGIN/MUTEX_END operations Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Tom Haynes
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Pali Rohár
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Chuck Lever III
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Jeff Layton
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Pali Rohár
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Jeff Layton
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Jeff Layton
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Jeff Layton
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Pali Rohár
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Trond Myklebust
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem
- [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operat… Rick Macklem