[nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations

Rick Macklem <rick.macklem@gmail.com> Tue, 20 August 2024 20:35 UTC

References: <CAM5tNy7g+YCiiZQD7G6Ryv_Mo8N5BeRiqMP=224zPpEXa+Yi+A@mail.gmail.com> <a8b69cf0e7d33aed66ae125f40e377bccb4b6918.camel@poochiereds.net> <CAM5tNy4D1OB2f5B3h9M_56zWpC7+a3q_Y=c-kvNNnOG2c3QZxw@mail.gmail.com>
In-Reply-To: <CAM5tNy4D1OB2f5B3h9M_56zWpC7+a3q_Y=c-kvNNnOG2c3QZxw@mail.gmail.com>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tue, 20 Aug 2024 13:35:31 -0700
Message-ID: <CAM5tNy6psWjCCHiFF2v2NNZxF9FiSOOsE4xyLeWi2DSCPfvqew@mail.gmail.com>
To: Jeff Layton <jlayton@poochiereds.net>
CC: NFSv4 <nfsv4@ietf.org>
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/mjyQ8tV9sLN-vBS46BXdGZ2fLqk>

On Tue, Aug 20, 2024 at 1:29 PM Rick Macklem <rick.macklem@gmail.com> wrote:

>
>
> On Tue, Aug 20, 2024 at 4:49 AM Jeff Layton <jlayton@poochiereds.net>
> wrote:
>
>> On Thu, 2024-08-08 at 13:36 -0700, Rick Macklem wrote:
>> > Hi,
>> >
>> > Over the years, I've run into cases where it would be really
>> > nice to be able to perform multiple NFSv4 operations on a
>> > file without other operations done by other clients
>> > "gumming up the works" by changing the file's data/metadata
>> > between the operations in the compound.
>> >
>> > So, what do others think about an extension to NFSv4.2 that
>> > adds 2 new operations:
>> >   MUTEX_BEGIN(CFH)
>> >   MUTEX_END(CFH)
>> > Both would use the CFH as argument, and no other client would
>> > be allowed to perform operations on the CFH between the MUTEX_BEGIN
>> > and MUTEX_END.
>> >
>> > I think there would need to be a couple of properties for these:
>> > - There would need to be an "implicit" MUTEX_END when any operation
>> >   between MUTEX_BEGIN and MUTEX_END returns a status other than NFS4_OK.
>> > - I think you would want a restriction of only one mutex for one CFH
>> >   at a time in a compound. Without that, there could easily be deadlocks
>> >   caused by other compounds acquiring mutexes on the same CFHs in a
>> >   different order.
>> > - Only one compound can hold a mutex on a given CFH at any time.
>> > - MUTEX_BEGIN/MUTEX_END can only be used in compounds where SEQUENCE
>> >   is the first operation.
>> > - All mutexes are discarded by a server when it crashes/recovers.
>> >   (Any time a client receives a NFS4ERR_STALE_CLIENTID.)
>> >   That way RPC retries after a server reboot should work ok, I think?
>> >
>> > I am not sure what the semantics for reading data/metadata should be,
>> > but I was thinking that would be allowed to be done by compounds for
>> > other clients for the CFH. If a client wanted to serialize against
>> > other compounds for the CFH, it could do a MUTEX_BEGIN/MUTEX_END.
>> >
>> > I see this as useful in a variety of ways:
>> > - The example in the previous email of:
>> >   MUTEX_BEGIN
>> >   NVERIFY acl_trueform ACL_MODEL_NFS4
>> >   SETATTR posix_access_acl
>> >   MUTEX_END
>> > - Append writing:
>> >   MUTEX_BEGIN
>> >   VERIFY size "offset in WRITE that follows"
>> >   WRITE "offset" etc
>> >   MUTEX_END
>> > - A bunch of cases where NFSv4 lacks the postop_attributes
>> >   that were in NFSv3.
>> >   MUTEX_BEGIN
>> >   WRITE
>> >   GETATTR size, change,..
>> >   MUTEX_END
>> >
>> > So, what do others think?
>> > (This was obviously not possible without sessions.)
>> >
>>
>> It's an interesting idea. Quite a bit different than LOCK/LOCKU for
>> sure.
>>
>> The difficulty here will be in defining what activity this mutex
>> blocks. Just other MUTEX_BEGINs, or is this a mandatory lock for all
>> access to that filehandle?
>
> I think that blocking only other MUTEX_BEGINs would not be particularly
> useful, since there will be clients doing RPCs without any MUTEX_BEGINs
> for a long time.
>
> My original thinking was all operations in other compounds that use
> the FH (either CFH or SAVEFH) would block until MUTEX_END.
> (I'd even like to include NFSv3 RPCs, but that might be a bridge too far?)
>
>> Assuming the latter, would READ or GETATTR
>> activity also be blocked? If not, then we'll have to have a list of
>> operations that are blocked by the mutex, and new operations will have
>> to have their semantics spelled out clearly vs. MUTEX_BEGIN.
>>
> I suppose blocking only operations that modify the file (where the change
> attribute changes) is possible. I do think this makes it more complex
> to implement (a lot more difficult for FreeBSD, at least),
> and I am not sure which would be preferred.
>
> For example, suppose a compound does:
> MUTEX_BEGIN
> SETATTR (size of 0)
> WRITE
> GETATTR (size, change,...)
> MUTEX_END
> And then another compound does:
> READ
> GETATTR (size, change,...)
> --> If the READ is not blocked until MUTEX_END, it might see EOF
>     for the file size of 0 and then see a size of non-zero for
>     the GETATTR.
>     --> Could happen now (without MUTEX_BEGIN/MUTEX_END),
>         so not a big deal, but it *might* be
>         better to block the READ in this case?
>
> As far as implementing it, I do not know what other server
> setups are, but for FreeBSD, doing the "block all operations
> on the FH by other RPCs" is by far the easiest.
> - FreeBSD uses a shared/exclusive lock on vnodes (think inodes)
>   during each operation in a compound for CFH (and SAVEFH, if it
>   exists). All the MUTEX_BEGIN needs to do is acquire the exclusive
>   vnode lock for CFH and keep it until MUTEX_END (normally, it is
>   unlocked after the operation).
>   --> Using a shared lock on the vnode would allow READ/GETATTR etc
>       to be done by other compounds when between MUTEX_BEGIN/MUTEX_END
>       but, at least for FreeBSD, a safe upgrade to an exclusive lock
>       (required by operations like WRITE) cannot be done.
>       --> So, implementing "only block operations that change the change attribute"
>           would require some other (rather odd) locking mechanism layered
>           on top of the vnode locking. Doable, but not fun.
>
> Then there is the issue of avoiding deadlock...
> I think there are operations, like RENAME, which cannot be put between
> MUTEX_BEGIN/MUTEX_END without risk of deadlock (if the RENAME is using
> two different directories).
> I'll admit I have not worked through too many of these yet, but it looks
> like the simplest way to avoid deadlock problems is to not allow
> MUTEX_BEGIN to be done on directory FHs.
> Does this sound like too severe a restriction?
>
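
A minimal sketch of the "block all operations on the FH by other RPCs"
behaviour described above, in Python purely for illustration. Everything
here is hypothetical: ToyServer, the VERIFY_SIZE/GETATTR_SIZE op names and
the NFS4ERR_INVAL status for a misplaced MUTEX_BEGIN/MUTEX_END are
assumptions, not part of any draft or of the FreeBSD code. It only shows the
exclusive per-FH lock being kept from MUTEX_BEGIN to MUTEX_END, with the
"implicit" MUTEX_END when an operation fails.

```python
import threading
from collections import defaultdict


class ToyServer:
    """Hypothetical in-memory model of the proposal; not a real NFS API."""

    def __init__(self):
        self.files = defaultdict(bytearray)       # fh -> file contents
        self.locks = defaultdict(threading.Lock)  # fh -> exclusive "vnode" lock
        self._guard = threading.Lock()            # protects the locks dict

    def _lock_for(self, fh):
        with self._guard:
            return self.locks[fh]

    def _apply(self, fh, name, args):
        data = self.files[fh]
        if name == "VERIFY_SIZE":                 # stands in for VERIFY size
            return "NFS4_OK" if len(data) == args[0] else "NFS4ERR_NOT_SAME"
        if name == "WRITE":
            offset, payload = args
            data[offset:offset + len(payload)] = payload
            return "NFS4_OK"
        if name == "GETATTR_SIZE":                # stands in for GETATTR size
            return "NFS4_OK"
        return "NFS4ERR_OP_ILLEGAL"

    def run_compound(self, fh, ops):
        """Run ops against one CFH; stop at the first non-OK status."""
        lock = self._lock_for(fh)
        holding = False
        statuses = []
        try:
            for name, *args in ops:
                if name == "MUTEX_BEGIN":
                    if holding:
                        status = "NFS4ERR_INVAL"  # assumed: one mutex per compound
                    else:
                        lock.acquire()            # kept until MUTEX_END
                        holding = True
                        status = "NFS4_OK"
                elif name == "MUTEX_END":
                    if holding:
                        lock.release()
                        holding = False
                        status = "NFS4_OK"
                    else:
                        status = "NFS4ERR_INVAL"  # assumed status
                elif holding:
                    status = self._apply(fh, name, args)
                else:
                    with lock:                    # normal case: lock per operation
                        status = self._apply(fh, name, args)
                statuses.append((name, status))
                if status != "NFS4_OK":
                    break
        finally:
            if holding:
                lock.release()                    # the "implicit" MUTEX_END
        return statuses


server = ToyServer()
append = [("MUTEX_BEGIN",), ("VERIFY_SIZE", 0), ("WRITE", 0, b"abc"), ("MUTEX_END",)]
print(server.run_compound("foo", append))  # all four operations return NFS4_OK
print(server.run_compound("foo", append))  # stops at VERIFY_SIZE: size is now 3
```

A shared/exclusive variant (letting READ/GETATTR through) would need a
different lock type, which is where the upgrade problem mentioned above
comes in.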

Oh, and I think operations like PUTFH, which change the CFH, need to be
avoided between MUTEX_BEGIN/MUTEX_END to avoid deadlock, as well.
If a compound does:
MUTEX_BEGIN (FH for foo)
PUTFH (FH for bar)
GETATTR
MUTEX_END
and another compound does:
MUTEX_BEGIN (FH for bar)
PUTFH (FH for foo)
GETATTR
MUTEX_END
--> They could deadlock.
Does disallowing PUTFH/PUTROOTFH/PUTPUBFH/RESTOREFH between MUTEX_BEGIN
and MUTEX_END sound like too severe a restriction?
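
To make the restriction concrete, here is a rough sketch of the check a
server might run over a compound's operation names before executing it.
This is only an illustration under assumptions: check_compound and the
reason strings are made up, and the set of disallowed operations is just
the four named above.

```python
# Operations named above as changing the CFH. Others (LOOKUP, OPEN, ...)
# also change the CFH; whether they would be restricted is left open here.
CFH_CHANGING_OPS = {"PUTFH", "PUTROOTFH", "PUTPUBFH", "RESTOREFH"}


def check_compound(ops):
    """Return None if ops obeys the sketched rules, else a reason string.

    ops is a list of operation names, e.g. ["SEQUENCE", "PUTFH", ...].
    """
    uses_mutex = "MUTEX_BEGIN" in ops or "MUTEX_END" in ops
    if uses_mutex and (not ops or ops[0] != "SEQUENCE"):
        return "SEQUENCE must be the first operation"
    in_mutex = False
    for op in ops:
        if op == "MUTEX_BEGIN":
            if in_mutex:
                return "only one mutex at a time in a compound"
            in_mutex = True
        elif op == "MUTEX_END":
            if not in_mutex:
                return "MUTEX_END without MUTEX_BEGIN"
            in_mutex = False
        elif in_mutex and op in CFH_CHANGING_OPS:
            return op + " not allowed between MUTEX_BEGIN and MUTEX_END"
    return None


# The "foo then bar" compound from above is rejected before it can deadlock.
bad = ["SEQUENCE", "PUTFH", "MUTEX_BEGIN", "PUTFH", "GETATTR", "MUTEX_END"]
good = ["SEQUENCE", "PUTFH", "MUTEX_BEGIN", "WRITE", "GETATTR", "MUTEX_END"]
print(check_compound(bad))
print(check_compound(good))
```

With this rule a compound only ever waits on the one FH it set before
MUTEX_BEGIN, so two compounds cannot end up holding one mutex each while
waiting for the other's.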

rick


> rick
>
>
>
>
>> --
>> Jeff Layton <jlayton@poochiereds.net>
>>
>