[nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations

Rick Macklem <rick.macklem@gmail.com> Thu, 08 August 2024 22:52 UTC

From: Rick Macklem <rick.macklem@gmail.com>
Date: Thu, 08 Aug 2024 15:51:57 -0700
Message-ID: <CAM5tNy4apzyrUJYvLfzZ4r72Fgya86uNd7ZaTSj9h4_PcuU1Eg@mail.gmail.com>
To: Tom Haynes <loghyr@gmail.com>
CC: NFSv4 <nfsv4@ietf.org>
Subject: [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/y4CjtFsWPihxs6tB79JIPBEpAYw>

On Thu, Aug 8, 2024 at 2:11 PM Tom Haynes <loghyr@gmail.com> wrote:

> I've been thinking of ATOMIC_BEGIN/ATOMIC_END.
>
> The difference for me is that they are not per-FH, but signifying all ops
> in between have to be done atomically.
>
I do not have a strong preference. My only concern with a global lock
is the performance hit. For a busy server handling hundreds of clients,
blocking all the others from doing operations could be a big hit.
(Even using MUTEX_BEGIN for a single FH could be a big hit for
something like WRITE/FILE_SYNC4.)

I think clients would have to be very careful to hold an atomic lock
only around fast operations.


> Not much difference, except to work on two files, MUTEX_* has to do a
> dance with PUTFH.
>
> I.e., if you want COPY or CLONE to lock both files. Or you want to lock
> directories for a rename....
>
COPY can take a long time (the asynchronous case might be OK). I
recently had a bug report for the FreeBSD server where a COPY was
taking 25 seconds. This turned out to be a ZFS configuration problem,
but the time a synchronous COPY takes can still be pretty large.

Holding multiple mutexes concurrently is definitely asking for
deadlock. I'm not sure what locking directories for rename would do.
It is not a case I have thought about.


> Oh, how about a try-until-available flag? So if two compounds arrive at
> the same time, the second waits until the first is done.
>
I think a short wait is fine (in fact I just forgot to suggest that in
the previous post), but I think setting an upper bound of something
like 50 msec would be appropriate.
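
A minimal sketch of that bounded wait, assuming a per-FH lock table
inside the server. The class name FhMutexTable, the method names, and
the string status codes are hypothetical, used only for illustration.
MUTEX_BEGIN waits up to about 50 msec and then gives up with
NFS4ERR_LOCKED instead of blocking indefinitely.

```python
import threading

MAX_WAIT_SECONDS = 0.050  # the "something like 50 msec" upper bound


class FhMutexTable:
    """Hypothetical per-filehandle mutex table (illustration only)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dictionary itself
        self._locks = {}                # fh (bytes) -> threading.Lock

    def _lock_for(self, fh):
        with self._guard:
            return self._locks.setdefault(fh, threading.Lock())

    def mutex_begin(self, fh, wait=MAX_WAIT_SECONDS):
        # Wait a short, bounded time for the current holder to finish.
        if self._lock_for(fh).acquire(timeout=wait):
            return "NFS4_OK"
        return "NFS4ERR_LOCKED"

    def mutex_end(self, fh):
        self._lock_for(fh).release()
        return "NFS4_OK"
```

A second MUTEX_BEGIN on the same FH returns NFS4ERR_LOCKED after the
wait expires, while a MUTEX_BEGIN on a different FH succeeds at once.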


> In general I support this.
>
Be careful, you might end up stuck writing it up. I hate writing these
drafts ;-)

rick


> On Thu, Aug 8, 2024, 13:47 Rick Macklem <rick.macklem@gmail.com> wrote:
>
>> Oh, one thing I missed...
>>
>> On Thu, Aug 8, 2024 at 1:36 PM Rick Macklem <rick.macklem@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Over the years, I've run into cases where it would be really
>>> nice to be able to perform multiple NFSv4 operations on a
>>> file without other operations done by other clients
>>> "gumming up the works" by changing the file's data/metadata
>>> between the operations in the compound.
>>>
>>> So, what do others think about an extension to NFSv4.2 that
>>> adds 2 new operations:
>>>   MUTEX_BEGIN(CFH)
>>>   MUTEX_END(CFH)
>>> Both would use the CFH as argument, and no other client would
>>> be allowed to perform operations on the CFH between the MUTEX_BEGIN
>>> and MUTEX_END.
>>>
>>> I think there would need to be a couple of properties for these:
>>> - There would need to be an "implicit" MUTEX_END when any operation
>>>   between MUTEX_BEGIN and MUTEX_END returns a status other than NFS4_OK.
>>>
>> - When a mutex is held by another compound in progress for an FH,
>>   MUTEX_BEGIN would reply with an error (NFS4ERR_LOCKED maybe).
>>
>>> - I think you would want a restriction of only one mutex for one CFH
>>>   at a time in a compound. Without that, there could easily be deadlocks
>>>   caused by other compounds acquiring mutexes on the same CFHs in a
>>>   different order.
>>> - Only one compound can hold a mutex on a given CFH at any time.
>>> - MUTEX_BEGIN/MUTEX_END can only be used in compounds where SEQUENCE
>>>   is the first operation.
>>> - All mutexes are discarded by a server when it crashes/recovers.
>>>   (Any time a client receives an NFS4ERR_STALE_CLIENTID.)
>>>   That way RPC retries after a server reboot should work ok, I think?
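
The properties listed above can be sketched as a toy, single-threaded
model of compound evaluation. Everything here is an assumption made for
illustration: the MutexServer class, the use of NFS4ERR_INVAL for a
second MUTEX_BEGIN or an unmatched MUTEX_END, one CFH per compound, and
releasing the mutex when the compound ends without a MUTEX_END.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_LOCKED = "NFS4ERR_LOCKED"
NFS4ERR_INVAL = "NFS4ERR_INVAL"  # assumed error for misuse, not from the proposal


class MutexServer:
    """Hypothetical model of the proposed MUTEX_BEGIN/MUTEX_END rules."""

    def __init__(self):
        self.holders = {}  # fh -> id of the compound holding the mutex

    def run_compound(self, compound_id, fh, ops):
        """ops is a list of (name, callable); returns [(name, status), ...]."""
        results = []
        held = False
        try:
            for name, fn in ops:
                if name == "MUTEX_BEGIN":
                    if held:
                        status = NFS4ERR_INVAL      # one mutex per compound
                    elif fh in self.holders:
                        status = NFS4ERR_LOCKED     # held by another compound
                    else:
                        self.holders[fh] = compound_id
                        held = True
                        status = NFS4_OK
                elif name == "MUTEX_END":
                    if held:
                        del self.holders[fh]
                        held = False
                        status = NFS4_OK
                    else:
                        status = NFS4ERR_INVAL
                else:
                    status = fn()
                results.append((name, status))
                if status != NFS4_OK:
                    break  # a compound stops at the first failing operation
        finally:
            if held:
                # Implicit MUTEX_END after an error (or a missing MUTEX_END).
                self.holders.pop(fh, None)
        return results

    def reboot(self):
        # All mutexes are discarded when the server crashes/recovers.
        self.holders.clear()
```

In this model a failing VERIFY between MUTEX_BEGIN and MUTEX_END ends
the compound and leaves no mutex behind.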
>>>
>>> I am not sure what the semantics for reading data/metadata should be,
>>> but I was thinking that reads on the CFH would still be allowed for
>>> compounds from other clients. If a client wanted to serialize against
>>> other compounds for the CFH, it could do a MUTEX_BEGIN/MUTEX_END.
>>>
>>> I see this as useful in a variety of ways:
>>> - The example in the previous email of:
>>>   MUTEX_BEGIN
>>>   NVERIFY acl_trueform ACL_MODEL_NFS4
>>>   SETATTR posix_access_acl
>>>   MUTEX_END
>>> - Append writing:
>>>   MUTEX_BEGIN
>>>   VERIFY size "offset in WRITE that follows"
>>>   WRITE "offset" etc
>>>   MUTEX_END
>>> - A bunch of cases where NFSv4 lacks the postop_attributes
>>>   that were in NFSv3.
>>>   MUTEX_BEGIN
>>>   WRITE
>>>   GETATTR size, change,..
>>>   MUTEX_END
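
The append-writing case above can be sketched with an in-memory file,
where a Python lock stands in for MUTEX_BEGIN/MUTEX_END. FakeFile,
append_compound, and the client retry loop in append are hypothetical
names and policy chosen for illustration; only the VERIFY-then-WRITE
ordering comes from the example.

```python
import threading


class FakeFile:
    """In-memory stand-in for a file on the server (illustration only)."""

    def __init__(self):
        self.data = bytearray()
        self.mutex = threading.Lock()  # plays the role of the per-FH mutex


def append_compound(f, offset, payload):
    """MUTEX_BEGIN; VERIFY size == offset; WRITE at offset; MUTEX_END."""
    with f.mutex:
        if len(f.data) != offset:
            # VERIFY fails, so the WRITE is never executed.
            return "NFS4ERR_NOT_SAME"
        f.data[offset:offset + len(payload)] = payload
        return "NFS4_OK"


def append(f, payload):
    """Assumed client policy: refetch the size and retry until VERIFY passes."""
    while True:
        offset = len(f.data)  # the client's current idea of the file size
        if append_compound(f, offset, payload) == "NFS4_OK":
            return offset
```

A compound built with a stale offset fails at VERIFY instead of
overwriting data that another client appended in the meantime.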
>>>
>>> So, what do others think?
>>> (This was obviously not possible without sessions.)
>>>
>>> rick
>>>
>>>