[nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations

Rick Macklem <rick.macklem@gmail.com> Fri, 09 August 2024 15:12 UTC

References: <CAM5tNy7g+YCiiZQD7G6Ryv_Mo8N5BeRiqMP=224zPpEXa+Yi+A@mail.gmail.com> <20240809090008.tzlq4vy5jmxckcqn@pali>
In-Reply-To: <20240809090008.tzlq4vy5jmxckcqn@pali>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Fri, 09 Aug 2024 08:12:39 -0700
Message-ID: <CAM5tNy7vBv0TLaL57bWMYfwLtG7P4CuBHj4YGOoGsnGmVeVeaA@mail.gmail.com>
To: Pali Rohár <pali-ietf-nfsv4@ietf.pali.im>
CC: NFSv4 <nfsv4@ietf.org>
Subject: [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/H3nRCblaAdTXbc59Lc5NgG_8Df8>

On Fri, Aug 9, 2024 at 2:00 AM Pali Rohár <pali-ietf-nfsv4@ietf.pali.im> wrote:

> Hello, if I understand correctly then this functionality could be
> already possible via NFS v4.1 by using persistent reply cache of the
> session.
>
> RFC 8881 section 2.10.6.5. says:
>
> "A persistent reply cache places certain demands on the server. The
> execution of the sequence of operations (starting with SEQUENCE) and
> placement of its results in the persistent cache MUST be atomic."
>
> It should be enough for a NFS v4.1 client to create a new session with
> flag CREATE_SESSION4_FLAG_PERSIST and place those "atomic" operations
> into that session.
>
Hmm. I think that "atomic" w.r.t. persistent sessions means that the
changes made to session state and the file system are committed to
storage at one time, using some sort of log that allows partially completed
compounds to be "rolled back" after a reboot, such that the session either
represents the state before the compound began or the state after it has
been completed.
As an implementer, I will note that many (maybe all extant) servers do not
support persistent sessions. (I know that the FreeBSD server does not.)

However, I do not think the above precludes another client from performing
operations on the same file concurrently (interspersed) with the operations
in the compound.
An example case that MUTEX_BEGIN/MUTEX_END is meant to address might be
an attempt to implement append writing (Client A and B both have the CFH
set to the same file):
(in temporal ordering)
Client A                             Client B
- without MUTEX_BEGIN/MUTEX_END
VERIFY (size == N) returns NFS_OK
                                     VERIFY (size == N) returns NFS_OK
WRITE (at offset N) returns NFS_OK
                                     WRITE (at offset N) returns NFS_OK
--> Client B overwrites Client A's write
- with MUTEX_BEGIN/MUTEX_END
MUTEX_BEGIN returns NFS_OK
                                     MUTEX_BEGIN - blocks (or replies
                                       NFS4ERR_LOCKED)
VERIFY (size == N) returns NFS_OK
WRITE (at offset N) returns NFS_OK
MUTEX_END returns NFS_OK
                                     MUTEX_BEGIN - returns NFS_OK
                                     VERIFY (size == N) returns
                                       NFS4ERR_NOT_SAME
                                     --> this causes an implicit MUTEX_END
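
The two interleavings above can be replayed in a small toy model. This is
only a sketch in Python, not NFS code: ToyFile, racy_appends and
mutex_append are hypothetical names invented for illustration, a
threading.Lock stands in for the proposed per-CFH mutex, and the returned
strings are just the status labels used in the example.

```python
import threading


class ToyFile:
    """In-memory stand-in for one server-side file (hypothetical, not NFS code)."""

    def __init__(self):
        self.data = bytearray()
        self.mutex = threading.Lock()  # models the proposed per-CFH mutex

    def verify_size(self, expected):
        # Models VERIFY (size == expected).
        return len(self.data) == expected

    def write(self, offset, payload):
        # Models WRITE at an explicit offset, zero-filling any gap.
        end = offset + len(payload)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload


def racy_appends(f, payload_a, payload_b):
    """Replay the first interleaving: both VERIFYs run before either WRITE."""
    n = len(f.data)
    a_ok = f.verify_size(n)
    b_ok = f.verify_size(n)
    if a_ok:
        f.write(n, payload_a)
    if b_ok:
        f.write(n, payload_b)  # lands on top of A's data


def mutex_append(f, expected_size, payload):
    """Model MUTEX_BEGIN, VERIFY, WRITE, MUTEX_END as one critical section."""
    with f.mutex:  # leaving the block models MUTEX_END, explicit or implicit
        if not f.verify_size(expected_size):
            return "NFS4ERR_NOT_SAME"
        f.write(expected_size, payload)
        return "NFS_OK"
```

In this model, a client that gets the VERIFY failure would presumably
re-read the size and retry the append at the new offset.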

Now, it is true that, for writing, byte range locking could be used,
but that adds overhead and requires that all writers do the locking
(since most/all extant servers do advisory byte range locking and the
client has no way of knowing whether a server is doing advisory or
mandatory byte range locking).
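
To illustrate why advisory locking only helps when every writer takes the
lock, here is a toy Python model. It is a sketch under stated assumptions:
AdvisoryRangeLocks and write_at are hypothetical names, and nothing here
reflects how any real server tracks byte range locks.

```python
class AdvisoryRangeLocks:
    """Toy byte range lock table; only callers that ask are ever refused."""

    def __init__(self):
        self.held = []  # (owner, start, end) tuples, end exclusive

    def try_lock(self, owner, start, length):
        end = start + length
        for other, s, e in self.held:
            if other != owner and start < e and s < end:
                return False  # overlapping range held by another owner
        self.held.append((owner, start, end))
        return True

    def unlock(self, owner):
        # Drop every range held by this owner.
        self.held = [h for h in self.held if h[0] != owner]


def write_at(buf, offset, payload):
    """Plain write that never consults the lock table (advisory semantics)."""
    end = offset + len(payload)
    if len(buf) < end:
        buf.extend(b"\0" * (end - len(buf)))
    buf[offset:end] = payload
```

If client A holds a lock on bytes 0-3 and client B calls write_at() on the
same range without calling try_lock() first, the write still goes through
in this model, which is the case the paragraph above is concerned about.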

rick


> On Thursday 08 August 2024 13:36:25 Rick Macklem wrote:
> > Hi,
> >
> > Over the years, I've run into cases where it would be really
> > nice to be able to perform multiple NFSv4 operations on a
> > file without other operations done by other clients
> > "gumming up the works" by changing the file's data/metadata
> > between the operations in the compound.
> >
> > So, what do others think about an extension to NFSv4.2 that
> > adds 2 new operations:
> >   MUTEX_BEGIN(CFH)
> >   MUTEX_END(CFH)
> > Both would use the CFH as argument, and no other client would
> > be allowed to perform operations on the CFH between the MUTEX_BEGIN
> > and MUTEX_END.
> >
> > I think there would need to be a couple of properties for these:
> > - There would need to be an "implicit" MUTEX_END when any operation
> >   between MUTEX_BEGIN and MUTEX_END returns a status other than NFS_OK.
> > - I think you would want a restriction of only one mutex for one CFH
> >   at a time in a compound. Without that, there could easily be deadlocks
> >   caused by other compounds acquiring mutexes on the same CFHs in a
> >   different order.
> > - Only one compound can hold a mutex on a given CFH at any time.
> > - MUTEX_BEGIN/MUTEX_END can only be used in compounds where SEQUENCE
> >   is the first operation.
> > - All mutexes are discarded by a server when it crashes/recovers.
> >   (Any time a client receives a NFS4ERR_STALE_CLIENTID.)
> >   That way RPC retries after a server reboot should work ok, I think?
> >
> > I am not sure what the semantics for reading data/metadata should be,
> > but I was thinking that would be allowed to be done by compounds for
> > other clients for the CFH. If a client wanted to serialize against
> > other compounds for the CFH, it could do a MUTEX_BEGIN/MUTEX_END.
> >
> > I see this as useful in a variety of ways:
> > - The example in the previous email of:
> >   MUTEX_BEGIN
> >   NVERIFY acl_truform ACL_MODEL_NFS4
> >   SETATTR posix_access_acl
> >   MUTEX_END
> > - Append writing:
> >   MUTEX_BEGIN
> >   VERIFY size "offset in WRITE that follows"
> >   WRITE "offset" etc
> >   MUTEX_END
> > - A bunch of cases where NFSv4 lacks the postop_attributes
> >   that were in NFSv3.
> >   MUTEX_BEGIN
> >   WRITE
> >   GETATTR size, change,..
> >   MUTEX_END
> >
> > So, what do others think?
> > (This was obviously not possible without sessions.)
> >
> > rick
>
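
A few of the properties listed in the quoted proposal (implicit MUTEX_END
on the first failing operation, one mutex per compound, one holder per CFH)
can be sketched as a toy compound evaluator. This is a hedged illustration
in Python only: run_compound is a hypothetical name, the (name, fn) operation
format is invented here, and the use of NFS4ERR_INVAL for misuse is my
assumption, not something the proposal specifies.

```python
import threading


def run_compound(ops, cfh_mutex):
    """Run ops in order and stop at the first status other than NFS_OK.

    ops is a list of (name, fn) pairs; fn returns a status string and is
    None for the MUTEX_BEGIN and MUTEX_END entries.
    """
    results = []
    holding = False
    try:
        for name, fn in ops:
            if name == "MUTEX_BEGIN":
                if holding:
                    status = "NFS4ERR_INVAL"  # assumed: one mutex per compound
                elif cfh_mutex.acquire(blocking=False):
                    holding = True
                    status = "NFS_OK"
                else:
                    status = "NFS4ERR_LOCKED"  # another compound holds it
            elif name == "MUTEX_END":
                if holding:
                    cfh_mutex.release()
                    holding = False
                    status = "NFS_OK"
                else:
                    status = "NFS4ERR_INVAL"  # assumed: END without BEGIN
            else:
                status = fn()
            results.append((name, status))
            if status != "NFS_OK":
                break
    finally:
        if holding:
            cfh_mutex.release()  # implicit MUTEX_END
    return results
```

A blocking variant would call acquire() without blocking=False; the
non-blocking form is used here only so the NFS4ERR_LOCKED case is visible.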