[nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations

Rick Macklem <rick.macklem@gmail.com> Tue, 20 August 2024 22:53 UTC

References: <CAM5tNy7g+YCiiZQD7G6Ryv_Mo8N5BeRiqMP=224zPpEXa+Yi+A@mail.gmail.com> <20240820210352.hllkh7ht4cch3624@pali>
In-Reply-To: <20240820210352.hllkh7ht4cch3624@pali>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tue, 20 Aug 2024 15:53:18 -0700
Message-ID: <CAM5tNy5D0YbU6s3MNpLjru4fik56sqFmLswRtd3Y0DO6-HgjAw@mail.gmail.com>
To: Pali Rohár <pali-ietf-nfsv4@ietf.pali.im>
CC: NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/ivz62d5xU1iO23AafNZTLLwM9HU>

On Tue, Aug 20, 2024 at 2:03 PM Pali Rohár <pali-ietf-nfsv4@ietf.pali.im>
wrote:

> On Thursday 08 August 2024 13:36:25 Rick Macklem wrote:
> > Hi,
> >
> > Over the years, I've run into cases where it would be really
> > nice to be able to perform multiple NFSv4 operations on a
> > file without other operations done by other clients
> > "gumming up the works" by changing the file's data/metadata
> > between the operations in the compound.
> >
> > So, what do others think about an extension to NFSv4.2 that
> > adds 2 new operations:
> >   MUTEX_BEGIN(CFH)
> >   MUTEX_END(CFH)
> > Both would use the CFH as argument, and no other client would
> > be allowed to perform operations on the CFH between the MUTEX_BEGIN
> > and MUTEX_END.
> >
> > I think there would need to be a couple of properties for these:
> > - There would need to be an "implicit" MUTEX_END when any operation
> >   between MUTEX_BEGIN and MUTEX_END returns a status other than NFS_OK.
> > - I think you would want a restriction of only one mutex for one CFH
> >   at a time in a compound. Without that, there could easily be deadlocks
> >   caused by other compounds acquiring mutexes on the same CFHs in a
> >   different order.
> > - Only one compound can hold a mutex on a given CFH at any time.
> > - MUTEX_BEGIN/MUTEX_END can only be used in compounds where SEQUENCE
> >   is the first operation.
> > - All mutexes are discarded by a server when it crashes/recovers.
> >   (Any time a client receives a NFS4ERR_STALE_CLIENTID.)
> >   That way RPC retries after a server reboot should work ok, I think?
> >
> > I am not sure what the semantics for reading data/metadata should be,
> > but I was thinking that would be allowed to be done by compounds for
> > other clients for the CFH. If a client wanted to serialize against
> > other compounds for the CFH, it could do a MUTEX_BEGIN/MUTEX_END.
> >
> > I see this as useful in a variety of ways:
> > - The example in the previous email of:
> >   MUTEX_BEGIN
> >   NVERIFY acl_trueform ACL_MODEL_NFS4
> >   SETATTR posix_access_acl
> >   MUTEX_END
> > - Append writing:
> >   MUTEX_BEGIN
> >   VERIFY size "offset in WRITE that follows"
> >   WRITE "offset" etc
> >   MUTEX_END
> > - A bunch of cases where NFSv4 lacks the postop_attributes
> >   that were in NFSv3.
> >   MUTEX_BEGIN
> >   WRITE
> >   GETATTR size, change,..
> >   MUTEX_END
> >
> > So, what do others think?
> > (This was obviously not possible without sessions.)
> >
> > rick
>
> Hello,
>
> Thinking about this more, these mutexes look like a performance
> killer for any NFS4 server that allows thousands of parallel
> operations. They could also open vectors for DDoS attacks: if a
> server is not implemented securely, malicious clients could take
> mutexes and block all other operations.


> Would it not be better to address the existing problems with separate
> mechanisms? For example, the append issue could be solved by a new
> append operation, which an NFS4 server could implement more efficiently
> (e.g. if its storage or API already provides an append operation, as
> all POSIX systems do via open/O_APPEND).
>
An Append_Write is not something I know how to do correctly.
After a server reboot, there can be a retry, which can result in
duplicate writes. I'm assuming you are talking about a Write that
is defined as "done at EOF" instead of at "byte offset N".
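
A minimal sketch of the retry problem, assuming the server's reply cache
does not survive the reboot so the retransmitted request is executed a
second time. FakeFile, write_at, append and replay are hypothetical names
for an in-memory model, not code from any NFS server.

```python
class FakeFile:
    """Hypothetical in-memory stand-in for a file on the server."""

    def __init__(self):
        self.data = bytearray()

    def write_at(self, offset, buf):
        # Write at an explicit byte offset, the way NFSv4 WRITE is defined.
        end = offset + len(buf)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = buf

    def append(self, buf):
        # Hypothetical "done at EOF" write: lands wherever EOF currently is.
        self.data.extend(buf)


def replay(op, times=2):
    # Model a client that retransmits a request whose reply was lost.
    for _ in range(times):
        op()


offset_file = FakeFile()
replay(lambda: offset_file.write_at(0, b"record\n"))

append_file = FakeFile()
replay(lambda: append_file.append(b"record\n"))

print(bytes(offset_file.data))  # the record is present once
print(bytes(append_file.data))  # the record is present twice
```

The offset-based write is idempotent under replay; the EOF-based one is
not, unless the server can recognize the retry.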


> Also, this mutex mechanism does not provide atomicity of operations in
> a multiprotocol environment, where other protocols without this kind of
> mutex coexist on the same storage (e.g. interop with SMB or NFS3). If
> NFS4 and Samba are implemented in different userspace processes, the
> NFS4 mutex will hold off other NFS4 compound operations but will not
> hold off Samba operations. So atomicity would not be guaranteed at all.
>
I suppose that is similar to NFSv3 RPCs. It is doable, but is
left up to the server implementer, I think?


> NFS4 servers do not have to be implemented in the kernel, where they can
> lock the inode to prevent any other operation by other parts of the
> system (which is what implements the mutex on a filehandle). And from
> POSIX userspace such inode locking is practically impossible to
> implement.
>
If the NFS4 server is in userspace, it can implement some kind of
lock on the file handles (or however it handles file handles).
The kernel vnode lock is just a convenient mechanism for the kernel
based FreeBSD server.
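
A rough sketch of what "some kind of lock on the file handles" could look
like in a userspace server, assuming file handles are opaque byte strings
and all request threads live in one process. FileHandleLocks, begin and
end are hypothetical names, and the timeout is an assumption added so a
stuck holder cannot block a caller forever.

```python
import threading


class FileHandleLocks:
    """Hypothetical process-local lock table keyed by file handle bytes."""

    def __init__(self):
        self._table_guard = threading.Lock()  # protects the dict itself
        self._locks = {}

    def _lock_for(self, fh):
        with self._table_guard:
            if fh not in self._locks:
                self._locks[fh] = threading.Lock()
            return self._locks[fh]

    def begin(self, fh, timeout=1.0):
        # Stand-in for MUTEX_BEGIN: True if acquired, False on timeout.
        return self._lock_for(fh).acquire(timeout=timeout)

    def end(self, fh):
        # Stand-in for MUTEX_END.
        self._lock_for(fh).release()


locks = FileHandleLocks()
first = locks.begin(b"fh-1")                  # acquired
second = locks.begin(b"fh-1", timeout=0.05)   # same handle: times out
other = locks.begin(b"fh-2")                  # different handle: acquired
locks.end(b"fh-1")
again = locks.begin(b"fh-1")                  # free again after end()
print(first, second, other, again)
```

Entries are never removed from the table here; a real server would need
to tie their lifetime to however it caches file handles.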


> I just cannot imagine how to implement an NFS4 server with support for
> this mutex on top of system fs storage using only the POSIX API.
>
pthreads has various locking primitives. (It is true that the locking
is only visible to the NFSv4 server and not to the rest of the server
system. That said, someone familiar with a userspace server like
Ganesha would be better placed to answer this.)


> Pali
>