[nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations
Rick Macklem <rick.macklem@gmail.com> Tue, 20 August 2024 23:43 UTC
From: Rick Macklem <rick.macklem@gmail.com>
Date: Tue, 20 Aug 2024 16:43:33 -0700
Message-ID: <CAM5tNy7frS=ZKMchWAFpQa1PNWhj975JWbS_d3sJ4r1=aPiQeg@mail.gmail.com>
To: Trond Myklebust <trondmy@hammerspace.com>
CC: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: [nfsv4] Re: RFC: new MUTEX_BEGIN/MUTEX_END operations
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/mN22pCaK6wMtLz70g6rGucsbky4>

On Tue, Aug 20, 2024 at 3:50 PM Trond Myklebust <trondmy@hammerspace.com> wrote:

> On Thu, 2024-08-08 at 13:36 -0700, Rick Macklem wrote:
> > Hi,
> >
> > Over the years, I've run into cases where it would be really
> > nice to be able to perform multiple NFSv4 operations on a
> > file without other operations done by other clients
> > "gumming up the works" by changing the file's data/metadata
> > between the operations in the compound.
> >
> > So, what do others think about an extension to NFSv4.2 that
> > adds 2 new operations:
> > MUTEX_BEGIN(CFH)
> > MUTEX_END(CFH)
> > Both would use the CFH as argument, and no other client would
> > be allowed to perform operations on the CFH between the MUTEX_BEGIN
> > and MUTEX_END.
> >
> > I think there would need to be a couple of properties for these:
> > - There would need to be an "implicit" MUTEX_END when any operation
> >   between MUTEX_BEGIN and MUTEX_END returns a status other than NFS_OK.
> > - I think you would want a restriction of only one mutex for one CFH
> >   at a time in a compound. Without that, there could easily be deadlocks
> >   caused by other compounds acquiring mutexes on the same CFHs in a
> >   different order.
> > - Only one compound can hold a mutex on a given CFH at any time.
> > - MUTEX_BEGIN/MUTEX_END can only be used in compounds where SEQUENCE
> >   is the first operation.
> > - All mutexes are discarded by a server when it crashes/recovers.
> >   (Any time a client receives a NFS4ERR_STALE_CLIENTID.)
> >   That way RPC retries after a server reboot should work ok, I think?
>
> I think you also need to apply:
>
> - The mutex is discarded as soon as the COMPOUND completes.
>
> Otherwise, you're relying on the client to be a good citizen and clean up
> if one of the operations in the COMPOUND generated an error.

Yes. I did mention that in one of the emails.
It also needs to be discarded when the server reboots, so that retries of
RPCs won't be a problem after the reboot.
There is also the question of what to do w.r.t. very slow compounds.
Not sure if I have a good answer for that one yet.

> > I am not sure what the semantics for reading data/metadata should be,
> > but I was thinking that would be allowed to be done by compounds for
> > other clients for the CFH. If a client wanted to serialize against
> > other compounds for the CFH, it could do a MUTEX_BEGIN/MUTEX_END.
> >
> > I see this as useful in a variety of ways:
> > - The example in the previous email of:
> >     MUTEX_BEGIN
> >     NVERIFY acl_trueform ACL_MODEL_NFS4
> >     SETATTR posix_access_acl
> >     MUTEX_END
> > - Append writing:
> >     MUTEX_BEGIN
> >     VERIFY size "offset in WRITE that follows"
> >     WRITE "offset" etc
> >     MUTEX_END
> > - A bunch of cases where NFSv4 lacks the postop_attributes
> >   that were in NFSv3.
> >     MUTEX_BEGIN
> >     WRITE
> >     GETATTR size, change,..
> >     MUTEX_END
> >
> > So, what do others think?
> > (This was obviously not possible without sessions.)
>
> Questions:
>
> What happens in the case where another client holds a write delegation? Do
> you recall the delegation? I think you must, since the client is otherwise
> authoritative for some of these attributes that you want to keep stable.

Good point. I hadn't thought of this.
I think you are correct that if the server sees a GETATTR between
MUTEX_BEGIN/MUTEX_END for a file write delegated to another client,
requesting the attributes handled by the client when write delegated
(size/change/time_modify), it might need to CB_RECALL.
(Of course, for the above examples using WRITE, another client couldn't
be holding a write delegation.)
Or maybe using the results of a CB_GETATTR might be sufficient, since that
client is now authoritative for the attributes?

It is definitely a case that needs further thought.
What do others think?

> What happens if a pNFS layout is outstanding? Do you recall the layout?
> Again, I think you have to, since otherwise you cannot guarantee exclusive
> access or attribute stability.

Another case I haven't really thought about. I think you are correct.
If a RW layout has been issued to another client, I think the layout needs
to be recalled when a WRITE/GETATTR/SETATTR (and maybe others) is done
between MUTEX_BEGIN/MUTEX_END, and no new RW layout should be issued to
other clients until after the MUTEX_END.
But I definitely need to think about this some more.

Again, what do others think?

> Speaking of pNFS, how do you ensure that the above WRITE+GETATTR is
> possible for clients that are using pNFS? Do we need a
> 'LAYOUTGET_WITH_MUTEX' operation? Since layouts are recallable, it might be
> possible...

Another good point. RFC 7862 allows some additional operations to be done by
an NFSv4.2 DS (Sec. 3.3.1). Maybe this extension could add
MUTEX_BEGIN/MUTEX_END/GETATTR to that list?
That way a client that is pNFS aware and using an NFSv4.2 DS could do the
same things on the DS that it would otherwise do on the MDS.
If the DS is not NFSv4.2, it would just live without this extension.

Any other ideas?

rick

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com <trond.myklebust@primarydata.com>
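
To make the proposed rules easier to poke at, here is a minimal sketch of them as a single-threaded toy model in Python. It is not taken from any NFS server and nothing in it is part of the proposal: the ToyServer class, the VERIFY_SIZE pseudo-operation, and the choice of NFS4ERR_DELAY and NFS4ERR_INVAL as replies are all assumptions made for illustration (the thread does not say whether a contending compound should block or get an error). It models one mutex per compound, one holder per filehandle, the implicit MUTEX_END on any non-NFS_OK status, the discard at end of compound, and the discard on reboot.

```python
# Toy model of the proposed MUTEX_BEGIN/MUTEX_END rules.
# Every name and status choice below is hypothetical, for illustration only.
NFS_OK = "NFS_OK"
NFS4ERR_DELAY = "NFS4ERR_DELAY"        # assumed reply when another compound holds the mutex
NFS4ERR_NOT_SAME = "NFS4ERR_NOT_SAME"  # VERIFY mismatch
NFS4ERR_INVAL = "NFS4ERR_INVAL"        # assumed reply for a second mutex or unknown op


class ToyServer:
    def __init__(self):
        self.files = {}    # filehandle -> bytearray holding the file data
        self.mutexes = {}  # filehandle -> id of the compound holding the mutex

    def reboot(self):
        # All mutexes are discarded when the server crashes/recovers.
        self.mutexes.clear()

    def _free_for(self, cid, fh):
        # True when no other compound holds the mutex on fh.
        return self.mutexes.get(fh, cid) == cid

    def compound(self, cid, fh, ops):
        """Run ops against fh (the CFH); return a list of (op, status)."""
        results = []
        held = False  # at most one mutex per compound
        try:
            for op, *args in ops:
                if op == "MUTEX_BEGIN":
                    if held:
                        status = NFS4ERR_INVAL
                    elif not self._free_for(cid, fh):
                        status = NFS4ERR_DELAY
                    else:
                        self.mutexes[fh] = cid
                        held = True
                        status = NFS_OK
                elif op == "MUTEX_END":
                    if held:
                        del self.mutexes[fh]
                        held = False
                        status = NFS_OK
                    else:
                        status = NFS4ERR_INVAL
                elif op == "VERIFY_SIZE":
                    size = len(self.files.setdefault(fh, bytearray()))
                    status = NFS_OK if size == args[0] else NFS4ERR_NOT_SAME
                elif op == "WRITE":
                    offset, data = args
                    if not self._free_for(cid, fh):
                        status = NFS4ERR_DELAY
                    else:
                        buf = self.files.setdefault(fh, bytearray())
                        if offset > len(buf):
                            buf.extend(b"\0" * (offset - len(buf)))
                        buf[offset:offset + len(data)] = data
                        status = NFS_OK
                else:
                    status = NFS4ERR_INVAL
                results.append((op, status))
                if status != NFS_OK:
                    break  # compound processing stops at the first error
        finally:
            # Implicit MUTEX_END: on error, and whenever the compound completes.
            if held:
                self.mutexes.pop(fh, None)
        return results


if __name__ == "__main__":
    srv = ToyServer()
    srv.files["fh1"] = bytearray(b"abc")
    append = [("MUTEX_BEGIN",), ("VERIFY_SIZE", 3),
              ("WRITE", 3, b"def"), ("MUTEX_END",)]
    print(srv.compound(1, "fh1", append))  # all four operations succeed
    print(srv.compound(2, "fh1", append))  # stale size: stops at VERIFY_SIZE
```

The second compound in the usage lines reuses the now-stale size of 3, so the model stops it at VERIFY_SIZE and drops its mutex without the client sending MUTEX_END, which is the cleanup behaviour discussed above. Delegation and layout recall, and the slow-compound question, are deliberately left out of the model.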