[nfsv4] Re: My feedback for NFSv4.1 specification
Rick Macklem <rick.macklem@gmail.com> Mon, 05 August 2024 23:29 UTC
Return-Path: <rick.macklem@gmail.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 2A941C1D530C for <nfsv4@ietfa.amsl.com>; Mon, 5 Aug 2024 16:29:44 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.107
X-Spam-Level:
X-Spam-Status: No, score=-2.107 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1_vA-oNGuB9m for <nfsv4@ietfa.amsl.com>; Mon, 5 Aug 2024 16:29:43 -0700 (PDT)
Received: from mail-pf1-x436.google.com (mail-pf1-x436.google.com [IPv6:2607:f8b0:4864:20::436]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature ECDSA (P-256) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 13CB3C14F71D for <nfsv4@ietf.org>; Mon, 5 Aug 2024 16:29:43 -0700 (PDT)
Received: by mail-pf1-x436.google.com with SMTP id d2e1a72fcca58-70d2b921cdfso27676b3a.0 for <nfsv4@ietf.org>; Mon, 05 Aug 2024 16:29:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1722900582; x=1723505382; darn=ietf.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=DY5xgM8fUVI0lOudUcdnk2TBrDztPYyY05QXiSzIkaM=; b=Ot43PgN9x5ZxNbOTa2K/lNCsHvvFISjDl92CzlSwZVyGFR+kwN1+/JsCtPE6EAxAa5 yiNolIvlC/OtIXSGyb0oKnq3RbiocwD1fs5XSno37eW+pzqxWQlXoiF/yfy5K6nerCJs rOKO+rS6e1xdVjx6cZ196N7+y8aXuAWriwGJFe8JjBsG1UXvh9Kezp2jViKm+GzP19rP Ho87JQjZ1EUDiQ/dF+fffXiaqq8i79tkJ7pLe1/DBlrkOOcJN9mYpueY/S49ZZZtcZWl tsnCMFCuSR+wfWt8JnD7Ipjm+q8BSTXI31sfjYY0/rZoBwGWLyxZtQI9gIVzYcIkQg0h 9EGA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722900582; x=1723505382; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=DY5xgM8fUVI0lOudUcdnk2TBrDztPYyY05QXiSzIkaM=; b=kkX3sIJKtQ97Nll93WbLxaeqoEsIyls5ilPtR8HxGRW+AGuwS7pHjbYK3xehbL4Sfw atlYqF4wh2DQ+TFyPoxUsluDAx9KbWJxI2BHPjQMjuYLB3AEFufoNsHp/8DFZukbqKYL I6peN2dPZB0S6ifZdO07Nw8YTVtjYoSAGYjeexQ3u4g1Rke9UWm1fkt7Ywpw0rbhPc4R rI65ZmDEbomrt8y+dwGjs2SNBdn8dfhc7sD1is1DfL3TYlxk1B+5JqjIVEoBkkp0P6EP +oyMATkLBpu75Pssh8z825uGad0gB9rCphAzQK7e3aDwBcH9oAwptCDOfdPnnXELNNKK +6XA==
X-Gm-Message-State: AOJu0YyKC3UuL9OckS6+5VjM6dhf6wrdFXUQB1KnCh+kpIC4XKqbC2Z9 o5Kpe9n0es4239CkV4L3Z0aWHsh6UZhhAU195uCjvKmnNmZ5X2HpEz1sSfzrLfwosXWPDGAcn13 jrV3zWHJg3DfOx0ALFQhkI8oEaEuz
X-Google-Smtp-Source: AGHT+IGxvoqTqb9xqug3+PWTrT0NjUtpl75Mp1D759BG4esBl6SJzWe3jpoTMCakMls9vhaL5zh4t56f67oy9GT4x8w=
X-Received: by 2002:a05:6a00:9187:b0:70e:98e3:1c17 with SMTP id d2e1a72fcca58-7106d04606emr14582229b3a.27.1722900582182; Mon, 05 Aug 2024 16:29:42 -0700 (PDT)
MIME-Version: 1.0
References: <20240805165039.xb4jsaf7njpbvmlg@pali> <CAM5tNy4efN-3FznjsYDk3jH=u_5dfXWt2s1VvSeYn0J_EzuoKw@mail.gmail.com> <20240805220715.ab7n6nvsncrzpwtn@pali>
In-Reply-To: <20240805220715.ab7n6nvsncrzpwtn@pali>
From: Rick Macklem <rick.macklem@gmail.com>
Date: Mon, 05 Aug 2024 16:29:32 -0700
Message-ID: <CAM5tNy7eWgVrw-=tNq1kYdcTPTBiYJePhcBxwDaAfEC1+mXNOg@mail.gmail.com>
To: Pali Rohár <pali-ietf-nfsv4@ietf.pali.im>
Content-Type: multipart/alternative; boundary="0000000000002f351d061ef80da9"
Message-ID-Hash: US44WCLDY22YPO2VGUFVGVVHWB3BRQBB
X-Message-ID-Hash: US44WCLDY22YPO2VGUFVGVVHWB3BRQBB
X-MailFrom: rick.macklem@gmail.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-nfsv4.ietf.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: nfsv4@ietf.org
X-Mailman-Version: 3.3.9rc4
Precedence: list
Subject: [nfsv4] Re: My feedback for NFSv4.1 specification
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/Ai3zSOs2g5AYdbwBqXnwGeCba9k>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Owner: <mailto:nfsv4-owner@ietf.org>
List-Post: <mailto:nfsv4@ietf.org>
List-Subscribe: <mailto:nfsv4-join@ietf.org>
List-Unsubscribe: <mailto:nfsv4-leave@ietf.org>
On Mon, Aug 5, 2024 at 3:07 PM Pali Rohár <pali-ietf-nfsv4@ietf.pali.im> wrote: > Hello, thank you very much for your reply. > > I will write my comments below and remove other parts to not have such > long email being repeated. > > On Monday 05 August 2024 14:07:11 Rick Macklem wrote: > > So many questions. > > I'll try and answer a few of them and hopefully others > > will answer some as well. > > > > On Mon, Aug 5, 2024 at 9:56 AM Pali Rohár <pali-ietf-nfsv4@ietf.pali.im> > > wrote: > > > * What should OP_ACCESS's ACCESS4_DELETE do when called for directory > > > object? From the document it is not clear if the access check should > > > check if the caller has permission for deleting the directory itself > > > or if it has permission for deleting child entry in that directory. > > > Those are two different things. > > > > > As you've noted, I think it refers to deletion of entries in the > > directory (or children, if you prefer). For file systems that support > > multiple hard links (like POSIX ones), deletion refers to the name in > > the directory and not the underlying file object. > > RFC1813 defined it as: > > Delete an existing directory entry. > > and I don't think it changed for NFSv4. > > RFC 8881 says "ACCESS4_DELETE - Delete an existing directory entry." > But it is not clear at least to me, if it means to remove the directory > itself or some child in the directory. "directory entry" could really > mean the directory itself as it is de-facto entry of some (upper) > directory. And section 18.1.4. IMPLEMENTATION of RFC 8881 is not clear > neither about this situation. > > > > > > From testing more NFS servers, it looks like that other servers > > > are doing check if the caller has permission to delete some child > > > entry (not the directory itself). It somehow matches ACL permission > > > ACE_DELETE_CHILD (and not the ACE_DELETE). > > > > > There has never been a clear understanding of how ACE_DELETE and > > ACE_DELETE_CHILD should be implemented. Here's what OpenZFS chose: > > * The following chart is the recommended NFSv4 enforcement for > > * ability to delete an object. > > * > > * ------------------------------------------------------- > > * | Parent Dir | Target Object Permissions | > > * | permissions | | > > * ------------------------------------------------------- > > * | | ACL Allows | ACL Denies| Delete | > > * | | Delete | Delete | unspecified| > > * ------------------------------------------------------- > > * | ACL Allows | Permit | Permit | Permit | > > * | DELETE_CHILD | | > > * ------------------------------------------------------- > > * | ACL Denies | Permit | Deny | Deny | > > * | DELETE_CHILD | | | | > > * ------------------------------------------------------- > > * | ACL specifies | | | | > > * | only allow | Permit | Permit | Permit | > > * | write and | | | | > > * | execute | | | | > > * ------------------------------------------------------- > > * | ACL denies | | | | > > * | write and | Permit | Deny | Deny | > > * | execute | | | | > > * ------------------------------------------------------- > > * ^ > > * | > > * No search privilege, can't even look up file? > > On Windows is object detection granted only by DELETE and DELETE_CHILD > permissions. Not by EXECUTE or WRITE(_DATA). On Windows you need at > least one of those permissions. So, your first two lines matches Windows > ACL. Last two lines do not apply on Windows. And instead of there should > be this line: > > ACL Allows Delete | ACL Denies Delete | Delete > unspecified > DELETE_CHILD unspecified | Permit | Deny | Deny > > In multiprotocol environment is common to use this Windows behavior. It > reduces issues with interoperability between NFS4 and SMB. > > To clarify, the above is just what OpenZFS chose. I did not mean to imply that was the "correct" answer. > > > > If there was already a consensus that OP_ACCESS's ACCESS4_DELETE > > > on directory object should check for permission to delete child > entry, > > > what about properly documenting this fact in some next revision of > > > NFS 4.x document? > > > > > > * Which error code should OP_OPEN return if the server reached maximal > > > number of open states, cannot create a new open state and therefore > > > cannot open file? From the document I have not figured out which is > > > the correct error code. Same for OP_LOCK on maximal number of lock > > > states. > > > > > Recently discussed on linux-nfs@vger.kernel.org. I don't believe there > > is any answer except try to allows lots of them. (The FreeBSD server has > > a tuning knob set to 500,000 for Opens + Locks + Delegations, by default, > > with delegations limited to 20% of the setting.) > > (RFC7530 defines NFS4ERR_RESOURCE for NFSv4.0. It was dropped for > NFSv4.1, > > but the FreeBSD server uses it anyhow. Nobody has complained yet.) > > As out-of-memory on server can really happen, it would be a wise idea to > recycle NFS4ERR_RESOURCE and allow it in some NFS v4.1 erratas? > > And if some NFS v4.1+ server implementation is using it then it sounds > like a good idea. > > > > > > Note that expecting that server can open or lock infinitely many of > > > files is not a good idea as a server has always limited number of > > > resources and every server has some upper limit for number of open > > > states. > > > > > > * Is it required for OP_PUTFH to fail in case the file represented by > > > the supplied filehandle does not exist? > > > > If you are talking about persistent filehandles, I believe that > > NFS4ERR_STALE is the correct reply. > > Yes, I'm talking about persistent filehandles. > > My question was rather different. It is allowed from the server to > return NFS4_OK in this case (if format of filehandle is acceptable for > server)? > > Checking for the inode on server can be a performance drop. And in most > cases clients issue OP_PUTFH together with some other operation on the > file, and that operation is also going to check inode existence. > > Even in one COMPOUND with (PUTFH, GETATTR) there can be a race condition > that somebody removes the file between PUTFH and GETATTR, so clients > have to handle reply from that COMPOUND where PUTFH returns OK and > GETATTR returns STALE. > Yes, you are correct. Not only is this allowed, but technically it is the correct answer and I was wrong. NFS4ERR_STALE is not listed as an error return for PUTFH and the description for NFS4ERR_STALE defines it as stale when needed as a CURRENT_FH or SAVED_FH. --> So, technically, it should be the first operation after PUTFH that requires a CURRENT_FH that fails with NFS4ERR_STALE. (I'll admit I think the FreeBSD server returns NFS4ERR_STALE for PUTFH instead of marking it stale and returning the error on the first operation that needs CURRENT_FH. I may fix it someday, but I have never seen an issue, sofar, with returning it for PUTFH.) rick > > Or it is enough for OP_PUTFH > > > to just validate the content structure format of filehandle that it > > > can process (and does not matter if the file represented by the > > > filehandle exists or not; for example if the filehandle encodes inode > > > number, then just check that the length of filehandle is correct, but > > > do not check if the inode exists at all)? > > > > > > * What should attribute numlinks return for a directory object? It is > > > not documented at all. > > > > > > More NFS server return number of directory entries. Seems that this > > > follows the POSIX behavior of POSIX stat.st_nlink structure and would > > > be nice to mention this in some future revision of NFS4.x doc to help > > > people implementing new servers. > > > > > Not as far as I am aware. I believe most POSIX systems reply with > > the number of links (entries in directories) that refer to the directory. > > Yes, this is what I'm expecting. But it is not explicitly mentioned in > RFC 8881 doc. > > > The trick is "." (for the directory) and ".." for all subdiretcories is > > in the count. Should the "." and ".." entries be subtracted out, since > > NFSv4 does not put "." and ".." in the RREADDIR reply? > > "." and ".." are explicitly omitted in READDIR reply. > > > Who knows? > > Does any NFSv4.1/4.2 client care? > > Client itself not. But NFS4 clients on POSIX systems are propagating > this information to the stat() syscall, and this information is then > available to any application. And whatever POSIX applications are doing > with stat() st_nlink entry is questionable. > > So I would think that there are application which would care. > > > I do not think so. Clients do care about numlinks for regular files, > > since the client needs to know if REMOVE of a name will delete the > > regular file object. > > However, I don't think clients care for directories. > > The FreeBSD server does not subtract out the "." and ".." entries and > > no one has ever complained. > > > > > > > * Is NFS v4.1 client required to call OP_CLOSE for pair <filehandle, > > > open state> in case it received from the server NFS4ERR_STALE error > > > for some other operation called on that filehandle and/or open state? > > > > > I don't see how it could, since CLOSE requires the CURRENT_FH be set > > and that cannot be done if the file is stale. > > NFS v4.1 client can use OP_FREE_STATEID (instead of OP_CLOSE) to close > such open state. OP_FREE_STATEID does not take CURRENT_FH. It takes only > stateid. > > Looks like it was designed for SEQ4_STATUS_ADMIN_STATE_REVOKED signal. > Together with OP_TEST_STATEID. > > > It does imply that servers might have to do housekeeping to get rid of > > close structures when the file object (not just a name for it) is > deleted. > > > > > > > OP_REMOVE is allowed to immediately remove the file (opposite of just > > > unlinking it) even if it is opened by other clients without DENY > > > shared reservations. Are those other clients required to issue > > > OP_CLOSE once they figure out that filehandle is STALE? Or should NFS > > > v4.1 server for received OP_REMOVE also invalidate and destroy all > > > open states associated with filehandle of the removed file? > > > > > As above, unless your server implements OPEN4_RESULT_PRESERVED_UNLINKED. > > (This statement also applies to the comments you made, below.) > > > > > > > It is important to know to prevent resource leaks. If NFS v4.1 client > > > is required to call OP_CLOSE (or OP_FREE_STATEID) for already removed > > > file then server has to remember open states for all removed files > and > > > wait for the client until it calls OP_CLOSE. So server will return > > > success status for OP_CLOSE. > > > > > > For me this is not really clear how it should behave. > > > > > > Also it looks like that there is no way how NFS v4.1 server can > notify > > > client that some its opened file was removed. And client will figure > > > it out just after next file operation. > > > > > > * It is valid for NFS v4.1 client to send COMPOUND request with zero > > > operations? NFS v4.1 XDR definition allows it, there is no minimal or > > > maximal limit of operations defined. But specification in section > > > 16.2.3 says that "The COMPOUND procedure is used to combine one or > > > more NFSv4 operations into a single RPC request." which does not > allow > > > it. And contradicts what NFS v4.1 XDR definition has. > > > > > For XDR <10> specifies a maximum of 10, but there is no way to > > express a minimum, afaik. > > I see. So minimal limit has to be written by extra sentence as it cannot > be expressed in XDR. > > > As for whether or not a 0 length compound is allowed? > > You've got me. > > I can see an argument for allowing it, so that client can test to > > see if a given minor version is supported by the server. > > Exactly, this is for what I wanted to use empty COMPOUND. > > But for this purpose I used COMPOUND with OP_ILLEGAL. > > Pali >
- [nfsv4] My feedback for NFSv4.1 specification Pali Rohár
- [nfsv4] Re: My feedback for NFSv4.1 specification Rick Macklem
- [nfsv4] Re: My feedback for NFSv4.1 specification Pali Rohár
- [nfsv4] Re: My feedback for NFSv4.1 specification Rick Macklem