Re: [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwcc-01.txt
David Noveck <davenoveck@gmail.com> Wed, 12 April 2023 12:20 UTC
Return-Path: <davenoveck@gmail.com>
X-Original-To: nfsv4@ietfa.amsl.com
Delivered-To: nfsv4@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 44FA4C15170B for <nfsv4@ietfa.amsl.com>; Wed, 12 Apr 2023 05:20:39 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.094
X-Spam-Level:
X-Spam-Status: No, score=-2.094 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 3mF-jQDVLfEY for <nfsv4@ietfa.amsl.com>; Wed, 12 Apr 2023 05:20:35 -0700 (PDT)
Received: from mail-qk1-x733.google.com (mail-qk1-x733.google.com [IPv6:2607:f8b0:4864:20::733]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 5D7D6C1516F8 for <nfsv4@ietf.org>; Wed, 12 Apr 2023 05:20:35 -0700 (PDT)
Received: by mail-qk1-x733.google.com with SMTP id 201so2192417qkl.2 for <nfsv4@ietf.org>; Wed, 12 Apr 2023 05:20:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1681302034; x=1683894034; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=1zRPqOV1YUTajmudoh/Wm8Z9uTOWpX2zjNg7t11s2kU=; b=Rg4QiJdS9hyc5TPSC2chDJ7A/Og1hwYL75JntyMCqaBD2ujaCvFatePsLZINSuTUpG hwmllfulPocAWWB6/SopMm2+PfqY0mRuOzxoMErwjUuEVM5i5dITNmQ59lTthztnCN7b CPaU8mlIvuObBxZloCDx8y7DpjohMrXpzll/820c4EZXmlsL9dZQkEnU/wpMm2rS5cwY yuyvcc6HMi1rmaIRGqng/1gbD6X+0MTbulSDmZ4VYCtIqHW5pRC4TXb7+BymKM7VA1Iw QknEw0FFtxvJ1WhnucTkhA8mUypforO4ycHmjDfGQMrrXTbOJqHcM83HttSP16ClT4tj KQUQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1681302034; x=1683894034; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=1zRPqOV1YUTajmudoh/Wm8Z9uTOWpX2zjNg7t11s2kU=; b=an4tcWlDZkEEelrgh2QwTzr1YNM5Z4icYJUsrIjOC582AZQpqfWUW4kOaCfrJzbdkS e+T9Ft2+r3fiS65RvwvmH9jEKWZctEVezgMxGxiMY5t0BhrSSfEOru24p1V6NbMU5ki8 oSBBqK32KJfAXuc4T9NMfmzjGbpfP5V/0XYhMKPBHcG2Iy9z45yfQnnDEQ1FpKvJZOJ4 t1tA+5066Hhn5spcWxU/dWrkTBxDXVcxBTJoswRSD+yXK57WiMmQw0D0AFIEXi0mI7KN wpSVcH+E78+XnVSGIvdmH7xGMfOHZ/DTax4NhoLG3y6s4eesbSW4CNVMjXKm7akEblLJ nvFQ==
X-Gm-Message-State: AAQBX9eoDsXb4+Ye8VobzHz7NweToMloSjh7KQdofmZgTBnqjF4iP+BU A89rGAeCxFvgIOdWqkmINfktMg/N4sWWSO94QfcIClF5
X-Google-Smtp-Source: AKy350Zw/dZtsSJF7SwL62QRn46Gg5Odev8aXKMtkgY3SzrNXTomv786TqNJqL7huMTc6cZAANeSrWXjwNEtgKUJTvQ=
X-Received: by 2002:a05:620a:190c:b0:746:96c2:e458 with SMTP id bj12-20020a05620a190c00b0074696c2e458mr823150qkb.4.1681302034190; Wed, 12 Apr 2023 05:20:34 -0700 (PDT)
MIME-Version: 1.0
References: <168022420664.45787.1714641403814314728@ietfa.amsl.com> <CAM5tNy6udE6gqKHTfcTfzmf+CTqWiA9kuz__fnaqCyPPUF1+ZQ@mail.gmail.com> <CAABAsM6SNu1getd5GgGqp96qZtL4HMqt1mKTwLY=LaDOqdefdQ@mail.gmail.com> <CAM5tNy4UVT1R8jeECg+N_=tVn14B7QSNCT2WgZNzA2OG17yY-w@mail.gmail.com> <CAABAsM6ZSiXW=0igHyOfBg+QOG3a_6AAbg8zfxMqHJ3_772gxA@mail.gmail.com>
In-Reply-To: <CAABAsM6ZSiXW=0igHyOfBg+QOG3a_6AAbg8zfxMqHJ3_772gxA@mail.gmail.com>
From: David Noveck <davenoveck@gmail.com>
Date: Wed, 12 Apr 2023 08:20:23 -0400
Message-ID: <CADaq8jcymU8iGAd5fdUg0dSVYEOZX85breXR3AbJWNpKoGFz2A@mail.gmail.com>
To: Trond Myklebust <trondmy@gmail.com>
Cc: Rick Macklem <rick.macklem@gmail.com>, NFSv4 <nfsv4@ietf.org>
Content-Type: multipart/alternative; boundary="0000000000008216aa05f922a219"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/6NLttYDFtY48UhFxUlNjeUeyFiA>
Subject: Re: [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwcc-01.txt
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nfsv4/>
List-Post: <mailto:nfsv4@ietf.org>
List-Help: <mailto:nfsv4-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nfsv4>, <mailto:nfsv4-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 12 Apr 2023 12:20:39 -0000
On Tue, Apr 11, 2023, 8:15 PM Trond Myklebust <trondmy@gmail.com> wrote: > As I've said many times before, LAYOUTCOMMIT is not only unnecessary, but > is a terrible idea. > No disagreement on my part but we will have to deal with the fact that it is part of rfc8881. Coming up with suitable text for rfc5661bis will be a challenge. Sigh! Any O_SYNC workload is literally sunk when you require LAYOUTCOMMIT in > order to fully commit the metadata, because each write() syscall needs to > go both to the data server and the metadata server. > > No proposal that attempts to expand the use of LAYOUTCOMMIT is going to be > welcome on the Linux client. > > On Tue, Apr 11, 2023 at 4:14 PM Rick Macklem <rick.macklem@gmail.com> > wrote: > >> On Tue, Apr 11, 2023 at 6:40 AM Trond Myklebust <trondmy@gmail.com> >> wrote: >> > >> > LAYOUTCOMMIT sort of makes sense for block based layouts, since you do >> need to persist updates of the blocks with the metadata server. It makes >> zero sense for files and flex files layout types, since you already have >> persistence in the form of the regular COMMIT operation. While you could >> argue that LAYOUTCOMMIT can be seen as just providing hints in those cases, >> there is nothing in the protocol spec that allows you to conclude that. It >> is not worth introducing a new operation that needs to be called every time >> that the client wants to ensure persistence, just in order to resolve a >> server reboot issue. >> > >> First, note the following sentence: >> The LAYOUTCOMMIT operation indicates that the client has completed >> writes using a layout obtained by a previous LAYOUTGET. >> In particular, note that it says nothing like "when data needs to be >> committed to stable storage". >> Also, I cannot find anywhere where the RFC discusses the relationship >> between LAYOUTCOMMIT >> and COMMIT. Therefore, I do not believe any relationship should be >> assumed. >> >> Now, here's the back story: >> When I implemented the pNFS server for FreeBSD, I assumed that the >> client would do a >> LayoutCommit after a series of writes and before it expected the MDS >> to have up-to-date >> attributes for the file that was written to the DS (using a RW layout). >> --> At the time of LayoutCommit, the MDS did a Getattr on the DS(s) >> for the layout to >> update the MDS's copy of the attributes that change when writing >> occurs. >> >> This worked fine for the FreeBSD client, which does a LayoutCommit >> whenever a >> process did a POSIX fsync or close syscall. >> >> Then I started testing using a Linux client and found out that it got >> confused by >> stale attributes acquired from the MDS after doing write(s) to the DS(s). >> Looking at the packet trace I found that, for the case where the Linux >> client >> did writes to the DS(s) with File_Sync set, the Linux client did not do a >> LayoutCommit. (It, of course, did not do a Commit, but that was what I >> expected.) >> --> I looked in the RFC for a statement that would make it clear that the >> Linux >> client was broken and should do a LayoutCommit after these write(s) >> but... >> I could not find one. >> So, to work around the problem I added a tunable called >> vfs.nfsd.pnfsgetdsattr which, when enabled, makes the pNFS server >> do a >> Getattr against the DS(s) whenever it receives a Getattr from a >> client and >> a RW layout has been issued for the file. >> --> This resulted in a lot of MDS->DS Getattrs, but made the Linux >> client >> function well against the FreeBSD pNFS server. >> >> It looks to me like the proposed Layout_WCC operation is mostly just >> trying to solve >> the problem that can be solved by requiring a pNFS client to do a >> LayoutCommit >> after a series of writes are done to DS(s) based on a RW layout and >> before it >> expects the MDS to have up-to-date attributes for the file that was >> written. >> (For my client, that seems to be whenever a POSIX close or fsync is done >> on >> the file.) >> >> Since LayoutCommits do not happen that frequently, the overhead of doing >> a Getattr against a DS for each of them is not particularily large. >> >> Then there is the separate issue of how a client reports errors >> encountered >> when doing write(s) to a DS if the MDS reboots, making the layout stateid >> stale. >> I'd rather see this handled via a non-empty loca_layoutupdate, since >> that handles >> the problem for both NFSv4.1 and NFSv4.2 via a layout specific driver. >> (NFSv4.1 is deficient, since it does not have LayoutError and this could >> work >> around that.) A new NFSv4.2 operation only fixes the issue for NFSv4.2 >> and >> only after it is adopted and only if it is clearly specified when the >> client must >> do the operation. >> >> It seems to me that a non-empty loca_layoutupdate is only needed for the >> rare second case, although it might be useful for the maintenance of >> up-to-date attributes on the MDS. >> >> Finally, I will note that it is not obvious how the client can provide a >> Change >> attribute to the MDS after doing Writes to a NFSv3 DS. I think that the >> client can provide a boolean that indicates if a write to a DS for the >> file >> has been done since the last LayoutCommit (or Layout_WCC if you >> choose to go that way). >> >> > Secondly, as you pointed out, the repurposing of LAYOUTCOMMIT for >> propagating layout stats and layout errors requires a protocol change in >> any case. While that protocol change would be limited to the flex files >> protocol, it requires a whole new flex files v2 spec, since there is no >> procedure for amending the layout protocols in the way that exists for the >> NFSv4.2 protocol itself. >> >> First, I do not consider this "repurposing". I think you have assumed >> a relationship >> between LayoutCommit and Commit that does not exist and, although the RFC >> seems >> vague w.r.t. when LayoutCommits are to be performed by a pNFS client, >> I do think that >> they were intended for this use. Among other things, they are allowed >> to be done during >> Grace using a stale layout stateid. >> >> Second, what's the difference between an amended RFC for Flexible File >> Layout and >> an RFC for a new NFSv4.2 operation? All that changes is the >> loca_layoutupdate becomes >> optionally non-empty. (If everyone is really convinced that is not >> permitted, then define >> Flexible File Layout 2 with that minimal change, to fix the "how to >> report DS errors when >> the MDS reboots.) >> >> > >> > Lastly, while LAYOUTCOMMIT could potentially carry a layout stats and >> layout error payload in the same way that LAYOUTRETURN can, that payload >> size is limited by the negotiated 'ca_maxrequestsize' session value. This >> is in part why we have dedicated LAYOUTSTATS and LAYOUTERROR operations in >> the first place: so that the client can report the full information in >> multiple RPC calls when necessary, without being limited by the payload >> limitations of a single RPC call. >> >> The arguments that come before loca_layoutupdate for LayoutCommit >> are (at a glance, so I might be off a little) 68bytes vs 20bytes for >> Layout_WCC. >> I doubt 48bytes is going to make a difference. I also believe that >> up-to-date >> attributes and errors received from write(s) to DS(s) while the MDS >> reboots >> are what matters. I do not see stats as important for the case where the >> MDS >> has rebooted and the client cannot do a LayoutStats with the old layout >> stateid. >> (After all, LayoutStats does not even exist in NFSv4.1.) >> >> rick >> >> > >> > On Mon, Apr 10, 2023 at 7:11 PM Rick Macklem <rick.macklem@gmail.com> >> wrote: >> >> >> >> If Tom commented w.r.t. the use of LAYOUTCOMMIT for this, >> >> I missed it. >> >> >> >> Here are some snippets from RFC8881 Sec. 18.42 plus comments: >> >> >> >> The LAYOUTCOMMIT operation indicates that the client has completed >> >> writes using a layout obtained by a previous LAYOUTGET. The client >> >> may have only written a subset of the data range it previously >> >> requested. LAYOUTCOMMIT allows it to commit or discard >> provisionally >> >> allocated space and to update the server with a new end-of-file. >> >> Note that the arguments loca_last_write_offset, loca_time_modify and >> >> loca_layoutupdate can be used by the client to inform the MDS about >> >> file changes due to writing to the DS. >> >> >> >> If the loca_reclaim field is set to TRUE, this indicates that the >> >> client is attempting to commit changes to a layout after the restart >> >> of the metadata server during the metadata server's recovery grace >> >> period (see Section 12.7.4). This type of request may be necessary >> >> when the client has uncommitted writes to provisionally allocated >> >> byte-ranges of a file that were sent to the storage devices before >> >> the restart of the metadata server. >> >> This explains how LayoutCommit can be used after an MDS reboot >> >> to update the MDS w.r.t. writes done to the DS. >> >> >> >> The loca_last_write_offset field specifies the offset of the last >> >> byte written by the client previous to the LAYOUTCOMMIT. Note that >> >> this value is never equal to the file's size (at most it is one byte >> >> less than the file's size) and MUST be less than or equal to >> >> NFS4_MAXFILEOFF. Also, loca_last_write_offset MUST overlap the >> range >> >> described by loca_offset and loca_length. The metadata server may >> >> use this information to determine whether the file's size needs to >> be >> >> updated. If the metadata server updates the file's size as the >> >> result of the LAYOUTCOMMIT operation, it must return the new size >> >> (locr_newsize.ns_size) as part of the results. >> >> >> >> The loca_time_modify field allows the client to suggest a >> >> modification time it would like the metadata server to set. The >> >> metadata server may use the suggestion or it may use the time of the >> >> LAYOUTCOMMIT operation to set the modification time. If the >> metadata >> >> server uses the client-provided modification time, it should ensure >> >> that time does not flow backwards. If the client wants to force the >> >> metadata server to set an exact time, the client should use a >> SETATTR >> >> operation in a COMPOUND right after LAYOUTCOMMIT. See Section >> 12.5.4 >> >> for more details. If the client desires the resultant modification >> >> time, it should construct the COMPOUND so that a GETATTR follows the >> >> LAYOUTCOMMIT. >> >> Here are details on how loca_last_write_offset and loca_time_modify >> >> are used to update the MDS. >> >> >> >> Although the Flex Files layout does not define any loca_layout_update, >> >> it seems that this could be added more easily than new NFSv4.2 >> operations >> >> if additional client->MDS information is deemed necessary. >> >> >> >> Now, I do agree that: >> >> The LAYOUTCOMMIT operation commits changes in the layout represented >> >> by the current filehandle, client ID (derived from the session ID in >> >> the preceding SEQUENCE operation), byte-range, and stateid. >> >> "commits changes in the layout" is kinda weird and does not clarify >> when >> >> LayoutCommit should be done by a client. I think this should be >> clarified, >> >> at least for Flex File Layout, as whenever the client has completed a >> series >> >> of writes to DS(s) using the layout and expects the MDS to return >> >> up-to-date Getattr information for the file to clients. >> >> >> >> With the above clarification (or whatever works for you), I do not see >> why >> >> new operations are needed? >> >> >> >> rick >> >> >> >> On Thu, Mar 30, 2023 at 5:57 PM <internet-drafts@ietf.org> wrote: >> >> > >> >> > >> >> > A New Internet-Draft is available from the on-line Internet-Drafts >> >> > directories. This Internet-Draft is a work item of the Network File >> System >> >> > Version 4 (NFSV4) WG of the IETF. >> >> > >> >> > Title : Add LAYOUT_WCC to NFSv4.2's Flex File Layout >> Type >> >> > Authors : Thomas Haynes >> >> > Trond Myklebust >> >> > Filename : draft-ietf-nfsv4-layoutwcc-01.txt >> >> > Pages : 11 >> >> > Date : 2023-03-30 >> >> > >> >> > Abstract: >> >> > The Parallel Network File System (pNFS) Flexible File Layout >> allows >> >> > for a file's metadata (MDS) and data (DS) to be on different >> servers. >> >> > It does not provide a mechanism for the data server to update the >> >> > metadata server of changes to the data part of the file. The >> client >> >> > has knowledge of such updates, but lacks the ability to update the >> >> > metadata server. This document presents a refinement to RFC8435 >> to >> >> > allow the client to update the metadata server to changes on the >> data >> >> > server. >> >> > >> >> > The IETF datatracker status page for this Internet-Draft is: >> >> > https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layoutwcc/ >> >> > >> >> > There is also an HTML version available at: >> >> > https://www.ietf.org/archive/id/draft-ietf-nfsv4-layoutwcc-01.html >> >> > >> >> > A diff from the previous version is available at: >> >> > >> https://author-tools.ietf.org/iddiff?url2=draft-ietf-nfsv4-layoutwcc-01 >> >> > >> >> > Internet-Drafts are also available by rsync at rsync.ietf.org: >> :internet-drafts >> >> > >> >> > >> >> > _______________________________________________ >> >> > nfsv4 mailing list >> >> > nfsv4@ietf.org >> >> > https://www.ietf.org/mailman/listinfo/nfsv4 >> >> >> >> _______________________________________________ >> >> nfsv4 mailing list >> >> nfsv4@ietf.org >> >> https://www.ietf.org/mailman/listinfo/nfsv4 >> > _______________________________________________ > nfsv4 mailing list > nfsv4@ietf.org > https://www.ietf.org/mailman/listinfo/nfsv4 >
- [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwcc-01… internet-drafts
- Re: [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwc… Rick Macklem
- Re: [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwc… Trond Myklebust
- Re: [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwc… Rick Macklem
- Re: [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwc… Trond Myklebust
- Re: [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwc… David Noveck