Re: [nfsv4] I-D Action: draft-ietf-nfsv4-layoutwcc-01.txt

Trond Myklebust <> Tue, 11 April 2023 13:40 UTC

To: Rick Macklem <>

LAYOUTCOMMIT sort of makes sense for block-based layouts, since you do need
to persist updates of the blocks with the metadata server. It makes zero
sense for the files and flex files layout types, since you already have
persistence in the form of the regular COMMIT operation. While you could
argue that LAYOUTCOMMIT can be seen as just providing hints in those cases,
there is nothing in the protocol spec that allows you to conclude that. It
is not worth introducing a new operation that needs to be called every time
the client wants to ensure persistence, just in order to resolve a server
reboot issue.

Secondly, as you pointed out, repurposing LAYOUTCOMMIT for propagating
layout stats and layout errors requires a protocol change in any case.
While that protocol change would be limited to the flex files protocol, it
requires a whole new flex files v2 spec, since there is no procedure for
amending the layout protocols comparable to the one that exists for the
NFSv4.2 protocol itself.

Lastly, while LAYOUTCOMMIT could potentially carry a layout stats and
layout error payload in the same way that LAYOUTRETURN can, the size of
that payload is limited by the negotiated 'ca_maxrequestsize' session
value. This is in part why we have dedicated LAYOUTSTATS and LAYOUTERROR
operations in the first place: so that the client can report the full
information in multiple RPC calls when necessary, without being
constrained by the size limit of a single RPC call.
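
To illustrate the size constraint, here is a minimal sketch (not taken from
any real client) of splitting a set of reports into several requests so that
each one fits under the negotiated limit. The function name, the "overhead"
parameter, and the idea of treating each report as an opaque blob of known
encoded size are all assumptions made for illustration only.

```python
from typing import Iterable, List


def batch_by_size(entry_sizes: Iterable[int],
                  max_request_size: int,
                  overhead: int = 0) -> List[List[int]]:
    """Group entry indices into batches that each fit in one request.

    entry_sizes      -- encoded size in bytes of each report (hypothetical)
    max_request_size -- stands in for the negotiated ca_maxrequestsize
    overhead         -- assumed fixed per-request cost (headers, SEQUENCE, ...)
    """
    budget = max_request_size - overhead
    if budget <= 0:
        raise ValueError("overhead leaves no room for a payload")

    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, size in enumerate(entry_sizes):
        if size > budget:
            # A single report that cannot fit can never be sent whole.
            raise ValueError(f"entry {index} exceeds the per-request budget")
        if current and used + size > budget:
            # Adding this report would overflow: close the current batch.
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        batches.append(current)
    return batches
```

With a 1000-byte limit, 100 bytes of assumed overhead, and three 400-byte
reports, the first two reports share one request and the third goes in a
second one. A single operation bound to one request has no such option.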

On Mon, Apr 10, 2023 at 7:11 PM Rick Macklem <> wrote:

> If Tom commented w.r.t. the use of LAYOUTCOMMIT for this,
> I missed it.
> Here are some snippets from RFC8881 Sec. 18.42 plus comments:
>    The LAYOUTCOMMIT operation indicates that the client has completed
>    writes using a layout obtained by a previous LAYOUTGET.  The client
>    may have only written a subset of the data range it previously
>    requested.  LAYOUTCOMMIT allows it to commit or discard provisionally
>    allocated space and to update the server with a new end-of-file.
> Note that the arguments loca_last_write_offset, loca_time_modify and
> loca_layoutupdate can be used by the client to inform the MDS about
> file changes due to writing to the DS.
>    If the loca_reclaim field is set to TRUE, this indicates that the
>    client is attempting to commit changes to a layout after the restart
>    of the metadata server during the metadata server's recovery grace
>    period (see Section 12.7.4).  This type of request may be necessary
>    when the client has uncommitted writes to provisionally allocated
>    byte-ranges of a file that were sent to the storage devices before
>    the restart of the metadata server.
> This explains how LayoutCommit can be used after an MDS reboot
> to update the MDS w.r.t. writes done to the DS.
>    The loca_last_write_offset field specifies the offset of the last
>    byte written by the client previous to the LAYOUTCOMMIT.  Note that
>    this value is never equal to the file's size (at most it is one byte
>    less than the file's size) and MUST be less than or equal to
>    NFS4_MAXFILEOFF.  Also, loca_last_write_offset MUST overlap the range
>    described by loca_offset and loca_length.  The metadata server may
>    use this information to determine whether the file's size needs to be
>    updated.  If the metadata server updates the file's size as the
>    result of the LAYOUTCOMMIT operation, it must return the new size
>    (locr_newsize.ns_size) as part of the results.
>    The loca_time_modify field allows the client to suggest a
>    modification time it would like the metadata server to set.  The
>    metadata server may use the suggestion or it may use the time of the
>    LAYOUTCOMMIT operation to set the modification time.  If the metadata
>    server uses the client-provided modification time, it should ensure
>    that time does not flow backwards.  If the client wants to force the
>    metadata server to set an exact time, the client should use a SETATTR
>    operation in a COMPOUND right after LAYOUTCOMMIT.  See Section 12.5.4
>    for more details.  If the client desires the resultant modification
>    time, it should construct the COMPOUND so that a GETATTR follows the
>    LAYOUTCOMMIT.
> Here are details on how loca_last_write_offset and loca_time_modify
> are used to update the MDS.
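
[Editorial illustration, not part of the original exchange.] A minimal
sketch of the size and modification-time rules quoted above, as a metadata
server might apply them. The function names are hypothetical, special
values of loca_length are ignored, and clamping the suggested time to the
current mtime is only one possible way to ensure that "time does not flow
backwards".

```python
from typing import Optional

NFS4_MAXFILEOFF = 0xFFFFFFFFFFFFFFFF


def size_after_layoutcommit(current_size: int, last_write_offset: int,
                            offset: int, length: int) -> int:
    """Return the file size implied by loca_last_write_offset."""
    if last_write_offset > NFS4_MAXFILEOFF:
        raise ValueError("loca_last_write_offset exceeds NFS4_MAXFILEOFF")
    # The last written byte must lie inside [loca_offset, loca_offset + loca_length).
    if not (offset <= last_write_offset < offset + length):
        raise ValueError("loca_last_write_offset outside the committed range")
    # The offset of the last byte is at most size - 1, so the size it
    # implies is offset + 1; the file never shrinks here.
    return max(current_size, last_write_offset + 1)


def mtime_after_layoutcommit(current_mtime: int,
                             suggested: Optional[int],
                             now: int) -> int:
    """Pick a modification time; never move it backwards (assumed policy)."""
    candidate = suggested if suggested is not None else now
    return max(current_mtime, candidate)
```

For example, a last written byte at offset 149 on a 100-byte file yields a
new size of 150, while a write entirely inside the existing file leaves the
size unchanged.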
> Although the Flex Files layout does not define any loca_layoutupdate,
> it seems that this could be added more easily than new NFSv4.2 operations
> if additional client->MDS information is deemed necessary.
> Now, I do agree that:
>    The LAYOUTCOMMIT operation commits changes in the layout represented
>    by the current filehandle, client ID (derived from the session ID in
>    the preceding SEQUENCE operation), byte-range, and stateid.
> "commits changes in the layout" is kinda weird and does not clarify when
> LayoutCommit should be done by a client. I think this should be clarified,
> at least for Flex File Layout, as whenever the client has completed a
> series of writes to DS(s) using the layout and expects the MDS to return
> up-to-date Getattr information for the file to clients.
> With the above clarification (or whatever works for you), I do not see why
> new operations are needed?
> rick
> On Thu, Mar 30, 2023 at 5:57 PM <> wrote:
> >
> >
> > A New Internet-Draft is available from the on-line Internet-Drafts
> > directories. This Internet-Draft is a work item of the Network File
> > System Version 4 (NFSV4) WG of the IETF.
> >
> >    Title           : Add LAYOUT_WCC to NFSv4.2's Flex File Layout Type
> >    Authors         : Thomas Haynes
> >                      Trond Myklebust
> >    Filename        : draft-ietf-nfsv4-layoutwcc-01.txt
> >    Pages           : 11
> >    Date            : 2023-03-30
> >
> > Abstract:
> >    The Parallel Network File System (pNFS) Flexible File Layout allows
> >    for a file's metadata (MDS) and data (DS) to be on different servers.
> >    It does not provide a mechanism for the data server to update the
> >    metadata server of changes to the data part of the file.  The client
> >    has knowledge of such updates, but lacks the ability to update the
> >    metadata server.  This document presents a refinement to RFC8435 to
> >    allow the client to update the metadata server to changes on the data
> >    server.