Re: [nfsv4] Progressing RFC errata for RFC 5661

"Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de> Thu, 19 September 2019 09:14 UTC

Date: Thu, 19 Sep 2019 11:14:37 +0200
From: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Trond Myklebust <trondmy@gmail.com>, Dave Noveck <davenoveck@gmail.com>, Magnus Westerlund <magnus.westerlund@ericsson.com>, NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/nlg62eA5KsWVOyrru00ghuk2XdI>


----- Original Message -----
> From: "Rick Macklem" <rmacklem@uoguelph.ca>
> To: "Trond Myklebust" <trondmy@gmail.com>
> Cc: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>, "Dave Noveck" <davenoveck@gmail.com>, "Magnus Westerlund"
> <magnus.westerlund@ericsson.com>, "NFSv4" <nfsv4@ietf.org>
> Sent: Thursday, September 19, 2019 6:07:32 AM
> Subject: Re: [nfsv4] Progressing RFC errata for RFC 5661

> Trond Myklebust wrote:
>>Rick,
>>
>>That errata predates most of the Linux pNFS client implementation. We
>>wrote the implementation to conform to the errata.
>>
>>So no. It's not a bug. It's a deliberate design based on a decision
>>that was discussed in the IETF WG, on the mailing list
>>
>>https://mailarchive.ietf.org/arch/msg/nfsv4/_KTtO6uz-MvRoStbhPuOXWZr6yI
> This actually appears to be a discussion related to the offset and length
> arguments for LayoutCommit, but...
>>
>>and in a special session of the IETF:
>>
>>https://mailarchive.ietf.org/arch/msg/nfsv4/Rpw9XCwCARxfaU4ym5L2TauV6ao
> Ok. This was long before I got around to implementing it, so I wouldn't have
> understood the implications.
> --> I would have been interested in hearing the rationale behind not doing
>      LayoutCommit for FILE_SYNC4 writes, since it seems to me that RFC-5661
>      had gotten it right when it required them.

In the case of a cluster filesystem back-end, FILE_SYNC4 on a WRITE indicates that the MDS
already has the correct file attributes. An extra LAYOUTCOMMIT would only add overhead.
I can imagine an HPC workload where the client talks to the DS over InfiniBand and to the
MDS over Ethernet. There, an extra LAYOUTCOMMIT would reduce write throughput.
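
To make the two readings in this thread concrete, here is a minimal sketch of the
client-side decision, not taken from any real client. The function name
needs_layoutcommit and the flag follow_erratum_2751 are hypothetical; only the
stable_how4 values come from the protocol.

```python
from enum import IntEnum


class StableHow4(IntEnum):
    """stable_how4 values as defined for NFSv4.1 WRITE."""
    UNSTABLE4 = 0
    DATA_SYNC4 = 1
    FILE_SYNC4 = 2


def needs_layoutcommit(write_replies, follow_erratum_2751=True):
    """Decide whether a client should send LAYOUTCOMMIT to the MDS.

    write_replies: stable_how4 values returned by the DS for the WRITEs
    issued under a file layout since the last LAYOUTCOMMIT.
    follow_erratum_2751: hypothetical switch between the two readings.
    """
    if not write_replies:
        # Nothing was written, so there is nothing to commit.
        return False
    if not follow_erratum_2751:
        # Reading of the original RFC 5661 text: always commit the layout.
        return True
    # Reading of erratum 2751: skip LAYOUTCOMMIT only if every WRITE
    # came back as FILE_SYNC4.
    return any(reply != StableHow4.FILE_SYNC4 for reply in write_replies)
```

With the erratum reading, a run of FILE_SYNC4 replies saves the round trip to the MDS,
which is the saving I have in mind for the HPC case above.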

Tigran.


> 
> As I said, the FreeBSD server can handle this case, it just results in a lot of
> overhead synchronizing Size, Change, Time_Modify between MDS and DS
> whenever a RW layout is issued to a client for the file.
> 
> Thanks for pointing this out, rick
> 
> On Wed, 18 Sep 2019 at 12:39, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>> Mkrtchyan, Tigran wrote:
>> [stuff snipped]
>> >Hi Rick,
>> >
>> >here is the public link to errata
>> >
>> >https://www.rfc-editor.org/errata/eid2751
>> >
>> >Tigran.
>> Thanks Tigran.
>>
>> Ok, so now that I've read it I have to admit I think it is rewriting the RFC
>> to conform with what the Linux client does.
>>
>> I think this para. from Sec. 13.10 of RFC-5661 is clear:
>>    The NFSv4.1 protocol only provides close-to-open file data cache
>>    semantics; meaning that when the file is closed, all modified data is
>>    written to the server.  When a subsequent OPEN of the file is done,
>>    the change attribute is inspected for a difference from a cached
>>    value for the change attribute.  For the case above, this means that
>>    a LAYOUTCOMMIT will be done at close (along with the data WRITEs) and
>>    will update the file's size and change attribute.  Access from
>>    another client after that point will result in the appropriate size
>>    being returned.
>>
>> It states "will be done". It doesn't say anything about UNSTABLE4 vs FILE_SYNC4.
>> (I think most POSIX-like clients would consider the fsync(2) syscall to require
>>  the same treatment as "close" above, but that is a POSIX-specific client issue.)
>> I can see the argument that, since there is no "must" in the statement, that a
>> client can choose not to do this, but that would also imply that the client will
>> need to live with the consequences of it.
>>
>> I think the second sentence of the first para. of the errata is bogus:
>> For file layouts, WRITEs to a Data Server that return a stable_how4 value of
>> FILE_SYNC4 guarantee that data and file system metadata are on stable
>> storage.  This means that a LAYOUTCOMMIT is not needed in order to make the
>> data and metadata visible to the metadata server and other clients.
>>
>> Why?
>> The FILE_SYNC4 was returned by the DS. This would imply the DS
>> has committed data and metadata to stable storage on the DS.
>> However, I am not aware of anything in RFC-5661 that would imply that
>> the Size, Time_Modify and Change attributes or anything else must have
>> been updated or in stable storage on the MDS at this time.
>>
>> If a server does not require LayoutCommit operations for correct behaviour
>> then it can simply reply NFS4ERR_NOTSUPP (as I believe the Netapp filer
>> does) and the client then no longer needs to do them.
>>
>> If there is somewhere in RFC-5661 that it is stated that LayoutCommits are
>> not required when the DS replies FILE_SYNC4, then I missed it and there
>> is a problem with RFC-5661 that needs to be addressed.
>>
>> Otherwise, sorry, but it seems that the bug is in the Linux client
>> implementation
>> and not RFC-5661.
>>
>> Is there a File Layout pNFS server implementation where the DSs return
>> FILE_SYNC4 that will break if the client does a LayoutCommit for this case?
>> (If so, then something may need to be done.)
>>
>> rick
>>
>>