Re: [nfsv4] file size and getattr

Rick Macklem <rmacklem@uoguelph.ca> Thu, 28 February 2019 01:22 UTC

From: Rick Macklem <rmacklem@uoguelph.ca>
To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>, Dave Noveck <davenoveck@gmail.com>
CC: trondmy <trondmy@hammerspace.com>, NFSv4 <nfsv4@ietf.org>
Date: Thu, 28 Feb 2019 01:22:14 +0000
Message-ID: <QB1PR01MB3537C7064B31A85146B95CBADD750@QB1PR01MB3537.CANPRD01.PROD.OUTLOOK.COM>
References: <155049372736.14318.3390584694682770373.idtracker@ietfa.amsl.com> <QB1PR01MB3537215D010370E139D70843DD7B0@QB1PR01MB3537.CANPRD01.PROD.OUTLOOK.COM> <1cbdd04ce23c5012f59b6d5dcbb6b505be0675b9.camel@hammerspace.com> <QB1PR01MB3537841A7F121CFA082B778ADD7B0@QB1PR01MB3537.CANPRD01.PROD.OUTLOOK.COM> <9bae6a43ea26f112e015b7b0b213fb9dd43bac68.camel@hammerspace.com> <CADaq8jfcFCuM5u30bXqTvg+0Hc-V+1VO9aUdSJLA090j48LCvA@mail.gmail.com> <90c104ee477cdc8e999ddaff4ccab32332d0f71d.camel@hammerspace.com> <CADaq8jfCp503KHQO-NOpjiPv1zt32RiVMfUkua-jtvXrZhzmeQ@mail.gmail.com>, <1245470464.7828950.1551305609342.JavaMail.zimbra@desy.de>
In-Reply-To: <1245470464.7828950.1551305609342.JavaMail.zimbra@desy.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/VUPGOPffgH_0bVUBHrL-f5xwKW0>
Subject: Re: [nfsv4] file size and getattr

Tigran Mkrtchyan wrote:
[stuff snipped]
>There is no free cheese... In distributed systems, consistency has a high
>price tag, and the currency is measured in milliseconds.
>
>On the other hand, not all applications require O_SYNC. Though, there are of course
>some specific `killer` use cases, like cluster-wide checkpointing in HPC.
Thanks for the interesting discussion, everyone.

I don't have a good answer for the O_SYNC case and will be comfortable with
whichever approach is chosen for the Flexible File Layout. (I suppose some sort
of erratum for RFC-8435 might be needed?)
(At this time it is a tunable in the FreeBSD pNFS server. I can't think of an easy
 way for the client to tell the server whether it will be doing LayoutCommits for
 writes done FILE_SYNC, so leaving it as a tunable is all I can think of to do.)
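
To make the tunable concrete, here is a minimal sketch of the decision the
Metadata server faces when it needs a file size while a read/write layout is
issued. This is not the FreeBSD pNFS server code. Every name in it
(trust_layoutcommit, struct mds_file, ds_getattr_size, mds_file_size) is
hypothetical, and the Getattr RPC to the Storage Server is replaced by a
stand-in that just counts calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical tunable: true means the server assumes clients send a
 * LayoutCommit after every write, including writes done FILE_SYNC.
 */
static bool trust_layoutcommit = true;

/* Hypothetical per-file state kept by the Metadata server. */
struct mds_file {
    uint64_t cached_size;      /* size last learned (e.g. via LayoutCommit) */
    bool     rw_layout_issued; /* a read/write layout is held by a client */
    uint64_t ds_size;          /* stand-in for the size on the Storage Server */
    unsigned ds_getattrs;      /* number of simulated Getattr RPCs */
};

/* Stand-in for a Getattr RPC to the Storage Server. */
static uint64_t ds_getattr_size(struct mds_file *f)
{
    f->ds_getattrs++;
    return f->ds_size;
}

/* Return the file size the Metadata server would report. */
static uint64_t mds_file_size(struct mds_file *f)
{
    assert(f != NULL);
    /*
     * With no read/write layout issued, or when LayoutCommits are
     * trusted, the cached size is used and no RPC is needed.
     */
    if (!f->rw_layout_issued || trust_layoutcommit)
        return f->cached_size;
    /* Otherwise pay for a Getattr to the Storage Server. */
    f->cached_size = ds_getattr_size(f);
    return f->cached_size;
}
```

With the tunable set, a client that skips LayoutCommit after FILE_SYNC writes
leaves cached_size stale. With it cleared, every size query made while a
read/write layout is issued costs one Getattr, which is the overhead discussed
in the quoted text below.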

rick

>
> On Wed, Feb 27, 2019 at 1:05 PM Trond Myklebust <trondmy@hammerspace.com>
> wrote:
>
>> On Wed, 2019-02-27 at 12:52 -0500, David Noveck wrote:
>> > > However note
>> > > that the counter argument to what you state above is that _if_ the
>> > > server requires a layoutcommit before it will acknowledge a file
>> > > size change, then pNFS is likely to underperform for applications such
>> > > as databases or VMs where each record is required to be written in
>> > > stable mode.
>> > > IOW: If all writes that need to be stable are also required to be
>> > > acknowledged with a layoutcommit (to the MDS),
>> >
>> > But it is not true that *all* writes that need to be stable are also
>> > required to be acknowledged with a layoutcommit (to the MDS). Only those
>> > that potentially change the file size require this.
>>
>> That's true for POSIX O_DSYNC writes, but it is not true for O_SYNC. In
>> the latter case, the timestamps are required to be updated
>> synchronously as well, which implies a layoutcommit.
>>
>> >
>> > > then your ability to
>> > > scale out your server will be in doubt
>> >
>> > For many applications, particularly databases, it will be easy to make
>> > sure that the writes that potentially change the file size are few and
>> > far between.
>>
>> If the database uses O_DSYNC, yes.
>>
>> >
>> > On Tue, Feb 26, 2019 at 8:12 PM Trond Myklebust <
>> > trondmy@hammerspace.com> wrote:
>> > > On Wed, 2019-02-27 at 00:13 +0000, Rick Macklem wrote:
>> > > > Trond Myklebust wrote:
>> > > > [stuff snipped]
>> > > > > Please see the Errata ID 2751
>> > > > > http://www.rfc-editor.org/errata/eid2751
>> > > >
>> > > > I'll admit I hadn't seen this errata before. However, it seems to be
>> > > > specific to the File Layout. For the Flexible File Layout...
>> > > >
>> > > > When I look in RFC-8435, I cannot find anything that states that a
>> > > > LayoutCommit is only required for case(s) where a Commit to the
>> > > > Storage Server is required.
>> > > > Sec. 2.1
>> > > >    Clearly states that a Commit to the Storage Server is required
>> > > >    before the client does a LayoutCommit when the write(s) were not
>> > > >    done FILE_SYNC.
>> > > >    However, I do not see any indication that the LayoutCommit is not
>> > > >    to be done for the case where the write(s) are done FILE_SYNC.
>> > > >
>> > > > FF_FLAGS_NO_LAYOUTCOMMIT can be used to indicate to a client that
>> > > > LayoutCommits are not required, but this does not depend on how
>> > > > the write(s) to the Storage Server were done.
>> > > >
>> > > > The only way a Flexible File layout Metadata server can know what the
>> > > > current file size is (when a read/write layout is issued to a client)
>> > > > is to do a Getattr to the Storage Server.
>> > > > If a client is not required to do a LayoutCommit when the write(s) to
>> > > > the Storage Server are done FILE_SYNC, then the Metadata server must
>> > > > do Getattr RPCs to the Storage Server whenever it needs an up to date
>> > > > file size if a read/write layout is issued to a client.
>> > > >
>> > > > This can result in a lot of overhead that can be avoided by requiring
>> > > > the LayoutCommit to be done by a client after writing to a Storage
>> > > > Server, irrespective of the need for a Commit to the Storage Server.
>> > > > As such, I would rather not have this errata applied to RFC-8435.
>> > > >
>> > >
>> > > Fair enough. I agree that the errata in question only applies to the
>> > > pNFS files layout, however you were talking about RFC5661 and whether
>> > > or not we were interpreting that correctly. Since RFC5661 only refers
>> > > to the behaviour of the pNFS files layout, I assumed that
>> > > was what you were referring to.
>> > >
>> > > For flexfiles we may have a bug in the layoutcommit case. However note
>> > > that the counter argument to what you state above is that _if_ the
>> > > server requires a layoutcommit before it will acknowledge a file
>> > > size change, then pNFS is likely to underperform for applications such
>> > > as databases or VMs where each record is required to be written in
>> > > stable mode.
>> > > IOW: If all writes that need to be stable are also required to be
>> > > acknowledged with a layoutcommit (to the MDS), then your ability to
>> > > scale out your server will be in doubt.
>> > >
>> --
>> Trond Myklebust
>> Linux NFS client maintainer, Hammerspace
>> trond.myklebust@hammerspace.com
>>
>>