Re: [nfsv4] Notes regarding discussion of directory scalabiliy issues
Trond Myklebust <trondmy@hammerspace.com> Tue, 30 June 2020 13:32 UTC
From: Trond Myklebust <trondmy@hammerspace.com>
To: "davenoveck@gmail.com" <davenoveck@gmail.com>
CC: "nfsv4@ietf.org" <nfsv4@ietf.org>
Date: Tue, 30 Jun 2020 13:32:27 +0000
Message-ID: <5d9b4f697ec698a7f07e8168f56826dbb52e234b.camel@hammerspace.com>
References: <CADaq8jev+tUs=mrGDMnZMpfmQXL=KLwDKW5S-CbBLpL-54RJTA@mail.gmail.com> <3fc0af37d7d870eeb6ab854a75d6eeb5aae61a0d.camel@hammerspace.com> <CADaq8jcb5BLiE49SyS3wxbbDmz88GJNWeLgeh5XU4oGkdBuJmw@mail.gmail.com>
In-Reply-To: <CADaq8jcb5BLiE49SyS3wxbbDmz88GJNWeLgeh5XU4oGkdBuJmw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/CTcMBevY7TjVNFcSrsDdlaACBv0>
On Tue, 2020-06-30 at 08:40 -0400, David Noveck wrote:

Thanks for your helpful comments.

On Mon, Jun 29, 2020 at 2:43 PM Trond Myklebust <trondmy@hammerspace.com> wrote:

Like it or not, the readdir cookie is an attribute of the directory. If the protocol treated them as such, then the attribute notifications feature could provide updates to the client. Given that it doesn't, we could add a cookie update feature to the directory notification feature as a v4.2 extension to the protocol. However, I'm reluctant to start work on the necessary protocol additions until we are sure they are needed to provide better directory cacheability.

Actually, they are attributes of directory streams. The difference is not all that important given that client implementations are unlikely to be aware of the specific stream associated with any particular request. However, there are a few cases in which the difference is important in determining whether various approaches to client handling of cookies might or might not work, and will be important in the discussion below:

* Two requests made on different clients necessarily are made on distinct streams.
* Two requests made on different instances of the same client (with an intervening restart/reboot) also have to arise on different streams.

If I want to support the POSIX telldir() and seekdir() operations ( https://pubs.opengroup.org/onlinepubs/9699919799/functions/seekdir.html ), then I need to ensure that when the application calls seekdir(), I return to the exact same cursor location in the stream that I was at when I called telldir().

Agreed.

Without a server side cookie on which to anchor my telldir() cookies,

Every client has these available but it is not clear to me how useful such anchoring is. I think the flexibility that each client has to assign cookies to streams it is responsible for is valuable and could be compromised if anchoring to the server cookies is made the focus of the implementation.
then all I have is a list of filenames that can and will change every time a file is created, deleted or renamed.

Clearly it will change. However, the directory notifications feature makes some assumptions, currently implicit, about how the list will change. Once these are made explicit, the wg could decide that server/fs pairs incapable of staying within these reasonable restrictions (if they are, in fact, reasonable) cannot support the directory notifications feature.

Both the length and ordering of that list may change whenever the directory is modified,

Clearly the length will change, but the reasonable expectation is that creating a file will increase the length by one and deleting one will decrease it by one. I don't see the value of supporting directory notifications on server fs that do something else. With regard to ordering, I suppose the spec allows an fs to shuffle the directory order every time a change is made, but I'm unaware of any actual file systems that do this. Do we need to support directory notifications for such fs's?

    touch foo; touch bar; ln foo baz; rm foo; mv baz foo

There... Most filesystems will end up reordering 'foo' and 'bar' in the directory stream given the above sequence of commands. How does the client figure out what happened if the above sequence of commands is performed on the server? Now let's say that is a directory of a million files, and something like the above is made to happen regularly. How do I maintain a stable list of synthetic cookies on the client?

meaning that a naive implementation

OK. I'll plead guilty to one misdemeanor count of directory naivety.

of synthetic cookies as an offset is not compatible with the telldir()/seekdir() requirements.

It's not clear to me how this incompatibility would manifest itself. I think I need to understand what would break.

To make matters worse, the list size is for all intents and purposes unbounded, because there is no hard limit on the size of a directory.
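One way to see what breaks with offset-style synthetic cookies is to model the command sequence above. The following is a minimal Python sketch, not any client's actual code: the directory is modelled as an ordered list, new names are assumed to be appended at the end (real file systems differ), and `listing_after`, `telldir_naive` and `seekdir_naive` are hypothetical helpers invented for the illustration.

```python
# Hypothetical model: a directory is an ordered list of names, and a
# "naive" synthetic cookie is just an index into that list.


def listing_after(ops, listing):
    """Apply create/remove operations and return the new ordered listing.

    Assumption for illustration only: new entries are appended at the
    end of the listing. Real file systems may order entries differently.
    """
    entries = list(listing)
    for op, name in ops:
        if op == "create":
            entries.append(name)
        elif op == "remove":
            entries.remove(name)
    return entries


def telldir_naive(position):
    """Naive cookie: the offset of the next entry to be read."""
    return position


def seekdir_naive(listing, cookie):
    """Resume reading at the saved offset in the *current* listing."""
    return listing[cookie:]


# touch foo; touch bar
before = listing_after([("create", "foo"), ("create", "bar")], [])

# The application has read "foo" and calls telldir(): next entry is index 1.
cookie = telldir_naive(1)

# ln foo baz; rm foo; mv baz foo
# (the rename is modelled as removing "baz" and creating "foo")
after = listing_after(
    [("create", "baz"), ("remove", "foo"), ("remove", "baz"), ("create", "foo")],
    before,
)

remaining_expected = before[1:]                  # what the application should still see
remaining_actual = seekdir_naive(after, cookie)  # what the naive cookie yields
```

In this model the application that had already read 'foo' resumes at offset 1 and is handed 'foo' again, while 'bar' is never returned.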
That makes it also impossible to create a cached mapping between a synthetic cookie and a filename; such a mapping would be unbounded both in size and in duration (since we don't know a priori how long the application will keep the directory open, or for that matter, which exact set of cookies it may have cached).

Such a mapping would, in essence, be part of the cached directory. So, if it is too big to keep in client memory, then it is too big to cache and you might as well decide not to cache it.

I expect there is an issue that is a worry in the case in which a reasonably sized directory grows over time to be too big to cache while an open directory stream retains some directory cookies which might be incompatible with the client dropping caching of directories and switching to server-based cookies.😖 I feel it is reasonable to treat this situation as one might a cookie-verifier failure, particularly if this is the only worrisome failure mode.

However, this possibility means that I would not ask clients to implement such local cookies. To enable that, we would have to make explicit the same sort of reasonableness requirement for cookie changes that we have already discussed for ordering changes. RFC 7530 already alludes to the need to avoid spurious cookie invalidations, although not in as explicit or strict a way as we would need to support directory notifications:

    As there is no way for the client to indicate that a cookie value, once received, will not be subsequently used, server implementations should avoid schemes that allocate memory corresponding to a returned cookie. Such allocation can be avoided if the server bases cookie values on a value such as the offset within the directory where the scan is to be resumed. Cookies generated by such techniques should be designed to remain valid despite modification of the associated directory. If a server were to invalidate a cookie because of a directory modification, READDIRs of large directories might never finish.
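The bounded alternative discussed above, where a cookie that can no longer be resolved is treated much like a cookie-verifier failure, might look roughly like the following. This is a sketch under assumptions: `CookieMap`, `StaleCookie` and the oldest-first eviction policy are hypothetical and are not taken from any existing client.

```python
from collections import OrderedDict


class StaleCookie(Exception):
    """Hypothetical error, standing in for a cookie-verifier failure."""


class CookieMap:
    """Bounded map from a synthetic cookie to the filename it anchors."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._names = OrderedDict()  # cookie -> filename, oldest first
        self._next = 1               # cookies are never reused

    def tell(self, name):
        """Issue a new cookie anchored to `name`, evicting the oldest if full."""
        cookie = self._next
        self._next += 1
        self._names[cookie] = name
        if len(self._names) > self.capacity:
            self._names.popitem(last=False)
        return cookie

    def seek(self, cookie):
        """Return the anchoring filename, or fail if the cookie was evicted."""
        try:
            return self._names[cookie]
        except KeyError:
            raise StaleCookie(cookie) from None


cookies = CookieMap(capacity=2)
first = cookies.tell("foo")
second = cookies.tell("bar")
third = cookies.tell("baz")  # evicts the cookie anchored to "foo"
```

The map stays bounded, at the cost that an application holding an old cookie sees an error on seekdir() rather than a silently wrong position.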
So in order to make this work the client would basically have to create its own B-tree and persist it in storage somewhere.

I don't see the need to make this persistent. If the client restarts, all directory streams have ceased to exist and we know a posteriori that there are no outstanding directory cookies to which the client would have to respond.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com