Re: [nfsv4] Review of draft-ietf-nfsv4-flex-files-08 (part one of three)

Thomas Haynes <loghyr@primarydata.com> Mon, 17 July 2017 09:14 UTC

From: Thomas Haynes <loghyr@primarydata.com>
To: Dave Noveck <davenoveck@gmail.com>
CC: Thomas Haynes <loghyr@primarydata.com>, Benny Halevy <bhalevy@gmail.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/ECBxpP_XXSXrgGCbOTwuxssV8sQ>

I’ve lost my draft of my reply, so I’ll just wing it as I go along…

I.e., I made edits, recorded that fact in a draft reply, and now have no clue as to my rationale.

I have addressed all of the issues in this part of the series and will be publishing them as draft-10.

I have also provided an approach to kerberized access under the loosely coupled model.


On May 21, 2016, at 7:00 AM, David Noveck <davenoveck@gmail.com> wrote:

Review Structure

This is the first section of a multi-part review which has been broken up into multiple mail messages to avoid running into wg mailing list size restrictions.

This section contains the general comments and the first part of the per-section comments.  As a result, much of the material in the general comments will be referring to material that only appears in later review sections.

General Comments

Background of Review

The -06 of this document entered Working Group Last Call, according to the Datatracker, on 8/24/2015.

Soon after that I sent a long review (in two parts) to the authors, cc-ing the working group list.  The substance of my review was that the document had some major issues that needed to be addressed to get it into a WGLC-worthy state.

At the time I didn't get any response from Tom.  It now appears that Tom did send a response to the second part of my review on 10/16/2015, but it doesn't seem that I got it.  Although he did send it to the working group list, it can't be found in the mailing list archives.

I have not been able to find the email in which the WGLC was announced, so I'm not sure when it was supposed to end, nor have I found any mail in which the end of the last call is discussed and next steps agreed upon. In any case, given the eight months between then and now, I presume the WGLC is now effectively over. I'll leave it to the authors to resolve the administrative issues regarding the document state with Spencer.

In January 2016, and recently, Tom has posted updates to the document, although they don't address the major issues that I raised in my review of -06.  Recently, Tom forwarded me the lost response to the second part of my review, so I have a better picture of his views about my comments and his plans for the document going forward.

In any case, I've decided to send a review of -08. Because the differences between -06 and -08 are small, it will be based, in large part, on my earlier review of -06, although there will be changes of the following sorts:

  *   In cases in which the -08 has a new issue, not present in the -06, I'll add a discussion of that issue, and note that it is a new issue in the -08.
  *   In cases in which there is an issue not noted in the -06 review but which applies to the -06 as well, I'll note that fact as well.
  *   Issues noted in the review of the -06, that have been addressed in -08 are simply eliminated.
  *   In cases in which I know from the forwarded response how Tom intends to deal with an issue, the discussion is simplified to reflect the likely approach.  In particular, the discussion of the handling of stateids is simplified given that Tom has decided on a particular approach.
  *   In cases in which I know from the forwarded response that Tom objects to a proposed change, I'll summarize the situation, but will try not to continue the argument, and leave resolution of the issue until later discussion of the document.

General Evaluation

Basically the situation remains as it was when I sent the review of -06.  Although there has been some discussion of some of the issues raised in my review of -06, and some changes made, the major issues I noted have not been addressed by document changes.

There is a lot of valuable stuff in this document but significant work needs to be done to clarify things regarding the handling of locking state.  A lot of the problems derive from the fact that this document describes four different situations and is not careful about the differences among:

  *   Using NFSv3 as a data access protocol
  *   Using NFSv4.0 as a data access protocol
  *   Using NFSv4.1 as a data access protocol in the loose coupling case
  *   Using NFSv4.1 as a data access protocol in the tight coupling case

Issues Blocking Working Group Last Call

I'd characterize the following issues as being in this category:

  *   Resolving the contradiction in -08 between Section 2.3 (Locking Models) and Section 5.1 (ff_layout4).
  *   Implementing some of the cleanup/clarification regarding locking discussed in Section 2.3 (Locking Models). As regards my current views on -08, this is addressed in the second part of the current review.
  *   Resolving the issue of the unused ffds_stateid field in Section 5.1 (ff_layout4).

My take is to just obsolete it and say that tightly coupled layouts are broken.



  *   Resolving the issue about FF_FLAGS_NO_IO_THRU_MDS discussed in Section 5.1 (ff_layout4).

Other Noteworthy Issues

I think it makes sense to address Rick's proposal in his mail of March 8, "Possible change to Flex Files layout". The response on the list was generally positive, so it makes sense to address this before Working Group Last Call.

Comments by Section (Through Section 2.2)

Abstract:

Suggest replacing "to allow" by "which allows"


Ack

1. Introduction:

Suggest replacing "the wire protocol of such a protocol" by "such a protocol".



Ack


1.1 Definitions:

• control protocol

This is not "a set of requirements". Rather, it is something that supports a set of requirements. Unstated requirements with nothing to support them are not very helpful.


If it is not a set of requirements, then it is not a protocol.


As an example of the difficulties that this gives rise to, consider the phrase "requirements for the protocol" in the introduction.  If the control protocol is a "set of requirements" then the requirements for the protocol is "requirements for a set of requirements" :-(  This is making my head hurt.



And that is why we have the other document. RFC 5661 avoided defining what the control protocol was and even avoided defining the requirements for such a control protocol. Oh, we will let the vendor implementations do all that.


• data file

At the very least, "E.g." should be replaced by "I.e.". My preference would be to simply say it is the file contents, without using any Latin abbreviations. So how about:

That part of a file system object that consists of the file contents.



I’ve made changes to this and the metadata file definitions.

• data server (DS)

I think this definition should be deleted together with Section 1.2 (Difference Between a Data Server and a Storage Device). See my comments below on that section for more details/ranting.



No

• file layout type

I think you mean here the layout type defined in Chapter 13 of RFC5661. As it is, with the current reference to "the NFS protocol", this seems to include both that and the flexible file layout type. If you need a term for the latter, I suggest "NFS-based layout type".



Made a ref to Sec 13 of 5661.

• layout segment

The reference there is to chapter 13 of RFC5661 which deals with the (old-fashioned,  tightly-coupled) file layout type.  It makes more sense, in this document, to reference something specific to  the flex-files layout type.



I’m fine with it the way it is.

• layout stateid


In the last sentence suggest adding "of that document" after "Section 12.5.3".


Made a change.



• loose coupling

You seem to have gotten rid of the "no control protocol" stuff elsewhere. It's time to do the same in the definitions section.



I’m fine with what I have.


• metadata file

This is now confusing because, although the metadata file does not include the file contents, it does describe them.  So "describes the object and not the payload" is wrong.

Suggest the following replacement:

That part of a file system object which gives attributes for, and the location of, the file data rather than containing the file contents itself. It can include such items as the times of last modification or access, and security information such as an ACL.
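As a rough illustration of the proposed split (not text from the draft), the Python sketch below models a file system object as a metadata part that describes the data, including where it lives, and a data part that holds only the contents. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """Hypothetical model: the file contents, as held on a storage device."""
    contents: bytes = b""


@dataclass
class MetadataFile:
    """Hypothetical model: describes the data file but does not contain it."""
    mtime: float
    atime: float
    acl: List[str] = field(default_factory=list)
    # Where the file data lives (e.g. one entry per mirror); illustrative only.
    data_locations: List[str] = field(default_factory=list)


@dataclass
class FileSystemObject:
    metadata: MetadataFile
    data: DataFile
```

Note that nothing in MetadataFile holds the payload; it only describes it and points at it.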



I’m fine with what I have.


• mirror

There are a couple of issues I have with the second sentence (or the first sentence following the initial sentence fragment):
o It isn't clear to me how the two clauses relate.
o There needs to be more information about how the situation referred to in the first clause is arrived at.
o None of this is really part of the definition of mirror.


Removed the 2nd sentence.


Some of this material needs to go into the section 8. Mirroring.


• tight coupling

Same issue as for "loose coupling". You need to characterize the different types/strengths of potential control protocols, rather than pretending that a very thin one doesn't exist.



I’m fine with what I have.


• recalling a layout

In the last sentence, suggest replacing "could be able" by "has the opportunity".



done

Also, I'm not sure what "etc" is referring to in the last sentence.  Should it be removed?


commit, update attributes if a write delegation is held, etc.


• revoking a layout


It seems to me that, for this layout type, revoking a layout and fencing are essentially identical and that this fact should be noted.



No, they are different.

Recall is graceful and revoke is not.
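The graceful/non-graceful distinction can be pictured with a toy Python client. On recall it gets the chance to finish its work (write out dirty data, commit, and so on, as listed above) before returning the layout; on revoke the layout is simply gone. The class, its fields, and the decision to leave revoked dirty data untouched are hypothetical simplifications, not behaviour specified by the draft.

```python
from enum import Enum, auto
from typing import Dict


class LayoutEnd(Enum):
    RECALLED = auto()
    REVOKED = auto()


class ToyClient:
    """Hypothetical client holding one layout and some buffered writes."""

    def __init__(self) -> None:
        self.has_layout = True
        self.dirty: Dict[int, bytes] = {}    # offset -> data not yet on a storage device
        self.flushed: Dict[int, bytes] = {}  # data pushed out through the layout

    def on_recall(self) -> LayoutEnd:
        # Graceful: the client has the opportunity to finish its I/O
        # (write out dirty data, commit, ...) before returning the layout.
        self.flushed.update(self.dirty)
        self.dirty.clear()
        self.has_layout = False
        return LayoutEnd.RECALLED

    def on_revoke(self) -> LayoutEnd:
        # Not graceful: the layout is taken away without the client's help.
        # Dirty data must be dealt with some other way (not modelled here).
        self.has_layout = False
        return LayoutEnd.REVOKED
```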



1.2. Difference Between a Data Server and a Storage Device

The term data server appears eleven times in this document.  Nine of them are either in the definitions of "data server" and "data storage device" or in this section.  That leaves two:  the title of section 2.2 and the corresponding TOC entry.

There is no point in introducing two terms that essentially mean the same thing. Instead of doing that and then backing away from it, it is best to pick one and stick with it.




Again, I am fine with what I have.


2.1. LAYOUTCOMMIT:

There are a couple of issues in the first sentence:
• "tightly coupled system" is not as clear/general as it should be.
• The reference to the semantics of the File Layout Type is inappropriate, given that the reference is to Chapter 12 of RFC5661.

Suggest the following replacement:

When tightly coupled storage devices are used, the metadata server has the responsibility, upon receiving a LAYOUTCOMMIT (see Section 18.42 of [RFC5661]<https://tools.ietf.org/html/rfc5661#section-18.42>), of ensuring that the semantics of pNFS are respected (see Section 12.5.4 of [RFC5661]<https://tools.ietf.org/html/rfc5661#section-12.5.4>). These do not include a requirement that data written to the data storage devices be stable upon completion of the LAYOUTCOMMIT.


It makes sense to start a new paragraph at this point.

Ack and taken

With regard to the rest of the material, I basically agree with Benny, but I think we have to arrange things to make it clearer that we are talking about the loosely coupled case here. I think you need something like the following:

In the case of loosely coupled storage devices, it is the responsibility of the client to make sure the data file is stable before the metadata server begins to query the storage devices about the changes to the file. If any WRITE to a storage device did not result in stable_how equal to FILE_SYNC, a LAYOUTCOMMIT to the metadata server MUST be preceded by a COMMIT to the storage devices written to.
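The ordering rule in the suggested text can be sketched as a small Python function that plans what a client sends. The stable_how values follow stable_how4 in RFC 5661 (the NFSv3 stable_how enum in RFC 1813 uses the same numbering). The function name, its input format, and the string form of the operations are hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple


class StableHow(IntEnum):
    # Same numbering as stable_how4 (RFC 5661) and stable_how (RFC 1813).
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


def plan_layoutcommit(write_replies: List[Tuple[str, StableHow]]) -> List[str]:
    """Return the ordered operations a client would send (hypothetical helper).

    write_replies holds (storage_device, stable_how returned by that WRITE).
    Any device that returned less than FILE_SYNC gets a COMMIT, once,
    and all COMMITs precede the LAYOUTCOMMIT to the metadata server.
    """
    needs_commit: List[str] = []
    for device, how in write_replies:
        if how != StableHow.FILE_SYNC and device not in needs_commit:
            needs_commit.append(device)
    ops = [f"COMMIT {device}" for device in needs_commit]
    ops.append("LAYOUTCOMMIT mds")
    return ops
```

For example, replies of FILE_SYNC from ds1, UNSTABLE and DATA_SYNC from ds2, and DATA_SYNC from ds3 yield COMMITs to ds2 and ds3 followed by the LAYOUTCOMMIT; if every WRITE returned FILE_SYNC, only the LAYOUTCOMMIT is sent.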


The last sentence of the paragraph is not very convincing as to the reason for this requirement.  It should explain why "the LAYOUTCOMMIT might not be synchronized to the last WRITE operation to the storage device".  I assume the worry is about a data storage device reboot immediately after the LAYOUTCOMMIT.
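Assuming the worry is indeed a storage device reboot right after the LAYOUTCOMMIT, a toy Python model shows the failure mode. The device below is hypothetical and deliberately simplified: it treats every WRITE short of FILE_SYNC as sitting in a volatile cache until COMMIT, mirroring the proposed rule rather than any real server.

```python
from typing import Dict, Optional


class ToyStorageDevice:
    """Hypothetical device: non-FILE_SYNC writes stay volatile until COMMIT."""

    def __init__(self) -> None:
        self.stable: Dict[int, bytes] = {}
        self.cache: Dict[int, bytes] = {}

    def write(self, offset: int, data: bytes, file_sync: bool) -> None:
        if file_sync:
            self.stable[offset] = data
        else:
            self.cache[offset] = data

    def commit(self) -> None:
        self.stable.update(self.cache)
        self.cache.clear()

    def reboot(self) -> None:
        self.cache.clear()  # volatile state does not survive a restart

    def read(self, offset: int) -> Optional[bytes]:
        if offset in self.cache:
            return self.cache[offset]
        return self.stable.get(offset)


def after_reboot(commit_first: bool) -> Optional[bytes]:
    """Unstable WRITE, optional COMMIT, LAYOUTCOMMIT, then a device reboot."""
    ds = ToyStorageDevice()
    ds.write(0, b"new data", file_sync=False)
    if commit_first:
        ds.commit()
    # The LAYOUTCOMMIT to the metadata server would be sent here; from now on
    # the metadata server treats the new data as part of the file.
    ds.reboot()
    return ds.read(0)
```

Without the COMMIT, after_reboot(False) returns None: the metadata server has accepted the change but the data is gone. With the COMMIT, after_reboot(True) still returns the written bytes.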



Ack

2.2. Fencing Clients from the Data Server:


If one replaces "Data Server" by "Data Storage Device" in the title, one gets rid of the last two uses of "Data server" in this document. Thus there is no need to discuss the differences/nuances between the two.

There's a problem in the fourth paragraph. Given that the first three paragraphs deal with the loosely coupled case, when you get to the tightly coupled case you need to say something about fencing clients from the data storage device. Instead, you simply explain that the loose coupling fencing facilities are not used, leaving the fencing question up in the air. Saying it's not your job is OK if you explain why that is. For example, one might say:

In the case of tight coupling, such fencing facilities are the responsibility of the control protocol and are not described in detail here. However, implementations of the tight coupling locking model (see Section 2.3) will need a way to prevent access by certain clients to specific files by invalidating the corresponding stateids on the data storage device.
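The capability described in that suggested text can be pictured with a toy Python model: the metadata server tells the storage device which stateids are valid for a file, and fences one client from one file by invalidating its stateid. The class, the allow/fence/io_permitted operations, and the identifiers are all hypothetical; the draft deliberately leaves the control protocol unspecified.

```python
from typing import Dict, Set


class ToyTightlyCoupledDevice:
    """Hypothetical storage device whose valid stateids are managed by the
    metadata server over an unspecified control protocol."""

    def __init__(self) -> None:
        self.valid: Dict[str, Set[str]] = {}  # filehandle -> stateids allowed for I/O

    # Control-protocol side (metadata server -> storage device).
    def allow(self, fh: str, stateid: str) -> None:
        self.valid.setdefault(fh, set()).add(stateid)

    def fence(self, fh: str, stateid: str) -> None:
        # Invalidate one stateid for one file; other stateids are untouched.
        self.valid.get(fh, set()).discard(stateid)

    # Data-access side (client -> storage device).
    def io_permitted(self, fh: str, stateid: str) -> bool:
        return stateid in self.valid.get(fh, set())
```

Because fencing here is per file and per stateid, fencing one client leaves other clients' access to the same file intact.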