[nfsv4] Fwd: Review of draft-ietf-nfsv4-flex-files-08 (part two of three)
David Noveck <davenoveck@gmail.com> Sat, 02 July 2016 10:01 UTC
In-Reply-To: <CADaq8jePBxsJxBwV-KkPdNjGJdBGwDsgxesayOuOF6k=O3u9Gw@mail.gmail.com>
References: <CADaq8jePBxsJxBwV-KkPdNjGJdBGwDsgxesayOuOF6k=O3u9Gw@mail.gmail.com>
From: David Noveck <davenoveck@gmail.com>
Date: Sat, 02 Jul 2016 06:01:39 -0400
Message-ID: <CADaq8jf7EeLd-vhHEhecZ5Q5QkWkRhrXQbRYsjbkbNS+B1pM=A@mail.gmail.com>
To: "nfsv4@ietf.org" <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/iXNw99DbDFvYRnrvRlEzd_gYsPE>
Subject: [nfsv4] Fwd: Review of draft-ietf-nfsv4-flex-files-08 (part two of three)
resending

---------- Forwarded message ----------
From: David Noveck <davenoveck@gmail.com>
Date: Sat, May 21, 2016 at 10:00 AM
Subject: Review of draft-ietf-nfsv4-flex-files-08 (part two of three)
To: Thomas Haynes <thomas.haynes@primarydata.com>, Benny Halevy <bhalevy@gmail.com>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>

*Review Structure*

This email is the second part of a three-part review. Note that the overall comments are contained in the first part of this review. These contain:

- *Background of Review*
- *General Evaluation*
- *Issues Blocking Working Group Last Call*
- *Other Noteworthy Issues*

*Per-section Comments (From Section 2.3 through Section 5.1.1)*

*2.3. State and Locking Models:*

This section consists of two parts:

- The first part describes a locking model which I presume is the locking model that applies in the loose coupling case.
- The second part, the last two paragraphs, describes how certain features of the environment govern which locking model is to be selected.

The problem with this structure is that the second part should be at the start and you would then be in a position to describe each of the locking models. I think the better structure would be to start with what are now the two final paragraphs and then have subsections that describe the two locking models.

There are a number of editorial issues in the last two paragraphs:

- In the last sentence of the last paragraph, "described in [RFC5661]" is wrong since there is no protocol described there.
- Using "NFSv4" to mean "NFSv4.0" is a likely source of confusion.
- In many cases, mention of NFSv3 is missing.

I propose rewriting the current final two paragraphs as follows:

The choice of locking models is governed by the following rules:

- Data storage devices implementing the NFSv3 and NFSv4.0 protocols are always treated as loosely coupled.
- NFSv4.1+ storage devices that do not return the EXCHGID4_FLAG_USE_PNFS_DS flag set to EXCHANGE_ID are indicating that they are to be treated as loosely coupled. As such, they are treated, from the locking viewpoint, in the same way as NFSv4.0 storage devices.
- NFSv4.1+ storage devices that do identify themselves with the EXCHGID4_FLAG_USE_PNFS_DS flag set to EXCHANGE_ID are considered strongly coupled. They will be using a back-end control protocol as provided for in [RFC5661] to implement the global stateid model as defined there.

With regard to the tight coupling case, I presume that the appropriate locking model is that described in Chapter 13 of RFC5661, but I think there should be some discussion of what exactly this means in practice and of how the new/different features of the mapping type interact with the locking model.

Now to go back to the first paragraphs: the second sentence of the second paragraph is wrong and needs to be changed, as it contradicts what is written about stateids in *5.1. ff_layout4*. Based on my discussion with Tom, I am assuming that anonymous stateids will be used to do IO in the loosely coupled case. Once that issue is resolved, there needs to be some discussion of how the fact that all IO will be stateid-anonymous will be dealt with. I am going to be assuming that it will be in this section, rather than in *5.1. ff_layout4*.

With regard to mandatory byte-range locking, we need an explicit statement that this is not (i.e. cannot be) supported with loose coupling.

With regard to mandatory locking due to share reservations, one doesn't have the option of simply not supporting the functionality. The spec will have to clearly explain how it is to be done.
Some likely elements:

- In the case in which each of the clients with a particular file opened has the same IO rights, the MDS has to ensure via layout recalls (and potentially indicating layouts are unavailable) that no client lacking an owner allowed to do a particular form of IO holds layouts that allow that form of IO to be done. (It may already say that, but it probably needs clarification.)
- In the case in which a particular client has multiple owners with different levels of IO rights, the spec either has to ask the pNFS client to do the enforcement itself, or it has to provide that layouts are to be unavailable to this client and require the client to perform the IO via the MDS.

Once that is addressed, we have to face the fundamental problem with this section. It has to do with the stateids that are returned to clients, rather than the ones that appear (or don't) in layouts. From what is written there now, it is hard to determine what is actually intended. A lot of confusion results from the multiple and uncertain meanings of the preposition "against".

In the first sentence, the phrase "against the metadata server" simply indicates that the operations in question are directed to the metadata server. As this paragraph, unlike the following one, applies to both loose and tight coupling, it should stay where it is. I suggest redrafting it as follows:

Clients always perform locking-related operations by interacting with the metadata server. These include operations related to open files (OPEN, OPEN_DOWNGRADE, and CLOSE), byte-range locking (LOCK, LOCKT, and LOCKU), delegation management (DELEGRETURN), and stateid management (TEST_STATEID and FREE_STATEID). Delegation recall is effected by the metadata server sending a callback to the client. In all cases, the stateids that result from executing these operations are returned by the metadata server to the client and the client uses these stateids in subsequent locking-related operations.
The means by which these stateids are maintained and the handling of IO operations differ with the coupling strength in effect for the connection.

The existing second paragraph is not clear and, for a number of reasons, I don't believe that it is a good basis for an eventual subsection describing the loose coupling locking model:

- Although the introductory sentences mention OPEN, LOCK, DELEGATION, the rest of the discussion focuses on opens, leaving it very unclear how byte-range locks and delegations will/should/might be dealt with. I think this is primarily an editorial problem, although there are potential interactions with fundamental technical choices as far as NFSv4.x is concerned.
- When mirroring and/or striping is in effect, doing open "against" the data files will result in multiple stateids.
- In the loose-coupling case, the three NFS protocols are treated as essentially the same, despite their very real differences. This is, in part, an editorial problem, but it appears to me that once the editorial problems are addressed, one could face significant technical issues. See below for details.

At this point I can't figure out the locking models that are actually intended but, as a way of continuing the discussion, I draft some descriptions below of something that I believe is workable in the context. Although I may not be right in my guesses about how this will work, it seems to me that the items that are mentioned have to be addressed somehow to clearly describe a locking model.

Here is what I've come up with for *Section 2.3.1. Loose-coupling Locking Model*:

When locking-related operations are requested, they are primarily dealt with by the metadata server, which generates the appropriate stateids. When an NFSv4 version is used as the data access protocol, the metadata server may make stateid-related requests of the data storage devices.
However, it is not required to do so and the resulting stateids are known only to the metadata server and the data storage device.

Given this basic structure, locking-related operations are handled as follows:

- OPENs are dealt with primarily on the metadata server. Stateids are selected by the metadata server and associated with the client id describing the client's connection to the metadata server. The metadata server may need to interact with the data storage device to locate the file to be opened, but no locking-related functionality need be used on the data storage device. OPEN_DOWNGRADE and CLOSE only require local execution on the metadata server.
- Advisory byte-range locks can be implemented locally on the metadata server. As in the case of OPENs, the stateids associated with byte-range locks are assigned by the metadata server and only used on the metadata server. For reasons explained below, mandatory byte-range locks are not supported when loose coupling is in effect.
- Delegations are assigned by the metadata server, which initiates recalls when conflicting OPENs are processed. No data storage device involvement is required.
- TEST_STATEID and FREE_STATEID are processed locally on the metadata server, without data storage device involvement.

All IO operations to the data storage device are done using the anonymous stateid. As a result, the data storage device has no information about the openowner and lockowner responsible for issuing a particular IO operation. Consequently:

- Mandatory byte-range locking cannot be supported because the data storage device has no way of distinguishing IOs done on behalf of the lock owner from those done by others.
- Enforcement of share reservations is the responsibility of the client. Even though IO is done using the anonymous stateid, the client must ensure, before issuing the IO, that it has a valid stateid associated with the openowner that allows the IO being done.
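As an illustration only (not part of the proposed text), the client-side enforcement described in the last item might look roughly like the Python sketch below. The class and method names (OpenState, LooseCouplingClient, stateid_for_io) are hypothetical and the bookkeeping is an assumption about how a client might track its opens; only the share-access bit values and the all-zeros form of the anonymous stateid are taken from the NFSv4 specifications.

```python
from dataclasses import dataclass, field

# Share-access bits used with OPEN in NFSv4.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2

# The anonymous stateid: seqid and the 12-byte "other" field are all zeros.
ANONYMOUS_STATEID = (0, bytes(12))


@dataclass
class OpenState:
    """Hypothetical client-side record of an open stateid issued by the MDS."""
    stateid: tuple
    share_access: int
    revoked: bool = False


@dataclass
class LooseCouplingClient:
    """Hypothetical bookkeeping, keyed by (openowner, filehandle)."""
    opens: dict = field(default_factory=dict)

    def record_open(self, owner, fh, state):
        self.opens[(owner, fh)] = state

    def stateid_for_io(self, owner, fh, want):
        """Return the stateid to send to the data storage device, or raise."""
        state = self.opens.get((owner, fh))
        if state is None or state.revoked:
            raise PermissionError("no valid open stateid for this openowner")
        if state.share_access & want != want:
            raise PermissionError("open does not allow the requested IO")
        # The check is purely local; the IO itself still goes out
        # under the anonymous stateid.
        return ANONYMOUS_STATEID
```

The point of the sketch is that the open stateid is consulted but never sent to the data storage device, which is why the data storage device cannot do this enforcement itself.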
In the event that a stateid is revoked, the metadata server is responsible for preventing client access, since the metadata server has no way of being sure that the client is aware that the stateid in question has been revoked.

As the client never receives a stateid generated by the data storage device, there is no client lease on the data storage device and no prospect of lease expiration, even when NFSv4 protocols are used to access the data storage device. Clients will have leases on the metadata server, which are subject to expiration. In dealing with lease expiration, the metadata server may need to use fencing to prevent revoked stateids from being relied upon by a client unaware of the fact that they have been revoked.

Here is what I've come up with for *Section 2.3.2. Tight-coupling Locking Model*:

When locking-related operations are requested, they are primarily dealt with by the metadata server, which generates the appropriate stateids. These stateids must be made known to the data storage device using control protocol facilities, the details of which are not discussed in this document.

Given this basic structure, locking-related operations are handled as follows:

- OPENs are dealt with primarily on the metadata server. Stateids are selected by the metadata server and associated with the client id describing the client's connection to the metadata server. The metadata server needs to interact with the data storage device to locate the file to be opened, and to make the data storage device aware of the association between the metadata-server-chosen stateid and the client and openowner that it represents. OPEN_DOWNGRADE and CLOSE are executed initially on the metadata server but the state change made must be propagated to the data storage device.
- Advisory byte-range locks can be implemented locally on the metadata server.
  As in the case of OPENs, the stateids associated with byte-range locks are assigned by the metadata server and are available for use on the metadata server. Because IO operations are allowed to present lock stateids, the metadata server needs the ability to make the data storage device aware of the association between the metadata-server-chosen stateid and the corresponding open stateid it is associated with.
- Mandatory byte-range locks can be supported when both the metadata server and the data storage devices have the appropriate support. As in the case of advisory byte-range locks, these are assigned by the metadata server and are available for use on the metadata server. To enable mandatory lock enforcement on the data storage device, the metadata server needs the ability to make the data storage device aware of the association between the metadata-server-chosen stateid and the client, openowner, and lock (i.e., lockowner, byte-range, lock-type) that it represents. Because IO operations are allowed to present lock stateids, this information needs to be propagated to all data storage devices to which IO might be directed rather than only to the data storage devices that contain the locked region.
- Delegations are assigned by the metadata server, which initiates recalls when conflicting OPENs are processed. Because IO operations are allowed to present delegation stateids, the metadata server requires the ability to make the data storage device aware of the association between the metadata-server-chosen stateid and the filehandle and delegation type it represents, and to break such an association.
- TEST_STATEID is processed locally on the metadata server, without data storage device involvement.
- FREE_STATEID is processed on the metadata server but the metadata server requires the ability to propagate the request to the corresponding data storage devices.
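As an illustration only (not part of the proposed text), the selection rules and the per-operation handling in the two proposed subsections can be condensed into a small Python sketch. The function names, the protocol labels, and the operation keys are hypothetical; the flag value is the one RFC 5661 assigns to EXCHGID4_FLAG_USE_PNFS_DS. The table records only what the lists above state, so operations they do not mention (LOCKT and LOCKU, for example) are deliberately absent.

```python
from enum import Enum

# Value assigned to this EXCHANGE_ID flag in RFC 5661.
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000


class Coupling(Enum):
    LOOSE = "loose"
    TIGHT = "tight"


def coupling_for(protocol, exchgid_flags=0):
    """Pick the locking model for a storage device.

    `protocol` is a hypothetical label such as "NFSv3", "NFSv4.0" or "NFSv4.1".
    """
    if protocol in ("NFSv3", "NFSv4.0"):
        return Coupling.LOOSE          # always loosely coupled
    if exchgid_flags & EXCHGID4_FLAG_USE_PNFS_DS:
        return Coupling.TIGHT          # NFSv4.1+ device identifying itself as a DS
    return Coupling.LOOSE              # NFSv4.1+ device without the flag


# Under tight coupling: must the metadata server make the resulting state
# known to the data storage device? (Summary of the list above.)
PROPAGATE_WHEN_TIGHT = {
    "OPEN": True,
    "OPEN_DOWNGRADE": True,
    "CLOSE": True,
    "LOCK": True,           # IO may present lock stateids
    "DELEGATION": True,     # IO may present delegation stateids
    "TEST_STATEID": False,  # handled locally on the metadata server
    "FREE_STATEID": True,
}


def needs_ds_propagation(op, coupling):
    """Under loose coupling nothing has to reach the data storage device."""
    if coupling is Coupling.LOOSE:
        return False
    return PROPAGATE_WHEN_TIGHT[op]
```

For example, `needs_ds_propagation("FREE_STATEID", coupling_for("NFSv4.1", EXCHGID4_FLAG_USE_PNFS_DS))` evaluates to True, while the same call for an NFSv3 device evaluates to False.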
Because the client will possess and use stateids valid on the data storage device, there will be a client lease on the data storage device and the possibility of lease expiration does exist. The best approach for the data storage device is to retain these locks as a courtesy. However, if it does not do so, control protocol facilities need to provide the means to synchronize lock state between the metadata server and data storage device.

Clients will also have leases on the metadata server, which are subject to expiration. In dealing with lease expiration, the metadata server would be expected to use control protocol facilities enabling it to invalidate revoked stateids on the data storage device. In the event the data storage device is not responsive, the metadata server may need to use fencing to prevent revoked stateids from being acted upon by the data storage device.

As a result of describing the tight coupling locking model in parallel with the loose coupling locking model, I've come to the conclusion that the phrase "global stateid model", while a useful and compact summary, has made the function of the control protocol seem more difficult/mysterious than it needs to be. Since the goal is to make it clear what is needed to implement flex-files, including the tight-coupling option, I think it would be helpful if the flex-files spec retained the additional detail that appears above.

Now I'm getting beyond the scope of a review of the flex-files spec, but I'd like to note that the flex-files layout work has already made it clear that a large part of control protocol functionality is already present in the NFSv4 base protocol. Perhaps an NFSv4.x extension could be defined to provide the remainder and be usable for both the RFC5661-specified files layout and the flex-files layout with tight coupling. Perhaps this could be discussed in Berlin?

*4.1. ff_device_addr4:*

In the third non-CODE paragraph, I suggest the following changes, primarily to reflect the fact that pNFS client use of layouts is never mandatory:

- In the penultimate sentence, suggest replacing "MUST access the storage device" by "MAY ONLY access the storage device directly"
- In the last sentence, suggest replacing "MUST access the storage device using NFSv4" by "MAY ONLY access the storage device directly using the corresponding minor version of NFSv4"

Tom believes that the two suggestions above imply that the client can use an unsupported protocol version. I disagree. This issue needs to be resolved.

*5. Flexible File Layout type*

"type" needs to be capitalized in the title. This is a new issue introduced by a change in -08.

*5.1. ff_layout4:*

There are two remaining issues in this section that were in -06:

- The contradiction between this section and *Section 2.3. State and Locking Models.*
- The fact that there is no use for ffds_stateid, since the anonymous stateid is used in the loose-coupling case and a globally valid one is used in the tight coupling case.

In addition, the new FF_FLAGS_NO_IO_THROUGH_MDS in -07 raises some issues that need to be addressed:

- First of all, it isn't clear that "SHOULD" is actually intended/appropriate. According to RFC2119, this means "that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course". In particular, the text does not give one a basis to understand the implications of choosing to do IO using the MDS when this flag is present. Perhaps "should" is more appropriate here?
- The statement "even if a storage device is partitioned from the client, the client SHOULD not try to proxy the IO through the metadata server" raises additional issues.
  I assume that partitioning might happen after the layout in question is recalled and is part of the revocation process for the layout in question. Thus this flag seems to be giving directions regarding metadata-directed IO after the layout in question no longer applies. ????
- Given that base NFSv4 IO does not require use of layouts, it isn't clear that the client would actually use layouts and, even if it did, it would not require one for areas to which it is doing IO directed at the metadata server. Because of this, a client might not see the RECOMMENDATION/recommendation before doing the IO being warned against.

Although how this might be dealt with is going to depend on the resolution of the should-vs.-SHOULD question mentioned above, I'm concerned that someone contributing to this specification, not necessarily one of the authors, is assuming a level of metadata server direction with regard to client IO that is inconsistent with the pNFS model. Within pNFS, a client's ability to do IO to the metadata server is defined by the base NFSv4.1 semantics, while the layout type may impose, using layouts, any restrictions it wants for IO through the data storage devices.

*5.1.1. Error codes from LAYOUTGET*

I'm doubtful about the use of "SHOULD" in the cases for NFS4ERR_{LAYOUTTRYLATER, DELAY}. It seems to me that the author is telling me that, when the client has a layout, it is either desirable or undesirable for me to continue to use it. But there is no basis given for considering this an interoperability issue, or for letting the reader understand the consequences of taking the choice considered undesirable. I think these "SHOULD"s should be "should"s.