Re: [nfsv4] Review of draft-ietf-nfsv4-flex-files-08 (part two of three)

David Noveck <davenoveck@gmail.com> Tue, 18 July 2017 05:53 UTC

In-Reply-To: <D9A4169C-8E61-4F84-AF42-B9D9C76596F8@primarydata.com>
References: <CADaq8jePBxsJxBwV-KkPdNjGJdBGwDsgxesayOuOF6k=O3u9Gw@mail.gmail.com> <D9A4169C-8E61-4F84-AF42-B9D9C76596F8@primarydata.com>
From: David Noveck <davenoveck@gmail.com>
Date: Tue, 18 Jul 2017 01:53:50 -0400
Message-ID: <CADaq8je3JGrr2ShNwXkO3N-jE5Zf7=QKnn687-qAd6_T4C9LXQ@mail.gmail.com>
To: Thomas Haynes <loghyr@primarydata.com>
Cc: Benny Halevy <bhalevy@gmail.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/GUE48ZA-MoIxAm7U-qVfY2FQ6Tg>

> To me, SHOULD means I’d really love to make this a MUST, but there exists
> enough prior art to prevent that.

That clarifies things.

This is not what "SHOULD" means according to RFC2119.  The idea there is
that someone not obeying the "SHOULD" has to do so knowingly, aware of the
consequences of doing that.

Here, the prior art that motivates the choice of "SHOULD" is unaware of why
it is not in line with the "SHOULD", which didn't exist when it was put
into practice.

When RFC2119 was written, the idea was that RFCs would define new
protocols, rather than extend existing ones.

Maybe we should discuss this issue in person, rather than by email.

On Mon, Jul 17, 2017 at 5:08 PM, Thomas Haynes <loghyr@primarydata.com>
wrote:

>
> On May 21, 2016, at 4:00 PM, David Noveck <davenoveck@gmail.com> wrote:
>
> *Review Structure*
>
> This email is the second part of a three-part review.
>
> Note that the overall comments are contained in the first part of this
> review.  These contain:
>
>    - *Background of Review*
>    - *General Evaluation*
>    - *Issues Blocking Working Group Last Call*
>    - *Other Noteworthy Issues*
>
>
> *Per-section Comments (From Section 2.3 through Section 5.1.1) *
>
>
> *2.3. State and Locking Models:*
>
>
> This section consists of two parts:
>
>
>    - The first part describes a locking model which I presume is the
>    locking model that applies in the loose coupling case.
>    - The second part, the last two paragraphs, describes how certain
>    features of the environment govern which locking model is to be selected.
>
> The problem with this structure is that the second part should be at the
> start and you would then be in a position to describe each of the locking
> models.  I think the better structure would be to start with what are now
> the two final paragraphs and then have subsections that describe the two
> locking models.
>
> There are a number of editorial issues in the last two paragraphs:
>
>    - In the last sentence of the last paragraph, "described in [RFC5661]"
>    is wrong since there is no protocol described there.
>    - Using "NFSv4" to mean "NFSv4.0" is a likely source of confusion.
>    - In many cases, mention of NFSv3 is missing.
>
>
> I propose rewriting the current final two paragraphs as follows:
>
>
> The choice of locking models is governed by the following rules:
>
>
>
>    - Data storage devices implementing the NFSv3 and NFSv4.0 protocols
>    are always treated as loosely coupled.
>    - NFSv4.1+ storage devices that do not return the EXCHGID4_FLAG_USE_PNFS_DS
>    flag set to EXCHANGE_ID are indicating that they are to be treated as
>    loosely coupled. As such, they are treated, from the locking viewpoint, in the
>    same way as NFSv4.0 storage devices.
>    - NFSv4.1+ storage devices that do identify themselves with the
>    EXCHGID4_FLAG_USE_PNFS_DS flag set to EXCHANGE_ID are considered strongly
>    coupled. They will be using a back-end control protocol as provided for in
>    [RFC5661] to implement the global stateid model as defined there.
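
The three rules above amount to a small decision function. The Python sketch below is illustrative only: the `Coupling` enum, the `select_coupling` function, and its parameters are hypothetical names that appear in no draft. The flag value is the one RFC 5661 assigns to EXCHGID4_FLAG_USE_PNFS_DS.

```python
from enum import Enum

# Value assigned to this flag in RFC 5661 (EXCHANGE_ID result flags).
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000


class Coupling(Enum):
    """Hypothetical labels for the two locking models."""
    LOOSE = "loose"
    TIGHT = "tight"


def select_coupling(nfs_version, minor_version=0, eir_flags=0):
    """Return the coupling implied by the proposed rules.

    nfs_version and minor_version describe the data access protocol.
    eir_flags are the flags the storage device returned from EXCHANGE_ID;
    they are ignored for NFSv3 and NFSv4.0, which have no EXCHANGE_ID.
    """
    # Rule 1: NFSv3 and NFSv4.0 storage devices are always loosely coupled.
    if nfs_version == 3 or (nfs_version == 4 and minor_version == 0):
        return Coupling.LOOSE
    if nfs_version == 4 and minor_version >= 1:
        # Rule 3: the flag is set, so the device identifies itself as a
        # pNFS data server and the global stateid model applies.
        if eir_flags & EXCHGID4_FLAG_USE_PNFS_DS:
            return Coupling.TIGHT
        # Rule 2: flag absent, treat the device like an NFSv4.0 device.
        return Coupling.LOOSE
    raise ValueError("unsupported data access protocol")
```

For example, `select_coupling(4, 1, EXCHGID4_FLAG_USE_PNFS_DS)` yields `Coupling.TIGHT`, while the same call with `eir_flags=0` yields `Coupling.LOOSE`.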
>
> With regard to the tight coupling case, I presume that the appropriate
> locking model is that described in Chapter 13 of RFC5661 but think there
> should be some discussion of what exactly this means in practice and of
> how the new/different features of the mapping type interact with the
> locking model.
>
>
> Now to go back to the first paragraphs, the second sentence of the second
> paragraph is wrong and needs to be changed as it contradicts what is
> written about stateids in *5.1.  ff_layout4*.  Based on my discussion
> with Tom, I am assuming that anonymous stateids will be used to do IO in
> the loosely coupled case.
>
>
>
> I don’t see anyone doing an NFSv4.0 storage device.
>
> And due to the issue with ffm_stateid, either the anonymous stateid is
> used or we restrict the size of ffds_fh_vers to be 1.
>
>
> Once that issue is resolved, there needs to be some discussion of how the
> fact that all IO will be stateid-anonymous will be dealt with.  I am going
> to be assuming that it will be in this section, rather than in *5.1.
>  ff_layout4*.
>
>
> With regard to mandatory byte-range locking we need an explicit statement
> that this is not (i.e. cannot be) supported with loose coupling.
>
>
>
>
> Agreed
>
> With regard to mandatory locking due to share reservations one doesn't
> have the option of simply not supporting the functionality. The spec will
> have to clearly explain how it is to be done. Some likely elements:
>
>
>    - In the case in which each of the clients with a particular file
>    opened has the same IO rights, the MDS has to ensure via layout recalls
>    (and potentially indicating that layouts are unavailable) that no client
>    which has no owner allowed to do a particular form of IO has any layouts
>    that allow that form of IO to be done. (It may already say that but it
>    probably needs clarification.)
>    - In the case in which a particular client has multiple owners with
>    different levels of IO rights, the spec either has to ask the pNFS client to
>    do the enforcement itself, or it has to provide that layouts are to be
>    unavailable to this client and require the client to perform the IO via the
>    MDS.
>
> Once that is addressed, we have to face the fundamental problem with this
> section. It has to do with the stateids that are returned to clients,
> rather than the ones that appear (or don't) in layouts.
>
> From what is written there now, it is hard to determine what is actually
> intended.  A lot of confusion results from the multiple and uncertain
> meanings of the preposition "against".
>
> In the first sentence, the phrase "against the metadata server", simply
> indicates that the operations in question are directed to the metadata
> server.  As this paragraph, unlike the following one, applies to both loose
> and tight coupling, it should stay where it is.  I suggest redrafting it as
> follows:
>
>
> Clients always perform locking-related operations by interacting with the
> metadata server.  These include operations related to open files (OPEN,
> OPEN_DOWNGRADE, and CLOSE), byte-range locking (LOCK, LOCKT, and LOCKU),
> delegation management (DELEGRETURN), and stateid management (TEST_STATEID
> and FREE_STATEID).  Delegation recall is effected by the metadata server
> sending a callback to the client.
>
>
> In all cases, the stateids that result from executing these operations are
> returned by the metadata server to the client and the client uses these
> stateids in subsequent locking-related operations.  The means by which
> these stateids are maintained and the handling of IO operations differ with
> the coupling strength in effect for the connection.
>
> The existing second paragraph is not clear but, for a number of reasons, I
> don't believe that it is a good basis for an eventual subsection describing
> the loose coupling locking model:
>
>
>    - Although the introductory sentences mention OPEN, LOCK, DELEGATION,
>    the rest of the discussion focuses on opens, leaving it very unclear how
>    byte-range locks and delegations will/should/might be dealt with. I think
>    this is primarily an editorial problem although there are potential
>    interactions with choices regarding fundamental technical choices as far as
>    NFSv4.x.
>    - When mirroring and/or striping is in effect, doing open "against"
>    the data files will result in multiple stateids.
>    - In the loose-coupling case, the three NFS protocols are treated as
>    essentially the same, despite their very real differences. This is, in
>    part, an editorial problem, but it appears to me that once the editorial
>    problems are addressed, one could face significant technical issues. See
>    below for details.
>
> At this point I can't figure out the locking models that are actually
> intended but, as a way of continuing the discussion, I draft some
> descriptions below of something that I believe is workable in the context.
> Although I may not be right in my guesses about how this will work, it
> seems to me that the items that are mentioned have to be addressed somehow
> to clearly describe a locking model.
>
>
> Here is what I've come up with for *Section 2.3.1.  Loose-coupling
> Locking Model*:
>
>
> When locking-related operations are requested, they are primarily dealt
> with by the metadata server, who generates the appropriate stateids.  When
> an NFSv4 version is used as the data access protocol, the metadata server
> may make stateid-related requests of the data storage devices.  However, it
> is not required to do so and the resulting stateids are known only to the
> metadata server and the data storage device.
>
>
> Given this basic structure, locking-related operations are handled as
> follows:
>
>
>
>    - OPENs are dealt with primarily on the metadata server.  Stateids are
>    selected by the metadata server and associated with the client id
>    describing the client's connection to the metadata server.  The metadata
>    server may need to interact with the data storage device to locate the file
>    to be opened, but no locking-related functionality need be used on the data
>    storage device.
>
>
> OPEN_DOWNGRADE and CLOSE only require local execution on the metadata
> server.
>
>
>    - Advisory byte-range locks can be implemented locally on the metadata
>    server.  As in the case of OPENs, the stateids associated with byte-range
>    locks are assigned by the metadata server and only used on the metadata
>    server.
>
> For reasons explained below, mandatory byte-range locks are not supported
> when loose coupling is in effect.
>
>
>    - Delegations are assigned by the metadata server who initiates
>    recalls when conflicting OPENs are processed.  No data storage device
>    involvement is required.
>    - TEST_STATEID and FREE_STATEID are processed locally on the metadata
>    server, without data storage device involvement.
>
> All IO operations to the data storage device are done using the anonymous
> stateid.  As a result, the data storage device has no information about the
> openowner and lockowner responsible for issuing a particular IO operation.
> As a result,
>
>
>
>    - Mandatory byte-range locking cannot be supported because the data
>    storage device has no way of distinguishing IOs done on behalf of the lock
>    owner from those done by others.
>    - Enforcement of share reservations is the responsibility of the
>    client.   Even though IO is done using the anonymous stateid, the client
>    must ensure that it has a valid stateid associated with the openowner, that
>    allows the IO being done before issuing the IO.
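
The client-side enforcement described in the last bullet can be sketched as a check made before each IO is sent to the data storage device. This is a minimal illustration under stated assumptions: `OpenState` and `stateid_for_ds_io` are hypothetical names, and the way a client learns that a stateid is invalid is left out. The share-access bit values and the all-zero form of the anonymous stateid follow the NFSv4.1 protocol definition.

```python
from dataclasses import dataclass

# Share-access bits from the NFSv4 protocol definition.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2

# The anonymous stateid: seqid 0 and an all-zero 12-byte "other" field.
ANONYMOUS_STATEID = (0, bytes(12))


@dataclass
class OpenState:
    """Hypothetical client-side record of an open granted by the MDS."""
    share_access: int     # OPEN4_SHARE_ACCESS_* bits granted by the MDS
    valid: bool = True    # False once the client learns of revocation


def stateid_for_ds_io(open_state, is_write):
    """Return the stateid to send to the data storage device.

    The data storage device only ever sees the anonymous stateid, so the
    share reservation has to be checked here, against the open stateid
    the client holds from the metadata server.
    """
    needed = OPEN4_SHARE_ACCESS_WRITE if is_write else OPEN4_SHARE_ACCESS_READ
    if not open_state.valid:
        raise PermissionError("open stateid is no longer valid")
    if not (open_state.share_access & needed):
        raise PermissionError("open does not permit this form of IO")
    return ANONYMOUS_STATEID
```

A read-only open would pass the check for a READ and raise `PermissionError` for a WRITE, even though both would carry the same anonymous stateid on the wire.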
>
>
> In the event that a stateid is revoked, the metadata server is responsible
> for preventing client access, since the metadata server has no way of being
> sure that the client is aware that the stateid in question has been revoked.
>
>
> As the client never receives a stateid generated by the data storage
> device, there is no client lease on the data storage device and no prospect
> of lease expiration, even when NFSv4 protocols are used to access the data
> storage device.  Clients will have leases on the metadata server, which are
> subject to expiration.  In dealing with lease expiration, the metadata
> server may need to use fencing to prevent revoked stateids from being relied
> upon by a client unaware of the fact that they have been revoked.
>
>
>
>
> Adapted
>
> Here is what I've come up with for *Section 2.3.2.  Tight-coupling
> Locking Model*:
>
>
> When locking-related operations are requested, they are primarily dealt
> with by the metadata server, who generates the appropriate stateids.  These
> stateids must be made known to the data storage device using control
> protocol facilities, the details of which are not discussed in this
> document.
>
>
> Given this basic structure, locking-related operations are handled as
> follows:
>
>
>
>    - OPENs are dealt with primarily on the metadata server.  Stateids are
>    selected by the metadata server and associated with the client id
>    describing the client's connection to the metadata server.  The metadata
>    server needs to interact with the data storage device to locate the file to
>    be opened, and to make the data storage device aware of the association
>    between the metadata-server-chosen stateid and the client and openowner that
>    it represents.
>
>
> OPEN_DOWNGRADE and CLOSE are executed initially on the metadata server but
> the state change made must be propagated to the data storage device.
>
>
>    - Advisory byte-range locks can be implemented locally on the metadata
>    server.  As in the case of OPENs, the stateids associated with byte-range
>    locks are assigned by the metadata server and are available for use on the
>    metadata server.  Because IO operations are allowed to present lock
>    stateids, the metadata server needs the ability to make the data storage
>    device aware of the association between the metadata-server-chosen stateid
>    and the corresponding open stateid it is associated with.
>
>
>    - Mandatory byte-range locks can be supported when both the metadata
>    server and the data storage devices have the appropriate support.  As in the
>    case of advisory byte-range locks, these are assigned by the metadata
>    server and are available for use on the metadata server.  To enable
>    mandatory lock enforcement on the data storage device, the metadata server
>    needs the ability to make the data storage device aware of the association
>    between the metadata-server-chosen stateid and the client, openowner, and
>    lock (i.e., lockowner, byte-range, lock-type) that it represents.  Because
>    IO operations are allowed to present lock stateids, this information needs
>    to be propagated to all data storage devices to which IO might be directed
>    rather than only to the data storage devices that contain the locked region.
>
>
>    - Delegations are assigned by the metadata server who initiates
>    recalls when conflicting OPENs are processed.  Because IO operations are
>    allowed to present delegation stateids, the metadata server requires the
>    ability to make the data storage device aware of the association between
>    the metadata-server-chosen stateid and the filehandle and delegation type
>    it represents, and to break such an association.
>    - TEST_STATEID is processed locally on the metadata server, without
>    data storage device involvement.
>    - FREE_STATEID is processed on the metadata server but the metadata
>    server requires the ability to propagate the request to the corresponding
>    data storage devices.
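
One way to read the tight-coupling list above, side by side with the loose-coupling one, is as a table of which state changes the metadata server must push to the data storage device. The sketch below is only a summary aid: the names `TIGHT_PROPAGATED`, `MDS_LOCAL_ONLY`, and `must_propagate_to_ds` are hypothetical, "DELEGATION" stands for both granting and breaking a delegation association, and the control protocol that would carry the propagation is not modeled.

```python
# State changes the metadata server must make known to the data storage
# device under tight coupling, as enumerated in the list above.
TIGHT_PROPAGATED = frozenset(
    {"OPEN", "OPEN_DOWNGRADE", "CLOSE", "LOCK", "DELEGATION", "FREE_STATEID"}
)

# Handled on the metadata server alone in both models.
MDS_LOCAL_ONLY = frozenset({"TEST_STATEID"})


def must_propagate_to_ds(operation, tight_coupling):
    """Return True if the MDS must inform the data storage device.

    Under loose coupling nothing has to be propagated: all IO to the data
    storage device uses the anonymous stateid, so the device never needs
    to know about stateids chosen by the metadata server.
    """
    if operation not in TIGHT_PROPAGATED | MDS_LOCAL_ONLY:
        raise ValueError(f"operation not covered by this summary: {operation}")
    return tight_coupling and operation in TIGHT_PROPAGATED
```

The table deliberately omits mandatory byte-range locks as a separate entry: under tight coupling they propagate like other LOCK state, and under loose coupling they are not supported at all.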
>
> Because the client will possess and use stateids valid on the data storage
> device, there will be a client lease on the data storage device and the
> possibility of lease expiration does exist.  The best approach for the data
> storage device is to retain these locks as a courtesy.  However, if it does
> not do so, control protocol facilities need to provide the means to
> synchronize lock state between the metadata server and data storage device.
>
> Clients will also have leases on the metadata server, which are subject to
> expiration.  In dealing with lease expiration, the metadata server would be
> expected to use control protocol facilities enabling it to invalidate
> revoked stateids on the data storage device.  In the event the data storage
> device is not responsive, the metadata server may need to use fencing to
> prevent revoked stateids from being acted upon by the data storage device.
>
>
> The last sentence does not make sense. If the storage device is not
> responsive, then it is not going to be able to react to a revoked stateid.
> Unless you mean that the metadata server changes the stateid that the
> client has, then it subsequently uses that stateid in communicating to the
> storage device. In that case, though, this would not be fencing as it is
> not the client being denied access, it is the storage device.
>
> I’m going to read this as “In event the client is not responsive, …"
>
>
> As a result of describing the tight coupling locking model in parallel
> with the loose coupling locking model, I've come to the conclusion that
> the phrase "global stateid model", while a useful and compact summary,
> has made the function of the control protocol seem more
> difficult/mysterious than it needs to be.  Since the goal is to make it
> clear what is needed to implement flex-files, including the tight-coupling
> option, I think it would be helpful if the flex-files spec retained the
> additional detail that appears above.
>
> Now I'm getting beyond the scope of a review of the flex-files spec but
> I'd like to note that the flex-files layout work has already made it clear
> that a large part of control protocol functionality is already present in
> the NFSv4 base protocol. Perhaps an NFSv4.x extension could be defined to
> provide the remainder and be usable for both the RFC5661-specified files
> layout and the flex-files layout with tight coupling.  Perhaps this could
> be discussed in Berlin?
>
>
> *4.1.  ff_device_addr4:*
>
>
> In the third non-CODE paragraph, I suggest the following changes, primarily
> to reflect the fact that pNFS client use of layouts is never mandatory:
>
>
> Either the client accesses the file through non-pNFS access to the metadata
> server or it uses pNFS to access the file through the storage device. Since
> we are specifying the use of ff_device_addr4, the client has already made
> the decision to use pNFS and hence MUST is correct versus MAY ONLY.
>
> · In the penultimate sentence, suggest replacing "MUST access the storage
> device" by "MAY ONLY access the storage device directly"
>
> · In the last sentence, suggest replacing "MUST access the storage device
> using NFSv4" by "MAY ONLY access the storage device directly using the
> corresponding minor version of NFSv4"
>
>
> Tom believes that the two suggestions above imply that the client can use
> an unsupported protocol version.
>
>
> Where do I state that belief? I am quite firm in stating only specific
> protocol versions can be used.
>
>   I disagree.  This issue needs to be resolved.
>
>
> There is no issue.
>
> *5.  Flexible File Layout type*
>
> "type" needs to be capitalized in the title.  This is a new issue
> introduced by a change in -08.
>
>
> *5.1. ff_layout4:*
>
>
> There are two remaining issues in this section that were in the -06.
>
>
>    - The contradiction between this section and *Section 2.3.  State and
>    Locking Models.*
>    - The fact that there is no use for ffds_stateid, since the anonymous
>    stateid is used in the loose-coupling case and a globally valid one is used
>    in the tight coupling case.
>
>
> In addition, the new FF_FLAGS_NO_IO_THROUGH_MDS in -07 raises some issues
> that need to be addressed:
>
>
>    - First of all, it isn't clear that "SHOULD" is actually
>    intended/appropriate. According to RFC2119, this means "that there may
>    exist valid reasons in particular circumstances to ignore a particular
>    item, but the full implications must be understood and carefully weighed
>    before choosing a different course". In particular, the text does not give
>    one a basis to understand the implications of choosing to do IO using
>    the MDS, when this flag is present.  Perhaps "should" is more appropriate
>    here?
>
>
> I believe the definition of SHOULD fits the use here. The implementation
> of the metadata server has decided for some reason of its own that it wants
> the client to route data requests directly to the storage device and unless
> the client implementation is aware of those reasons, it SHOULD really,
> really avoid sending it to the metadata server.
>
> It isn’t a MUST because it is a hint.
>
>
>
>    - The statement "even if a storage device is partitioned from the
>    client, the client SHOULD not try to proxy the IO through the metadata
>    server" raises additional issues. I assume that partitioning might happen
>    after the layout in question is recalled and is part of the revocation
>    process for the layout in question. Thus this flag seems to be giving
>    directions regarding metadata-directed IO after the layout in question no
>    longer applies. ????
>
>
> partitioned in the sense of “network partition” - i.e., happening before
> the recall.
>
> Made this a bit clearer in draft 10.
>
>
>
>    - Given that base NFSv4 IO does not require use of layouts, it isn't
>    clear that the client would actually use layouts and, even if it did, it
>    would not require one for areas to which it is doing IO directed at the
>    metadata server.  Because of this, a client might not see the
>    RECOMMENDATION/recommendation before doing the IO being warned against.
>
>
> I don’t want to point out everywhere that pNFS need not be used. By asking
> for a LAYOUT in the first place, the client is doing pNFS. Yes, it may
> decide against using pNFS once it sees that the metadata server is
> advocating FF_FLAGS_NO_IO_THRU_MDS, but the use of non-pNFS here is not
> interesting.
>
>
> Although how this might be dealt with is going to depend on the resolution
> of the should-vs.-SHOULD question mentioned above, I'm concerned that
> someone contributing to this specification, not necessarily one of the
> authors, is assuming a level of metadata server direction with regard to
> client IO that is inconsistent with the pNFS model.  Within pNFS, a
> client's ability to do IO to the metadata server is defined by the base
> NFSv4.1 semantics,  while the layout type may impose, using layouts, any
> restrictions it wants for IO through the data storage devices.
>
>
>
> I don’t follow how this differs from what I have presented. The text
> states that the flex files layout type will act a certain way, it makes no
> requirements on the more general pNFS model.
>
> *5.1.1.  Error codes from LAYOUTGET*
>
> I'm doubtful about the use of "SHOULD" in the cases for
> NFS4ERR_{LAYOUTTRYLATER, DELAY}.  It seems to me that the author is telling
> me that, when the client has a layout, it is either desirable or
> undesirable for me to continue to use it.  But there is no basis given for
> considering this an interoperability issue, or letting the reader
> understand the consequences of taking the choice considered undesirable.  I
> think these "SHOULD"s should be "should"s.
>
>
> We seem to disagree upon SHOULD/should.
>
> To me, SHOULD means I’d really love to make this a MUST, but there exists
> enough prior art to prevent that.
>
> Okay, in thinking about it, the use of FF_FLAGS_NO_IO_THROUGH_MDS is a
> SHOULD and here it is a should. I.e., the new flag is to allow for the
> server to define specific behavior for flex files, but these
> are “interpretations”.
>
>
>
