[nfsv4] Fwd: Review of draft-ietf-nfsv4-flex-files-08 (part two of three)

David Noveck <davenoveck@gmail.com> Sat, 02 July 2016 10:01 UTC

MIME-Version: 1.0
In-Reply-To: <CADaq8jePBxsJxBwV-KkPdNjGJdBGwDsgxesayOuOF6k=O3u9Gw@mail.gmail.com>
References: <CADaq8jePBxsJxBwV-KkPdNjGJdBGwDsgxesayOuOF6k=O3u9Gw@mail.gmail.com>
From: David Noveck <davenoveck@gmail.com>
Date: Sat, 02 Jul 2016 06:01:39 -0400
Message-ID: <CADaq8jf7EeLd-vhHEhecZ5Q5QkWkRhrXQbRYsjbkbNS+B1pM=A@mail.gmail.com>
To: "nfsv4@ietf.org" <nfsv4@ietf.org>
Content-Type: multipart/alternative; boundary="001a1141ca327b67550536a430e8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/iXNw99DbDFvYRnrvRlEzd_gYsPE>
Precedence: list

resending

---------- Forwarded message ----------
From: David Noveck <davenoveck@gmail.com>
Date: Sat, May 21, 2016 at 10:00 AM
Subject: Review of draft-ietf-nfsv4-flex-files-08 (part two of three)
To: Thomas Haynes <thomas.haynes@primarydata.com>, Benny Halevy <bhalevy@gmail.com>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>

*Review Structure*

This email is the second part of a three-part review.

Note that the overall comments are contained in the first part of this
review.  These contain:

   - *Background of Review*
   - *General Evaluation*
   - *Issues Blocking Working Group Last Call*
   - *Other Noteworthy Issues*

*Per-section Comments (From Section 2.3 through Section 5.1.1)*

*2.3. State and Locking Models:*

This section consists of two parts:

   - The first part describes a locking model which I presume is the
   locking model that applies in the loose coupling case.
   - The second part, the last two paragraphs, describes how certain
   features of the environment govern which locking model is to be selected.

The problem with this structure is that the second part should be at the
start and you would then be in a position to describe each of the locking
models.  I think the better structure would be to start with what are now
the two final paragraphs and then have subsections that describe the two
locking models.

There are a number of editorial issues in the last two paragraphs:

   - In the last sentence of the last paragraph, "described in [RFC5661]"
   is wrong since there is no protocol described there.
   - Using "NFSv4" to mean "NFSv4.0" is a likely source of confusion.
   - In many cases, mention of NFSv3 is missing.

I propose rewriting the current final two paragraphs as follows:

The choice of locking models is governed by the following rules:

   - Data storage devices implementing the NFSv3 and NFSv4.0 protocols are
   always treated as loosely coupled.
   - NFSv4.1+ storage devices that do not return the EXCHGID4_FLAG_USE_PNFS_DS
   flag set to EXCHANGE_ID are indicating that they are to be treated as
   loosely coupled. As such, they are treated, from the locking viewpoint,
   in the same way as NFSv4.0 storage devices.
   - NFSv4.1+ storage devices that do identify themselves with the
   EXCHGID4_FLAG_USE_PNFS_DS flag set to EXCHANGE_ID are considered tightly
   coupled. They will be using a back-end control protocol as provided for in
   [RFC5661] to implement the global stateid model as defined there.
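
As an illustration only, the three rules above might be sketched as follows. The function name and the protocol labels are hypothetical and appear in neither the draft nor RFC5661; only the flag value is taken from the EXCHANGE_ID definition in RFC5661.

```python
# Flag value as defined for EXCHANGE_ID results in RFC5661.
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000


def coupling_model(protocol: str, exchgid_flags: int = 0) -> str:
    """Return "loose" or "tight" for a storage device.

    `protocol` is a hypothetical label: "NFSv3", "NFSv4.0", or "NFSv4.1+".
    `exchgid_flags` is the flag word the device returned for EXCHANGE_ID;
    it is ignored for NFSv3 and NFSv4.0, which have no EXCHANGE_ID.
    """
    if protocol in ("NFSv3", "NFSv4.0"):
        # Rule 1: these protocols are always treated as loosely coupled.
        return "loose"
    if protocol == "NFSv4.1+":
        # Rules 2 and 3: the DS flag decides the coupling.
        if exchgid_flags & EXCHGID4_FLAG_USE_PNFS_DS:
            return "tight"
        return "loose"
    raise ValueError(f"unknown protocol label: {protocol!r}")
```
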

With regard to the tight coupling case, I presume that the appropriate
locking model is that described in Chapter 13 of RFC5661, but I think
there should be some discussion of what exactly this means in practice and
of how the new/different features of the mapping type interact with the
locking model.

Now to go back to the first paragraphs, the second sentence of the second
paragraph is wrong and needs to be changed as it contradicts what is
written about stateids in *5.1.  ff_layout4*.  Based on my discussion with
Tom, I am assuming that anonymous stateids will be used to do IO in the
loosely coupled case.

Once that issue is resolved, there needs to be some discussion of how the
fact that all IO will be stateid-anonymous will be dealt with.  I am going
to assume that it will be in this section, rather than in *5.1.
 ff_layout4*.

With regard to mandatory byte-range locking, we need an explicit statement
that this is not (i.e. cannot be) supported with loose coupling.

With regard to mandatory locking due to share reservations one doesn't have
the option of simply not supporting the functionality. The spec will have
to clearly explain how it is to be done. Some likely elements:

   - In the case in which each of the clients with a particular file
   opened has the same IO rights, the MDS has to ensure via layout recalls
   (and potentially indicating layouts are unavailable) that no client which
   has no owner allowed to do a particular form of IO has any layouts that
   allow that form of IO to be done. (It may already say that but it
   probably needs clarification.)
   - In the case in which a particular client has multiple owners with
   different levels of IO rights, the spec either has to ask the pNFS client to
   do the enforcement itself, or it has to provide that layouts are to be
   unavailable to this client and require the client to perform the IO via the
   MDS.

Once that is addressed, we have to face the fundamental problem with this
section. It has to do with the stateids that are returned to clients,
rather than the ones that appear (or don't) in layouts.

From what is written there now, it is hard to determine what is actually
intended.  A lot of confusion results from the multiple and uncertain
meanings of the preposition "against".

In the first sentence, the phrase "against the metadata server", simply
indicates that the operations in question are directed to the metadata
server.  As this paragraph, unlike the following one, applies to both loose
and tight coupling, it should stay where it is.  I suggest redrafting it as
follows:

Clients always perform locking-related operations by interacting with the
metadata server.  These include operations related to open files (OPEN,
OPEN_DOWNGRADE, and CLOSE), byte-range locking (LOCK, LOCKT, and LOCKU),
delegation management (DELEGRETURN), and stateid management (TEST_STATEID
and FREE_STATEID).  Delegation recall is effected by the metadata server
sending a callback to the client.

In all cases, the stateids that result from executing these operations are
returned by the metadata server to the client and the client uses these
stateids in subsequent locking-related operations.  The means by which
these stateids are maintained and the handling of IO operations differ with
the coupling strength in effect for the connection.

The existing second paragraph is not clear but, for a number of reasons, I
don't believe that it is a good basis for an eventual subsection describing
the loose coupling locking model:

   - Although the introductory sentences mention OPEN, LOCK, and DELEGATION,
   the rest of the discussion focuses on opens, leaving it very unclear how
   byte-range locks and delegations will/should/might be dealt with. I think
   this is primarily an editorial problem although there are potential
   interactions with fundamental technical choices regarding NFSv4.x.
   - When mirroring and/or striping is in effect, doing open "against" the
   data files will result in multiple stateids.
   - In the loose-coupling case, the three NFS protocols are treated as
   essentially the same, despite their very real differences. This is, in
   part, an editorial problem, but it appears to me that once the editorial
   problems are addressed, one could face significant technical issues. See
   below for details.

At this point I can't figure out the locking models that are actually
intended but, as a way of continuing the discussion, I draft some
descriptions below of something that I believe is workable in the context.
Although I may not be right in my guesses about how this will work, it
seems to me that the items that are mentioned have to be addressed somehow
to clearly describe a locking model.

Here is what I've come up with for *Section 2.3.1.  Loose-coupling Locking
Model*:

When locking-related operations are requested, they are primarily dealt
with by the metadata server, who generates the appropriate stateids.  When
an NFSv4 version is used as the data access protocol, the metadata server
may make stateid-related requests of the data storage devices.  However, it
is not required to do so and the resulting stateids are known only to the
metadata server and the data storage device.

Given this basic structure, locking-related operations are handled as
follows:

   - OPENs are dealt with primarily on the metadata server.  Stateids are
   selected by the metadata server and associated with the client id
   describing the client's connection to the metadata server.  The metadata
   server may need to interact with the data storage device to locate the file
   to be opened, but no locking-related functionality need be used on the data
   storage device.

OPEN_DOWNGRADE and CLOSE only require local execution on the metadata server.

   - Advisory byte-range locks can be implemented locally on the metadata
   server.  As in the case of OPENs, the stateids associated with byte-range
   locks are assigned by the metadata server and only used on the metadata
   server.

For reasons explained below, mandatory byte-range locks are not supported
when loose coupling is in effect.

   - Delegations are assigned by the metadata server who initiates recalls
   when conflicting OPENs are processed.  No data storage device involvement
   is required.
   - TEST_STATEID and FREE_STATEID are processed locally on the metadata
   server, without data storage device involvement.

All IO operations to the data storage device are done using the anonymous
stateid.  As a result, the data storage device has no information about the
openowner and lockowner responsible for issuing a particular IO operation.
As a result,

   - Mandatory byte-range locking cannot be supported because the data
   storage device has no way of distinguishing IOs done on behalf of the lock
   owner from those done by others.
   - Enforcement of share reservations is the responsibility of the client.
   Even though IO is done using the anonymous stateid, the client must
   ensure that it has a valid stateid associated with the openowner that
   allows the IO being done before issuing the IO.
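
A minimal sketch of the client-side check described in the last item, under the assumption that the client keeps a record of each open it holds on the metadata server. The OpenState type and the stateid_for_io helper are hypothetical; the share-access bit values and the all-zero form of the anonymous stateid follow the NFSv4 definitions.

```python
from dataclasses import dataclass

OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2

# The anonymous stateid: a seqid of zero and an all-zero "other" field.
ANONYMOUS_STATEID = (0, bytes(12))


@dataclass
class OpenState:
    """Hypothetical client-side record of an open held on the MDS."""
    stateid: tuple
    share_access: int
    revoked: bool = False


def stateid_for_io(open_state: OpenState, wants_write: bool) -> tuple:
    """Return the stateid to send to the data storage device.

    The client enforces the share reservation itself: the IO goes out
    only if a valid open stateid permitting it is held. The stateid
    actually presented is always the anonymous one.
    """
    needed = OPEN4_SHARE_ACCESS_WRITE if wants_write else OPEN4_SHARE_ACCESS_READ
    if open_state.revoked or not (open_state.share_access & needed):
        raise PermissionError("no valid open stateid permits this IO")
    return ANONYMOUS_STATEID
```
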

In the event that a stateid is revoked, the metadata server is responsible
for preventing client access, since the metadata server has no way of being
sure that the client is aware that the stateid in question has been revoked.

As the client never receives a stateid generated by the data storage
device, there is no client lease on the data storage device and no prospect
of lease expiration, even when NFSv4 protocols are used to access the data
storage device.  Clients will have leases on the metadata server, which are
subject to expiration.  In dealing with lease expiration, the metadata
server may need to use fencing to prevent revoked stateids from being relied
upon by a client unaware of the fact that they have been revoked.

Here is what I've come up with for *Section 2.3.2.  Tight-coupling Locking
Model*:

When locking-related operations are requested, they are primarily dealt
with by the metadata server, who generates the appropriate stateids.  These
stateids must be made known to the data storage device using control
protocol facilities, the details of which are not discussed in this
document.

Given this basic structure, locking-related operations are handled as
follows:

   - OPENs are dealt with primarily on the metadata server.  Stateids are
   selected by the metadata server and associated with the client id
   describing the client's connection to the metadata server.  The metadata
   server needs to interact with the data storage device to locate the file to
   be opened, and to make the data storage device aware of the association
   between the metadata-server-chosen stateid and the client and openowner that
   it represents.

OPEN_DOWNGRADE and CLOSE are executed initially on the metadata server but
the state change made must be propagated to the data storage device.

   - Advisory byte-range locks can be implemented locally on the metadata
   server.  As in the case of OPENs, the stateids associated with byte-range
   locks are assigned by the metadata server and are available for use on the
   metadata server.  Because IO operations are allowed to present lock
   stateids, the metadata server needs the ability to make the data storage
   device aware of the association between the metadata-server-chosen stateid
   and the corresponding open stateid it is associated with.

   - Mandatory byte-range locks can be supported when both the metadata
   server and the data storage devices have the appropriate support.  As in
   the case of advisory byte-range locks, these are assigned by the metadata
   server and are available for use on the metadata server.  To enable
   mandatory lock enforcement on the data storage device, the metadata server
   needs the ability to make the data storage device aware of the association
   between the metadata-server-chosen stateid and the client, openowner, and
   lock (i.e., lockowner, byte-range, lock-type) that it represents.  Because
   IO operations are allowed to present lock stateids, this information needs
   to be propagated to all data storage devices to which IO might be directed
   rather than only to the data storage devices that contain the locked region.

   - Delegations are assigned by the metadata server who initiates recalls
   when conflicting OPENs are processed.  Because IO operations are allowed to
   present delegation stateids, the metadata server requires the ability to
   make the data storage device aware of the association between the
   metadata-server-chosen stateid and the filehandle and delegation type it
   represents, and to break such an association.
   - TEST_STATEID is processed locally on the metadata server, without data
   storage device involvement.
   - FREE_STATEID is processed on the metadata server but the metadata
   server requires the ability to propagate the request to the corresponding
   data storage devices.
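
The difference between the two proposed models can be summarized, as a rough sketch only, by whether each kind of state change has to be made known to the data storage device. The table and helper below are hypothetical names, the keys are informal labels rather than protocol operations in every case, and the entries simply restate the two lists above.

```python
# True means the metadata server must make the state change known to
# the data storage device; False means it is handled on the metadata
# server alone. Entries restate the proposed text, nothing more.
DS_STATE_PROPAGATION = {
    "OPEN":           {"loose": False, "tight": True},
    "OPEN_DOWNGRADE": {"loose": False, "tight": True},
    "CLOSE":          {"loose": False, "tight": True},
    "LOCK":           {"loose": False, "tight": True},
    "DELEGATION":     {"loose": False, "tight": True},
    "TEST_STATEID":   {"loose": False, "tight": False},
    "FREE_STATEID":   {"loose": False, "tight": True},
}


def needs_ds_propagation(op: str, model: str) -> bool:
    """Look up whether `op` involves the data storage device under `model`."""
    return DS_STATE_PROPAGATION[op][model]
```

Under loose coupling the metadata server may still contact the data storage device to locate the file being opened; the table covers locking state only.
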

Because the client will possess and use stateids valid on the data storage
device, there will be a client lease on the data storage device and the
possibility of lease expiration does exist.  The best approach for the data
storage device is to retain these locks as a courtesy.  However, if it does
not do so, control protocol facilities need to provide the means to
synchronize lock state between the metadata server and data storage device.

Clients will also have leases on the metadata server, which are subject to
expiration.  In dealing with lease expiration, the metadata server would be
expected to use control protocol facilities enabling it to invalidate
revoked stateids on the data storage device.  In the event the data storage
device is not responsive, the metadata server may need to use fencing to
prevent revoked stateids from being acted upon by the data storage device.
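
The revocation sequence in the preceding paragraph might look roughly like the following. Since the control protocol is unspecified, both callables are placeholders supplied by the caller, and using TimeoutError to signal an unresponsive device is an assumption of this sketch.

```python
def revoke_on_data_storage_device(stateid, invalidate, fence) -> str:
    """Invalidate a revoked stateid on the DS, fencing as a fallback.

    `invalidate(stateid)` stands in for whatever control protocol
    facility removes the stateid on the data storage device; it is
    assumed to raise TimeoutError when the device does not respond.
    `fence()` stands in for the fencing mechanism.
    """
    try:
        invalidate(stateid)
        return "invalidated"
    except TimeoutError:
        # The device is unresponsive, so cut off access instead.
        fence()
        return "fenced"
```
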

As a result of describing the tight coupling locking model in parallel with
the loose coupling locking model, I've come to the conclusion that the
phrase "global stateid model", while a useful and compact summary, has
made the function of the control protocol seem more difficult/mysterious
than it needs to be.  Since the goal is to make it clear what is needed to
implement flex-files, including the tight-coupling option, I think it would
be helpful if the flex-files spec retained the additional detail that
appears above.

Now I'm getting beyond the scope of a review of the flex-files spec but I'd
like to note that the flex-files layout work has already made it clear that
a large part of control protocol functionality is already present in the
NFSv4 base protocol. Perhaps an NFSv4.x extension could be defined to
provide the remainder and be usable for both the RFC5661-specified files
layout and the flex-files layout with tight coupling.  Perhaps this could
be discussed in Berlin?

*4.1.  ff_device_addr4:*

In the third non-CODE paragraph, I suggest the following changes, primarily
to reflect the fact that pNFS client use of layouts is never mandatory:

· In the penultimate sentence, suggest replacing "MUST access the storage
device" by "MAY ONLY access the storage device directly"

· In the last sentence, suggest replacing "MUST access the storage device
using NFSv4" by "MAY ONLY access the storage device directly using the
corresponding minor version of NFSv4"

Tom believes that the two suggestions above imply that the client can use
an unsupported protocol version.   I disagree.  This issue needs to be
resolved.

*5.  Flexible File Layout type*

"type" needs to be capitalized in the title.  This is a new issue
introduced by a change in -08.

*5.1. ff_layout4:*

There are two remaining issues in this section that were present in -06.

   - The contradiction between this section and *Section 2.3.  State and
   Locking Models.*
   - The fact that there is no use for ffds_stateid, since the anonymous
   stateid is used in the loose-coupling case and a globally valid one is used
   in the tight coupling case.

In addition, the new FF_FLAGS_NO_IO_THROUGH_MDS in -07 raises some issues
that need to be addressed:

   - First of all, it isn't clear that "SHOULD" is actually
   intended/appropriate. According to RFC2119, this means "that there may
   exist valid reasons in particular circumstances to ignore a particular
   item, but the full implications must be understood and carefully weighed
   before choosing a different course". In particular, the text does not give
   one a basis to understand the implications of choosing to do IO using
   the MDS, when this flag is present.  Perhaps "should" is more appropriate
   here?
   - The statement "even if a storage device is partitioned from the
   client, the client SHOULD not try to proxy the IO through the metadata
   server" raises additional issues. I assume that partitioning might happen
   after the layout in question is recalled and is part of the revocation
   process for the layout in question. Thus this flag seems to be giving
   directions regarding metadata-server-directed IO after the layout in
   question no longer applies. ????
   - Given that base NFSv4 IO does not require use of layouts, it isn't
   clear that the client would actually use layouts and, even if it did, it
   would not require one for areas to which it is doing IO directed at the
   metadata server.  Because of this, a client might not see the
   RECOMMENDATION/recommendation before doing the IO being warned against.

Although how this might be dealt with is going to depend on the resolution
of the should-vs.-SHOULD question mentioned above, I'm concerned that
someone contributing to this specification, not necessarily one of the
authors, is assuming a level of metadata server direction with regard to
client IO that is inconsistent with the pNFS model.  Within pNFS, a
client's ability to do IO to the metadata server is defined by the base
NFSv4.1 semantics,  while the layout type may impose, using layouts, any
restrictions it wants for IO through the data storage devices.

*5.1.1.  Error codes from LAYOUTGET*

I'm doubtful about the use of "SHOULD" in the cases for
NFS4ERR_{LAYOUTTRYLATER, DELAY}.  It seems to me that the author is
telling me, that, when the client has a layout, it is either desirable
or undesirable for me to continue to use it.  But there is no basis
given for considering this an interoperability issue, or letting the
reader understand the consequences of taking the choice considered
undesirable.  I think these "SHOULD"s should be "should"s.
