[Gen-art] Gen-art Last Call/Telechat review of draft-ietf-nfsv4-rfc3530-migration-update-07

Elwyn Davies <elwynd@folly.org.uk> Mon, 18 January 2016 14:02 UTC

Return-Path: <elwynd@folly.org.uk>
X-Original-To: gen-art@ietfa.amsl.com
Delivered-To: gen-art@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 427371B3758; Mon, 18 Jan 2016 06:02:41 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.801
X-Spam-Level:
X-Spam-Status: No, score=0.801 tagged_above=-999 required=5 tests=[BAYES_50=0.8, HTML_MESSAGE=0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id n5lXKN7CsVVW; Mon, 18 Jan 2016 06:02:34 -0800 (PST)
Received: from a.painless.aa.net.uk (a.painless.aa.net.uk [IPv6:2001:8b0:0:30::51bb:1e33]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 4BB0D1B374D; Mon, 18 Jan 2016 06:02:33 -0800 (PST)
Received: from 4.e.a.4.6.4.2.8.9.8.a.d.f.6.9.f.1.0.0.0.f.b.0.0.0.b.8.0.1.0.0.2.ip6.arpa ([2001:8b0:bf:1:f96f:da89:8246:4ae4]) by a.painless.aa.net.uk with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.77) (envelope-from <elwynd@folly.org.uk>) id 1aLANz-0006o8-1u; Mon, 18 Jan 2016 14:02:31 +0000
From: Elwyn Davies <elwynd@folly.org.uk>
To: General area reviewing team <gen-art@ietf.org>
Message-ID: <569CF06B.8070909@folly.org.uk>
Date: Mon, 18 Jan 2016 14:02:19 +0000
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------------090303040906030402060509"
X-Painless-Spam-Score: -0.1
Archived-At: <http://mailarchive.ietf.org/arch/msg/gen-art/XU_LtxAlin6budw7u35May12Nao>
Cc: draft-ietf-nfsv4-rfc3530-migration-update.all@ietf.org
Subject: [Gen-art] Gen-art Last Call/Telechat review of draft-ietf-nfsv4-rfc3530-migration-update-07
X-BeenThere: gen-art@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: "GEN-ART: General Area Review Team" <gen-art.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/gen-art>, <mailto:gen-art-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/gen-art/>
List-Post: <mailto:gen-art@ietf.org>
List-Help: <mailto:gen-art-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/gen-art>, <mailto:gen-art-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Jan 2016 14:02:41 -0000

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please wait for direction from your
document shepherd or AD before posting a new version of the draft.

For more information, please see the FAQ at

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document: draft-ietf-nfsv4-rfc3530-migration-update-07.tx
Reviewer: Elwyn Davies
Review Date: 2016/01/16
IETF LC End Date: 2016/01/18
IESG Telechat date: 2016/01/18

Summary: Almost ready.  There are a few minor issues and a considerable 
number of nits that need attention.  The text is extremely complex and 
in some places the logic is not entirely clear.  I would encourage the 
authors to revisit the various usages of RFC 2119 language;  a number of 
the instances, especially several of the 'SHOULDs', appear to refer to 
operational or administrative decisions rather than matters that affect 
the protocol - and are therefore not actionable by the receiving 
implementation.  If appropriate the language should be modified.

Major issues:
------------------
None

Minor issues:
------------------
Bits possibly not covered from RFC 7530:
---------------------------------------------------------
With respect to s4:
In s9.1.1, para 1 of RFC 7530 there is:
> Breaking
>     the lease state amounts to the server removing all lock, share
>     reservation, and, where the server is not supporting the
>     CLAIM_DELEGATE_PREV claim type, all delegation state associated with
>     the same client with the same identity.  For a discussion of
>     delegation state recovery, seeSection 10.2.1 <https://tools.ietf.org/html/rfc7530#section-10.2.1>.
There is nothing in the update about servers that don't support the 
CLAIM_DELEGATE_PREV claim type.  Should something be said about this?

The general description of the distinction between open owners and lock 
owners in para 2, 3 and 4 of s9.1.1 is lost if the new text is taken as 
a true replacement of s9.1.1.  This is undesirable.

------------------------------------------
Items in the text of this draft:
-----------------------------------------
s5.1.3: When a file system is migrated, the source server is queried 
about the new location of the file system using GETATTR(fs_locations).  
Since the file system involved has migrated, the server no longer has a 
connection with the relevant file system but still needs to respond to 
the GETATTR appropriately.  Does there need to be some discussion of how 
long the server needs to hang onto information that will allow it to 
respond to these requests, particularly since the suggested means of 
discovery uses an arbitrary file from the file system.  It is possible 
that I have missed something or my understanding/memory of how 
fs_locations works is faulty.

s7.4.5.1: It would be helpful to indicate the if-then structure of the 
bullets by making current bullets 3 and 4 sub-bullets of bullet 2.   
Indeed, on second thoughts, it is not clear which 'if' (either that in 
bullet 2 or bullet 3) the 'otherwise' in bullet 4 applies to.  Please 
make the logic clearer.

s7.5, bullet 1: Surely the decision as to whether privacy is needed is 
an administrative/operational deployment decision?  Transfer integrity 
is vital and provision should be made for transfer privacy to be 
possible if the installation requires it, but I can't see that privacy 
needs to be ensured in all possible circumstances.

=======================

Nits/editorial comments:
-----------------------------------

General: s/e.g. /e.g., /g (5 instances); s/i.e. /i.e., /g  (5 instances).

General (in s5): s/openowner/open-owner/g; s/lockowner/lock-owner/g

General: s/filesystem/file system/g (In line with usage in RFC 7530).

General: Lack of terminology section. There are a number of terms that 
might be advantageously incorporated into a terminology section - lock 
types, client id, stateid, incarnation verifier (and its relation to 
reboots/'boot verifier'), server (address) trunking, open owners, lock 
owners spring to mind.


s3 , para 6/s4.1, para 7: The terms 'client boot instance' and 
(subsequently) 'boot verifier' are not used in RFC 7530.  s9.1.1 uses 
the term 'client incarnation verifier' (and this term is actually used 
in s4.1 , para 10 (second bullet after nfs_client_id4 data structure 
definition).  I personally think is a more descriptive term.  I would be 
inclined to replace 'boot verifier' with 'incarnation verifier' 
throughout, and provide a definition in a terminology section (see 
above).  This could also cover the case where the incarnation verifier 
is changed because the structure is destroyed rather then the client 
host rebooting (as noted in s4.6) although the intention is that it 
should only change on a reboot.

s3, 2nd set of bullets, bullets 2 and 3; Also s4.9, para 1: s/client 
id/client id string/

s3, last bullet: It would be worth pointing out that there is a complete 
revision of the definition of the SETCLIENTID operation. This is 
relevant because there is a pointer to the definition of SETCLIENTID in s4.

s4: Correct section names from RFC 7530..
OLD:
The replaced sections are named "client ID" and "Server Release of 
Clientid."
NEW:
The replaced sections are named "Client ID" and "Server Release of 
Client ID."
END

s4.1, bullet 1: s/client IP address/the client IP address/

s4.1,bullet 2: I would replace 'save' by 'maintain a non-volatile record 
across reboots of' to make it clear what is intended.

s4.1, last para:  The concept of 'server trunking' needs to be defined.  
NFSv4.0 per RFC 7530 does not have this concept - it is introduced in 
NFSv4.1, thus somebody looking from the PoV of a 4.0 implementer need 
not know anything about trunking.

s4.2: Section references to relevant parts of RFC 7530 that provide 
definitions of structures and operations would be helpful here.

s4.2, bullet 1 of second set of bullets (under struct nfs_client_id):
> The id field is a variable-length string which uniquely identifies
>        a specific client.  Although, we describe it as a string and it is
>        often referred to as a "client string," it should be understood
>        that the protocol defines this as opaque data.
Later on s5.1 recommends that opaque data such as this should be encoded 
in network byte order.  It would be helpful to repeat or introduce this 
recommendation (make it a MUST???) here to ensure that the id will be 
identical on servers of whatever endianess. Also the term 'network byte 
order' is primarily relevant to data for which the internal 
representation is a (binary) number.  A better way of saying this might 
be 'the encoding and decoding processes (e.g., using network byte order 
for the external representation) for the opaque data result in the same 
internal representation whatever the endianness of the originating and 
receiving machines' even if it is somewhat more long winded.

s4.2, para 20 (across the boundary between p7 and p8):
>     This shorthand client identifier (a client ID) is
>     assigned by the server and should be chosen so that it will not
>     conflict with a client ID previously assigned by same server.  This
>     applies across server restarts or reboots.
Would there be any advantage to recommending that servers should try to 
generate clientid4's that are unique to the server as far as possible as 
well as different across reboots, etc.?  It would appear to minimise the 
risk of clashes during migrations but this may be unnecessarily 
complicated and the client has to cope with the remote possibility 
anyway. I note that there is a statement in s4.8, para 4 on page 19
>        Given that it is already highly unlikely that the clientid XC is
>        duplicated by distinct servers, the probability that SCn is
>        duplicated as well has to be considered vanishingly small.
Adding the recommendation would support this statement.

s4.2, para 7 on page 8:
s/of administrative action,/of administrative action./ (swap comma for 
period/full stop)

s4.2, last bullet:
>     o  Merger of state under the associated lease with another lease
>        under a different clientid causes the clientid4 serving as the
>        source of the merge to cease being recognized on its server.
>        (Always returns NFS4ERR_STALE_CLIENTID)
Should 'clientid causes the clientid4' be 'client ID causes the clientid4'?
There are 10 other instances of clientid (rather than clientid4) where 
the same question applies.

s4.2, para after last bullet on p9:
> In
>     cases of server or client error resulting in this error, use of
>     SETCLIENTID to establish a new lease is desirable as well.
Which error is meant?  There isn't one mentioned in this para - I guess 
_STALE_CLIENTID but could be 'any of the above errors'.

s4.2, page 9.
> (See the section entitled "Server Failure and Recovery")
and
> (see the section entitled "lock-owner" for
>     details)
   I take these are intended to be sections in RFC 7530 (respectively 
9.6.2 and 9.1.5) -  adding these section numbers and the RFC ref would 
be helpful.

s4.2, para last but two:
>     In the last two cases, different recovery procedures are required.
>     See Section 5.1.1 for details.
It isn't clear which cases you are referring to here... the last two 
bullets two paras before or the last two cases referred to in the 
previous para out of
>     In the event of a server reboot, loss of lease state due to lease
>     expiration, or administrative revocation of a clientid4, the client
Please make this more explicit.


s4.2, last para:
>     See the detailed descriptions of SETCLIENTID and SETCLIENTID_CONFIRM
>     for a complete specification of these operations.
Section references would help, plus a note that the definition of 
SETCLIENTID is now in this document (s7.4) rather than Section 16.33 of 
[RFC7530].  SETCLIENTID_CONFIRM is at Section 16.34 of [RFC7530].

s4.3, last para:
>     In that event, when the server gets a SETCLIENTID specifying a client
>     id string for which the server has a clientid4
Which of the three events in the previous para is being referred to?  
Presumably the deliberate change of the primcipal since the other two 
are not really recognizable in a controlled way. ;-)

s4.4, 2nd bullet on page 11 (para 7):
>        the client to be
>        aware when two server IP addresses are connected to the same
>        server (they return the same server name in responding to an
>        EXCHANGE_ID).
The comment about 'return[ing] the same server name' in NFSv4.1 is not 
the real reason the client is sure they are the same server. Would 
suggest changing this to '(Section 2.10.5.1 of [RFC5661] explains how 
the client is able to assure itself that the connections are to the same 
logical server.)'

s4.4, 3rd bullet on page 11 (para 9):  Suggest adding a reference to the 
fs_locations discussions in RFC 7530 - Perhaps '(see Sections 8.1 and 
8.42 of [RFC7530])'.


s4.4. 4th bullet on page 11 (para 10):
>        in that the two different
>        client id strings sent to different IP addresses may wind up on
>        the same IP address, adding confusion.
The phrase 'same IP address' doesn't really express what is going on 
IMO.  Perhaps 'same (logical) server'.

s4.5, para 3: s/per server network addresses./per server network address./

s4.5, last para:
> Therefore, client implementations
>     that support migration with transparent state migration SHOULD NOT
>     use the non-uniform client id string approach, except where it is
>     necessary for compatibility with existing server implementations
Arguably, this is an operational decision rather than something that 
affects the protocol... possibly s/SHOULD NOT/should not/.

s4.6, para 10: s/typically the reboot time./typically at reboot time./

s4.7, para 6: s/the spec/this specification/

s4.7, para 8:   Spurious double quote (") at end of para:
> enable transparent state migration."

s4.7, para 9: Missing period/full stop at end of para.


s4.8, para 3: s/blind-sided/surprised/ (blind-sided is rather too 
idiomatic).

s4.8, para 7: s/Servers MUST NOT do that./Servers MUST NOT behave in 
this way./

s4.8, last para on page 17:
>     In this algorithm, when SETCLIENTID is done it will use the common
>     nfs_client_id4 and specify the current target IP address as part of
>     the callback parameters.
I am unclear why (and, indeed, how) the target IP address needs to be 
incorporated in the callback parameters.  AFAICS, the process described 
in this section does not examine a callback (by this I mean the results 
of one of the CB_xxx operations) as part of the process of trunking 
determination.   The choice of callback_ident (which appears to be the 
only relevant parameter) seems more likely to be appropriately selected 
in line with the statement later in the process (on page 19):
>        The specific callback
>        parameters chosen, in terms of cb_client4 and callback_ident, are
>        up to the client and should reflect its preferences as to callback
>        handling for the common clientid, in the event that X and IPn are
>        trunked together.
but chosen so that they would be appropriate if the server doesn't end 
up trunked with any other existing client ID.  Clearly the IP address 
would be a convenient semi-random value which would work to identify the 
callback uniquely eventually but the selected value doesn't seem to have 
any effect on the algorithm.

s4.8, para 1 on page 18:
>     Note that when the client has done previous SETCLIENTID's, to any IP
>     addresses, with more than one principal or authentication flavor, we
>     have the possibility of receiving NFS4ERR_CLID_INUSE, since we do not
>     yet know which of our connections with existing IP addresses might be
>     trunked with our current one.  In the event that the SETCLIENTID
>     fails with NFS4ERR_CLID_INUSE, one must try all other combinations of
>     principals and authentication flavors currently in use and eventually
>     one will be correct and not return NFS4ERR_CLID_INUSE.
I am not clear how this interacts with the variation of the definition 
of NFS4ERR_CLID_INUSE given in s7.2.  Would I expect the server to 
accept authentication flavors other than the one already used for 
another address leading to the same server?  s7.2 says that the server 
MAY accept a different flavor if it thinks the request is bona fide.  I 
don't understand what might or might not lead the server to think an 
alternative flavor was bona fide - and would it be likely to apply in 
this case?

s4.9, last set of bullets: Should the draft also advise applying a one 
way function to the MAC address for privacy reasons?



s4.8, page 18:
>     o  If one or more matching clientid4's is found, none of which is
>        marked unresolved, the new IP address X is entered and marked
>        unresolved.  A SETCLIENTID_CONFIRM is done to X using XC and XV.
>
>        After applying the steps below to each of the lead IP addresses
>        with a matching clientid4, the address will have been resolved: It
>        may have been determined to be part of an already known server as
>        a new IP address to be added to an existing set of IP addresses
>        for that server.  Otherwise, it will be recognized as a new
>        server.  At the point at which this determination is made, the
>        unresolved indication is cleared and any suspended SETCLIENTID
>        processing is restarted
I suspect that the second para in the section quoted here is not 
intended to be part of the bullet, and should be unindented.

-----------------
s5.1, para 2: It would be desirable to mention that the change from big 
endian in [RFC7530] to network byte order is purely terminological.

s5.1.1, para 1:
>     In the case of migration, the servers involved in the migration of a
>     filesystem SHOULD transfer all server state associated with the
>     migrating filesystem from source to the destination server.
Arguably this SHOULD is an operational decision.  It does not affect the 
protocol or the format of the bits on the wire.  Accordingly I suggest 
s/SHOULD/should/.  This corresponds with the discussion in [RFC7530].  
On the other hand:
>   This
>     must be done in a way that is transparent to the client.
is probably a MUST: s/This must be done/If state is transferred it MUST 
be done/

s5.1.1, para 7: s/NFS version 4.0 protocol/NFS version 4.0 protocol 
(either in [RFC7530] or this update)/

s5.1.1, para 12:
> When a
>     server implements migration and it does not transfer state
>     information, it SHOULD provide a filesystem-specific grace period, to
>     allow clients to reclaim locks associated with files in the migrated
>     filesystem.
I believe this is a MUST rather than a SHOULD.  The alternative of not 
doing it doesn't look good!  If so the last sentence of s5..1.1 needs 
adapting to match.

s5.1.1.1: The {v, X, c} notation needs to be defined (presumably as used 
in s7.4 and Section 16.33 of [RFC7530]).  Section references in this doc 
or  [RFC7530] ought to be provided for SETCLIENTID (**s7.4 in this 
doc**) and SETCLIENTID_CONFIRM (s16.34 of [RFC7530]).

s5.1.1.2, para 5: How does the server know that the client has been 
implemented to isolate owners to a particular filesystem?  Looking at 
the next para, I suspect that the server doesn't know - it merely 
observes that owners don't sprerad across multiple file systems at the 
time of migration but there is no guarantee - and it doesn't matter - 
that the client might not have owners that spread across filesystems 
some time.

s5.1.1.2, para 7 on page 29:
> Such support only needs to be
>     provided for requests issued before the migration event whose status
>     as the last by sequence is invalidated by the migration event.
I can't parse the last line of this sentence.

s5.1.1.2, 1st and 2nd bullets on page 30: s/the last request for the 
owner in effect/the sequence number for the last request for the owner 
in effect/;  s/longer one less the next sequence to be received./longer 
one less than the next sequence to be received./

------------------------
s6.2, para 4:
>     o  Some operations which modify locking state are not allowed to
>        return NFS4ERR_DELAY.
I think it be worth mentioning which operations these are in this para.  
Inspection of Table 7 in RFC7530 indicates to me that the relevant 
operations are OPEN_CONFIRM and RELEASE_LOCKOWNER plus possibly RENEW.  
GETFH, RESTOREFH, SAVEFH and ILLEGAL don't return NFS4ERR_DELAY but 
don't appear to affect the locking state.  I observe that all of these 
are mentioned explicitly except for OPEN_CONFIRM:  Does anything need to 
be said about this extra case?

s6.2, para 6 on page 37: s/cannot change target filesystem locking 
state,/cannot change the target file system locking state,/

s6.2, para 7 on page 37:
>     o  Keeping track of the earliest request started which is still in
>        execution (for example, by keeping a list of active requests
>        ordered by request start time).  The server can then define T' to
>        be the first time at which the earliest-started active request
>        started after time T.
I don't understand what this criterion is driving at.  The second 
sentence appears to be aimed at identifying the time associated with 
some action that is associated with the earliest-started request but 
doesn't actually say what it is AFAICS.

s7.1: It would be helpful to identify the sections of RFC 7530 that are 
affected by the changes summarized here, identifying replacements and 
modifications.  It would also be worth reordering the summary to match 
with the order of subsections of s7.1.

s7.1, bullets 1 and 2: s/CLID_INUSE/NFS4ERR_CLID_INUSE/

s7.2: See the comment on s4.8, para 1 on page 18 above.

s7.3, para 7: s/Thus, a client that is prepared to receive 
NFS4ERR_MOVED/Thus, if a client is prepared to receive NFS4ERR_MOVED

s7.4.5.1: A pointer to Section 9.1.7 of [RFC7530] where the DRC is 
introduced would be helpful.  BTW I am not sure that a server is 
*allowed* to be without a DRC - s9.1.7 says it is critical to have 
something that looks like a DRC.


s7.6, para 1: Please reference Section 19 of [RFC7530].  The affected 
para is the penultimate rather than the ultimate para of Section 19 of 
[RFC7530] judging from the initial words.

s7.6, para 2:
> See the section "Client Identity
>        Definition" for further discussion.
Please add a section reference (Section 4 of this document)

s8: Suggest
OLD:
Is to be modified as specified in Section 7.6.
NEW:
The security considerations of [RFC7530] remain appropriate with the 
exception of the modification to the penultimate  paragraph specified in 
Section 7.6 of this document and the addition of the material in Section 
7.5.
END