[nfsv4] Benjamin Kaduk's Discuss on draft-ietf-nfsv4-mv0-trunking-update-03: (with DISCUSS and COMMENT)
Benjamin Kaduk <kaduk@mit.edu> Wed, 09 January 2019 19:17 UTC
From: Benjamin Kaduk <kaduk@mit.edu>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-nfsv4-mv0-trunking-update@ietf.org, Spencer Shepler <spencer.shepler@gmail.com>, nfsv4-chairs@ietf.org, spencer.shepler@gmail.com, nfsv4@ietf.org
Message-ID: <154706146206.5038.389871557428840458.idtracker@ietfa.amsl.com>
Date: Wed, 09 Jan 2019 11:17:42 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/hWc_HOeYUnD4By-VoP-M0EkkVQM>
Benjamin Kaduk has entered the following ballot position for
draft-ietf-nfsv4-mv0-trunking-update-03: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-mv0-trunking-update/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

First off, thanks for the work on this document; it's important to get
this behavior clarified and functional even for NFSv4.0.

That said, this document (along with the pieces of 7530 and 7931 that I
read along the way) still leaves me uncertain about how some things are
supposed to work. (If it's clarified in parts of those documents that I
didn't read, I'll happily clear and apologize for the disruption, of
course.)

To start with, I'm still lacking a clear high-level picture of why a
client needs to care about trunking detection vs. just treating all
listed addresses as replicas. There are some parts in the body where we
talk about, e.g., lock state and similar maintenance, but I don't have a
clear picture of what the risks and benefits of (not) tracking trunking
are, and this would be a fine opportunity to add some text.
Specifically, in Section 5.2.1, we just say that "[a] client may use
file system location elements simultaneously to provide
higher-performance access to the target file system"; most of the focus
of this document makes me think that this statement was intended to
apply only to trunking, but I also think there are supposed to be
replication-only scenarios that provide performance gains.
I'm not sure if we need to clarify the distinction in that location as
well as the high-level overview.

It's also unclear to me what parts of migration flows are under the
control of the client vs. the server. It's clear that the server has to
initiate migration via NFS4ERR_MOVED, but my current understanding is
just that this prompts the client to look at fs_locations, and the
client has control over which alternate location to move to. But there's
also a lot of discussion in all three documents about the servers
migrating state along with migration, so it seems like the server should
be controlling where the client goes. Is this just supposed to be by
limiting the fs_locations data to the specific migration target chosen
by the server? (If so, this would probably have potential for poor
interaction with the implicit filesystem discovery described in
Section 5.3.) On the other hand, Section 5.2.6 talks about the server
putting entries "that represent addresses usable with the current server
or a migration target before those associated with replicas", which
seems to imply that there is some other way to know what the migration
target is.

Section 5.2.6 also tells the client to rely on that ordering:

   To keep this process as short as possible, Servers are REQUIRED to
   place file system location entries that represent addresses usable
   with the current server or a migration target before those
   associated with replicas. A client can then cease scanning for
   trunkable file system location entries once it encounters a file
   system location element whose fs_name differs

but I don't think a client actually can do so, since the client has no
way to know that the server implements this document as opposed to stock
7530+7931 (at least, no way that I saw).

Finally, removing the last paragraph of Section 8.5 of RFC 7530 could
have negative operational impact if updated clients interact with
non-updated servers/environments that are misconfigured in the described
fashion.
It's probably worth stating in the top-level Section 5 that such
misconfigured servers are believed to no longer exist (if that's in fact
true, of course; if not, we'd need to reconsider the change).

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1

   As part of addressing this need, [RFC7931] introduces trunking into
   NFS version 4.0 along with a trunking detection mechanism. This
   enables a client to determine whether two distinct network addresses
   are connected to the same NFS version 4.0 server instance.
   Nevertheless, the use of the concept of server-trunkability is the
   same in both protocol versions.

Er, what are the two protocol versions in question? (I assume 4.0 and
4.1, but you don't say 4.1 anywhere.)

   o  To provide NFS version 4.0 with a means of trunking discovery,
      compatible with the means of trunking detection introduced by
      [RFC7931].

We haven't yet mentioned that the distinction between "detection" and
"discovery" is important, so it's probably worth a forward reference to
the text below.

Section 5.1

   The fs_locations attribute (described as "RECOMMENDED" in [RFC7530])

If you're going to describe this section as "replacing Section 8.1 of
[RFC7530]", then it needs to stand on its own without reference to the
current Section 8.1 of RFC 7530. That is, if the "RECOMMENDED" nature is
to remain, then it should be described as such de novo in this text.

   Clients use the existing means for NFSv4.0 trunking detection,
   defined in [RFC7931], to confirm that such addresses are connected to
   the same server. The client can ignore addresses found not to be so
   connected.

nit: I would suggest phrasing this as "use the NFSv4.0 trunking
detection mechanism [RFC7931] to confirm [...]", as temporal references
like "existing" may not age well.

not-nit: "ignore" is pretty strong; does this imply that a client is
free to ignore things like migration, replication, and referrals?

   location entries. If a file system location entry specifies a
   network address, there is only a single corresponding location
   element. When a file system location entry contains a host name, the
   client resolves the hostname, producing one file system location
   element for each of the resulting network addresses. Issues
   regarding the trustworthiness of hostname resolutions are further
   discussed in Section 7.

nit(?) this is confusing if we read "Section 7" as being "Section 7 of
RFC 7530", which is a tempting reading since this text is supposed to
replace text in that document. Perhaps "Section 7 of [[this document]]"
would make more sense (but I also forget the RFC Editor's policy on such
self-references).

Section 5.2.1

   The client utilizes trunking detection and/or discovery, further
   described in Section 5.2.2 of the current document, to determine a

nit(?) perhaps s/the current document/[[this document]]/ as above (for
update by the RFC Editor). I'll stop commenting this construction,
though of course if such changes are made they should be done globally.

Section 5.2.3

   Because of the need to support multiple connections, clients face the

What need? Where is this need articulated?

   As a result, clients supporting multiple connection types need to
   attempt to establish a connection on various connection types
   allowing it to determine which connection types are supported.

nit: maybe describe this as a "trial and error" approach to connection
type support determination?

   To avoid waiting when there is at least one viable network path
   available, simultaneous attempts to establish multiple connection
   types are possible. Once a viable connection is established, the
   client discards less-preferred connections.

It's probably worth referencing the "happy eyeballs" technique used
elsewhere (e.g., RFC 8305) as being analogous.
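
For readers unfamiliar with the analogy, the following is a minimal
sketch of racing several connection types and keeping the most
preferred one that succeeds. It is illustrative only and is not taken
from the draft or from RFC 8305: the name connect_preferred, the grace
parameter, the "rdma"/"tcp" labels, and the fake connectors are all
hypothetical, and a real client would also close any less-preferred
connection that happened to complete.

```python
import asyncio


async def connect_preferred(attempts, grace=0.05):
    """Race connection attempts; return (label, connection).

    attempts: list of (label, coroutine_function), most preferred first.
    grace:    seconds to keep waiting for a more-preferred attempt after
              the first success (hypothetical tuning knob).
    Raises ConnectionError if every attempt fails.
    """
    loop = asyncio.get_running_loop()
    # Start every attempt at once and remember each task's preference rank.
    rank_of = {asyncio.create_task(fn()): rank
               for rank, (_, fn) in enumerate(attempts)}
    pending = set(rank_of)
    successes = {}   # rank -> established connection
    deadline = None  # set when the first attempt succeeds

    while pending:
        timeout = None if deadline is None else max(0.0, deadline - loop.time())
        done, pending = await asyncio.wait(
            pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.exception() is None:
                successes[rank_of[task]] = task.result()
        if successes:
            if deadline is None:
                deadline = loop.time() + grace
            best = min(successes)
            better_pending = any(rank_of[t] < best for t in pending)
            # Stop once nothing better can still arrive, or the grace expired.
            if not better_pending or loop.time() >= deadline:
                break

    # Abandon attempts that are still in flight.
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)

    if not successes:
        raise ConnectionError("no connection type succeeded")
    best = min(successes)
    return attempts[best][0], successes[best]


async def fake_connect(label, delay, ok=True):
    """Stand-in for a real connection attempt (hypothetical)."""
    await asyncio.sleep(delay)
    if not ok:
        raise ConnectionError(label)
    return f"conn:{label}"


if __name__ == "__main__":
    demo = [
        ("rdma", lambda: fake_connect("rdma", 0.01, ok=False)),
        ("tcp", lambda: fake_connect("tcp", 0.02)),
    ]
    print(asyncio.run(connect_preferred(demo)))
```

The short grace window is one possible design choice: it trades a small
delay for a chance to end up on the preferred connection type, similar
in spirit to the preference delay used by happy-eyeballs clients.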

Section 5.2.5

   Such migration can help provide load balancing or general resource
   reallocation. [...]

side note: is this load balancing generally going to be just of a "move
a filesystem or ten to a different server when load gets too high" or
are people also doing "send different clients to different replicas for
the same filesystem" live load-balancing?

Section 5.2.6

   When the set of network addresses designated by a file system
   location attribute changes, NFS4ERR_MOVED might or might not result.
   In some of the cases in which NFS4ERR_MOVED is returned migration has
   occurred, while in others there is a shift in the network addresses
   used to access a particular file system with no migration.

I got pretty confused when I first read this, thinking there was some
implication that a server could introduce a fleeting NFS4ERR_MOVED as a
notification that addresses changed, even if the server could otherwise
continue handling the client's requests. Perhaps:

% When the set of network addresses on a server changes in a way that would
% affect a file system location attribute, there are several possible
% outcomes for clients currently accessing that file system. NFS4ERR_MOVED
% is returned only when the server cannot satisfy a request from the client,
% whether because the file system has been migrated to a different server, is
% only accessible at a different trunked address on the same server, or some
% other reason.

Similarly, we may want to clarify that (e.g.) case (1) is not going to
result in an NFS4ERR_MOVED.

   2. When the list of network addresses is a subset of that previously
      in effect, immediate action is not needed if an address missing
      in the replacement list is not currently in use by the client.
      The client should avoid using that address in the future, whether
      the address is for a replica or an additional path to the server
      being used.

"avoid using that address in the future" needs to be scoped to this
filesystem; it's not going to work if clients treat it as a global
blacklisting.

   Although significant harm cannot arise from this misapprehension, it
   can give rise to disconcerting situations. For example, if a lock
   has been revoked during the address shift, it will appear to the
   client as if the lock has been lost during migration, normally
   calling for it to be recoverable via an fs-specific grace period
   associated with the migration event.

I think this example needs to be clarified more or rewritten to describe
what behavior of which participant that normally happens does not happen
(specifically, the "normally ..." clause).

   from the current fs_name, or whose address is not server-trunkable
   with the one it is currently using.

nit: does it make more sense to put the address clause first, since
fs_name is only valid within the scope of a given address/server?

Section 5.3

   As mentioned above, a single file system location entry may have a
   server address target in the form of a DNS host name that resolves to
   multiple network addresses, while multiple file system location
   entries may have their own server address targets that reference the
   same server.

nit: I'm not sure that "while" is the right word here. Perhaps "and
conversely"?

   When server-trunkable addresses for a server exist, the client may
   assume that for each file system in the namespace of a given server
   network address, there exist file systems at corresponding namespace
   locations for each of the other server network addresses. It may do

Pretty sure you need to say "trunkable" here, too.

   this even in the absence of explicit listing in fs_locations. Such

I may be confused, but we're talking about different file systems within
a single server's single-server namespace, right?
So there is not even a way for them to be listed in the fs_locations for
queries on FHs in the current filesystem (unless the server exports the
same filesystem under different paths in its namespace for some reason).
So, we should probably be saying more about how these are fs_locations
results returned for queries against different filesystems hosted on the
same server...

   corresponding file system locations can be used as alternative
   locations, just as those explicitly specified via the fs_locations
   attribute.

... (and possibly some related tweaks in this part too).

Section 7

We probably need to reiterate the privacy considerations inherent in the
UCS approach, mentioned at the end of Section 5.6 of RFC 7931.

   o  When DNS is used to convert NFS server host names to network
      addresses and DNSSEC [RFC4033] is not available, the validity of
      the network addresses returned cannot be relied upon. However,
      when the client uses RPCSEC_GSS [RFC7861] to access NFS servers,
      it is possible for mutual authentication to detect invalid server
      addresses. Other forms of transport layer

nit: It seems to only sort-of be the case that the mutual authentication
detects invalid addresses. I tend to think of the property involved as
ensuring that I am talking to who I think I am, which encompasses both
the intended network address and the stuff on the other end. On the
other hand, one could imagine some bizarre deployments that share
Kerberos keys across servers where GSS could succeed (if the acceptor
didn't have strict host name checking in place) but the address would
still be unintended. If I had to rephrase this (unclear that it's really
necessary), I might go with something like "to increase confidence in
the correctness of server addresses", but there are lots of valid things
to say here and it's not a big deal.

   o  Fetching file system location information SHOULD be performed
      using RPCSEC_GSS with integrity protection, as previously

I forget if we have to say "integrity protection or better" or if this
phrasing also includes the confidentiality protection case.

   When a file system location attribute is fetched upon connecting with
   an NFSv4 server, it SHOULD, as stated above, be done using RPCSEC_GSS
   with integrity protection.

It looks like this is now three places where this normative requirement
is stated (7530's security considerations, and earlier in this section).
Usually we try to stick to just one, to avoid risk of conflicting
interpretations, and restate requirements non-normatively when needed.
(It's not even clear that this duplication is needed, though.)

   For example, if a range of network addresses can be determined that
   assure that the servers and clients using AUTH_SYS are subject to
   appropriate constraints (such as physical network isolation and the
   use of administrative controls within the operating systems), then
   network adresses in this range can be used with others discarded or
   restricted in their use of AUTH_SYS.

I'd strongly suggest adding a comma or something here to avoid the
misparsing of "used with others".

   To summarize considerations regarding the use of RPCSEC_GSS in
   fetching file system location information, consider the following
   possibilities for requests to interrogate location information, with
   interrogation approaches on the referring and destination servers
   arrived at separately:

I don't understand what this is trying to say, especially in light of
the following bullet points being essentially recommendations for
behavior (in one case, limited to a specific situation where
disrecommended behavior is unavoidable).

I do appreciate the good discussions about the provenance and
reliability of location data -- it seems to be pretty complete, so thank
you!