[nfsv4] Benjamin Kaduk's Discuss on draft-ietf-nfsv4-rfc5661sesqui-msns-03: (with DISCUSS and COMMENT)
David Noveck <davenoveck@gmail.com> Thu, 02 January 2020 15:09 UTC
From: David Noveck <davenoveck@gmail.com>
Date: Thu, 02 Jan 2020 10:09:02 -0500
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: The IESG <iesg@ietf.org>, draft-ietf-nfsv4-rfc5661sesqui-msns@ietf.org, "nfsv4-chairs@ietf.org" <nfsv4-chairs@ietf.org>, Magnus Westerlund <magnus.westerlund@ericsson.com>, NFSv4 <nfsv4@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/dR_kJrRSEQpfndPY393D--ZAm88>
On Wed, Dec 18, 2019 at 3:32 AM Benjamin Kaduk via Datatracker <noreply@ietf.org> wrote:

> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-nfsv4-rfc5661sesqui-msns-03: Discuss
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------

Responded to these on 12/20.

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> I think I may have mistakenly commented on some sections that are
> actually just moved text, since my lookahead window in the diff was too
> small.

No harm, no foul.

> Since the "Updates:" header is part of the immutable RFC text (though
> "Updated by:" is mutable), we should probably explicitly state that "the
> updates that RFCs 8178 and 8434 made to RFC 5661 apply equally to this
> document".

I think we could update the last paragraph of Section 1.1 to be more explicit about this. Perhaps it could read:

   Until the above work is done, there will not be a consistent set of
   documents providing a description of the NFSv4.1 protocol and any
   full description would involve documents updating other documents
   within the specification.  The updates applied by RFC 8434 [66] and
   RFC 8178 [63] to RFC5661 also apply to this specification, and will
   apply to any subsequent v4.1 specification until that work is done.

> I note inline (in what is probably too many places; please don't reply
> at all of them!) some question about how clear the text is that a file
> system migration is something done at a per-file-system granularity, and
> that migrating a client at a time is not possible.

It might be possible, but doing so is not a goal of this specification. I'm not sure how to address your concern. I don't know why anyone would assume that migrating entire clients is a goal of this specification. As far as I can see, when the word "migration" is used, it is always in connection with migrating a file system. Is there some specific place where you think this issue is likely to arise?

> As was the case for
> my Discuss point about addresses/port-numbers, I'm missing the context
> of the rest of the document, so perhaps this is a non-issue, but the
> consequences of getting it wrong seem severe enough that I wanted to
> check.

I'm not seeing any severe consequences. Am I missing something?

> Does a client have any way to know in advance that two addresses will be
> session-trunkable other than the one listed in Section 11.1.1 that "when
> two connections of different connection types are made to the same
> network address and are based on a single file system location entry
> they are always session-trunkable"?

No.

> It seems like mostly we're defining
> the property by saying that the client has to try it and see if it
> works; I'd love to be wrong about that.

We could add an extension to provide easier access to this information, but the goal of this document is to clarify what is possible with the current protocol.

> Section 1.1
>
>    The revised description of the NFS version 4 minor version 1
>    (NFSv4.1) protocol presented in this update is necessary to enable
>    full use of trunking in connection with multi-server namespace
>    features and to enable the use of transparent state migration in
>    connection with NFSv4.1.  [...]
>
> nit: do we expect all readers to know what is meant by "trunking" with
> no other lead-in?

Good point. Perhaps it could be addressed by rewriting the material in the first paragraph of Section 1.1 to read as follows:
   Two important features previously defined in minor version 0 but
   never fully addressed in minor version 1 are trunking, the use of
   multiple connections between a client and server potentially to
   different network addresses, and transparent state migration, which
   allows a file system to be transferred between servers in a way
   that provides for the client to maintain its existing locking state
   across the transfer.  The revised description of the NFS version 4
   minor version 1 (NFSv4.1) protocol presented in this update is
   necessary to enable full use of these features with other
   multi-server namespace features.

   This document is in the form of an updated description of the NFS
   4.1 protocol previously defined in RFC5661 [62].  RFC5661 is
   obsoleted by this document.  However, the update has a limited
   scope and is focused on enabling full use of trunking and
   transparent state migration.  The need for these changes is
   discussed in Appendix A.  Appendix B describes the specific changes
   made to arrive at the current text.

> This limited scope update is applied to the main NFSv4.1 RFC with the
>
> nit: hyphenate "limited-scope"

Will fix.

>    scope as could expected by a full update of the protocol.  Below are
>    some areas which are known to need addressing in a future update of
>    the protocol.
>    [...]
>
> side note: I'd be interested in better understanding the preference for
> the subjunctive verb tense for most of these points ("work would have to
> be done"); my naive expectation would be that since there are plans to
> undertake the work, just "work needs to be done" or "work will be done"
> might suffice.

I think that when these words were written, the wg had not fully embraced the idea that rfc5661bis had to be done. The difficulty/length of rfc7530 was a focus of attention. Since that time, it has become generally accepted that this work is unavoidable and I intend to switch to the future indicative in the next revision.
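As a rough illustration of the trunking concepts discussed in the proposed Section 1.1 text above, the sketch below shows how a client might classify the relationship between two network addresses from their EXCHANGE_ID results. This is an illustrative sketch only, not normative text: the field names mirror the protocol's eir_server_scope and eir_server_owner, the helper names are assumptions, and any claimed trunkability would still need verification before being relied on.

```python
# Illustrative sketch: classifying the trunking relationship between
# two addresses from EXCHANGE_ID results.  Matching server scope and
# server-owner major ID indicate the same server (server-trunkable);
# a matching minor ID as well indicates the addresses may be usable
# within a single session (session-trunkable).  A real client must
# still verify any such claim before trusting it.
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    server_scope: bytes   # eir_server_scope
    so_major_id: bytes    # eir_server_owner.so_major_id
    so_minor_id: int      # eir_server_owner.so_minor_id


def trunking_relation(a: ExchangeIdResult, b: ExchangeIdResult) -> str:
    if a.server_scope != b.server_scope or a.so_major_id != b.so_major_id:
        return "distinct-servers"
    if a.so_minor_id != b.so_minor_id:
        return "server-trunkable"
    return "session-trunkable"
```

The design point here is simply that trunking is a property of pairs of addresses, derived from server-reported identity, rather than something a client can assume from addresses alone.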
> o Work would have to be done with regard to RFC8178 [63] which
>    establishes NFSv4-wide versioning rules.  As RFC5661 is curretly
>    inconsistent with this document, changes are needed in order to
>    arrive at a situation in which there would be no need for RFC8178
>    to update the NFSv4.1 specfication.
>
> nit: s/this document/that document/ -- "this document" is
> draft-ietf-nfsv4-rfc5661sesqui-msns.

Will fix.

> o Work would have to be done with regard to RFC8434 [66], which
>    establishes the requirements for pNFS layout types, which are not
>    clearly defined in RFC5661.  When that work is done and the
>    resulting documents approved, the new NFSv4.1 specfication
>    document will provide a clear set of requirements for layout types
>    and a description of the file layout type that conforms to those
>    requirements.  Other layout types will have their own specfication
>    documents that conforms to those requirements as well.
>
> It's not entirely clear to me that the other layout types need to get
> mentioned in this document; how do they relate to the formal status of
> the "current NFSv4.1 core protocol specification document"?

Other layout types are not specifically mentioned, but it is clear that they will exist and that this document needs to make provision for them, as rfc5661 did. Such documents refer normatively to rfc5661, but rfc5661 (and this document) would only refer to them informatively. There is some text in Section 22.3 about setting up a layout type registry and it refers (informatively) to the defining documents. It may be that incorporation of RFC8434 in rfc5661bis will involve further references to layout type specifications, but I expect that all such references would be informative.

> o Work would have to be done to address many erratas relevant to RFC
>    5661, other than errata 2006 [60], which is addressed in this
>    document.  That errata was not deferrable because of the
>    interaction of the changes suggested in that errata and handling
>    of state and session migration.  The erratas that have been
>    deferred include changes originally suggested by a particular
>    errata, which change consensus decisions made in RFC 5661, which
>    need to be changed to ensure compatibility with existing
>    implementations that do not follow the handling delineated in RFC
>    5661.  Note that it is expected that such erratas will remain
>
> This sentence is pretty long and hard to follow; maybe it could be split
> after "change consensus decisions made in RFC 5661" and the second half
> start with a more declarative statement about existing implementations?
> (E.g., "Existing implementations did not perform handling as delineated
> in RFC 5661 since the procedures therein were not workable, and in order
> to have the specification accurately reflect the existing deployment
> base, changes are needed [...]")

I will clean this bullet up. See below for a proposed replacement.

>    relevant to implementers and the authors of an eventual
>    rfc5661bis, despite the fact that this document, when approved,
>    will obsolete RFC 5661.
>
> (I assume the RFC Editor can tweak this line to reflect what actually
> happens; my understanding is that the errata reports will get cloned to
> this-RFC.)

I understand that Magnus has already got that issue addressed. I'll discuss the appropriate text with him.

> [rant about "errata" vs. "erratum" elided]

This is annoying, but there is no way we are going to get people to use "erratum". What I've tried to do in my proposed replacement text is to refer to "errata report(s)", which is more accurate and allows people who speak English to use English singulars and plurals, without having to worry about Latin grammar.
Here's my proposed replacement for the troubled bullet:

   o  Work needs to be done to address many errata reports relevant to
      RFC 5661, other than errata report 2006 [60], which is addressed
      in this document.  Addressing of that report was not deferrable
      because of the interaction of the changes suggested there and
      the newly described handling of state and session migration.

      The errata reports that have been deferred and that will need to
      be addressed in a later document include reports currently
      assigned a range of statuses in the errata reporting system,
      including reports marked Accepted and those marked Held Over
      because the change was too minor to address immediately.

      In addition, there is a set of other reports, including at least
      one in state Rejected, which will need to be addressed in a
      later document.  This will involve making changes to consensus
      decisions reflected in RFC 5661, in situations in which the
      working group has already decided that the treatment in RFC 5661
      is incorrect and needs to be revised to reflect the working
      group's new consensus and ensure compatibility with existing
      implementations that do not follow the handling described in RFC
      5661.

      Note that it is expected that all such errata reports will
      remain relevant to implementers and the authors of an eventual
      rfc5661bis, despite the fact that this document, when approved,
      will obsolete RFC 5661 [62].

> Section 2.10.4
>
>    Servers each specify a server scope value in the form of an opaque
>    string eir_server_scope returned as part of the results of an
>    EXCHANGE_ID operation.  The purpose of the server scope is to allow a
>    group of servers to indicate to clients that a set of servers sharing
>    the same server scope value has arranged to use compatible values of
>    otherwise opaque identifiers.  Thus, the identifiers generated by two
>    servers within that set can be assumed compatible so that, in some
>    cases, identifiers generated by one server in that set may be
>    presented to another server of the same scope.
>
> Is there more that we can say than "in some cases"?

Not really. In general, when a server sends you an id, it comes with an implied promise to recognize it when you present it subsequently to the same server. The fact that two servers have decided to co-operate in their id assignment does not change that.

> The previous text
> implies a higher level of reliability than just "some cases", to me.

I think I need to change the text, perhaps by replacing "use compatible values of otherwise opaque identifiers" by "use distinct values of otherwise opaque identifiers so that the two servers never assign the same value to two distinct objects". I anticipate the following replacement for the first two paragraphs of Section 2.10.4:

   Servers each specify a server scope value in the form of an opaque
   string eir_server_scope returned as part of the results of an
   EXCHANGE_ID operation.  The purpose of the server scope is to allow
   a group of servers to indicate to clients that a set of servers
   sharing the same server scope value has arranged to use distinct
   values of opaque identifiers so that the two servers never assign
   the same value to two distinct objects.  Thus, the identifiers
   generated by two servers within that set can be assumed compatible
   so that, in certain important cases, identifiers generated by one
   server in that set may be presented to another server of the same
   scope.

   The use of such compatible values does not imply that a value
   generated by one server will always be accepted by another.  In
   most cases, it will not.  However, a server will not accept a value
   generated by another inadvertently.  When it does accept it, it
   will be because it is recognized as valid and carrying the same
   meaning as on another server of the same scope.
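A minimal sketch of the client-side consequence of the proposed text above: identifiers are candidates for presentation to another server only when both servers returned byte-wise identical scope strings, and even then acceptance is not guaranteed. The function and data-structure names here are illustrative assumptions, not anything defined by the protocol.

```python
# Illustrative sketch of the proposed Section 2.10.4 rule: a client
# tracks which server scope minted each identifier and treats an
# identifier as even *potentially* valid only on a server that
# returned an identical eir_server_scope string.  Matching scopes do
# not guarantee acceptance; the only promise is that a server will
# never accept a foreign identifier inadvertently or with a
# different meaning.
def usable_ids(ids_by_scope, target_scope):
    """Return the identifiers that are candidates for use against a
    server that returned target_scope; all others must be discarded
    or re-obtained."""
    return set(ids_by_scope.get(target_scope, set()))
```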
As an illustration of the (limited) value of this information, consider the case of client recovery from a server reboot. The client has to reclaim his locks using file handles returned by the previous server instance. If the server scopes are the same (they almost always are), the client is not sure he will get his locks back (e.g. the file might have been deleted), but he does know that, if the lock reclaim succeeds, it is for the same file. If the server scopes are not the same, he has no such assurance.

> Section 2.10.4
>
> I see the list of identifier types for which same-scope compatibility
> applies got reduced from RFC 5661 to this document, by removing session
> ID, client ID, and state ID values.  For at least one of those I can see
> this making sense as only being workable when the server really is "the
> same server", inline with the improved discussion of migration vs.
> trunking that is a main focus of this document.  Does that
> justification apply to all of them, or are there more reasons involved?

That was involved, but overall we wound up deciding to have the list reflect actual server practice, as opposed to rfc5661, which focused on what it thought servers should do.

> We also remove the text about a client needing to compare server scope
> values during a potential migration event, to determine whether the
> migration preserved state or a reclaim is needed.  I thought this
> scenario would still be possible (and thus still need to be listed),
> though perhaps we are claiming that it is so under-specified so as to be
> never workable in practice?

It's not workable in practice.

> Section 2.10.5
>
>    o  When eir_server_scope changes, the client has no assurance that
>       any id's it obtained previously (e.g. file handles, state ids,
>       client ids) can be validly used on the new server, and, even if
>
> It's interesting to see file handles, state ids, and client ids listed
> together here (nit: also with lowercase "id"), when in the previous
> section we have removed state IDs and client IDs from a list that
> includes all three in RFC 5661.

They shouldn't be. Given the potential interaction with migration, I will be fixing this in the document.

>    o  When eir_server_scope remains the same and
>       eir_server_owner.so_major_id changes, the client can use the
>       filehandles it has, consider its locking state lost, and attempt
>       to reclaim or otherwise re-obtain its locks.  It may find that its
>       file handle IS now stale but if NFS4ERR_STALE is not received, it
>       can proceed to reclaim or otherwise re-obtain its open locking
>       state.
>
> nit(?): this bit about "It may find that its file handle IS now stale
> but if NFS4ERR_STALE is not received" seems to assume some familiarity
> by the reader as to what actions would be performed that would get
> NFS4ERR_STALE back.

No actions needed. I think the last sentence would be better as two sentences:

   It might find out that its file handle is now stale.  However, if
   NFS4ERR_STALE is not returned, the client can proceed to reclaim or
   otherwise re-obtain its open locking state.

> Section 2.10.5.1
>
>    When the server responds using two different connections claim
>    matching or partially matching eir_server_owner, eir_server_scope,
>
> nit: The grammar got wonky here; maybe s/claim/claiming/?

Will fix.

> Section 11.1.1
>
>    In the case of NFS version 4.1 and later minor versions, the means
>    of trunking detection are as described in this document and are
>    available to every client.  Two network addresses connected to the
>    same server are always server-trunkable but cannot necessarily be
>    used together to access a single session.
> nit: we haven't defined "server-trunkable" yet, so it may be worth a
> hint that the definition is coming soon.

I would prefer not using "server-trunkable" before it is defined. I anticipate rewriting this paragraph as follows:

   In the case of NFS version 4.1 and later minor versions, the means
   of trunking detection are as described in this document and are
   available to every client.  Two network addresses connected to the
   same server can always be used together to access a particular
   server but cannot necessarily be used together to access a single
   session.  See below for definitions of the terms "server-trunkable"
   and "session-trunkable".

>    The combination of a server network address and a particular
>    connection type to be used by a connection is referred to as a
>    "server endpoint".  Although using different connection types may
>    result in different ports being used, the use of different ports by
>    multiple connections to the same network address is not the essence
>    of the distinction between the two endpoints used.
>
> There's perhaps a fine line to walk here, as the port can still have
> significant relevance, in general,

I'd prefer not to keep walking a fine line if that can be avoided and intend to allow port specification as indicated in my response to your DISCUSS points. Given that I've done that, I need to make clear that, in this case, I am only referring to differences in ports due to different connection types, i.e. 2049 vs 20049. I anticipate revising this paragraph to read:

   The combination of a server network address and a particular
   connection type to be used by a connection is referred to as a
   "server endpoint".  Although using different connection types may
   result in different ports being used, the use of different ports by
   multiple connections to the same network address in such cases is
   not the essence of the distinction between the two endpoints used.
   This is in contrast to the case in which the explicit specification
   of port numbers within network addresses is used to allow a single
   server node to support multiple NFS servers.

> and we are frequently in the IETF
> told to make no assumption about what is behind specific port values at
> a given network address.

That may be, but there is a lot of client code out there that assumes that you will find an NFS server at port 2049.

> (Consider, for example, a hypothetical virtual
> hosting service that provides "DS-as-a-service" where customers run
> their own MDS that point to configured DSes for actual storage.
> Different ports on that cloud provider would represent entirely
> different customers/servers!)  [This became a discuss point but it
> didn't end up including all the discussion here, so I left it as an
> informational thing; discussion should happen in the Discuss section]

DS's at different ports, although interesting, are not addressed by any of this text. Network addresses for DS's are specified in layouts, which do use XDR structures to define network addresses. I assume port numbers are appropriately dealt with. If they aren't, then this would be an issue for rfc5661bis, in the case of pNFS files, and for other documents in the case of other layout types (e.g. flex files).

> Section 11.1.2
>
>    o  In some cases, a server will have a namespace more extensive than
>       its local namespace by using features associated with attributes
>       that provide file system location information.  These features,
>       which allow construction of a multi-server namespace are all
>
> nit: comma after "multi-server namespace".

Will fix.

>    o  A file system present in a server's pseudo-fs may have multiple
>       file system instances on different servers associated with it.
>       All such instances are considered replicas of one another.
> [Some readers might take this as requiring live read/write replication
> such that all writes to any instance are immediately visible on all
> other instances.

I expect that such readers would have led exceptionally sheltered lives :-)

> The rest of the document ought to disabuse them of
> that notion, and yet...]

It is reasonable to give them more warning. I intend to address this by adding the following to the paragraph:

   Whether such replicas can be used simultaneously is discussed in
   Section 11.11.1, while the level of co-ordination between them
   (important when switching between them) is discussed in Sections
   11.11.2 through 11.11.8 below.

>    o  File system location entries provide the individual file system
>       locations within the file system location attributes.  Each such
>       entry specifies a server, in the form of a host name or IP an
>       address, and an fs name, which designates the location of the file
>
> nit: s/IP an/an IP/.

Will fix.

>       client may establish connections.  There may be multiple endpoints
>       because a host name may map to multiple network addresses and
>       because multiple connection types may be used to communicate with
>       a single network address.  However, all such endpoints MUST
>       provide a way of connecting to a single server.  The exact form of
>
> nit: "MUST provide" feels strange here, since it implies in some sense
> an extra layer of indirection ("A lists X, and X among other things
> provides Y"); would a different word like "indicate" work?

I think "designate" would work. Will use that in the next revision.

>       element derives from a corresponding location entry.  When a
>       location entry specifies an IP address there is only a single
>       corresponding location element.  File system location entries that
>       contain a host name are resolved using DNS, and may result in one
>       or more location elements.  All location elements consist of a
>       location address which is the IP address of an interface to a
>       server and an fs name which is the location of the file system
>       within the server's local namespace.  The fs name can be empty if
>
> I can't decide whether both instances of "IP address" are pedantically
> correct, in the presence of the potential for port information to be
> included/available.  The former is probably okay, but the latter might
> need some clarification.

I think you can switch from "IP address" to "network address containing an IP address" or from "is an IP address" to "includes an IP address". Will fix.

> Section 11.2
>
>    The fs_locations attribute defined in NFSv4.0 is also a part of
>    NFSv4.1.  This attribute only allows specification of the file system
>    locations where the data corresponding to a given file system may be
>    found.  Servers should make this attribute available whenever
>    fs_locations_info is supported, but client use of fs_locations_info
>    is preferable, as it provides more information.
>
> I think this was probably okay as "SHOULD make this attribute available"
> (as it was in 5661), but don't object to the lowercase version either.

I think "SHOULD" is right. Will fix.

> Section 11.5
>
>    Where a file system had been absent, specification of file system

I expect to change this to say "Where a file system is currently absent".

> I guess I'm probably in the rough on this one (since 5661 had my
> more-preferred language), but it still feels like "had been absent"
> implies that it is no longer absent, i.e., that it is now present or has
> otherwise changed.  What's going on here with referrals is more like a
> "was never present" case, though using "never" is of course problematic
> as it's more absolute than is appropriate.

I did later use "never previously present" but will switch that to "not previously present".
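Returning to the location-entry text quoted above, the expansion of a file system location entry into location elements can be sketched as follows. This is a hedged illustration under stated assumptions: the helper name, the use of port 2049, and the tuple representation are all mine, not the protocol's, and a real client would fold connection types and any explicit port information into the result.

```python
# Illustrative sketch: expanding a file system location entry (a host
# name or literal IP address, plus an fs name) into location elements.
# A literal address yields exactly one element; a host name is
# resolved via DNS and may yield several, one per resolved address.
import ipaddress
import socket


def location_elements(host, fs_name):
    try:
        ipaddress.ip_address(host)      # literal IP address?
        return [(host, fs_name)]        # single location element
    except ValueError:
        pass                            # not a literal; resolve via DNS
    infos = socket.getaddrinfo(host, 2049, proto=socket.IPPROTO_TCP)
    # Each resolved address becomes a (location address, fs name) pair.
    return [(info[4][0], fs_name) for info in infos]
```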
> If we're going to talk about "pure referral"s, do we want to make
> mention of or otherwise differentiate/characterize "non-pure"
> ("impure"?) referrals?

I'd prefer not to do that.

> Section 11.5.1
>
>    In order to simplify client handling and allow the best choice of
>    replicas to access, the server should adhere to the following
>    guidelines.
>
> Just to check: these are just informal "guidelines" and not something
> that a server SHOULD or even MUST adhere to?

They are informal. It would be nice to go farther, but we can't, as it raises compatibility issues to suddenly tell a server that what it had been doing is now illegal/strongly disfavored.

> Section 11.5.2
>
>    Locations entries used to discover candidate addresses for use in
>
> nit(?): is this supposed to just be "Location" singular

Yes. Will fix.

> Section 11.5.3
>
>    Irrespective of the particular attribute used, when there is no
>    indication that a step-up operation can be performed, a client
>    supporting RDMA operation can establish a new RDMA connection and it
>    can be bound to the session already established by the TCP
>    connection, allowing the TCP connection to be dropped and the session
>    converted to further use in RDMA node.
>
> Should we say something to make this contingent on the server also
> supporting RDMA?

I will add ", if the server supports that" to the sentence.

> Section 11.5.5
>
>    will typically use the first one provided.  If that is inaccessible
>    for some reason, later ones can be used.  In such cases the client
>    might consider that the transition to the new replica as a migration
>    event, even though some of the servers involved might not be aware of
>    the use of the server which was inaccessible.  In such a case, a
>
> nit: the grammar here got wonky; maybe s/as a/is a/?

How about s/as a/to be a/ ?

> Section ??
>
> The old (RFC 5661) Section 11.5 mentioned several things, and I'd like
> to check that we have either covered or disavowed all of them.
> No disavowal, as far as I know. > My current understanding is that: > > The first paragraph basically talked about trunking detection, and is > covered elsewhere. > Yes. > The second paragraph talks about something that I would call "implicit > replication" with the 5661 definition of "replica", but in the new model > is essentially definitionally true, since we consider all addresses for the same server to be ... part of the same server, so of course that > server's namespaces match up. We've disavowed the description although not the reality described. > Though perhaps the discussion about not > all of the cartesian product of (addresses-for-server, local path) being > listed is still worth having? > I think this is already discussed (in two places in the next revision). > > The third paragraph basically talks about the need for trunking > detection, and includes some guidance to clients about assuming server > misconfiguration that seems of questionable merit. > Agree. I think the treatment of misconfiguration is primarily in 2.10.5. Not sure what else to add. > Section 11.5.7 > > o Deletions from the list of network addresses for the current file > system instance need not be acted on immediately, although the > client might need to be prepared for a shift in access whenever > the server indicates that a network access path is not usable to > access the current file system, by returning NFS4ERR_MOVED. > > I think this should be wordsmithed a bit more, as (IIUC) the idea here > is that if a client notices in a location response that the address the > client is currently using for a filesystem has disappeared from the > list, the client should be prepared for imminent changes in server > behavior relating to the presumed-move. 
Those imminent changes would > most likely be reflected in the form of the server returning > NFS4ERR_MOVED, but there is no NFS4ERR_MOVED involved in the actual > deletion from the list of network instances of the current system > instance.
> Yes. If there is a deletion, there need not be any MOVED error. However, if you get a MOVED error, then there has to be a deletion of your address from the list. I anticipate replacing this bullet by the following:

o Deletions from the list of network addresses for the current file system instance do not need to be acted on immediately by ceasing use of existing access paths, although new connections are not to be established on addresses that have been deleted. However, clients can choose to act on such deletions by making preparations for an eventual shift in access, which would become unavoidable as soon as the server indicates that a particular network access path is not usable to access the current file system, by returning NFS4ERR_MOVED.

> Section 11.6 > > corresponding attribute is interrogated subsequently. In the case of > a multi-server namespace, that same promise applies even if server > boundaries have been crossed. Similarly, when the owner attribute of > a file is derived from the securiy principal which created the file, > that attribute should have the same value even if the interrogation > occurs on a different server from the file creation. > > I can see how the interrogation would be on a different server from file > creation for "simple" replication scenarios, but I'm not sure I'm seeing > how non-replication cases would arise, particularly ones that cross server > boundaries in a multi-server (hierarchical?) namespace. Am I missing > something obvious? > I suppose non-replication cases cannot arise. nit: s/securiy/security/ > Will fix.
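The deletion-handling behavior in the replacement bullet above reduces to simple client-side bookkeeping. The following sketch is illustrative only (the function and variable names are hypothetical, not from the draft): existing paths may continue, no new connections go to deleted addresses, and the client prepares for NFS4ERR_MOVED if its active address was deleted.

```python
# Illustrative sketch (not draft text) of the Section 11.5.7 guidance:
# deletions from a file system's location list need not force an immediate
# switch, but new connections must not be made to deleted addresses, and
# the client should prepare for NFS4ERR_MOVED on a deleted active address.

def refresh_locations(known_addrs, latest_location_list, active_addr):
    """Return (addresses usable for new connections, prepare_for_move flag)."""
    usable_for_new = set(latest_location_list)
    deleted = set(known_addrs) - usable_for_new
    # Existing access paths may continue for now, but a shift becomes
    # unavoidable once the server starts returning NFS4ERR_MOVED.
    prepare_for_move = active_addr in deleted
    return sorted(usable_for_new), prepare_for_move
```

For example, a client actively using an address that has vanished from a freshly fetched location list would keep its connection open for the moment while getting ready to transition.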
> > o All servers support a common set of domains which includes all of > the domains clients use and expect to see returned as the domain > portion of an owner or group in the form "id@domain". Note that > although this set most ofen consists of a single domain, it is > possible for mutiple domains to be supported. > > I a little bit wonder if the "most often" still holds when client > principals come from an AD forest. > "Most often" describes the reality that exists. The protocol allows multiple domains, but the big obstacle to multi-domain support is the on-disk file systems that support only 32-bit uids and gids, as well as the vfs's which have the same restriction. :-(

> > o All servers recognize the same set of security principals, and > each principal, the same credential are required, independent of > the server being accessed. In addition, the group membership for > > nit: I think there's a missing word here, maybe "and for each > principal"? > Yes, but I needed to rewrite the bullet as follows:

o All servers recognize the same set of security principals. For each principal, the same credential is required, independent of the server being accessed. In addition, the group membership for each such principal is to be the same, independent of the server accessed.

> > Note that there is no requirment that the users corresponding to > > nit: "requirement" > Will fix.

> > o The "local" representation of all owners and groups must be the > same on all servers. The word "local" is used here since that is > the way that numeric user and group ids are described in > Section 5.9. However, when AUTH_SYS or stringified owners or > group are used, these identifiers are not truly local, since they > are known tothe clients as well as the server. > > I am trying to find a way to note that the AUTH_SYS case mentioned here > is precisely because of the requirement being imposed by this bullet > point, Not sure what you mean by that.
I think the requirement is to allow the client to be able to use AUTH_SYS, without the contortions that would be required if different fs's had the same uids meaning different things. while acknowledging that the "stringified owners or group" case > is separate, but not having much luck. > My attempt to revise this area is below:

Note that there is no requirement in general that the users corresponding to particular security principals have the same local representation on each server, even though it is most often the case that this is so.

When AUTH_SYS is used, the following additional requirements must be met:

o Only a single NFSv4 domain can be supported through use of AUTH_SYS.

o The "local" representation of all owners and groups must be the same on all servers. The word "local" is used here since that is the way that numeric user and group ids are described in Section 5.9. However, when AUTH_SYS or stringified numeric owners or groups are used, these identifiers are not truly local, since they are known to the clients as well as the server.

Similarly, when stringified numeric user and group ids are used, the "local" representation of all owners and groups must be the same on all servers, even when AUTH_SYS is not used.

Also, nit: "to the" > Fixed.

> Section 11.9 > > o When use of a particular address is to cease and there is also one > currently in use which is server-trunkable with it, requests that > would have been issued on the address whose use is to be > discontinued can be issued on the remaining address(es). When an > address is not a session-trunkable one, the request might need to > be modified to reflect the fact that a different session will be > used. > > I suggest writing this as "when an address is server-trunkable but not > session-trunkable, > OK. Will fix.
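The Section 11.9 bullet just discussed can be sketched as a small decision procedure. This is a hypothetical illustration, not draft text; `session_trunkable` stands in for whatever the client has learned from EXCHANGE_ID/trunking detection and is not a real API.

```python
# Hypothetical sketch of the Section 11.9 guidance: when use of an address
# ceases, requests move to a remaining server-trunkable address; the same
# session can be kept only if that address is session-trunkable with the
# old one; otherwise requests must be modified for a different session.

def plan_reissue(old_addr, replacement_addrs, session_trunkable):
    """Pick a replacement address and report whether the session is reusable."""
    for addr in replacement_addrs:
        if session_trunkable(old_addr, addr):
            return addr, True   # reissue requests unchanged on the same session
    if replacement_addrs:
        # Server-trunkable but not session-trunkable: new session needed.
        return replacement_addrs[0], False
    return None, False
```

The design point here is simply that session reuse is a property of the address pair, so the client checks trunkability before deciding how to reissue.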
> > o When use of a particular connection is to cease, as indicated by > receiving NFS4ERR_MOVED when using that connection but that > address is still indicated as accessible according to the > appropriate file system location entries, it is likely that > requests can be issued on a new connection of a different > connection type, once that connection is established. Since any > two server endpoints that share a network address are inherently > session-trunkable, the client can use BIND_CONN_TO_SESSION to > access the existing session using the new connection and proceed > to access the file system using the new connection. > > I'm not entirely sure how "inherent" this is (in the vein of my Discuss > point, and what we mean by "network address"). > Will need to address for specified ports, given that these are now allowed. Part of this will be new text in the definition of "server endpoints":

The combination of a server network address and a particular connection type to be used by a connection is referred to as a "server endpoint". Although using different connection types may result in different ports being used, the use of different ports by multiple connections to the same network address in such cases is not the essence of the distinction between the two endpoints used. This is in contrast to the case of port-specific endpoints, in which the explicit specification of port numbers within network addresses is used to allow a single server node to support multiple NFS servers.

Also need to revise, as follows, the paragraph containing "inherently":

o When use of a particular connection is to cease, as indicated by receiving NFS4ERR_MOVED when using that connection but that address is still indicated as accessible according to the appropriate file system location entries, it is likely that requests can be issued on a new connection of a different connection type, once that connection is established.
Since any two, non-port-specific server endpoints that share a network address are inherently session-trunkable, the client can use BIND_CONN_TO_SESSION to access the existing session using the new connection and proceed to access the file system using the new connection.

> > o When there are no potential replacement addresses in use but there > > What is a "replacement address"? > I've explained that in some new text added before these bullets, as a new second paragraph of this section:

The appropriate action depends on the set of replacement addresses (i.e. server endpoints which are server-trunkable with one previously being used) which are available for use.

> > are valid addresses session-trunkable with the one whose use is to > be discontinued, the client can use BIND_CONN_TO_SESSION to access > the existing session using the new address. Although the target > session will generally be accessible, there may be cases in which > that session is no longer accessible. In this case, the client > can create a new session to enable continued access to the > existing instance and provide for use of existing filehandles, > stateids, and client ids while providing continuity of locking > state. > > I'm not sure I understand this last sentence. On its own, the "new > session to enable continued access to the existing instance" sounds like > the continued access would be on the address whose use is to cease, and > thus the new session would be there. That is not the intention. Will need to clarify. > But why make a new session when > the old one is still good, It isn't usable on the new connection. > especially when we just said in the previous > sentence that the old session can't be moved to the new > connection/address? > Because we can't use it on the new connection, we have to create a new session to continue access. Perhaps a forward reference down to Section 11.12.{4,5} for this and the > next bullet point would help as well as rewording?
> It turns out these would add confusion, since they deal with migration situations and deciding whether transparent state migration has occurred in the switch between replicas. In the cases we are dealing with, there is only a single replica/fs and no migration. Here is my proposed replacement text for the two bullets in question:

o When there are no potential replacement addresses in use but there are valid addresses session-trunkable with the one whose use is to be discontinued, the client can use BIND_CONN_TO_SESSION to access the existing session using the new address. Although the target session will generally be accessible, there may be rare situations in which that session is no longer accessible when an attempt is made to bind the new connection to it. In this case, the client can create a new session to enable continued access to the existing instance and provide for use of existing filehandles, stateids, and client ids while providing continuity of locking state.

o When there is no potential replacement address in use and there are no valid addresses session-trunkable with the one whose use is to be discontinued, other server-trunkable addresses may be used to provide continued access. Although use of CREATE_SESSION is available to provide continued access to the existing instance, servers have the option of providing continued access to the existing session through the new network access path in a fashion similar to that provided by session migration (see Section 11.12). To take advantage of this possibility, clients can perform an initial BIND_CONN_TO_SESSION, as in the previous case, and use CREATE_SESSION only if that fails.

> Section 11.10.6 > > In a file system transition, the two file systems might be clustered > in the handling of unstably written data. When this is the case, and > > What does "clustered in the handling of unstably written data" mean?
> > the two file systems belong to the same write-verifier class, write > > How is the client supposed to determine "when this is the case"? > Here's a proposed replacement for this paragraph:

In a file system transition, the two file systems might be cooperating in the handling of unstably written data. Clients can determine if this is the case by seeing if the two file systems belong to the same write-verifier class. When this is the case, write verifiers returned from one system may be compared to those returned by the other and superfluous writes avoided.

> Section 11.10.7 > > In a file system transition, the two file systems might be consistent > in their handling of READDIR cookies and verifiers. When this is the > case, and the two file systems belong to the same readdir class, > > As above, how is the client supposed to determine "when this is the > case"? > READDIR cookies and verifiers from one system may be recognized by > the other and READDIR operations started on one server may be validly > continued on the other, simply by presenting the cookie and verifier > returned by a READDIR operation done on the first file system to the > second. > > Are these "may be"s supposed to admit the possibility that the > destination server can just decide to not honor them arbitrarily? > No. They are intended to indicate that the client might or might not use the capability. Here is proposed replacement text for the paragraph:

In a file system transition, the two file systems might be consistent in their handling of READDIR cookies and verifiers. Clients can determine if this is the case by seeing if the two file systems belong to the same readdir class.
When this is the case, READDIR cookies and verifiers from one system will be recognized by the other and READDIR operations started on one server can be validly continued on the other, simply by presenting the cookie and verifier returned by a READDIR operation done on the first file system to the second.

> Section 11.10.8 > > the degree indicated by the fs_locations_info attribute). However, > when multiple file systems are presented as replicas of one another, > the precise relationship between the data of one and the data of > another is not, as a general matter, specified by the NFSv4.1 > protocol. It is quite possible to present as replicas file systems > where the data of those file systems is sufficiently different that > some applications have problems dealing with the transition between > replicas. The namespace will typically be constructed so that > applications can choose an appropriate level of support, so that in > one position in the namespace a varied set of replicas will be > listed, while in another only those that are up-to-date may be > considered replicas. [...] > > This seems quite wishy-washy for a standards-track protocol! We give no > hard bounds on how "different" replicas may be, no protocol element to > convey even a qualitative sense of where on the spectrum of replication > fidelity a replica may lie, and no indication as to how the namespace > might be constructed to indicate a level of support. > I agree but feel that fixing this situation gets us beyond rfc5661sesqui or even rfc5661bis, but it would be possible to extend the fs_location_info attribute to accommodate this sort of information. This would take the form of an extension to NFSv4.2.

> > The protocol does define three special cases of > the relationship among replicas to be specified by the server and > relied upon by clients: > > I'd like to hear from the rest of the IESG, but we may need to consider > limiting "replication" to just these special cases until we can be more > precise about the other cases.
> The troublesome problem is that this terminology (and the associated wishy-washiness) has been used for multiple minor versions. That makes it hard to change now.

> > o When multiple replicas exist and are used simultaneously by a > client (see the FSLIB4_CLSIMUL definition within > fs_locations_info), they must designate the same data. Where file > systems are writable, a change made on one instance must be > visible on all instances, immediately upon the earlier of the > return of the modifying requester or the visibility of that change > on any of the associated replicas. This allows a client to use > > Hmm, how would this "earlier of [...]" work when there are three > nominally equivalent machines? Assume the RPC is made to A, and the > other two are B and C. If the update first goes visible on B, it must > also be visible on C, instilling what is apparently a hard requirement > for exact synchronization between B and C, perhaps by some sort of > negotiated "make visible at timestamp X" mechanism. But if the RPC > returns from A first, then the change still has to be visible on B and C > at the same time. Does this phrasing give any weaker a requirement than > "must be visible on all machines at the same time", in practice? (There > are, of course, various distributed-consensus protocols that can do > this, as could a scenario where all NFS servers are connected to a > common file store backend.) > Actually it leads to a stronger requirement and I'm not sure that's right. The current text would not allow A to store the new data in non-volatile memory, but delay propagation until one of A, B, C is asked for the data. Will switch to use your suggestion. I anticipate using the following text for this bullet:

o When multiple replicas exist and are used simultaneously by a client (see the FSLIB4_CLSIMUL definition within fs_locations_info), they must designate the same data.
Where file systems are writable, a change made on one instance must be visible on all instances at the same time, regardless of whether the interrogated instance is the one on which the modification was done. This allows a client to use these replicas simultaneously without any special adaptation to the fact that there are multiple replicas, beyond adapting to the fact that locks obtained on one replica are maintained separately (i.e. under a different client ID). In this case, locks (whether share reservations or byte-range locks) and delegations obtained on one replica are immediately reflected on all replicas, in the sense that access from all other servers is prevented regardless of the replica used. However, because the servers are not required to treat two associated client IDs as representing the same client, it is best to access each file using only a single client ID.

> Section 11.10.9 > > When access is transferred between replicas, clients need to be > assured that the actions disallowed by holding these locks cannot > > To check my understanding: this "access is transferred" means *all* > clients' access (not just one particular client)? Otherwise I'm not > sure how the destination would know to enforce the grace period. > It doesn't imply that. Even if all clients of the source fs were transferred, there will be clients of the destination fs who have not been transferred. As far as grace periods go, co-operating servers could transfer locks transparently or limit a grace period to a single transferred client. The following anticipated text for Section 11.11.9 may be helpful:

When access is transferred between replicas, clients need to be assured that the actions disallowed by holding these locks cannot have occurred during the transition. This can be ensured by the methods below. Unless at least one of these is implemented, clients will not be assured of continuity of lock possession across a migration event.
o Providing the client an opportunity to re-obtain its locks via a per-fs grace period on the destination server, denying all clients using the destination filesystem the opportunity to obtain new locks that conflict with those held by the transferred client as long as that client has not completed its per-fs grace period. Because the lock reclaim mechanism was originally defined to support server reboot, it implicitly assumes that file handles will, upon reclaim, be the same as those at open. In the case of migration, this requires that source and destination servers use the same filehandles, as evidenced by using the same server scope (see Section 2.10.4) or by showing this agreement using fs_locations_info (see Section 11.11.2 above).

Note that such a grace period can be implemented without interfering with the ability of non-transferred clients to obtain new locks while it is going on. As long as the destination server is aware of the transferred locks, it can distinguish requests to obtain new locks that conflict with existing locks from those that do not, allowing it to treat such client requests without reference to the ongoing grace period.

> Section 11.11.1 > > I think the last two paragraphs might be duplicating some things > mentioned earlier in the section, but the repetition is probably not > harmful.

> Section 11.12.1 > > Because of the absence of NFSV4ERR_LEASE_MOVED, it is possible for > file systems whose access path has not changed to be successfully > > It might be worth phrasing this as "SEQ4_STATUS_LEASE_MOVED is not an > error condition". > Will do.

> Section 11.12.2 > > o No action needs to be taken for such indications received by the > those performing migration discovery, since continuation of that > work will address the issue. > > nit: "by the those" is not right, but the proper fix eludes me, as this > bullet point needs to be more specific somehow than the next one.
I expect to use the following replacement:

o No action needs to be taken for such indications received by any threads performing migration discovery, since continuation of that work will address the issue.

> > o If the fs_status attribute indicates that the file system is a > migrated one (i.e. fss_absent is true and fss_type != > STATUS4_REFERRAL) and thus that it is likely that the fetch of the > file system location attribute has cleared one the file systems > contributing to the lease-migrated indication. > > This looks like a sentence fragment -- it's of the form "If X, and thus > Y." with no concluding clause. > It is. Will fix by using the following replacement:

o If the fs_status attribute indicates that the file system is a migrated one (i.e. fss_absent is true and fss_type != STATUS4_REFERRAL) then a migrated file system has been found. In this situation, it is likely that the fetch of the file system location attribute has cleared one of the file systems contributing to the lease-migrated indication.

> Section 11.12.4 > > Once the client has determined the initial migration status, and > determined that there was a shift to a new server, it needs to re- > establish its locking state, if possible. To enable this to happen > without loss of the guarantees normally provided by locking, the > destination server needs to implement a per-fs grace period in all > cases in which lock state was lost, including those in which > Transparent State Migration was not implemented. > > Similarly to above, does this imply that the migration has to happen for > all clients concurrently, as opposed to clients getting migrated in > sequence? > No, it doesn't. The following replacement text should help make this clear:

Once the client has determined the initial migration status, and determined that there was a shift to a new server, it needs to re-establish its locking state, if possible.
To enable this to happen without loss of the guarantees normally provided by locking, the destination server needs to implement a per-fs grace period in all cases in which lock state was lost, including those in which Transparent State Migration was not implemented. Each client for which there was a shift of locking state to the new server will have the length of the grace period to reclaim its locks, from the time its locks were transferred.

> Section 11.3.1 > > In this case, destination server need have no knowledge of the locks > > nit: singular/plural mismatch "destination server"/"need" > "needs have" would not work as "need" was being used as an (undeclinable) auxiliary verb. Went with "does not need any knowledge". Also clarified stuff about grace period with the following replacement text:

In this case, the destination server does not need any knowledge of the locks held on the source server, but relies on the clients to accurately report (via reclaim operations) the locks previously held, not allowing new locks to be granted on the migrated file system until the grace period expires. Note that the disallowing of new locks applies to all clients accessing the file system, while grace period expiration occurs for each migrated client independently.

> Section 11.13.3 > > o Not responding with NFS4ERR_SEQ_MISORDERED for the initial request > on a slot within a transferred session, since the destination > > Does this then translate to "process as usual in the absence of > migration"? "Don't return error X" tells me what not to do, but doesn't > really tell me what to do instead. > Intend to use the following replacement text:

o Not responding with NFS4ERR_SEQ_MISORDERED for the initial request on a slot within a transferred session, since the destination server cannot be aware of requests made by the client after the server handoff but before the client became aware of the shift.
In cases in which NFS4ERR_SEQ_MISORDERED would normally have been reported, the request is to be processed normally, as a new request.

> Section 11.16.1 > > With the exception of the transport-flag field (at offset > FSLI4BX_TFLAGS with the fls_info array), all of this data applies to > the replica specified by the entry, rather that the specific network > path used to access it. > > Is it clear that this applies only to the fields defined by this > specification (since, as mentioned later, future extensions must specify > whether they apply to the replica or the entry)? > Intend to use the following replacement text:

With the exception of the transport-flag field (at offset FSLI4BX_TFLAGS with the fls_info array), all of this data defined in this specification applies to the replica specified by the entry, rather than the specific network path used to access it. The classification of data in extensions to this data is discussed below.

> Section 15.1.1.3 > > o When NFS4ERR_DELAY is returned on an operation other than the > first within a request and there has been a non-idempotent > operation processed before the NFS4ERR_DELAY was returned, the > reissued request should avoid the non-idempotent operation. The > request still must use a SEQUENCE operation with either a > different slot id or sequence value from the SEQUENCE in the > original request. Because this is done, there is no way the > replier could avoid spuriously re-executing the non-idempotent > operation since the different SEQUENCE parameters prevent the > requester from recognizing that the non-idempotent operation us > being retried. > > I don't think that this is very clear about the counterfactual scenario > in which the replier is trying to avoid spuriously re-executing the > non-idempotent operation. Is it supposed to be explaining why the > client has to use a different slot or sequence value, because the > replier would reexecute the non-idempotent operation otherwise?
> Expect to use the following replacement text:

o When NFS4ERR_DELAY is returned on an operation other than the first within a request and there has been a non-idempotent operation processed before the NFS4ERR_DELAY was returned, reissuing the request as is normally done would incorrectly cause the re-execution of the non-idempotent operation. To avoid this, the reissued request should avoid the non-idempotent operation. The request still must use a SEQUENCE operation with either a different slot id or sequence value from the SEQUENCE in the original request. Because this is done, there is no way the replier could avoid spuriously re-executing the non-idempotent operation since the different SEQUENCE parameters prevent the requester from recognizing that the non-idempotent operation is being retried.

> Section 18.35.3 > > I a little bit wonder if we want to reaffirm that co_verifier remains > fixed when the client is establishing multiple connections for trunking > usage -- the "incarnation of the client" language here could make a > reader wonder, though I think the discussion of its use elsewhere as > relating to "client restart" is sufficiently clear. > This should be made clearer, but the clarification needs to be done in multiple places. Possible replacement text for the eighth non-code paragraph of Section 2.4:

The first field, co_verifier, is a client incarnation verifier, allowing the server to distinguish successive incarnations (e.g. reboots) of the same client. The server will start the process of canceling the client's leased state if co_verifier is different than what the server has previously recorded for the identified client (as specified in the co_ownerid field).

Likely replacement text for the seventh paragraph of this section:

The eia_clientowner field is composed of a co_verifier field and a co_ownerid string. As noted in Section 2.4, the co_ownerid describes the client, and the co_verifier is the incarnation of the client.
An EXCHANGE_ID sent with a new incarnation of the client will lead to the server removing lock state of the old incarnation. In contrast, an EXCHANGE_ID sent with the current incarnation and co_ownerid will result in an error, an update of the client ID's properties, or the return of information about the existing client_id (as might happen when this operation is done to the same server using different network addresses as part of creating trunked connections), depending on the arguments to EXCHANGE_ID.

> The eia_clientowner field is composed of a co_verifier field and a > co_ownerid string. As noted in s Section 2.4, the co_ownerid > > s/s // > Will fix.

> Section 18.51.4 > > o When a server might become the destination for a file system being > migrated, inappropriate use of per-fs RECLAIM_COMPLETE is more > concerning. In the case in which the file system designated is > not within a per-fs grace period, the per-fs RECLAIM_COMPLETE > SHOULD be ignored, with the negative consequences of accepting it > being limited, as in the case in which migration is not supported. > However, if the server encounters a file system undergoing > migration, the operation cannot be accepted as if it were a global > RECLAIM_COMPLETE without invalidating its intended use. > > This seems to be the only place where we acknowledge that the "misuse" > in question was to "treat rca_one_fs of TRUE as if it was FALSE", which > is probably not so great for clarity. > Addressed in the following revised paragraph in Section 18.51.4:

Because previous descriptions of RECLAIM_COMPLETE were not sufficiently explicit about the circumstances in which use of RECLAIM_COMPLETE with rca_one_fs set to TRUE was appropriate, there have been cases in which it has been misused by clients who have issued RECLAIM_COMPLETE with rca_one_fs set to TRUE when it should not have been.
There have also been cases in which servers have, in various ways, not responded to such misuse as described above, either ignoring the rca_one_fs setting (treating the operation as a global RECLAIM_COMPLETE) or ignoring the entire operation.

> Section 21 > > Some other topics at least somewhat related to trunking and migration > that we could potentially justify including in the current, > limited-scope, update (as opposed to deferring for a full -bis) include: > Some of these are related to multi-server namespace but not related to security, as far as I can see.

> > - clients that lie about reclaimed locks during a post-migration grace > period > Will address in a number of places: First of all, I intend to add a new paragraph to Section 21, to be placed as the sixth non-bulleted paragraph and to read as follows:

Security considerations for lock reclaim differ between the state reclaim done after server failure (discussed in Section 8.4.2.1.1) and the per-fs state reclaim done in support of migration/replication (discussed in Section 11.11.9.1).

Next is a proposed new section to appear as Section 11.11.9.1:

11.11.9.1. Security Considerations Related to Reclaiming Lock State after File System Transitions

Although it is possible for a client reclaiming state to misrepresent its state, in the same fashion as described in Section 8.4.2.1.1, most implementations providing for such reclamation in the case of file system transitions will have the ability to detect such misrepresentations. This limits the ability of unauthenticated clients to execute denial-of-service attacks in these circumstances. Nevertheless, the rules stated in Section 8.4.2.1.1, regarding principal verification for reclaim requests, apply in this situation as well.

Typically, implementations supporting file system transitions will have extensive information about the locks to be transferred.
   This is because:

   o  Since failure is not involved, there is no need to store locking
      information in persistent storage.

   o  There is no need, as there is in the failure case, to update
      multiple repositories containing locking state to keep them in
      sync. Instead, there is a one-time communication of locking state
      from the source to the destination server.

   o  Providing this information avoids potential interference with
      existing clients using the destination file system, since the
      server need not deny them the ability to obtain new locks during
      the grace period.

   When such detailed locking information, not necessarily including
   the associated stateids, is available:

   o  It is possible to detect reclaim requests that attempt to reclaim
      locks that did not exist before the transfer, rejecting them with
      NFS4ERR_RECLAIM_BAD (Section 15.1.9.4).

   o  It is possible, when dealing with non-reclaim requests, to
      determine whether they conflict with existing locks, eliminating
      the need to return NFS4ERR_GRACE (Section 15.1.9.2) on
      non-reclaim requests.

   It is possible for implementations of grace periods in connection
   with file system transitions not to have detailed locking
   information available at the destination server, in which case the
   security situation is exactly as described in Section 8.4.2.1.1.

I think I should also draw your attention to a revised Section 15.1.9.
This includes some revisions originally done for
draft-ietf-nfsv4-rfc5661-msns-update, which somehow got dropped, as
well as a few that turned up as necessary in writing 11.11.9.1:

15.1.9.  Reclaim Errors

   These errors relate to the process of reclaiming locks after a
   server restart.

15.1.9.1.  NFS4ERR_COMPLETE_ALREADY (Error Code 10054)

   The client previously sent a successful RECLAIM_COMPLETE operation
   specifying the same scope, whether that scope is global or for the
   same file system in the case of a per-fs RECLAIM_COMPLETE. An
   additional RECLAIM_COMPLETE operation is not necessary and results
   in this error.

15.1.9.2.
NFS4ERR_GRACE (Error Code 10013)

   This error is returned when the server was in its recovery or grace
   period, with regard to the file system object for which the lock was
   requested, resulting in a situation in which a non-reclaim locking
   request could not be granted. This can occur because either:

   o  The server does not have sufficient information about locks that
      might potentially be reclaimed to determine whether the lock
      could validly be granted.

   o  The request is made by a client responsible for reclaiming its
      locks that has not yet done the appropriate RECLAIM_COMPLETE
      operation, allowing it to proceed to obtain new locks.

   It should be noted that, in the case of a per-fs grace period, there
   may be clients, i.e., those currently using the destination file
   system, who might be unaware of the circumstances resulting in the
   initiation of the grace period. Such clients need to periodically
   retry the request until the grace period is over, just as other
   clients do.

15.1.9.3.  NFS4ERR_NO_GRACE (Error Code 10033)

   A reclaim of client state was attempted in circumstances in which
   the server cannot guarantee that conflicting state has not been
   provided to another client. This occurs if there is no active grace
   period applying to the file system object for which the request was
   made, if the client making the request has no current role in
   reclaiming locks, or because previous operations have created a
   situation in which the server is not able to determine that a
   reclaim-interfering edge condition does not exist.

15.1.9.4.  NFS4ERR_RECLAIM_BAD (Error Code 10034)

   The server has determined that a reclaim attempted by the client is
   not valid, i.e., the lock specified as being reclaimed could not
   possibly have existed before the server restart or file system
   migration event. A server is not obliged to make this determination
   and will typically rely on the client to only reclaim locks that the
   client was granted prior to restart.
   However, when a server does have reliable information to enable it
   to make this determination, this error indicates that the reclaim
   has been rejected as invalid. This is as opposed to the error
   NFS4ERR_RECLAIM_CONFLICT (see Section 15.1.9.5), where the server
   can only determine that there has been an invalid reclaim but cannot
   determine which request is invalid.

15.1.9.5.  NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)

   The reclaim attempted by the client has encountered a conflict and
   cannot be satisfied. This potentially indicates a misbehaving
   client, although not necessarily the one receiving the error. The
   misbehavior might be on the part of the client that established the
   lock with which this client conflicted. See also Section 15.1.9.4
   for the related error, NFS4ERR_RECLAIM_BAD.

> - how attacker capabilities compare by using a compromised server to
>   give bogus referrals/etc. as opposed to just giving bogus data/etc.

Will address. See the paragraphs to be added to the end of Section 21.

> - an attacker in the network trying to shift client traffic (in terms of
>   what endpoints/connections they use) to overload a server

Will address. See the paragraphs to be added to the end of Section 21.

> - how asynchronous replication can cause clients to repeat
>   non-idempotent actions

Not sure what you are referring to.

> - the potential for state skew and/or data loss if migration events
>   happen in close succession and the client "misses a notification"

Is there a specific problem that needs to be addressed?

> - cases where a filesystem moves and there's no longer anything running
>   at the old network endpoint to return NFS4ERR_MOVED

This seems to me just a recognition that systems sometimes fail. Not
sure specifically what to address.
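The error-selection rules discussed for the reclaim errors above can be summarized as a small decision procedure. The following is a hypothetical sketch, not implementation text from the draft; the function and parameter names (`check_reclaim`, `grace_active`, `transferred_locks`) are invented for illustration:

```python
# Hypothetical sketch of the reclaim-error selection rules described in
# the revised Section 15.1.9. All names are invented for illustration.

NFS4_OK = 0
NFS4ERR_GRACE = 10013
NFS4ERR_NO_GRACE = 10033
NFS4ERR_RECLAIM_BAD = 10034

def check_reclaim(grace_active, transferred_locks, requested_lock):
    """Decide the outcome of a reclaim request against one file system.

    grace_active      -- whether a (possibly per-fs) grace period applies
    transferred_locks -- detailed lock info from the source server, or
                         None when no such information is available
    requested_lock    -- the lock the client claims to have held
    """
    if not grace_active:
        # No active grace period applies to this file system object.
        return NFS4ERR_NO_GRACE
    if transferred_locks is not None and requested_lock not in transferred_locks:
        # Detailed information lets the server reject reclaims of locks
        # that could not have existed before the transfer.
        return NFS4ERR_RECLAIM_BAD
    # Otherwise the reclaim proceeds (a conflict would surface as
    # NFS4ERR_RECLAIM_CONFLICT, not modeled here).
    return NFS4_OK

def check_non_reclaim(grace_active, transferred_locks, requested_lock):
    """Decide whether a non-reclaim lock request must wait out the grace period."""
    if grace_active and transferred_locks is None:
        # Without detailed lock information the server cannot rule out a
        # conflict with a lock yet to be reclaimed.
        return NFS4ERR_GRACE
    if grace_active and requested_lock in transferred_locks:
        # A known reclaimable lock conflicts with this request.
        return NFS4ERR_GRACE
    return NFS4_OK
```

The sketch makes the trade-off concrete: a destination server holding detailed locking information can return the sharper NFS4ERR_RECLAIM_BAD and avoid blanket NFS4ERR_GRACE responses; without that information it falls back to the behavior of Section 8.4.2.1.1.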
> - what can happen when non-idempotent requests are in a COMPOUND before
>   a request that gets NFS4ERR_MOVED

Intend to address in Section 15.1.2.4:

   The file system that contains the current filehandle object is not
   present at the server, or is not accessible using the network
   address used. It may have been made accessible on a different set of
   network addresses, relocated or migrated to another server, or it
   may have never been present. The client may obtain the new file
   system location by obtaining the "fs_locations" or
   "fs_locations_info" attribute for the current filehandle. For
   further discussion, refer to Section 11.3.

   As with the case of NFS4ERR_DELAY, it is possible that one or more
   non-idempotent operations may have been successfully executed within
   a COMPOUND before NFS4ERR_MOVED is returned. Because of this, once
   the new location is determined, the original request which received
   the NFS4ERR_MOVED should not be re-executed in full. Instead, a new
   COMPOUND, with any successfully executed non-idempotent operations
   removed, should be executed. This new request should have a
   different slot id or sequence in those cases in which the same
   session is used for the new request (i.e., transparent session
   migration or an endpoint transition to a new address
   session-trunkable with the original one).

> - how bad it is if the client messes up at Transparent State Migration
>   discovery, most notably in the case when some lock state is lost

Propose to address this by adding the following paragraph to the end of
Section 11.13.2:

   Lease discovery needs to be provided as described above, in order to
   ensure that migrations are discovered soon enough that leases moved
   to new servers are renewed early enough to avoid lease expiration,
   which would lead to loss of locking state.
   While the consequences of such loss can be ameliorated through
   implementations of courtesy locks, servers are under no obligation
   to do so, and a conflicting lock request may mean that a lock is
   revoked unexpectedly. Clients should be aware of this possibility.

> - the interactions between cached replies and migration(-like) events,
>   though a lot of this is discussed in section 11.13.X and 15.1.1.3
>   already

Will address any specifics that you feel aren't adequately addressed.

> but I defer to the WG as to what to cover now vs. later.
>
> In light of the ongoing work on draft-ietf-nfsv4-rpc-tls, it might be
> reasonable to just talk about "integrity protection" as an abstract
> thing without the specific focus on RPCSEC_GSS's integrity protection
> (or authentication)

I was initially leery of this, but when I looked at the text, I was
able to avoid referring to RPCSEC_GSS in most cases in which integrity
was mentioned :-). The same does not seem possible for
authentication :-(

> being returned. These include cases in which the client is
> directed a server under the control of an attacker, who might get
>
> nit: "directed to"

Fixed.

> o Despite the fact that it is a requirement that "implementations"
>   provide "support" for use of RPCSEC_GSS, it cannot be assumed that
>   use of RPCSEC_GSS is always available between any particular
>   client-server pair.
>
> side note: scare-quotes around "support" makes sense to me, but not
> around "implementations".

Fixed.

> the destination. Even when RPCSEC_GSS authentication is available on
> the destination, the server might validly represent itself as the
> server to which the client was erroneously directed. Without a way
>
> Something about the wording here tickles me funny; at first I thought it
> was the "validly", but now I think it's "represent itself", perhaps
> because that phrasing can have connotations of "falsely represent".
> ("Valid" is fine -- the attack here is the misdirection, and the target
> of the misdirection doesn't have to misbehave at all for it to be a
> damaging attack.) The best remedy I can come up with is a somewhat
> drastic change, and thus questionable: "Even when [...], the server
> might still properly authenticate as the server to which the client was
> erroneously directed."

I think it's a good remedy and I intend to adopt it.

> I'd also consider adding a third bullet point to the final list ("to
> summarize considerations regarding the use of RPCSEC_GSS"):

Actually, it is to summarize considerations regarding its use in
fetching location information.

> % o The integrity protection afforded to results by RPCSEC_GSS protects
> %   only a given request/response transaction;

True, but that is in the nature of fetching location information.

> %   RPCSEC_GSS does not
> %   protect the binding from one server to another as part of a referral
> %   or migration event. The source server must be trusted to provide
> %   the correct information, based on whatever factors are available to
> %   the client.

These are both situations for which RPCSEC_GSS has no solution, but
neither is there another one. It is probably best to just say that
without reference to integrity protection.

I have added new paragraphs after these bullets that may address some
of the issues you were concerned about.

   Even if such requests are not interfered with in flight, it is
   possible for a compromised server to direct the client to use
   inappropriate servers, such as those under the control of the
   attacker. It is not clear that being directed to such servers
   represents a greater threat to the client than the damage that could
   be done by the compromised server itself. However, it is possible
   that some sorts of transient server compromises might be taken
   advantage of to direct a client to a server capable of doing greater
   damage over a longer time.
   One useful step to guard against this possibility is to issue
   requests to fetch location data using RPCSEC_GSS, even if no mapping
   to an RPCSEC_GSS principal is available. In this case, RPCSEC_GSS
   would not be used, as it typically is, to identify the client
   principal to the server, but rather to make sure (via RPCSEC_GSS
   mutual authentication) that the server being contacted is the one
   intended.

   Similar considerations apply if the threat to be avoided is the
   direction of client traffic to inappropriate (i.e., poorly
   performing) servers. In both cases, there is no reason for the
   information returned to depend on the identity of the client
   principal requesting it, while the validity of the server
   information, which has the capability to affect all client
   principals, is of considerable importance.

> Section 22.1
>
> Thank you for thinking about how the IANA considerations should be
> presented in the post-update document. (I think I've had to place at
> least two Discuss positions on bis documents that did not...)
>
> Section 23.2
>
> I'm not sure that all of the moves from Normative to Informative should
> stick; e.g., HMAC (which went from [11] to [59]) is needed for SSV
> calculation. Hmm, actually, maybe that's the only one.

Moved it back to being Normative.

> Appendix B
>
> I have mixed feelings about whether to keep this content for the final
> RFC. (Appendix A seems clearly useful; the specific details of the
> reorganization are less clear, as to some extent they can be deduced
> from the changes themselves. But only to some extent...)
>
> Appendix B.1.2
>
> o The new Sections 11.8 and 11.9 have resulted in existing sections
>   wit these numbers to be renumbered.
>
> s/wit/with/
>
> Section B.2.1
>
> The new treatment can be found in Section 18.35 below. It is
>
> s/below/above/
>
> intended to supersede the treatment in Section 18.35 of RFC5661 [62].
> Publishing a complete replacement for Section 18.35 allows the
> corrected definition to be read as a whole, in place of the one in
> RFC5661 [62].
>
> This seems like it was more appropriate in the scope of
> draft-ietf-nfsv4-mv1-msns-update but could be obsolete here.
>
> Section B.4
>
> o The discussion of trunking which appeared in Section 2.10.5 of
>   RFC5661 [62] needed to be revised, to more clearly explain the
>   multiple types of trunking supporting and how the client can be
>   made aware of the existing trunking configuration. In addition
>   the last paragraph (exclusive of sub-sections) of that section,
>   dealing with server_owner changes, is literally true, it has been
>   a source of confusion. [...]
>
> nit: the grammar here is weird; I think there's a missing "while" or
> similar.

Anticipate using the following replacement text:

   o  The discussion of trunking which appeared in Section 2.10.5 of
      RFC5661 [62] needed to be revised, to more clearly explain the
      multiple types of trunking supported and how the client can be
      made aware of the existing trunking configuration. In addition,
      while the last paragraph (exclusive of sub-sections) of that
      section, dealing with server_owner changes, is literally true, it
      has been a source of confusion. Since the existing paragraph can
      be read as suggesting that such changes be dealt with non-
      disruptively, the issue needs to be clarified in the revised
      section, which appears in Section 2.10.5.
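The retry behavior proposed earlier for Section 15.1.2.4 (re-issuing a COMPOUND after NFS4ERR_MOVED with already-executed non-idempotent operations removed) can be sketched as follows. This is a hypothetical illustration only; the operation names and the `IDEMPOTENT` classification are invented for the sketch, and a real client would also use a different slot id or sequence when reusing the same session:

```python
# Hypothetical sketch of the client retry behavior proposed for
# Section 15.1.2.4: when NFS4ERR_MOVED stops a COMPOUND partway
# through, the retried COMPOUND (sent to the new location) omits any
# non-idempotent operations that already executed successfully.
# Operation names and the IDEMPOTENT set are invented for illustration.

IDEMPOTENT = {"SEQUENCE", "PUTFH", "GETATTR", "READ", "LOOKUP"}

def build_retry_compound(original_ops, results):
    """original_ops -- list of operation names in the failed COMPOUND
    results      -- per-op status strings for the ops that were
                    processed, ending with the op that returned
                    "NFS4ERR_MOVED"; later ops were never attempted
    Returns the operation list for the new COMPOUND."""
    retry = []
    for op, status in zip(original_ops, results):
        if status == "OK" and op not in IDEMPOTENT:
            # Already executed successfully and not safely repeatable:
            # leave it out of the retried request.
            continue
        retry.append(op)
    # Operations after the failure point were never attempted and are
    # carried over unchanged.
    retry.extend(original_ops[len(results):])
    return retry
```

For example, if a COMPOUND of SEQUENCE, PUTFH, CREATE, GETATTR fails with NFS4ERR_MOVED on the GETATTR, the sketch drops only the successfully executed CREATE, re-issuing SEQUENCE, PUTFH, GETATTR at the new location.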