Notes on NFSv4 Namespace Discussion

Scope
-----

What is a "global namespace"? Brent Callaghan called for a problem statement on this topic. It seems that we have talked about three different "global namespaces":

1. Intra-cluster namespace. This is the unified namespace for all NAS servers in a tightly-coupled or aggregated cluster. Many proprietary intra-cluster namespace schemes exist today as part of vendor solutions.

2. Enterprise-wide namespace. The majority of the discussion was on this. It is also the form of "global namespace" most requested by enterprise storage administrators. An enterprise storage environment tends to be heterogeneous, so having the enterprise namespace as part of the standard protocol makes sense.

3. World-wide namespace. This makes possible a "world-wide NFS", with a global URL for each file.

I believe that if we are successful in defining an enterprise-wide namespace, it is possible to extend it to the global scale, perhaps by leveraging DNS resolution. But the enterprise-wide namespace seems to be the higher priority for us to work on first. For the rest of this discussion, I will focus on the enterprise-wide namespace.

Requirements
------------

What are the requirements for an enterprise-wide namespace? Here is a quick (and probably incomplete) list of basic requirements:

- Location Independent: The namespace tree is designed according to business or logical divisions, independent of the physical location of the data. This implies that the namespace needs to maintain a "map" or "location table" that links the logical namespace to the physical locations.
- Unified: There should be a single map of the namespace that all clients accept as authoritative. This implies the existence of a root server and/or central repository for an enterprise domain, but does not imply that each client must mount into this unified namespace in the same way.
- Constant and Transparent: When the physical location of the data changes for administrative reasons (either by migration or replication), the namespace seen by the clients should remain constant. The update of the namespace map entry (a.k.a. LTE or Referral) can be achieved transparently to the clients: client applications continue running and the namespace remains constant, while the data now comes from a different physical location.

In addition to the above three requirements, there are more advanced and/or detailed requirements:

- Granularity of namespace mapping. Can the namespace mapping happen at filesystem granularity, directory granularity, or file granularity? (Julian Satran asked questions related to this.)
- Nested Mapping. Is it possible for namespace entry /a/b to link to filerA, while /a/b/c links to filerB? (Microsoft Dfsroot, to my knowledge, does not support this.)
- Variable Support. Can the namespace mapping differ depending on variables such as client OS, client geographical location, or time of day? (This is critical to many customer environments, as Skottie Miller pointed out.)
- Manageability. Can the namespace be accessed and modified in real time by administrators? By applications? By user groups? How fast does a namespace mapping change propagate to all clients?
- Cycle Prevention. Will the namespace tree be guaranteed to be acyclic?
- Multi-protocol Interoperability. Will NFSv2 and v3 clients be able to use this same namespace? Will this namespace be synchronized with the CIFS namespace?

Architecture
------------

David Robinson gave a nice taxonomy of how a namespace can be achieved: "dumb server, smart client" (#1), "smart server, dumb client" (#3), and "somewhere in-between" (#2).

For NFS v2/v3 environments, the most popular namespace solution is the automounter daemon, with automounter maps centrally managed on a NIS or LDAP server. This solution belongs to category #1.
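
Whichever category is chosen, the core of the namespace is the "map" or "location table" described under Requirements. As a rough illustration only, the nested-mapping requirement can be read as a longest-prefix lookup in that table. The sketch below is not taken from any of the I-Ds; the table layout, the function name resolve, and the export paths are assumptions made up for this example (only /a/b, /a/b/c, filerA and filerB come from the text above).

```python
# Hypothetical location table: logical namespace path -> (server, physical path).
# The physical paths are invented for illustration.
LOCATION_TABLE = {
    "/a/b": ("filerA", "/vol/vol1/b"),
    "/a/b/c": ("filerB", "/vol/vol7/c"),
}


def resolve(path, table=LOCATION_TABLE):
    """Return (server, physical_path) for a logical path, or None if unmapped.

    The longest matching table entry wins, which is what lets /a/b/c
    point at a different server than its parent /a/b.
    """
    parts = [p for p in path.split("/") if p]
    # Try the full path first, then progressively shorter prefixes.
    for end in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:end])
        if prefix in table:
            server, root = table[prefix]
            rest = "/".join(parts[end:])
            return server, (root + "/" + rest if rest else root)
    return None


if __name__ == "__main__":
    for p in ("/a/b/x", "/a/b/c/d/e", "/a"):
        print(p, "->", resolve(p))
```

A finer granularity (directory or file level) would only mean more entries in the table; the lookup rule stays the same.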
The popularity of the automounter solution shows that it addresses at least some of the namespace requirements outlined above. In particular, it supports the "location independent" requirement (at export granularity) and the "unified" requirement. In addition, it supports nested mapping and wildcard variables. Because there is no server-to-server redirect, there are no cycle issues either.

So why do some NFS enterprise users still ask for a "global namespace"? What is lacking in an automounter-based solution? Here is what I've heard from NFS administrators.

First, the update of the automounter map is not completely transparent. Clients that have applications running and keeping the old mount active will not let go of the old mount. For some versions of some OSes, the old mount still won't be released even after it becomes inactive, even with the "-f" option. Given the multitude of client OSes and versions, this is a difficult problem. (This is reflected in Mike Eisler's comment about category #1.)

Second, the granularity of this solution is at the export level. For some applications that require a global namespace, such as load balancing and HSM (or, more fashionably, ILM) applications, finer granularity is desired.

It looked like everyone on the thread agreed with David that #3 is a "best vendor wins" solution, and out of scope for the working group. Category #2 becomes the interesting case: with v4 protocol enhancements over v2/v3 that make both the client and the server just a little bit smarter, a solution superior to the current automounter/yellow pages solution (in terms of transparency, granularity, and possibly manageability) might be possible.

Proposal
--------

When reading the email threads, I sensed that there is rough agreement among the participants on many elements of the architecture. I would like to attempt to draw out a proposal, to test whether it is agreeable.
Some items of the proposal are taken from Carl Burnett, Jon Haswell and Mario Wurzl's comments, and of course the I-Ds from Rob and Dave.

First, we choose a central repository, such as LDAP, for the namespace mappings (aka location table?). We can work to define a standard schema for the NFS namespace mappings. This work is not part of the NFSv4 protocol itself, but it is not too far-fetched for us to attempt it for an NFS namespace. There were suggestions that this namespace should support multiple protocols. That might be too ambitious, since CIFS is hardly under our control, and not even well documented.

Second, we need to clarify the client-server interactions based on the "right interpretations" of RFC 3530. Dave has started the work on this. I believe this is the most challenging piece of work in the near term, since confusion remains about how to implement this section of RFC 3530 for both the migration case and the pure referral case. The hope is that this challenge will be overcome, and that we will be able to have the first client, server and namespace server reference implementations of the most basic use of NFS4ERR_MOVED, fs_locations and possibly NFS4ERR_FHEXPIRED.

Third, we should define a mechanism by which clients in the enterprise know where to find the root of the NFS enterprise namespace. One simple solution is to leverage the DNS domain, and set up a convention that the DNS name nfsroot always corresponds to the root namespace server. The root namespace server can refer clients to other namespace servers. Schemes should be designed to enforce that the relationship between namespace servers is hierarchical and not cyclical. This scheme can be extended to support a world-wide NFS namespace as well.

Fourth, backward compatibility with v2 and v3 is very important, as Brent and Skottie emphasized. Automounters are able to access the central repository (LDAP or otherwise) and enforce the namespace mappings by mounting according to them.
It should be in the administrator's hands to decide how to configure the environment and how the clients access the namespace, either through a client-based automounter or through the namespace server. There were comments that allowing one namespace method is better than two, but in this case allowing both might be necessary, for backward compatibility and future enhancement.

Fifth, with NFSv4.x clients accessing the namespace through the namespace server via the NFS protocol, it is then possible to enhance the protocol in the form of minor versions to support better transparency, finer granularity and better manageability. Possible enhancements in 4.x that may be worth some discussion include file-level referrals, lifetimes on file handles, additional client-server exchange of variable values, etc.

If this is a workable architecture, we have the following work items:

1. NFSv4 Global Namespace Problem Statement
2. Clarification of NFSv4 client-server ops involving NFS4ERR_MOVED and fs_locations
3. Best practice in configuring an NFSv4 enterprise namespace, including nfsroot schemes
4. Proposal for NFSv4 minor version enhancements
5. Proposal for a database schema for the NFS namespace
6. Prototype implementation of the client, server and namespace server

With feedback from the group, I volunteer to start work on item 1 immediately, perhaps co-authoring with some of you? I also volunteer to join Dave and Rob in the work on item 2, and to collaborate on some initial prototyping. I have some questions and comments on item 2 regarding detailed ops, and I'll send them in a separate email.

Footnote 1
----------

There were discussions about whether we are tackling too big a task in trying to support a "global namespace" in NFSv4. I am optimistic that if we scope it right (for example, the data migration mechanism and the server state transfer mechanism during migration are both clearly out of scope), it is possible for us to make pragmatic progress.
The minor version capability of NFSv4 is wonderful. AFS VLDB (mentioned by Jim Rees) and Dfs Dfsroot (Jon Haswell explained how that works) have done something like what we want to achieve. NFSv4 has the opportunity to be the first open, industry-standard protocol that supports a global namespace.

Footnote 2
----------

I tried to use the terminology that Ted Anderson and Nicolas Williams were working on, but I am afraid I inevitably invented some new terms, for which I apologize. I'll be happy to change them to the standard terms once they are standardized.
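
As a small illustration of the cycle-prevention point in the third item of the Proposal, the check that referrals between namespace servers stay hierarchical can be sketched as a depth-first walk from the root. This is only a sketch under assumptions: the referral map, the function name find_cycle, and every server name other than nfsroot are hypothetical, and a real scheme would have to obtain the referrals from the repository or from the servers themselves.

```python
def find_cycle(referrals, root):
    """Return a list of servers forming a referral cycle reachable from
    root, or None if the referral graph below root is acyclic.

    referrals maps a namespace server name to the servers it refers to.
    """
    path = []        # servers on the current walk, in order
    on_path = set()  # same servers, for fast membership tests
    done = set()     # servers already fully explored without a cycle

    def visit(server):
        if server in on_path:
            # Found a referral back to a server we are still exploring.
            return path[path.index(server):] + [server]
        if server in done:
            return None
        on_path.add(server)
        path.append(server)
        for nxt in referrals.get(server, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(server)
        done.add(server)
        return None

    return visit(root)


if __name__ == "__main__":
    ok = {"nfsroot": ["eng", "sales"], "eng": ["eng-west"]}
    bad = {"nfsroot": ["eng"], "eng": ["eng-west"], "eng-west": ["eng"]}
    print(find_cycle(ok, "nfsroot"))
    print(find_cycle(bad, "nfsroot"))
```

An administrative tool could run such a check before committing a new referral to the central repository, rejecting the change if a cycle is reported.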