Re: [rbridge] Comments on draft-ietf-trill-rbridge-protocol-13.txt
Donald Eastlake <d3e3e3@gmail.com> Sun, 20 September 2009 14:57 UTC
In-Reply-To: <BLU137-W2B9BC426B17E65D39076393E50@phx.gbl>
References: <EDC652A26FB23C4EB6384A4584434A0401984C29@307622ANEX5.global.avaya.com> <1028365c0909031208m46af5055n5376f67fc1ea6f49@mail.gmail.com> <292BD6E016C29243BB231D329D626C8C02D236E5@USDALSMBS05.ad3.ad.alcatel.com> <BLU137-W2B9BC426B17E65D39076393E50@phx.gbl>
Date: Sun, 20 Sep 2009 10:32:26 -0400
Message-ID: <1028365c0909200732m77156393m9b72c6c5192e742f@mail.gmail.com>
From: Donald Eastlake <d3e3e3@gmail.com>
To: Bernard Aboba <bernard_aboba@hotmail.com>
Cc: rbridge@postel.org, "dromasca@avaya.com" <dromasca@avaya.com>, jari.arkko@piuha.net, rdroms@cisco.com
Subject: Re: [rbridge] Comments on draft-ietf-trill-rbridge-protocol-13.txt
Hi Bernard,

Thanks for your very thorough review and detailed comments.

[DE] My responses are below, marked with "[DE]" at the left margin, and in a few cases I've inserted additional "[BA]"s in front of your remarks for clarity.

[BA] Here is my review.

Review of draft-ietf-trill-rbridge-protocol-13.txt
Bernard Aboba

Summary

Overall, I found this document to be very easy to read. From an architectural perspective, I believe that the authors have done a good job of articulating where TRILL fits in the IEEE 802 architecture. In particular, the document makes it clear that TRILL is designed as a substitute for (R)STP, but does not represent a re-thinking of Ethernet or the IEEE 802 architecture in general. Of course, it is possible or even likely that some unforeseen interactions could arise, but my overall assessment is that the authors have been thoughtful in their design, utilizing encapsulation to enable maximum compatibility between Rbridges and legacy bridges.

Of course, this thoughtful approach to backward compatibility also means that Rbridges will have more of an evolutionary impact than perhaps was first envisaged. For example, an IEEE 802.11 AP connected to an infrastructure based on Rbridges will still have some of the same limitations with respect to restrictions on multiple Associations within an SSID as would exist if the AP were to be connected to a conventional bridged Ethernet.

My major areas of concern with this document relate to potential MTU issues as well as potential compatibility issues with the extensions enumerated in the Appendix. There are also a few instances in which the document is not as clear as it could be, or includes optional capabilities that IMHO should either be made mandatory to implement or be eliminated.

[DE] On MTU and optional capabilities, see below.

However, I do not think that these concerns are sufficient to require the document to be published as Experimental.
If and when TRILL is deployed, I expect that a number of issues will be found and that a -bis document will eventually need to be prepared. That is par for the course nowadays, and if this were to preclude consideration for Proposed Standard status, then we wouldn't have many new protocol specifications under consideration for PS.

[DE] Thanks.

In my view, the more relevant question with respect to status is whether the protocol is sufficiently well specified so as to preclude the introduction of widespread interoperability problems. Where a document could potentially introduce such problems and where initial deployment is likely to result in major changes to the design, an Experimental status would be warranted. In retrospect, such a Status would have been more appropriate for protocols such as SIP whose continual "evolution" has led to persistent interoperability problems a decade after its initial introduction.

However, TRILL is based on a mature routing protocol (IS-IS) with demonstrated interoperability, with some modest enhancements. Other than a few optional capabilities which could be made mandatory or eliminated, and some instances where the text needs to be clarified, the specification is relatively straightforward, and so it seems unlikely that we will see numerous major interoperability issues between TRILL implementations, on the scale of what we have seen with SIP.

More likely is the potential for compatibility issues between TRILL and existing legacy bridges. However, rather than requiring the authors to enumerate all such potential issues and provide solutions prior to publication (which could create years of delay), a more practical approach would be for the document to more clearly enumerate the scenarios believed to be most suitable for initial deployment.

[DE] The document could say that the use of optimal paths and multi-pathing is of more benefit the more mesh-like the network is.
Timers

STP has been superseded by RSTP in order to improve convergence times. Looking through this spec, I'm not clear whether TRILL was designed to compete with RSTP convergence times, and if so, what the default values should be.

[DE] RSTP was standardized by 802.1w-2001, long before the proposal of TRILL. TRILL, as described in this specification, is the application of link state router technology to the VLAN aware customer bridging problem. RBridges are true routers, swapping the outer MAC addresses on each hop as well as decrementing a hop count. I think it is better to view TRILL as a different technical approach with different characteristics than bridging rather than as some competition based just on the metric of convergence time. See further remarks below.

Optional functionality

By IETF standards, there is only a modest amount of optional functionality in this spec, but what there is (ESADI and options) doesn't seem compelling to me. Is this functionality really necessary, or is it in there only to provide "value add" (e.g. opportunities for non-interoperability)?

[DE] See responses below.

RBridge nicknames

I understand why nicknames are needed (to avoid making the MTU issues even worse). However, we have learned with RFC 3927 that collisions within 16-bit spaces can be painful (e.g. collision probabilities can be quite high if you have a substantial number of Rbridges in the network). Overall, I think that nickname collisions (along with optional functionality such as ESADI and Options) represent one of the potential weaknesses in the spec, particularly since Rbridges are most attractive in situations (such as datacenters) where the number of RBridges could be large. Some tweaks in the algorithms are suggested below.

[DE] Nicknames not only help with MTU, they also simplify fast path forwarding lookup, making the table indexes narrower.

[DE] Thanks for providing a pointer to RFC 3927.
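
To put a rough number on the concern about collisions in a 16-bit space, here is a minimal Python sketch of the classic birthday calculation. It assumes every device picks its value blindly, uniformly, and independently from the full 2^16 space; that is an assumption made for illustration only and is not a model of the TRILL nickname procedure. The function name and the sample device counts are hypothetical.

```python
def collision_probability(devices: int, space: int = 2**16) -> float:
    """Probability that at least two of `devices` blind, uniform,
    independent picks from `space` values coincide (birthday problem).

    Assumption: the whole 16-bit space is usable and no device can see
    which values are already taken.
    """
    if devices > space:
        return 1.0  # pigeonhole: more devices than values
    p_all_distinct = 1.0
    for k in range(devices):
        # The (k+1)-th device must avoid the k values already chosen.
        p_all_distinct *= (space - k) / space
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 100, 300, 1000):  # hypothetical campus sizes
        print(f"{n:5d} devices: {collision_probability(n):.4f}")
```

Under this blind-choice assumption the probability of some collision grows roughly with the square of the number of devices, which is why the size of the space alone is not a sufficient guard.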
[DE] There are significant differences between nickname collisions and RFC 3927 link local IPv4 address collisions. With RFC 3927 addresses, you have to pick an address to try from across the entire range, since you don't know which are in use. Furthermore, two hosts may simultaneously detect a collision and both re-try with new values. With nicknames, you pick a new nickname only from among those that were free, since you can see all the nicknames that are in use in the link state. And in case of a collision detected simultaneously by two RBridges, they can both see each other's priority and only one will retry.

[DE] RFC 3927 recommends against more than 1300 hosts on a link, saying that a new host connecting and picking an address at random has a 98% chance of avoiding collision. If an RBridge joins a campus and there is at least one nickname free, either the RBridge will wait until it has the link state and then assert the free nickname or, if it immediately asserts a nickname that collides, either it or, depending on priority, the RBridge that used to have that nickname, will switch directly to the free nickname. Of course, as far as I know, no one is planning to run a campus with anything like that many nicknames in use but, because of these differences, nickname resolution should be fine with an order of magnitude more allocations than the RFC 3927 recommendation. And, since nicknames are only needed by RBridges as opposed to RFC 3927 IP addresses which are needed by end stations, there will normally be far fewer nicknames needed than IP addresses for a particular network size (see comments below in reference to more than one nickname per RBridge).

Global uniqueness of MAC addresses

We are now seeing situations in which some of the conventional assumptions made by IEEE 802.1 have broken down. One of these is the widespread deployment of virtualization within datacenters. In these scenarios, multiple MAC addresses may be assigned to a single (virtualized) NIC.
To limit a potential explosion in demand for MAC addresses, MAC addresses can be assigned by management software from a vendor OUI, and as a result, MAC addresses are not guaranteed to be unique across VLANs. In reading through the document, it seems clear to me that the intent is for traffic to be routed by VLAN and MAC address, so that end stations are not required to have globally unique MAC addresses (although Rbridge MAC addresses do need to be globally unique). However, there are a number of instances in which the text is not as clear as it could be on this point (see below for detailed comments). [DE] Yes, as per below, there are a number of cases where it currently says "MAC address", or the like, and VLAN needs to be added so it says "VLAN and MAC address", or the like. Note that the ability of TRILL to handle non-globally unique endstation MAC addresses is IMHO a major advantage as compared to single-spanning tree switches, or even "Q in Q" provider bridges. Some of these advantages might be worth calling out. For example: 1. TRILL encapsulation potentially shields legacy bridges from learning MAC addresses which might cause problems (e.g. single spanning tree implementations). 2. TRILL encapsulation can shield "Q in Q" provider bridges from exposure to MAC address duplication which could occur when a provider needs to handle traffic from customers with their own distinct datacenters utilizing virtualization. 3. Greenfield TRILL deployments only require end station MAC addresses to be unique per VLAN. As a result, TRILL is virtualization-friendly, and the prohibitions on virtualization described in documents such as IEEE 802.1X-REV are not necessary in an Rbridge deployment. MTU issues While support for PMTU discovery is quite common within TCP implementations, the same is not true for UDP. Legacy implementations that lack support for IEEE 802.1AB-REV, and automatically set an Ethernet interface MTU to 1500 are quite widespread. 
In greenfield Rbridge installations designed to support a larger MTU between Rbridges, this should be a solvable problem. It also should be addressable in situations where Rbridges are installed alongside relatively new switches that support MTUs of 1530 or greater. However, I do wonder what issues could arise in situations where Rbridges are installed alongside switches that only support Ethernet frames of 1512 or 1516 octets.

Reading the document, it was not clear whether MTU probing was "mandatory to implement" but I think this functionality should be mandatory, if only for diagnostic purposes. I also think that the algorithm for MTU determination could be better specified, perhaps by incorporating elements from RFC 4821.

[DE] There are two issues for MTU: MTU for end station data and MTU for TRILL IS-IS control frames.

[DE] There needs to be a campus wide lower limit for the MTU of inter-RBridge links to be sure of correct operation so that TRILL IS-IS control frames can get through. Each RBridge advertises its requested MTU for this purpose, the lowest value wins (but not less than 1470), and all RBridges SHOULD test the inter-RBridge links to see that their MTU meets or exceeds this value. MTU for end station data frames is a whole different thing. The draft needs to make this clear. I'll suggest some wording changes on the mailing list.

Port protocols

There are a number of IEEE 802.1 protocols (such as IEEE 802.1X, IEEE 802.1AB, etc.) that relate to the state of a port and utilize a non-forwardable multicast address. The draft states that an Rbridge will not forward frames sent to a non-forwardable multicast address. This cleanly segments the problem in the sense that TRILL is put forward purely as a substitute for the spanning tree protocol, leaving the operation of port protocols intact.
In such a model, the operation of IEEE 802.1X-2004 and IEEE 802.1AB-REV should not be affected; these protocols, if implemented at the edge Rbridge, should operate largely as they do today from the point of view of the end station.

However, there are some corner cases in which limitations could arise. One example is an edge device such as a wired VOIP handset with one or more wired Ethernet ports. These devices often do not implement "port protocols" such as IEEE 802.1X-2004, but instead operate like a TPMR, passing them through to the switch. I do not see this as a problem with the Rbridge specification per se, because the spec is attempting to mimic the behavior of a conventional switch in this (and other cases). However, it might be worth a sentence explicitly stating that -- and noting that potential extensions to other cases such as Virtual RBridges or Provider Rbridges are not excluded, but are left to future work.

[DE] Yes, something about "extensions to support provider bridging services are left for future work" or the like, since this spec covers only customer bridging services, seems reasonable.

I would also note that IEEE 802.1X-REV goes beyond "port access control" to defining "pseudo" and "virtual" ports. Virtual ports are created via MACSEC (IEEE 802.1AE) and pseudo ports involve MAC-based authentication state, so as to allow IEEE 802.1X supplicants to coexist on shared media. Among other things, pseudo and virtual ports introduce the notion of IEEE 802.1X traffic that could be destined to a unicast address (as in IEEE 802.11i) rather than to a non-forwardable multicast address. As is stated in the document, the location of TRILL in the IEEE 802.1 architecture makes it transparent to these extensions; however in places the document language is outdated and could be cleaned up (see below for examples).

[DE] See discussion below re pseudo/virtual ports.
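
Returning to the campus-wide inter-RBridge MTU rule stated in the MTU discussion above (each RBridge advertises a requested MTU, the lowest value wins but never below 1470, and links are then tested against the result), the arithmetic can be sketched as follows. This is only an illustration under stated assumptions: the function names, the link labels, and the idea of a table of measured per-link MTUs are hypothetical and are not taken from the draft.

```python
from __future__ import annotations

# Floor quoted in the reply above; the agreed value never goes below it.
MIN_CAMPUS_MTU = 1470


def campus_mtu(advertised: list[int]) -> int:
    """Lowest advertised MTU wins, clamped to the 1470 floor."""
    if not advertised:
        raise ValueError("at least one RBridge must advertise an MTU")
    return max(MIN_CAMPUS_MTU, min(advertised))


def links_below_campus_mtu(link_mtus: dict[str, int],
                           advertised: list[int]) -> list[str]:
    """Names of links whose tested MTU falls short of the campus value.

    `link_mtus` is a hypothetical table of results from whatever link
    MTU test an implementation performs.
    """
    target = campus_mtu(advertised)
    return sorted(name for name, mtu in link_mtus.items() if mtu < target)


if __name__ == "__main__":
    requested = [1500, 9000, 1600]  # hypothetical advertisements
    print(campus_mtu(requested))
    print(links_below_campus_mtu({"RB1-RB2": 1500, "RB2-RB3": 1480},
                                 requested))
```

This covers only the MTU for TRILL IS-IS control frames between RBridges; as the reply notes, the MTU for end station data frames is a separate matter.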
Detailed Comments

Abstract

The design supports VLANs and optimization of the distribution of multi-destination frames based on VLAN and IP derived multicast groups. It also allows forwarding tables to be sized according to the number of RBridges (rather than the number of end nodes), which allows internal forwarding tables to be substantially smaller than in conventional bridges.

[BA] Since core bridges can forward based on VLAN tags and not MAC addresses, this claim seems somewhat exaggerated.

[DE] Well, perhaps that statement in the abstract should be limited to unicast forwarding information since RBridge multi-destination frame forwarding SHOULD also prune distribution based on looking at the VLAN and destination MAC address (although this pruning is only an optimization so things will work if it is not done). However, I'm not aware of any 802.1 VLAN aware bridging standards that have a bridge forward known unicast customer data frames based only on VLAN, ignoring MAC address. As far as I know, VLAN aware bridge forwarding lookups are 60 bits wide, a 12-bit VLAN plus 48-bit MAC address.

Section 1

IEEE 802.1 bridges avoid these problems by transparently gluing many physical links into what appears to IP to be a single LAN [802.1D]. However, 802.1 bridge forwarding using the spanning tree protocol has some disadvantages:

[BA] Throughout the document, I get the sense that TRILL is aiming at a target that is somewhat frozen in time. Today most new bridge implementations support RSTP, which offers considerably faster convergence. Shortest path bridging is progressing, etc. Overall, I'd like the document to be clearer about what advantages apply to classic STP, and which ones also apply to enhancements such as RSTP, shortest path bridging, etc.

[DE] It seems to me that it is a good thing that the TRILL working group has had consistent goals, rather than constantly changing its goals. RSTP was standardized in 2001 by 802.1w, long before the TRILL effort started.
See Section 2.3 of RFC 5556 (TRILL Problem and Applicability Statement) for some relevant comments.

[BA] For example, it seems to me that TRILL has advantages over single spanning tree implementations in terms of ability to customize forwarding per VLAN. Based on the document, it seems that it may have some disadvantages in terms of convergence times and initialization behavior, as compared with RSTP. So overall, I think that Section 1 could do a better job of articulating the pros/cons of TRILL. In particular, it appears to me that some of the pros described in the problem statement document have not been realized, and some additional pros (such as virtualization support) have arisen.

[DE] Convergence is a bit more complex than it seems at first and depends on engineering, implementation, which aspects of convergence are included, and how they are measured, as well as protocol. For example, one might expect RBridges engineered for rapid fail-over to also implement BFD for rapid failure detection. How would that be factored in? See also Section 2.3 of RFC 5556.

in most cases they can incrementally replace IEEE

[BA] The use of "most" here is somewhat vague. I might say "as described in Appendix A, they can incrementally replace IEEE..."

[DE] Yes, even though Sections 1 and 2 are supposed to be general overview sections, it seems like the use of general words like "most" causes some to jump to the conclusion that a more precise description isn't known, even though it is. The wording should be improved.

While they can be applied to a variety of link protocols, this specification focuses on IEEE [802.3] links.

[BA] This would seem to suggest that TRILL could be used on Token Ring. You might want to specifically exclude this usage (or interconnection with source routing bridges, for that matter).

[DE] The next link type of interest among TRILL working group members beyond 802.3 seems to be PPP and Jim Carlson is working on a draft for that.
But I don't see any reason to rule out yet other types of links. TRILL was intended to be used with a variety of link types. You would expect to have a separate document specifying how to handle each link type. I can't see any reason you couldn't use a Token Ring (802.5) link to interconnect some number of end stations, RBridge, and/or bridge ports. From the point of view of bridges, an RBridge is pretty much an end station. So, while I could be wrong, I would guess that a document specifying how an RBridge Token Ring port would work would say that it would handle receipt/transmission of frames with a token ring functional address the same as other token ring end stations. [DE] Wording should be added to the draft saying that a separate specification would be expected for each link type. [DE] Source routing bridges are something I know even less about. But RBridges look like end stations to bridges and terminate spanning tree. So, I would imagine that if they handled route explorer frames as if they were an end station, then RBridges would work with source routed bridges also. Section 1.2 Section 2: general RBridge description Section 3: the TRILL header Section 4: other TRILL protocol details In case of conflict, the order of precedence of these section is as follows, with those appearing earlier in this list having precedence over those that appear later: 4 > 3 > 2 [BA] "this list" is ambiguous. Do you mean the above list, in which case, Section 2 would take precedence over Section 4? Or do you mean Section 4 takes precedence over 2? I think you need to make this more clear. [DE] Thanks. Section 4 has highest precedence and 2 has lowest. The ambiguity in wording needs to be fixed. Section 1.3 1.3 Terminology and Notation in this document "TRILL" is the protocol specified herein while an "RBridge" is a devices that implement that protocol. The second letter in Rbridge is case insensitive. Both Rbridge and RBridge are correct. 
[BA] s/devices/device/ [DE] Yup, thanks. In this document, the term "link", unless otherwise qualified, means "bridged LAN", that is to say, the combination of one or more [802.3] links with zero or more brides, hubs, repeaters, or the like. The [BA] Do you really want to be combining IEEE 802.3 links with brides (or grooms)? Suggest changing "brides" to "bridges". [DE] Well, in some ways RBridges are the marriage of Routers and Bridges but you are probably right about the correction. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. [BA] You might want to move this to earlier in the section (or give it its own sub-section). [DE] OK, it can be moved to the beginning of the section. Section 2 RBridges run a link state protocol amongst themselves. This gives them enough information to compute pair-wise optimal paths for unicast, and calculate distribution trees for delivery of frames either to unknown MAC destinations or to multicast/broadcast groups. [RBridges] [RP1999] [BA] I think you mean "unknown VLAN/MAC destination pairs" here, no? [DE] Yes, this is one of the cases where VLAN needs to be added to MAC address. used, within a campus, even by an RBridge that lacks an IP or other Layer 3 transport stack or which has zero configuration and thus no Layer 3 address, by transporting SNMP with Ethernet [RFC4789]. [BA] The term "zero configuration" is ambiguous here, since it might be construed to refer to an RFC 3927 "zero config" IP address within 169.254/16. In any case, a switch obtaining a dynamic IP address can still be "zero configuration", so I'm not sure what the point is. [DE] OK. The term "zero configuration" or an equivalent do not seem necessary here. And, in general, I think all remaining occurrences of "zero configuration" should be removed or replaced with "default configuration" or the like. 
Section 2.1

An RBridge, RB1, which is the VLAN-x forwarder on any of its links MUST learn the location of VLAN-x end nodes, both on the links for which it is VLAN-x forwarder, and on other links in the campus. RB1 learns the port and Layer 2 (MAC) addresses of end nodes on links for

[BA] Later on it is made clear that we are talking about learning the VLAN as well as the port and MAC address. It's important to be consistent on this point throughout the document.

[DE] Yes, even though it is speaking of the "VLAN-x forwarder", so there is an implication that it is learning the MAC within VLAN-x, this should be made more explicit.

Also, it can be more secure because not only might the enrollment be authenticated (for example by cryptographically based EAP methods via [802.1X]), but ESADI also supports cryptographic authentication of its messages [RFC5304].

[BA] Elsewhere the document makes it clear that IEEE 802.1X is implemented below TRILL so that they don't interact. Here you seem to be saying that ESADI and EAP could be related. If IEEE 802.1X operating on a port or source MAC does not allow non-802.1X frames to pass, then ESADI should not be announcing the unauthenticated source MAC, right? Note that where 802.1X frames are sent to a unicast address, they *will* be forwarded, and learning will occur; however, filtering will occur at the edge so as to prevent incoming or outgoing frames to/from unauthenticated MAC addresses. The potential for different behavior of learning and ESADI concerns me.

[DE] Yes, 802.1X is implemented below TRILL but the draft mentions in several places that information can flow between TRILL and such lower level protocols. For example, the RBridge's confidence in a locally learned address can be influenced by 802.1X authentication.

[DE] The wording should be clarified but it isn't saying that 802.1X and ESADI are related. It is saying that address learning can be more secure for two reasons.
The first reason is that addresses can be learned in conjunction with a cryptographically secured Layer 2 authentication protocol of which 802.1X is just an example. The second reason is that addresses can be securely transmitted through cryptographically secured ESADI messages. These two reasons are mostly orthogonal.

Advertising end nodes using ESADI is optional, as is learning from these announcements.

[BA] Does it make any sense to support Advertising but not learning, or learning and not advertising? If both are optional, then these combinations are possible.

[DE] Learning from observing frames at the data level is the bridging mind-set way of doing things. Learning through the control plane, i.e. ESADI, is how you would do it coming from a router mind-set. For interoperability with default configuration, at least one of these two techniques needs to be mandatory to implement and enabled by default. In this case, data plane learning prevailed in the working group and is mandatory to implement and enabled by default.

[DE] All combinations of ESADI advertising/learning are reasonable and no combination causes any significant interoperability problems if data plane learning is left enabled, as is the default. Things get more dicey if you disable data plane learning. But some users demand the ability to do things like disabling all learning and having only statically configured addressing information.

[BA] In general, I'm not a fan of optional functionality. ESADI is optional, and it's not clear to me that the benefits outweigh the potential headaches. Is ESADI really necessary, particularly given the claims made about scaling with the number of Rbridges, not end nodes?

[DE] If an end station is unplugged from one RBridge and plugged into another then, depending on circumstances, frames addressed to that end station can be black holed.
That is, they can be sent to the older RBridge that the end station used to be connected to until cached address information at some remote RBridge(s) times out, possibly for a number of minutes or longer. With ESADI, the link interruption and establishment from the unplugging and plugging can cause immediate updates to be sent.

[DE] The forwarding tables of transit RBridges scale with the number of RBridges rather than the number of end nodes since transit RBridges don't need to learn end station addresses. But ingress and egress RBridges do need to learn end station addresses for the VLANs for which they are an appointed forwarder on one or more ports so their tables related to encapsulation/decapsulation do scale with the number of VLAN scoped addresses. Whether you use data plane or control plane address learning doesn't have that much effect on scaling.

Section 2.2

2. elimination of the need for original source and destination MAC address learning in transit RBridges;

[BA] The specification doesn't discuss "destination MAC address learning". Is this a typo?

[DE] Yeah, or maybe best to replace "original source and destination MAC" with "end station VLAN and MAC".

3. direction of frames towards the egress RBridge (this enables forwarding tables of RBridges to be sized with the number of RBridges rather than the total number of end nodes); and,

[BA] I am unclear about the situations in which this claim applies. Are we only talking about core Rbridges, or edge ones as well? What about scaling properties when ESADI is implemented?

[DE] Well, it says "forwarding tables". (1) There is an encapsulation process at the ingress RBridge. Ingress RBridges have to learn MAC addresses and VLANs of remote end stations in the VLANs for which they are ingressing native frames. This encapsulation database grows with the number of such remote end stations. (2) The ingress RBridge and each transit RBridge forwards the encapsulated data frame.
This is what uses the forwarding table which scales with the number of RBridges. (3) There is a decapsulation process at the egress RBridge. Egress RBridges have to learn MAC addresses and VLANs of local end stations in the VLANs for which they are egressing native frames. This decapsulation database grows with the number of such local end stations. [DE] ESADI has only to do with the transport of addressing information, not the amount of such information any particular RBridge needs. So implementing ESADI has little effect on scaling. 2.2.1 Known-Unicast These frames have a unicast inner MAC destination address (Inner.MacDA) and are those for which ingress RBridge knows the egress RBridge for that destination MAC address. [BA] I think you mean for "VLAN/destination MAC address pair", no? [DE] Yes, another case where VLAN needs to be added. 2.2.2 1. unicast frames for which the destination is unknown: the Inner.MacDA is unicast, but the ingress RBridge does not know its location; [BA] do you mean "does not include an entry for the VLAN/destination MAC address pair"? [DE] Should have VLAN added and the wording should be tweaked. 3. multicast frames for which the Layer 2 destination address is not derived from an IP multicast address: the Inner.MacDA is multicast, and not from the set of Layer 2 multicast addresses derived from IPv4 or IPv6 multicast addresses; [BA] Does this work for *all* multicast addresses not derived from an IP multicast address (e.g. addresses used in provider bridging)? [DE] Frames addressed to the small number of special bridging/link/TRILL reserved addresses are handled specially. That exception should be added. Section 3.3 3.3 Reserved (R) The two R bits are reserved for future use in extensions to this version zero of the TRILL protocol. They MUST be initially set to zero, transparently copied by transit RBridges, and ignored on receipt. [BA] From this sentence, I'm not clear who is ignoring the R bits. 
Does the sentence just apply to transit RBridges or all RBridges?

[DE] Good point. Should probably say "They MUST be set to zero when the TRILL header is added by an ingress RBridge, transparently copied but otherwise ignored by transit RBridges, and ignored by egress RBridges."

Section 3.5

Note: Most RBridge implementations are expected to be optimized for the simplest and most common cases of frame forwarding and processing. The inclusion of any options may, and the inclusion of complex or lengthy options very likely will, cause frame processing using a "slow path" with markedly inferior performance to "fast path" processing. Limited slow path throughput may cause such frames to be lost.

[BA] This makes a very good case for removing options from this specification. Do we really need this?? This seems like it will bring with it all the issues that options have in IPv4, and then some.

[DE] There was some controversy about options and the above warning is probably overly alarmist. The hard size limit on the options area was based on input from ASIC engineers. The base protocol draft contains only the minimum hooks for options and the working group consensus has been to include these hooks. To revisit this question at this point would cause substantial delay.

Section 3.6

Although the RBridge MAY decrease the hop count by more than 1, under the circumstances described above, the RBridge forwarding a frame MUST decrease the hop count by at least 1, and discards the frame if it cannot do so because the hop count is 0.

[BA] The MAY here seems dangerous. Could an implementation decrement hop count by 10? This seems like one of those situations where the spec should be more strict (e.g. SHOULD decrement by 1). Allowing a wide variety of behavior seems like it would be courting trouble. Is there value in allowing variation, and if so, please explain what that value is.

[DE] The big fear is that, even with a hop count, multi-destination frames (multicast, etc.)
in some sort of temporary loop can spawn multiple copies when they go through a distribution tree fork, saturating your network. Known unicast frames are much less dangerous and TRILL considers the TTL mechanism adequate to keep them under control. It is because of this concern with multi-destination frames that they are subject to the mandatory stringent reverse path forwarding check (RPFC), which should, in conjunction with their hop count, keep multi-destination frames under control.

[DE] So, assume, say, a TRILL encapsulated broadcast frame arrives at RBridge X via port-1 with a hop count of 13 while on a distribution tree such that RBridge X will forward two copies of the frame, one out of port-2 and one out of port-3. Assume that the most distant RBridge down the port-2 branch of this distribution tree is 12 hops away. But the most distant RBridge down the port-3 branch is only 2 hops away. The RBridge MUST decrease the hop count by at least one, which, in this case, still leaves just enough to complete the distribution on the max-12 hop branch. But the language you are referring to permits the RBridge to reduce the hop count by more, up to 10 in this case, for the max-2 hop branch out of port-3, as long as it still has enough hop count left in that copy for complete distribution. But it always has to reduce it by at least one even if that doesn't leave a big enough hop count for complete distribution. This is an optional extra safety measure to control multi-destination frames, the most dangerous kind of frame.

Section 3.7.3

o Nickname values MAY be configured. An RBridge that has been configured with one or more nickname values will have priority for those nickname values over all Rbridges with non-configured nicknames.

[BA] RFC 3927 does not permit static assignment of link scope addresses because it was feared that this would lead to implementations ignoring collision detection.
I realize that configured nicknames get priority, but it still seems like a good idea for an Rbridge to test for conflict before configuring the nickname, so as to avoid a potential conflict, no?

[DE] A number of working group members believe that some customers insist on being able to configure everything and would want all the RBridges in their campus to have pre-assigned nicknames. If it isn't provided for in the spec, it will be implemented in a variety of proprietary ways. Under the provisions of this section, I don't see how it can do any harm. The worst someone could do is manually configure all their RBridges with the same nickname. Then the nickname procedure would sort this all out and would force all but the highest nickname priority RBridge to change nickname, regardless of the configuration.

o Each RBridge is responsible for ensuring that its nickname or each of its nicknames is unique. If RB1 chooses nickname x, and RB1 discovers, through receipt of RB2's LSP, that RB2 has also chosen x, then the RBridge with the numerically higher priority keeps the nickname, or if there is a tie in priority, the RBridge with the numerically higher IS-IS System ID keeps the nickname, and the other RBridge MUST select a new nickname. This can require an RBridge with a configured nickname to select a replacement nickname.

[BA] Given that a configured nickname might need to select a replacement, what is the value of supporting configuration?

[DE] So that, if you properly configure it, you know what nicknames particular RBridges will have. Of course, if you improperly configure them, that is, configure conflicts, then your configuration efforts were not very useful.

[BA] Also, this would suggest that even configured nicknames need to test for uniqueness prior to configuration, rather than relying on increased priority due to aging (which is undefined in the spec) to avoid forcing nodes that have been using a nickname for a long time to change.
[DE] I believe the spec makes it clear that every RBridge in the campus has to check that its nickname isn't duplicated elsewhere by a higher nickname priority RBridge. The time to make that check, and the only time it makes sense to do such a check, is when you receive an LSP that changes what you think some other RBridge's nickname and/or nickname priority are.

[BA] Also, just because RB1 gets RB2's LSP doesn't mean that RB2 simultaneously has RB1's LSP. So one side can assume that the other will give up the nickname, but that might not happen for a while. Do you want to encourage RB1 to send its LSP right away after it detects a collision so that the information asymmetry is quickly corrected?

[DE] This really seems to me to be something best left to implementations and IS-IS. Generally speaking, you send LSPs when something changes or you get a sequence number PDU from your neighbor indicating that their link state is incomplete or out of date, in which case you send them what they are missing or have only an older copy of. Sure, there can be time delays, but there is no reason in your example above to think that RB1 and RB2 are directly connected. IS-IS reliable flooding assures that every RBridge will end up with a complete copy of the core link state no matter how long ago an LSP changed. Spontaneously sending a redundant copy of your LSP that has already been sent won't speed things up. Spontaneously sending your LSP right away when you change something in it is good, but that's what you normally do anyway.

o To minimize the probability of nickname collisions, when an RBridge selects a new nickname, it does so by randomly hashing some of its parameters, e.g., interface MAC addresses, time and date, and other entropy sources such as those given in [RFC4086]. There is no reason for all Rbridges to use the same algorithm for selecting nicknames.

[BA] Randomness isn't required to reduce collision probabilities.
It's only necessary for the distribution within the space to be uniform. RFC 3927 doesn't require randomness because it was felt that this would just increase the difficulty of debugging with no net benefit. I'd suggest that you rethink this.

[DE] Well, you need a unique seed to start with although you could, as RFC 3927 suggests, use a pseudo-random number generator thereafter. For nicknames, the distribution isn't over the space of all nicknames but only over the nicknames that appear to be available based on the link state database held by the RBridge selecting a new nickname.

An RBridge MAY request multiple nicknames so that it can be the root of multiple trees for multipathing of multi-destination frames. These trees would all be shortest path trees from the RBridge but, since the tree number is used in tie breaking when there are multiple equal cost paths (see Section 4.5.1), the different trees will likely utilize different links.

[BA] In a deployment with a substantial number of Rbridges, the collision probability will already be high. If each Rbridge has multiple nicknames, it will impact scaling in a negative way. Have you thought this through? For example, in a situation where each Rbridge has 16 nicknames, you might start seeing high collision probabilities with as few as 16 Rbridges. I'd suggest that this practice be NOT RECOMMENDED.

[DE] It is expected that very few RBridges in a campus would have multiple nicknames. This would have to be configured by the network manager since one nickname is the default. If a network manager was using this feature, they would pick a few RBridges that, because of the network topology, were particularly good places from which to calculate multiple different shortest path distribution trees. Such trees need separate nicknames so traffic can be multipathed across them.

[DE] 16 RBridges each with 16 nicknames isn't going to cause much of a collision probability, at least in my opinion. That's only 256 nicknames.
For example, assume a huge campus with 10,000 RBridges with random nicknames assigned to those RBridges. This would mean that ~1540 RBridges would have an initial colliding nickname so the ~770 lower priority of these RBridges would pick new nicknames from the ~54,230 available nicknames. I think this would be expected to result in ~758 of them picking a nickname not picked by any of the other RBridges and ~12 experiencing a second collision. So ~6 of the lower priority of these second colliders would pick a new nickname out of the then ~53,466 available nicknames with an expected probability of 99.93% that all 6 would pick non-conflicting names on their second try. Less than one in a thousand times, you would have to go to a third round. Is that sort of behavior OK? I suppose it depends on the application but I think it would be fine for the initial start-up of most data centers.

Section 4.1.1

Frames with the same source address, destination address, VLAN, and priority that are received on the same port as each other and are transmitted on the same port MUST be transmitted in the order received unless the RBridge classifies the frames into more fine grained flows, in which case this ordering requirement applies to each such flow. (Such frames might not be sent out the same port if multipath is implemented. See Appendix C.)

[BA] Do you mean "granular"?

[DE] I don't see much difference between "fine grained" and "finely granular" but I think the first, which is the current wording, reads better.

Section 4.2.4.3

o Loop avoidance:

- Inhibiting itself for a configurable time from zero to 30 seconds, which defaults to 30 second, after it sees a root bridge change on the link (see Section 4.9.3.2).

[BA] The 30 second default might make sense where RBridges are deployed alongside STP, but is this default needed where the legacy bridges run RSTP? Overall, I'm curious as to how the failover times in TRILL will compare to RSTP. The spec doesn't say much about this.
[DE] 30 seconds was chosen as the default for safety. It's configurable so, for example, if you know that all your bridges are RSTP and in the same room with low transmission delays, you can configure it down. Although an RBridge can see if it is receiving RSTP BPDUs from immediately adjacent bridges, it would be very hard for an RBridge to assure itself that RSTP is being used on all the links interior to an attached bridged LAN. Failover times are more complex than might seem at first, especially if you include questions related to the updating of learned addresses, and depend on the engineering and implementation of the specific devices, the way convergence is measured, and the specific circumstances, as well as on protocols.

4.2.5.2 TRILL ESADI Information

The information in ESADI is the list of local end station MAC addresses known to the originating RBridge and, for each such address, a one octet unsigned "confidence" rating in the range 0-254 (see Section 4.8). In order to make it practical to optionally provide for VLAN ID translation, as specified in a separate document, TRILL ESADI frames MUST NOT contain the VLAN ID in the body of the frame after the Inner.VLAN tag.

[BA] Does VLAN ID translation really require support for ESADI?

[DE] What the draft tries to say here is that, to support VLAN ID translation, TRILL ESADI frames, if used, are subject to a restriction. It says nothing about a requirement to support ESADI. This comment was included so that, when designing the encoding of or extending ESADI, nothing would be done that would break VLAN translation. The wording should be clarified.

Section 4.3.1

Sz is determined by having each RBridge (optionally) advertise, in its LSP, its assumption of the value of the campus-wide Sz. This LSP element is known in IS-IS as the originatingLSPBufferSize, TLV #14. The default and minimum value for Sz, and the implicitly advertised value of Sz if the TLV is absent, is 1470 bytes.
[BA] Given the potential headaches that can be caused by MTU issues, I wonder whether the spec couldn't be tightened. If implementations don't advertise Sz, then it seems like we could end up with a 1470 octet MTU limitation on endstations, even where this might not be necessary (e.g. MTU discovery could enable a larger MTU). This seems like it could cause headaches where it wouldn't be necessary.

[DE] This needs to be clarified. The whole MTU feature was motivated by problems with TRILL IS-IS frames on inter-RBridge links. The only thing Sz limits is the size of PDUs generated for TRILL IS-IS (except for MTU-probe/ack PDUs). This is needed to assure proper operation. It doesn't really have anything to do with MTU on links to end stations. There is no way provided in TRILL or IS-IS to communicate an MTU to an end station. See more below.

[BA] Would it make sense to require Sz to be advertised where MTU discovery finds a larger MTU size than 1470?

[DE] What do you mean "advertised"? Each RBridge calculates Sz, which is the maximum size for all TRILL IS-IS PDUs including LSP (link state PDUs) but, of course, excluding MTU-probes/acks that can be bigger than Sz for testing. Each RBridge calculates this by calculating the minimum originatingLSPBufferSize advertised in the link state by any RBridge but not less than 1470. There is also a facility for advertising the MTU of links as determined by the MTU probe and ack. This is advertised with other link information in the link state database.

Section 4.3.2

There are two new TRILL IS-IS message types for use between pairs of RBridge neighbors to test the bidirectional packet size capacity of their connection. These messages are:

-- MTU-probe
-- MTU-ack

Both the MTU-probe and the MTU-ack are padded to the size being tested.

[BA] This section doesn't say whether support for MTU-probe is mandatory.
The way it is written, I'd be concerned that implementers would not support it, and that lack of MTU discovery capability would cause problems in deployments.

[BA] This section might also benefit from a more detailed specification of the MTU discovery algorithm (such as incorporating elements from the Packetization Layer Path MTU RFC).

[DE] This isn't a path MTU determination. It is a link MTU determination only applied to Inter-RBridge links. With 802.3 links, which is what this draft is aimed at, as long as Sz is at the default 1470 bytes, TRILL IS-IS PDUs necessary for proper operation should get through. But the draft says that RBridges SHOULD check.

Section 4.6

Source address information ( { VLAN, Outer.MacSA, port } ) is learned from any frame with a unicast sources address (see Section 4.8).

[BA] Good to see this clearly stated here. As noted earlier, there are places in the document where this is not as clear.

[DE] Yes, hopefully all such places will be fixed.

Section 4.6.1.1

4. If a unicast destination address is unknown, RB1 handles the frame as described in Section 4.6.1.2 for a broadcast frame except that the Inner.MacDA is the original native frame's unicast destination address.

[BA] Do you mean "if a VLAN/unicast destination address pair is unknown"?

[DE] Yes, hopefully all such places will be fixed.

Section 4.6.2.4

If RBn is a transit RBridge the hop count is decremented by one and the frame forwarded to the next hop RBridge towards the egress RBridge. The Inner.VLAN and ingress nickname are not examined by a transit RBridge when it forwards a known unicast TRILL data frame.

[BA] Elsewhere it says that the hop count MAY be decremented by more than one.

[DE] The only case where an RBridge is permitted to decrease the hop count by more than one is when it forwards a multi-destination frame onto a branch of the frame's distribution tree.
As discussed above, in that case, it can reduce the hop count to the distance to the most remote RBridge in the distribution tree branch. Section 4.6.2.4 that you are commenting on concerns the handling of known unicast frames. For those frames, RBridges are not permitted to decrease the hop count by more than 1.

Section 4.8.1

3. By Layer 2 registration protocols learning the { source MAC, VLAN, port } of end stations registering at a local port.

[BA] Are you referring to IEEE 802.11 association here? This won't tell you the VLAN of the end-station (this might not be determined until after authentication).

[DE] This provision is not meant to be limited to any particular existing Layer 2 registration protocol, whether 802.11 or 802.16 or whatever, and is intended to include Layer 2 registration protocols specified in the future, where appropriate. "IEEE 802.11 association and authentication" is used as an example in Section 2.1 and Section 4.2.4.3. "Authentication" is specifically included in the current draft wording where 802.11 is used as an example.

Section 4.8

Although outside the scope of this specification, there are some Layer 2 features in which a set of VLANs has shared learning, where one of the VLANs is the "primary" and the other VLANs in the group are "secondaries".

[BA] One concern about this section is that it might be construed to permit trees shared between VLANs. You might make it clear that this is not the intent.

[DE] This section relates to VLAN/MAC address learning where the separate identities of some number of multiple VLANs are merged, that is, what 802.1 calls SVL (Shared VLAN Learning), and that should probably be clarified. However, I'm not sure what "trees" you are talking about. If you mean TRILL multi-destination frame distribution trees, they are always shared across all VLANs.
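
For readers who want the hop count rules from the Section 3.6 and Section 4.6.2.4 discussion above in concrete form, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names and the `branch_depth` parameter (the distance in hops to the most distant RBridge down a distribution tree branch) are hypothetical and do not come from the draft. It reproduces the RBridge X example, where a frame arrives with a hop count of 13 and is copied onto a max-12 hop branch and a max-2 hop branch.

```python
from typing import Optional


def forward_known_unicast(hop_count: int) -> Optional[int]:
    """Hop count to transmit for a known unicast frame, or None to discard.

    Known unicast frames are decremented by exactly one.
    """
    if hop_count == 0:
        return None  # cannot decrement, so the frame is discarded
    return hop_count - 1


def forward_multi_destination(hop_count: int, branch_depth: int) -> Optional[int]:
    """Hop count to transmit on one distribution tree branch, or None to discard.

    branch_depth is a hypothetical input: hops to the most distant RBridge
    on this branch. The decrement by at least one is mandatory; trimming
    further down to branch_depth is the optional safety measure.
    """
    if hop_count == 0:
        return None  # cannot decrement, so the frame is discarded
    decremented = hop_count - 1  # mandatory decrease of at least one
    # Optional extra decrease: never send more hop count than the branch needs.
    # If decremented is already too small for the branch, it is sent as is.
    return min(decremented, branch_depth)


if __name__ == "__main__":
    # RBridge X example: arrives with hop count 13, branches of depth 12 and 2.
    print(forward_multi_destination(13, 12))
    print(forward_multi_destination(13, 2))
```

Modeling the optional reduction as a simple `min` keeps the mandatory rule (always decrease by at least one) separate from the optional trimming, so an implementation that skips the optional step would just return `decremented`.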
Section 4.9.1

[BA] Given the earlier discussion of "zero config" you might put in a sentence or two indicating that the configuration bits are only needed for special circumstances. You might also state what the default setting of the bits is.

[DE] I'm not sure that's true. I can easily see a network manager adopting a policy (and a configuration where this policy is reasonable), that all ports in their RBridge campus will be configured as either trunk or access. Would that be "special circumstances"? But giving default values is probably a good idea.

Section 4.9.2

Low-level control frames are handled in the lower level port/link control logic in the same way as in an [802.1Q-2005] bridge. This can optionally include a variety of 802.1 or link specific protocols such as link layer discovery, link aggregation (Clause 43 of [802.3]), MAC security [802.1AE], or port based access control [802.1X]. While handled at a low level, these frames may affect higher level processing. For example, a Layer 2 registration protocol may affect the confidence in learned addresses. The upper interface

[BA] IEEE 802.1X-REV is no longer purely "port based", since it supports "pseudo" and "virtual" ports as well.

[DE] It seems to me that the right thing is to add a definition of "port" to Section 1.3 that makes it clear that the unadorned word "port" includes pseudo and virtual ports. In addition, some references, where appropriate, to something as being implemented "in ports" could be changed to saying implemented "below the EISS layer" or the like.

Section 6

IEEE 802.1 port admission and link security mechanisms, such as [802.1X] and [802.1AE], can also be used. These are best thought of as being implemented within a port and are outside the scope of TRILL (just as they are generally out of scope for bridging standards [802.1D] and 802.1Q); however, TRILL can make use of secure registration through the confidence level communicated in optional TRILL ESADI (see Section 4.8).
[BA] Neither IEEE 802.1AE nor IEEE 802.1X-REV are based on physical ports. Instead, I'd refer to the diagrams which make it clear that these mechanisms operate below TRILL.

[DE] OK. See also response immediately above.

Section A.3.4

The spanning tree solution does quite well in this particular case. But it depends on both RB1 and RB2 having implemented the optional feature of being able to configure a port to emit BPDUs as described in Section A.3.3 above. It also makes the bridged LAN whose partition

[BA] This somewhat begs the question about whether being able to emit BPDUs should be optional, recommended or mandatory.

[DE] The text in Appendix A should be changed to make it clear that it is only talking about spanning tree BPDUs. As per Section 4.9.3.3 of the draft, there are conditions under which RBridges SHOULD send topology change BPDUs and RBridges MAY send spanning tree BPDUs. The implementation requirement key words appear in the body of the draft.

Appendix C

When multipathing is used, frames that follow different paths will be subject to different delays and may be re-ordered. While some traffic may be order/delay insensitive, typically most traffic consists of flows of frames where re-ordering within a flow is damaging. How to determine flows or what granularity flows should have is beyond the scope of this document but, as an example, under many circumstances it would be safe to consider all the frames flowing between a particular pair of end station ports to be a flow.

[BA] I think you have to say more here, given that Ethernet invariants include ordering requirements. Certainly considering all frames between a pair of end stations to be a flow would be conservative, but this isn't the Ethernet requirement, right?

[DE] It seems like there are two possible responses to your comment.

[DE] One is to add more detail and complexity. The next step in that direction would probably be to add priority and VLAN so it would say "...
all the frames of the same priority and in the same VLAN flowing between ...". But I think that would invite further complaints leading to further details and yet more complexity in what is supposed to just be a dead simple, ultra-conservative example.

[DE] The alternative, which I prefer, is to simplify. Just say "How to determine flows or what granularity they should have is beyond the scope of this document." dropping the example.

Thanks,
Donald
=============================
 Donald E. Eastlake 3rd   +1-508-634-2066 (home)
 155 Beaver Street
 Milford, MA 01757 USA
 d3e3e3@gmail.com
_______________________________________________
rbridge mailing list
rbridge@postel.org
http://mailman.postel.org/mailman/listinfo/rbridge