Re: [rbridge] Comments on draft-ietf-trill-rbridge-protocol-13.txt

Hi Bernard,

Thanks for your very thorough review and detailed comments.

[DE] My responses are below marked with "[DE]" at the left margin and
in a few cases I've inserted additional "[BA]"s in front of your remarks
for clarity.

[BA] Here is my review.

Review of draft-ietf-trill-rbridge-protocol-13.txt
Bernard Aboba

Summary

Overall, I found this document to be very easy to read. From an
architectural perspective, I believe that the authors have done a good
job of articulating where TRILL fits in the IEEE 802 architecture.  In
particular, the document makes it clear that TRILL is designed as a
substitute for (R)STP, but does not represent a re-thinking of
Ethernet or the IEEE 802 architecture in general.  Of course, it is
possible or even likely that some unforeseen interactions could arise,
but my overall assessment is that the authors have been thoughtful in
their design, utilizing encapsulation to enable maximum compatibility
between Rbridges and legacy bridges.

Of course, this thoughtful approach to backward compatibility also
means that Rbridges will have more of an evolutionary impact than
perhaps was first envisaged.  For example, an IEEE 802.11 AP connected
to an infrastructure based on Rbridges will still have some of the
same limitations with respect to restrictions on multiple Associations
within an SSID as would exist if the AP were to be connected to a
conventional bridged Ethernet.

My major areas of concern with this document relate to potential MTU
issues as well as potential compatibility issues with the extensions
enumerated in the Appendix.  There are also a few instances in which
the document is not as clear as it could be, or includes optional
capabilities that IMHO should either be made mandatory to implement or
be eliminated.

[DE] On MTU and optional capabilities, see below.

However, I do not think that these concerns are sufficient to require
the document to be published as Experimental.  If and when TRILL is
deployed, I expect that a number of issues will be found and that a
-bis document will eventually need to be prepared.  That is par for
the course nowadays, and if this were to preclude consideration for
Proposed Standard status, then we wouldn't have many new protocol
specifications under consideration for PS.

[DE] Thanks.

In my view, the more relevant question with respect to status is
whether the protocol is sufficiently well specified so as to preclude
the introduction of widespread interoperability problems.  Where a
document could potentially introduce such problems and where initial
deployment is likely to result in major changes to the design, an
Experimental status would be warranted.  In retrospect, such a Status
would have been more appropriate for protocols such as SIP whose
continual "evolution" has led to persistent interoperability problems
a decade after its initial introduction.

However, TRILL is based on a mature routing protocol (IS-IS) with
demonstrated interoperability, with some modest enhancements.  Other
than a few optional capabilities which could be made mandatory or
eliminated, and some instances where the text needs to be clarified,
the specification is relatively straightforward, and so it seems
unlikely that we will see numerous major interoperability issues
between TRILL implementations, on the scale of what we have seen with
SIP.

More likely is the potential for compatibility issues between TRILL
and existing legacy bridges.  However, rather than requiring the
authors to enumerate all such potential issues and provide solutions
prior to publication (which could create years of delay), a more
practical approach would be for the document to more clearly enumerate
the scenarios believed to be most suitable for initial deployment.

[DE] The document could say that the use of optimal paths and
multi-pathing are of more benefit the more mesh-like the network is.

Timers

STP has been superseded by RSTP in order to improve convergence times.
Looking through this spec, I'm not clear whether TRILL was designed to
compete with RSTP convergence times, and if so, what the default
values should be.

[DE] RSTP was standardized by 802.1w-2001, long before the proposal of
TRILL. TRILL, as described in this specification, is the application
of link state router technology to the VLAN aware customer bridging
problem. RBridges are true routers, swapping the outer MAC addresses on
each hop as well as decrementing a hop count. I think it is better to
view TRILL as a different technical approach with different
characteristics than bridging rather than as some competition based
just on the metric of convergence time. See further remarks below.

Optional functionality

By IETF standards, there is only a modest amount of optional
functionality in this spec, but what there is (ESADI and options)
doesn't seem compelling to me.  Is this functionality really
necessary, or is it in there only to provide "value add"
(e.g. opportunities for non-interoperability)?

[DE] See responses below.

RBridge nicknames

I understand why nicknames are needed (to avoid making the MTU issues
even worse).  However, we have learned with RFC 3927 that collisions
within 16-bit spaces can be painful (e.g.  collision probabilities can
be quite high if you have a substantial number of Rbridges in the
network).  Overall, I think that nickname collisions (along with
optional functionality such as ESADI and Options) represent one of the
potential weaknesses in the spec, particularly since Rbridges are most
attractive in situations (such as datacenters) where the number of
RBridges could be large.  Some tweaks in the algorithms are suggested
below.

[DE] Nicknames not only help with MTU, they also simplify fast path
forwarding lookup, making the table indexes narrower.

[DE] Thanks for providing a pointer to RFC 3927.

[DE] There are significant differences between nickname collisions and
RFC 3927 link local IPv4 address collisions. With RFC 3927 addresses,
you have to pick an address to try from across the entire range, since
you don't know which are in use. Furthermore, two hosts may
simultaneously detect a collision and both re-try with new
values. With nicknames, you pick a new nickname only from among those
that were free, since you can see all the nicknames that are in use in
the link state. And in case of a collision detected simultaneously by
two RBridges, they can both see each other's priority and only one
will retry.
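
As an illustration of the two differences just described, here is a
minimal sketch. The function names, the (priority, id) tuples, the
"higher priority keeps the nickname" rule, and the tie-break on the
identifier are all assumptions made for the example; they are not
taken from the draft.

```python
import random

NICKNAME_BITS = 16  # nicknames are 16-bit values


def pick_free_nickname(in_use, reserved=frozenset(), rng=random):
    """Pick a nickname that no RBridge in the link state currently holds.

    in_use:   set of nicknames visible in the link state database
    reserved: values that must never be chosen (hypothetical parameter)
    """
    free = [n for n in range(1 << NICKNAME_BITS)
            if n not in in_use and n not in reserved]
    if not free:
        raise RuntimeError("no free nickname left in the campus")
    return rng.choice(free)


def resolve_collision(holders):
    """Decide who keeps a contested nickname.

    holders: list of (priority, rbridge_id) tuples claiming the same value.
    Assumption: the highest (priority, rbridge_id) keeps the nickname and
    every other holder retries, so only one side of a two-way collision moves.
    """
    keeper = max(holders)
    losers = [h for h in holders if h != keeper]
    return keeper, losers
```

Because each loser calls pick_free_nickname() against the nicknames it
can already see, its retry does not land on a value that is known to be
taken.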

[DE] RFC 3927 recommends against more than 1300 hosts on a link, saying
that a new host connecting and picking an address at random has a 98%
chance of avoiding collision. If an RBridge joins a campus and there is at
least one nickname free, either the RBridge will wait until it has the
link state and then assert the free nickname or, if it immediately
asserts a nickname that collides, either it or, depending on priority,
the RBridge that used to have that nickname, will switch directly to
the free nickname. Of course, as far as I know, no one is planning to
run a campus with anything like that many nicknames in use but,
because of these differences, nickname resolution should be fine with
an order of magnitude more allocations than the RFC 3927
recommendation. And, since nicknames are only needed by RBridges as
opposed to RFC 3927 IP addresses which are needed by end stations,
there will normally be far fewer nicknames needed than IP addresses
for a particular network size (see comments below in reference to more
than one nickname per RBridge).
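
The 98% figure can be checked with a few lines of arithmetic. This is
only a sketch: it assumes a single uniformly random pick and uses the
RFC 3927 selection range of 169.254.1.0 through 169.254.254.255. The
13000-nickname case is a hypothetical "order of magnitude more" load
chosen for the example.

```python
LINK_LOCAL_ADDRESSES = 254 * 256  # 169.254.1.0 through 169.254.254.255


def first_try_success(in_use, space):
    """Probability that one uniformly random pick lands on an unused value."""
    return 1 - in_use / space


# RFC 3927 case: a host picks blindly among all addresses.
p_3927 = first_try_success(1300, LINK_LOCAL_ADDRESSES)

# Hypothetical blind pick among 16-bit nicknames with 13000 already taken.
p_blind_nickname = first_try_success(13000, 2 ** 16)

print(f"RFC 3927, 1300 hosts in use:      {p_3927:.3f}")
print(f"blind pick, 13000 nicknames used: {p_blind_nickname:.3f}")
```

The second number only applies to a blind pick. An RBridge that waits
for the link state chooses from the free values it can see, so the
blind-pick probability is not the relevant figure for it.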

Global uniqueness of MAC addresses

We are now seeing situations in which some of the conventional
assumptions made by IEEE 802.1 have broken down.  One of these is the
widespread deployment of virtualization within datacenters.  In these
scenarios, multiple MAC addresses may be assigned to a single
(virtualized) NIC.  To limit a potential explosion in demand for MAC
addresses, MAC addresses can be assigned by management software from a
vendor OUI, and as a result, MAC addresses are not guaranteed to be
unique across VLANs.

In reading through the document, it seems clear to me that the intent
is for traffic to be routed by VLAN and MAC address, so that end
stations are not required to have globally unique MAC addresses
(although Rbridge MAC addresses do need to be globally unique).
However, there are a number of instances in which the text is not as
clear as it could be on this point (see below for detailed comments).

[DE] Yes, as per below, there are a number of cases where it currently
says "MAC address", or the like, and VLAN needs to be added so it says
"VLAN and MAC address", or the like.

Note that the ability of TRILL to handle non-globally unique endstation MAC
addresses is IMHO a major advantage as compared to single-spanning tree
switches, or even "Q in Q" provider bridges.  Some of these advantages
might be worth calling out.  For example:

1. TRILL encapsulation potentially shields legacy bridges from learning
MAC addresses which might cause problems (e.g. single spanning tree
implementations).

2. TRILL encapsulation can shield "Q in Q" provider bridges from
exposure to MAC address duplication which could occur when a provider
needs to handle traffic from customers with their own distinct
datacenters utilizing virtualization.

3. Greenfield TRILL deployments only require end station MAC addresses
to be unique per VLAN.  As a result, TRILL is virtualization-friendly,
and the prohibitions on virtualization described in documents such as
IEEE 802.1X-REV are not necessary in an Rbridge deployment.

MTU issues

While support for PMTU discovery is quite common within TCP
implementations, the same is not true for UDP.  Legacy implementations
that lack support for IEEE 802.1AB-REV, and automatically set an
Ethernet interface MTU to 1500 are quite widespread.  In greenfield
Rbridge installations designed to support a larger MTU between
Rbridges, this should be a solvable problem.  It also should be
addressable in situations where Rbridges are installed alongside
relatively new switches that support MTUs of 1530 or greater.
However, I do wonder what issues could arise in situations where
Rbridges are installed alongside switches that only support Ethernet
frames of 1512 or 1516 octets.  Reading the document, it was not clear
whether MTU probing was "mandatory to implement" but I think this
functionality should be mandatory, if only for diagnostic purposes.  I
also think that the algorithm for MTU determination could be better
specified, perhaps by incorporating elements from RFC 4821.

[DE] There are two issues for MTU: MTU for end station data and MTU
for TRILL IS-IS control frames.

[DE] There needs to be a campus wide lower limit for the MTU of
inter-RBridge links to be sure of correct operation so that TRILL
IS-IS control frames can get through. Each RBridge advertises its
requested MTU for this purpose, the lowest value wins (but not less
than 1470), and all RBridges SHOULD test the inter-RBridge links to
see that their MTU meets or exceeds this value. MTU for end station
data frames is a whole different thing. The draft needs to make this
clear. I'll suggest some wording changes on the mailing list.
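
A minimal sketch of the campus-wide rule described above, assuming the
advertised values and the tested link MTUs are already available as
plain numbers. The function and variable names are made up for the
example and do not come from the draft.

```python
MIN_CAMPUS_MTU = 1470  # floor quoted above


def campus_mtu(advertised):
    """Lowest advertised value wins, but never below the floor."""
    return max(MIN_CAMPUS_MTU, min(advertised))


def links_below_campus_mtu(tested_link_mtus, advertised):
    """Return the inter-RBridge links whose tested MTU is too small.

    tested_link_mtus: mapping of link name -> MTU confirmed by testing
    advertised:       MTU values requested by each RBridge
    """
    limit = campus_mtu(advertised)
    return sorted(name for name, mtu in tested_link_mtus.items()
                  if mtu < limit)
```

For example, campus_mtu([9000, 1500, 1600]) gives 1500, while
campus_mtu([1400, 1500]) is held at the 1470 floor.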

Port protocols

There are a number of IEEE 802.1 protocols (such as IEEE 802.1X, IEEE
802.1AB, etc.) that relate to the state of a port and utilize a
non-forwardable multicast address.  The draft states that an Rbridge
will not forward frames sent to a non-forwardable multicast address.
This cleanly segments the problem in the sense that TRILL is put
forward purely as a substitute for the spanning tree protocol, leaving
the operation of port protocols intact.  In such a model, the
operation of IEEE 802.1X-2004 and IEEE 802.1AB-REV should not be
affected; these protocols, if implemented at the edge Rbridge, should
operate largely as they do today from the point of view of the end
station.

However, there are some corner cases in which limitations could arise.
One example is an edge device such as a wired VoIP handset with one or
more wired Ethernet ports.  These devices often do not implement "port
protocols" such as IEEE 802.1X-2004, but instead operate like a TPMR,
passing them through to the switch.

I do not see this as a problem with the Rbridge specification per se,
because the spec is attempting to mimic the behavior of a conventional
switch in this (and other cases).  However, it might be worth a
sentence explicitly stating that -- and noting that potential
extensions to other cases such as Virtual RBridges or Provider
Rbridges are not excluded, but are left to future work.

[DE] Yes, something about "extensions to support provider bridging
services are left for future work" or the like, since this spec covers
only customer bridging services, seems reasonable.

I would also note that IEEE 802.1X-REV goes beyond "port access
control" to defining "pseudo" and "virtual" ports.  Virtual ports are
created via MACSEC (IEEE 802.1AE) and pseudo ports involve MAC-based
authentication state, so as to allow IEEE 802.1X supplicants to
coexist on shared media. Among other things, pseudo and virtual ports
introduce the notion of IEEE 802.1X traffic that could be destined to
a unicast address (as in IEEE 802.11i) rather than to a
non-forwardable multicast address.  As is stated in the document, the
location of TRILL in the IEEE 802.1 architecture makes it transparent
to these extensions; however in places the document language is
outdated and could be cleaned up (see below for examples).

[DE] See discussion below re pseudo/virtual ports.

Detailed Comments

Abstract

   The design supports VLANs and optimization of the distribution of
   multi-destination frames based on VLAN and IP derived multicast
   groups.  It also allows forwarding tables to be sized according to
   the number of RBridges (rather than the number of end nodes), which
   allows internal forwarding tables to be substantially smaller than in
   conventional bridges.

[BA] Since core bridges can forward based on VLAN tags and not MAC
addresses, this claim seems somewhat exaggerated.

[DE] Well, perhaps that statement in the abstract should be limited to
unicast forwarding information since RBridge multi-destination frame
forwarding SHOULD also prune distribution based on looking at the VLAN
and destination MAC address (although this pruning is only an
optimization so things will work if it is not done). However, I'm not
aware of any 802.1 VLAN aware bridging standards that have a bridge
forward known unicast customer data frames based only on VLAN,
ignoring MAC address. As far as I know, VLAN aware bridge forwarding
lookups are 60-bits wide, a 12-bit VLAN plus 48-bit MAC address.
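
To make the 60-bit width concrete, here is a small sketch of packing a
12-bit VLAN ID and a 48-bit MAC address into one lookup key. The
function name and the string format accepted for the MAC address are
assumptions for the example, not anything a bridging standard defines.

```python
def lookup_key(vlan_id, mac):
    """Pack a 12-bit VLAN ID and a 48-bit MAC address into a 60-bit integer."""
    if not 0 <= vlan_id < (1 << 12):
        raise ValueError("VLAN ID must fit in 12 bits")
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("MAC address must be 6 octets")
    # VLAN occupies the top 12 bits, the MAC address the low 48 bits.
    return (vlan_id << 48) | int.from_bytes(octets, "big")
```

With this layout the same MAC address in two different VLANs yields two
different keys, which is what a lookup on VLAN plus MAC requires.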

Section 1

   IEEE 802.1 bridges avoid these problems by transparently gluing many
   physical links into what appears to IP to be a single LAN [802.1D].
   However, 802.1 bridge forwarding using the spanning tree protocol has
   some disadvantages:

[BA] Throughout the document, I get the sense that TRILL is aiming at
a target that is somewhat frozen in time.  Today most new bridge
implementations support RSTP, which offers considerably faster
convergence.  Shortest path bridging is progressing, etc.  Overall,
I'd like the document to be clearer about what advantages apply to
classic STP, and which ones also apply to enhancements such as RSTP,
shortest path bridging, etc.

[DE] It seems to me that it is a good thing that the TRILL working
group has had consistent goals, rather than constantly changing its
goals. RSTP was standardized in 2001 by 802.1w, long before the TRILL
effort started. See Section 2.3 of RFC 5556 (TRILL Problem and
Applicability Statement) for some relevant comments.

[BA] For example, it seems to me that TRILL has advantages over single
spanning tree implementations in terms of ability to customize
forwarding per VLAN.  Based on the document, it seems that it may have
some disadvantages in terms of convergence times and initialization
behavior, as compared with RSTP.  So overall, I think that Section 1
could do a better job of articulating the pros/cons of TRILL.  In
particular, it appears to me that some of the pros described in the
problem statement document have not been realized, and some additional
pros (such as virtualization support) have arisen.

[DE] Convergence is a bit more complex than it seems at first and
depends on engineering, implementation, which aspects of convergence
are included, and how they are measured, as well as protocol. For
example, one might expect RBridges engineered for rapid fail-over to
also implement BFD for rapid failure detection. How would that be
factored in? See also Section 2.3 of RFC 5556.

  in most cases they can incrementally replace IEEE

[BA] The use of "most" here is somewhat vague.  I might say "as
described in Appendix A, they can incrementally replace IEEE..."

[DE] Yes, even though Sections 1 and 2 are supposed to be general
overview sections, it seems like the use of general words like "most"
causes some to jump to the conclusion that a more precise description
isn't known, even though it is. The wording should be improved.

   While they can be
   applied to a variety of link protocols, this specification focuses on
   IEEE [802.3] links.

[BA] This would seem to suggest that TRILL could be used on Token
Ring.  You might want to specifically exclude this usage (or
interconnection with source routing bridges, for that matter).

[DE] The next link type of interest among TRILL working group members
beyond 802.3 seems to be PPP and Jim Carlson is working on a draft for
that. But I don't see any reason to rule out yet other types of
links. TRILL was intended to be used with a variety of link types.
You would expect to have a separate document specifying how to handle
each link type. I can't see any reason you couldn't use a Token Ring
(802.5) link to interconnect some number of end stations, RBridge,
and/or bridge ports. From the point of view of bridges, an RBridge is
pretty much an end station. So, while I could be wrong, I would guess
that a document specifying how an RBridge Token Ring port would work
would say that it would handle receipt/transmission of frames with a
token ring functional address the same as other token ring end
stations.

[DE] Wording should be added to the draft saying that a separate
specification would be expected for each link type.

[DE] Source routing bridges are something I know even less about. But
RBridges look like end stations to bridges and terminate spanning
tree. So, I would imagine that if they handled route explorer frames
as if they were an end station, then RBridges would work with source
routed bridges also.

Section 1.2

      Section 2: general RBridge description
      Section 3: the TRILL header
      Section 4: other TRILL protocol details

   In case of conflict, the order of precedence of these section is as
   follows, with those appearing earlier in this list having precedence
   over those that appear later:

         4 > 3 > 2

[BA] "this list" is ambiguous.  Do you mean the above list, in which
case, Section 2 would take precedence over Section 4?  Or do you mean
Section 4 takes precedence over 2?  I think you need to make this more
clear.

[DE] Thanks. Section 4 has highest precedence and 2 has lowest. The
ambiguity in wording needs to be fixed.

Section 1.3

1.3 Terminology and Notation in this document

   "TRILL" is the protocol specified herein while an "RBridge" is a
   devices that implement that protocol.  The second letter in Rbridge
   is case insensitive. Both Rbridge and RBridge are correct.

[BA] s/devices/device/

[DE] Yup, thanks.

   In this document, the term "link", unless otherwise qualified, means
   "bridged LAN", that is to say, the combination of one or more [802.3]
   links with zero or more brides, hubs, repeaters, or the like. The

[BA] Do you really want to be combining IEEE 802.3 links with brides
(or grooms)?  Suggest changing "brides" to "bridges".

[DE] Well, in some ways RBridges are the marriage of Routers and
Bridges but you are probably right about the correction.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

[BA] You might want to move this to earlier in the section (or give it
its own sub-section).

[DE] OK, it can be moved to the beginning of the section.

Section 2

   RBridges run a link state protocol amongst themselves. This gives
   them enough information to compute pair-wise optimal paths for
   unicast, and calculate distribution trees for delivery of frames
   either to unknown MAC destinations or to multicast/broadcast groups.
   [RBridges] [RP1999]

[BA] I think you mean "unknown VLAN/MAC destination pairs" here, no?

[DE] Yes, this is one of the cases where VLAN needs to be added to MAC
address.

   used, within a campus, even by an RBridge that lacks an IP or other
   Layer 3 transport stack or which has zero configuration and thus no
   Layer 3 address, by transporting SNMP with Ethernet [RFC4789].

[BA] The term "zero configuration" is ambiguous here, since it might
be construed to refer to an RFC 3927 "zero config" IP address within
169.254/16.  In any case, a switch obtaining a dynamic IP address can
still be "zero configuration", so I'm not sure what the point is.

[DE] OK. The term "zero configuration" or an equivalent does not seem
necessary here. And, in general, I think all remaining occurrences of
"zero configuration" should be removed or replaced with "default
configuration" or the like.

Section 2.1

   An RBridge, RB1, which is the VLAN-x forwarder on any of its links
   MUST learn the location of VLAN-x end nodes, both on the links for
   which it is VLAN-x forwarder, and on other links in the campus. RB1
   learns the port and Layer 2 (MAC) addresses of end nodes on links for

[BA] Later on it is made clear that we are talking about learning the
VLAN as well as the port and MAC address.  It's important to be
consistent on this point throughout the document.

[DE] Yes, even though it is speaking of the "VLAN-x forwarder", so
there is an implication that it is learning the MAC within VLAN-x,
this should be made more explicit.

   Also, it can be more secure because not
   only might the enrollment be authenticated (for example by
   cryptographically based EAP methods via [802.1X]), but ESADI also
   supports cryptographic authentication of its messages [RFC5304].

[BA] Elsewhere the document makes it clear that IEEE 802.1X is
implemented below TRILL so that they don't interact.  Here you seem to
be saying that ESADI and EAP could be related.  If IEEE 802.1X
operating on a port or source MAC does not allow non-802.1X frames to
pass, then ESADI should not be announcing the unauthenticated source
MAC, right?  Note that where 802.1X frames are sent to a unicast
address, they *will* be forwarded, and learning will occur; however,
filtering will occur at the edge so as to prevent incoming or outgoing
frames to/from unauthenticated MAC addresses.  The potential for
different behavior of learning and ESADI concerns me.

[DE] Yes, 802.1X is implemented below TRILL but the draft mentions in
several places that information can flow between TRILL and such
lower level protocols. For example, the RBridge's confidence in a
locally learned address can be influenced by 802.1X authentication.

[DE] The wording should be clarified but it isn't saying that 802.1X
and ESADI are related. It is saying that address learning can be more
secure for two reasons. The first reason is that addresses can be
learned in conjunction with a cryptographically secured Layer 2
authentication protocol of which 802.1X is just an example. The second
reason is that addresses can be securely transmitted through
cryptographically secured ESADI messages. These two reasons are mostly
orthogonal.

   Advertising end nodes using ESADI is optional, as is learning from
   these announcements.

[BA] Does it make any sense to support Advertising but not learning,
or learning and not advertising?  If both are optional, then these
combinations are possible.

[DE] Learning from observing frames at the data level is the bridging
mind-set way of doing things. Learning through the control plane,
i.e. ESADI, is how you would do it coming from a router mind-set. For
interoperability with default configuration, at least one of these two
techniques needs to be mandatory to implement and enabled by
default. In this case, data plane learning prevailed in the working
group and is mandatory to implement and enabled by default.

[DE] All combinations of ESADI advertising/learning are reasonable and
no combination causes any significant interoperability problems if data
plane learning is left enabled, as is the default. Things get more
dicey if you disable data plane learning. But some users demand the
ability to do things like disabling all learning and having only
statically configured addressing information.

[BA] In general, I'm not a fan of optional functionality.  ESADI is
optional, and it's not clear to me that the benefits outweigh the
potential headaches.  Is ESADI really necessary, particularly given
the claims made about scaling with the number of Rbridges, not end
nodes?

[DE] If an end station is unplugged from one RBridge and plugged into
another then, depending on circumstances, frames addressed to that end
station can be black holed. That is, they can be sent to the older
RBridge that the end station used to be connected to until cached
address information at some remote RBridge(s) times out, possibly for
a number of minutes or longer. With ESADI, the link interruption and
establishment from the unplugging and plugging can cause immediate
updates to be sent.

[DE] The forwarding tables of transit RBridges scale with the number
of RBridges rather than the number of end nodes since transit RBridges
don't need to learn end station addresses. But ingress and egress
RBridges do need to learn end station addresses for the VLANs for
which they are an appointed forwarder on one or more ports so their
tables related to encapsulation/decapsulation do scale with the number
of VLAN scoped addresses. Whether you use data plane or control plane
address learning doesn't have that much effect on scaling.

Section 2.2

   2. elimination of the need for original source and destination MAC
      address learning in transit RBridges;

[BA] The specification doesn't discuss "destination MAC address
learning".  Is this a typo?

[DE] Yeah, or maybe best to replace "original source and destination
MAC" with "end station VLAN and MAC".

   3. direction of frames towards the egress RBridge (this enables
      forwarding tables of RBridges to be sized with the number of
      RBridges rather than the total number of end nodes); and,

[BA] I am unclear about the situations in which this claim applies.
Are we only talking about core Rbridges, or edge ones as well?  What
about scaling properties when ESADI is implemented?

[DE] Well, it says "forwarding tables". (1) There is an encapsulation
process at the ingress RBridge. Ingress RBridges have to learn MAC
addresses and VLANs of remote end stations in the VLANs for which they
are ingressing native frames. This encapsulation database grows with
the number of such remote end stations. (2) The ingress RBridge and
each transit RBridge forwards the encapsulated data frame. This is
what uses the forwarding table which scales with the number of
RBridges. (3) There is a decapsulation process at the egress
RBridge. Egress RBridges have to learn MAC addresses and VLANs of
local end stations in the VLANs for which they are egressing native
frames. This decapsulation database grows with the number of such
local end stations.
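
The three databases in (1) to (3) can be sketched as follows. This is
an illustration only: the class, field, and method names are invented
for the example, and multi-destination handling is left out.

```python
from dataclasses import dataclass, field


@dataclass
class RBridgeTables:
    # (1) ingress: remote (VLAN, MAC) -> egress RBridge nickname.
    #     Grows with the number of remote end stations.
    encap: dict = field(default_factory=dict)
    # (2) forwarding: egress RBridge nickname -> next-hop port.
    #     Grows with the number of RBridges.
    forwarding: dict = field(default_factory=dict)
    # (3) egress: local (VLAN, MAC) -> local port.
    #     Grows with the number of local end stations.
    decap: dict = field(default_factory=dict)

    def ingress(self, vlan, dst_mac):
        """Return (egress nickname, next hop), or None if the pair is unknown."""
        nickname = self.encap.get((vlan, dst_mac))
        if nickname is None:
            return None
        return nickname, self.forwarding[nickname]

    def transit(self, egress_nickname):
        """Transit forwarding looks only at the egress nickname."""
        return self.forwarding[egress_nickname]

    def egress(self, vlan, dst_mac):
        """Return the local port for a decapsulated frame, or None."""
        return self.decap.get((vlan, dst_mac))
```

Note that transit() never touches the (VLAN, MAC) keyed tables, which is
why the table it uses is sized by the number of RBridges.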

[DE] ESADI has only to do with the transport of addressing
information, not the amount of such information any particular RBridge
needs. So implementing ESADI has little effect on scaling.

2.2.1 Known-Unicast

   These frames have a unicast inner MAC destination address
   (Inner.MacDA) and are those for which ingress RBridge knows the
   egress RBridge for that destination MAC address.

[BA] I think you mean for "VLAN/destination MAC address pair", no?

[DE] Yes, another case where VLAN needs to be added.

2.2.2

   1. unicast frames for which the destination is unknown: the
      Inner.MacDA is unicast, but the ingress RBridge does not know its
      location;

[BA] do you mean "does not include an entry for the VLAN/destination
MAC address pair"?

[DE] Should have VLAN added and the wording should be tweaked.

   3. multicast frames for which the Layer 2 destination address is not
      derived from an IP multicast address: the Inner.MacDA is
      multicast, and not from the set of Layer 2 multicast addresses
      derived from IPv4 or IPv6 multicast addresses;

[BA] Does this work for *all* multicast addresses not derived from an
IP multicast address (e.g.  addresses used in provider bridging)?

[DE] Frames addressed to the small number of special
bridging/link/TRILL reserved addresses are handled specially. That
exception should be added.

Section 3.3

3.3 Reserved (R)

   The two R bits are reserved for future use in extensions to this
   version zero of the TRILL protocol. They MUST be initially set to
   zero, transparently copied by transit RBridges, and ignored on
   receipt.

[BA] From this sentence, I'm not clear who is ignoring the R bits.
Does the sentence just apply to transit RBridges or all RBridges?

[DE] Good point. Should probably say "They MUST be set to zero when
the TRILL header is added by an ingress RBridge, transparently copied
but otherwise ignored by transit RBridges, and ignored by egress
RBridges."

Section 3.5

   Note: Most RBridge implementations are expected to be optimized for
      the simplest and most common cases of frame forwarding and
      processing. The inclusion of any options may, and the inclusion of
      complex or lengthy options very likely will, cause frame
      processing using a "slow path" with markedly inferior performance
      to "fast path" processing. Limited slow path throughput may cause
      such frames to be lost.

[BA] This makes a very good case for removing options from this
specification.  Do we really need this??  This seems like it will
bring with it all the issues that options have in IPv4, and then some.

[DE] There was some controversy about options and the above warning is
probably overly alarmist. The hard size limit on the options area was
based on input from ASIC engineers. The base protocol draft contains
only the minimum hooks for options and the working group consensus has
been to include these hooks. To revisit this question at this point
would cause substantial delay.

Section 3.6

   Although the RBridge MAY decrease the hop count by more than 1, under
   the circumstances described above, the RBridge forwarding a frame
   MUST decrease the hop count by at least 1, and discards the frame if
   it cannot do so because the hop count is 0.

[BA] The MAY here seems dangerous.  Could an implementation decrement
hop count by 10?  This seems like one of those situations where the
spec should be more strict (e.g. SHOULD decrement by 1).  Allowing a
wide variety of behavior seems like it would be courting trouble.  Is
there value in allowing variation, and if so, please explain what that
value is.

[DE] The big fear is that, even with a hop count, multi-destination
frames (multicast, etc.) in some sort of temporary loop can spawn
multiple copies when they go through a distribution tree fork,
saturating your network. Known unicast frames are much less dangerous
and TRILL considers the TTL mechanism adequate to keep them under
control. It is because of this concern with multi-destination frames
that they are subject to the mandatory stringent reverse path
forwarding check (RPFC), which should, in conjunction with their hop
count, keep multi-destination frames under control.

[DE] So assume, say, that a TRILL encapsulated broadcast frame arrives at
RBridge X via port-1 with a hop count of 13 while on a distribution
tree such that RBridge X will forward two copies of the frame, one out
of port-2 and one out of port-3. Assume that the most distant RBridge
down the port-2 branch of this distribution tree is 12 hops away. But
the most distant RBridge down the port-3 branch is only 2 hops
away. The RBridge MUST decrease the hop count by at least one, which
in this case, still leaves just enough to complete the distribution on
the max-12 hop branch. But the language you are referring to permits
the RBridge to reduce the hop count by more, up to 10 in this case,
for the max-2 hop branch out of port-3, as long as it still has enough
hop count left in that copy for complete distribution. But it always
has to reduce it by at least one even if that doesn't leave a big
enough hop count for complete distribution. This is an optional extra
safety measure to control multi-destination frames, the most dangerous
kind of frame.
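
The port-2/port-3 example above can be sketched in a few lines. This is
only an illustration, not text from the draft: the function name and the
per-branch "maximum remaining hops" table are hypothetical, and the
sketch assumes the RBridge already knows the distance to the most
remote RBridge on each branch of the distribution tree.

```python
def outgoing_hop_counts(incoming_hop_count, branch_max_hops):
    """Return {port: hop count} for each forwarded copy of a
    multi-destination frame, or {} if the frame must be discarded.

    branch_max_hops maps an output port to the distance (in hops) to
    the most remote RBridge down that branch (an assumed input).
    """
    # A frame whose hop count is already 0 cannot be decremented, so
    # it is discarded rather than forwarded.
    if incoming_hop_count <= 0:
        return {}

    # Mandatory part: always decrease by at least one.
    mandatory = incoming_hop_count - 1

    # Optional part: a copy may be trimmed further, down to what its
    # branch needs, but it is never raised above the mandatory value.
    return {port: min(mandatory, dist)
            for port, dist in branch_max_hops.items()}


# The example from the text: hop count 13 arrives, port-2 branch needs
# 12 hops, port-3 branch needs only 2.
print(outgoing_hop_counts(13, {"port-2": 12, "port-3": 2}))
# → {'port-2': 12, 'port-3': 2}
```

Note that when the incoming hop count is too small for a branch, the
copy still goes out with the mandatory decrement applied, even though
it will not reach the far end of that branch.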

Section 3.7.3

   o  Nickname values MAY be configured. An RBridge that has been
      configured with one or more nickname values will have priority for
      those nickname values over all Rbridges with non-configured
      nicknames.

[BA] RFC 3927 does not permit static assignment of link scope
addresses because it was feared that this would lead to
implementations ignoring collision detection.  I realize that
configured nicknames get priority, but it still seems like a good idea
for an Rbridge to test for conflict before configuring the nickname,
so as to avoid a potential conflict, no?

[DE] A number of working group members believe that some customers
insist on being able to configure everything and would want all the
RBridges in their campus to have pre-assigned nicknames. If it isn't
provided for in the spec, it will be implemented in a variety of
proprietary ways. Under the provisions of this section, I don't see
how it can do any harm. The worst someone could do is manually
configure all their RBridges with the same nickname. Then the nickname
procedure would sort this all out and would force all but the
highest nickname priority RBridge to change nickname, regardless of
the configuration.

   o  Each RBridge is responsible for ensuring that its nickname or each
      of its nicknames is unique.  If RB1 chooses nickname x, and RB1
      discovers, through receipt of RB2's LSP, that RB2 has also chosen
      x, then the RBridge with the numerically higher priority keeps the
      nickname, or if there is a tie in priority, the RBridge with the
      numerically higher IS-IS System ID keeps the nickname, and the
      other RBridge MUST select a new nickname. This can require an
      RBridge with a configured nickname to select a replacement nickname.

[BA] Given that a configured nickname might need to select a
replacement, what is the value of supporting configuration?

[DE] So that, if you properly configure it, you know what nicknames
particular RBridges will have. Of course, if you improperly configure
them, that is, configure conflicts, then your configuration efforts
were not very useful.

[BA] Also, this would suggest that even configured nicknames need to
test for uniqueness prior to configuration, rather than relying on
increased priority due to aging (which is undefined in the spec) to
avoid forcing nodes that have been using a nickname for a long time to
change.

[DE] I believe the spec makes it clear that every RBridge in the
campus has to check that its nickname isn't duplicated elsewhere by
a higher nickname priority RBridge. The time to make that check, and
the only time it makes sense to do such a check, is when you receive an
LSP that changes what you think some other RBridge's nickname and/or
nickname priority are.
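
A minimal sketch of that check, run when an LSP changes the view of
another RBridge's nickname or priority, might look like the following.
The tie-break order (numerically higher priority, then numerically
higher IS-IS System ID) is taken from the draft text quoted above; the
class, field names, and sample values are hypothetical and chosen only
for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NicknameClaim:
    nickname: int      # nickname value being claimed
    priority: int      # nickname priority (higher wins)
    system_id: bytes   # 6-octet IS-IS System ID (higher wins ties)


def keeps_nickname(mine, other):
    """Return True if 'mine' may keep its nickname given 'other'."""
    if mine.nickname != other.nickname:
        return True  # no collision, nothing to resolve
    # Equal-length byte strings compare lexicographically, which is the
    # same as comparing the System IDs as big-endian numbers.
    return (mine.priority, mine.system_id) > (other.priority, other.system_id)


# Illustrative values only: RB1 is configured (higher priority), RB2 is not.
rb1 = NicknameClaim(0x1234, 0xC0, bytes.fromhex("00005e005301"))
rb2 = NicknameClaim(0x1234, 0x40, bytes.fromhex("00005e005302"))
print(keeps_nickname(rb1, rb2), keeps_nickname(rb2, rb1))
# → True False
```

Because both RBridges apply the same comparison to the same link state,
exactly one of them concludes that it must select a new nickname.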

[BA] Also, just because RB1 gets RB2's LSP doesn't mean that RB2
simultaneously has RB1's LSP.  So one side can assume that the other
will give up the nickname, but that might not happen for a while.  Do
you want to encourage RB1 to send its LSP right away after it detects
a collision so that the information asymmetry is quickly corrected?

[DE] This really seems to me to be something best left to
implementations and IS-IS. Generally speaking, you send LSPs when
something changes or you get a sequence number PDU from your neighbor
indicating that their link state is incomplete or out of date, in
which case you send them what they are missing or have only an older
copy of. Sure, there can be time delays, but there is no reason in
your example above to think that RB1 and RB2 are directly
connected. IS-IS reliable flooding assures that every RBridge will end
up with a complete copy of the core link state no matter how long ago
an LSP changed. Spontaneously sending a redundant copy of your LSP
that has already been sent won't speed things up. Spontaneously
sending your LSP right away when you change something in it is good,
but that's what you normally do anyway.

   o  To minimize the probability of nickname collisions, when an
      RBridge selects a new nickname, it does so by randomly hashing
      some of its parameters, e.g., interface MAC addresses, time and
      date, and other entropy sources such as those given in [RFC4086].
      There is no reason for all Rbridges to use the same algorithm for
      selecting nicknames.

[BA] Randomness isn't required to reduce collision probabilities.
It's only necessary for the distribution within the space to be
uniform.  RFC 3927 doesn't require randomness because it was felt that
this would just increase the difficulty of debugging with no net
benefit.  I'd suggest that you rethink this.

[DE] Well, you need a unique seed to start with although you could, as
RFC 3927 suggests, use a pseudo-random number generator
thereafter. For nicknames, the distribution isn't over the space of
all nicknames but only over the nicknames that appear to be available
based on the link state database held by the RBridge selecting a new
nickname.

   An RBridge MAY request multiple nicknames so that it can be the root
   of multiple trees for multipathing of multi-destination frames. These
   trees would all be shortest path trees from the RBridge but, since
   the tree number is used in tie breaking when there are multiple equal
   cost paths (see Section 4.5.1), the different trees will likely
   utilize different links.

[BA] In a deployment with a substantial number of Rbridges, the
collision probability will already be high.  If each Rbridge has
multiple nicknames, it will impact scaling in a negative way.  Have
you thought this through?  For example, in a situation where each
Rbridge has 16 nicknames, you might start seeing high collision
probabilities with as few as 16 Rbridges.  I'd suggest that this
practice be NOT RECOMMENDED.

[DE] It is expected that very few RBridges in a campus would have
multiple nicknames. This would have to be configured by the network
manager since one nickname is the default. If a network manager were
using this feature, they would pick a few RBridges that, because of
the network topology, were particularly good places from which to
calculate multiple different shortest path distribution trees. Such
trees need separate nicknames so traffic can be multipathed across
them.

[DE] 16 RBridges each with 16 nicknames isn't going to cause much of a
collision probability, at least in my opinion. That's only 256
nicknames. For example, assume a huge campus with 10,000 RBridges with
random nicknames assigned to those RBridges. This would mean that
~1540 RBridges would have an initial colliding nickname, so the ~770
lower priority of these RBridges would pick new nicknames from the
~54,230 available nicknames. I think this would be expected to
result in ~758 of them picking a nickname not picked by any of the
other RBridges and ~12 experiencing a second collision. So the ~6
lower priority of these second colliders would pick a new nickname out
of the then ~53,466 available nicknames, with an expected probability
of 99.93% that all 6 would pick non-conflicting names on their second
try. Less than one in a thousand times, you would have to go to a
third round. Is that sort of behavior OK? I suppose it depends on the
application but I think it would be fine for the initial start-up of
most data centers.
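
For readers who want to experiment with estimates of this kind, here is
a rough first-order sketch. It is not necessarily the method used for
the figures above, so its results may differ somewhat from those
rounded numbers. The usable nickname space of 64,000 is an assumption,
as are the function names; nicknames are modeled as independent uniform
random picks.

```python
def expected_colliders(num_rbridges, nickname_space):
    """Expected number of RBridges whose randomly chosen nickname is
    shared with at least one other RBridge."""
    # Probability that none of the other RBridges picked my nickname.
    p_unique = (1.0 - 1.0 / nickname_space) ** (num_rbridges - 1)
    return num_rbridges * (1.0 - p_unique)


def prob_all_distinct(num_pickers, available):
    """Probability that num_pickers RBridges, each choosing uniformly
    from 'available' free nicknames, all choose different ones."""
    p = 1.0
    for i in range(num_pickers):
        p *= (available - i) / available
    return p


ASSUMED_SPACE = 64_000  # assumption: approximate count of usable nicknames

# The two scenarios discussed above: 16 RBridges with 16 nicknames each
# (256 nicknames in total), and a 10,000 RBridge campus.
print(expected_colliders(256, ASSUMED_SPACE))
print(expected_colliders(10_000, ASSUMED_SPACE))
# Chance that a handful of re-pickers do not collide with each other.
print(prob_all_distinct(6, 53_466))
```

Changing ASSUMED_SPACE or the campus size shows how quickly, or how
slowly, the collision estimate grows.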

Section 4.1.1

   Frames with the same source address, destination address, VLAN, and
   priority that are received on the same port as each other and are
   transmitted on the same port MUST be transmitted in the order
   received unless the RBridge classifies the frames into more fine
   grained flows, in which case this ordering requirement applies to
   each such flow. (Such frames might not be sent out the same port if
   multipath is implemented. See Appendix C.)

[BA] Do you mean "granular"?

[DE] I don't see much difference between "fine grained" and "finely
granular" but I think the first, which is the current wording, reads
better.

Section 4.2.4.3

   o  Loop avoidance:

      -  Inhibiting itself for a configurable time from zero to 30
         seconds, which defaults to 30 second, after it sees a root
         bridge change on the link (see Section 4.9.3.2).

[BA] The 30 second default might make sense where RBridges are
deployed alongside STP, but is this default needed where the legacy
bridges run RSTP? Overall, I'm curious as to how the failover times in
TRILL will compare to RSTP.  The spec doesn't say much about this.

[DE] 30 seconds was chosen as the default for safety. It's
configurable so, for example, if you know that all your bridges are
RSTP and in the same room with low transmission delays, you can
configure it down. Although an RBridge can see if it is receiving RSTP
BPDUs from immediately adjacent bridges, it would be very hard for an
RBridge to assure itself that RSTP is being used on all the links
interior to an attached bridged LAN. Failover times are more complex
than might seem at first, especially if you include questions related
to the updating of learned addresses, and depend on the engineering
and implementation of the specific devices, the way convergence is
measured, and the specific circumstances, as well as on protocols.

4.2.5.2 TRILL ESADI Information

   The information in ESADI is the list of local end station MAC
   addresses known to the originating RBridge and, for each such
   address, a one octet unsigned "confidence" rating in the range 0-254
   (see Section 4.8). In order to make it practical to optionally
   provide for VLAN ID translation, as specified in a separate document,
   TRILL ESADI frames MUST NOT contain the VLAN ID in the body of the
   frame after the Inner.VLAN tag.

[BA] Does VLAN ID translation really require support for ESADI?

[DE] What the draft tries to say here is that, to support VLAN ID
translation, TRILL ESADI frames, if used, are subject to a
restriction. It says nothing about a requirement to support
ESADI. This comment was included so that, when designing the encoding
of or extending ESADI, nothing would be done that would break VLAN
translation. The wording should be clarified.

Section 4.3.1

   Sz is determined by having each RBridge (optionally) advertise, in
   its LSP, its assumption of the value of the campus-wide Sz. This LSP
   element is known in IS-IS as the originatingLSPBufferSize, TLV #14.
   The default and minimum value for Sz, and the implicitly advertised
   value of Sz if the TLV is absent, is 1470 bytes.

[BA] Given the potential headaches that can be caused by MTU issues, I
wonder whether the spec couldn't be tightened.  If implementations
don't advertise Sz, then it seems like we could end up with a 1470
octet MTU limitation on endstations, even where this might not be
necessary (e.g. MTU discovery could enable a larger MTU).  This seems
like it could cause headaches where it wouldn't be necessary.

[DE] This needs to be clarified. The whole MTU feature was motivated
by problems with TRILL IS-IS frames on inter-RBridge links. The only
thing Sz limits is the size of PDUs generated for TRILL IS-IS (except
for MTU-probe/ack PDUs). This is needed to assure proper operation. It
doesn't really have anything to do with MTU on links to end stations.
There is no way provided in TRILL or IS-IS to communicate an MTU to an
end station. See more below.

[BA] Would it make sense to require Sz to be advertised where MTU
discovery finds a larger MTU size than 1470?

[DE] What do you mean "advertised"? Each RBridge calculates Sz, which
is the maximum size for all TRILL IS-IS PDUs including LSP (link state
PDUs) but, of course, excluding MTU-probes/acks that can be bigger
than Sz for testing. Each RBridge calculates this by calculating the
minimum originatingLSPBufferSize advertised in the link state by any
RBridge but not less than 1470. There is also a facility for
advertising the MTU of links as determined by the MTU probe and
ack. This is advertised with other link information in the link state
database.
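
As a minimal sketch of the Sz calculation just described: take the
minimum originatingLSPBufferSize seen in the link state, treat an
absent TLV as the default of 1470, and never go below 1470. The
function name and the list-of-values input are hypothetical; a real
implementation would read these values from its link state database.

```python
DEFAULT_SZ = 1470  # default and minimum Sz, in bytes, per the draft text


def campus_sz(advertised_sizes):
    """Compute the campus-wide Sz.

    advertised_sizes holds one entry per RBridge: the advertised
    originatingLSPBufferSize, or None if the TLV is absent.
    """
    # An absent TLV implicitly advertises the default value.
    effective = [DEFAULT_SZ if s is None else s for s in advertised_sizes]
    if not effective:
        return DEFAULT_SZ
    # Minimum over all RBridges, but not less than 1470.
    return max(DEFAULT_SZ, min(effective))


print(campus_sz([1500, 9000]))        # → 1500
print(campus_sz([1500, None, 9000]))  # → 1470
```

The second call shows the effect raised in the comment above: a single
RBridge that does not advertise the TLV holds Sz at 1470 for the whole
campus, though this only limits TRILL IS-IS PDUs, not end station
traffic.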

Section 4.3.2

   There are two new TRILL IS-IS message types for use between pairs of
   RBridge neighbors to test the bidirectional packet size capacity of
   their connection. These messages are:

      -- MTU-probe
      -- MTU-ack

   Both the MTU-probe and the MTU-ack are padded to the size being
   tested.

[BA] This section doesn't say whether support for MTU-probe is
mandatory.  The way it is written, I'd be concerned that implementers
would not support it, and that lack of MTU discovery capability would
cause problems in deployments.

[BA] This section might also benefit from a more detailed
specification of the MTU discovery algorithm (such as incorporating
elements from the Packetization Layer Path MTU RFC).

[DE] This isn't a path MTU determination. It is a link MTU
determination only applied to Inter-RBridge links. With 802.3 links,
which is what this draft is aimed at, as long as Sz is at the default
1470 bytes, TRILL IS-IS PDUs necessary for proper operation should get
through. But the draft says that RBridges SHOULD check.

Section 4.6

   Source address information ( { VLAN, Outer.MacSA, port } ) is learned
   from any frame with a unicast sources address (see Section 4.8).

[BA] Good to see this clearly stated here.  As noted earlier, there
are places in the document where this is not as clear.

[DE] Yes, hopefully all such places will be fixed.

Section 4.6.1.1

   4. If a unicast destination address is unknown, RB1 handles the frame
      as described in Section 4.6.1.2 for a broadcast frame except that
      the Inner.MacDA is the original native frame's unicast destination
      address.

[BA] Do you mean "if a VLAN/unicast destination address pair is
unknown"?

[DE] Yes, hopefully all such places will be fixed.

Section 4.6.2.4

   If RBn is a transit RBridge the hop count is decremented by one and
   the frame forwarded to the next hop RBridge towards the egress
   RBridge. The Inner.VLAN and ingress nickname are not examined by a
   transit RBridge when it forwards a known unicast TRILL data frame.

[BA] Elsewhere it says that the hop count MAY be decremented by more
than one.

[DE] The only case where an RBridge is permitted to decrease the hop
count by more than one is when it forwards a multi-destination frame
onto a branch of the frame's distribution tree. As discussed above, in
that case, it can reduce the hop count to the distance to the most
remote RBridge in the distribution tree branch. Section 4.6.2.4, which
you are commenting on, concerns the handling of known unicast
frames. For those frames, RBridges are not permitted to decrease the
hop count by more than 1.

Section 4.8.1

   3. By Layer 2 registration protocols learning the { source MAC, VLAN,
      port } of end stations registering at a local port.

[BA] Are you referring to IEEE 802.11 association here?  This won't
tell you the VLAN of the end-station (this might not be determined
until after authentication).

[DE] This provision is not meant to be limited to any particular
existing Layer 2 registration protocol, whether 802.11 or 802.16 or
whatever, and is intended to include Layer 2 registration protocols
specified in the future, where appropriate. "IEEE 802.11 association
and authentication" is used as an example in Section 2.1 and Section
4.2.4.3. "Authentication" is specifically included in the current
draft wording where 802.11 is used as an example.

Section 4.8

   Although outside the scope of this specification, there are some
   Layer 2 features in which a set of VLANs has shared learning, where
   one of the VLANs is the "primary" and the other VLANs in the group
   are "secondaries".

[BA] One concern about this section is that it might be construed to
permit trees shared between VLANs.  You might make it clear that this
is not the intent.

[DE] This section relates to VLAN/MAC address learning where the
separate identities of some number of VLANs are merged, that
is, what 802.1 calls SVL (Shared VLAN Learning), and that should
probably be clarified. However, I'm not sure what "trees" you are
talking about. If you mean TRILL multi-destination frame distribution
trees, they are always shared across all VLANs.

Section 4.9.1

[BA] Given the earlier discussion of "zero config" you might put in a
sentence or two indicating that the configuration bits are only needed
for special circumstances.  You might also state what the default
setting of the bits is.

[DE] I'm not sure that's true. I can easily see a network manager
adopting a policy (and a configuration where this policy is
reasonable), that all ports in their RBridge campus will be configured
as either trunk or access. Would that be "special circumstances"? But
giving default values is probably a good idea.

Section 4.9.2

   Low-level control frames are handled in the lower level port/link
   control logic in the same way as in an [802.1Q-2005] bridge.  This
   can optionally include a variety of 802.1 or link specific protocols
   such as link layer discovery, link aggregation (Clause 43 of
   [802.3]), MAC security [802.1AE], or port based access control
   [802.1X]. While handled at a low level, these frames may affect
   higher level processing. For example, a Layer 2 registration
   protocol may affect the confidence in learned addresses. The upper
   interface

[BA] IEEE 802.1X-REV is no longer purely "port based", since it
supports "pseudo" and "virtual" ports as well.

[DE] It seems to me that the right thing is to add a definition of
"port" to Section 1.3 that makes it clear that the unadorned word
"port" includes pseudo and virtual ports. In addition, some
references, where appropriate, to something as being implemented "in
ports" could be changed to saying implemented "below the EISS layer"
or the like.

Section 6

   IEEE 802.1 port admission and link security mechanisms, such as
   [802.1X] and [802.1AE], can also be used. These are best thought of
   as being implemented within a port and are outside the scope of
   TRILL (just as they are generally out of scope for bridging
   standards [802.1D] and 802.1Q); however, TRILL can make use of
   secure registration through the confidence level communicated in
   optional TRILL ESADI (see Section 4.8).

[BA] Neither IEEE 802.1AE nor IEEE 802.1X-REV are based on physical
ports.  Instead, I'd refer to the diagrams which make it clear that
these mechanisms operate below TRILL.

[DE] OK. See also response immediately above.

Section A.3.4

   The spanning tree solution does quite well in this particular case.
   But it depends on both RB1 and RB2 having implemented the optional
   feature of being able to configure a port to emit BPDUs as
   described in Section A.3.3 above. It also makes the bridged LAN
   whose partition

[BA] This somewhat begs the question about whether being able to emit
BPDUs should be optional, recommended or mandatory.

[DE] The text in Appendix A should be changed to make it clear that it
is only talking about spanning tree BPDUs. As per Section 4.9.3.3 of
the draft, there are conditions under which RBridges SHOULD send
topology change BPDUs and RBridges MAY send spanning tree BPDUs. The
implementation requirement key words appear in the body of the draft.

Appendix C

   When multipathing is used, frames that follow different paths will
   be subject to different delays and may be re-ordered.  While some
   traffic may be order/delay insensitive, typically most traffic
   consists of flows of frames where re-ordering within a flow is
   damaging. How to determine flows or what granularity flows should
   have is beyond the scope of this document but, as an example, under
   many circumstances it would be safe to consider all the frames
   flowing between a particular pair of end station ports to be a
   flow.

[BA] I think you have to say more here, given that Ethernet invariants
include ordering requirements.  Certainly considering all frames
between a pair of end stations to be a flow would be conservative, but
this isn't the Ethernet requirement, right?

[DE] It seems like there are two possible responses to your comment.

[DE] One is to add more detail and complexity. The next step in that
direction would probably be to add priority and VLAN so it would say
"... all the frames of the same priority and in the same VLAN flowing
between ...". But I think that would invite further complaints leading
to further details and yet more complexity in what is supposed to just
be a dead simple, ultra-conservative example.

[DE] The alternative, which I prefer, is to simplify. Just say "How to
determine flows or what granularity they should have is beyond the
scope of this document." dropping the example.

Thanks,
Donald

=============================
 Donald E. Eastlake 3rd   +1-508-634-2066 (home)
 155 Beaver Street
 Milford, MA 01757 USA
 d3e3e3@gmail.com