Re: [trill] TRILL Resilient Distribution Trees, who have read it?

Donald Eastlake <d3e3e3@gmail.com> Wed, 19 December 2012 16:42 UTC

To: Thomas Narten <narten@us.ibm.com>
Cc: Mingui Zhang <zhangmingui@huawei.com>, "trill@ietf.org" <trill@ietf.org>

Hi Thomas,

On Tue, Dec 18, 2012 at 10:35 AM, Thomas Narten <narten@us.ibm.com> wrote:
> Hi.
>
> I understand at a high level what the rationale is. What I'm really
> trying to get at is: does the cost/complexity justify the benefit? For
> this, some quantifiable metrics would help - not just generalities
> like "convergence will be faster", etc.
>
> Mingui Zhang <zhangmingui@huawei.com> writes:
>
>> Hi Thomas,
>
>> TRILL provides multicast service using ISIS link state routing. When
>>  there is a failure, RBridges are notified through the propagation
>>  of LSPs, which will trigger a campus-wide convergence on
>>  distribution trees. The convergence includes the following major
>>  processes: 1) the propagation of LSPs around the whole campus; 2)
>>  the recalculation of trees on affected RBridges; 3) the update of
>>  forwarding information on those RBridges. These processes make the
>>  convergence time-consuming.
>
> I take issue with this. The whole point of link-state flooding
> approaches is that they are fast - much faster than alternatives.

They are also more stable than distance-vector methods.

> And as a data point, back in Paris, Paul Unbehagen gave a presentation
> in the IS-IS WG on SPB deployment where (according to my notes):
>
>     They are seeing 34-100ms convergence times and think this can be
>     further reduced by an order of magnitude (recommendation 7)
>
> Won't TRILL have similar convergence times? Are we talking about using
> failover to shave off a few tens of milliseconds? Or is there some
> sort of expectation that this will shave seconds or more off the
> recovery time?

Absolute values of convergence time seem meaningless to me without
knowing how much horsepower the routing computation CPU has, how many
nodes there are, and how much computation you are doing (for example,
how many distribution trees you are computing in a TRILL campus).
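
As a rough illustration of why those variables matter, here is an
assumed scaling model in Python (one shortest-path computation per
distribution tree, Dijkstra-style cost); the constants are
placeholders, not measurements:

    import math

    def spf_ops(nodes, links):
        # Dijkstra with a binary heap: roughly links * log2(nodes).
        return links * math.log2(nodes)

    def recompute_ms(nodes, links, num_trees, ops_per_ms=1e4):
        # ops_per_ms stands in for "routing CPU horsepower".
        return num_trees * spf_ops(nodes, links) / ops_per_ms

    # Same campus, one tree vs. sixteen trees on the same control CPU:
    print(round(recompute_ms(500, 2000, 1), 1))   # ~1.8 ms
    print(round(recompute_ms(500, 2000, 16), 1))  # ~28.7 ms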

> That is, do we have any evidence (or targets) for TRILL convergence
> times (after a topology change) that are "too long"? Just how fast
> does reconvergence need to be to count as fast enough, and what is
> considered problematic?

Any loss of traffic is bad. A typical acceptable limit for one kind of
inherently redundant traffic, voice, is 50 milliseconds.

A common rule of thumb is that it is good to test every 10
milliseconds so that, after a few successive test packets fail, you
can, even allowing for various overhead, bound the switchover time to
less than 50 milliseconds.
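
To make that rule of thumb concrete, here is a minimal sketch in
Python. The interval, miss threshold, and overhead numbers are
assumptions for illustration, not values from any TRILL or BFD
specification:

    # Hypothetical probe-based failure detection timing. All numbers
    # are illustrative assumptions.
    PROBE_INTERVAL_MS = 10  # send a test packet every 10 ms
    MISS_THRESHOLD = 3      # declare failure after 3 consecutive misses
    OVERHEAD_MS = 10        # allowance for processing and switchover

    # Worst case: the failure happens just after a probe succeeded, so
    # detection completes within (MISS_THRESHOLD + 1) intervals.
    worst_case_detection_ms = (MISS_THRESHOLD + 1) * PROBE_INTERVAL_MS
    total_switchover_ms = worst_case_detection_ms + OVERHEAD_MS

    assert total_switchover_ms <= 50  # inside the 50 ms voice budget
    print(worst_case_detection_ms, total_switchover_ms)  # 40 50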

>> As we know, most multicast traffic is
>>  generated by applications that are sensitive to interruption.
>>  Therefore, the convergence will lead to disruption of the
>>  multicast service.
>
> Which applications are those? Are there particular deployment
> scenarios that are being targeted, or is this just a general
> argument?

What I hear about most are real time streaming media, like voice and
video, and distribution of financial information.

>> Due to the propagation delay of LSPs, the distribution trees are
>>  calculated and installed on RBridges at different times, which
>>  brings _inconsistency_ of forwarding states among RBridges. Before
>>  the convergence finishes, multicast frames may be trapped by
>>  forwarding loops and finally dropped. This increases the volume of
>>  broadcast/multicast traffic.

Unicast frames can loop briefly during a transient unless you use
ordered FIBs. I am not aware of any technique similar to ordered FIBs
for multicast distribution; however, because of the reverse path
forwarding check, I think it is much more likely that multicast
frames will be dropped than that they will loop.
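
To illustrate why the RPF check tends to turn would-be loops into
drops, here is a hypothetical sketch of the per-tree check; the table
layout, nicknames, and port names are mine, not from the TRILL
specification:

    # Hypothetical RPF check for TRILL distribution trees. For each
    # (tree, ingress RBridge nickname) pair, the local SPF result
    # yields the one port on which such frames may legally arrive.
    rpf_table = {
        ("tree1", "RB7"): "port3",
        ("tree1", "RB9"): "port1",
    }

    def accept_multidest_frame(tree, ingress_nickname, arrival_port):
        # Accept only if the frame arrived on the expected port;
        # otherwise drop it rather than risk forwarding it in a loop.
        return rpf_table.get((tree, ingress_nickname)) == arrival_port

    # During a transient, a neighbor still using the old tree may send
    # us a frame on an unexpected port; the check silently drops it.
    assert not accept_multidest_frame("tree1", "RB7", "port2")
    assert accept_multidest_frame("tree1", "RB7", "port3")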

> And just how long does this period of "inconsistency" last?

For failures, conditions that lose traffic really can extend from the
initial failure until the tree forwarding and RPFC information at all
RBridges is consistent with the new topology. How long that
inconsistency lasts depends on the gap between the fastest and
slowest RBridges to adjust. So you could have an RBridge with a
particularly powerful processor that gets the LSPs indicating the
change in topology early and that has only a few ports, connected to
an RBridge with a particularly feeble processor that gets the LSPs
indicating the change in topology late and has a zillion ports
connected...
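
As a back-of-the-envelope illustration, with entirely made-up
per-RBridge timings:

    # The inconsistency window runs from the first RBridge finishing
    # its update to the last one finishing. All numbers are invented.
    timelines_ms = {
        "fast_RB": {"lsp_arrival": 2, "spf": 5, "fib_update": 3},
        "slow_RB": {"lsp_arrival": 40, "spf": 120, "fib_update": 200},
    }
    finish = {rb: sum(t.values()) for rb, t in timelines_ms.items()}
    window_ms = max(finish.values()) - min(finish.values())
    print(finish, window_ms)  # {'fast_RB': 10, 'slow_RB': 360} 350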

> And just
> how quickly will a node know that a path that it is using has failed
> and that it needs to fall back to using the precomputed backup path?

Sometimes you can tell quickly that a directly connected link has
failed due to physical layer or link layer test protocol indications.
For a "path", you may have to send test messages at that level, and
how quickly you find out depends on how frequently you test.

> Won't it also take time for news of a failure to propagate through the
> network? In other words, won't the network already be reconfiguring
> itself even before a node realizes that a path has failed and it
> should switch to a backup?

Probably. But if you are a source multiple hops away from the failure
and have detected the failure through your path test messages, you
might prefer to "instantly" switch to a precomputed, maximally
disjoint alternative tree rather than wait for the slowest
reconfiguring RBridge along whatever new path the network is trying
to converge on before your traffic starts getting through again.
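
A minimal sketch of the local failover decision being described,
assuming a source that tracks the health of its primary tree from
path test messages (the class and tree names are hypothetical):

    # Hypothetical local failover to a precomputed backup tree.
    class MultidestSource:
        def __init__(self, primary_tree, backup_tree):
            self.primary_tree = primary_tree  # used in steady state
            self.backup_tree = backup_tree    # maximally disjoint
            self.primary_healthy = True

        def on_probe_result(self, ok):
            # Path test messages report primary-tree health long
            # before campus-wide reconvergence completes.
            self.primary_healthy = ok

        def tree_for_next_frame(self):
            # Switch "instantly"; no waiting for the slowest RBridge
            # to install the recomputed tree.
            if self.primary_healthy:
                return self.primary_tree
            return self.backup_tree

    src = MultidestSource("tree1", "tree2")
    src.on_probe_result(False)
    assert src.tree_for_next_frame() == "tree2"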

Thanks,
Donald
=============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 d3e3e3@gmail.com

> So again, I'm really trying to understand just how much of an
> improvement the proposed approach is expected to produce.
>
> To be clear, I'm not necessarily opposed to this work. But I would
> like a better understanding of exactly how much benefit we can really
> expect, so I can understand how to weigh the tradeoffs between
> increased complexity vs. actual benefit.
>
> Thomas