Re: [trill] TRILL Resilient Distribution Trees, who have read it?

Thomas Narten <narten@us.ibm.com> Tue, 18 December 2012 15:36 UTC

To: Mingui Zhang <zhangmingui@huawei.com>
Date: Tue, 18 Dec 2012 10:35:51 -0500
From: Thomas Narten <narten@us.ibm.com>
Cc: "trill@ietf.org" <trill@ietf.org>
Subject: Re: [trill] TRILL Resilient Distribution Trees, who have read it?

Hi.

I understand at a high level what the rationale is. What I'm really
trying to get at is: does the cost/complexity justify the benefit? For
this, some quantifiable metrics would help - not just generalities
like "convergence will be faster", etc.

Mingui Zhang <zhangmingui@huawei.com> writes:

> Hi Thomas,

> TRILL provides multicast service using ISIS link state routing. When
>  there is a failure, RBridges are notified through the propagation
>  of LSPs, which will trigger a campus-wide convergence on
>  distribution trees. The convergence includes the following major
>  processes: 1) the propagation of LSPs around the whole campus; 2)
>  the recalculation of trees on affected RBridges; 3) the update of
>  forwarding information on those RBridges. These processes make the
>  convergence time-consuming.

I take issue with this. The whole point of link-state flooding
approaches is that they are fast - much faster than alternatives.

And as a data point, back in Paris, Paul Unbehagen gave a presentation
in the IS-IS WG on SPB deployment where (according to my notes):

    They are seeing 34-100ms convergence times and think this can be
    further reduced by an order of magnitude (recommendation 7)

Won't TRILL have similar convergence times? Are we talking about using
failover to shave off a few tens of milliseconds? Or is there some
sort of expectation that this will shave seconds or more off the
recovery time?

That is, do we have any evidence (or targets) for TRILL convergence
times (after a topology change) that are "too long"? Just how fast does
reconvergence have to be to count as fast enough, and what is
considered problematic?

> As we know, most multicast traffic is
>  generated by applications which are sensitive to interrupt
>  latency. Therefore, the convergence will lead to the disruption of
>  multicast service.

Which applications are those? Are there particular deployment
scenarios being targeted, or is this just a general argument?

> Due to the propagation delay of LSPs, the distribution trees are
>  calculated and installed on RBridges at different times, which
>  brings _inconsistency_ of forwarding states among RBridges. Before
>  the convergence finishes, multicast frames may be trapped by
>  forwarding loops and finally dropped. This increases the volume of
>  broadcast/multicast traffic.

And just how long does this period of "inconsistency" last? And just
how quickly will a node know that a path it is using has failed
and that it needs to fall back to the precomputed backup path?
Won't it also take time for news of a failure to propagate through the
network? In other words, won't the network already be reconfiguring
itself even before a node realizes that a path has failed and it
should switch to a backup?

So again, I'm really trying to understand just how much of an
improvement the proposed approach is expected to produce.

To be clear, I'm not necessarily opposed to this work. But I would
like a better understanding of exactly how much benefit we can really
expect, so that I can weigh the tradeoff between the increased
complexity and the actual benefit.

Thomas