Re: [Lsr] Moving Forward [Re: Flooding Reduction Draft Redux]

"Les Ginsberg (ginsberg)" <ginsberg@cisco.com> Tue, 05 March 2019 05:08 UTC

From: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>
To: Huaimo Chen <huaimo.chen@huawei.com>, Tony Li <tony1athome@gmail.com>
CC: "lsr@ietf.org" <lsr@ietf.org>, Christian Hopps <chopps@chopps.org>, "Acee Lindem (acee)" <acee@cisco.com>
Date: Tue, 05 Mar 2019 05:07:54 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/VCkOXFdHkM1KrSU9xw-461d7S40>

Huaimo –

Some responses inline.

From: Lsr <lsr-bounces@ietf.org> On Behalf Of Huaimo Chen
Sent: Monday, March 04, 2019 8:16 PM
To: Tony Li <tony1athome@gmail.com>
Cc: lsr@ietf.org; Christian Hopps <chopps@chopps.org>; Acee Lindem (acee) <acee@cisco.com>
Subject: Re: [Lsr] Moving Forward [Re: Flooding Reduction Draft Redux]

Hi Tony,

>From: Tony Li [mailto:tony1athome@gmail.com]
>Sent: Thursday, February 21, 2019 12:32 AM
>To: Huaimo Chen <huaimo.chen@huawei.com>
>Cc: Peter Psenak <ppsenak@cisco.com>; Acee Lindem (acee) <acee@cisco.com>; Christian Hopps <chopps@chopps.org>; lsr@ietf.org
>Subject: Re: [Lsr] Moving Forward [Re: Flooding Reduction Draft Redux]
>
>
>Hi Huaimo,
>
>>The way in which the flooding topology converges in the centralized mode/solution is different
>>from that in the distributed mode/solution. In the former, after receiving the link states for the failures,
>>the leader computes a new flooding topology and floods it to every other node, which receives
>>and installs the new flooding topology. The working load on every non leader node is light. It has more
>>processing power for a procedure/method for fault tolerance to failures.
>>However, in the latter, every node computes and installs a new flooding topology after receiving
>>the link states for the failures. It has less processing power for a procedure/method for fault tolerance.
>>It is better to let each of the two modes use its own procedure/method for fault tolerance to failures,
>>which is more appropriate to it.
>
>It’s true that a distributed solution will call more on an average node than a centralized
>solution will. However, that is not the steady state for either. In the
>steady state, the flooding topology has been computed and has been put in place already.
>Thus, the impact of the topology computation at the time of the
>topology change is nil.
>
>In addition, the amount of work to temporarily amend the flooding topology should also
>be minimal, and by that, I mean O(log n).  The decision should only
>be whether or not to temporarily add a link to flooding, and the only information that a node
>needs to do that is to determine if the node is already on the
>flooding topology. That should be a lookup in a tree that represents the nodes on the topology,
>and that lookup should be O(log n). In other words, it’s fast
>and efficient and not a significant drain on resources.
>
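
To illustrate the O(log n) lookup Tony describes, here is a minimal sketch. It substitutes a sorted list with binary search for the tree he mentions; both give logarithmic lookups. The class name, the function name, the router IDs, and the rule "flood temporarily only if the neighbor is not yet on the flooding topology" are assumptions made for illustration and are not taken from any draft.

```python
from bisect import bisect_left


class FloodingTopologyIndex:
    """Sorted index of the node IDs currently on the flooding topology."""

    def __init__(self, node_ids):
        # Sorting once up front lets every later lookup use binary search.
        self._ids = sorted(set(node_ids))

    def __contains__(self, node_id):
        # Binary search: O(log n) comparisons for n nodes.
        i = bisect_left(self._ids, node_id)
        return i < len(self._ids) and self._ids[i] == node_id


def needs_temporary_flooding(index, neighbor_id):
    # Assumed rule (hypothetical): flood temporarily on the link only if
    # the neighbor is not already on the flooding topology.
    return neighbor_id not in index


index = FloodingTopologyIndex(["r1", "r2", "r4"])
print(needs_temporary_flooding(index, "r2"))  # False
print(needs_temporary_flooding(index, "r3"))  # True
```

A balanced tree would be the better fit if nodes join and leave the flooding topology often, since inserting into a sorted list is O(n).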

When multiple failures happen, the current flooding topology changes, the fault-tolerance procedure is triggered, and a new flooding topology has to be computed. We need a converged flooding topology as soon as possible.
In the distributed solution/mode, using a fault-tolerance procedure that is not appropriate to it means the flooding topology takes longer to converge.
For example, after multiple failures occur, one procedure (as a rough idea) for fault tolerance includes: 1) determine whether the current flooding topology is split, 2) compute backup paths to reconnect the split flooding topology, 3) enable/request temporary flooding on the backup paths through extensions to the Hello protocol. We can see that this procedure takes longer than the algorithm takes to compute a new flooding topology. It will therefore delay convergence of the flooding topology, so it is not appropriate for the distributed solution/mode.
So it is better for the distributed solution/mode to use a fault-tolerance procedure that is more appropriate to it.
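
Steps 1 and 2 of the procedure above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure from any draft: it assumes the node already knows which links failed, it picks single surviving links rather than multi-hop backup paths, and all function names and the example topology are hypothetical. Step 3 (requesting temporary flooding through Hello extensions) is not modeled.

```python
from collections import deque


def label_components(nodes, edges):
    """Map each node to a component number using breadth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    label = {}
    next_label = 0
    for start in sorted(nodes):
        if start in label:
            continue
        label[start] = next_label
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in label:
                    label[v] = next_label
                    queue.append(v)
        next_label += 1
    return label, next_label


def temporary_flooding_links(nodes, flooding_edges, all_edges, failed_edges):
    """Return surviving links that reconnect a split flooding topology."""
    failed = {frozenset(e) for e in failed_edges}
    live_flooding = [e for e in flooding_edges if frozenset(e) not in failed]
    live_links = [e for e in all_edges if frozenset(e) not in failed]

    # Step 1: is the flooding topology split into more than one piece?
    label, count = label_components(nodes, live_flooding)
    if count <= 1:
        return []

    # Step 2: greedily pick surviving links that join separate pieces,
    # tracking merged pieces with a small union-find structure.
    parent = list(range(count))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for a, b in live_links:
        root_a, root_b = find(label[a]), find(label[b])
        if root_a != root_b:
            parent[root_a] = root_b
            chosen.append((a, b))
    return chosen


nodes = ["A", "B", "C", "D"]
flooding = [("A", "B"), ("B", "C"), ("C", "D")]
links = flooding + [("A", "D"), ("A", "C")]
print(temporary_flooding_links(nodes, flooding, links, [("B", "C")]))
# [('A', 'D')]
```

The greedy pass adds at most one link per pair of pieces it merges, which corresponds to the "minimal set of links" end of the trade-off; it makes no claim about how this cost compares with recomputing the whole flooding topology.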

[Les:] Given that you do not define what you think we should do, I cannot comment on whatever alternative you might have in mind.

I can say that your discussion does not acknowledge that BEFORE I can compute a new flooding topology I have to make sure I know what the updated full network topology is. This is what is compromised when the old flooding topology becomes partitioned. So the first priority has to be acquiring the updated topology.

It would be useful if you replied to the thread that Tony started earlier today where he asks for input on how best to use temporary additions to the flooding topology.

One extreme (my words – not Tony’s) would be to enable flooding on all links. This clearly risks introducing a destabilizing flooding storm.

The other extreme would be to enable temporary flooding on a “minimal set of links”. This clearly risks delaying convergence.

If this topic interests you, please reply to Tony’s new thread (“Open issues with Dynamic Flooding”).

>>In the centralized solution/mode, scheduling an algorithm to compute flooding topology happens
>>only on the leader, and then on the backup leader after the leader fails. The parameters for
>>scheduling on the leader may be different from those for scheduling on the backup leader.
>>However, in the distributed solution/mode, scheduling an algorithm to compute flooding topology
>>occurs on every node. The parameters for scheduling on all the nodes need to be the same.
>
>
>Actually, that’s not true.  An implementation is free to do its own internal scheduling
>however it chooses, regardless of whether it implements a
>distributed or centralized implementation.
>
>
>>The procedure for achieving this is specific to the distributed mode/solution.
>
>More accurately, it is specific to a given implementation.
>
>
>>If every particular algorithm for computing flooding topology in the distributed solution/mode
>>describes a procedure for scheduling in details itself, there will be duplicated descriptions of
>>the same procedure in multiple algorithms, one of which is selected to compute flooding
>>topology on every node. It is better for the same scheduling procedure for multiple algorithms
>>to be described in one document.
>
>
>Actually, since the IETF should not be specifying the details of scheduling as it is an
>implementation detail, as they do not affect the behavior of the protocol, it should not be
>discussed in any documents.

In multi-vendor networks, using different implementations will create more micro routing loops during convergence than using the same implementation, because of discrepancies in the scheduling parameters/timers. More micro routing loops lead to more traffic loss. Service providers already know to use similar timers (values and behavior), but sometimes that is not possible due to implementation limitations.
This brings us to the question of whether we need a common scheduling procedure for a flooding topology computation algorithm, to be implemented by multiple vendors. Without one, service providers will get different scheduling implementations/procedures from different vendors, which will create more micro routing loops and therefore more traffic loss. With one, service providers will get the same scheduling procedure from every vendor, which will create fewer micro routing loops and therefore less traffic loss.
We can see that there is a need for a common scheduling procedure.

[Les:] If your concern is that we do not want one node to apply a delay of 50 ms and another node to apply a delay of 10 seconds, I think we can easily agree on that. But we have many years of experience in configuring consistent SPF delay timers, and I think that is applicable here as well. I don’t think this is a point of concern or controversy.
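
As a small illustration of the kind of consistency being discussed, the sketch below flags nodes whose configured computation delay is far from the typical value in the network. The function name, the `max_ratio` threshold of 2.0, and the router names are assumptions chosen for the example; they do not come from any specification or vendor implementation. Delays are assumed to be positive and expressed in milliseconds.

```python
from statistics import median


def outlier_delays(delays_ms, max_ratio=2.0):
    """Return nodes whose delay differs from the median by more than max_ratio."""
    typical = median(delays_ms.values())
    flagged = []
    for node, delay in sorted(delays_ms.items()):
        # Compare as a ratio so 10 ms vs 50 ms and 50 ms vs 250 ms rank alike.
        ratio = max(delay, typical) / min(delay, typical)
        if ratio > max_ratio:
            flagged.append(node)
    return flagged


# One node at 10 seconds among nodes at 50 ms, as in the example above.
print(outlier_delays({"r1": 50, "r2": 50, "r3": 10_000}))  # ['r3']
```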

   Les

Best Regards,
Huaimo

>Regards,
>Tony
