Re: [Lsr] LSR Flooding Reduction Drafts - Moving Forward

Huaimo Chen <huaimo.chen@huawei.com> Fri, 24 August 2018 17:33 UTC

From: Huaimo Chen <huaimo.chen@huawei.com>
To: "tony.li@tony.li" <tony.li@tony.li>, Tony Przygienda <tonysietf@gmail.com>
CC: "lsr@ietf.org" <lsr@ietf.org>, Jeff Tantsura <jefftant.ietf@gmail.com>, Peter Psenak <ppsenak@cisco.com>, "Acee Lindem (acee)" <acee=40cisco.com@dmarc.ietf.org>
Date: Fri, 24 Aug 2018 17:33:23 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/5etLrSKfgTieNmCuF-rGn3lcdGE>

I think distributed is more practical too.
For computing routes, we have been using distributed SPF on every node for many years.
In fact, we may not need to run the exact same algorithm on every node. As long as the algorithms running on different nodes generate the same result, that works.
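The agreement property above can be sketched in a few lines. This is purely illustrative (the selection rule and names are hypothetical, not from any draft): as long as each node runs a deterministic procedure over the same link-state database, with sorted inputs and a fixed tie-break, all copies compute the identical reduced link set with no coordination.

```python
def reduced_links(lsdb):
    """Hypothetical rule: each node keeps the link to its lowest-ID neighbor.

    The point is determinism, not this particular rule: sorted iteration
    and lexicographic tie-breaking make the output a pure function of the
    LSDB, so every router holding the same LSDB derives the same answer."""
    links = set()
    for node in sorted(lsdb):
        neighbors = sorted(lsdb[node])
        if neighbors:
            # Normalize the edge so (A, B) and (B, A) are the same link.
            links.add(tuple(sorted((node, neighbors[0]))))
    return sorted(links)

# Two routers with independent but identical copies of the LSDB agree.
lsdb = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
copy = {k: list(v) for k, v in lsdb.items()}
assert reduced_links(lsdb) == reduced_links(copy)
```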

Best Regards,
Huaimo
-----Original Message-----
From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of tony.li@tony.li
Sent: Friday, August 24, 2018 12:29 PM
To: Tony Przygienda <tonysietf@gmail.com>
Cc: lsr@ietf.org; Jeff Tantsura <jefftant.ietf@gmail.com>; Peter Psenak <ppsenak@cisco.com>; Acee Lindem (acee) <acee=40cisco.com@dmarc.ietf.org>
Subject: Re: [Lsr] LSR Flooding Reduction Drafts - Moving Forward

> a) we are talking any kind of topology for the solution, i.e. generic graph? 


Well, the problem with a topology restriction is that mistakes happen.  If we have a solution for a pure bipartite graph (i.e., a leaf-spine topology) and someone mistakenly inserts a leaf to leaf link, what happens?  Having the entire DC implode would be a Bad Thing, IMHO.
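That failure case is at least cheap to detect. A minimal sketch (function name and topology are illustrative, not from any draft): a leaf-spine fabric is bipartite, so a mistaken leaf-to-leaf link shows up as a 2-coloring failure, which an implementation could flag rather than imploding.

```python
from collections import deque

def is_bipartite(adj):
    """BFS 2-coloring; returns False iff the graph contains an odd cycle
    (which a leaf-to-leaf link in a two-tier fabric always creates)."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    q.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle: not a pure leaf-spine graph
    return True

# A clean 2-spine / 2-leaf fabric is bipartite...
fabric = {"s1": ["l1", "l2"], "s2": ["l1", "l2"],
          "l1": ["s1", "s2"], "l2": ["s1", "s2"]}
assert is_bipartite(fabric)

# ...until someone mistakenly cables leaf to leaf.
fabric["l1"].append("l2")
fabric["l2"].append("l1")
assert not is_bipartite(fabric)
```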


> and then suggestion for IME realistic, operational MUST requirements 
> 
> b) req a): the solution should support distributed and centralized algorithm to compute/signal reduced mesh(es). I personally think distributed is the more practical choice for something like this but it's my 2c from having lived the telephony controller fashion, the distributed fashion and the controller fashion now again ;-)


Well, I did think long and hard about this.

Being distributed would be very nice.  However, that implies that all nodes are going to arrive at the exact same solution, which implies that they all must execute the same algorithm, presumably with the same inputs.

That’s all well and good, but we don’t have an algorithm ready to put on the table yet.  We need experience with one.  We know we want to tweak things based on biconnectivity, performance, and degree, because doing it right on day one seems unlikely.  Changing algorithms is going to be VERY painful if it’s distributed.

However, if it’s centralized, it’s completely trivial.

So, my strong preference is to start centralized.  Iterate on the algorithm until we have it where we want it.  And then take it distributed if there’s a point to it.  However, at that point, we have something working.  So why fix it?
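To make the centralized case concrete, here is a minimal sketch (names are hypothetical; this is not any proposed algorithm): a designated node computes a BFS spanning tree over the full topology and signals those links as the reduced flooding topology. Swapping in a smarter algorithm later means changing code in one place only.

```python
from collections import deque

def flooding_tree(adj, leader):
    """BFS spanning tree rooted at the leader: the smallest connected
    subgraph on which flooding still reaches every node (n-1 links).

    Illustrative only: a real algorithm would also weigh redundancy,
    node degree, and diameter, which is exactly what we want to iterate
    on centrally before freezing anything into a distributed spec."""
    seen = {leader}
    tree = []
    q = deque([leader])
    while q:
        u = q.popleft()
        for v in sorted(adj[u]):  # sorted for a deterministic tree
            if v not in seen:
                seen.add(v)
                tree.append((u, v))
                q.append(v)
    return tree

adj = {"R1": ["R2", "R3"], "R2": ["R1", "R3"], "R3": ["R1", "R2"]}
assert flooding_tree(adj, "R1") == [("R1", "R2"), ("R1", "R3")]
```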


> c) req b): the solution should include redundancy (i.e. @ least 2 maximally disjoint vertex covers/lifts) to deal with single link failure (unless the link is unavoidably a minimal cut on the graph) 


Not everyone agrees with this, but some do.  This seems like one possible input to the algorithm.


> d) req c): the solution should guarantee disruption free flooding in case of 
>   i) single link failure
>  ii) single node failure
>  iii) change in one of the vertex lifts 


Sorry, I don’t understand point iii).


> e) the solution should not lead to "hot-spot" or "minimal-cut" links which will disrupt flooding between two partitions on failure or lead to flood throughput bottlenecks 


Agreed.
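The "minimal-cut" links in question are just bridges of the flooding subgraph, and a candidate topology can be audited for them directly. A sketch, assuming a simple undirected adjacency map (names hypothetical), using Tarjan's classic bridge-finding DFS:

```python
import sys
sys.setrecursionlimit(10000)  # topologies can be deep for recursive DFS

def find_bridges(adj):
    """Return edges whose failure partitions the graph (Tarjan's
    low-link method). A reduced flooding topology containing any such
    edge has a single link whose loss disrupts flooding."""
    disc, low, bridges = {}, {}, []
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])   # back edge
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:            # no back edge around (u, v)
                    bridges.append((u, v))

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return bridges

# Triangle A-B-C with D hanging off C: only C-D is a cut link.
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
assert find_bridges(adj) == [("C", "D")]
```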

> I am agnostic to Tony L's thinking about diameter and so on. It makes sense but is not necessarily easy to pull into the solution. 


It all boils down to the point that Peter just made about performance.  A topology with a high diameter is going to require many flooding hops and hurt performance.  To be avoided...
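The hop-count cost Peter raised is just the diameter of the flooding subgraph, which is cheap to measure on a candidate topology. A sketch (assuming a connected, unweighted adjacency map) using one BFS per node:

```python
from collections import deque

def diameter(adj):
    """Longest shortest path in hops: the worst-case number of flooding
    hops before the last node hears an update. Assumes a connected graph."""
    def eccentricity(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())

    return max(eccentricity(n) for n in adj)

# A chain A-B-C-D floods in 3 hops worst case; a full mesh would take 1.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
assert diameter(chain) == 3
```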


> moreover, I observe that IME ISIS is much more robust under such optimizations, since the CSNPs catch (@ a somewhat ugly delay cost) any corner cases, whereas OSPF after IDBE will happily stay out of sync forever if flooding skips something (that may actually become a reason to introduce periodic stuff on OSPF to do a CSNP equivalent, albeit it won't be trivial in a backwards-compatible way on first thought; I was thinking a bit about cut/snapshot consistency checks [in practical terms, OSPF hellos could carry a checksum on the DB headers], but we never have that luxury on a link-state routing protocol [i.e. we always operate under unbounded epsilon consistency ;-) and in case of BGP stable oscillations BTW not even that ;^} ]).

Emacs

Tony


_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr