Re: [Lsr] Dynamic flow control for flooding

Robert,

Nothing has changed about the probability of network partitioning. That was simply a use case selected to motivate the discussion about flooding speed.

The entire discussion is almost orthogonal to dynamic flooding.  Let’s please take that out of the discussion.

Tony

> On Jul 24, 2019, at 7:38 AM, Robert Raszuk <robert@raszuk.net> wrote:
> 
> Hi,
> 
> Yes, indeed. While reading your richly connected node restart problem, I thought that use of the overload-bit should be explored, proposed, and implemented.
> 
> For the partition problem I have two general comments: 
> 
> a) If network partitions are likely to happen more often with dynamic flooding, then perhaps, as already said before, we should increase the maximum number of times a given LSP is to arrive at a flooding-optimized node. Two may not be enough.
> 
> b) If protocol extensions help to mitigate the effects of network partitions via much faster repair, some folks may treat network partitions as a normal operational model instead of re-architecting the network to make sure partition events are as rare as possible.
> 
> Thx,
> R.
> 
> On Wed, Jul 24, 2019 at 4:12 PM Henk Smit <henk.ietf@xs4all.nl> wrote:
> 
> Hello Robert,
> 
> Tony brought up the example of a partitioned network.
> But there are more examples.
> 
> E.g. in a network there is a router with 1000 neighbors.
> (When discussing distributed vs centralized flooding-topology
>   reduction algorithms, I've been told these network designs exist).
> When such a router reboots/crashes/comes back up, all 1000 neighbors
> will create a new version of their own LSP. This causes 1000 different
> LSPs to be flooded through the network at the same time, impacting every
> router in the network.
> 
> The case I was thinking of myself, was when a router in a large network
> boots. When it brings up a number of adjacencies, each neighbor will
> try to synchronize its LSPDB with the newly booted router. As the newly
> booted router will send empty CSNPs to each of its neighbors, each
> neighbor will start sending the full LSPDB. If such a network has 10k
> LSPs, and such a router has 100 neighbors, that router will receive
> 100 * 10k = 1 million LSPs. Having a faster and more efficient flooding
> transport, with flow-control, will make a reboot in such a topology
> less painful.
> 
> (In that last case, creative use of the overload-bit could prevent
> black-holing or microloops while IS-IS synchronizes its LSPDB after a
> reboot. Just like we used the overload-bit to solve the problem of slow
> convergence of BGP after a reboot, 22 years ago. I have no idea if there
> are any implementations that use the overload-bit to alleviate slow
> convergence of IS-IS after a reboot.)
> 
> henk.
> 
> 
> Robert Raszuk wrote on 2019-07-24 15:33:
> > Hey Henk & all,
> > 
> > If acks for 1000 LSPs take 16 PSNPs (max 66 per PSNP), or even if,
> > as Tony mentioned, the full flooding may take 33 sec, is this really
> > a problem?
> > 
> > Remember we are not talking about protocol convergence after a link
> > flap or a node going down. We are talking about serious network
> > partitioning, which itself may have lasted for minutes, hours or days.
> > While just considering absolute numbers yields a desire to go faster
> > and faster, if we put things in overall perspective, is there really
> > a problem to be solved in the first place?
> > 
> > Would there still be a problem if the LSR WG recommends faster acking,
> > maybe not for each LSP but for, say, every 20 or 30 at most?
> > 
> > Thx,
> > R.
> 
> _______________________________________________
> Lsr mailing list
> Lsr@ietf.org
> https://www.ietf.org/mailman/listinfo/lsr
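
For reference, the back-of-the-envelope figures quoted in this thread can be reproduced with a short sketch. The function names are hypothetical and purely illustrative. The 66-entries-per-PSNP limit and the 10k/100/1000 counts come from the messages above. The 33 ms per-LSP pacing interval is an assumption chosen because it yields the quoted 33 seconds; the thread itself does not state the interval.

```python
import math


def lsps_received_on_boot(lspdb_size: int, neighbors: int) -> int:
    """LSPs a freshly booted router receives if every neighbor
    sends its full LSPDB in response to empty CSNPs."""
    return lspdb_size * neighbors


def psnps_needed(lsp_count: int, entries_per_psnp: int = 66) -> int:
    """PSNPs needed to acknowledge lsp_count LSPs, given the
    per-PSNP entry limit quoted in the thread (66)."""
    return math.ceil(lsp_count / entries_per_psnp)


def paced_flood_seconds(lsp_count: int, interval_ms: int = 33) -> float:
    """Time to send lsp_count LSPs on one interface when paced at one
    LSP per interval_ms. The 33 ms default is an assumption, not a
    value stated in the thread."""
    return lsp_count * interval_ms / 1000


if __name__ == "__main__":
    print(lsps_received_on_boot(10_000, 100))  # 100 neighbors * 10k LSPs
    print(psnps_needed(1000))                  # acks for 1000 LSPs
    print(paced_flood_seconds(1000))           # seconds to flood 1000 LSPs
```

Under these assumptions the sketch gives 1,000,000 LSPs for the reboot case, 16 PSNPs for 1000 acknowledgements, and 33 seconds for paced flooding of 1000 LSPs, matching the numbers used in the discussion.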