Re: [Lsr] Flooding across a network

Gyan Mishra <hayabusagsm@gmail.com> Sun, 17 May 2020 09:07 UTC

From: Gyan Mishra <hayabusagsm@gmail.com>
Date: Sun, 17 May 2020 05:06:51 -0400
Message-ID: <CABNhwV1Pr-bU1zU-icFcaBTdotroV=VPkfrU4xW+jJhUAoozHA@mail.gmail.com>
To: bruno.decraene@orange.com
Cc: Christian Hopps <chopps@chopps.org>, "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>, "lsr@ietf.org" <lsr@ietf.org>
Content-Type: multipart/alternative; boundary="00000000000095fc1e05a5d45e4b"
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/q81sETnOgRU4arC8W-e8n1djV7g>
Subject: Re: [Lsr] Flooding across a network

Am reading through this thread late but want to chime in on the discussion.

router isis
 fast-flood x

The concept of ISIS fast flooding has been around for decades and is
critical for ISIS LSDB synchronization, so that all nodes have all prefixes
and micro loops within the network are avoided.

The fast flood feature basically floods the specified number of LSPs
before the router starts its local SPF.  So with fast flood enabled, each
node in the domain waits until the last of those LSPs is flooded before it
runs its local SPF.  Fast flooding reduces the overall number of SPFs
executed and lets the LSDB be synchronized across all nodes in the domain.

Let’s say there is a topology change: a link went down or came up.  The
router should at a minimum flood the LSP that triggered the SPF before
running its local SPF.  Fast flooding is recommended by all vendors as it
improves convergence times and limits the duration of LSDB inconsistencies,
during which micro loops can form.

In the case where a few routers don’t have fast flood enabled, those few
routers in slow mode may have to run multiple SPFs before being completely
synchronized.  I think that where a mix of slow and fast nodes exists,
versus all slow or all fast, you may get mixed, unpredictable behaviors that
you really would have to test out in a lab or live network.
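
To put rough numbers on the mixed case, here is a minimal sketch using the
figures from Les's example quoted further down in this thread (1000 LSPs to
flood, 500 LSPs/second for fast nodes, 20 LSPs/second for the slow node).
The 5-second SPF interval, the "one SPF per interval while LSPs are still
arriving" model, and the function names are assumptions made purely for
illustration; real implementations throttle SPF differently.

```python
import math

def receive_time(num_lsps: int, rate_lsps_per_sec: float) -> float:
    """Seconds for a node to receive num_lsps at a fixed flooding rate."""
    return num_lsps / rate_lsps_per_sec

def spf_runs(num_lsps: int, rate_lsps_per_sec: float, spf_interval_sec: float) -> int:
    """SPF runs until the LSDB is complete.

    Hypothetical model: the node runs one SPF per spf_interval_sec for as
    long as new LSPs are still arriving, and always runs at least one.
    """
    duration = receive_time(num_lsps, rate_lsps_per_sec)
    return max(1, math.ceil(duration / spf_interval_sec))

NUM_LSPS = 1000      # from the quoted example
FAST_RATE = 500      # LSPs/second, from the quoted example
SLOW_RATE = 20       # LSPs/second, from the quoted example
SPF_INTERVAL = 5.0   # seconds, assumed for illustration

fast_time = receive_time(NUM_LSPS, FAST_RATE)
slow_time = receive_time(NUM_LSPS, SLOW_RATE)
fast_spfs = spf_runs(NUM_LSPS, FAST_RATE, SPF_INTERVAL)
slow_spfs = spf_runs(NUM_LSPS, SLOW_RATE, SPF_INTERVAL)

print(f"fast node: {fast_time:.0f} s to receive, {fast_spfs} SPF run(s)")
print(f"slow node: {slow_time:.0f} s to receive, {slow_spfs} SPF run(s)")
```

Under these assumptions the fast node is done after a single SPF, while the
slow node keeps recomputing on a partial LSDB for the whole 50 seconds.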

In my experience I have always had all nodes running in fast flood mode, as
the feature has been around for a long time and is recommended for optimal
convergence.

There are many other ISIS parameters that come into play to optimize
convergence.  They may vary by vendor, but the ones below can improve
convergence and minimize micro loops:

One of them is setting all P2P routed links to circuit type point-to-point
to avoid DIS election.

Both ISIS and OSPF have iSPF (incremental SPF), so only the changed part
of the tree is recomputed rather than the entire tree, which saves on SPF
processing.

With a short SPF interval and SPF delay, the local SPF may start before the
LSP that triggered the SPF is flooded.

LSP pacing can be tuned to speed up end-to-end flooding.

Hello padding (TLV 8) for MTU detection helps with convergence, so that the
max number of LSPs can be sent during flooding.

Increase the LSP lifetime to the maximum to reduce control traffic, so that
less CPU is spent refreshing LSPs.

Reduce the frequency of periodic LSP flooding of the topology to reduce link
utilization by ISIS.

Ignore LSPs with errors instead of purging them.

Careful balancing of event processing and throttling of events to optimize
convergence times.

Prefix prioritization on critical prefixes such as MPLS FEC binding.

LFA and RLFA for precomputed backup paths.

BFD for millisecond failure detection.

Overall, in terms of scalability, ISIS far exceeds OSPF, remaining stable in
domains with a very large number of nodes (in the 100s) and a high number of
adjacencies per node.

As far as stability and convergence go, OSPF and ISIS are equivalent in
my experience.

Kind regards

Gyan
Verizon

On Thu, May 7, 2020 at 12:03 PM <bruno.decraene@orange.com> wrote:

> Les,
>
>
>
>
>
> *From:* Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
> *Sent:* Thursday, May 7, 2020 4:55 PM
> *To:* DECRAENE Bruno TGI/OLN
> *Cc:* lsr@ietf.org; Christian Hopps
> *Subject:* RE: [Lsr] Flooding across a network
>
>
>
> Bruno –
>
>
>
> I have specifically used an example where “microloop avoidance” is not
> applicable. So I did not want to use the term “microloop” but rather used
> “loop” so as not to suggest that “microloop avoidance” is a potential
> solution for the sub-optimal behavior.
>
> [Bruno] I’m not sure what you mean exactly by “microloop avoidance”.
>
> On my side, I mean “Loop avoidance using Segment Routing” [1] which _*is*_
> applicable. (Note that I’m not saying that all _*implementations*_ cover
> all cases.)
>
> [1]
> https://tools.ietf.org/html/draft-bashandy-rtgwg-segment-routing-uloop-08
>
>
>
>
>
> Hope you can appreciate that point.
>
>
>
> It would be easy enough to include more nodes in the topology which only
> support slow flooding. The end result would be the same.
>
> [Bruno] The question is whether the activation of fast flooding on one
> (some) node(s)/IGP adjacency(es) may result in delaying the LSDB
> synchronization network wide. It’s not about the number of slow nodes.
>
>
>
> You seem to assume that we have a majority of fast nodes, and some
> remaining slow nodes. In that situation I agree that in some cases the
> overall/network-wide behavior may be a slow LSDB sync, but not slower than
> with slow nodes only.
>
> I have kept the example simple in the hopes we could more easily agree
> that what I describe can happen when not all nodes support faster flooding
> – which is the only point I am trying to make.
>
> [Bruno] If your point is that while we have slow nodes in the network, in
> some cases the network wide behavior can be “slow”, somewhere between “all
> nodes are fast” and “all nodes are slow”, then I agree with you.
>
>
>
> However what you said is: “when only some nodes in the network support
> faster flooding […]  it prolongs the period of LSDB inconsistency.”
>
> In that sentence, I disagree with “prolongs” which according to 2
> dictionaries means “lasting longer”.
>
> https://dictionary.cambridge.org/fr/dictionnaire/anglais/prolong
>
> https://www.merriam-webster.com/dictionary/prolong
>
>
>
> Because, at least to me, this reads as “The introduction of fast-flooding
> nodes in the network (may) increase the period of LSDB inconsistency.”
>
>
>
> Whether the ratio of Fast/Slow nodes is large or small or about the same
> doesn’t eliminate the possibility that the same behavior could be seen –
> though it might alter the location of the topology change which would be
> problematic.
>
>
>
> From an operator’s POV, I am pretty sure that what you really care about
> is whether packets get successfully forwarded or not.
>
> [Bruno] Yes, but let’s not combine multiple (complex) problems.
>
> Here I’d like us to focus on LSP flooding and LSDB synchronization across the
> network. Plus this is exactly the point that you raised: “increase the
> period of LSDB inconsistency.”
>
>
>
> I am demonstrating that it isn’t safe to assume forwarding behavior will
> be optimal when not all nodes support fast flooding.
>
> [Bruno] Which is very different from your original point “when only some
> nodes in the network support faster flooding […]  it prolongs the period
> of LSDB inconsistency.”
>
>
>
> --Bruno
>
>
>
>    Les
>
>
>
>
>
> *From:* bruno.decraene@orange.com <bruno.decraene@orange.com>
> *Sent:* Thursday, May 07, 2020 2:18 AM
> *To:* Les Ginsberg (ginsberg) <ginsberg@cisco.com>
> *Cc:* lsr@ietf.org; Christian Hopps <chopps@chopps.org>
> *Subject:* RE: [Lsr] Flooding across a network
>
>
>
> Les,
>
>
>
> > From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
>
> >
>
> > Bruno -
>
> >
>
> > I am sorry it has been so difficult for us to understand each other. I
> > am trying my best.
>
> +1 +1
>
>
>
> > Look at it this way:
>
> >
>
> > You are the customer. 😊
>
> > I am the vendor.
>
>
>
> I'm not sure what (technical) point you are trying to make.
>
>
>
> Coming back to your statement: “when only some nodes in the network
> support faster flooding […]  it prolongs the period of LSDB inconsistency.”
>
> Your example is flawed on two points:
>
> - Your performance indicator is "micro loop duration", while we are
> talking about the duration of LSP flooding across the network. So the
> metric should be "LSP flooding time" (or "period of LSDB inconsistency")
>
> - Your example is about one node doing slower flooding, while we are
> interested in the case when one node supports faster flooding. (It's quite
> clear that if one node is slower, the end-to-end flooding time may be
> longer. The point you are raising is when one node is doing faster flooding.)
>
>
>
> Can you fix your example on those 2 points?
>
>
>
> Thanks
>
> --Bruno
>
>
>
>
>
> > The failure scenario I describe below happens and you notice that all
> > Northbound destinations loop for 35 seconds whenever fast flooding is
> > enabled.
>
> > I think you are going to complain about this - to me. 😊
>
> >
>
> > And I am going to tell you that this is a consequence of enabling fast
> > flooding in the presence of a node which does not support it. Your options
> > to reduce the period of looping will be:
>
> >
>
> > 1) Upgrade the slow node to support faster flooding
>
> > 2) Disable fast flooding
>
> > 3) Redesign your network
>
> >
>
>     > Les
>
> >
>
> > > -----Original Message-----
>
> > > From: bruno.decraene@orange.com <bruno.decraene@orange.com>
>
> > > Sent: Wednesday, May 06, 2020 10:10 AM
>
> > > To: Christian Hopps <chopps@chopps.org>
>
> > > Cc: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
>
> > > Subject: RE: [Lsr] Flooding across a network
>
> > >
>
> > > > From: Christian Hopps [mailto:chopps@chopps.org]
>
> > > >
>
> > > > Bruno's persistence has made me realize something fundamental here.
>
> > > >
>
> > > > The minute the LSP originator changes the LSP and floods it you have
> > > > LSDB inconsistency.
>
> > >
>
> > > Exactly my point. Thank you Chris.
>
> > > I would even say: "The minute the LSP originator changes the LSP then you
> > > have LSDB inconsistency." But no big deal if there is disagreement on this
> > > detail.
>
> > >
>
> > > > That is going to last until the last node in the network has updated
> > > > its LSDB.
>
> > >
>
> > > Absolutely.
>
> > > So the faster we flood, the shorter the LSDB inconsistency.
>
> > >
>
> > > Now IMO, even if a single/few nodes flood faster, there is a chance of
> > > shortening the LSDB inconsistency. But in all cases, I don't see how this
> > > could make the LSDB inconsistency longer.
>
> > >
>
> > >
>
> > > > Les is pointing out that LSDB inconsistency can be bad in certain
> > > > circumstances e.g., if a critical node is slow and thus inconsistent.
>
> > > >
>
> > > > I believe the right way to fix this is a simple one: help the operator
> > > > flag the broken router software/hardware for replacement, but otherwise
> > > > IS-IS should just try to do the best job it can, which is to flood around
> > > > the problem (i.e., flood as optimally as possible).
>
> > >
>
> > > +1
>
> > > On a side note, I would not call a router flooding slowly "broken". I find
> > > it understandable that in a given network there are different types of
> > > routers (core vs aggregation), different roles (P having 50 IGP adjacencies
> > > with 50 PEs vs PE having only 2 IGP adjacencies with 2 Ps), different
> > > hardware generations, different software, different vendors with different
> > > perspectives/markets.
>
> > >
>
> > > Thank you Chris.
>
> > >
>
> > > --Bruno
>
> > > >
>
> > > > Thanks,
>
> > > > Chris.
>
> > > > [as WG member]
>
> > > >
>
> > > >
>
> > > > > On May 6, 2020, at 10:33 AM, bruno.decraene@orange.com wrote:
>
> > > > >
>
> > > > > Les,
>
> > > > >
>
> > > > > > From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
>
> > > > > Sent: Wednesday, May 6, 2020 4:14 PM
>
> > > > > To: DECRAENE Bruno TGI/OLN
>
> > > > > Cc: lsr@ietf.org
>
> > > > > Subject: RE: Flooding across a network
>
> > > > >
>
> > > > > Bruno –
>
> > > > >
>
> > > > > I am somewhat at a loss to understand your comments.
>
> > > > > > The example is straightforward and does not need to consider FIB update
> > > > > > time nor the ordering of prefix updates on different nodes.
> > > > > >
> > > > > > [Bruno] The example is straightforward but you are referring to FIB and IP
> > > > > > packet forwarding as per those FIBs.
> > > > > >
> > > > > > I’d like us to focus on LSP flooding and LSDB consistency.
>
> > > > >
>
> > > > > > Consider the state of Node B and Node D at various time points from the
> > > > > > trigger event.
> > > > > >
> > > > > > T + 2 seconds
> > > > > > -----------------
> > > > > > B has received all LSP Updates. It triggers an SPF and for all Northbound
> > > > > > destinations previously reachable via C it installs paths via D.
> > > > > > Let’s assume it takes 5 seconds to update the forwarding plane.
> > > > > >
> > > > > > D has received 40 of the 1000 LSP updates. It triggers an SPF and finds
> > > > > > that all Northbound destinations are reachable via B-C. It makes no changes
> > > > > > to the forwarding plane.
> > > > > >
> > > > > > T + 7 seconds
> > > > > > -----------------
> > > > > > B has completed FIB updates. Traffic to all Northbound destinations is
> > > > > > being forwarded via D.
> > > > > >
> > > > > > D has now received 140 of the 1000 LSP updates. Entries in its forwarding
> > > > > > plane for Northbound destinations still point to B.
> > > > > >
> > > > > > We have a loop.
> > > > > >
> > > > > > T + 30 seconds
> > > > > > -----------------
> > > > > > D has now received 600 of the 1000 LSP updates. Still no changes to its
> > > > > > forwarding plane.
> > > > > > Traffic to Northbound destinations is still looping.
> > > > > >
> > > > > > T + 50 seconds
> > > > > > -----------------
> > > > > > D has finally received all 1000 LSP updates.
> > > > > > It triggers (another) SPF and calculates paths to Northbound destinations
> > > > > > via E. It begins to update its forwarding plane.
> > > > > > Let’s assume this will take 5 seconds.
> > > > > >
> > > > > > T + 55 seconds
> > > > > > -----------------
> > > > > > D has completed forwarding plane updates – no more looping.
>
> > > > >
>
> > > > > That is all I am trying to illustrate.
>
> > > > >
>
> > > > > > If you want to start arguing that node protecting LFAs + microloop
> > > > > > avoidance could help (NOTE I explicitly took those out of the example for
> > > > > > simplicity) – it is easy enough to change the example to include multiple
> > > > > > node failures or a node failure plus some northbound link failures on
> > > > > > other nodes.
> > > > > >
> > > > > > [Bruno] I’m not talking about LFA/FRR. And with regards to microloop
> > > > > > avoidance, some algorithms can handle any graph transition, so including
> > > > > > multiple node failures.
>
> > > > >
>
> > > > > > But again, let’s stick to LSP flooding and LSDB consistency. (You are the
> > > > > > one speaking about microloops in the forwarding plane.)
> > > > > >
> > > > > > The point here is to look at the impact of long-lived LSDB inconsistency
> > > > > > which results when some nodes support flooding an order of magnitude
> > > > > > faster than other nodes – which is what you asked me to clarify.
> > > > > >
> > > > > > [Bruno] No. I asked you to clarify why having a node with faster flooding
> > > > > > could prolong the period of LSDB inconsistency.
> > > > > >
> > > > > > Again, with your own words: “when only some nodes in the network
> > > > > > support faster flooding the behavior of the whole network may not be
> > > > > > "better" when faster flooding is enabled because it prolongs the period of
> > > > > > LSDB inconsistency.”
> > > > > >
> > > > > > And with fewer words: “when only some nodes in the network support
> > > > > > faster flooding […]  it prolongs the period of LSDB inconsistency.”
>
> > > > >
>
> > > > > --Bruno
>
> > > > >
>
> > > > >    Les
>
> > > > >
>
> > > > >
>
> > > > >
>
> > > > > From: bruno.decraene@orange.com <bruno.decraene@orange.com>
>
> > > > > Sent: Wednesday, May 06, 2020 6:21 AM
>
> > > > > To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
>
> > > > > Cc: lsr@ietf.org
>
> > > > > Subject: RE: Flooding across a network
>
> > > > >
>
> > > > > Les,
>
> > > > >
>
> > > > > > From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
>
> > > > > Sent: Wednesday, May 6, 2020 1:35 AM
>
> > > > > > To: DECRAENE Bruno TGI/OLN; lsr@ietf.org
>
> > > > > Subject: RE: Flooding across a network
>
> > > > >
>
> > > > > Bruno -
>
> > > > >
>
> > > > > > Seems like it was not too long ago that we were discussing this in person.
> > > > > > Ahhh...the good old days...
>
> > > > > [Bruno] Indeed, may be not to the point of concluding. Indeed.
>
> > > > >
>
> > > > > > First, let's agree that the interesting case does not involve 1 or even a
> > > > > > small number of LSPs. For those cases flooding speed does not matter.
> > > > > > The interesting cases involve a large number of LSPs (hundreds or
> > > > > > thousands). And in such cases LFA/microloop avoidance techniques are not
> > > > > > applicable.
>
> > > > >
>
> > > > > Take the following simple topology:
>
> > > > >
>
> > > > >    |  | ... |            |
>
> > > > >      +---+             +---+
>
> > > > >      | C |             | E |
>
> > > > >      +---+             +---+
>
> > > > >        |                 | 1000
>
> > > > >      +---+             +---+
>
> > > > >      | B |-------------| D |
>
> > > > >      +---+   1000      +---+
>
> > > > >        |                 |
>
> > > > >        |                 |
>
> > > > >         \               /
>
> > > > >          \            /
>
> > > > >           \         /
>
> > > > >            \      /
>
> > > > >              +---+
>
> > > > >              | A |
>
> > > > >              +---+
>
> > > > >
>
> > > > > > There is a topology northbound of C and E (not shown) and a topology
> > > > > > southbound of A (not shown).
> > > > > > Cost on all links is 10 except B-D and D-E where cost is high.
> > > > > >
> > > > > > C is a node with 1000 neighbors.
> > > > > > When all links are up, shortest path for all northbound destinations is
> > > > > > via C.
> > > > > > All nodes in the network support fast flooding except for Node D.
> > > > > > Let’s say fast flooding is 500 LSPs/second and slow flooding (Node D) is
> > > > > > 20 LSPs/second.
> > > > > > If Node C fails we have 1000 LSPs to flood.
> > > > > > All nodes except for D can receive these in 2 seconds (plus internode
> > > > > > delay time).
> > > > > > D can receive LSPs in 50 seconds.
>
> > > > >
>
> > > > > [Bruno] Thanks for your example. Agreed so far.
>
> > > > >
>
> > > > > > When A and B and all southbound nodes receive/process the LSP
> > > > > > updates they will start sending traffic to Northbound destinations via D.
> > > > > > But for the better part of 50 seconds, Node D has yet to receive all LSP
> > > > > > updates and still believes that shortest path is via B-C. It will loop
> > > > > > traffic.
> > > > > >
> > > > > > [Bruno] May I remind you that we are discussing IS-IS flooding in order to
> > > > > > sync LSDB (LSP database). That is already a big enough subject. It does not
> > > > > > include FIB (updates), nor IP forwarding.
> > > > > >
> > > > > > Quoting you “when only some nodes in the network support faster
> > > > > > flooding the behavior of the whole network may not be "better" when faster
> > > > > > flooding is enabled because it prolongs the period of LSDB inconsistency.”
> > > > > >
> > > > > > Taking your own examples, in both cases (all nodes support fast flooding;
> > > > > > all nodes but D support fast flooding) the period of LSDB inconsistency is
> > > > > > 50 seconds. Hence this example does not illustrate your statement.
>
> > > > >
>
> > > > > Hence I’m restating my questions:
>
> > > > >
>
> > > > > > > > when only some nodes in the network support faster flooding the behavior
> > > > > > > > of the whole network may not be "better" when faster flooding is enabled
> > > > > > > > because it prolongs the period of LSDB inconsistency.
> > > > > > >
> > > > > > > 1) Do you have data on this?
> > > > > > >
> > > > > > > 2) If not, can you provide an example where increasing the flooding rate on
> > > > > > > one adjacency prolongs the period of LSDB inconsistency across the
> > > > > > > network?
>
> > > > >
>
> > > > >
>
> > > > > > Had all nodes used slow flooding, it still would have taken 50 seconds to
> > > > > > converge, but there would be significantly less looping. There could be a
> > > > > > good amount of blackholing, but this is preferable to looping.
> > > > > >
> > > > > > [Bruno] You are using an example where ordering FIB updates across the
> > > > > > network, e.g. as per [1], makes it possible to reduce _FIB_ inconsistency
> > > > > > across the path/network. And you seem to conclude from this that this
> > > > > > translates to LSDB update ordering. Those are two different things. In this
> > > > > > thread, I’d suggest that we focus on IGP flooding and LSDB sync only. (*)
> > > > > >
> > > > > > [1] https://tools.ietf.org/html/rfc6976
> > > > > >
> > > > > > (*) We can discuss loop-free IGP convergence in a different thread if you
> > > > > > want. IMO, the use of segment routing/source routing is better than oFIB.
> > > > > > But at some point, it still relies on fast flooding when multiple LSPs are
> > > > > > involved. (And I mean _fast_ not _ordered_.)
>
> > > > >
>
> > > > > --Bruno
>
> > > > >
>
> > > > > > One can always come up with examples – based on a specific topology
> > > > > > and a specific failure – where things might be better/worse/unchanged in
> > > > > > the face of inconsistent flooding speed support.
> > > > > > But I hope this simple example illustrates the pitfalls.
>
> > > > >
>
> > > > >     Les
>
> > > > >
>
> > > > > > -----Original Message-----
>
> > > > > > From: bruno.decraene@orange.com <bruno.decraene@orange.com>
>
> > > > > > Sent: Tuesday, May 05, 2020 8:28 AM
>
> > > > > > > To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
>
> > > > > > Subject: Flooding across a network
>
> > > > > >
>
> > > > > > Les,
>
> > > > > >
>
> > > > > > > > From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg
> > > > > > > > (ginsberg)
>
> > > > > > > Sent: Monday, May 4, 2020 4:39 PM
>
> > > > > > [...]
>
> > > > > > > > when only some nodes in the network support faster flooding the behavior
> > > > > > > > of the whole network may not be "better" when faster flooding is enabled
> > > > > > > > because it prolongs the period of LSDB inconsistency.
> > > > > > >
> > > > > > > 1) Do you have data on this?
> > > > > > >
> > > > > > > 2) If not, can you provide an example where increasing the flooding rate on
> > > > > > > one adjacency prolongs the period of LSDB inconsistency across the
> > > > > > > network?
> > > > > > >
> > > > > > > 3) In the meantime, let's try the theoretical analysis on a simple scenario
> > > > > > > where a single LSP needs to be flooded across the network.
> > > > > > >
> > > > > > > - Let's call Dij the time needed to flood the LSP from node i to the
> > > > > > > adjacent node j. Clearly Dij>0.
> > > > > > > - Let's call k the node originating this LSP at t0=0s
> > > > > > >
> > > > > > > From t0, the LSDB is inconsistent across the network as all nodes but k are
> > > > > > > missing the LSP and hence only know about the 'old' topology.
> > > > > > >
> > > > > > > Let's call SPT(k) the SPT rooted on k, using Dij as the metric between
> > > > > > > adjacent nodes i and j. Let's call SP(k,i) the shortest path from k to i;
> > > > > > > and D(k,i) the shortest distance between k and i.
> > > > > > >
> > > > > > > It seems that the time needed:
> > > > > > > - for node j to learn about the LSP, and get in sync with k, is D(k,j)
>
> > > > >
>
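
Bruno's single-LSP analysis quoted above can be sketched as a shortest-path
computation: with Dij as the per-adjacency flooding delay, node j learns the
LSP at time D(k,j). The sketch below is only an illustration under stated
assumptions. The 4-node graph, the delay values, and all function names are
hypothetical. Reading the network-wide LSDB inconsistency period as the
largest D(k,i) follows Chris's remark that it lasts until the last node has
updated its LSDB; the quoted analysis is cut off before it states this, so
treat that part as an assumption too.

```python
import heapq

def symmetric(links):
    """Build delay[i][j] = Dij from undirected links (hypothetical helper)."""
    delay = {}
    for (i, j), d in links.items():
        delay.setdefault(i, {})[j] = d
        delay.setdefault(j, {})[i] = d
    return delay

def flood_times(delay, origin):
    """Dijkstra over flooding delays: D(origin, i) for every reachable node i."""
    learned = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        t, i = heapq.heappop(heap)
        if t > learned[i]:
            continue  # stale entry: i already learned the LSP earlier
        for j, d_ij in delay[i].items():
            if t + d_ij < learned.get(j, float("inf")):
                learned[j] = t + d_ij
                heapq.heappush(heap, (t + d_ij, j))
    return learned

def inconsistency_period(delay, origin):
    """Time until the last node has the LSP (assumed reading, see lead-in)."""
    return max(flood_times(delay, origin).values())

# Hypothetical network; values are per-adjacency flooding delays in seconds.
base = {("k", "a"): 1.0, ("k", "b"): 4.0, ("a", "b"): 1.0, ("b", "c"): 2.0}
faster = dict(base)
faster[("a", "b")] = 0.5  # only the a-b adjacency floods faster

before = inconsistency_period(symmetric(base), "k")
after = inconsistency_period(symmetric(faster), "k")
print(f"inconsistency period: {before} s before, {after} s after")
```

In this model, lowering any single Dij can only keep or shorten each
D(k,i), so it never lengthens the period. That matches the single-LSP case
only; it says nothing about the forwarding-plane loops in Les's example.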
-- 

Gyan  Mishra

Network Engineering & Technology

Verizon

Silver Spring, MD 20904

Phone: 301 502-1347

Email: gyan.s.mishra@verizon.com