Re: [Lsr] Flooding across a network

Tony Przygienda <tonysietf@gmail.com> Wed, 06 May 2020 18:33 UTC

From: Tony Przygienda <tonysietf@gmail.com>
Date: Wed, 06 May 2020 11:31:44 -0700
Message-ID: <CA+wi2hPyPocYiKf741_xdL4UDqK84TjLYAYyK4tNp3tyoEZ=jw@mail.gmail.com>
To: "Joel M. Halpern" <jmh@joelhalpern.com>
Cc: "Les Ginsberg (ginsberg)" <ginsberg=40cisco.com@dmarc.ietf.org>, "lsr@ietf.org" <lsr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/SgGlS1UW8FriMoC9P3YEIQVbk68>
Subject: Re: [Lsr] Flooding across a network

This is not a simple let's-build-a-better-mechanism problem; this is an
epistemology problem, and uneven information diffusion cannot be fixed by
magic when dealing with total distributed computation. Traditional
link-state basically only works because we assume epsilon consistency
(the correct term for "eventual" consistency) with a very small epsilon
across the whole topology. It used to be that an epsilon in seconds was
good; now an epsilon in hundreds of milliseconds is perceived as
problematic. If we start to build things where the gradient varies across
the network, we will have the unintended consequences of distributed
computation that Les describes. They cannot be wished away, so we had
better acknowledge them (diffusion is somewhat better in uneven-gradient
scenarios, but at the cost of slower convergence under normal conditions,
pi times eye measure ;-).
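
To put rough numbers on the scenario Les describes in the quoted thread
below, here is a minimal Python sketch. It only redoes the arithmetic of
his example (1000 LSPs, 500 LSPs/second on fast nodes, 20 LSPs/second on
node D, 5 seconds to update the forwarding plane). The function and
variable names are illustrative, and the model ignores internode delay
and SPF run time.

```python
# Back-of-the-envelope model of the quoted B/D example. All names are
# illustrative; internode delay and SPF run time are ignored.

def flood_time(lsp_count: int, rate_per_sec: float) -> float:
    """Seconds needed to receive lsp_count LSPs at a fixed rate."""
    return lsp_count / rate_per_sec


def loop_window(lsp_count, fast_rate, slow_rate, fib_update):
    """Return (start, end) of the interval in which B already forwards
    via D while D still forwards via B."""
    b_fib_done = flood_time(lsp_count, fast_rate) + fib_update
    d_fib_done = flood_time(lsp_count, slow_rate) + fib_update
    return b_fib_done, d_fib_done


start, end = loop_window(lsp_count=1000, fast_rate=500, slow_rate=20,
                         fib_update=5)
print(f"loop from T+{start:.0f}s to T+{end:.0f}s ({end - start:.0f}s)")

# Same failure with every node flooding at the slow rate: in this
# simplified model B and D finish together, so the window collapses.
slow_start, slow_end = loop_window(1000, 20, 20, 5)
print(f"all slow: window of {slow_end - slow_start:.0f}s")
```

The zero-length window in the all-slow case is an artifact of the
simplification; the quoted text only claims "significantly less looping".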

A possible partial remediation to the problem is to prioritize flooding of
topology information over reachability (prefixes) but alas, in ISIS that's
not how the packets are laid out.
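
As a rough illustration of what "prioritize topology over reachability"
could mean, the Python sketch below orders a transmit queue so that LSPs
flagged as carrying topology changes go out first. It assumes each LSP
can be classified one way or the other, which, as said above, is not how
ISIS lays out its packets today. The `carries_topology` flag, the class
name and the LSP IDs are all hypothetical.

```python
import heapq
from itertools import count

# Hypothetical priority classes: the lower value is transmitted first.
TOPOLOGY, REACHABILITY = 0, 1


class FloodQueue:
    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a class

    def enqueue(self, lsp_id: str, carries_topology: bool) -> None:
        prio = TOPOLOGY if carries_topology else REACHABILITY
        heapq.heappush(self._heap, (prio, next(self._seq), lsp_id))

    def drain(self):
        """Yield LSP IDs in transmit order."""
        while self._heap:
            yield heapq.heappop(self._heap)[2]


q = FloodQueue()
q.enqueue("C.00-01", carries_topology=False)  # assumed: prefixes only
q.enqueue("C.00-00", carries_topology=True)   # assumed: adjacency change
q.enqueue("E.00-00", carries_topology=True)
order = list(q.drain())
print(order)
```

The sequence counter is there so that two LSPs of the same class leave in
the order they were queued; without it the heap would fall back to
comparing LSP IDs.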

back to shelling peanuts ;-)

--- tony

On Wed, May 6, 2020 at 11:19 AM Joel M. Halpern <jmh@joelhalpern.com> wrote:

> Les, maybe I am missing your point, but it sounds like what you are
> asking for is a (better?) version of the micro-loop prevention work, so
> as to mitigate the interaction between inconsistent convergence and
> fast-reroute?
>
> Yours,
> Joel
>
> On 5/6/2020 1:53 PM, Les Ginsberg (ginsberg) wrote:
> > Bruno -
> >
> > I am sorry it has been so difficult for us to understand each other. I
> am trying my best.
> >
> > Look at it this way:
> >
> > You are the customer. 😊
> > I am the vendor.
> >
> > The failure scenario I describe below happens and you notice that all
> Northbound destinations loop for 35 seconds whenever fast flooding is
> enabled.
> > I think you are going to complain about this - to me. 😊
> >
> > And I am going to tell you that this is a consequence of enabling fast
> flooding in the presence of a node which does not support it. Your options
> to reduce the period of looping will be:
> >
> > 1)Upgrade the slow node to support faster flooding
> > 2)Disable fast flooding
> > 3)Redesign your network
> >
> >      Les
> >
> >> -----Original Message-----
> >> From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> >> Sent: Wednesday, May 06, 2020 10:10 AM
> >> To: Christian Hopps <chopps@chopps.org>
> >> Cc: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
> >> Subject: RE: [Lsr] Flooding across a network
> >>
> >>> From: Christian Hopps [mailto:chopps@chopps.org]
> >>>
> >>> Bruno's persistence has made me realize something fundamental here.
> >>>
> >>> The minute the LSP originator changes the LSP and floods it you have
> LSDB
> >> inconsistency.
> >>
> >> Exactly my point. Thank you Chris.
> >> I would even say: "The minute the LSP originator changes the LSP then
> you
> >> have LSDB inconsistency." But no big deal if there is disagreement on
> this
> >> detail.
> >>
> >>> That is going to last until the last node in the network has updated its LSDB.
> >>
> >> Absolutely.
> >> So the faster we flood, the shorter the LSDB inconsistency.
> >>
> >> Now IMO, even if a single/few nodes flood faster, there is a chance of
> >> shortening the LSDB inconsistency. But in all cases, I don't see how
> this could
> >> make the LSDB inconsistency longer.
> >>
> >>
> >>> Les is pointing out that LSDB inconsistency can be bad in certain
> >> circumstances e.g., if a critical node is slow and thus inconsistent.
> >>>
> >>> I believe the right way to fix this is a simple one: help the operator flag the
> >>> broken router software/hardware for replacement, but otherwise IS-IS
> >>> should just try to do the best job it can, which is to flood around the
> >>> problem (i.e., flood as optimally as possible).
> >>
> >> +1
> >> On a side note, I would not call a router that floods slowly "broken". I find it
> >> understandable that in a given network there are different types of routers
> >> (core vs aggregation), different roles (P having 50 IGP adjacencies
> with 50 PEs
> >> vs PE having only 2 IGP adjacencies with 2 P), different hardware
> >> generations, different software, different vendors with different
> >> perspectives/markets.
> >>
> >> Thank you Chris.
> >>
> >> --Bruno
> >>>
> >>> Thanks,
> >>> Chris.
> >>> [as WG member]
> >>>
> >>>
> >>>> On May 6, 2020, at 10:33 AM, bruno.decraene@orange.com wrote:
> >>>>
> >>>> Les,
> >>>>
> >>>> From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
> >>>> Sent: Wednesday, May 6, 2020 4:14 PM
> >>>> To: DECRAENE Bruno TGI/OLN
> >>>> Cc: lsr@ietf.org
> >>>> Subject: RE: Flooding across a network
> >>>>
> >>>> Bruno –
> >>>>
> >>>> I am somewhat at a loss to understand your comments.
> >>>> The example is straightforward and does not need to consider FIB
> update
> >> time nor the ordering of prefix updates on different nodes.
> >>>> [Bruno] The example is straightforward but you are referring to FIB
> and IP
> >> packets forwarding as per those FIBs.
> >>>> I’d like us to focus on LSP flooding and LSDB consistency.
> >>>>
> >>>> Consider the state of Node B and Node D at various time points from
> the
> >> trigger event.
> >>>>
> >>>> T+ 2 seconds:
> >>>> -----------------
> >>>> B has received all LSP Updates. It triggers an SPF and for all
> Northbound
> >> destinations previously reachable via C it installs paths via D.
> >>>> Let’s assume it takes 5 seconds to update the forwarding plane.
> >>>>
> >>>> D has received 40 of the 1000 LSP updates. It triggers an SPF and
> finds
> >> that all Northbound destinations are reachable via B-C. It makes no
> changes
> >> to the forwarding plane.
> >>>>
> >>>> T+7 seconds
> >>>> -----------------
> >>>> B has completed FIB updates. Traffic to all Northbound destinations is
> >> being forwarded via D.
> >>>>
> >>>> D has now received 140 of the 1000 LSP updates. Entries in its
> forwarding
> >> plane for Northbound destinations still point to B.
> >>>>
> >>>> We have a loop.
> >>>>
> >>>> T + 30 seconds
> >>>> --------------------
> >>>> D has now received 600 of the 1000 LSP updates. Still no changes to
> its
> >> forwarding plane.
> >>>> Traffic to Northbound destinations is still looping.
> >>>>
> >>>> T+ 50 seconds
> >>>> -------------------
> >>>> D has finally received all 1000 LSP updates.
> >>>> It triggers (another) SPF and calculates paths to Northbound destinations
> >>>> via E. It begins to update its forwarding plane.
> >>>> Let’s assume this will take 5 seconds.
> >>>>
> >>>> T + 55 seconds
> >>>> --------------------
> >>>> D has completed forwarding plane updates – no more looping.
> >>>>
> >>>> That is all I am trying to illustrate.
> >>>>
> >>>> If you want to start arguing that node protecting LFAs + microloop
> >> avoidance could help (NOTE I explicitly  took those out of the example
> for
> >> simplicity) – it is easy enough to change the example to include
> multiple node
> >> failures or a node failure plus some northbound link failures on other
> nodes.
> >>>> [Bruno] I’m not talking about LFA/FRR. And with regards to microloops
> >> avoidance, some algorithms can handle any graph transition so including
> >> multiple node failures.
> >>>>
> >>>> But again, let’s stick to LSP flooding and LSDB consistency. (you are
> the
> >> one speaking about microloops in the forwarding plane).
> >>>>
> >>>> The point here is to look at the impact of long-lived LSDB inconsistency
> >>>> which results when some nodes support flooding an order of magnitude
> >>>> faster than other nodes – which is what you asked me to clarify.
> >>>> [Bruno] No. I asked you to clarify why having a node with faster flooding
> >>>> could prolong the period of LSDB inconsistency.
> >>>>
> >>>> Again, in your own words: “when only some nodes in the network
> >> support faster flooding the behavior of the whole network may not be
> >> "better" when faster flooding is enabled because it prolongs the period
> of
> >> LSDB inconsistency.”
> >>>> And in fewer words: “when only some nodes in the network support
> >> faster flooding […]  it prolongs the period of LSDB inconsistency.”
> >>>>
> >>>> --Bruno
> >>>>
> >>>>     Les
> >>>>
> >>>>
> >>>>
> >>>> From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> >>>> Sent: Wednesday, May 06, 2020 6:21 AM
> >>>> To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
> >>>> Cc: lsr@ietf.org
> >>>> Subject: RE: Flooding across a network
> >>>>
> >>>> Les,
> >>>>
> >>>> From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
> >>>> Sent: Wednesday, May 6, 2020 1:35 AM
> >>>> To: DECRAENE Bruno TGI/OLN; lsr@ietf.org
> >>>> Subject: RE: Flooding across a network
> >>>>
> >>>> Bruno -
> >>>>
> >>>> Seems like it was not too long ago that we were discussing this in
> person.
> >> Ahhh...the good old days...
> >>>> [Bruno] Indeed, may be not to the point of concluding. Indeed.
> >>>>
> >>>> First, let's agree that the interesting case does not involve 1 or
> even a
> >> small number of LSPs. For those cases flooding speed does not matter.
> >>>> The interesting cases involve a large number of LSPs (hundreds or
> >> thousands). And in such cases LFA/microloop avoidance techniques are not
> >> applicable.
> >>>>
> >>>> Take the following simple topology:
> >>>>
> >>>>     |  | ... |            |
> >>>>       +---+             +---+
> >>>>       | C |             | E |
> >>>>       +---+             +---+
> >>>>         |                 | 1000
> >>>>       +---+             +---+
> >>>>       | B |-------------| D |
> >>>>       +---+   1000      +---+
> >>>>         |                 |
> >>>>         |                 |
> >>>>          \               /
> >>>>           \            /
> >>>>            \         /
> >>>>             \      /
> >>>>               +---+
> >>>>               | A |
> >>>>               +---+
> >>>>
> >>>> There is a topology northbound of C and E (not shown) and a topology
> >> southbound of A (not shown).
> >>>> Cost on all links is 10 except B-D and D-E where cost is high.
> >>>>
> >>>> C is a node with 1000 neighbors.
> >>>> When all links are up, shortest path for all northbound destinations
> is via
> >> C.
> >>>> All nodes in the network support fast flooding except for Node D.
> >>>> Let’s say fast flooding is 500 LSPs/second and slow flooding (Node D) is 20
> >>>> LSPs/second.
> >>>> If  Node C fails we have 1000 LSPs to flood.
> >>>> All nodes except for D can receive these in 2 seconds (plus internode
> >> delay time).
> >>>> D can receive LSPs in 50 seconds.
> >>>>
> >>>> [Bruno] Thanks for your example. Agreed so far.
> >>>>
> >>>> When A and B and all southbound nodes receive/process the LSP
> >> updates they will start sending traffic to Northbound destinations via
> D.
> >>>> But for the better part of 50 seconds, Node D has yet to receive all
> LSP
> >> updates and still believes that shortest path is via B-C. It will loop
> traffic.
> >>>>
> >>>> [Bruno] May I remind you that we are discussing IS-IS flooding in order to
> >>>> sync LSDB (LSP database). That is already a big enough subject. It does not
> >>>> include FIB (updates), nor IP forwarding.
> >>>>
> >>>> Quoting you “when only some nodes in the network support faster
> >> flooding the behavior of the whole network may not be "better" when
> faster
> >> flooding is enabled because it prolongs the period of LSDB
> inconsistency.”
> >>>>
> >>>> Taking your own examples, in both cases (all nodes support fast
> flooding;
> >> all nodes but D support fast flooding) the period of LSDB inconsistency
> is 50
> >> seconds. Hence this example does not illustrate your statement.
> >>>>
> >>>> Hence I’m restating my questions:
> >>>>
> >>>>>> when only some nodes in the network support faster flooding the
> >> behavior
> >>>>> of the whole network may not be "better" when faster flooding is
> >> enabled
> >>>>> because it prolongs the period of LSDB inconsistency.
> >>>>>
> >>>>> 1) Do you have data on this?
> >>>>>
> >>>>> 2) If not, can you provide an example where increasing the flooding
> >> rate on
> >>>>> one adjacency prolongs the period of LSDB inconsistency across the
> >>>>> network?
> >>>>
> >>>>
> >>>> Had all nodes used slow flooding, it still would have taken 50
> seconds to
> >> converge, but there would be significantly less looping. There could be
> a
> >> good amount of blackholing, but this is preferable to looping.
> >>>> [Bruno] You are using an example where ordering FIB updates across the
> >>>> network, e.g. as per [1], makes it possible to reduce _FIB_ inconsistency
> >>>> across the path/network. And you seem to conclude from this that this
> >>>> translates to LSDB update ordering. Those are two different things. In this
> >>>> thread, I’d suggest that we focus on IGP flooding and LSDB sync only. (*)
> >>>> [1] https://tools.ietf.org/html/rfc6976
> >>>> (*) We can discuss loop-free IGP convergence in a different thread if you
> >>>> want. IMO, the use of segment routing/source routing is better than oFIB.
> >>>> But at some point, it still relies on fast flooding when multiple LSPs are
> >>>> involved. (and I mean _fast_ not _ordered_)
> >>>>
> >>>> --Bruno
> >>>>
> >>>> One can always come up with examples – based on a specific topology
> >> and a specific failure - where things might be better/worse/unchanged
> in the
> >> face of inconsistent flooding speed support.
> >>>> But I hope this simple example illustrates the pitfalls.
> >>>>
> >>>>      Les
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bruno.decraene@orange.com <bruno.decraene@orange.com>
> >>>>> Sent: Tuesday, May 05, 2020 8:28 AM
> >>>>> To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
> >>>>> Subject: Flooding across a network
> >>>>>
> >>>>> Les,
> >>>>>
> >>>>>> From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg
> >>>>> (ginsberg)
> >>>>>> Sent: Monday, May 4, 2020 4:39 PM
> >>>>> [...]
> >>>>>> when only some nodes in the network support faster flooding the
> >> behavior
> >>>>> of the whole network may not be "better" when faster flooding is
> >> enabled
> >>>>> because it prolongs the period of LSDB inconsistency.
> >>>>>
> >>>>> 1) Do you have data on this?
> >>>>>
> >>>>> 2) If not, can you provide an example where increasing the flooding
> >> rate on
> >>>>> one adjacency prolongs the period of LSDB inconsistency across the
> >>>>> network?
> >>>>>
> >>>>> 3) In the meantime, let's try the theoretical analysis on a simple
> >> scenario
> >>>>> where a single LSP needs to be flooded across the network.
> >>>>>
> >>>>> - Let's call Dij the time needed to flood the LSP from node i to the
> >> adjacent
> >>>>> node j. Clearly Dij>0.
> >>>>> - Let's call k the node originating this LSP at t0=0s
> >>>>>
> >>>>> From t0, the LSDB is inconsistent across the network as all nodes but k are
> >>>>> missing the LSP and hence only know about the 'old' topology.
> >>>>>
> >>>>> Let's call  SPT(k) the SPT rooted on k, using Dij as the metric
> between
> >>>>> adjacent nodes i and j. Let's call SP(k,i) the shortest path from k
> to i; and
> >>>>> D(k,i) the shortest distance between k and i.
> >>>>>
> >>>>> It seems that the time needed:
> >>>>> - for node j to learn about the LSP, and get in sync with k, is
> D(k,j)
> >>>>> - for all nodes across the network to learn about the LSP, and get
> in sync
> >> with
> >>>>> k, is Max[for all j] D(k,j)
> >>>>>
> >>>>> Then how could reducing the flooding delay on one adjacency prolong
> >>>>> the period of LSDB inconsistency?
> >>>>> It seems to me that it can only improve/decrease it. Otherwise, this
> >> would
> >>>>> mean that decreasing the cost on a link can increase the cost of the
> >> shortest
> >>>>> path.
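
The single-LSP model above can be sketched in Python as a shortest-path
computation that uses the per-adjacency flooding delays Dij as metrics.
The four-node topology and the delay values are invented for
illustration; only the model itself (node j is in sync after D(k,j), the
whole network after the maximum over j) comes from the text.

```python
import heapq


def flood_times(delays, origin):
    """Dijkstra over per-adjacency flooding delays (seconds).
    Returns {node: earliest time the LSP from origin arrives}."""
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, dij in delays.get(node, {}).items():
            nd = d + dij
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def inconsistency_period(delays, origin):
    """Max over all nodes j of D(origin, j)."""
    return max(flood_times(delays, origin).values())


# Made-up delays: 0.5 s per adjacency, except 2.5 s into slow node "d".
delays = {
    "k": {"a": 0.5, "b": 0.5},
    "a": {"k": 0.5, "d": 2.5},
    "b": {"k": 0.5, "d": 2.5},
    "d": {"a": 0.5, "b": 0.5},
}
before = inconsistency_period(delays, "k")
delays["k"]["a"] = 0.25  # speed up one adjacency
after = inconsistency_period(delays, "k")
print(before, after)
```

In this model, lowering any single Dij can only leave the maximum
unchanged or reduce it. The sketch covers the single-LSP case only and
says nothing about bursts of many LSPs.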
> >>>>>
> >>>>> Note: I agree that there are other cases, such as  multiple LSPs
> >> originated by
> >>>>> the same node, and multiple LSPs originated by multiple nodes, but
> >> let's start
> >>>>> with the simple case.
> >>>>>
> >>>>> Thanks,
> >>>>> --Bruno
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg
> >>>>> (ginsberg)
> >>>>>> Sent: Monday, May 4, 2020 4:39 PM
> >>>>>>
> >>>>>> Henk -
> >>>>>>
> >>>>>> Thanx for your thoughtful posts.
> >>>>>> I have read your later posts on this thread as well - but decided to
> >> reply to
> >>>>> this one.
> >>>>>> Top posting for better readability.
> >>>>>>
> >>>>>> There is broad agreement that faster flooding is desirable.
> >>>>>> There are now two proposals as to how to address the issue - neither
> >> of
> >>>>> which is proposing to use TCP (or equivalent).
> >>>>>>
> >>>>>> I have commented on why IS-IS flooding requirements are
> >> significantly
> >>>>> different than that for which TCP is used.
> >>>>>> I think it is also useful to note that even the simple test case
> which
> >> Bruno
> >>>>> reported on in last week's interim meeting demonstrated that without
> >> any
> >>>>> changes to the protocol at all IS-IS was able to flood an order of
> >> magnitude
> >>>>> faster than it commonly does today.
> >>>>>> This gives me hope that we are looking at the problem correctly and
> >> will not
> >>>>> need "TCP".
> >>>>>>
> >>>>>> Introducing a TCP based solution requires:
> >>>>>>
> >>>>>> a)A major change to the adjacency formation logic
> >>>>>>
> >>>>>> b)Removal of the independence of the IS-IS protocol from the
> >> address
> >>>>> families whose reachability advertisements it supports - something
> >> which I
> >>>>> think is a great strength of the protocol - particularly in
> environments
> >> where
> >>>>> multiple address family support is needed
> >>>>>>
> >>>>>> I really don't want to do either of the above.
> >>>>>>
> >>>>>> Your comments regarding PSNP response times are quite correct -
> >> and
> >>>>> both of the draft proposals discuss this - though I agree more
> detail will
> >> be
> >>>>> required.
> >>>>>> It is intuitive that if you want to flood faster you also need to
> ACK
> >> faster -
> >>>>> and probably even retransmit faster when that is needed.
> >>>>>> The basic relationship between retransmit interval and PSNP interval
> >> is
> >>>>> expressed in ISO 10589:
> >>>>>>
> >>>>>> "partialSNPInterval - This is the amount of time between periodic
> >>>>>> action for transmission of Partial Sequence Number PDUs.
> >>>>>> It shall be less than minimumLSPTransmissionInterval."
> >>>>>>
> >>>>>> Of course ISO 10589 recommended values (2 seconds and 5 seconds
> >>>>>> respectively) are associated with a much slower flooding rate, and
> >>>>>> implementations I am aware of use values in this order of magnitude. These
> >>>>>> numbers need to be reduced if we are to flood faster, but the relationship
> >>>>>> between the two needs to remain the same.
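
A small Python sketch of the constraint just quoted: whatever values are
chosen, partialSNPInterval has to stay below
minimumLSPTransmissionInterval. The 2 s and 5 s figures come from the
text. The speedup factor of 10 and the function name are assumed for the
example and are not values proposed in either draft.

```python
def scaled_timers(psnp_interval: float, retransmit_interval: float,
                  speedup: float):
    """Divide both timers by the same factor and check the ISO 10589
    ordering (PSNP interval strictly below the retransmit interval)."""
    psnp = psnp_interval / speedup
    retransmit = retransmit_interval / speedup
    if not psnp < retransmit:
        raise ValueError("partialSNPInterval must be less than "
                         "minimumLSPTransmissionInterval")
    return psnp, retransmit


# ISO 10589 recommended values from the text, with an assumed 10x speedup.
psnp, retransmit = scaled_timers(2.0, 5.0, speedup=10)
print(psnp, retransmit)
```

Scaling both timers by one factor is simply the easiest way to keep the
ordering intact; other choices that preserve it would do as well.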
> >>>>>>
> >>>>>> It is also true - as you state - that sending ACKs more quickly
> will result
> >> in
> >>>>> additional PDUs which need to be received/processed by IS-IS - and
> this
> >> has
> >>>>> some impact. But I think it is reasonable to expect that an
> >> implementation
> >>>>> which can support sending and receiving LSPs at a faster rate should
> >> also be
> >>>>> able to send/receive PSNPs at a faster rate. But we still need to be
> >> smarter
> >>>>> than sending one PSNP/one LSP in cases where we have a burst.
> >>>>>>
> >>>>>> LANs are a more difficult problem than P2P - and thus far draft-
> >> ginsberg-lsr-
> >>>>> isis-flooding-scale has been silent on this - but not because we
> aren't
> >> aware
> >>>>> of this - just have focused on the P2P behavior first.
> >>>>>> What the best behavior on a LAN may be is something I am still
> >> considering.
> >>>>> Slowing flooding down to the speed at which the slowest IS on the LAN
> >> can
> >>>>> support may not be the best strategy - as it also slows down the
> >> propagation
> >>>>> rate for systems downstream from the nodes on the LAN which can
> >> handle
> >>>>> faster flooding - thereby having an impact on flooding speed
> >> throughout the
> >>>>> network in a way which may be out of proportion. This is a smaller
> >> example
> >>>>> of the larger issue that when only some nodes in the network support
> >> faster
> >>>>> flooding the behavior of the whole network may not be "better" when
> >> faster
> >>>>> flooding is enabled because it prolongs the period of LSDB
> >> inconsistency.
> >>>>> More work needs to be done here...
> >>>>>>
> >>>>>> In summary, I don't expect to have to "reinvent TCP" - but I do think you
> >>>>>> have provided a useful perspective for us to consider as we progress on this
> >>>>>> topic.
> >>>>>>
> >>>>>> Thanx.
> >>>>>>
> >>>>>>      Les
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Lsr <lsr-bounces@ietf.org> On Behalf Of Henk Smit
> >>>>>>> Sent: Thursday, April 30, 2020 6:58 AM
> >>>>>>> To: lsr@ietf.org
> >>>>>>> Subject: [Lsr] Why only a congestion-avoidance algorithm on the
> >> sender
> >>>>> isn't
> >>>>>>> enough
> >>>>>>>
> >>>>>>>
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>> Two years ago, Gunter Van de Velde and myself published this
> >> draft:
> >>>>>>>
> https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
> >>>>>>> That started this discussion about flow/congestion control and ISIS
> >>>>>>> flooding.
> >>>>>>>
> >>>>>>> My thoughts were that once we start implementing new algorithms
> >> to
> >>>>>>> optimize ISIS flooding speed, we'll end up with our own version of
> >> TCP.
> >>>>>>> I think most people here have a good general understanding of TCP.
> >>>>>>> But if not, this is a good overview how TCP does it:
> >>>>>>> https://en.wikipedia.org/wiki/TCP_congestion_control
> >>>>>>>
> >>>>>>>
> >>>>>>> What does TCP do:
> >>>>>>> ====
> >>>>>>> TCP does 2 things: flow control and congestion control.
> >>>>>>>
> >>>>>>> 1) Flow control is: the receiver trying to prevent itself from
> being
> >>>>>>> overloaded. The receiver indicates, through the receiver-window-
> >> size
> >>>>>>> in the TCP acks, how much data it can or wants to receive.
> >>>>>>> 2) Congestion control is: the sender trying to prevent the links
> >> between
> >>>>>>> sender and receiver from being overloaded. The sender makes an
> >>>>> educated
> >>>>>>> guess at what speed it can send.
> >>>>>>>
> >>>>>>>
> >>>>>>> The part we seem to be missing:
> >>>>>>> ====
> >>>>>>> For the sender to make a guess at what speed it can send, it looks
> at
> >>>>>>> how the transmission is behaving. Are there drops ? What is the RTT
> >> ?
> >>>>>>> Do drop-percentage and RTT change ? Do acks come in at the same
> >> rate
> >>>>>>> as the sender sends segments ? Are there duplicate acks ? To be
> >> able
> >>>>>>> to do this, the sender must know what to expect. How acks behave.
> >>>>>>>
> >>>>>>> If you want an ISIS sender to make a guess at what speed it can
> >> send,
> >>>>>>> without changing the protocol, the only thing the sender can do is
> >> look
> >>>>>>> at the PSNPs that come back from the receiver. But the RTT of
> >> PSNPs can
> >>>>>>> not be predicted. Because a good ISIS implementation does not
> >>>>>>> immediately
> >>>>>>> send a PSNP when it receives a LSP. 1) the receiver should jitter
> the
> >>>>>>> PSNP,
> >>>>>>> like it should jitter all packets. And 2) the receiver should wait
> a
> >>>>>>> little
> >>>>>>> to see if it can combine multiple acks into a single PSNP packet.
> >>>>>>>
> >>>>>>> In TCP, if a single segment gets lost, each new segment will cause
> >> the
> >>>>>>> receiver to send an ack with the seqnr of the last received byte.
> This
> >>>>>>> is called "duplicate acks". This triggers the sender to do
> >>>>>>> fast-retransmission. In ISIS, this can't be done. The
> information
> >>>>>>> a sender can get from looking at incoming PSNPs is a lot less than
> >> what
> >>>>>>> TCP can learn from incoming acks.
> >>>>>>>
> >>>>>>>
> >>>>>>> The problem with sender-side congestion control:
> >>>>>>> ====
> >>>>>>> In ISIS, all we know is that the default retransmit-interval is 5
> >>>>>>> seconds.
> >>>>>>> And I think most implementations use that as the default. This
> >> means
> >>>>>>> that
> >>>>>>> the receiver of an LSP has one requirement: send a PSNP within 5
> >>>>>>> seconds.
> >>>>>>> For the rest, implementations are free to send PSNPs however and
> >>>>>>> whenever
> >>>>>>> they want. This means a sender can not really make conclusions
> >> about
> >>>>>>> flooding speed, dropped LSPs, capacity of the receiver, etc.
> >>>>>>> There is no ordering when flooding LSPs, or sending PSNPs. This
> >> makes
> >>>>>>> a sender-side algorithm for ISIS a lot harder.
> >>>>>>>
> >>>>>>> When you think about it, you realize that a sender should wait the
> >>>>>>> full 5 seconds before it can make any real conclusions about
> >> dropped
> >>>>>>> LSPs.
> >>>>>>> If a sender looks at PSNPs to determine its flooding speed, it will
> >>>>>>> probably
> >>>>>>> not be able to react without a delay of a few seconds. A sender
> >> might
> >>>>>>> send
> >>>>>>> hundreds or thousands of LSPs in those 5 seconds, which might all
> >> or
> >>>>>>> partially be dropped, complicating matters even further.
> >>>>>>>
> >>>>>>>
> >>>>>>> A sender-side algorithm should specify how to do PSNPs.
> >>>>>>> ====
> >>>>>>> So imho a sender-side only algorithm can't work just like that in a
> >>>>>>> multi-vendor environment. We must not only specify a congestion-
> >>>>> control
> >>>>>>> algorithm for the sender. We must also specify for the receiver a
> >> more
> >>>>>>> specific algorithm how and when to send PSNPs. At least how to do
> >>>>> PSNPs
> >>>>>>> under load.
> >>>>>>>
> >>>>>>> Note that this might result in the receiver sending more (and
> >> smaller)
> >>>>>>> PSNPs.
> >>>>>>> More packets might mean more congestion (inside routers).
> >>>>>>>
> >>>>>>>
> >>>>>>> Will receiver-side flow-control work ?
> >>>>>>> ====
> >>>>>>> I don't know if that's enough. It will certainly help.
> >>>>>>>
> >>>>>>> I think to tackle this problem, we need 3 parts:
> >>>>>>> 1) sender-side congestion-control algorithm
> >>>>>>> 2) more detailed algorithm on receiver when and how to send
> >> PSNPs
> >>>>>>> 3) receiver-side flow-control mechanism
> >>>>>>>
> >>>>>>> As discussed at length, I don't know if the ISIS process on the
> >>>>>>> receiving
> >>>>>>> router can actually know if it's running out of resources (buffers
> on
> >>>>>>> interfaces, linecards, etc). That's implementation dependent. A
> >> receiver
> >>>>>>> can definitely advertise a fixed value. So the sender has an upper
> >> bound
> >>>>>>> to use when doing congestion-control. Just like TCP has both a
> >>>>>>> flow-control
> >>>>>>> window and a congestion-control window, and a sender uses both.
> >>>>> Maybe
> >>>>>>> the
> >>>>>>> receiver can even advertise a dynamic value. Maybe now, maybe
> >> only in
> >>>>>>> the
> >>>>>>> future. An advertised upper limit seems useful to me today.
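
The two-window idea can be sketched in a few lines of Python: the sender
never has more unacknowledged LSPs outstanding than the smaller of its
own congestion estimate and the limit the receiver advertised. The names
`advertised_limit` and `cwnd` and all the numbers are assumptions for
illustration; neither draft is being quoted here.

```python
def sendable(advertised_limit: int, cwnd: int, unacked: int) -> int:
    """How many more LSPs may be sent right now.

    advertised_limit: upper bound announced by the receiver (flow control)
    cwnd:             sender's own estimate (congestion control)
    unacked:          LSPs sent but not yet acknowledged by a PSNP
    """
    window = min(advertised_limit, cwnd)
    return max(0, window - unacked)


# Receiver caps at 50 and the sender's estimate is 200: receiver-limited.
receiver_limited = sendable(advertised_limit=50, cwnd=200, unacked=30)
# Sender has backed off to 20: its own estimate is now the binding limit.
sender_limited = sendable(advertised_limit=50, cwnd=20, unacked=30)
print(receiver_limited, sender_limited)
```

With a fixed advertised value, only `cwnd` moves over time; a dynamic
advertisement would make both inputs vary.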
> >>>>>>>
> >>>>>>>
> >>>>>>> What I didn't like about our own proposal (flooding over TCP):
> >>>>>>> ====
> >>>>>>> The problem I saw with flooding over TCP concerns multi-point networks
> >>>>>>> (LANs).
> >>>>>>>
> >>>>>>> When flooding over a multi-point network, setting up TCP connections
> >>>>>>> introduces serious challenges. Who are the endpoints of the TCP
> >>>>>>> connections? Full mesh? Or do all ISes on a LAN create a TCP connection
> >>>>>>> to the DIS? There is no backup DIS in ISIS (unlike OSPF). Things get
> >>>>>>> messy quickly.
> >>>>>>>
> >>>>>>> However, the other two proposals do not solve this problem either.
> >>>>>>> How will a sender-side congestion-avoidance algorithm determine whether
> >>>>>>> there were drops? There are no acks (PSNPs) on a LAN. We assume most
> >>>>>>> LSPs that are broadcast are received by all other ISes on the LAN.
> >>>>>>> Only after the DIS has sent its periodic CSNPs can ISes send PSNPs to
> >>>>>>> request retransmissions. It seems impossible (or very hard) to me for
> >>>>>>> all ISes on a LAN to keep track of dropped LSPs and adjust their
> >>>>>>> sending speed accordingly.
> >>>>>>>
> >>>>>>> When flooding on a LAN, the receiver-side algorithm seems best, because
> >>>>>>> all ISes can see what the lowest advertised sending-speed is and make
> >>>>>>> sure they send slowly enough not to overload the slowest IS. I'm not
> >>>>>>> sure this is a good solution, but it seems easier and more realistic
> >>>>>>> than ISIS-flooding-over-TCP or sender-side congestion-avoidance.
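
The LAN case described above amounts to pacing at the rate of the slowest neighbor. A minimal sketch, assuming each IS on the LAN advertises a maximum receive rate in LSPs per second; the function name lan_send_interval and the rate unit are hypothetical.

```python
from typing import Mapping


def lan_send_interval(advertised_rates: Mapping[str, float]) -> float:
    """Seconds to wait between LSPs so the slowest IS on the LAN keeps up.

    advertised_rates maps a system ID to the receive rate (LSPs/second)
    that IS has advertised. Both the mapping and the unit are assumptions.
    """
    if not advertised_rates:
        raise ValueError("no advertised rates seen on this LAN")
    slowest = min(advertised_rates.values())
    if slowest <= 0:
        raise ValueError("advertised rate must be positive")
    return 1.0 / slowest


# Example: three ISes on the LAN; the slowest (R2) dictates the pace.
interval = lan_send_interval({"R1": 100.0, "R2": 20.0, "R3": 50.0})
```

Every sender on the LAN can compute the same value independently, which is why this needs no per-neighbor acks.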
> >>>>>>>
> >>>>>>>
> >>>>>>> My conclusion:
> >>>>>>> ====
> >>>>>>> Sender-side congestion-control won't work without specifying in more
> >>>>>>> detail how and when to send PSNPs.
> >>>>>>> Receiver-side flow-control will certainly help. I don't know if it's
> >>>>>>> good enough. I don't know if advertising a static value is good enough.
> >>>>>>> But it's a start.
> >>>>>>>
> >>>>>>> I still think we'll end up re-implementing a new (and weaker) TCP.
> >>>>>>>
> >>>>>>>
> >>>>>>> henk.
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Lsr mailing list
> >>>>>>> Lsr@ietf.org
> >>>>>>> https://www.ietf.org/mailman/listinfo/lsr
> >>>>>>
> >>>>>
> >>>>>
> >> __________________________________________________________
> >>>>>
> >> __________________________________________________________
> >>>>> _____
> >>>>>
> >>>>> Ce message et ses pieces jointes peuvent contenir des informations
> >>>>> confidentielles ou privilegiees et ne doivent donc
> >>>>> pas etre diffuses, exploites ou copies sans autorisation. Si vous
> avez
> >> recu ce
> >>>>> message par erreur, veuillez le signaler
> >>>>> a l'expediteur et le detruire ainsi que les pieces jointes. Les
> messages
> >>>>> electroniques etant susceptibles d'alteration,
> >>>>> Orange decline toute responsabilite si ce message a ete altere,
> >> deforme ou
> >>>>> falsifie. Merci.
> >>>>>
> >>>>> This message and its attachments may contain confidential or
> privileged
> >>>>> information that may be protected by law;
> >>>>> they should not be distributed, used or copied without authorisation.
> >>>>> If you have received this email in error, please notify the sender
> and
> >> delete
> >>>>> this message and its attachments.
> >>>>> As emails may be altered, Orange is not liable for messages that have
> >> been
> >>>>> modified, changed or falsified.
> >>>>> Thank you.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
>