Re: [Lsr] Flooding across a network

"Joel M. Halpern" <jmh@joelhalpern.com> Wed, 06 May 2020 18:19 UTC

To: "Les Ginsberg (ginsberg)" <ginsberg=40cisco.com@dmarc.ietf.org>
Cc: "lsr@ietf.org" <lsr@ietf.org>
References: <24209_1588692477_5EB185FD_24209_35_1_53C29892C857584299CBF5D05346208A48E3D455@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <MW3PR11MB46198A668B9F2532BCCC38FEC1A70@MW3PR11MB4619.namprd11.prod.outlook.com> <6287_1588771252_5EB2B9B4_6287_332_1_53C29892C857584299CBF5D05346208A48E3F698@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <MW3PR11MB46199CC33B10BC9D3D622D2AC1A40@MW3PR11MB4619.namprd11.prod.outlook.com> <10562_1588775602_5EB2CAB2_10562_251_11_53C29892C857584299CBF5D05346208A48E3FB63@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <87CDE7F3-E08D-4C45-9AF1-9DAD635F8908@chopps.org> <9992_1588784982_5EB2EF56_9992_201_1_53C29892C857584299CBF5D05346208A48E40256@OPEXCAUBM43.corporate.adroot.infra.ftgroup> <MW3PR11MB4619015E4B356DFC225CD001C1A40@MW3PR11MB4619.namprd11.prod.outlook.com>
From: "Joel M. Halpern" <jmh@joelhalpern.com>
Message-ID: <8f25568b-cb57-7714-1e16-71c257aae0b2@joelhalpern.com>
Date: Wed, 6 May 2020 14:19:26 -0400
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0
MIME-Version: 1.0
In-Reply-To: <MW3PR11MB4619015E4B356DFC225CD001C1A40@MW3PR11MB4619.namprd11.prod.outlook.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/DS-M89SW5-EvD8JAyPx1xVS2PIA>
Subject: Re: [Lsr] Flooding across a network

Les, maybe I am missing your point, but it sounds like what you are 
asking for is a (better?) version of the micro-loop prevention work, so 
as to mitigate the interaction between inconsistent convergence and 
fast-reroute?

Yours,
Joel

On 5/6/2020 1:53 PM, Les Ginsberg (ginsberg) wrote:
> Bruno -
> 
> I am sorry it has been so difficult for us to understand each other. I am trying my best.
> 
> Look at it this way:
> 
> You are the customer. 😊
> I am the vendor.
> 
> The failure scenario I describe below happens and you notice that all Northbound destinations loop for 35 seconds whenever fast flooding is enabled.
> I think you are going to complain about this - to me. 😊
> 
> And I am going to tell you that this is a consequence of enabling fast flooding in the presence of a node which does not support it. Your options to reduce the period of looping will be:
> 
> 1)Upgrade the slow node to support faster flooding
> 2)Disable fast flooding
> 3)Redesign your network
> 
>      Les
> 
>> -----Original Message-----
>> From: bruno.decraene@orange.com <bruno.decraene@orange.com>
>> Sent: Wednesday, May 06, 2020 10:10 AM
>> To: Christian Hopps <chopps@chopps.org>
>> Cc: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
>> Subject: RE: [Lsr] Flooding across a network
>>
>>> From: Christian Hopps [mailto:chopps@chopps.org]
>>>
>>> Bruno's persistence has made me realize something fundamental here.
>>>
>>> The minute the LSP originator changes the LSP and floods it you have LSDB
>> inconsistency.
>>
>> Exactly my point. Thank you Chris.
>> I would even say: "The minute the LSP originator changes the LSP then you
>> have LSDB inconsistency." But no big deal if there is disagreement on this
>> detail.
>>
>>> That is going to last until the last node in the network has updated its LSDB.
>>
>> Absolutely.
>> So the faster we flood, the shorter the LSDB inconsistency.
>>
>> Now IMO, even if a single/few nodes flood faster, there is a chance of
>> shortening the LSDB inconsistency. But in all cases, I don't see how this could
>> make the LSDB inconsistency longer.
>>
>>
>>> Les is pointing out that LSDB inconsistency can be bad in certain
>> circumstances e.g., if a critical node is slow and thus inconsistent.
>>>
>>> I believe the right way to fix this is a simple one: help the operator flag the
>>> broken router software/hardware for replacement, but otherwise IS-IS
>>> should just try to do the best job it can, which is to flood around the
>>> problem (i.e., flood as optimally as possible).
>>
>> +1
>> On a side note, I would not call a router flooding slowly as "broken". I find it
>> understandable that in a given network there are different types of routers
>> (core vs aggregation), different roles (P having 50 IGP adjacencies with 50 PEs
>> vs PE having only 2 IGP adjacencies with 2 P), different hardware
>> generations, different software, different vendors with different
>> perspectives/markets.
>>
>> Thank you Chris.
>>
>> --Bruno
>>>
>>> Thanks,
>>> Chris.
>>> [as WG member]
>>>
>>>
>>>> On May 6, 2020, at 10:33 AM, bruno.decraene@orange.com wrote:
>>>>
>>>> Les,
>>>>
>>>> From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
>>>> Sent: Wednesday, May 6, 2020 4:14 PM
>>>> To: DECRAENE Bruno TGI/OLN
>>>> Cc: lsr@ietf.org
>>>> Subject: RE: Flooding across a network
>>>>
>>>> Bruno –
>>>>
>>>> I am somewhat at a loss to understand your comments.
>>>> The example is straightforward and does not need to consider FIB update
>> time nor the ordering of prefix updates on different nodes.
>>>> [Bruno] The example is straightforward but you are referring to FIB and IP
>> packets forwarding as per those FIBs.
>>>> I’d like us to focus on LSP flooding and LSDB consistency.
>>>>
>>>> Consider the state of Node B and Node D at various time points from the
>> trigger event.
>>>>
>>>> T+ 2 seconds:
>>>> -----------------
>>>> B has received all LSP Updates. It triggers an SPF and for all Northbound
>> destinations previously reachable via C it installs paths via D.
>>>> Let’s assume it takes 5 seconds to update the forwarding plane.
>>>>
>>>> D has received 40 of the 1000 LSP updates. It triggers an SPF and finds
>> that all Northbound destinations are reachable via B-C. It makes no changes
>> to the forwarding plane.
>>>>
>>>> T+7 seconds
>>>> -----------------
>>>> B has completed FIB updates. Traffic to all Northbound destinations is
>> being forwarded via D.
>>>>
>>>> D has now received 140 of the 1000 LSP updates. Entries in its forwarding
>> plane for Northbound destinations still point to B.
>>>>
>>>> We have a loop.
>>>>
>>>> T + 30 seconds
>>>> --------------------
>>>> D has now received 600 of the 1000 LSP updates. Still no changes to its
>> forwarding plane.
>>>> Traffic to Northbound destinations is still looping.
>>>>
>>>> T+ 50 seconds
>>>> -------------------
>>>> D has finally received all 1000 LSP updates.
>>>> It triggers (another) SPF and calculates paths to Northbound destinations
>> via E. It begins to update its forwarding plane.
>>>> Let’s assume this will take 5 seconds.
>>>>
>>>> T + 55 seconds
>>>> --------------------
>>>> D has completed forwarding plane updates – no more looping.
>>>>
>>>> That is all I am trying to illustrate.
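
The timeline above can be replayed with a few lines of arithmetic. This is only a sketch under the example's own numbers (1000 LSPs, 500 vs. 20 LSPs/second, 5 seconds per forwarding-plane update); the function names are illustrative, and the loop window is taken to run from B finishing its FIB update to D finishing its own, as the timeline describes.

```python
# Replay of the example's numbers. All names here are illustrative only.
TOTAL_LSPS = 1000
FAST_RATE = 500      # LSPs/second, nodes that support fast flooding
SLOW_RATE = 20       # LSPs/second, Node D
FIB_UPDATE_SECS = 5  # forwarding-plane update time assumed in the example


def lsps_received(rate, t):
    """LSPs received t seconds after the trigger, capped at the total."""
    return min(TOTAL_LSPS, rate * t)


def fib_converged_at(rate):
    """Time at which a node has all LSPs and has finished its FIB update."""
    return TOTAL_LSPS / rate + FIB_UPDATE_SECS


b_done = fib_converged_at(FAST_RATE)  # B: 2 s of flooding + 5 s FIB update
d_done = fib_converged_at(SLOW_RATE)  # D: 50 s of flooding + 5 s FIB update

for t in (2, 7, 30, 50):
    print(f"T+{t:>2}s: D has {lsps_received(SLOW_RATE, t)} of {TOTAL_LSPS} LSPs")

# The loop exists while B already forwards via D but D still points back to B.
print(f"loop window: T+{b_done:g}s to T+{d_done:g}s ({d_done - b_done:g} seconds)")
```

Changing SLOW_RATE or FIB_UPDATE_SECS shows how the window moves with D's flooding rate.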
>>>>
>>>> If you want to start arguing that node protecting LFAs + microloop
>> avoidance could help (NOTE I explicitly  took those out of the example for
>> simplicity) – it is easy enough to change the example to include multiple node
>> failures or a node failure plus some northbound link failures on other nodes.
>>>> [Bruno] I’m not talking about LFA/FRR. And with regards to microloops
>> avoidance, some algorithms can handle any graph transition so including
>> multiple node failures.
>>>>
>>>> But again, let’s stick to LSP flooding and LSDB consistency. (you are the
>> one speaking about microloops in the forwarding plane).
>>>>
>>>> The point here is to look at the impact of long-lived LSDB inconsistency
>>>> which results when some nodes support flooding an order of magnitude
>>>> faster than other nodes – which is what you asked me to clarify.
>>>> [Bruno] No. I asked you to clarify why having a node with faster flooding
>>>> could prolong the period of LSDB inconsistency.
>>>>
>>>> Again, with your own words: “when only some nodes in the network
>> support faster flooding the behavior of the whole network may not be
>> "better" when faster flooding is enabled because it prolongs the period of
>> LSDB inconsistency.”
>>>> And in fewer words: “when only some nodes in the network support
>> faster flooding […]  it prolongs the period of LSDB inconsistency.”
>>>>
>>>> --Bruno
>>>>
>>>>     Les
>>>>
>>>>
>>>>
>>>> From: bruno.decraene@orange.com <bruno.decraene@orange.com>
>>>> Sent: Wednesday, May 06, 2020 6:21 AM
>>>> To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
>>>> Cc: lsr@ietf.org
>>>> Subject: RE: Flooding across a network
>>>>
>>>> Les,
>>>>
>>>> From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
>>>> Sent: Wednesday, May 6, 2020 1:35 AM
>>>> To: DECRAENE Bruno TGI/OLN; lsr@ietf.org
>>>> Subject: RE: Flooding across a network
>>>>
>>>> Bruno -
>>>>
>>>> Seems like it was not too long ago that we were discussing this in person.
>> Ahhh...the good old days...
>>>> [Bruno] Indeed, though maybe not to the point of concluding.
>>>>
>>>> First, let's agree that the interesting case does not involve 1 or even a
>> small number of LSPs. For those cases flooding speed does not matter.
>>>> The interesting cases involve a large number of LSPs (hundreds or
>> thousands). And in such cases LFA/microloop avoidance techniques are not
>> applicable.
>>>>
>>>> Take the following simple topology:
>>>>
>>>>     |  | ... |            |
>>>>       +---+             +---+
>>>>       | C |             | E |
>>>>       +---+             +---+
>>>>         |                 | 1000
>>>>       +---+             +---+
>>>>       | B |-------------| D |
>>>>       +---+   1000      +---+
>>>>         |                 |
>>>>         |                 |
>>>>          \               /
>>>>           \            /
>>>>            \         /
>>>>             \      /
>>>>               +---+
>>>>               | A |
>>>>               +---+
>>>>
>>>> There is a topology northbound of C and E (not shown) and a topology
>> southbound of A (not shown).
>>>> Cost on all links is 10 except B-D and D-E where cost is high.
>>>>
>>>> C is a node with 1000 neighbors.
>>>> When all links are up, shortest path for all northbound destinations is via
>> C.
>>>> All nodes in the network support fast flooding except for Node D.
>>>> Let’s say fast flooding is 500 LSPs/second and slow flooding (Node D) is 20
>>>> LSPs/second.
>>>> If  Node C fails we have 1000 LSPs to flood.
>>>> All nodes except for D can receive these in 2 seconds (plus internode
>> delay time).
>>>> D can receive LSPs in 50 seconds.
>>>>
>>>> [Bruno] Thanks for your example. Agreed so far.
>>>>
>>>> When A and B and all southbound nodes receive/process the LSP
>> updates they will start sending traffic to Northbound destinations via D.
>>>> But for the better part of 50 seconds, Node D has yet to receive all LSP
>> updates and still believes that shortest path is via B-C. It will loop traffic.
>>>>
>>>> [Bruno] May I remind you that we are discussing IS-IS flooding in order to
>>>> sync LSDB (LSP database). That is already a big enough subject. It does not
>>>> include FIB (updates), nor IP forwarding.
>>>>
>>>> Quoting you “when only some nodes in the network support faster
>> flooding the behavior of the whole network may not be "better" when faster
>> flooding is enabled because it prolongs the period of LSDB inconsistency.”
>>>>
>>>> Taking your own examples, in both cases (all nodes support fast flooding;
>> all nodes but D support fast flooding) the period of LSDB inconsistency is 50
>> seconds. Hence this example does not illustrate your statement.
>>>>
>>>> Hence I’m restating my questions:
>>>>
>>>>>> when only some nodes in the network support faster flooding the
>> behavior
>>>>> of the whole network may not be "better" when faster flooding is
>> enabled
>>>>> because it prolongs the period of LSDB inconsistency.
>>>>>
>>>>> 1) Do you have data on this?
>>>>>
>>>>> 2) If not, can you provide an example where increasing the flooding
>> rate on
>>>>> one adjacency prolongs the period of LSDB inconsistency across the
>>>>> network?
>>>>
>>>>
>>>> Had all nodes used slow flooding, it still would have taken 50 seconds to
>> converge, but there would be significantly less looping. There could be a
>> good amount of blackholing, but this is preferable to looping.
>>>> [Bruno] You are using an example where ordering FIB updates across the
>> network, e.g. as per [1], helps reduce _FIB_ inconsistency across the
>> path/network. And you seem to conclude from this that this translates to
>> LSDB update ordering. Those are two different things. In this thread, I’d
>> suggest that we focus on IGP flooding and LSDB sync only. (*)
>>>> [1] https://tools.ietf.org/html/rfc6976
>>>> (*) We can discuss loop-free IGP convergence in a different thread if you
>> want. IMO, the use of segment routing/source routing is better than oFIB.
>> But at some point, it still relies on fast flooding when multiple LSPs are
>> involved. (and I mean _fast_ not _ordered_)
>>>>
>>>> --Bruno
>>>>
>>>> One can always come up with examples – based on a specific topology
>> and a specific failure - where things might be better/worse/unchanged in the
>> face of inconsistent flooding speed support.
>>>> But I hope this simple example illustrates the pitfalls.
>>>>
>>>>      Les
>>>>
>>>>> -----Original Message-----
>>>>> From: bruno.decraene@orange.com <bruno.decraene@orange.com>
>>>>> Sent: Tuesday, May 05, 2020 8:28 AM
>>>>> To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>; lsr@ietf.org
>>>>> Subject: Flooding across a network
>>>>>
>>>>> Les,
>>>>>
>>>>>> From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg
>>>>> (ginsberg)
>>>>>> Sent: Monday, May 4, 2020 4:39 PM
>>>>> [...]
>>>>>> when only some nodes in the network support faster flooding the
>> behavior
>>>>> of the whole network may not be "better" when faster flooding is
>> enabled
>>>>> because it prolongs the period of LSDB inconsistency.
>>>>>
>>>>> 1) Do you have data on this?
>>>>>
>>>>> 2) If not, can you provide an example where increasing the flooding
>> rate on
>>>>> one adjacency prolongs the period of LSDB inconsistency across the
>>>>> network?
>>>>>
>>>>> 3) In the meantime, let's try the theoretical analysis on a simple
>> scenario
>>>>> where a single LSP needs to be flooded across the network.
>>>>>
>>>>> - Let's call Dij the time needed to flood the LSP from node i to the
>> adjacent
>>>>> node j. Clearly Dij>0.
>>>>> - Let's call k the node originating this LSP at t0=0s
>>>>>
>>>>> From t0, the LSDB is inconsistent across the network as all nodes but k
>> are
>>>>> missing the LSP and hence only know about the 'old' topology.
>>>>>
>>>>> Let's call  SPT(k) the SPT rooted on k, using Dij as the metric between
>>>>> adjacent nodes i and j. Let's call SP(k,i) the shortest path from k to i; and
>>>>> D(k,i) the shortest distance between k and i.
>>>>>
>>>>> It seems that the time needed:
>>>>> - for node j to learn about the LSP, and get in sync with k, is D(k,j)
>>>>> - for all nodes across the network to learn about the LSP, and get in sync
>> with
>>>>> k, is Max[for all j] D(k,j)
>>>>>
>>>>> Then how could reducing the flooding delay on one adjacency prolong
>>>>> the period of LSDB inconsistency?
>>>>> It seems to me that it can only improve/decrease it. Otherwise, this
>> would
>>>>> mean that decreasing the cost on a link can increase the cost of the
>> shortest
>>>>> path.
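
The single-LSP argument above can be checked numerically. The following is a minimal sketch, assuming a made-up four-node topology and made-up delays: it runs Dijkstra with the per-adjacency flooding delay Dij as the metric, takes the maximum of D(k,j) as the inconsistency period, then lowers the delay on one adjacency and recomputes. The function names and the topology are illustrative, not taken from the thread.

```python
import heapq


def flood_times(delays, origin):
    """Dijkstra over per-adjacency flooding delays Dij; returns D(origin, j)."""
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist.get(i, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for j, dij in delays.get(i, {}).items():
            nd = d + dij
            if nd < dist.get(j, float("inf")):
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    return dist


def inconsistency_period(delays, origin):
    """Max over all j of D(origin, j): when the last node learns the LSP."""
    return max(flood_times(delays, origin).values())


# Hypothetical topology; delays in seconds, adjacencies are symmetric.
delays = {
    "k": {"a": 0.5, "b": 0.5},
    "a": {"k": 0.5, "c": 2.0},
    "b": {"k": 0.5, "c": 3.0},
    "c": {"a": 2.0, "b": 3.0},
}
before = inconsistency_period(delays, "k")

# Speed up flooding on one adjacency (b-c) and recompute.
delays["b"]["c"] = delays["c"]["b"] = 0.1
after = inconsistency_period(delays, "k")

print(before, after)
assert after <= before  # a lower link cost cannot lengthen a shortest path
```

This only covers the single-LSP case discussed above; it says nothing about bursts of many LSPs or about forwarding-plane behavior.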
>>>>>
>>>>> Note: I agree that there are other cases, such as  multiple LSPs
>> originated by
>>>>> the same node, and multiple LSPs originated by multiple nodes, but
>> let's start
>>>>> with the simple case.
>>>>>
>>>>> Thanks,
>>>>> --Bruno
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Lsr [mailto:lsr-bounces@ietf.org] On Behalf Of Les Ginsberg
>>>>> (ginsberg)
>>>>>> Sent: Monday, May 4, 2020 4:39 PM
>>>>>>
>>>>>> Henk -
>>>>>>
>>>>>> Thanx for your thoughtful posts.
>>>>>> I have read your later posts on this thread as well - but decided to
>> reply to
>>>>> this one.
>>>>>> Top posting for better readability.
>>>>>>
>>>>>> There is broad agreement that faster flooding is desirable.
>>>>>> There are now two proposals as to how to address the issue - neither
>> of
>>>>> which is proposing to use TCP (or equivalent).
>>>>>>
>>>>>> I have commented on why IS-IS flooding requirements are
>> significantly
>>>>> different than that for which TCP is used.
>>>>>> I think it is also useful to note that even the simple test case which
>> Bruno
>>>>> reported on in last week's interim meeting demonstrated that without
>> any
>>>>> changes to the protocol at all IS-IS was able to flood an order of
>> magnitude
>>>>> faster than it commonly does today.
>>>>>> This gives me hope that we are looking at the problem correctly and
>> will not
>>>>> need "TCP".
>>>>>>
>>>>>> Introducing a TCP based solution requires:
>>>>>>
>>>>>> a)A major change to the adjacency formation logic
>>>>>>
>>>>>> b)Removal of the independence of the IS-IS protocol from the
>> address
>>>>> families whose reachability advertisements it supports - something
>> which I
>>>>> think is a great strength of the protocol - particularly in environments
>> where
>>>>> multiple address family support is needed
>>>>>>
>>>>>> I really don't want to do either of the above.
>>>>>>
>>>>>> Your comments regarding PSNP response times are quite correct -
>> and
>>>>> both of the draft proposals discuss this - though I agree more detail will
>> be
>>>>> required.
>>>>>> It is intuitive that if you want to flood faster you also need to ACK
>> faster -
>>>>> and probably even retransmit faster when that is needed.
>>>>>> The basic relationship between retransmit interval and PSNP interval
>> is
>>>>> expressed in ISO 10589:
>>>>>>
>>>>>> " partialSNPInterval - This is the amount of time between periodic
>>>>>> action for transmission of Partial Sequence Number PDUs.
>>>>>> It shall be less than minimumLSPTransmission-Interval."
>>>>>>
>>>>>> Of course ISO 10589 recommended values (2 seconds and 5 seconds
>>>>> respectively) associated with a much slower flooding rate and
>>>>> implementations I am aware of use values in this order of magnitude.
>> These
>>>>> numbers need to be reduced if we are to flood faster, but the
>> relationship
>>>>> between the two needs to remain the same.
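
The constraint quoted from ISO 10589 is easy to express as a check. This is a sketch only: the 2 and 5 second values are the ones mentioned above, the helper names are hypothetical, and dividing both timers by the same factor is just one assumed way of keeping the relationship intact while reducing the numbers.

```python
# Timer values are the ISO 10589 recommendations cited above.
# The helper names are hypothetical, not from any implementation.
ISO_PARTIAL_SNP_INTERVAL = 2.0           # seconds
ISO_MIN_LSP_TRANSMISSION_INTERVAL = 5.0  # seconds


def timers_valid(partial_snp_interval, min_lsp_transmission_interval):
    """True if the PSNP interval is positive and below the retransmit interval."""
    return 0 < partial_snp_interval < min_lsp_transmission_interval


def scale_timers(factor):
    """Shrink both timers by the same factor (one possible policy, assumed here)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (ISO_PARTIAL_SNP_INTERVAL / factor,
            ISO_MIN_LSP_TRANSMISSION_INTERVAL / factor)


psnp, retransmit = scale_timers(10)  # an order of magnitude faster
print(psnp, retransmit, timers_valid(psnp, retransmit))
```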
>>>>>>
>>>>>> It is also true - as you state - that sending ACKs more quickly will result
>> in
>>>>> additional PDUs which need to be received/processed by IS-IS - and this
>> has
>>>>> some impact. But I think it is reasonable to expect that an
>> implementation
>>>>> which can support sending and receiving LSPs at a faster rate should
>> also be
>>>>> able to send/receive PSNPs at a faster rate. But we still need to be
>> smarter
>>>>> than sending one PSNP/one LSP in cases where we have a burst.
>>>>>>
>>>>>> LANs are a more difficult problem than P2P - and thus far draft-
>> ginsberg-lsr-
>>>>> isis-flooding-scale has been silent on this - but not because we aren't
>> aware
>>>>> of this - just have focused on the P2P behavior first.
>>>>>> What the best behavior on a LAN may be is something I am still
>> considering.
>>>>> Slowing flooding down to the speed at which the slowest IS on the LAN
>> can
>>>>> support may not be the best strategy - as it also slows down the
>> propagation
>>>>> rate for systems downstream from the nodes on the LAN which can
>> handle
>>>>> faster flooding - thereby having an impact on flooding speed
>> throughout the
>>>>> network in a way which may be out of proportion. This is a smaller
>> example
>>>>> of the larger issue that when only some nodes in the network support
>> faster
>>>>> flooding the behavior of the whole network may not be "better" when
>> faster
>>>>> flooding is enabled because it prolongs the period of LSDB
>> inconsistency.
>>>>> More work needs to be done here...
>>>>>>
>>>>>> In summary, I don't expect to have to "reinvent TCP" - but I do think
>> you
>>>>> have provided a useful perspective for us to consider as we progress on
>> this
>>>>> topic,
>>>>>>
>>>>>> Thanx.
>>>>>>
>>>>>>     Les
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Lsr <lsr-bounces@ietf.org> On Behalf Of Henk Smit
>>>>>>> Sent: Thursday, April 30, 2020 6:58 AM
>>>>>>> To: lsr@ietf.org
>>>>>>> Subject: [Lsr] Why only a congestion-avoidance algorithm on the
>> sender
>>>>> isn't
>>>>>>> enough
>>>>>>>
>>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> Two years ago, Gunter Van de Velde and myself published this
>> draft:
>>>>>>> https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
>>>>>>> That started this discussion about flow/congestion control and ISIS
>>>>>>> flooding.
>>>>>>>
>>>>>>> My thoughts were that once we start implementing new algorithms
>> to
>>>>>>> optimize ISIS flooding speed, we'll end up with our own version of
>> TCP.
>>>>>>> I think most people here have a good general understanding of TCP.
>>>>>>> But if not, this is a good overview how TCP does it:
>>>>>>> https://en.wikipedia.org/wiki/TCP_congestion_control
>>>>>>>
>>>>>>>
>>>>>>> What does TCP do:
>>>>>>> ====
>>>>>>> TCP does 2 things: flow control and congestion control.
>>>>>>>
>>>>>>> 1) Flow control is: the receiver trying to prevent itself from being
>>>>>>> overloaded. The receiver indicates, through the receiver-window-
>> size
>>>>>>> in the TCP acks, how much data it can or wants to receive.
>>>>>>> 2) Congestion control is: the sender trying to prevent the links
>> between
>>>>>>> sender and receiver from being overloaded. The sender makes an
>>>>> educated
>>>>>>> guess at what speed it can send.
>>>>>>>
>>>>>>>
>>>>>>> The part we seem to be missing:
>>>>>>> ====
>>>>>>> For the sender to make a guess at what speed it can send, it looks at
>>>>>>> how the transmission is behaving. Are there drops ? What is the RTT
>> ?
>>>>>>> Do drop-percentage and RTT change ? Do acks come in at the same
>> rate
>>>>>>> as the sender sends segments ? Are there duplicate acks ? To be
>> able
>>>>>>> to do this, the sender must know what to expect. How acks behave.
>>>>>>>
>>>>>>> If you want an ISIS sender to make a guess at what speed it can
>> send,
>>>>>>> without changing the protocol, the only thing the sender can do is
>> look
>>>>>>> at the PSNPs that come back from the receiver. But the RTT of
>> PSNPs can
>>>>>>> not be predicted. Because a good ISIS implementation does not
>>>>>>> immediately
>>>>>>> send a PSNP when it receives a LSP. 1) the receiver should jitter the
>>>>>>> PSNP,
>>>>>>> like it should jitter all packets. And 2) the receiver should wait a
>>>>>>> little
>>>>>>> to see if it can combine multiple acks into a single PSNP packet.
>>>>>>>
>>>>>>> In TCP, if a single segment gets lost, each new segment will cause
>> the
>>>>>>> receiver to send an ack with the seqnr of the last received byte. This
>>>>>>> is called "duplicate acks". This triggers the sender to do
>>>>>>> fast-retransmission. In ISIS, this can't be done. The information
>>>>>>> a sender can get from looking at incoming PSNPs is a lot less than
>> what
>>>>>>> TCP can learn from incoming acks.
>>>>>>>
>>>>>>>
>>>>>>> The problem with sender-side congestion control:
>>>>>>> ====
>>>>>>> In ISIS, all we know is that the default retransmit-interval is 5
>>>>>>> seconds.
>>>>>>> And I think most implementations use that as the default. This
>> means
>>>>>>> that
>>>>>>> the receiver of an LSP has one requirement: send a PSNP within 5
>>>>>>> seconds.
>>>>>>> For the rest, implementations are free to send PSNPs however and
>>>>>>> whenever
>>>>>>> they want. This means a sender can not really make conclusions
>> about
>>>>>>> flooding speed, dropped LSPs, capacity of the receiver, etc.
>>>>>>> There is no ordering when flooding LSPs, or sending PSNPs. This
>> makes
>>>>>>> a sender-side algorithm for ISIS a lot harder.
>>>>>>>
>>>>>>> When you think about it, you realize that a sender should wait the
>>>>>>> full 5 seconds before it can make any real conclusions about
>> dropped
>>>>>>> LSPs.
>>>>>>> If a sender looks at PSNPs to determine its flooding speed, it will
>>>>>>> probably
>>>>>>> not be able to react without a delay of a few seconds. A sender
>> might
>>>>>>> send
>>>>>>> hundreds or thousands of LSPs in those 5 seconds, which might all
>> or
>>>>>>> partially be dropped, complicating matters even further.
>>>>>>>
>>>>>>>
>>>>>>> A sender-side algorithm should specify how to do PSNPs.
>>>>>>> ====
>>>>>>> So imho a sender-side only algorithm can't work just like that in a
>>>>>>> multi-vendor environment. We must not only specify a congestion-
>>>>> control
>>>>>>> algorithm for the sender. We must also specify for the receiver a
>> more
>>>>>>> specific algorithm how and when to send PSNPs. At least how to do
>>>>> PSNPs
>>>>>>> under load.
>>>>>>>
>>>>>>> Note that this might result in the receiver sending more (and
>> smaller)
>>>>>>> PSNPs.
>>>>>>> More packets might mean more congestion (inside routers).
>>>>>>>
>>>>>>>
>>>>>>> Will receiver-side flow-control work ?
>>>>>>> ====
>>>>>>> I don't know if that's enough. It will certainly help.
>>>>>>>
>>>>>>> I think to tackle this problem, we need 3 parts:
>>>>>>> 1) sender-side congestion-control algorithm
>>>>>>> 2) more detailed algorithm on receiver when and how to send
>> PSNPs
>>>>>>> 3) receiver-side flow-control mechanism
>>>>>>>
>>>>>>> As discussed at length, I don't know if the ISIS process on the
>>>>>>> receiving
>>>>>>> router can actually know if it's running out of resources (buffers on
>>>>>>> interfaces, linecards, etc). That's implementation dependent. A
>> receiver
>>>>>>> can definitely advertise a fixed value. So the sender has an upper
>> bound
>>>>>>> to use when doing congestion-control. Just like TCP has both a
>>>>>>> flow-control
>>>>>>> window and a congestion-control window, and a sender uses both.
>>>>> Maybe
>>>>>>> the
>>>>>>> receiver can even advertise a dynamic value. Maybe now, maybe
>> only in
>>>>>>> the
>>>>>>> future. An advertised upper limit seems useful to me today.
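
The idea of a sender using both an advertised upper bound and its own congestion estimate can be sketched as follows. Everything here is hypothetical: the class, the field names, and counting the windows in unacknowledged LSPs are assumptions made for illustration, not something either draft specifies.

```python
from dataclasses import dataclass


@dataclass
class FloodingSender:
    """Hypothetical per-adjacency sender state, counted in unacknowledged LSPs."""
    receiver_limit: int      # fixed upper bound advertised by the receiver
    congestion_window: int   # sender's own estimate from observed PSNPs
    unacked: int = 0         # LSPs sent but not yet acknowledged

    def send_budget(self):
        """How many more LSPs may be sent now: bounded by both windows."""
        window = min(self.receiver_limit, self.congestion_window)
        return max(0, window - self.unacked)

    def on_send(self, count):
        self.unacked += count

    def on_psnp(self, acked):
        """A PSNP acknowledged `acked` LSPs."""
        self.unacked = max(0, self.unacked - acked)


sender = FloodingSender(receiver_limit=50, congestion_window=80)
sender.on_send(30)
print(sender.send_budget())  # limited by the receiver's advertised value
sender.on_psnp(10)
print(sender.send_budget())
```

How congestion_window would be adjusted is exactly the open question in the message above, so the sketch leaves it as a plain field.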
>>>>>>>
>>>>>>>
>>>>>>> What I didn't like about our own proposal (flooding over TCP):
>>>>>>> ====
>>>>>>> The problem I saw with flooding over TCP concerns multi-point
>> networks
>>>>>>> (LANs).
>>>>>>>
>>>>>>> When flooding over a multi-point network, setting up TCP
>> connections
>>>>>>> introduces serious challenges. Who are the endpoints of the TCP
>>>>>>> connections ?
>>>>>>> Full mesh ? Or do all ISes on a LAN create a TCP-connection to the
>> DIS ?
>>>>>>> There is no backup DIS in ISIS (unlike OSPF). Things get messy
>> quickly.
>>>>>>>
>>>>>>> However, the other two proposals do not solve this problem either.
>>>>>>> How will a sender-side congestion-avoidance algorithm determine
>>>>> whether
>>>>>>> there were drops ? There are no acks (PSNPs) on a LAN. We assume
>> most
>>>>>>> LSPs
>>>>>>> that are broadcasted are received by all other ISes on the LAN.
>> There
>>>>>>> are
>>>>>>> no acks. Only after the DIS has sent its periodic CSNPs, ISes can send
>>>>>>> PSNPs to request retransmissions. It seems impossible (or very
>> hard) to
>>>>>>> me for all ISes on a LAN to keep track of dropped LSPs and adjust
>> their
>>>>>>> sending speed accordingly.
>>>>>>>
>>>>>>> When flooding on a LAN, the receiver-side algorithm seems best.
>>>>> Because
>>>>>>> all ISes can see what the lowest advertised sending-speed is. And
>> make
>>>>>>> sure they send slow enough to not overload the slowest IS. I'm not
>> sure
>>>>>>> this is a good solution, but it seems easier and more realistic than
>>>>>>> ISIS-flooding-over-TCP or sender-side congestion-avoidance.
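
The LAN behavior described in the paragraph above, where every IS sends no faster than the slowest advertised rate, reduces to taking a minimum. A minimal sketch, assuming each IS advertises a receive rate in LSPs/second; the system IDs and rates are made up, and the function name is hypothetical.

```python
def lan_flooding_rate(advertised_rates):
    """Rate (LSPs/second) that the slowest IS on the LAN says it can accept."""
    if not advertised_rates:
        raise ValueError("no advertised rates on this LAN")
    return min(advertised_rates.values())


# Hypothetical system IDs and advertised receive rates in LSPs/second.
advertised = {"is-1": 500, "is-2": 500, "is-3": 20}
rate = lan_flooding_rate(advertised)
print(f"send at {rate} LSPs/second; 1000 LSPs take {1000 / rate:g} seconds")
```

The example makes the cost visible: one slow IS sets the pace for the whole LAN, which is the trade-off raised elsewhere in this thread.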
>>>>>>>
>>>>>>>
>>>>>>> My conclusion:
>>>>>>> ====
>>>>>>> Sender-side congestion-control won't work without specifying in
>> more
>>>>>>> detail how and when to send PSNPs.
>>>>>>> Receiver-side flow-control will certainly help. I don't know if it's
>>>>>>> good enough. I don't know if advertising a static value is good
>> enough.
>>>>>>> But it's a start.
>>>>>>>
>>>>>>> I still think we'll end up re-implementing a new (and weaker) TCP.
>>>>>>>
>>>>>>>
>>>>>>> henk.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lsr mailing list
>>>>>>> Lsr@ietf.org
>>>>>>> https://www.ietf.org/mailman/listinfo/lsr
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________________________
>>>>>
>>>>> Ce message et ses pieces jointes peuvent contenir des informations
>>>>> confidentielles ou privilegiees et ne doivent donc
>>>>> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez
>> recu ce
>>>>> message par erreur, veuillez le signaler
>>>>> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
>>>>> electroniques etant susceptibles d'alteration,
>>>>> Orange decline toute responsabilite si ce message a ete altere,
>> deforme ou
>>>>> falsifie. Merci.
>>>>>
>>>>> This message and its attachments may contain confidential or privileged
>>>>> information that may be protected by law;
>>>>> they should not be distributed, used or copied without authorisation.
>>>>> If you have received this email in error, please notify the sender and
>> delete
>>>>> this message and its attachments.
>>>>> As emails may be altered, Orange is not liable for messages that have
>> been
>>>>> modified, changed or falsified.
>>>>> Thank you.
>>>>
>>>>
>>>
>>
> 
>