Re: [Lsr] Dynamic flow control for flooding

<stephane.litkowski@orange.com> Wed, 24 July 2019 19:31 UTC

From: stephane.litkowski@orange.com
To: "Les Ginsberg (ginsberg)" <ginsberg@cisco.com>, "tony.li@tony.li" <tony.li@tony.li>
CC: "lsr@ietf.org" <lsr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/ZxpPxrwSwlOP8e7NS0Tneo_7c4Y>

Les,

From the transmitter’s point of view, can’t we use the lack of ACKs from the receiver as a signal that the transmitter should slow down?
I agree that, depending on the exact bottleneck or issue, the IS-IS stack of the receiver may not be aware that something bad is happening and so can’t provide feedback to the transmitter. However, if the transmitter sees that the receiver is acking slowly or quickly, wouldn’t it be able to adjust its flooding speed? Could a receiver intentionally postpone a PSNP in order to aggregate multiple ACKs in a single message?
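
To make the question concrete, here is a minimal sketch of a transmitter that adapts its flooding rate using only the acknowledgements it sees. Everything in it is an assumption for illustration: the class name `TxFloodPacer`, the thresholds, and the additive-increase/multiplicative-decrease policy are not taken from any IS-IS implementation or from this thread.

```python
from dataclasses import dataclass


@dataclass
class TxFloodPacer:
    """Hypothetical transmitter-side pacer driven only by PSNP acknowledgements."""
    rate: float = 100.0       # LSPs per second currently allowed (assumed start value)
    min_rate: float = 10.0    # assumed floor
    max_rate: float = 1000.0  # assumed ceiling
    max_unacked: int = 50     # assumed limit on outstanding LSPs before backing off
    unacked: int = 0          # LSPs sent but not yet acknowledged

    def on_lsp_sent(self, count: int = 1) -> None:
        self.unacked += count

    def on_psnp(self, acked: int) -> None:
        # One PSNP may acknowledge many LSPs if the receiver aggregates ACKs.
        self.unacked = max(0, self.unacked - acked)

    def on_tick(self) -> float:
        """Called periodically: back off if ACKs lag, otherwise probe upward."""
        if self.unacked > self.max_unacked:
            self.rate = max(self.min_rate, self.rate / 2)   # multiplicative decrease
        else:
            self.rate = min(self.max_rate, self.rate + 10)  # additive increase
        return self.rate
```

With these assumed defaults, sending 80 LSPs without any PSNP halves the rate on the next tick; once a PSNP acknowledging 60 of them arrives, the rate starts climbing again. Note that a receiver that delays PSNPs to aggregate ACKs would look "slow" to this pacer, so the threshold would have to allow for that delay.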

From: Les Ginsberg (ginsberg) [mailto:ginsberg@cisco.com]
Sent: Wednesday, July 24, 2019 14:48
To: tony.li@tony.li
Cc: LITKOWSKI Stephane OBS/OINIS; lsr@ietf.org
Subject: RE: [Lsr] Dynamic flow control for flooding

More inline…

From: Tony Li <tony1athome@gmail.com> On Behalf Of tony.li@tony.li
Sent: Tuesday, July 23, 2019 10:56 PM
To: Les Ginsberg (ginsberg) <ginsberg@cisco.com>
Cc: stephane.litkowski@orange.com; lsr@ietf.org
Subject: Re: [Lsr] Dynamic flow control for flooding


Les,

There is something to be said for simply “flooding fast” and not worrying about flow control at all (regardless of whether TX or RX mechanisms would be used).


The only thing I would say to that is: really bad idea.

[Les:] I have to watch my words more closely. 😊
I am not arguing for this – but I do think that “most of the time” this strategy would actually be optimal.
We are discussing the extreme cases – as we should – where we have a large number of LSPs to flood. But let’s not lose sight of the fact that the simple approach works most of the time. For the times when the simple approach doesn’t work well, I am arguing that we should not overcomplicate the solution – particularly because the strategies we might use don’t help convergence.

If you supersaturate the receiver, you waste transmitter resources on the transmission, you swamp the receiver causing packet loss, you potentially trigger the loss of IIHs, you risk causing a cascade failure, and until we come up with a better retransmission mechanism, you probably actually delay network convergence, as nothing is going to happen until you’ve completed retransmissions.
[Les:] Prioritization of hellos over LSPs/SNPs is a longstanding best practice (both on Tx and Rx) and this must not change. No one is advocating that – certainly not me.

The way to maximize goodput is NOT to transmit blindly.


[Les:] Not arguing for blindness, but I am arguing for simplicity.

But most important to me is to recognize that flow control (however done) is not fixing anything – nor making the flooding work better. The network is compromised and flow control won’t fix it.


???? The network is not compromised.

[Les:] If the SLA the customer expects is convergence in less than N, then a slow link jeopardizes our ability to achieve that.

If you accept that, then it makes sense to look for the simplest way to do flow control and that is decidedly not from the RX side. (I expect Tony Li to disagree with that 😊 – but I have already outlined why it is more complex to do it from the Rx side.)



Flow control cannot be done without involvement of the RX side.  That’s why it’s called flow _control_.  The only thing that can be done purely from the TX side is a unilateral (and pessimal) transmit rate cap that will have to allow for the worst case CPU in the worst case situation (e.g., BGP impacting the CPU).

Flow control is the creation of a control loop and that requires feedback from the receiver.  This is true in every form of true flow control: XON/XOFF, RTS/CTS, sliding window protocols, credit based fabric mechanisms, etc.
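
As an illustration of such a control loop, the sketch below shows a sender limited by a window that the receiver advertises. It is a sketch under stated assumptions only: `WindowedSender` and the idea of carrying a window value in receiver feedback are hypothetical and are not a mechanism defined in this thread or in IS-IS.

```python
class WindowedSender:
    """Hypothetical sender that may keep at most `window` unacknowledged LSPs in flight."""

    def __init__(self, window: int) -> None:
        self.window = window   # limit most recently advertised by the receiver
        self.in_flight = 0     # LSPs sent but not yet acknowledged

    def try_send(self) -> bool:
        """Send one LSP if the receiver's window allows it."""
        if self.in_flight >= self.window:
            return False       # window closed: wait for feedback
        self.in_flight += 1
        return True

    def on_feedback(self, acked: int, window: int) -> None:
        """Receiver feedback closes the loop: it acknowledges LSPs and resizes the window."""
        self.in_flight = max(0, self.in_flight - acked)
        self.window = window
```

The point of the sketch is that the sender never picks its own limit: a busy receiver can shrink `window` in its next feedback message, and the sender stalls until acknowledgements free up room.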

I’ll go so far as to quote Wikipedia:

“In data communications<https://en.wikipedia.org/wiki/Data_communications>, flow control is the process of managing the rate of data transmission between two nodes to prevent a fast sender from overwhelming a slow receiver. It provides a mechanism for the receiver to control the transmission speed, so that the receiving node is not overwhelmed with data from transmitting node.”

[Les:] I will not argue about the definition.
In this specific case, there are difficulties in controlling the flooding rate based on advertisements from the RX side. They are outlined in my slides and largely concern the difficulty and cost of dynamically calculating what number to advertise. (A static advertisement is also difficult to calculate without being overly conservative.)

If you disagree please take things bullet-by-bullet:


  *   LSP input queue implementations are typically interface independent FIFOs
  *   Overloaded Receiver does not know which senders are disproportionately causing the overflow
  *   LSPs may be dropped at lower layers – IS-IS receiver may be unaware that the overload condition exists
  *   Updating hellos dynamically to alter the flooding transmission rate is an OOB signaling mechanism that consumes resources at a time when routers are the most busy
  *   Consistent flooding rates will require that updated hellos be sent to all neighbors – exacerbating the cost on both sender and receiver
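
The first three bullets can be illustrated with a toy model of a single interface-independent input FIFO. This is only a sketch: the function name, the fixed capacity, and the assumption that all arrivals land before the IS-IS process drains the queue are illustrative choices, not a description of any real implementation.

```python
from collections import deque


def shared_fifo_receive(arrivals, capacity):
    """Model one shared LSP input queue fed by several neighbors.

    arrivals: sender names in arrival order (assumed to arrive in one burst).
    Returns (queue contents visible to IS-IS, number of silent drops).
    """
    queue = deque()
    dropped = 0
    for sender in arrivals:
        if len(queue) < capacity:
            queue.append(sender)
        else:
            dropped += 1  # discarded below IS-IS; the sender's identity is not recorded
    return list(queue), dropped


visible, dropped = shared_fifo_receive(
    ["B", "B", "A", "B", "B", "A", "B", "B"], capacity=4
)
```

In this run the receiver's IS-IS process sees only the four queued LSPs; the drop counter says four LSPs were lost but not whether neighbor A or neighbor B sent them, which is why a per-sender rate advertisement is hard to compute from this vantage point.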

   Les

Tony

