[Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough
Henk Smit <henk.ietf@xs4all.nl> Thu, 30 April 2020 13:57 UTC
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/7vjJgA9LKCoO6bXDxQoLM7iBKCk>
Hello all,

Two years ago, Gunter Van de Velde and I published this draft:
https://tools.ietf.org/html/draft-hsmit-lsr-isis-flooding-over-tcp-00
That started this discussion about flow/congestion control and ISIS flooding.

My thoughts were that once we start implementing new algorithms to optimize ISIS flooding speed, we'll end up with our own version of TCP.

I think most people here have a good general understanding of TCP. But if not, this is a good overview of how TCP does it:
https://en.wikipedia.org/wiki/TCP_congestion_control

What does TCP do:
====
TCP does 2 things: flow control and congestion control.

1) Flow control is: the receiver trying to prevent itself from being overloaded. The receiver indicates, through the receiver-window-size in the TCP acks, how much data it can or wants to receive.
2) Congestion control is: the sender trying to prevent the links between sender and receiver from being overloaded. The sender makes an educated guess at what speed it can send.

The part we seem to be missing:
====
For the sender to make a guess at what speed it can send, it looks at how the transmission is behaving. Are there drops ? What is the RTT ? Do drop-percentage and RTT change ? Do acks come in at the same rate as the sender sends segments ? Are there duplicate acks ? To be able to do this, the sender must know what to expect: how acks behave.

If you want an ISIS sender to make a guess at what speed it can send, without changing the protocol, the only thing the sender can do is look at the PSNPs that come back from the receiver. But the RTT of PSNPs cannot be predicted, because a good ISIS implementation does not immediately send a PSNP when it receives an LSP. 1) The receiver should jitter the PSNP, like it should jitter all packets. And 2) the receiver should wait a little to see if it can combine multiple acks into a single PSNP packet.

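To make the batching effect concrete, here is a minimal sketch. It assumes a hypothetical receiver that collects acks and sends one PSNP on each tick of a fixed periodic timer. The function name, the timer model, and all numbers are illustrative assumptions and are not taken from any implementation or specification. Jitter is left out; it would add further variation on top.

```python
def observed_ack_delays(send_times_ms, link_delay_ms, psnp_interval_ms):
    """Return the ack delay (in ms) the sender measures for each LSP.

    Assumption: the receiver batches acks and sends one PSNP on every
    tick of a periodic timer (ticks at 0, interval, 2*interval, ...).
    """
    delays = []
    for sent in send_times_ms:
        arrival = sent + link_delay_ms
        # Next timer tick at or after the arrival (integer ceiling division).
        tick = -(-arrival // psnp_interval_ms) * psnp_interval_ms
        # The PSNP needs one more link delay to get back to the sender.
        delays.append(tick + link_delay_ms - sent)
    return delays


# Ten LSPs sent 100 ms apart, 1 ms one-way link delay,
# receiver sends a PSNP once per second.
send_times = [i * 100 for i in range(10)]
delays = observed_ack_delays(send_times, link_delay_ms=1, psnp_interval_ms=1000)
print("true RTT: 2 ms, measured min/max:", min(delays), max(delays))
```

In this model the measured delay depends almost entirely on where in the receiver's timer cycle the LSP happened to arrive, not on the link. That is why a sender cannot use it as an RTT sample the way TCP does.
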
In TCP, if a single segment gets lost, each new segment will cause the receiver to send an ack with the seqnr of the last byte received in order. This is called "duplicate acks". This triggers the sender to do fast-retransmission. In ISIS, this can't be done. The information a sender can get from looking at incoming PSNPs is a lot less than what TCP can learn from incoming acks.

The problem with sender-side congestion control:
====
In ISIS, all we know is that the default retransmit-interval is 5 seconds. And I think most implementations use that as the default. This means that the receiver of an LSP has one requirement: send a PSNP within 5 seconds. For the rest, implementations are free to send PSNPs however and whenever they want. This means a sender cannot really draw conclusions about flooding speed, dropped LSPs, capacity of the receiver, etc. There is no ordering when flooding LSPs, or sending PSNPs. This makes a sender-side algorithm for ISIS a lot harder.

When you think about it, you realize that a sender should wait the full 5 seconds before it can draw any real conclusions about dropped LSPs. If a sender looks at PSNPs to determine its flooding speed, it will probably not be able to react without a delay of a few seconds. A sender might send hundreds or thousands of LSPs in those 5 seconds, which might all or partially be dropped, complicating matters even further.

A sender-side algorithm should specify how to do PSNPs.
====
So imho a sender-side only algorithm can't work just like that in a multi-vendor environment. We must not only specify a congestion-control algorithm for the sender. We must also specify for the receiver a more specific algorithm for how and when to send PSNPs. At least how to do PSNPs under load.

Note that this might result in the receiver sending more (and smaller) PSNPs. More packets might mean more congestion (inside routers).

Will receiver-side flow-control work ?
====
I don't know if that's enough. It will certainly help.

I think to tackle this problem, we need 3 parts:
1) a sender-side congestion-control algorithm
2) a more detailed algorithm on the receiver for when and how to send PSNPs
3) a receiver-side flow-control mechanism

As discussed at length, I don't know if the ISIS process on the receiving router can actually know if it's running out of resources (buffers on interfaces, linecards, etc). That's implementation dependent. A receiver can definitely advertise a fixed value. So the sender has an upper bound to use when doing congestion-control. Just like TCP has both a flow-control window and a congestion-control window, and a sender uses both. Maybe the receiver can even advertise a dynamic value. Maybe now, maybe only in the future. An advertised upper limit seems useful to me today.

What I didn't like about our own proposal (flooding over TCP):
====
The problem I saw with flooding over TCP concerns multi-point networks (LANs).

When flooding over a multi-point network, setting up TCP connections introduces serious challenges. Who are the endpoints of the TCP connections ? Full mesh ? Or do all ISes on a LAN create a TCP-connection to the DIS ? There is no backup DIS in ISIS (unlike OSPF). Things get messy quickly.

However, the other two proposals do not solve this problem either. How will a sender-side congestion-avoidance algorithm determine whether there were drops ? There are no acks (PSNPs) on a LAN. We assume most LSPs that are broadcast are received by all other ISes on the LAN. Only after the DIS has sent its periodic CSNPs can ISes send PSNPs to request retransmissions. It seems impossible (or very hard) to me for all ISes on a LAN to keep track of dropped LSPs and adjust their sending speed accordingly.

When flooding on a LAN, the receiver-side algorithm seems best. Because all ISes can see what the lowest advertised sending-speed is. And make sure they send slowly enough to not overload the slowest IS.

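The two uses of an advertised limit described above can be sketched in a few lines. This is only an illustration under assumptions: the function names, the units (unacknowledged LSPs, LSPs per second), and the example values are hypothetical and do not come from any draft.

```python
def allowed_in_flight(congestion_window, advertised_limit):
    """Point-to-point case: the sender honours both limits, like TCP
    uses the smaller of its congestion window and the receive window."""
    return min(congestion_window, advertised_limit)


def lan_flooding_rate(advertised_rates):
    """LAN case: every IS sends no faster than the slowest advertised
    rate among the ISes on the LAN (hypothetical unit: LSPs per second)."""
    return min(advertised_rates.values())


# Sender's own estimate is 50 unacked LSPs, receiver advertises 20.
print(allowed_in_flight(congestion_window=50, advertised_limit=20))

# Three ISes on a LAN with different advertised rates.
print(lan_flooding_rate({"R1": 500, "R2": 100, "R3": 1000}))
```

Neither function needs acks to work, which is why an advertised value remains usable on a LAN where a sender-side estimate has nothing to measure.
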
I'm not sure this is a good solution, but it seems easier and more realistic than ISIS-flooding-over-TCP or sender-side congestion-avoidance.

My conclusion:
====
Sender-side congestion-control won't work without specifying in more detail how and when to send PSNPs. Receiver-side flow-control will certainly help. I don't know if it's good enough. I don't know if advertising a static value is good enough. But it's a start.

I still think we'll end up re-implementing a new (and weaker) TCP.

henk.