Re: [Lsr] Why only a congestion-avoidance algorithm on the sender isn't enough

Henk Smit, Mon, 04 May 2020 09:47 UTC

On Friday I wrote:
> I still think we'll end up re-implementing a new (and weaker) TCP.

Christian Hopps wrote 2020-05-04 01:27:
> Let's not be too cynical at the start though! :)

I wasn't trying to be cynical.
Let me explain my line of reasoning from two years ago.

When reading about the proposals for limiting the flooding topology
in IS-IS, I read a requirement doc. It said that the goal was to
support areas (flooding domains) of 10k routers. Or maybe even 100k
routers. My immediate thought was: "how are you gonna sync the LSDB
when a router boots up ? That takes 300 to 3000 seconds !?".
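
A back-of-the-envelope sketch of where numbers like these can come from.
The pacing rate of 33 LSPs per second and the "one LSP per router" rule
are assumptions chosen for illustration, they are not stated in this
thread, and the function name is hypothetical.

```python
# Rough LSDB sync time towards one neighbor at a fixed LSP pacing rate.
# ASSUMED_RATE and "one LSP per router" are illustrative assumptions.

def sync_seconds(num_lsps: int, lsps_per_second: float) -> float:
    """Seconds needed to send num_lsps at a constant pacing rate."""
    if lsps_per_second <= 0:
        raise ValueError("pacing rate must be positive")
    return num_lsps / lsps_per_second

ASSUMED_RATE = 33.0  # hypothetical LSPs per second on one interface

if __name__ == "__main__":
    for routers in (10_000, 100_000):
        # Real routers may originate several LSP fragments, so this is
        # a lower bound on the number of LSPs in the LSDB.
        secs = sync_seconds(routers, ASSUMED_RATE)
        print(f"{routers} routers: about {secs:.0f} seconds")
```

Under those assumptions the result lands in the same range as the
300 to 3000 seconds mentioned above.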

This is the problem I wanted to solve. I hadn't even thought of
routers in dense topologies that have 1k+ neighbors.

There are currently heathens that use BGP as IGP in their data-centers.
There's even a cult that is developing a new IGP on top of BGP (LSVR).
If they think BGP/BGP-LS/LSVR are good choices for an IGP, why is that ?
One reason is that people claim that BGP is more scalable. Note that when
doing "Internet-routing" with a large number of prefixes, routers, or at
least some implementations of BGP, still sometimes need minutes, or dozens
of minutes, to process and re-advertise all those prefixes. So when we
talk about minutes, why do people think BGP is so much more wonderful ?
I think it's TCP. TCP can transport lots of info quickly and 
And conceptually TCP is easy to understand for the user ("you write
into a socket, you read from a socket on the other box. done").

If TCP is good enough for BGP bulk-transport, it should be good
enough for IS-IS bulk-transport.

If there are issues with using TCP for routing-protocols, I'm sure
we've solved those by now (in our implementations). We can use those
same solutions/tweaks we use for BGP's TCP in IS-IS's TCP. Or am I
too naive now ?

BTW, all the implementations I've worked with used regular TCP. All
the Open Source BGPs seem to be using the regular TCP in their
kernels. Can someone explain why TCP is good for BGP but not for IS-IS ?

Almost 24 years ago, I sat on a bench in Santa Cruz discussing protocols
with an engineer who had a lot more experience than I had, and still has.
He was designing LDP at the time (with Yakov). LDP also uses TCP.
He said "if we had to design IS-IS now, of course we'd use TCP as
transport now". I never forgot that.

The goal here is not to make IS-IS transport optimal. We don't need to
use maximum available bandwidth. I just happen to think we need the
same 2 elements that TCP has: sender-side congestion-avoidance and
receiver-side flow-control. I hope I have explained why sender-side
congestion-control in IS-IS is not enough (you don't get the feedback
you need to make it work). Les and others have tried to explain
why receiver-side flow-control is hard to implement (the receiving
IS-IS might not know about the state of its interfaces, linecards, etc).

That's why I think we need both.
And when we implement both, it'll start to look like TCP.
So why not use TCP itself ?
Or QUIC ? Or another transport that's already implemented ?
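
To illustrate how the two elements combine, here is a minimal sketch of a
sender that is bounded by both a sender-side congestion window and a
receiver-advertised window. It is a simplification and not TCP's actual
algorithm: the class name, the initial values and the plain
additive-increase/multiplicative-decrease rule are all assumptions made
for this example.

```python
from dataclasses import dataclass

@dataclass
class LspSender:
    """Hypothetical per-neighbor sender state, counted in pdu's."""
    cwnd: int = 10      # sender-side estimate (congestion window)
    rwnd: int = 50      # last limit advertised by the receiver
    in_flight: int = 0  # pdu's sent but not yet acknowledged

    def can_send(self) -> int:
        """How many pdu's may be sent now: the smaller window wins."""
        return max(0, min(self.cwnd, self.rwnd) - self.in_flight)

    def on_send(self, count: int) -> None:
        self.in_flight += count

    def on_ack(self, acked: int, advertised_rwnd: int) -> None:
        """Feedback from the receiver: acks plus its current window."""
        self.in_flight = max(0, self.in_flight - acked)
        self.rwnd = advertised_rwnd
        self.cwnd += 1  # additive increase while things go well

    def on_loss(self) -> None:
        """Sender-side reaction to a detected loss."""
        self.cwnd = max(1, self.cwnd // 2)  # multiplicative decrease
```

Without on_ack() the sender has no feedback to adjust cwnd, and without
rwnd it cannot learn that the receiver itself is the bottleneck.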

> I'd note that our environment is a bit more controlled than the
> end-to-end internet environment. In IS-IS we are dealing with single
> link (logical) so very simple solutions (CTS/RTS, ethernet PAUSE)
> could be viable.

Les's argument is that it's often not so controlled.

Let me ask you one question:
In your algorithm, the receiving IS-IS will send a "pause signal" when
it is overrun. How does IS-IS know it is overrun ? The router is 
IS-IS pdu's on the interface, on the linecard, on the queue between 
and Control Plane, on the IS-IS process's input-queue. When queues are full,
you can't send a message up saying "we didn't have space for an IS-IS pdu,
but we're sending you this message that we've just dropped an IS-IS pdu".
How do you envision this works ?

Imho receiver-side flow-control can only send a rough upper-bound on how many
pdu's it can receive normally.

A solution with a "pause signal" is basically the same as a receiver-side
flow-control, where the receive-window is either 0 or infinite.
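
A small sketch of that equivalence, assuming a sender that only looks at
a window value. The function names are hypothetical and exist only for
this illustration.

```python
import math

def window_from_pause(paused: bool) -> float:
    """Map a binary pause signal onto a receive-window: 0 or infinite."""
    return 0 if paused else math.inf

def allowed_to_send(window: float, in_flight: int) -> float:
    """pdu's the sender may still send under the given receive-window."""
    return max(0, window - in_flight)

if __name__ == "__main__":
    # Paused receiver: nothing may be sent.
    print(allowed_to_send(window_from_pause(True), in_flight=3))
    # Unpaused receiver: no limit at all.
    print(allowed_to_send(window_from_pause(False), in_flight=3))
    # A real receive-window can express values in between, e.g. 20 pdu's.
    print(allowed_to_send(20, in_flight=3))
```

The pause signal is the degenerate case: it can only tell the sender
"stop" or "no limit", never "slow down to N pdu's".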

> Thus our choice of algorithms may well be less restricted.

I'm looking forward to seeing (an outline of) your algorithm.

Again, I'm not pushing for TCP (anymore). I'm not pushing for anything.
I'm just trying to explain the problems that I see with solutions
that are, imho, a bit too simple to really help. Maybe I'm wrong, and
the problem is simpler than I think. Experimentation would be nice.