Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Jeffrey Haas <> Thu, 17 December 2020 14:28 UTC

Cc: "Jakob Heitz (jheitz)" <>, "" <>
To: Brian Dickson <>


> On Dec 16, 2020, at 6:08 PM, Brian Dickson <> wrote:
> Thinking a bit bigger-picture, who could or should be able to (a) detect, and (b) respond to, a situation like this in future?
> What are the pros/cons of different approaches, in terms of risk (of accidental or malicious outages induced), or effectiveness?
> I'll start:
> A large network peering with another large network is likely to have more visibility.
> If all of the sessions are stuck (but still up), that's a much stronger indicator, and maybe that'd be a good situation when auto-reaction would be appropriate.
> Assuming the auto-reaction was limited to large peers of a large network, I think this is less risky, and still very effective.
> (There would still be a challenge of how to share this state discovery across an ASN, but that's a more constrained problem to solve, IMHO.)

This sort of analysis is well suited to the centrally collected telemetry that operators already have.  It's just not a check that would be in people's playbooks.
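
As a rough sketch of what such a check over collected telemetry might look like: the record fields below (peer_as, hold_time_s, since_last_rx_s, and so on) and the min_sessions threshold are assumptions made for illustration, not any particular collector's schema.  A session is treated as "stuck" when it is still Established but nothing has been received from the peer for longer than the negotiated hold time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionSample:
    """One telemetry sample for a BGP session (hypothetical schema)."""
    peer_as: int
    peer_addr: str
    established: bool
    hold_time_s: int        # negotiated hold time; 0 means the hold timer is disabled
    since_last_rx_s: float  # seconds since any message was received from the peer


def is_stuck(s: SessionSample) -> bool:
    # Still up, but silent for longer than the hold time.
    # Sessions with hold time 0 cannot be judged this way.
    return s.established and s.hold_time_s > 0 and s.since_last_rx_s > s.hold_time_s


def fully_stuck_peer_ases(samples, min_sessions=2):
    """Return peer ASes for which every Established session is stuck."""
    by_as = defaultdict(list)
    for s in samples:
        if s.established:
            by_as[s.peer_as].append(s)
    return sorted(
        asn
        for asn, sessions in by_as.items()
        if len(sessions) >= min_sessions and all(is_stuck(x) for x in sessions)
    )
```

Requiring every session to a peer AS to be stuck, and more than one of them, is one way to express the "much stronger indicator" mentioned above; the threshold would need tuning per network.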

> A large ASN may want a reliable, secure method of shutting down peers for some modest duration. Is that also something to consider developing a solution for?

Arguably, how is this different from your billing system deciding someone hasn't paid their bill, and you shutting down their BGP?  In this case, it's just a matter of either finding all of the impacted peering sessions or having a list of all peering sessions by AS.
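
For illustration only, the "list of all peering sessions by AS" could be as simple as an index built from whatever inventory you already keep.  The (router, peer_addr, peer_as) tuple layout and the function names here are hypothetical, and the output is just a list of sessions to act on, not commands for any specific platform.

```python
from collections import defaultdict


def index_sessions_by_as(inventory):
    """Group (router, peer_addr, peer_as) tuples into {peer_as: [(router, peer_addr), ...]}."""
    by_as = defaultdict(list)
    for router, peer_addr, peer_as in inventory:
        by_as[peer_as].append((router, peer_addr))
    # Sort each list so the resulting plan is stable between runs.
    return {asn: sorted(sessions) for asn, sessions in by_as.items()}


def shutdown_plan(inventory, target_as):
    """Return every session toward target_as, or an empty list if there are none."""
    return index_sessions_by_as(inventory).get(target_as, [])
```

How the resulting sessions actually get shut down (and by whom) is left to the operator's existing tooling.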

> I don't see that possible without signatures a la RPKI, but don't know if it's something anyone would really want to have available.

Pushing this action into someone else's system is probably a non-starter.  If you think people were upset at the Dutch court attack scenario...

> The conceptual model would be that of a rapid administrative shutdown of peering sessions, possibly with a pre-configured timer to re-enable sessions and/or a start time?

I think that in the scenario in question, an automated and persisted shutdown is what's desired.  Figuring out whether you have stable BGP, much less safe transit BGP through the provider, is trickier.

-- Jeff