Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Brian Dickson <> Wed, 16 December 2020 23:09 UTC

To: Jeffrey Haas <>
Cc: "Jakob Heitz (jheitz)" <>, "" <>

On Wed, Dec 16, 2020 at 1:43 PM Jeffrey Haas <> wrote:

> Brian,
> On Wed, Dec 16, 2020 at 11:17:37AM -0800, Brian Dickson wrote:
> > While this is my opinion on the best way to handle it, the underlying
> > facts aren't arguable.
> > An AS-wide situation (stuck receivers with no TCP progress) would never
> > result in the AS sending withdrawals.
> For the incident in question, it's not possible for a single BGP
> implementation to decide that something AS-wide is happening.  And even so,
> auto-mitigation of this triggered by a single session on a single device
> would be unwise.
Fair enough.

Thinking a bit bigger-picture, who could or should be able to (a) detect,
and (b) respond to, a situation like this in future?
What are the pros/cons of different approaches, in terms of risk (of
accidental or malicious outages induced), or effectiveness?

I'll start:
A large network peering with another large network is likely to have more
than one session with it.
If all of the sessions are stuck (but still up), that's a much stronger
indicator, and maybe that'd be a good situation when auto-reaction would be
appropriate.
Assuming the auto-reaction was limited to large peers of a large network, I
think this is less risky, and still very effective.
(There would still be a challenge of how to share this state discovery
across an ASN, but that's a more constrained problem to solve, IMHO.)
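To make the "all sessions stuck" test concrete, here is a rough sketch of what I have in mind. It assumes some collector already has an AS-wide view of every session toward each peer AS; the Session fields, the 90-second hold time and the min_sessions threshold are placeholders of mine, not anything taken from an implementation.

```python
from dataclasses import dataclass


@dataclass
class Session:
    peer_asn: int                # AS number of the remote peer
    established: bool            # session is still up
    secs_since_progress: float   # seconds since the peer last made progress


def stuck_peer_asns(sessions, hold_time=90.0, min_sessions=4):
    """Return peer ASNs for which every session is up but stuck.

    Only peers with at least min_sessions sessions are considered, so a
    single stuck session can never trigger a reaction on its own.
    """
    by_asn = {}
    for s in sessions:
        by_asn.setdefault(s.peer_asn, []).append(s)

    stuck = set()
    for asn, group in by_asn.items():
        if len(group) < min_sessions:
            continue  # too few sessions to be a strong indicator
        if all(s.established and s.secs_since_progress > hold_time
               for s in group):
            stuck.add(asn)
    return stuck
```

The min_sessions floor is what limits the reaction to large peers; how the per-session state gets collected across the AS is the part left unsolved above.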

> > It has occurred and can occur, ergo it needs to be handled outside of the
> > state machine proper.
> For the demonstrated case, a coordinated response was needed.  To some
> extent, the argument is for tooling to permit easy shutdown of sessions.
> Operators have plenty of provisioning machinery.  Writing up the use case
> and motivations for encouraging such a thing seems like something
> appropriate to an operational forum.
A large ASN may want a reliable, secure method of shutting down peers for
some modest duration. Is that also something to consider developing a
solution for?
Basically, a "Shoot me now, shoot me now." kind of thing.
I don't see that being possible without signatures à la RPKI, but I don't
know if it's something anyone would really want to have available.
The conceptual model would be that of a rapid administrative shutdown of
peering sessions, possibly with a pre-configured timer to re-enable
sessions and/or a start time?
Maybe signed with both a personal PGP key and an RPKI key, so as to have a
great deal of control over its use and trustworthiness.
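
As a very rough sketch of the conceptual model (not a proposal for a wire format): a request carries the requesting ASN, a start time and a duration, and a peer honors it only if the signature checks out and the current time falls inside the window. The ShutdownRequest type and its field names are made up for illustration, and the HMAC with a shared key is only a stand-in so the example runs with the standard library; the real thing would need public-key signatures (RPKI and/or PGP).

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownRequest:
    asn: int        # AS asking its peers to shut down sessions toward it
    start: int      # Unix time at which the shutdown takes effect
    duration: int   # seconds after which sessions may be re-enabled

    def payload(self) -> bytes:
        # Canonical byte string covered by the signature.
        return f"{self.asn}|{self.start}|{self.duration}".encode()


def sign(req: ShutdownRequest, key: bytes) -> str:
    # Stand-in for a real signature: HMAC-SHA256 over the payload.
    return hmac.new(key, req.payload(), hashlib.sha256).hexdigest()


def should_hold_down(req: ShutdownRequest, signature: str,
                     key: bytes, now: int) -> bool:
    """True if the request verifies and 'now' is inside its window."""
    if not hmac.compare_digest(sign(req, key), signature):
        return False
    return req.start <= now < req.start + req.duration
```

Because the duration is inside the signed payload, the sessions come back on their own once the window closes, without a second signed message.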