Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

William McCall <> Sat, 12 December 2020 15:38 UTC

On Sat, Dec 12, 2020 at 3:29 AM Jakob Heitz (jheitz)
<> wrote:
> Good point Keyur.
> A receiver may be overwhelmed for a long time and not open its TCP window to avoid
> silly window syndrome or some other reason. The receiver may still be functional
> and able to clear its backlog, albeit in a long time. Resetting such a session
> will only make the situation worse. Telling the difference between this case
> and a receiver stuck in a bug is difficult.
> Regards,
> Jakob.

Byzantine Generals again.

We ran into this problem recently. The update group behavior made a
single session's slowness cascade to all of the others, so there was
much screaming.

At a certain point, a slow neighbor is a pointless neighbor. If we saw a
number like a 2-day backlog, it would be an obvious problem that requires
a reset. Lower numbers get trickier.

It seems to me that we're looking at 3 categories:

1) Normal sessions - don't kill
2) Slow sessions - don't kill yet, do the slow session things.
3) Too slow - goodbye session.

Maybe a "minimum update rate" algo that would kick off #2 and #3 is a
possibility. A bit more flexible than just looking at queue depth or
getting too deep into other layers by looking at buffer characteristics.
I know some implementations implement #2, but I never bothered to dig
into how it works. #3 just seems like a logical extension.
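
To make the three categories concrete, here is a rough sketch of what a
"minimum update rate" check might look like. Everything in it is an
assumption for illustration: the names (classify, RateThresholds), the
threshold values, and the idea of a grace period before a reset are all
hypothetical and not taken from any implementation.

```python
from dataclasses import dataclass
from enum import Enum


class SessionState(Enum):
    NORMAL = "normal"      # category 1: don't kill
    SLOW = "slow"          # category 2: do the slow session things
    TOO_SLOW = "too_slow"  # category 3: goodbye session


@dataclass(frozen=True)
class RateThresholds:
    # Hypothetical knobs; real values would need operational tuning.
    slow_rate: float = 100.0      # updates/sec below which a peer counts as slow
    min_rate: float = 1.0         # updates/sec below which a peer is a reset candidate
    grace_seconds: float = 300.0  # how long a peer may stay below min_rate


def classify(updates_sent: int, interval_s: float, backlog: int,
             seconds_below_min: float, t: RateThresholds) -> SessionState:
    """Classify one peer from what was drained to it in the last interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if backlog == 0:
        # Nothing is queued, so a low rate only means the peer is idle.
        return SessionState.NORMAL
    rate = updates_sent / interval_s
    if rate >= t.slow_rate:
        return SessionState.NORMAL
    if rate < t.min_rate and seconds_below_min >= t.grace_seconds:
        return SessionState.TOO_SLOW
    return SessionState.SLOW
```

The caller would have to track seconds_below_min per peer. The point of
the sketch is only that the decision uses the drain rate while a backlog
exists, not the queue depth or the TCP window itself.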

When we ran into this problem recently, the only discussion of window
size was as an affirmation that the broken box was, in fact, the broken
box. But the problem description is always something like "stale routes"
or "routing loops" (undoubtedly because they are stale).