Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Job Snijders <job@sobornost.net> Wed, 16 December 2020 20:07 UTC

Date: Wed, 16 Dec 2020 20:07:27 +0000
From: Job Snijders <job@sobornost.net>
To: Robert Raszuk <robert@raszuk.net>
Cc: "Jakob Heitz (jheitz)" <jheitz=40cisco.com@dmarc.ietf.org>, "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/uQzjEl4JlMGLvnDg-arx1Gt7X4k>

On Wed, Dec 16, 2020 at 08:53:25PM +0100, Robert Raszuk wrote:
> I think your observations may not always apply.
> 
> Imagine you are peering with a stub customer and he is only getting default
> route from you while advertising few routes to your ASN.

The stub customer still is not receiving KEEPALIVEs, and the stub has no
idea whether you want to announce routes in addition to the default
route, or withdraw the default route (for some type of maintenance). All
these important BGP messages simply never make it to the stuck peer.

Flapping the session increases chances of recovery, or at least draws
operator attention.

> At min make before break (new session should be established - if
> possible) before killing the old one.

What if the new session can't establish? The old session is still stuck
and most likely hurting the users of the network. As stated before in the
thread: this 'stuck' situation does *not* appear under normal
circumstances; many BGP sessions have been inspected. We now know of
multiple situations where automatic disconnection of 'stuck' peers would
have helped improve global routing.
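
To illustrate what automatic disconnection of a 'stuck' peer could look
like, here is a minimal sketch in Python. All names (PeerSession,
stuck_timeout, check) are hypothetical, and the 90-second value in the
usage note is only an assumed example; this is not any vendor's
implementation, nor text from the BGP-4 specification.

```python
from dataclasses import dataclass


@dataclass
class PeerSession:
    """Hypothetical per-peer state; all names are illustrative only."""
    stuck_timeout: float        # assumed knob: seconds without send progress before teardown
    last_send_progress: float   # monotonic time when the peer last accepted our bytes
    pending_bytes: int = 0      # bytes queued for the peer but not yet accepted by TCP
    established: bool = True

    def queue(self, count: int, now: float) -> None:
        # Queue a KEEPALIVE or UPDATE; start the clock if the queue was empty.
        if self.pending_bytes == 0:
            self.last_send_progress = now
        self.pending_bytes += count

    def on_bytes_sent(self, count: int, now: float) -> None:
        # The peer's TCP receive window opened and accepted some of our data.
        self.pending_bytes = max(0, self.pending_bytes - count)
        self.last_send_progress = now

    def check(self, now: float) -> bool:
        """Drop the session if the peer has accepted nothing for too long.

        Returns True when this call tore the session down.
        """
        if not self.established or self.pending_bytes == 0:
            return False
        if now - self.last_send_progress >= self.stuck_timeout:
            # A NOTIFICATION cannot be delivered while the window is 0,
            # so the sketch simply marks the session as closed.
            self.established = False
            return True
        return False
```

With stuck_timeout=90.0, a session that queues a KEEPALIVE at t=0 and
sees no send progress is still up when checked at t=30 and is torn down
when checked at t=90.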

A concession: vendors are free to make it possible to disable the new
improved behavior. If an operator knows of a deployment scenario where
after exchanging the OPEN & sending a single UPDATE (that default route)
no further bi-directional communication is required, sure. However, I'd
probably recommend considering RIP instead of BGP at that point ;-)

Kind regards,

Job

ps. No snark intended: I appreciate the working group looking at each
and every corner case. I understand it is quite unusual for someone to
point at the BGP-4 FSM and say 'I think we are hurting here', decades
into the deployment.