Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Job Snijders <job@sobornost.net> Wed, 16 December 2020 20:17 UTC

Date: Wed, 16 Dec 2020 20:17:49 +0000
From: Job Snijders <job@sobornost.net>
To: "Jakob Heitz (jheitz)" <jheitz=40cisco.com@dmarc.ietf.org>
Cc: "idr@ietf.org" <idr@ietf.org>
Message-ID: <X9prbamkn9gncHqz@bench.sobornost.net>
References: <91D9B9F7-0DBE-45E6-84D5-2E3D9F8C44A1@tix.at> <X9kweQ5EtTL7tOAM@bench.sobornost.net> <CAOj+MMFySPXpE8QxcO+7szKzQ78faQASYKnBUYg_h_aLd=P4Lg@mail.gmail.com> <BYAPR11MB3207412804697588E4AA3F03C0C60@BYAPR11MB3207.namprd11.prod.outlook.com> <20201216093614.GI68083@diehard.n-r-g.com> <4E9BEA12-998A-4AD1-B342-4F26AA6EBA69@cisco.com> <20201216174319.GM68083@diehard.n-r-g.com> <BYAPR11MB320759EE6ABC8AB863BC1838C0C50@BYAPR11MB3207.namprd11.prod.outlook.com> <X9phnLQWPIVrcjwo@bench.sobornost.net> <BYAPR11MB32073DE138C73D554530BC1BC0C50@BYAPR11MB3207.namprd11.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <BYAPR11MB32073DE138C73D554530BC1BC0C50@BYAPR11MB3207.namprd11.prod.outlook.com>
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/4NbRhShORhCb7sdaqa544Cqp1mA>
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

On Wed, Dec 16, 2020 at 07:42:39PM +0000, Jakob Heitz (jheitz) wrote:
> How far do you want to push this?
> Does anyone want to restart the router?

Not very far! :-)

(1) A node detecting its peer is stuck (in either sending or receiving
direction) only needs to stop and start the affected session.

(2) Separately, restarting the stuck router (the one advertising
recvwind=0) might also be required, but such action would be orthogonal
to the node from (1) proceeding to clean up routes & try again.

It won't be possible to specify a mechanism for part (2), but (1) very
much is within reach. Part (2) probably is done by field engineers,
while part (1) seems an appealing automated response to improve
robustness in multi-node networks.
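
To make part (1) a bit more concrete, here is a rough sketch of what
such a check could look like. It is not taken from any implementation:
the class, the method names, and the choice to reuse the hold time as
the limit for the sending direction are all assumptions made purely for
illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    RESET_SESSION = "reset-session"  # stop and start only this session


@dataclass
class PeerWatchdog:
    """Hypothetical per-session stall detector (illustrative only)."""

    hold_time: float           # seconds; assumed to be the negotiated hold time
    last_recv: float           # when the last message arrived from the peer
    last_send_progress: float  # when the peer last accepted bytes from us
    send_pending: bool = False # True while data is queued towards the peer

    def on_receive(self, now: float) -> None:
        self.last_recv = now

    def on_send_queued(self, now: float) -> None:
        # Start the send-side clock only when the queue goes non-empty.
        if not self.send_pending:
            self.send_pending = True
            self.last_send_progress = now

    def on_send_progress(self, now: float, queue_empty: bool) -> None:
        self.last_send_progress = now
        self.send_pending = not queue_empty

    def check(self, now: float) -> Action:
        # Receiving direction: nothing heard from the peer for hold_time.
        recv_stuck = now - self.last_recv > self.hold_time
        # Sending direction: data queued, but the peer has accepted nothing
        # for hold_time (for example because it advertises recvwind=0).
        # Using hold_time as the limit here is an assumption.
        send_stuck = (
            self.send_pending
            and now - self.last_send_progress > self.hold_time
        )
        if recv_stuck or send_stuck:
            return Action.RESET_SESSION
        return Action.KEEP
```

The point of the sketch is that the decision is local to one session:
the node resets that session, cleans up the routes, and tries again,
regardless of what later happens to the stuck router itself.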

Kind regards,

Job