[Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0
Job Snijders <job@sobornost.net> Fri, 11 December 2020 19:28 UTC
Return-Path: <job@sobornost.net>
X-Original-To: idr@ietfa.amsl.com
Delivered-To: idr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id EE9243A0E30 for <idr@ietfa.amsl.com>; Fri, 11 Dec 2020 11:28:21 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Level:
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cHUL2cCgamyD for <idr@ietfa.amsl.com>; Fri, 11 Dec 2020 11:28:19 -0800 (PST)
Received: from outbound.soverin.net (outbound.soverin.net [IPv6:2a01:4f8:fff0:2d:8::215]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 140F23A0DE9 for <idr@ietf.org>; Fri, 11 Dec 2020 11:28:18 -0800 (PST)
Received: from smtp.freedom.nl (unknown [10.10.3.36]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by outbound.soverin.net (Postfix) with ESMTPS id 5C637600E6 for <idr@ietf.org>; Fri, 11 Dec 2020 19:28:17 +0000 (UTC)
Received: from smtp.freedom.nl (smtp.freedom.nl [116.202.65.211]) by soverin.net
Received: from localhost (bench.sobornost.net [local]) by bench.sobornost.net (OpenSMTPD) with ESMTPA id 02ea0842 for <idr@ietf.org>; Fri, 11 Dec 2020 19:28:16 +0000 (UTC)
Resent-From: Job Snijders <job@sobornost.net>
Resent-Date: Fri, 11 Dec 2020 19:28:16 +0000
Resent-Message-ID: <X9PIUNlIXCLq+dKe@bench.sobornost.net>
Resent-To: idr@ietf.org
Date: Fri, 11 Dec 2020 19:23:50 +0000
From: Job Snijders <job@sobornost.net>
To: idr@ietf.org
Message-ID: <X9PHRuGndvsFzQrG@bench.sobornost.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
X-Clacks-Overhead: GNU Terry Pratchett
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/L9nWFBpW0Tci0c9DGfMoqC1j_sA>
Subject: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/idr/>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Dec 2020 19:28:22 -0000
Dear group, Not too long ago an incident [1] in one Autonomous System resulted in the global Internet being unusable in many parts of the world for multiple hours. Some have reported the root cause was a 'configuration error', however I believe much of the observed communication blackouts in the global routing system stemmed from a pre-existing condition: a specific implementation property present in multiple implementations currently in use in the default-free zone. Usually when an incident happens in one AS, affected parties can through unilateral action 'route around the problem', but the ability to 'route around problems' critically depends on the ability to distribute WITHDRAW or UPDATE messages. When messages are not processed, what generally was assumed to be a unilaterally solvable problem, now requires coordination between *all* neighbors of the suffering AS. The global routing system requires every participant to process BGP messages, because the alternative is intervention on thousands of BGP devices to manually shutdown thousands of BGP sessions disconnecting the AS suffering from an incident, to help the rest of the default-free zone. I speak from experience when saying that coordinating a disconnection of an AS at global scale is incredibly hard and slow, any many approval levels must be worked through. It takes *hours* of phone calls & email chains, a time window during which internet traffic is routed towards stale (now blackholing) locations. In the average ISP's network design using IBGP Route Reflectors, these blackout effects are aggravated when BGP sessions landing in such devices are not terminated when TCP causes the BGP session to stall. The problem of how TCP and BGP-4 can interact has been discussed before, but I'm not sure the working group followed up with any publication detailing the problem and the solution. https://mailarchive.ietf.org/arch/msg/idr/q0Sx5d3zZjfOmOQ4lO2OZAHh9Lc/ Does everyone agree BGP-4 sessions MUST be terminated using a TCP RST (instead of a BGP-4 Cease NOTIFICATION) if the peer has indicated for the duration of the Hold Timer that the TCP receive window is zero? I'm fine with there being buttons to make this different, but the default for routers in the global Internet routing system should be to consider the remote peer to be 'a lost cause' when it won't accept new BGP messages for the duration of the hold timer. Perhaps RFC 4271 Section 6.5 should be amended as following: OLD: If a system does not receive successive KEEPALIVE, UPDATE, and/or NOTIFICATION messages within the period specified in the Hold Time field of the OPEN message, then the NOTIFICATION message with the Hold Timer Expired Error Code is sent and the BGP connection is closed. NEW: If a system does not receive (or is unable to send) successive KEEPALIVE, UPDATE, and/or NOTIFICATION messages within the period specified in the Hold Time field of the OPEN message, then the NOTIFICATION message with the Hold Timer Expired Error Code is sent and the BGP connection is closed. If the NOTIFICATION message cannot be send the BGP connection is closed. This is an ongoing problem. I suspect the BGP Nyancat's discoloration at the left most eye might have been caused by an active TCP session keeping a stale BGP session alive. But also the observations from "BGP Zombies: an Analysis of Beacons Stuck Routes" [3] could be explained by the problematic interaction between TCP and BGP. I appreciate the work the IDR working group has done to *SOFTEN* the blow from implementation defects on global routing (RFC 7606 is a brilliant example of this), but I fear in this case there is no subtle way to say goodbye when the peer doesn't process messages in a timely fashion. It might be good to document this. Kind regards, Job [1]: https://www.reuters.com/article/level-3-communi-outages-idUSL2N1CB00C [2]: https://labs.ripe.net/Members/cteusche/bgp-meets-cat [3]: https://www.iij-ii.co.jp/en/members/romain/pdf/romain_pam2019.pdf
- [Idr] TCP & BGP: Some don't send terminate BGP wh… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Tony Li
- Re: [Idr] TCP & BGP: Some don't send terminate BG… John Scudder
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeff Tantsura
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Tony Li
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Keyur Patel
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeff Tantsura
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Keyur Patel
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Christoph Loibl
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Christoph Loibl
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jared Mauch
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jared Mauch
- Re: [Idr] TCP & BGP: Some don't send terminate BG… William McCall
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jared Mauch
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Randy Bush
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jared Mauch
- Re: [Idr] TCP & BGP: Some don't send terminate BG… John Scudder
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Christoph Loibl
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… John Scudder
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… John Scudder
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… john heasley
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Tony Li
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Keyur Patel
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Keyur Patel
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Brian Dickson
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Claudio Jeker
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Claudio Jeker
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… John Heasley
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Claudio Jeker
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Brian Dickson
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Brian Dickson
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Brian Dickson
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gert Doering
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Claudio Jeker
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Brian Dickson
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Brian Dickson
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jakob Heitz (jheitz)
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gyan Mishra
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… John Scudder
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gyan Mishra
- Re: [Idr] TCP & BGP: Some don't send terminate BG… William McCall
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gyan Mishra
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gyan Mishra
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gyan Mishra
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Jeffrey Haas
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gyan Mishra
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Robert Raszuk
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gyan Mishra
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Gyan Mishra
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Job Snijders
- Re: [Idr] TCP & BGP: Some don't send terminate BG… Enke Chen