[Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Job Snijders <job@sobornost.net> Fri, 11 December 2020 19:28 UTC

Date: Fri, 11 Dec 2020 19:23:50 +0000
From: Job Snijders <job@sobornost.net>
To: idr@ietf.org
Message-ID: <X9PHRuGndvsFzQrG@bench.sobornost.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/L9nWFBpW0Tci0c9DGfMoqC1j_sA>
Subject: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Dear group,

Not too long ago, an incident [1] in one Autonomous System left the
global Internet unusable in many parts of the world for multiple hours.
Some have reported that the root cause was a 'configuration error';
however, I believe much of the observed communication blackout in the
global routing system stemmed from a pre-existing condition: a specific
implementation property present in multiple implementations currently in
use in the default-free zone.

Usually, when an incident happens in one AS, affected parties can 'route
around the problem' through unilateral action, but that ability
critically depends on being able to distribute WITHDRAW or UPDATE
messages. When messages are not processed, what was generally assumed to
be a unilaterally solvable problem now requires coordination between
*all* neighbors of the suffering AS.

The global routing system requires every participant to process BGP
messages, because the alternative is intervention on thousands of BGP
devices to manually shut down thousands of BGP sessions, disconnecting
the AS suffering from an incident to help the rest of the default-free
zone. I speak from experience when saying that coordinating a
disconnection of an AS at global scale is incredibly hard and slow, and
many approval levels must be worked through. It takes *hours* of phone
calls & email chains, a time window during which Internet traffic is
routed towards stale (now blackholing) locations.

In the average ISP's network design using IBGP Route Reflectors, these
blackout effects are aggravated when BGP sessions landing in such
devices are not terminated when TCP causes the BGP session to stall.

The problem of how TCP and BGP-4 can interact has been discussed before,
but I'm not sure the working group followed up with any publication
detailing the problem and the solution.

    https://mailarchive.ietf.org/arch/msg/idr/q0Sx5d3zZjfOmOQ4lO2OZAHh9Lc/

Does everyone agree BGP-4 sessions MUST be terminated using a TCP RST
(instead of a BGP-4 Cease NOTIFICATION) if the peer has indicated for
the duration of the Hold Timer that the TCP receive window is zero?
I'm fine with there being buttons to make this different, but the
default for routers in the global Internet routing system should be to
consider the remote peer to be 'a lost cause' when it won't accept new
BGP messages for the duration of the hold timer.
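
The default behaviour asked for above could be sketched roughly as
follows. This is only an illustration under stated assumptions, not any
vendor's implementation: the names SendStallTracker and abort_with_rst
are hypothetical, and a send attempt that makes no progress on a
non-blocking socket is used as a stand-in for "the peer advertises a
zero receive window", since an application usually cannot read the
peer's window directly.

```python
import socket
import struct
from typing import Optional


class SendStallTracker:
    """Tracks how long the peer has accepted no bytes from us (hypothetical helper)."""

    def __init__(self, hold_time: float) -> None:
        self.hold_time = hold_time          # negotiated Hold Time, in seconds
        self.stalled_since: Optional[float] = None

    def record_send_attempt(self, now: float, bytes_accepted: int) -> None:
        """Call after each attempt to write queued BGP data to the socket."""
        if bytes_accepted > 0:
            self.stalled_since = None       # progress: the peer is draining data
        elif self.stalled_since is None:
            self.stalled_since = now        # first attempt that made no progress

    def peer_is_lost_cause(self, now: float) -> bool:
        """True once nothing could be sent for a full Hold Time."""
        if self.hold_time <= 0 or self.stalled_since is None:
            return False                    # Hold Time 0 means timers are not in use
        return now - self.stalled_since >= self.hold_time


def abort_with_rst(sock: socket.socket) -> None:
    """Close so the stack sends a RST instead of a FIN (assumes POSIX struct linger)."""
    # l_onoff=1, l_linger=0: discard unsent data and reset the connection on close.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

A speaker's event loop would call record_send_attempt() after every
write attempt and call abort_with_rst() once peer_is_lost_cause()
returns true. A 'button' to change the default could simply be a
configurable timeout in place of the negotiated Hold Time.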

Perhaps RFC 4271 Section 6.5 should be amended as follows:

OLD:
    If a system does not receive successive KEEPALIVE, UPDATE, and/or
    NOTIFICATION messages within the period specified in the Hold Time
    field of the OPEN message, then the NOTIFICATION message with the
    Hold Timer Expired Error Code is sent and the BGP connection is
    closed.

NEW:
    If a system does not receive (or is unable to send) successive
    KEEPALIVE, UPDATE, and/or NOTIFICATION messages within the period
    specified in the Hold Time field of the OPEN message, then the
    NOTIFICATION message with the Hold Timer Expired Error Code is sent
    and the BGP connection is closed. If the NOTIFICATION message cannot
    be sent, the BGP connection is closed.
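
To make the NEW text concrete, here is a minimal sketch of the expiry
path it describes. The message layout (16-octet marker, length, type 3,
Error Code 4) follows RFC 4271; the function names and the try_send and
close callbacks are hypothetical and stand in for whatever transport
layer an implementation has.

```python
import struct
from typing import Callable

MSG_NOTIFICATION = 3      # BGP message type (RFC 4271, Section 4.1)
HOLD_TIMER_EXPIRED = 4    # NOTIFICATION Error Code (RFC 4271, Section 4.5)


def build_hold_timer_expired() -> bytes:
    """Encode a NOTIFICATION with Error Code 4, subcode 0, and no data."""
    body = struct.pack("!BB", HOLD_TIMER_EXPIRED, 0)
    # Header: 16-octet all-ones marker, 2-octet total length, 1-octet type.
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), MSG_NOTIFICATION)
    return header + body


def on_hold_timer_expired(
    try_send: Callable[[bytes], bool],   # hypothetical: True if the data was accepted
    close: Callable[[], None],           # hypothetical: tears down the connection
) -> bool:
    """Send the NOTIFICATION if possible; close the connection either way."""
    try:
        sent = try_send(build_hold_timer_expired())
    except OSError:
        sent = False                     # e.g. the socket is no longer usable
    close()                              # closed even when the message cannot be sent
    return sent
```

The point of the sketch is that close() does not depend on the
NOTIFICATION being delivered: a peer that accepts no data cannot keep
the session alive.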

This is an ongoing problem. I suspect the discoloration at the leftmost
eye of the BGP Nyancat [2] might have been caused by an active TCP
session keeping a stale BGP session alive. The observations from "BGP
Zombies: an Analysis of Beacons Stuck Routes" [3] could also be
explained by the problematic interaction between TCP and BGP.

I appreciate the work the IDR working group has done to *SOFTEN* the
blow from implementation defects on global routing (RFC 7606 is a
brilliant example of this), but I fear in this case there is no subtle
way to say goodbye when the peer doesn't process messages in a timely
fashion. It might be good to document this.

Kind regards,

Job

[1]: https://www.reuters.com/article/level-3-communi-outages-idUSL2N1CB00C
[2]: https://labs.ripe.net/Members/cteusche/bgp-meets-cat
[3]: https://www.iij-ii.co.jp/en/members/romain/pdf/romain_pam2019.pdf