Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0
Robert Raszuk <robert@raszuk.net> Tue, 15 December 2020 23:18 UTC
References: <X9PHRuGndvsFzQrG@bench.sobornost.net> <CAOj+MME4OHmoqJfzNQ4Tj6+wCd1kJVHPfJsDbk_+Xh8fh5G8Dg@mail.gmail.com> <6F7C5906-51A8-43C2-8AEC-3DB74CB9941F@tix.at> <1B4E7C9D-BBFE-4865-87F9-133ACE55D122@cisco.com> <22C381D0-2174-4828-A724-FD97B2FE0BCB@tix.at> <9D6268BD-C555-4B9A-A883-9B55EEB5D5DA@juniper.net> <91D9B9F7-0DBE-45E6-84D5-2E3D9F8C44A1@tix.at> <X9kweQ5EtTL7tOAM@bench.sobornost.net>
In-Reply-To: <X9kweQ5EtTL7tOAM@bench.sobornost.net>
From: Robert Raszuk <robert@raszuk.net>
Date: Wed, 16 Dec 2020 00:18:36 +0100
Message-ID: <CAOj+MMFySPXpE8QxcO+7szKzQ78faQASYKnBUYg_h_aLd=P4Lg@mail.gmail.com>
To: Job Snijders <job@sobornost.net>
Cc: Christoph Loibl <c@tix.at>, John Scudder <jgs@juniper.net>, "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/V_ZJW5VKARw5bk4r8QotMcnzAjw>
Hi Job,

Putting all other concerns aside, I have a few questions ...

1. Is it BGP that should trigger the session RST or FIN, or TCP?

2. If it is BGP (TCP would not be aware of HOLD_SEND), how exactly do we
   know that the peer's window has been 0 for the HOLD_SEND TIME?

3. Which TCP socket option will return an error to BGP saying that the
   window for a given peer was 0 for the duration of X sec? I presume that
   even if it jumped above 0 for 100 ms, the timer would be reset,
   indicating the peer is still alive?

From your bgpd example you are not checking anything other than BGP's
ability to write to the out queue. So is this the suggestion now,
forgetting all about the TCP layer? Simply: if I cannot write anything to
a peer for over X sec, RST the session?

Hi John,

I think the suggestion is to add a second HOLD_SEND TIME, different from
the normal HOLD TIME. Also, there could be lots of different types of
peers, so unless HOLD_SEND were, say, 5 x HOLD, putting all peers under
the same time value may be suboptimal.

Thx,
R.

On Tue, Dec 15, 2020 at 10:54 PM Job Snijders <job@sobornost.net> wrote:

> On Tue, Dec 15, 2020 at 09:57:47PM +0100, Christoph Loibl wrote:
> > Thanks for answering my question in more detail. Maybe I was unclear
> > (but reading your email I think we are talking about the same).
> > > On 15.12.2020, at 21:00, John Scudder <jgs@juniper.net> wrote:
> > >
> > > I think you are talking about this scenario. I’ll copy the example
> > > from Rob’s message cited above:
> > >
> > > rtr-A                   rtr-B
> > > (congested c-p)         (uncongested c-p)
> > > send window: >0         send window: 0
> > > recv window: 0          recv window: >0
> > >
> > > In this case we expect:
> > > a) rtr-B does not send any BGP packet (KEEPALIVE/UPDATE/NOTIFICATION)
> > > to rtr-A in normal operating circumstances.
> > > b) rtr-A does not expect any KEEPALIVE/UPDATE packets from rtr-B. The
> > > session remains established even if no packet is received in the
> > > holdtime.
> > > c) rtr-A continues to send KEEPALIVE packets to rtr-B.
> >
> > The part I have trouble understanding is b). It is clear that rtr-A
> > will not receive any packets from rtr-B because rtr-B cannot send them
> > (send window: 0). But does "rtr-A does not expect any KEEPALIVE/UPDATE
> > packets from rtr-B" mean that rtr-A has essentially suspended its
> > hold-timer until it is ready to receive new messages and opens up its
> > recv window? If yes, why? I would expect timers to run independently
> > of the transport protocol.
>
> Yeah, I'd expect that too. We've seen congested BGP implementations
> continue to send KEEPALIVEs but not accept (or send!) other BGP
> messages. And rtr-B's attempts at KEEPALIVE just get TCP ACKed with zero
> window.
>
> I'd argue in the above scenario rtr-A is simply broken and rtr-B MUST
> proceed to close down the session towards rtr-A; rtr-B must clean up and
> generate WITHDRAWs for any routes pointing to rtr-A. By doing the
> clean-up rtr-B does both itself and rtr-A a favor. If the issue was
> transient, rtr-A and rtr-B will re-establish a few minutes later
> (IdleHoldTimer, right?) and things will normalize.
>
> Arguably and measurably, rtr-A is operating its Loc-RIB (forwarding)
> based on stale routing information (assuming rtr-A is working at all!):
> rtr-A has not received any WITHDRAWs, UPDATEs (or somewhat less
> importantly KEEPALIVEs) from rtr-B.
>
> Rtr-B is fully aware of this stale situation, because rtr-B was not able
> to write these BGP messages to the network: the messages are still in
> OutQ. Rtr-A didn't accept any KEEPALIVE (or UPDATE/WITHDRAW) from
> rtr-B.
>
> How to solve this? Claudio Jeker took a look at what it would take in
> OpenBGPD and came up with the following (tiny!) patch, which should be
> readable to most: https://marc.info/?l=openbsd-tech&m=160796802508185&w=2
>
> Ben Cox helped me create an 'EBGP peer from hell': a publicly accessible
> EBGP multihop instance which can reliably produce the undesirable
> TCP/BGP behavior we're discussing here.
> This 'peer from hell' will do
> the OPEN exchange but then manipulates the TCP recvwindow towards zero.
>
> All BGP implementations tested so far (5 famous ones) appear vulnerable
> because they continue to consider the BGP session healthy & stable
> (meanwhile OutQ keeps growing endlessly and zero BGP messages go across
> the wire).
>
> One network operator (with thousands of EBGP sessions in the DFZ)
> reported to me the above stalled-TCP scenario is *not* a common case on
> the Internet. On a normal day, a network operator will see no (zero)
> sessions stuck this way, which leads me to believe 'recvwind=0' ...
> *for the duration of the hold timer* is a very strong indicator of a
> really broken situation which implementations should attempt to
> resolve automatically.
>
> I believe BGP implementations are not helping any known deployment
> scenarios by *not* disconnecting a stuck peer; however, on the other
> hand, we now know about various operational examples where honoring
> recvwind=0 for (hours, days) longer than $holdtimer led to global scale
> problems.
>
> As the 'not-at-all progressing OutQ' situation seems somewhat rare in
> the wild (yet continues to happen from time to time) I think it is worth
> discussing & documenting how implementers can attempt to prevent this
> state from happening. It might help make the Internet 1% more robust.
>
> BGP implementers (or operators wanting to test their equipment): feel
> free to contact me off-list if you'd like to set up an EBGP multihop
> session towards the 'peer from hell' testbed. Testing potential
> solutions this way is quite easy; the behavior can be triggered within a
> few seconds.
>
> Kind regards,
>
> Job
>
> ps. At this moment we have (1) an attempt at problem description, (2) a
> demonstration BGP-4 implementation of a 'problem causer', and (3) a
> different BGP-4 implementation with a 'solution'.
> This enables IDR to
> test interoperability & (potentially revised) protocol compliance,
> hopefully moving the problem a bit from theoretical to practical
> reality? :)
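
The "if I cannot write anything to a peer for over X sec, RST the session" idea raised in this message can be illustrated with a minimal sketch. It is not the OpenBGPD patch linked above; the class name SendHoldTimer, the method on_write_result, and the parameter send_hold_time are hypothetical names chosen for illustration, and the sketch assumes the BGP speaker can observe only whether its own writes to the socket make progress, not the peer's TCP window itself.

```python
import time
from typing import Callable, Optional


class SendHoldTimer:
    """Tracks how long a peer's output queue has made no progress.

    Hypothetical illustration: only the result of local write attempts
    is observed, never the TCP window advertised by the peer.
    """

    def __init__(self, send_hold_time: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_hold_time = send_hold_time
        self._clock = clock
        self._stalled_since: Optional[float] = None

    def on_write_result(self, bytes_written: int, bytes_pending: int) -> None:
        """Call after every attempt to flush the output queue."""
        if bytes_written > 0 or bytes_pending == 0:
            # Some data left the queue, or nothing is waiting: not stalled.
            self._stalled_since = None
        elif self._stalled_since is None:
            # First failed attempt with data still queued: start timing.
            self._stalled_since = self._clock()

    def expired(self) -> bool:
        """True once the queue has been stuck for send_hold_time seconds."""
        if self._stalled_since is None:
            return False
        return self._clock() - self._stalled_since >= self.send_hold_time
```

In this sketch any successful write, however brief, clears the stall, which matches the presumption in question 3 that a window opening for 100 ms would reset the timer. Whether send_hold_time should equal the hold time or some multiple of it is left open, as in the discussion.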