Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Gyan Mishra <hayabusagsm@gmail.com> Sun, 20 December 2020 21:05 UTC

From: Gyan Mishra <hayabusagsm@gmail.com>
Date: Sun, 20 Dec 2020 16:05:38 -0500
Message-ID: <CABNhwV3HyLgVGSjFmYiEzb5qQH-RzcqV59zd62Ch8GNznG3Jpw@mail.gmail.com>
To: Job Snijders <job@sobornost.net>
Cc: John Scudder <jgs=40juniper.net@dmarc.ietf.org>, Robert Raszuk <robert@raszuk.net>, "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/09ipc_PZhqEVufQVlMDcp1uO3qw>

Hi Job

I have a question about the adj-RIB-in and adj-RIB-out while the BGP
session was stuck with TCP paused.

Did rtr-A's adj-RIB-in and rtr-B's adj-RIB-out look normal, both showing
the 800k routes?

So from a NOC perspective, the peering looked almost normal except for the
red flag on rtr-A showing 0 or few routes.

rtr-A GR enabled?
adj-RIB-in: did this show few routes or 0, and did it change throughout the
stuck duration?
adj-RIB-out: 800k routes - looked normal

rtr-B GR enabled?
adj-RIB-in: 800k routes - looked normal
adj-RIB-out: did this show 800k routes out to the peer mentioned?

Also, at the interface level, I am guessing the peer was showing 0
traffic in either direction during the 46 minutes, and you probably also had
a ton of drops on the inter-ISP peering in either direction.

That was probably an immediate red flag for the NOC of the stuck state.

So when the NOC discovered the issue, they saw the one-way control plane.

Did the NOC think of bouncing the peer to recover?

Bouncing the session immediately would have forced the withdrawal and
immediate convergence.

Very interesting real-world problem, and thank you again for bringing it to
IDR.  We definitely need to find a solid solution, and it looks like we are
on our way.


Kind Regards

Gyan

On Wed, Dec 16, 2020 at 11:30 AM Job Snijders <job@sobornost.net> wrote:

> Dear all,
>
> A follow-up with some PCAP data to illustrate the issue.
>
> On Tue, Dec 15, 2020 at 08:00:06PM +0000, John Scudder wrote:
> >   rtr-A                   rtr-B
> >   (congested c-p)         (uncongested c-p)
> >   send window: >0         send window: 0
> >   recv window: 0          recv window: >0
> >
> > In this case we expect:
> >  a) rtr-B does not send any BGP packet (KEEPALIVE/UPDATE/NOTIFICATION)
> > to rtr-A in normal operating circumstances.
> >  b) rtr-A does not expect any KEEPALIVE/UPDATE packets from rtr-B. The
> > session remains established even if no packet is received in the
> > holdtime.
> >  c) rtr-A continues to send KEEPALIVE packets to rtr-B.
>
> A PCAP showing the above scenario is available here:
>
>     webhosted decoded: https://www.cloudshark.org/captures/6b120e111c76
>     original .pcap file:
> http://sobornost.net/~job/bgp_tcp_bad_interaction.pcap
>
> rtr-A == 165.254.255.17, rtr-B == 45.138.228.4
> The PCAP was captured on a wiretap applied between these BGP nodes.
>
> rtr-B is trying to send a full routing table (~ 800,000 routes) to
> rtr-A, rtr-A in turn has only a few (stale) downstream routes.
>
> Things are normal during the first 22 seconds (packets 1-149). At packet
> 150 it becomes clear rtr-B is not able to send UPDATEs (or KEEPALIVEs or
> WITHDRAWs) to rtr-A. This situation persisted for ~ 46 minutes, at which
> point I manually killed the session.
>
> During those 46 minutes (wall clock) the OutQ on rtr-B climbed to
> thousands. Even if rtr-B has decent OutQ deduplication (which could
> somewhat hide the detrimental effects of this situation), it is clear
> from the packet capture that rtr-B is not able to follow the spirit and
> intent of the BGP Hold Timers: BGP communication is completely and fully
> stalled in one direction. All of rtr-A's BGP messages are TCP ACKed, but
> zero progress is made sending anything from rtr-B to rtr-A.
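The one-way stall described above can be reproduced in miniature on a single host. The following is only a rough sketch under stated assumptions: a local socketpair() stands in for the TCP session, the names rtr_a and rtr_b merely mirror the roles in the capture, and the b"KEEPALIVE" payload is placeholder bytes rather than a real BGP message. It shows that once the receiving side stops reading, the sender's writes stop being accepted, while data in the opposite direction still flows.

```python
import socket


def one_way_stall_demo() -> tuple[int, bytes]:
    """Fill one direction of a local stream until it stalls, then use the other."""
    # socketpair() stands in for the TCP session between the two routers.
    rtr_a, rtr_b = socket.socketpair()
    try:
        rtr_b.setblocking(False)
        queued = 0
        # rtr_a never calls recv(), so its receive buffer fills up and the
        # kernel eventually refuses further data from rtr_b.
        while True:
            try:
                queued += rtr_b.send(b"\x00" * 4096)
            except BlockingIOError:
                break
        # The reverse direction is unaffected: rtr_a can still deliver data.
        rtr_a.sendall(b"KEEPALIVE")  # placeholder bytes, not a real BGP message
        rtr_b.settimeout(1.0)
        received = rtr_b.recv(64)
        return queued, received
    finally:
        rtr_a.close()
        rtr_b.close()


if __name__ == "__main__":
    queued, received = one_way_stall_demo()
    print(f"rtr_b queued {queued} bytes before stalling; still received {received!r}")
```

The number of bytes accepted before the stall depends on the operating system's socket buffer sizes, so it differs between hosts.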
>
> About the OpenBGPD example solution: OpenBGPD ('bgpd') is an integral
> part of the OpenBSD Network Operating System. The OpenBSD developers are
> responsible for bgpd, userland, the kernel, ssh, TCP subsystem, NIC
> drivers, all of it. Conceptually such a 'complete fullstack
> implementation' is no different than its more impressive siblings Junos
> (a complete operating system with an embedded BGP implementation), IOS
> XR, or SR-OS. Customers are interested in the complete product.
>
> I think vendors are expected to have full ownership of their entire
> product: customers most likely won't care how "Inside the router
> chassis" the BGP daemon(s), kernel(s), hypervisor, filesystem, NIC
> drivers, etc are separate components. What matters is what actually
> happens on the wire between individual BGP nodes.
>
> I think Takt offered a valuable insight:
>
>     "Usually there is some tx buffering going on. Just because write()
>     was successful doesn't mean a message actually arrived on the other
>     hand. But if write() blocked for an Holdtimer interval it is sure
>     there is an issue."
>     source: https://twitter.com/taktv6/status/1338223595487719436
>
> It's up to each implementation/vendor how to pull up state from lower
> layers into their BGP engine. In the case of OpenBGPD we are fortunate
> to have some infrastructure in place to accommodate this type of
> improvement. Other implementations might have to come up with different
> solutions depending on how they were designed to handle I/O or buffer BGP
> messages.
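One possible way to act on the write() observation quoted above is to track when the socket last accepted any queued output. This is a minimal sketch, not OpenBGPD's actual code: the class name SendProgressWatchdog, its methods, and the choice to reuse the Hold Time value as the threshold are all assumptions made for illustration. The caller passes in the current time so the logic can be exercised without waiting.

```python
from dataclasses import dataclass


@dataclass
class SendProgressWatchdog:
    """Tracks whether queued output is making progress toward the peer."""

    hold_time: float            # seconds; assumed to mirror the negotiated Hold Time
    last_progress: float = 0.0  # last time the socket accepted any bytes
    pending: int = 0            # bytes queued but not yet accepted by the socket

    def queue(self, nbytes: int, now: float) -> None:
        # Starting from an empty queue resets the reference point, so an
        # idle session is never mistaken for a stalled one.
        if self.pending == 0:
            self.last_progress = now
        self.pending += nbytes

    def wrote(self, nbytes: int, now: float) -> None:
        # Call with the byte count returned by a successful write()/send().
        if nbytes > 0:
            self.pending -= nbytes
            self.last_progress = now

    def stalled(self, now: float) -> bool:
        # Stalled means: output is waiting and nothing has been accepted
        # by the socket for at least hold_time seconds.
        return self.pending > 0 and now - self.last_progress >= self.hold_time


if __name__ == "__main__":
    watchdog = SendProgressWatchdog(hold_time=90.0)
    watchdog.queue(4096, now=0.0)
    watchdog.wrote(1024, now=10.0)
    print(watchdog.stalled(now=60.0))   # False
    print(watchdog.stalled(now=100.0))  # True
```

A BGP speaker built this way would check stalled() from its timer loop and tear the session down when it returns True. As the quote notes, a successful write only proves the local buffer took the data, so this detects a stall on the sending side, not delivery to the peer.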
>
> Compliance testing for the yet-to-be-submitted internet-draft will be
> gauged simply by looking at what happens on the wire between two nodes.
>
> Kind regards,
>
> Job
-- 

<http://www.verizon.com/>

Gyan Mishra
Network Solutions Architect
M 301 502-1347
13101 Columbia Pike, Silver Spring, MD