Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Job Snijders <job@sobornost.net> Fri, 11 December 2020 22:25 UTC

Date: Fri, 11 Dec 2020 22:24:22 +0000
From: Job Snijders <job@sobornost.net>
To: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
Cc: "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/yN6feiEii0qhx7dHyuMbb_oYpjY>

Dear Jakob, Robert, Jeff, John, Tony,

On Fri, Dec 11, 2020 at 10:07:57PM +0000, Jakob Heitz (jheitz) wrote:
> Perhaps we are focusing too much on TCP. The real issue is that BGP
> can't send a single byte for the duration of the hold timer.  We could
> instead say if the socket is send blocked.
> 
> Now, I think BGP should close the socket ungracefully, with close() and no shutdown()
> and no NOTIFICATION. It cannot send a NOTIFICATION if the socket is blocked.
> The management plane issue can be fixed in any number of ways.

Good feedback. I have a feeling we are all pointing at the same thing
but using different words to describe it. Increasing the robustness of
the global routing system is the goal here, so with that in mind:

I agree with Tony's edit suggestions at https://mailarchive.ietf.org/arch/msg/idr/CS7VOx42V76RfGLmXhQqM8DyNjk/

I think the desired behavior is that locally the system emits a log
message 'hold time expired', but on the control plane really makes sure
the stuck BGP session is killed and discarded and withdraws are
generated. (Regardless of whether the Cease NOTIFICATION could be sent,
arrived, or was lost in transmission!)
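
A minimal sketch of that ordering, assuming a hypothetical handler
on_hold_timer_expired() and a hypothetical withdraw_routes callback
(neither comes from any real BGP implementation). The NOTIFICATION is
encoded per RFC 4271 with error code 4 (Hold Timer Expired); sending it
is best effort, while the teardown and the withdraws happen
unconditionally:

```python
import logging
import socket

log = logging.getLogger("bgp")

# RFC 4271 NOTIFICATION: 16-byte all-ones marker, 2-byte length (21),
# type 3 (NOTIFICATION), error code 4 (Hold Timer Expired), subcode 0.
HOLD_TIMER_EXPIRED = b"\xff" * 16 + (21).to_bytes(2, "big") + bytes([3, 4, 0])


def on_hold_timer_expired(sock, withdraw_routes):
    """Hypothetical handler: always tear down, NOTIFICATION is best effort."""
    log.warning("hold time expired")
    notified = False
    try:
        # Never wait on a peer that may have a zero receive window.
        sock.setblocking(False)
        sent = sock.send(HOLD_TIMER_EXPIRED)
        notified = sent == len(HOLD_TIMER_EXPIRED)
    except OSError:
        # Send-blocked or already broken: carry on with the teardown.
        notified = False
    finally:
        sock.close()
        withdraw_routes()  # assumed callback that withdraws the peer's routes
    return notified
```

The return value only records whether the NOTIFICATION was handed to the
socket; it says nothing about whether the peer ever received it.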

As both a developer and an operator you'd want to inspect the captured
TCP 'outq' and the syslog output, so there might be merit to Robert's
suggestion to make this an implementation checklist item:
https://mailarchive.ietf.org/arch/msg/idr/c9uqI9l815RrOUayFdLdV3ntNc0/
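
One way to capture the 'outq' at the moment the timer fires is sketched
below. This is Linux-specific and an assumption about what an
implementation might log, not something taken from any particular BGP
daemon: the SIOCOUTQ ioctl (same value as termios.TIOCOUTQ on Linux)
reports how many bytes sit in the socket send queue without having been
acknowledged by the peer. The helper name unsent_bytes() is hypothetical.

```python
import fcntl
import socket
import struct
import termios


def unsent_bytes(sock: socket.socket) -> int:
    """Return the send-queue depth in bytes (Linux SIOCOUTQ, assumed platform)."""
    # The kernel writes a C int into the buffer we pass in.
    raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]
```

Logging this value next to the 'hold time expired' message would show
whether the session was send-blocked when it was torn down.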

> I advocate for TCP to send a RST rather than wait for the window to open up and
> drain the queue with a FIN at the end.
> Allowing TCP to wait only drags out the pain. It must die. And it must die NOW.
> 
> Holding up WITHDRAWs is serious, because it does not allow redundant paths
> to take effect.

Yes, it is very problematic in the global context.
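
For reference, the usual sockets-API way to get an RST instead of a
queue drain followed by a FIN is to enable SO_LINGER with a zero timeout
before close(). The sketch below assumes a platform where struct linger
is two C ints (Linux and the BSDs); abort_connection() is a hypothetical
helper name:

```python
import socket
import struct


def abort_connection(sock: socket.socket) -> None:
    """Abortive close: discard queued data and reset the connection."""
    # l_onoff=1, l_linger=0: close() drops unsent data and sends RST.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

With this, close() returns immediately and anything still sitting in the
send queue is thrown away rather than held until the peer's window
reopens.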

> Also, a RST allows a recovery using graceful restart, whereas
> NOTIFICATION and FIN precludes GR.  Notwithstanding RFC8538.

The interactions with GR I defer to gurus like yourself, Robert, Jeff,
John, and Tony.

Kind regards,

Job