Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

john heasley <heas@shrubbery.net> Wed, 16 December 2020 00:15 UTC

Date: Wed, 16 Dec 2020 00:15:05 +0000
From: john heasley <heas@shrubbery.net>
To: "Jakob Heitz (jheitz)" <jheitz=40cisco.com@dmarc.ietf.org>
Cc: Keyur Patel <keyur@arrcus.com>, Jeff Tantsura <jefftant.ietf@gmail.com>, John Scudder <jgs=40juniper.net@dmarc.ietf.org>, "idr@ietf.org" <idr@ietf.org>
Message-ID: <X9lRiQF/y6emL15n@shrubbery.net>
References: <2F238121-E468-4D0F-A0FF-9D82E44C3247@arrcus.com> <57DF4DA1-256A-4FA9-8827-EFF6D9ED2A2E@gmail.com> <BBEA6C0A-5727-4D9F-8D7C-74E572ED612D@arrcus.com> <BYAPR11MB3207C98296234C953487D6ECC0C90@BYAPR11MB3207.namprd11.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <BYAPR11MB3207C98296234C953487D6ECC0C90@BYAPR11MB3207.namprd11.prod.outlook.com>
X-PGPkey: http://www.shrubbery.net/~heas/public-key.asc
X-note: live free, or die!
X-homer: i just want to have a beer while i am caring.
X-Claimation: an engineer needs a manager like a fish needs a bicycle
X-reality: only YOU can put an end to the embarrassment that is Tom Cruise
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/328gRCN3xJ4DFE3idEKvVmOb9tU>

Sat, Dec 12, 2020 at 03:29:02AM +0000, Jakob Heitz (jheitz):
> Good point Keyur.
> A receiver may be overwhelmed for a long time and not open its TCP window to avoid
> silly window syndrome or some other reason. The receiver may still be functional
> and able to clear its backlog, albeit in a long time. Resetting such a session
> will only make the situation worse. Telling the difference between this case
> and a receiver stuck in a bug is difficult.

It seems that closing after HOLDTIME would be fragile at boot time of the
receiver or on recovery of an IXP interface, when there is high demand -
which I think is Keyur's comment.  Maybe this should not be enforced until
an EoR marker, allowing a receiver to hold off new, GR, or Route Refresh
peers?
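
A rough sketch of that EoR gating, in Python, just to make the idea
concrete.  Everything here is hypothetical: the class name, the method
names, and the reading that "enforced" means "close once the peer's
window has stayed shut for HOLDTIME".  It is not taken from any
implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SendStallPolicy:
    """Hypothetical sender-side policy: only close a stalled session after EoR."""

    hold_time: float                       # negotiated HOLDTIME, in seconds
    eor_done: bool = False                 # True once the End-of-RIB marker has gone by
    stalled_since: Optional[float] = None  # time the peer's window was first seen closed

    def on_window_closed(self, now: float) -> None:
        # Remember only the start of the current stall.
        if self.stalled_since is None:
            self.stalled_since = now

    def on_progress(self, now: float) -> None:
        # The peer accepted data again, so the stall is over.
        self.stalled_since = None

    def on_eor(self) -> None:
        self.eor_done = True

    def should_close(self, now: float) -> bool:
        # Before EoR the receiver is allowed to hold us off indefinitely.
        if not self.eor_done or self.stalled_since is None:
            return False
        return now - self.stalled_since >= self.hold_time
```

With this shape a stall during the initial dump, a GR, or a Route Refresh
never trips the timer, since EoR cannot be written while the window is
closed; only a stall after EoR does.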

Could a sender test the liveness of a peer by attempting to open a
new session?  Would a successful 3-way handshake and commencement of BGP
OPEN be an indication that it should be more patient and increase its
"deadtimer" (HOLDTIME < STUCKTIME < PATHETICTIME)?  Clearly the remote has
been sending BGP keepalives, so perhaps not for all implementations.
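
A minimal sketch of the probe, assuming a plain TCP connect to port 179
stands in for "open a new session".  The function names and the timeout
are made up for illustration.  It only checks that the 3-way handshake
completes; on most stacks the kernel answers that on its own, so it says
nothing about whether the BGP process would go on to send an OPEN.

```python
import socket

BGP_PORT = 179  # well-known BGP port


def peer_completes_handshake(host: str, port: int = BGP_PORT,
                             timeout: float = 5.0) -> bool:
    """Return True if a fresh TCP connection to the peer can be established."""
    try:
        # create_connection() returns only after the handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: no sign of life from this probe.
        return False


def pick_deadline(holdtime: float, stucktime: float, probe_ok: bool) -> float:
    """Be more patient (STUCKTIME) if the probe succeeded, else stay at HOLDTIME."""
    return stucktime if probe_ok else holdtime
```

For example, pick_deadline(90, 300, peer_completes_handshake("192.0.2.1"))
would give the longer timer only when the peer still accepts connections
(192.0.2.1 is a documentation address, not a real peer).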

Could an implementation more tightly coupled to its TCP use the urgent
pointer to test liveness?