Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

"Jakob Heitz (jheitz)" <jheitz@cisco.com> Sat, 12 December 2020 07:15 UTC

From: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
To: Enke Chen <enchen@paloaltonetworks.com>, "idr@ietf.org" <idr@ietf.org>
Date: Sat, 12 Dec 2020 07:15:24 +0000
Message-ID: <BYAPR11MB32077A4C76A2D5B569837698C0C90@BYAPR11MB3207.namprd11.prod.outlook.com>
References: <CANJ8pZ_4OasVWQ+Z7UddOXF85RgMOQGbZni9Zpivy-wa0AXj3Q@mail.gmail.com>
In-Reply-To: <CANJ8pZ_4OasVWQ+Z7UddOXF85RgMOQGbZni9Zpivy-wa0AXj3Q@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/WXQLJeErpvDJuNR4gJBhSQQIQ-E>
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Killing the session and restarting everything from scratch is a scary
prospect with BGP, especially if it's just going to happen again.

How about we graceful-restart it instead?
Require GR to be in effect on the session.
Then either send a RST or silently drop the session and start a new one.
The receiver should accept the new session and do GR.
And, yeah, use a long timer. Don't restart again until at least the EOR
has been both sent and received, and only then start the boom timer again.
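
A rough sketch of that restart rule, in Python, might look like the
following. It reads "boom timer" as the timer that declares the session
stuck; the class name, the method names and the 600 second value are all
made up for illustration and do not come from any implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartGuard:
    """Hypothetical helper: decides when a stuck session may be GR-restarted."""
    gr_negotiated: bool                 # GR must be in effect on the session
    boom_timeout_s: float               # the "long timer"; value is an assumption
    eor_sent: bool = False
    eor_received: bool = False
    armed_at: Optional[float] = None    # None means the boom timer is not running

    def on_session_established(self) -> None:
        # A new session starts with no EOR exchanged and the timer stopped.
        self.eor_sent = False
        self.eor_received = False
        self.armed_at = None

    def on_eor_sent(self, now: float) -> None:
        self.eor_sent = True
        self._maybe_arm(now)

    def on_eor_received(self, now: float) -> None:
        self.eor_received = True
        self._maybe_arm(now)

    def on_progress(self, now: float) -> None:
        # The peer accepted data again, so restart the timer if it is running.
        if self.armed_at is not None:
            self.armed_at = now

    def should_restart(self, now: float) -> bool:
        # Never restart without GR, and never before both EORs were seen.
        if not self.gr_negotiated or self.armed_at is None:
            return False
        return now - self.armed_at >= self.boom_timeout_s

    def _maybe_arm(self, now: float) -> None:
        # Start the boom timer only once the EOR is both sent and received.
        if self.eor_sent and self.eor_received and self.armed_at is None:
            self.armed_at = now
```

Time is passed in explicitly so the rule can be exercised without a real
clock, e.g. RestartGuard(gr_negotiated=True, boom_timeout_s=600.0).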

Regards,
Jakob.

From: Idr <idr-bounces@ietf.org> On Behalf Of Enke Chen
Sent: Friday, December 11, 2020 10:15 PM
To: idr@ietf.org
Cc: Enke Chen <enchen@paloaltonetworks.com>
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Hi, Folks:

There is an interesting article titled "When TCP Sockets Refuse to Die":

      https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/

which recommends using the TCP keepalive option and the TCP_USER_TIMEOUT option together to deal with several TCP "stuck" scenarios.

To strike a balance between maintaining routing stability and working around these corner cases, how about we recommend using these two TCP options with a timeout value larger than the BGP per-session hold timer (e.g., 2 * bgp_holdtimer)?
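
As a minimal sketch of what that could mean on Linux, the Python below derives the socket option values from the hold time and applies them. The function names, the choice of three probes, and the way the budget is split between idle time and probe interval are assumptions made for illustration; only the "2 * bgp_holdtimer" total comes from the suggestion above.

```python
import socket


def stuck_session_tcp_options(hold_time_s: int, multiplier: int = 2, probes: int = 3) -> dict:
    """Derive keepalive and user-timeout values from the BGP hold time.

    The total budget is multiplier * hold_time_s. Half of it is spent idle
    before the first probe and the rest is spread over the probes; this
    split is an assumption, not something the proposal specifies.
    """
    total_s = multiplier * hold_time_s
    keepidle_s = max(1, total_s // 2)
    keepintvl_s = max(1, (total_s - keepidle_s) // probes)
    return {
        "TCP_KEEPIDLE": keepidle_s,           # seconds idle before the first probe
        "TCP_KEEPINTVL": keepintvl_s,         # seconds between probes
        "TCP_KEEPCNT": probes,                # unanswered probes before giving up
        "TCP_USER_TIMEOUT": total_s * 1000,   # Linux takes this in milliseconds
    }


def apply_tcp_options(sock: socket.socket, options: dict) -> None:
    """Enable keepalive and set whichever options this platform exposes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in options.items():
        optname = getattr(socket, name, None)  # missing on some platforms
        if optname is not None:
            sock.setsockopt(socket.IPPROTO_TCP, optname, value)
```

With a 90 second hold time this gives a 180 second budget: 90 seconds idle, then three probes 30 seconds apart, and a TCP_USER_TIMEOUT of 180000 ms.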

Thanks.   -- Enke
------
Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

"Jakob Heitz (jheitz)" <jheitz@cisco.com> Sat, 12 December 2020 03:29 UTC

Good point, Keyur.

A receiver may be overwhelmed for a long time and not open its TCP window, to avoid
silly window syndrome or for some other reason. The receiver may still be functional
and able to clear its backlog, albeit over a long time. Resetting such a session
will only make the situation worse. Telling the difference between this case
and a receiver stuck in a bug is difficult.

Regards,
Jakob.