Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

"Jakob Heitz (jheitz)" <jheitz@cisco.com> Sat, 12 December 2020 03:29 UTC

From: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
To: Keyur Patel <keyur@arrcus.com>, Jeff Tantsura <jefftant.ietf@gmail.com>
CC: John Scudder <jgs=40juniper.net@dmarc.ietf.org>, "idr@ietf.org" <idr@ietf.org>
Date: Sat, 12 Dec 2020 03:29:02 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/4Y6p22AkYwAWwkixWxa20-3NRG4>

Good point, Keyur.
A receiver may be overwhelmed for a long time and keep its TCP window closed, to avoid
silly window syndrome or for some other reason. The receiver may still be functional
and able to clear its backlog, albeit slowly. Resetting such a session
will only make the situation worse. Telling the difference between this case
and a receiver stuck in a bug is difficult.

Regards,
Jakob.
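Jakob's point can be made concrete with a minimal sketch. It assumes the TCP stack exposes a running count of bytes the peer has acknowledged and the number of bytes still queued; on Linux, for example, tcp(7) documents the SIOCOUTQ ioctl for the send-queue size. The class and parameter names are hypothetical. A clock that restarts on any acknowledgement tolerates a slowly draining receiver, but a receiver that keeps its window at zero for longer than the hold time looks exactly like a stuck one:

```python
class SendProgressTracker:
    """Hypothetical helper: report when outbound data has made no progress for hold_time.

    acked_total is a running count of bytes the peer has acknowledged and
    outstanding is the number of bytes still queued; both are assumed to come
    from the TCP stack.
    """

    def __init__(self, hold_time: float, now: float = 0.0) -> None:
        self.hold_time = hold_time
        self.last_acked = 0
        self.last_progress = now

    def sample(self, now: float, acked_total: int, outstanding: int) -> bool:
        """Return True if the send side counts as stalled at time now."""
        if outstanding == 0 or acked_total > self.last_acked:
            # Nothing waiting, or the peer took more data: restart the clock.
            self.last_progress = now
        self.last_acked = max(self.last_acked, acked_total)
        return now - self.last_progress >= self.hold_time
```

With a 90-second hold time, a peer that acknowledges even one byte per minute is never flagged, while one that acknowledges nothing for 90 seconds is. A healthy receiver that simply keeps its window closed for longer would be flagged as well, which is the ambiguity described above.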

-----Original Message-----
From: Idr <idr-bounces@ietf.org> On Behalf Of Keyur Patel
Sent: Friday, December 11, 2020 5:31 PM
To: Jeff Tantsura <jefftant.ietf@gmail.com>
Cc: John Scudder <jgs=40juniper.net@dmarc.ietf.org>; idr@ietf.org
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

One more comment: 

The flapping of the session may result in persistent flaps (more network instability) if the session is brought up prematurely. Most issues are a side effect of excessive throttling of resources at the protocol/system level or of bugs in the implementation. That means manual intervention would be needed before the session is restarted.

Regards,
Keyur
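The risk of premature restarts is what the optional DampPeerOscillations session attribute and the IdleHoldTimer in RFC 4271 Section 8.1.1 are meant to address; the RFC leaves the damping method to the implementation. One common approach, sketched here with hypothetical names and assumed values, doubles the time spent in Idle after each flap and forgets earlier flaps once a session has stayed up for a while. It only delays automatic restarts; it does not replace the manual intervention Keyur describes.

```python
from dataclasses import dataclass


@dataclass
class PeerDamping:
    """Hypothetical exponential backoff for automatic restarts (not from RFC 4271)."""

    base_idle_hold: float = 5.0    # seconds in Idle after the first flap (assumed value)
    max_idle_hold: float = 600.0   # upper bound on the Idle wait (assumed value)
    stable_uptime: float = 3600.0  # uptime after which earlier flaps are forgotten (assumed)
    flaps: int = 0

    def on_session_down(self, uptime: float) -> float:
        """Record a session drop and return how long to stay in Idle."""
        if uptime >= self.stable_uptime:
            self.flaps = 0  # the session was stable, so restart the backoff
        delay = min(self.base_idle_hold * 2 ** self.flaps, self.max_idle_hold)
        self.flaps += 1
        return delay
```

For example, three quick flaps yield Idle waits of 5, 10 and 20 seconds, and repeated flapping is capped at 600 seconds.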

On 12/11/20, 3:57 PM, "Jeff Tantsura" <jefftant.ietf@gmail.com> wrote:

    The trade-off is (as often happens) between stability and convergence.
    Given the severity, I’d prefer a formalized approach rather than an implementation artifact (at the mercy of the Product Manager in charge ;-))

    Regards,
    Jeff

    > On Dec 11, 2020, at 15:30, Keyur Patel <keyur@arrcus.com> wrote:
    > 
    > One comment inlined #Keyur
    > 
    > On 12/11/20, 12:04 PM, "Idr on behalf of John Scudder" <idr-bounces@ietf.org on behalf of jgs=40juniper.net@dmarc.ietf.org> wrote:
    > 
    >    [all hats on]
    > 
    >    Hi Job,
    > 
    >    Thanks for bringing this up.
    > 
    >    To take the liberty of summarizing your wall of text :-) you’re saying that you believe BGP should tear down its session if it’s unable to send a message for the duration of the hold time. 
    > 
    >    Given that the conversation last time was inconclusive I think this is a good thing for the WG to discuss again. If you want to, you (or someone) could turn the idea into a short draft that updates RFC 4271, and we could have a WG adoption discussion about it. It might help focus the discussion but it’s not mandatory.
    > 
    >    I’ll point out a few things to start with —
    > 
    >    - Making it mandatory to apply hold time to the sending of messages would potentially make BGP peerings less stable. It clearly can’t make them *more* stable. Of course one can argue that if you haven’t been able to send a message for the hold time, the session has failed its metric of usefulness anyway, so any veneer of stability at this point is a harmful sham.
    >    - If I recall correctly, RST doesn’t work (or may not work) if you’re using the MD5 TCP option. Nothing much to be done, but be aware.
    >    - There is nothing stopping an implementation from doing what you describe now. The formalism that keeps you within the letter of 4271 would be that the implementation supplies a configuration option, that you set to enable the behavior. Once you’ve done that, when the implementation notices that the hold time has been exceeded in the outbound direction, it generates a ManualStop event for the session. 
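The formalism described in the last point can be sketched as follows. The option name send_hold_enabled, the Peer class and the notion of "outbound progress" (for instance, the peer acknowledging more data) are assumptions for illustration; the only RFC 4271 element is the ManualStop event (Event 2) that the knob generates. A Hold Time of zero disables the check, mirroring how a zero Hold Time disables the hold timer.

```python
import enum
from dataclasses import dataclass, field


class FsmEvent(enum.Enum):
    MANUAL_STOP = 2  # RFC 4271 Event 2: ManualStop


@dataclass
class Peer:
    hold_time: float                 # negotiated Hold Time, in seconds
    send_hold_enabled: bool = False  # hypothetical config option, off by default
    last_send_progress: float = 0.0  # last time the peer accepted outbound data
    events: list = field(default_factory=list)

    def on_send_progress(self, now: float) -> None:
        """Call whenever the peer is seen to take more outbound data."""
        self.last_send_progress = now

    def check_outbound(self, now: float) -> None:
        """Queue one ManualStop if the knob is on and nothing was sent for hold_time."""
        if (self.send_hold_enabled and self.hold_time > 0
                and now - self.last_send_progress >= self.hold_time
                and FsmEvent.MANUAL_STOP not in self.events):
            self.events.append(FsmEvent.MANUAL_STOP)
```

Because the behavior is off unless configured, an implementation built this way stays within the letter of RFC 4271, which is the point of the ManualStop formalism.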
    > 
    > #Keyur: +1 to what John said. This could very well be an implementation knob that generates ManualStop event.
    > 
    > Regards,
    > Keyur
    > 
    >    Thanks,
    > 
    >    —John
    > 
    >> On Dec 11, 2020, at 2:23 PM, Job Snijders <job@sobornost.net> wrote:
    >> 
    >> 
    >> Dear group,
    >> 
    >> Not too long ago an incident [1] in one Autonomous System resulted in
    >> the global Internet being unusable in many parts of the world for
    >> multiple hours. Some have reported the root cause was a 'configuration
    >> error', however I believe much of the observed communication blackouts
    >> in the global routing system stemmed from a pre-existing condition: a
    >> specific implementation property present in multiple implementations
    >> currently in use in the default-free zone.
    >> 
    >> Usually when an incident happens in one AS, affected parties can through
    >> unilateral action 'route around the problem', but the ability to 'route
    >> around problems' critically depends on the ability to distribute
    >> WITHDRAW or UPDATE messages. When messages are not processed, what
    >> generally was assumed to be a unilaterally solvable problem, now requires
    >> coordination between *all* neighbors of the suffering AS.
    >> 
    >> The global routing system requires every participant to process BGP
    >> messages, because the alternative is intervention on thousands of BGP
    >> devices to manually shutdown thousands of BGP sessions disconnecting the
    >> AS suffering from an incident, to help the rest of the default-free
    >> zone. I speak from experience when saying that coordinating a disconnection
    >> of an AS at global scale is incredibly hard and slow, and many approval
    >> levels must be worked through. It takes *hours* of phone calls & email
    >> chains, a time window during which internet traffic is routed towards
    >> stale (now blackholing) locations.
    >> 
    >> In the average ISP's network design using IBGP Route Reflectors, these
    >> blackout effects are aggravated when BGP sessions landing in such
    >> devices are not terminated when TCP causes the BGP session to stall.
    >> 
    >> The problem of how TCP and BGP-4 can interact has been discussed before,
    >> but I'm not sure the working group followed up with any publication
    >> detailing the problem and the solution.
    >> 
    >>   https://mailarchive.ietf.org/arch/msg/idr/q0Sx5d3zZjfOmOQ4lO2OZAHh9Lc/
    >> 
    >> Does everyone agree BGP-4 sessions MUST be terminated using a TCP RST
    >> (instead of a BGP-4 Cease NOTIFICATION) if the peer has indicated for
    >> the duration of the Hold Timer that the TCP receive window is zero?
    >> I'm fine with there being buttons to make this different, but the
    >> default for routers in the global Internet routing system should be to
    >> consider the remote peer to be 'a lost cause' when it won't accept new
    >> BGP messages for the duration of the hold timer.
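On Linux and BSD-style socket APIs, one common way to end a connection with a TCP RST instead of an orderly close is to enable SO_LINGER with a zero timeout and then close the socket; any data still queued, including a NOTIFICATION stuck behind a zero window, is discarded. A minimal sketch follows (the function name is hypothetical). As noted earlier in the thread, an RST may not be effective when the TCP MD5 signature option is in use.

```python
import socket
import struct


def abort_with_rst(sock: socket.socket) -> None:
    """Close a TCP socket so that the stack sends RST rather than FIN."""
    # struct linger {int l_onoff; int l_linger;}: linger enabled with a 0 s timeout
    # makes close() drop any unsent data and reset the connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

A speaker following this proposal would call such a function once the peer's receive window has stayed at zero for the hold time; the remote side then sees the connection reset rather than a Cease NOTIFICATION.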
    >> 
    >> Perhaps RFC 4271 Section 6.5 should be amended as follows:
    >> 
    >> OLD:
    >>   If a system does not receive successive KEEPALIVE, UPDATE, and/or
    >>   NOTIFICATION messages within the period specified in the Hold Time
    >>   field of the OPEN message, then the NOTIFICATION message with the
    >>   Hold Timer Expired Error Code is sent and the BGP connection is
    >>   closed.
    >> 
    >> NEW:
    >>   If a system does not receive (or is unable to send) successive
    >>   KEEPALIVE, UPDATE, and/or NOTIFICATION messages within the period
    >>   specified in the Hold Time field of the OPEN message, then the
    >>   NOTIFICATION message with the Hold Timer Expired Error Code is sent
    >>   and the BGP connection is closed. If the NOTIFICATION message cannot
    >>   be sent, the BGP connection is closed.
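The proposed NEW text amounts to one decision per timer check. The sketch below encodes it with hypothetical names: last_received is when the last KEEPALIVE, UPDATE or NOTIFICATION arrived, last_sent is when the peer last accepted outbound data (how that is measured is platform-specific and assumed here), and can_send_notification says whether a NOTIFICATION can still be written. A Hold Time of zero disables the timer, as in RFC 4271.

```python
from enum import Enum, auto


class HoldAction(Enum):
    NONE = auto()
    NOTIFY_AND_CLOSE = auto()  # send Hold Timer Expired NOTIFICATION, then close
    CLOSE = auto()             # NOTIFICATION cannot be sent: just close


def hold_timer_action(now: float, last_received: float, last_sent: float,
                      hold_time: float, can_send_notification: bool) -> HoldAction:
    """Sketch of the proposed RFC 4271 Section 6.5 text (hypothetical helper)."""
    if hold_time == 0:
        return HoldAction.NONE  # a zero Hold Time means the HoldTimer is not started
    receive_stalled = now - last_received >= hold_time
    send_stalled = now - last_sent >= hold_time  # the new "(or is unable to send)" clause
    if not (receive_stalled or send_stalled):
        return HoldAction.NONE
    if can_send_notification:
        return HoldAction.NOTIFY_AND_CLOSE
    return HoldAction.CLOSE
```

The only change from the OLD text is the send_stalled condition and the CLOSE branch; receive-side expiry behaves as before.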
    >> 
    >> This is an ongoing problem. I suspect the BGP Nyancat's [2] discoloration at
    >> the leftmost eye might have been caused by an active TCP session
    >> keeping a stale BGP session alive. But also the observations from "BGP
    >> Zombies: an Analysis of Beacons Stuck Routes" [3] could be explained by
    >> the problematic interaction between TCP and BGP.
    >> 
    >> I appreciate the work the IDR working group has done to *SOFTEN* the
    >> blow from implementation defects on global routing (RFC 7606 is a
    >> brilliant example of this), but I fear in this case there is no subtle
    >> way to say goodbye when the peer doesn't process messages in a timely
    >> fashion. It might be good to document this.
    >> 
    >> Kind regards,
    >> 
    >> Job
    >> 
    >> [1]: https://www.reuters.com/article/level-3-communi-outages-idUSL2N1CB00C
    >> [2]: https://labs.ripe.net/Members/cteusche/bgp-meets-cat
    >> [3]: https://www.iij-ii.co.jp/en/members/romain/pdf/romain_pam2019.pdf
    >> 
    >> _______________________________________________
    >> Idr mailing list
    >> Idr@ietf.org
    >> https://www.ietf.org/mailman/listinfo/idr