Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

"Jakob Heitz (jheitz)" <jheitz@cisco.com> Wed, 16 December 2020 19:26 UTC

From: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
To: Brian Dickson <brian.peter.dickson@gmail.com>
CC: Claudio Jeker <cjeker@diehard.n-r-g.com>, "idr@ietf.org" <idr@ietf.org>
Date: Wed, 16 Dec 2020 19:26:16 +0000
Message-ID: <BYAPR11MB32076892E23403C8A6E61715C0C50@BYAPR11MB3207.namprd11.prod.outlook.com>
References: <6F7C5906-51A8-43C2-8AEC-3DB74CB9941F@tix.at> <1B4E7C9D-BBFE-4865-87F9-133ACE55D122@cisco.com> <22C381D0-2174-4828-A724-FD97B2FE0BCB@tix.at> <9D6268BD-C555-4B9A-A883-9B55EEB5D5DA@juniper.net> <91D9B9F7-0DBE-45E6-84D5-2E3D9F8C44A1@tix.at> <X9kweQ5EtTL7tOAM@bench.sobornost.net> <CAOj+MMFySPXpE8QxcO+7szKzQ78faQASYKnBUYg_h_aLd=P4Lg@mail.gmail.com> <BYAPR11MB3207412804697588E4AA3F03C0C60@BYAPR11MB3207.namprd11.prod.outlook.com> <20201216093614.GI68083@diehard.n-r-g.com> <4E9BEA12-998A-4AD1-B342-4F26AA6EBA69@cisco.com> <20201216174319.GM68083@diehard.n-r-g.com> <BYAPR11MB320759EE6ABC8AB863BC1838C0C50@BYAPR11MB3207.namprd11.prod.outlook.com> <CAH1iCipjgS4-NPTjNhc7Cj73bitWgTcw=ufax7iOCCnT+xGiZQ@mail.gmail.com>
In-Reply-To: <CAH1iCipjgS4-NPTjNhc7Cj73bitWgTcw=ufax7iOCCnT+xGiZQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/rK0Z2suopQofPOttVo_NU4JvPlk>

Brian,

We went through this discussion when developing RFC 7606<https://tools.ietf.org/html/rfc7606>.
What errors deserve what consequences?
In implementations, we ended up making them all configurable.
We can certainly make this one optional too.

Regards,
Jakob.

From: Brian Dickson <brian.peter.dickson@gmail.com>
Sent: Wednesday, December 16, 2020 11:18 AM
To: Jakob Heitz (jheitz) <jheitz@cisco.com>
Cc: Claudio Jeker <cjeker@diehard.n-r-g.com>; idr@ietf.org
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0



On Wed, Dec 16, 2020 at 10:51 AM Jakob Heitz (jheitz) <jheitz=40cisco.com@dmarc.ietf.org> wrote:
The restarting speaker in this case did not actually restart.
It just restarted this one session. There is no reason for it
to delete any forwarding state. There is no evidence of any
problem with its received routes, only with the routes it sent
to the stuck peer. It may still set the (F) bit to force the
stuck peer to WITHDRAW.

The transitive nature of BGP pretty much requires that the safest choice be taken.

Which is to say, there is EVERY reason to delete forwarding state.
If a peer's router is so messed up that it is not accepting any TCP packets, the only safe assumption is that the problem is AS-wide for that peer.
In which case, the only data point available (the TCP session) indicates the problem is very likely affecting other routing announcements towards that ASN.
And the only SAFE behavior is to tear down the session completely: remove the received routes from the IN-RIB, update the RIB and FIB, and go on with life.

The incident that I believe led to this proposal was super nasty, and only a fix like this could handle that.
There is no reason to believe this can't/won't happen again in future.
As Jared notes, there are likely other implementations that would fare as poorly if they encountered the triggering situation.

While this is my opinion on the best way to handle it, the underlying facts aren't arguable.
An AS-wide situation (stuck receivers with no TCP progress) would never result in the AS sending withdrawals.
The current handling of that is demonstrably broken.
The assumption that the BGP state machine can be fully relied upon is no longer sound.

It probably never was a completely safe assumption.
IFF the state machine is working correctly, this corner case would never occur.
It has occurred and can occur, ergo it needs to be handled outside of the state machine proper.

Brian
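
For readers following the thread, the teardown behavior argued for above can be sketched roughly as follows. This is an illustrative model only, not any vendor's implementation: the names Peer, Router, last_send_progress, and the use of the negotiated hold time as the send-stall threshold are all assumptions made for the example. It shows a session being torn down when the peer keeps sending but has stopped accepting our data, after which the peer's received routes are dropped and forwarding is recomputed.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    # All fields are hypothetical names chosen for this sketch.
    name: str
    hold_time: float           # negotiated hold time, in seconds
    last_received: float       # when we last received a message from the peer
    last_send_progress: float  # when the peer's TCP last accepted data from us
    established: bool = True


@dataclass
class Router:
    # adj_rib_in[peer_name][prefix] = preference (higher wins); a toy model.
    adj_rib_in: dict = field(default_factory=dict)
    fib: dict = field(default_factory=dict)  # prefix -> name of the chosen peer

    def recompute_fib(self):
        # Pick, per prefix, the peer offering the highest preference.
        best = {}
        for peer_name, routes in self.adj_rib_in.items():
            for prefix, pref in routes.items():
                if prefix not in best or pref > best[prefix][0]:
                    best[prefix] = (pref, peer_name)
        self.fib = {prefix: peer_name for prefix, (_, peer_name) in best.items()}

    def tear_down(self, peer):
        # Full teardown: drop everything learned from the peer, then reconverge.
        peer.established = False
        self.adj_rib_in.pop(peer.name, None)
        self.recompute_fib()

    def check(self, peer, now):
        """Tear the session down if needed; return the reason, or None."""
        if not peer.established:
            return None
        if now - peer.last_received > peer.hold_time:
            reason = "hold timer expired"
        elif now - peer.last_send_progress > peer.hold_time:
            # The peer still talks to us but accepts nothing we send
            # (for example, its TCP receive window stays at 0).
            reason = "send stalled"
        else:
            return None
        self.tear_down(peer)
        return reason
```

In this sketch, a peer whose last_received is current but whose last_send_progress is older than the hold time is treated the same as a peer whose hold timer expired. Whether that check should be mandatory or configurable is the open question in the thread.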