Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

"Jakob Heitz (jheitz)" <jheitz@cisco.com> Wed, 16 December 2020 17:03 UTC

From: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
To: "Jakob Heitz (jheitz)" <jheitz=40cisco.com@dmarc.ietf.org>
CC: Claudio Jeker <cjeker@diehard.n-r-g.com>, "idr@ietf.org" <idr@ietf.org>
Date: Wed, 16 Dec 2020 17:02:53 +0000
Message-ID: <4376EA43-985B-4894-80E2-146031D87BB0@cisco.com>
References: <X9PHRuGndvsFzQrG@bench.sobornost.net> <CAOj+MME4OHmoqJfzNQ4Tj6+wCd1kJVHPfJsDbk_+Xh8fh5G8Dg@mail.gmail.com> <6F7C5906-51A8-43C2-8AEC-3DB74CB9941F@tix.at> <1B4E7C9D-BBFE-4865-87F9-133ACE55D122@cisco.com> <22C381D0-2174-4828-A724-FD97B2FE0BCB@tix.at> <9D6268BD-C555-4B9A-A883-9B55EEB5D5DA@juniper.net> <91D9B9F7-0DBE-45E6-84D5-2E3D9F8C44A1@tix.at> <X9kweQ5EtTL7tOAM@bench.sobornost.net> <CAOj+MMFySPXpE8QxcO+7szKzQ78faQASYKnBUYg_h_aLd=P4Lg@mail.gmail.com> <BYAPR11MB3207412804697588E4AA3F03C0C60@BYAPR11MB3207.namprd11.prod.outlook.com>, <20201216093614.GI68083@diehard.n-r-g.com>, <4E9BEA12-998A-4AD1-B342-4F26AA6EBA69@cisco.com>
In-Reply-To: <4E9BEA12-998A-4AD1-B342-4F26AA6EBA69@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/GR33NnklKs3zMqXaPxBsTBbJsus>
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Sorry, I responded incorrectly. The iPad only showed me a portion of the message, so I misinterpreted.

Regards,
Jakob.


> On Dec 16, 2020, at 8:22 AM, Jakob Heitz (jheitz) <jheitz=40cisco.com@dmarc.ietf.org> wrote:
> 
> No. It's not closed with a NOTIFICATION. The send queue is frozen. No data, not even a NOTIFICATION, is going to get to rtr-A. The only thing that will get there is a TCP RST and/or a new TCP SYN.
> 
> Regards,
> Jakob.
> 
> 
>>> On Dec 16, 2020, at 1:36 AM, Claudio Jeker <cjeker@diehard.n-r-g.com> wrote:
>>> 
>>> On Tue, Dec 15, 2020 at 11:39:52PM +0000, Jakob Heitz (jheitz) wrote:
>>> If you tell the socket to shutdown and then close, it will attempt to
>>> send everything in the queue with the FIN at the end.
>>> Then wait for the FIN ACK and all manner of nonsense to bore rtr-B to tears.
>>> So, to get on with it, send the RST.
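
A minimal sketch of what "send the RST" could look like at the socket level, assuming a BSD-style sockets API where SO_LINGER with a zero timeout makes close() abortive. The helper names are hypothetical and not taken from any BGP implementation, and the struct layout is an assumption that holds on Linux and the BSDs but not on Windows.

```python
import socket
import struct


def enable_abortive_close(sock: socket.socket) -> None:
    """Arm SO_LINGER with a zero timeout so that close() aborts the connection.

    With l_onoff != 0 and l_linger == 0, close() discards unsent data and the
    stack sends a RST instead of draining the queue and doing the FIN exchange.
    Assumption: struct linger is two C ints, as on Linux and the BSDs.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


def abort_connection(sock: socket.socket) -> None:
    """Hypothetical helper: drop a stuck session without flushing its queue."""
    enable_abortive_close(sock)
    sock.close()
```

The point of the design is that nothing queued behind the zero window has to be delivered before the session goes away.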
>>> 
>>> Next question is what to do if GR is in effect.
>>> rtr-A will dutifully retain all the routes from rtr-B and Job's beloved WITHDRAW
>>> will still not happen.
>>> The new session will come up (maybe), rtr-B will send all its routes again and
>>> (if it doesn't get stuck again) will send its EOR. Only now can Job breathe easy.
>>> 
>>> Might we need a new bit in the GR capability in the OPEN message?
>>> "WITHDRAW ALL MY ROUTES NOW"
>> 
>> GR should not be an issue since the connection is closed with a
>> NOTIFICATION. At least the system detecting the stuck session will flush
>> and WITHDRAW all routes. In the next OPEN message this system will neither
>> set the R flag nor the F flag and so the stuck system will WITHDRAW
>> all routes as well.
>> 
>> The per AF "Forwarding State" bit already acts as a withdraw all my routes
>> now indicator.
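
To make the R flag and the per-AF Forwarding State bit concrete, here is a rough sketch of decoding the Graceful Restart capability value as laid out in RFC 4724 (4 bits of restart flags, 12 bits of restart time, then AFI/SAFI/flags tuples). The function names are hypothetical; this is an illustration, not code from any router.

```python
import struct
from typing import Dict, Tuple


def parse_gr_capability(value: bytes) -> Tuple[bool, int, Dict[Tuple[int, int], bool]]:
    """Decode the value field of a Graceful Restart capability.

    Returns (r_flag, restart_time_seconds, {(afi, safi): f_bit}).
    """
    if len(value) < 2 or (len(value) - 2) % 4 != 0:
        raise ValueError("malformed Graceful Restart capability")
    (head,) = struct.unpack("!H", value[:2])
    r_flag = bool(head & 0x8000)   # Restart State: top bit of the 4-bit flags
    restart_time = head & 0x0FFF   # low 12 bits, in seconds
    families: Dict[Tuple[int, int], bool] = {}
    for off in range(2, len(value), 4):
        afi, safi, flags = struct.unpack("!HBB", value[off:off + 4])
        families[(afi, safi)] = bool(flags & 0x80)  # Forwarding State (F) bit
    return r_flag, restart_time, families


def must_flush_stale_routes(families: Dict[Tuple[int, int], bool],
                            afi: int, safi: int) -> bool:
    """True when stale routes for (afi, safi) may not be retained.

    That is the case when the family is missing from the new capability
    or is listed with the F bit clear.
    """
    return not families.get((afi, safi), False)
```

For example, a peer that re-opens with F clear for IPv4 unicast makes must_flush_stale_routes(families, 1, 1) return True.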
>> 
>> Cheers,
>> -- 
>> :wq Claudio
>> 
>>> Regards,
>>> Jakob.
>>> 
>>> From: Idr <idr-bounces@ietf.org> On Behalf Of Robert Raszuk
>>> Sent: Tuesday, December 15, 2020 3:19 PM
>>> To: Job Snijders <job@sobornost.net>
>>> Cc: idr@ietf.org
>>> Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0
>>> 
>>> Hi Job,
>>> 
>>> Putting all other concerns aside, I have a few questions ...
>>> 
>>> 1. Is it BGP that should trigger the session RST or FIN, or is it TCP?
>>> 
>>> 2. If it is BGP (TCP would not be aware of HOLD_SEND), how exactly do we know that the peer's window has been 0 for the HOLD_SEND time?
>>> 
>>> 3. Which TCP socket option will return an error to BGP saying that the window for a given peer was 0 for the duration of X seconds? I presume that even if it jumped above 0 for 100 ms, the timer would be reset, indicating the peer is still alive?
>>> 
>>> From your bgpd example, you are not checking anything other than BGP's ability to write to the out queue. So is the suggestion now to forget all about the TCP layer? Simply: if I cannot write anything to a peer for over X seconds, RST the session?
>>> 
>>> Hi John,
>>> 
>>> I think the suggestion is to add a second HOLD_SEND TIME different from normal HOLD TIME.
>>> 
>>> Also, there could be lots of different types of peers, so unless HOLD_SEND were, say, 5 x HOLD, putting all peers under the same time value may be suboptimal.
>>> 
>>> Thx,
>>> R.
>>> 
>>> 
>>> On Tue, Dec 15, 2020 at 10:54 PM Job Snijders <job@sobornost.net> wrote:
>>>> On Tue, Dec 15, 2020 at 09:57:47PM +0100, Christoph Loibl wrote:
>>>> Thanks for answering my question in more detail. Maybe I was unclear
>>>> (but reading your email I think we are talking about the same).
>>>>> On 15.12.2020, at 21:00, John Scudder <jgs@juniper.net> wrote:
>>>>> 
>>>>> I think you are talking about this scenario. I’ll copy the example
>>>>> from Rob’s message cited above:
>>>>> 
>>>>> rtr-A                   rtr-B
>>>>> (congested c-p)         (uncongested c-p)
>>>>> send window: >0         send window: 0
>>>>> recv window: 0          recv window: >0
>>>>> 
>>>>> In this case we expect:
>>>>> a) rtr-B does not send any BGP packet (KEEPALIVE/UPDATE/NOTIFICATION)
>>>>> to rtr-A in normal operating circumstances.
>>>>> b) rtr-A does not expect any KEEPALIVE/UPDATE packets from rtr-B. The
>>>>> session remains established even if no packet is received in the
>>>>> holdtime.
>>>>> c) rtr-A continues to send KEEPALIVE packets to rtr-B.
>>>> 
>>>> The part I have trouble understanding is b). It is clear that rtr-A
>>>> will not receive any packets from rtr-B because rtr-B cannot send them
>>>> (send window: 0). But does "rtr-A does not expect any KEEPALIVE/UPDATE
>>>> packets from rtr-B" mean that rtr-A has essentially suspended its
>>>> hold-timer until it is ready to receive new messages and opens up its
>>>> recv window? If yes, why? I would expect timers to run independently
>>>> of the transport protocol.
>>> 
>>> Yeah, I'd expect that too. We've seen congested BGP implementations
>>> continue to send KEEPALIVEs but not accept (or send!) other BGP
>>> messages. And rtr-B's attempts at KEEPALIVE would just be TCP ACKed with a
>>> zero window.
>>> 
>>> I'd argue in the above scenario rtr-A is simply broken and rtr-B MUST
>>> proceed to close down the session towards rtr-A: rtr-B must clean up and
>>> generate WITHDRAWs for any routes pointing to rtr-A. By doing the
>>> clean-up rtr-B does both itself and rtr-A a favor. If the issue was
>>> transient, rtr-A and rtr-B will re-establish a few minutes later
>>> (IdleHoldTimer, right?) and things will normalize.
>>> 
>>> Arguably and measurably, rtr-A is operating its Loc-RIB (forwarding)
>>> based on stale routing information (assuming rtr-A is working at all!):
>>> rtr-A has not received any WITHDRAWs, UPDATEs (or somewhat less
>>> importantly KEEPALIVEs) from rtr-B.
>>> 
>>> Rtr-B is fully aware of this stale situation, because rtr-B was not able
>>> to write these BGP messages to the network: the messages are still in
>>> OutQ. Rtr-A didn't accept any KEEPALIVE (or UPDATE/WITHDRAW) from
>>> rtr-B.
>>> 
>>> How to solve this? Claudio Jeker took a look at what it would take in
>>> OpenBGPD and came up with the (tiny!) following patch, should be
>>> readable to most: https://marc.info/?l=openbsd-tech&m=160796802508185&w=2
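
The general idea of tearing a session down when nothing can be written for a whole hold time can be sketched as below. This is not the OpenBGPD patch; the class name, its methods, and the injectable clock are hypothetical and only illustrate the logic. It watches the application's ability to write to the socket, which stalls once the kernel send buffer fills behind a zero window.

```python
import time
from typing import Callable


class SendHoldTimer:
    """Detects a peer that has accepted no output for a whole hold time."""

    def __init__(self, hold_time: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hold_time = hold_time
        self._clock = clock
        self._last_progress = clock()

    def on_write(self, bytes_written: int) -> None:
        """Call after every send(); any accepted byte counts as progress."""
        if bytes_written > 0:
            self._last_progress = self._clock()

    def expired(self, bytes_pending: int) -> bool:
        """True if output is queued and none was accepted for hold_time."""
        if bytes_pending == 0:
            # Nothing to send: an idle queue is not a stuck queue.
            self._last_progress = self._clock()
            return False
        return self._clock() - self._last_progress >= self.hold_time
```

A caller would check expired() from its event loop and, when it fires, drop the session and withdraw the routes learned from that peer. Because KEEPALIVEs are generated periodically, the queue toward a stuck peer does not stay empty for long.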
>>> 
>>> Ben Cox helped me create a 'EBGP peer from hell': a publicly accessible
>>> EBGP multihop instance which can reliably produce the undesirable
>>> TCP/BGP behavior we're discussing here. This 'peer from hell' will do
>>> the OPEN exchange but then manipulates the TCP recvwindow towards zero.
>>> 
>>> All BGP implementations tested so far (5 famous ones) appear vulnerable
>>> because they continue to consider the BGP session healthy & stable
>>> (meanwhile OutQ keeps growing endlessly and zero BGP messages go across
>>> the wire).
>>> 
>>> One network operator (with thousands of EBGP sessions in the DFZ)
>>> reported to me the above stalled-TCP scenario is *not* a common case on
>>> the Internet. On a normal day, a network operator will see no (zero)
>>> sessions stuck this way, which leads me to believe 'recvwind=0' ...
>>> *for the duration of the hold timer* is a very strong indicator for a
>>> really broken situation which implementations should attempt to
>>> resolve automatically.
>>> 
>>> I believe BGP implementations are not helping any known deployment
>>> scenarios by *not* disconnecting a stuck peer; on the other hand, we
>>> now know about various operational examples where honoring recvwind=0
>>> for (hours, days) longer than $holdtimer led to global scale problems.
>>> 
>>> As the 'not-at-all progressing OutQ' situation seems somewhat rare in
>>> the wild (yet continues to happen from time to time) I think it is worth
>>> discussing & documenting how implementers can attempt to avoid this
>>> state from happening. It might help make the Internet 1% more robust.
>>> 
>>> BGP implementers (or operators wanting to test their equipment) feel
>>> free to contact me off-list if you'd like to set up an EBGP multihop
>>> session towards the 'peer from hell' testbed. Testing potential
>>> solutions this way is quite easy, the behavior can be triggered within a
>>> few seconds.
>>> 
>>> Kind regards,
>>> 
>>> Job
>>> 
>>> ps. At this moment we have (1) an attempt at problem description, (2) a
>>> demonstration BGP-4 implementation of a 'problem causer', and (3) a
>>> different BGP-4 implementation with a 'solution'. This enables IDR to
>>> test interoperability & (potentially revised) protocol compliance,
>>> hopefully moving the problem a bit from theoretical to practical
>>> reality? :)
>> 
>>> _______________________________________________
>>> Idr mailing list
>>> Idr@ietf.org
>>> https://www.ietf.org/mailman/listinfo/idr
>> 