Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Keyur Patel <keyur@arrcus.com> Sat, 12 December 2020 01:31 UTC

From: Keyur Patel <keyur@arrcus.com>
To: Jeff Tantsura <jefftant.ietf@gmail.com>
CC: John Scudder <jgs=40juniper.net@dmarc.ietf.org>, Job Snijders <job@sobornost.net>, "idr@ietf.org" <idr@ietf.org>
Thread-Topic: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0
Thread-Index: AQHWz/PTKqeqTOTeJ0S3IgM5E+6+BanyUWWA//+zxQCAAI2nAP//lCGA
Date: Sat, 12 Dec 2020 01:31:16 +0000
Message-ID: <BBEA6C0A-5727-4D9F-8D7C-74E572ED612D@arrcus.com>
References: <2F238121-E468-4D0F-A0FF-9D82E44C3247@arrcus.com> <57DF4DA1-256A-4FA9-8827-EFF6D9ED2A2E@gmail.com>
In-Reply-To: <57DF4DA1-256A-4FA9-8827-EFF6D9ED2A2E@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/Gs0c-eSQFg78EzGluebVQ6NZNLE>
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

One more comment: 

The flapping of the session may result in persistent flaps (more network instability) if the session is brought up prematurely. Most issues are a side effect of excessive throttling of resources at the protocol/system level or bugs in the implementation. That means manual intervention would be needed before the session is restarted.
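
A minimal sketch of that point, assuming a hypothetical `RestartGate` helper (the name and methods are illustrative, not taken from RFC 4271 or any implementation): once a session has been torn down because the peer stalled, automatic restarts are refused until an operator clears the condition.

```python
from dataclasses import dataclass


@dataclass
class RestartGate:
    """Hypothetical guard deciding whether a BGP session may restart on its own."""

    # Set when the session was torn down because the peer stopped accepting data.
    held_down: bool = False

    def on_stall_teardown(self) -> None:
        # Remember that the teardown was caused by a stalled peer.
        self.held_down = True

    def operator_clear(self) -> None:
        # Manual intervention: the operator confirms the cause was addressed.
        self.held_down = False

    def may_auto_restart(self) -> bool:
        # Automatic restarts are allowed only when no hold-down is pending.
        return not self.held_down


gate = RestartGate()
gate.on_stall_teardown()
print(gate.may_auto_restart())  # False
gate.operator_clear()
print(gate.may_auto_restart())  # True
```

Whether the hold-down should be cleared only by hand or also by a long timer is a policy choice this sketch leaves open.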

Regards,
Keyur

On 12/11/20, 3:57 PM, "Jeff Tantsura" <jefftant.ietf@gmail.com> wrote:

    The trade-off is (as often happens) between stability and convergence.
    Given the severity, I’d prefer a formalized approach rather than an implementation artifact (at the mercy of the Product Manager in charge ;-))

    Regards,
    Jeff

    > On Dec 11, 2020, at 15:30, Keyur Patel <keyur@arrcus.com> wrote:
    > 
    > One comment inlined #Keyur
    > 
    > On 12/11/20, 12:04 PM, "Idr on behalf of John Scudder" <idr-bounces@ietf.org on behalf of jgs=40juniper.net@dmarc.ietf.org> wrote:
    > 
    >    [all hats on]
    > 
    >    Hi Job,
    > 
    >    Thanks for bringing this up.
    > 
    >    To take the liberty of summarizing your wall of text :-) you’re saying that you believe BGP should tear down its session if it’s unable to send a message for the duration of the hold time. 
    > 
    >    Given that the conversation last time was inconclusive I think this is a good thing for the WG to discuss again. If you want to, you (or someone) could turn the idea into a short draft that updates RFC 4271, and we could have a WG adoption discussion about it. It might help focus the discussion but it’s not mandatory.
    > 
    >    I’ll point out a few things to start with —
    > 
    >    - Making it mandatory to apply hold time to the sending of messages would potentially make BGP peerings less stable. It clearly can’t make them *more* stable. Of course one can argue that if you haven’t been able to send a message for the hold time, the session has failed its metric of usefulness anyway, so any veneer of stability at this point is a harmful sham.
    >    - If I recall correctly, RST doesn’t work (or may not work) if you’re using the MD5 TCP option. Nothing much to be done, but be aware.
    >    - There is nothing stopping an implementation from doing what you describe now. The formalism that keeps you within the letter of 4271 would be that the implementation supplies a configuration option, that you set to enable the behavior. Once you’ve done that, when the implementation notices that the hold time has been exceeded in the outbound direction, it generates a ManualStop event for the session. 
    > 
    > #Keyur: +1 to what John said. This could very well be an implementation knob that generates ManualStop event.
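
The knob described above could look roughly like the sketch below. `PeerConfig`, `stop_on_send_stall`, and `check_outbound` are hypothetical names introduced for illustration; only the ManualStop event (Event 2 in the RFC 4271 FSM) comes from the specification. The caller is assumed to record the time of the last successful send.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class FsmEvent(enum.Enum):
    # Event number as listed in the RFC 4271 finite state machine.
    MANUAL_STOP = 2


@dataclass
class PeerConfig:
    hold_time: float                  # negotiated hold time, in seconds
    stop_on_send_stall: bool = False  # hypothetical knob, off by default


def check_outbound(cfg: PeerConfig, last_send_ok: float, now: float) -> Optional[FsmEvent]:
    """Return MANUAL_STOP when the knob is set and nothing was sent for hold_time."""
    if not cfg.stop_on_send_stall or cfg.hold_time <= 0:
        # Knob disabled, or hold time 0 (hold timer not in use): do nothing.
        return None
    if now - last_send_ok >= cfg.hold_time:
        # Outbound direction has been stuck for the whole hold time.
        return FsmEvent.MANUAL_STOP
    return None
```

Because the behavior is opt-in and surfaces as an ordinary ManualStop, the FSM itself is left unchanged, which is what keeps it within the letter of RFC 4271.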
    > 
    > Regards,
    > Keyur
    > 
    >    Thanks,
    > 
    >    —John
    > 
    >> On Dec 11, 2020, at 2:23 PM, Job Snijders <job@sobornost.net> wrote:
    >> 
    >> 
    >> Dear group,
    >> 
    >> Not too long ago an incident [1] in one Autonomous System resulted in
    >> the global Internet being unusable in many parts of the world for
    >> multiple hours. Some have reported the root cause was a 'configuration
    >> error', however I believe much of the observed communication blackouts
    >> in the global routing system stemmed from a pre-existing condition: a
    >> specific implementation property present in multiple implementations
    >> currently in use in the default-free zone.
    >> 
    >> Usually when an incident happens in one AS, affected parties can through
    >> unilateral action 'route around the problem', but the ability to 'route
    >> around problems' critically depends on the ability to distribute
    >> WITHDRAW or UPDATE messages. When messages are not processed, what was
    >> generally assumed to be a unilaterally solvable problem now requires
    >> coordination between *all* neighbors of the suffering AS.
    >> 
    >> The global routing system requires every participant to process BGP
    >> messages, because the alternative is intervention on thousands of BGP
    >> devices to manually shut down thousands of BGP sessions, disconnecting the
    >> AS suffering from an incident, to help the rest of the default-free
    >> zone. I speak from experience when saying that coordinating a disconnection
    >> of an AS at global scale is incredibly hard and slow, and many approval
    >> levels must be worked through. It takes *hours* of phone calls & email
    >> chains, a time window during which internet traffic is routed towards
    >> stale (now blackholing) locations.
    >> 
    >> In the average ISP's network design using IBGP Route Reflectors, these
    >> blackout effects are aggravated when BGP sessions landing in such
    >> devices are not terminated when TCP causes the BGP session to stall.
    >> 
    >> The problem of how TCP and BGP-4 can interact has been discussed before,
    >> but I'm not sure the working group followed up with any publication
    >> detailing the problem and the solution.
    >> 
    >>   https://mailarchive.ietf.org/arch/msg/idr/q0Sx5d3zZjfOmOQ4lO2OZAHh9Lc/
    >> 
    >> Does everyone agree BGP-4 sessions MUST be terminated using a TCP RST
    >> (instead of a BGP-4 Cease NOTIFICATION) if the peer has indicated for
    >> the duration of the Hold Timer that the TCP receive window is zero?
    >> I'm fine with there being buttons to make this different, but the
    >> default for routers in the global Internet routing system should be to
    >> consider the remote peer to be 'a lost cause' when it won't accept new
    >> BGP messages for the duration of the hold timer.
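
One common way to make a TCP stack abort a connection rather than close it gracefully is a zero-timeout `SO_LINGER`. The sketch below assumes a POSIX-style `struct linger` made of two C ints (the layout differs on Windows); `set_abortive_close` and `abort_session` are hypothetical helper names.

```python
import socket
import struct


def set_abortive_close(sock: socket.socket) -> None:
    """Arrange for close() to abort the connection instead of sending FIN."""
    # struct linger { int l_onoff; int l_linger; }: enabled, zero-second timeout.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


def abort_session(sock: socket.socket) -> None:
    # With linger enabled and a zero timeout, close() discards unsent data and
    # the stack aborts the connection (an RST on common stacks) rather than
    # waiting for the peer to reopen its receive window.
    set_abortive_close(sock)
    sock.close()
```

A graceful close would leave the queued data waiting behind the zero window, which is the situation the question above is trying to escape.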
    >> 
    >> Perhaps RFC 4271 Section 6.5 should be amended as follows:
    >> 
    >> OLD:
    >>   If a system does not receive successive KEEPALIVE, UPDATE, and/or
    >>   NOTIFICATION messages within the period specified in the Hold Time
    >>   field of the OPEN message, then the NOTIFICATION message with the
    >>   Hold Timer Expired Error Code is sent and the BGP connection is
    >>   closed.
    >> 
    >> NEW:
    >>   If a system does not receive (or is unable to send) successive
    >>   KEEPALIVE, UPDATE, and/or NOTIFICATION messages within the period
    >>   specified in the Hold Time field of the OPEN message, then the
    >>   NOTIFICATION message with the Hold Timer Expired Error Code is sent
    >>   and the BGP connection is closed. If the NOTIFICATION message cannot
    >>   be sent, the BGP connection is closed.
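
The NEW text can be read as a second hold timer running in the send direction. The sketch below is one possible reading, not a definitive implementation: `SendHoldTimer` and `on_tick` are hypothetical names, and it assumes the caller reports every successful write and always has something queued (KEEPALIVEs at the least).

```python
import time
from typing import Callable


class SendHoldTimer:
    """Hypothetical tracker for 'unable to send' over the negotiated Hold Time."""

    def __init__(self, hold_time: float, clock: Callable[[], float] = time.monotonic):
        self.hold_time = hold_time
        self.clock = clock
        self.last_progress = clock()

    def on_bytes_sent(self, count: int) -> None:
        # Any bytes accepted by the socket count as progress.
        if count > 0:
            self.last_progress = self.clock()

    def expired(self) -> bool:
        # A hold time of zero means hold timers are not in use.
        return self.hold_time > 0 and self.clock() - self.last_progress >= self.hold_time


def on_tick(timer: SendHoldTimer,
            try_send_notification: Callable[[], bool],
            close_connection: Callable[[], None]) -> None:
    if not timer.expired():
        return
    # Best effort: the NOTIFICATION may not go out while the window is zero.
    try_send_notification()
    # The connection is closed whether or not the NOTIFICATION was sent.
    close_connection()
```

Passing the clock in as a parameter keeps the timer testable without real waiting.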
    >> 
    >> This is an ongoing problem. I suspect the BGP Nyancat's discoloration at
    >> the leftmost eye [2] might have been caused by an active TCP session
    >> keeping a stale BGP session alive. But also the observations from "BGP
    >> Zombies: an Analysis of Beacons Stuck Routes" [3] could be explained by
    >> the problematic interaction between TCP and BGP.
    >> 
    >> I appreciate the work the IDR working group has done to *SOFTEN* the
    >> blow from implementation defects on global routing (RFC 7606 is a
    >> brilliant example of this), but I fear in this case there is no subtle
    >> way to say goodbye when the peer doesn't process messages in a timely
    >> fashion. It might be good to document this.
    >> 
    >> Kind regards,
    >> 
    >> Job
    >> 
    >> [1]: https://www.reuters.com/article/level-3-communi-outages-idUSL2N1CB00C
    >> [2]: https://labs.ripe.net/Members/cteusche/bgp-meets-cat
    >> [3]: https://www.iij-ii.co.jp/en/members/romain/pdf/romain_pam2019.pdf
    >> 
    >> _______________________________________________
    >> Idr mailing list
    >> Idr@ietf.org
    >> https://www.ietf.org/mailman/listinfo/idr