Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Jeffrey Haas <jhaas@pfrc.org> Thu, 17 December 2020 14:28 UTC

To: Brian Dickson <brian.peter.dickson@gmail.com>
Cc: "Jakob Heitz (jheitz)" <jheitz=40cisco.com@dmarc.ietf.org>, "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/ZhxWwiTOh43vjhZP3vInwZrjslw>

Brian,


> On Dec 16, 2020, at 6:08 PM, Brian Dickson <brian.peter.dickson@gmail.com> wrote:
> Thinking a bit bigger-picture, who could or should be able to (a) detect, and (b) respond to, a situation like this in future?
> What are the pros/cons of different approaches, in terms of risk (of accidental or malicious outages induced), or effectiveness?
> 
> I'll start:
> A large network peering with another large network is likely to have more visibility.
> If all of the sessions are stuck (but still up), that's a much stronger indicator, and maybe that'd be a situation in which auto-reaction would be appropriate.
> Assuming the auto-reaction was limited to large peers of a large network, I think this is less risky, and still very effective.
> (There would still be a challenge of how to share this state discovery across an ASN, but that's a more constrained problem to solve, IMHO.)

This sort of analysis is amenable to the centrally collected telemetry many operators already have.  It's just not something that would be in people's playbooks.
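To make the "all sessions stuck" heuristic concrete, here is a minimal sketch of what such a telemetry check could look like. It assumes a central collector already gathers per-session state; the record fields (`secs_since_last_rx`, `zero_window_secs`, etc.) and the `min_sessions` threshold are hypothetical, not any vendor's telemetry schema. A session counts as stuck if it is still reported Established but nothing has been received for longer than the hold time, or our outbound data has been blocked by a zero TCP window for that long.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class SessionSample:
    # Hypothetical telemetry record; field names are illustrative only.
    router: str
    peer_asn: int
    established: bool          # session state as reported by the router
    hold_time: float           # negotiated hold time in seconds (0 = disabled)
    secs_since_last_rx: float  # seconds since any BGP message was received
    zero_window_secs: float    # seconds our sends have been blocked by a zero window


def is_stuck(s: SessionSample) -> bool:
    """Still Established, but the hold timer should already have fired
    or outbound data has been blocked for longer than the hold time."""
    if not s.established or s.hold_time <= 0:
        return False
    return s.secs_since_last_rx > s.hold_time or s.zero_window_secs > s.hold_time


def stuck_peer_asns(samples: Iterable[SessionSample], min_sessions: int = 2) -> Set[int]:
    """Flag peer ASNs where every Established session (and at least
    `min_sessions` of them) looks stuck."""
    by_asn = defaultdict(list)
    for s in samples:
        if s.established:
            by_asn[s.peer_asn].append(s)
    return {
        asn
        for asn, sessions in by_asn.items()
        if len(sessions) >= min_sessions and all(is_stuck(x) for x in sessions)
    }
```

Requiring every session to a peer AS to look stuck, rather than any one, is what makes this the "much stronger indicator" from the quoted text; a single stuck session could just be a local fault.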

> A large ASN may want a reliable, secure method of shutting down peers for some modest duration. Is that also something to consider developing a solution for?

Arguably, how is this different from your billing system deciding someone hasn't paid the bill and shutting down their BGP?  In this case, it's just a matter of either finding all of the impacted peering sessions or having a list of all peering sessions by AS.
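As a rough illustration of "a list of all peering sessions by AS", the following sketch assumes a hypothetical session inventory (router, neighbor address, peer AS) pulled from whatever source of truth an operator keeps. It returns either every session with a given AS or only the impacted subset; actually applying the shutdown on routers is left out, since that is vendor-specific.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class Session(NamedTuple):
    # Hypothetical inventory entry; a real one would come from config or IPAM.
    router: str
    neighbor: str
    peer_asn: int


def index_by_asn(inventory: Iterable[Session]) -> Dict[int, List[Session]]:
    """Group every configured peering session by the peer's AS number."""
    idx: Dict[int, List[Session]] = defaultdict(list)
    for s in inventory:
        idx[s.peer_asn].append(s)
    return idx


def shutdown_targets(
    inventory: Iterable[Session], asn: int, impacted: Optional[Set[Session]] = None
) -> List[Session]:
    """Sessions to shut for `asn`: only the impacted ones if a set is given,
    otherwise every session with that AS (the unpaid-bill case)."""
    sessions = index_by_asn(inventory).get(asn, [])
    if impacted is None:
        return sorted(sessions)
    return sorted(s for s in sessions if s in impacted)
```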

> I don't see that being possible without signatures a la RPKI, but don't know if it's something anyone would really want to have available.

Pushing this action into someone else's system is probably a non-starter.  If you think people were upset at the Dutch court attack scenario...

> The conceptual model would be that of a rapid administrative shutdown of peering sessions, possibly with a pre-configured timer to re-enable sessions and/or a start time?

I think in the scenario in question, an automated and persisted shutdown is what's desired.  Figuring out whether you have stable BGP, much less safe transit BGP through the provider, is trickier.
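The difference between Brian's timed model (start time, optional re-enable timer) and a persisted shutdown can be captured in one small record. This is a hypothetical sketch, not an existing mechanism: a missing `reenable_at` means the shutdown stays in effect until an operator clears it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdminShutdown:
    # Hypothetical record for an administrative shutdown of one peer AS.
    peer_asn: int
    start: float                         # epoch seconds when the shutdown takes effect
    reenable_at: Optional[float] = None  # None => persisted until an operator clears it

    def in_effect(self, now: float) -> bool:
        """True if sessions to `peer_asn` should be held down at time `now`."""
        if now < self.start:
            return False
        return self.reenable_at is None or now < self.reenable_at
```

A persisted entry never times out on its own, which matches the preference above: nothing comes back up until someone has checked that the provider's BGP is stable again.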

-- Jeff