Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Jeff Tantsura, Fri, 11 December 2020 23:57 UTC

To: Keyur Patel
Cc: John Scudder, Job Snijders

The trade-off is (as often happens) between stability and convergence.
Given the severity, I'd prefer a formalized approach rather than an implementation artifact (at the mercy of the Product Manager in charge ;-))


> On Dec 11, 2020, at 15:30, Keyur Patel wrote:
> One comment inlined #Keyur
> On 12/11/20, 12:04 PM, "Idr on behalf of John Scudder" wrote:
>    [all hats on]
>    Hi Job,
>    Thanks for bringing this up.
>    To take the liberty of summarizing your wall of text :-) you’re saying that you believe BGP should tear down its session if it’s unable to send a message for the duration of the hold time. 
>    Given that the conversation last time was inconclusive I think this is a good thing for the WG to discuss again. If you want to, you (or someone) could turn the idea into a short draft that updates RFC 4271, and we could have a WG adoption discussion about it. It might help focus the discussion but it’s not mandatory.
>    I’ll point out a few things to start with —
>    - Making it mandatory to apply hold time to the sending of messages would potentially make BGP peerings less stable. It clearly can’t make them *more* stable. Of course one can argue that if you haven’t been able to send a message for the hold time, the session has failed its metric of usefulness anyway, so any veneer of stability at this point is a harmful sham.
>    - If I recall correctly, RST doesn’t work (or may not work) if you’re using the MD5 TCP option. Nothing much to be done, but be aware.
>    - There is nothing stopping an implementation from doing what you describe now. The formalism that keeps you within the letter of 4271 would be that the implementation supplies a configuration option, that you set to enable the behavior. Once you’ve done that, when the implementation notices that the hold time has been exceeded in the outbound direction, it generates a ManualStop event for the session. 
> #Keyur: +1 to what John said. This could very well be an implementation knob that generates ManualStop event.
> Regards,
> Keyur
>    Thanks,
>    —John
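John's third point (and Keyur's +1) can be made concrete with a small sketch. Everything here is hypothetical: `SendHoldTimer`, `on_write_progress`, and `poll` are illustrative names, not taken from any BGP implementation. It assumes the speaker can observe when the TCP socket accepts queued BGP bytes, treats "unable to send for the hold time" as "no write progress while output is pending", and reports a ManualStop event only when the configuration knob is enabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SendHoldTimer:
    """Hypothetical outbound hold timer for one BGP session."""
    hold_time: float            # negotiated Hold Time in seconds (0 disables it)
    enabled: bool = False       # the configuration knob; off keeps plain RFC 4271 behavior
    last_progress: float = 0.0  # last time output made progress (or the queue was empty)

    def on_write_progress(self, now: float) -> None:
        # Call whenever the TCP socket accepts at least one byte of queued BGP data.
        self.last_progress = now

    def poll(self, now: float, has_pending_output: bool) -> Optional[str]:
        """Return "ManualStop" if output has been stuck for the whole hold time."""
        if not self.enabled or self.hold_time == 0:
            return None
        if not has_pending_output:
            # Nothing is waiting to be sent, so the sender is not stalled.
            self.last_progress = now
            return None
        if now - self.last_progress >= self.hold_time:
            return "ManualStop"  # to be fed into the session FSM as a ManualStop event
        return None
```

Since KEEPALIVEs are normally queued every third of the Hold Time, a healthy session keeps resetting this timer; only a peer that stops draining its receive window (for example, by advertising a zero window) lets it run out.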
>> On Dec 11, 2020, at 2:23 PM, Job Snijders wrote:
>> Dear group,
>> Not too long ago an incident [1] in one Autonomous System resulted in
>> the global Internet being unusable in many parts of the world for
>> multiple hours. Some have reported the root cause was a 'configuration
>> error', however I believe much of the observed communication blackouts
>> in the global routing system stemmed from a pre-existing condition: a
>> specific implementation property present in multiple implementations
>> currently in use in the default-free zone.
>> Usually when an incident happens in one AS, affected parties can 'route
>> around the problem' through unilateral action, but routing around
>> problems critically depends on being able to distribute UPDATE messages
>> (withdrawals as well as announcements). When messages are not processed,
>> what was generally assumed to be a unilaterally solvable problem now
>> requires coordination between *all* neighbors of the suffering AS.
>> The global routing system requires every participant to process BGP
>> messages, because the alternative is intervention on thousands of BGP
>> devices to manually shut down thousands of BGP sessions, disconnecting
>> the AS suffering from an incident in order to help the rest of the
>> default-free zone. I speak from experience when I say that coordinating
>> the disconnection of an AS at global scale is incredibly hard and slow,
>> and many approval levels must be worked through. It takes *hours* of
>> phone calls & email chains, a time window during which Internet traffic
>> is routed towards stale (now blackholing) locations.
>> In the typical ISP network design using IBGP Route Reflectors, these
>> blackout effects are aggravated when BGP sessions landing on such
>> devices are not terminated after TCP causes the BGP session to stall.
>> The problematic interaction between TCP and BGP-4 has been discussed
>> before, but I'm not sure the working group followed up with any
>> publication detailing the problem and the solution.
>> Does everyone agree BGP-4 sessions MUST be terminated using a TCP RST
>> (instead of a BGP-4 Cease NOTIFICATION) if the peer has indicated for
>> the duration of the Hold Timer that the TCP receive window is zero?
>> I'm fine with there being buttons to make this different, but the
>> default for routers in the global Internet routing system should be to
>> consider the remote peer to be 'a lost cause' when it won't accept new
>> BGP messages for the duration of the hold timer.
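One common way to abort a TCP connection with a RST rather than an orderly FIN is to enable SO_LINGER with a zero linger time and then close the socket. The following is a minimal Python sketch, assuming a Linux or BSD host where `struct linger` is two C ints; `set_abortive_close` and `abort_session` are hypothetical helper names. John's caveat about the MD5 TCP option still applies to whether the peer honors the RST.

```python
import socket
import struct


def set_abortive_close(sock: socket.socket) -> None:
    """Arrange for sock.close() to send a TCP RST.

    With l_onoff=1 and l_linger=0, close() discards any unsent data
    (such as BGP messages stuck behind a zero receive window) and resets
    the connection instead of waiting for the send queue to drain.
    Assumes a platform where struct linger is two C ints (Linux, BSD).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


def abort_session(sock: socket.socket) -> None:
    # Hypothetical teardown path for a BGP session whose peer has not
    # opened its receive window for the whole hold time.
    set_abortive_close(sock)
    sock.close()
```

No NOTIFICATION is attempted here: if the peer's window has been zero for the hold time, a Cease NOTIFICATION would simply queue behind the stuck data.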
>> Perhaps RFC 4271 Section 6.5 should be amended as follows:
>> OLD:
>>   If a system does not receive successive KEEPALIVE, UPDATE, and/or
>>   NOTIFICATION messages within the period specified in the Hold Time
>>   field of the OPEN message, then the NOTIFICATION message with the
>>   Hold Timer Expired Error Code is sent and the BGP connection is
>>   closed.
>> NEW:
>>   If a system does not receive (or is unable to send) successive
>>   KEEPALIVE, UPDATE, and/or NOTIFICATION messages within the period
>>   specified in the Hold Time field of the OPEN message, then the
>>   NOTIFICATION message with the Hold Timer Expired Error Code is sent
>>   and the BGP connection is closed. If the NOTIFICATION message cannot
>>   be sent, the BGP connection is closed.
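The proposed NEW wording can be read as one decision evaluated whenever a timer fires. This sketch is illustrative only: `hold_timer_action`, `Action`, and the parameter names are hypothetical, and it assumes the implementation already tracks when the last message arrived and since when queued output has been stalled.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    NOTIFY_THEN_CLOSE = "notify_then_close"  # send NOTIFICATION, Error Code 4 (Hold Timer Expired)
    CLOSE = "close"                          # NOTIFICATION cannot be sent: just close


def hold_timer_action(now: float, last_received: float,
                      stalled_since: Optional[float], hold_time: float,
                      notification_sendable: bool) -> Action:
    """Sketch of the proposed RFC 4271 Section 6.5 wording.

    last_received: when the last KEEPALIVE, UPDATE, or NOTIFICATION arrived.
    stalled_since: since when queued output has made no progress, or None.
    notification_sendable: whether the socket can accept the NOTIFICATION now.
    """
    if hold_time == 0:  # a Hold Time of zero disables the hold timer
        return Action.NONE
    rx_expired = now - last_received >= hold_time
    tx_expired = stalled_since is not None and now - stalled_since >= hold_time
    if not (rx_expired or tx_expired):
        return Action.NONE
    return Action.NOTIFY_THEN_CLOSE if notification_sendable else Action.CLOSE
```

In the zero-window case it is the send-side check that fires, and `notification_sendable` will usually be false, so the sketch falls through to a plain close (which Job suggests should be a RST).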
>> This is an ongoing problem. I suspect the discoloration of the BGP
>> Nyancat's leftmost eye might have been caused by an active TCP session
>> keeping a stale BGP session alive. The observations in "BGP Zombies: an
>> Analysis of Beacons Stuck Routes" [3] could also be explained by the
>> problematic interaction between TCP and BGP.
>> I appreciate the work the IDR working group has done to *SOFTEN* the
>> blow from implementation defects on global routing (RFC 7606 is a
>> brilliant example of this), but I fear in this case there is no subtle
>> way to say goodbye when the peer doesn't process messages in a timely
>> fashion. It might be good to document this.
>> Kind regards,
>> Job
>> [1]:;!!NEt6yMaO-gk!WnfNFxBMMXzuVhI23_QuKvcPfiG3Jwero3GwHhk0hhH6WNn1W0XWUkMkF2w4cg$
>> [2]:;!!NEt6yMaO-gk!WnfNFxBMMXzuVhI23_QuKvcPfiG3Jwero3GwHhk0hhH6WNn1W0XWUkMry7Ktyw$
>> [3]:;!!NEt6yMaO-gk!WnfNFxBMMXzuVhI23_QuKvcPfiG3Jwero3GwHhk0hhH6WNn1W0XWUkO8A78j8Q$
>> _______________________________________________
>> Idr mailing list