Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Jared Mauch <> Sat, 12 December 2020 15:21 UTC

From: Jared Mauch <>
Date: Sat, 12 Dec 2020 10:21:19 -0500
Cc: Christoph Loibl <>,, Robert Raszuk <>
To: "Jakob Heitz (jheitz)" <>

One of the issues I've seen here is that the TCP stacks in routers can be very... unique. Much of this is because there is a strong desire to replicate state handling between dual routing daemons and line cards.

I have spent many hours with Cisco on the perils of non-default settings in their stack for things like keepalives. These all come down to poor OS-level defaults and applications not having consistent socket behavior. This XR example is mostly useful for airing the details of their implementation and how it would affect the discussion.

These implementation details matter because window 0 isn't clear-cut: it may not be passed up the stack, or it may not come down far enough.
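To make the "may not go up the stack" point concrete: on a general-purpose host, an application writing to a peer that has stopped reading never sees the advertised window itself. The following is a minimal Python sketch, assuming a POSIX-like host where loopback TCP is available (it is not router code, and buffer sizes vary by OS). It shows that the only signal a non-blocking writer gets is `BlockingIOError` (EAGAIN) once its local send buffer fills.

```python
import socket


def fill_until_blocked(chunk_size: int = 64 * 1024) -> int:
    """Write to a TCP peer that never reads until the local stack refuses more data.

    Returns the number of bytes the local stack accepted before send() would block.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # ephemeral port on loopback
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as sender:
            receiver, _ = listener.accept()  # accepted, but never read from
            with receiver:
                sender.setblocking(False)
                chunk = b"\x00" * chunk_size
                total = 0
                while True:
                    try:
                        total += sender.send(chunk)
                    except BlockingIOError:
                        # The receiver's buffer is full (its advertised window is
                        # presumably 0) and our send buffer is full as well. The
                        # application only sees EAGAIN; it is not told *why*.
                        return total


if __name__ == "__main__":
    print("bytes queued before send() blocked:", fill_until_blocked())
```

The byte count depends on kernel buffer autotuning. The point is that the writer cannot distinguish a zero window from any other reason the send buffer stopped draining without querying stack-specific state.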

There's a delicate balance here between consuming the data as quickly as possible (running the RIB and FIB downloads in separate threads), deciding when that should cause a blocking event, and deciding when the back pressure should trigger a NOTIFICATION or a TCP teardown.
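One way to frame the decision described above is a per-session send-stall timer. The following is a hypothetical Python model, not any vendor's implementation. The class and method names are invented for illustration. Using the negotiated hold time as the teardown threshold is also an assumption: it echoes the subject line, and Robert's quoted message questions whether that threshold is universally optimal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    OK = "ok"                        # writes are making progress
    BACK_PRESSURE = "back_pressure"  # stalled, but still within the grace period
    TEAR_DOWN = "tear_down"          # stalled >= hold time: try NOTIFICATION, then close/reset


@dataclass
class SendStallMonitor:
    hold_time: float                       # assumed threshold: negotiated hold time, seconds
    stalled_since: Optional[float] = None  # time of the first blocked write in this stall

    def on_write_progress(self) -> None:
        # Any byte accepted by the transport ends the current stall.
        self.stalled_since = None

    def on_write_blocked(self, now: float) -> None:
        # Record only the start of a stall; later blocked writes do not reset it.
        if self.stalled_since is None:
            self.stalled_since = now

    def action(self, now: float) -> Action:
        if self.stalled_since is None:
            return Action.OK
        if now - self.stalled_since < self.hold_time:
            return Action.BACK_PRESSURE
        return Action.TEAR_DOWN
```

Because any progress clears the stall, a peer that drains only a trickle of data is never torn down by this rule. Whether that is desirable is exactly the judgment call discussed above.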

Applications should be more aware of the transport state, and I'm worried this falls into the "please don't write bugs" category. It's at least close to it.


> On Dec 12, 2020, at 9:07 AM, Jakob Heitz (jheitz) <> wrote:
>  No.
> Regards,
> Jakob.
>>> On Dec 12, 2020, at 6:01 AM, Christoph Loibl <> wrote:
>>  Hi,
>> Isn’t it safe to assume that if a system cannot send any messages to its BGP neighbor (for whatever reason) for HOLD TIME seconds, the neighbor on the other side has by then already declared the BGP session dead (is already “trying” to deliver a NOTIFICATION and has removed the routes from its RIB)? If so, I see no point in trying to keep the session alive, because it will *always*, sooner or later, lead to a new session setup and flapping routes. The BGP session (if the NOTIFICATION is queued) cannot recover from that state anymore (can it?) and is useless, even if there is a chance that messages may get delivered later.
>> Cheers Christoph
>> -- 
>> Christoph Loibl
>> | CL8-RIPE | PGP-Key-ID: 0x4B2C0055 |
>>> On 12.12.2020, at 10:22, Robert Raszuk <> wrote:
>>> I went back and reread the thread: 
>>> Wouldn't it be better to first ask implementations to provide a show command/API that lists all peers with the min/max durations of their TCP window being 0, without actually doing any automagic RST/NOTIFICATION/FIN?
>>> This could help us better understand which peers are falling behind in their control plane, and perhaps also allow the operator to set the RST timer for such conditions? If the operator chooses to make it equal to HOLD TIME, so be it, but I am not sure that would be universally the optimal choice.
>>> Along the same lines, we should perhaps also list, per BGP peer, the number of DUPLICATE ACKs, RETRANSMISSIONS, etc.
>>> Are there implementations already deployed in the DFZ that allow such data to be displayed per BGP peer?
>>> Thx,
>>> Robert.
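Robert's proposal is observation only: record, per peer, how long the TCP window stayed at 0, and never reset the session automatically. The following Python sketch models that bookkeeping. The hooks `window_closed`/`window_opened` are hypothetical and assume the TCP stack surfaces zero-window transitions to BGP, which, per the point about router stacks above, may not happen in practice. The peer addresses are documentation examples.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ZeroWindowStats:
    episodes: int = 0                   # completed zero-window episodes
    min_secs: Optional[float] = None    # shortest completed episode
    max_secs: Optional[float] = None    # longest completed episode
    open_since: Optional[float] = None  # start of an episode still in progress


def _fmt(value: Optional[float]) -> str:
    return "-" if value is None else f"{value:.1f}"


class PeerZeroWindowTable:
    """Hypothetical per-peer bookkeeping behind a 'show' command.

    It only observes; it never sends NOTIFICATION, FIN or RST.
    """

    def __init__(self) -> None:
        self._peers: Dict[str, ZeroWindowStats] = {}

    def window_closed(self, peer: str, now: float) -> None:
        # Peer advertised a zero receive window: start an episode if none is open.
        st = self._peers.setdefault(peer, ZeroWindowStats())
        if st.open_since is None:
            st.open_since = now

    def window_opened(self, peer: str, now: float) -> None:
        # Peer's window reopened: close the episode and fold it into min/max.
        st = self._peers.get(peer)
        if st is None or st.open_since is None:
            return
        duration = now - st.open_since
        st.open_since = None
        st.episodes += 1
        st.min_secs = duration if st.min_secs is None else min(st.min_secs, duration)
        st.max_secs = duration if st.max_secs is None else max(st.max_secs, duration)

    def stats(self, peer: str) -> Optional[ZeroWindowStats]:
        return self._peers.get(peer)

    def show(self) -> str:
        rows = [f"{'peer':<18}{'episodes':>10}{'min(s)':>10}{'max(s)':>10}"]
        for peer in sorted(self._peers):
            st = self._peers[peer]
            rows.append(f"{peer:<18}{st.episodes:>10}{_fmt(st.min_secs):>10}{_fmt(st.max_secs):>10}")
        return "\n".join(rows)
```

The duplicate ACK and retransmission counts Robert also mentions would be further counters in the same per-peer record.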
>>> _______________________________________________
>>> Idr mailing list