Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Jeffrey Haas <jhaas@pfrc.org> Thu, 17 December 2020 14:34 UTC

From: Jeffrey Haas <jhaas@pfrc.org>
In-Reply-To: <CANJ8pZ_02njLOJxJPAW4vT3q0EPGB6WY1ZGemQpfiXNMhadb6A@mail.gmail.com>
Date: Thu, 17 Dec 2020 09:34:30 -0500
Cc: Job Snijders <job@sobornost.net>, idr@ietf.org
Message-Id: <6C03CB89-E307-4803-90B1-A817D0AE16B8@pfrc.org>
References: <CANJ8pZ_02njLOJxJPAW4vT3q0EPGB6WY1ZGemQpfiXNMhadb6A@mail.gmail.com>
To: Enke Chen <enchen@paloaltonetworks.com>
X-Mailer: Apple Mail (2.3608.120.23.2.4)
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/I0-pImSnqYNoSsKUcmQqPy3vEuU>
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Enke,

Indirectly you are pointing out that an application at the end of a socket is loosely coupled to the TCP state of a session, at best.

This loose coupling gets even messier when you're on a "real" router with staged host paths, NSR features, etc.

For daemons that are roughly Unix, the usual pattern is likely "if your socket has been blocked for the requisite period of time, peek into the TCP state, then decide whether or not to hang up abruptly".
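
A rough sketch of that pattern, in Python for brevity. Everything named here (BlockedSendTracker, peek_tcp_state, should_hang_up, hang_up_abruptly) is hypothetical and not taken from any BGP implementation. The TCP_INFO peek assumes Linux, where the first byte of struct tcp_info is the state and ESTABLISHED is 1. The policy of hanging up only while the kernel still reports ESTABLISHED is an assumption made purely for illustration.

```python
import socket
import struct
import time

TCP_ESTABLISHED = 1  # Linux numbering; assumed, other kernels differ


class BlockedSendTracker:
    """Tracks how long a non-blocking socket has refused our writes."""

    def __init__(self, limit_s, clock=time.monotonic):
        self.limit_s = limit_s
        self.clock = clock
        self.blocked_since = None

    def on_write_result(self, would_block):
        # Call after every send attempt on the session's socket.
        if not would_block:
            self.blocked_since = None
        elif self.blocked_since is None:
            self.blocked_since = self.clock()

    def blocked_too_long(self):
        return (self.blocked_since is not None
                and self.clock() - self.blocked_since >= self.limit_s)


def peek_tcp_state(sock):
    """Return the kernel's TCP state number, or None if unavailable."""
    opt = getattr(socket, "TCP_INFO", None)
    if opt is None:
        return None
    raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 1)
    return struct.unpack("B", raw)[0]


def should_hang_up(tracker, tcp_state):
    # Hypothetical policy: TCP still looks healthy but nothing drains,
    # so the kernel will not tear the session down for us.
    return tracker.blocked_too_long() and tcp_state == TCP_ESTABLISHED


def hang_up_abruptly(sock):
    # SO_LINGER with a zero timeout makes close() send a RST rather
    # than waiting for queued data to drain.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()
```

The tracker only sees what the application sees. As noted elsewhere in this thread, small messages can keep fitting into the socket buffer for a long time, so "blocked" may arrive much later than the peer's window actually closed.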

The scenarios we've discussed haven't even gotten into the esoteric problems, such as SOME types of TCP/IP packets getting through but not others.  Two ready examples from my career include a bug in TCP-MD5 where a specific bit of data was mis-signed and wedged the TCP window, and a particular network I dealt with that had IP checksumming issues that caused long-lived TCP sessions to die in it.  Another that many of us have seen is path MTU discovery hiccups that result in wedged sessions.

Middleboxes of all sorts, including security THINGs, only make this stuff worse.

-- Jeff

> On Dec 16, 2020, at 9:40 PM, Enke Chen <enchen@paloaltonetworks.com> wrote:
> 
> Regarding the patch for OpenBGPD pointed out by Job, I do not think it would work. When the TCP rcv window from the remote is 0, the BGP keepalive can still be queued to the socket buffer. It can take a long time for the socket buffer to be filled up by BGP keepalives.
> 
> It seems that the TCP_USER_TIMEOUT option can be used for the persistent zero-size window issue.  The timeout value could be multiples of the holdtimer (with min and max adjustments), perhaps somewhere around 5 or 6 minutes.
> 
>
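
A minimal sketch of the TCP_USER_TIMEOUT suggestion quoted above, again in Python. The function names, the multiplier of 4, and the 5-to-6-minute clamp are assumptions chosen only to match "multiples of the holdtimer (with min and max adjustments)"; they are not values anyone in the thread proposed as final. The option is Linux-specific and takes its value in milliseconds, so the code returns None where Python does not expose it.

```python
import socket


def user_timeout_ms(hold_time_s, multiple=4, min_s=300, max_s=360):
    """Clamp multiple * hold_time_s into [min_s, max_s]; return milliseconds."""
    seconds = min(max(hold_time_s * multiple, min_s), max_s)
    return int(seconds * 1000)


def apply_user_timeout(sock, hold_time_s):
    """Set TCP_USER_TIMEOUT on sock; return the value used, or None."""
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)
    if opt is None:
        # Platform (or Python build) does not expose the option.
        return None
    timeout = user_timeout_ms(hold_time_s)
    sock.setsockopt(socket.IPPROTO_TCP, opt, timeout)
    return timeout
```

With a 90-second hold time this yields 360000 ms (6 minutes); very short hold times are raised to the 5-minute floor. Once set, the kernel aborts the connection when transmitted data stays unacknowledged for that long, independent of whether the application ever finds its socket blocked.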