Re: [Idr] draft-spaghetti-idr-bgp-sendholdtimer - Feedback requested

Jeffrey Haas <> Tue, 27 April 2021 12:24 UTC

Date: Tue, 27 Apr 2021 08:47:24 -0400
From: Jeffrey Haas <>
To: "Jakob Heitz (jheitz)" <>
Cc: Robert Raszuk <>, "idr@ietf. org" <>, Ben Cox <>

On Sun, Apr 25, 2021 at 06:01:06AM +0000, Jakob Heitz (jheitz) wrote:
> A long time of TCP zero window does not indicate a data plane
> problem, nor a problem with routes received from the stuck peer.
> The blockage is in one direction only. The local speaker is unable
> to send routes to the stuck peer, but is able to receive routes
> from the stuck peer just fine.

A further bit of thinking for the problem:

While it's true that the blockage is one way, it's not necessarily
guaranteed that you're getting routes from the peer.  This is true even if
you might be receiving keepalives.

The example case would be a BGP implementation that had a decoupled keepalive
mechanism for its FSM.  The implementation may not be draining its incoming
rib-in from a socket, and similarly may be stuck in publishing its rib-out.

Of course, there's no way to tell if this is happening.
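
To make the decoupled-keepalive case concrete, here is a toy sketch.  It is
not modeled on any real implementation: the ToySpeaker class, its fields, and
the 30 second keepalive interval are all assumptions made up for illustration.
It only shows that the keepalive count looks identical to the remote side
whether or not the update path is draining the socket.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToySpeaker:
    """Hypothetical speaker whose keepalive timer is independent of route processing."""
    keepalive_interval: int = 30          # assumed value, in seconds
    update_path_stuck: bool = False       # True models a wedged rib-in reader
    socket_backlog: deque = field(default_factory=deque)  # UPDATEs not yet read
    rib_in: list = field(default_factory=list)
    keepalives_sent: int = 0

    def deliver(self, prefix: str) -> None:
        # The peer's UPDATE arrives on the socket; nothing has read it yet.
        self.socket_backlog.append(prefix)

    def tick(self, now: int) -> None:
        # Keepalive path: fires on its own timer regardless of the update path.
        if now % self.keepalive_interval == 0:
            self.keepalives_sent += 1
        # Update path: drains the socket into rib-in only when it is not stuck.
        if not self.update_path_stuck:
            while self.socket_backlog:
                self.rib_in.append(self.socket_backlog.popleft())


def simulate(stuck: bool, seconds: int = 90):
    """Return (keepalives sent, routes in rib-in, UPDATEs still unread)."""
    speaker = ToySpeaker(update_path_stuck=stuck)
    for prefix in ("192.0.2.0/24", "198.51.100.0/24"):
        speaker.deliver(prefix)
    for now in range(1, seconds + 1):
        speaker.tick(now)
    return speaker.keepalives_sent, len(speaker.rib_in), len(speaker.socket_backlog)
```

Comparing simulate(False) with simulate(True), the first element (keepalives
sent) is the same in both runs, while the rib-in is populated only in the
first.  An observer that sees only keepalives cannot tell the two apart.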

This isn't a strong argument that Graceful Restart procedures may not still
be appropriate.  After all, we have some portion of a rib that may be
correct.  However, there's perhaps an argument that it might be less in
sync in some circumstances than others.

-- Jeff