Re: [Idr] draft-spaghetti-idr-bgp-sendholdtimer - Feedback requested

Jeffrey Haas <> Sat, 24 April 2021 00:25 UTC


On Sat, Apr 24, 2021 at 12:23:15AM +0200, Robert Raszuk wrote:
> One thing which also I am worried about with this proposal is that data
> plane may be working just fine (imagine stub ASN where it advertised a
> prefix and received default) yet zero window was signalled by the peer for
> any of those reasons Jeff nicely enumerated.
> So what we are discussing is breaking data plane just because control plane
> has experienced 15 min (or worse recommended 4 min) inability to send
> keepalives.

A good analogy is the negative impacts of stale routes when you use Graceful
Restart for BGP.  Can you live with the routes in that flavor of stale for
that long?

Arguably, knowing what is queued up - or pushed to the socket but not yet
acknowledged - should be part of deciding whether you really need to drop
the session.  If you have a stable topology and only keepalives are queued,
you may not care as much.
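
As a rough illustration of that idea, here is a minimal Python sketch of a
drop decision that looks at what is pending.  Everything in it is
hypothetical: the names (SendState, should_drop_session), the assumption
that an implementation can tell which messages are queued or unacknowledged,
and the thresholds.  The 240 second default simply echoes the 4 minute
figure quoted above; none of this is taken from the draft.

```python
from dataclasses import dataclass
from enum import Enum


class MsgType(Enum):
    KEEPALIVE = "keepalive"
    UPDATE = "update"


@dataclass
class SendState:
    """Hypothetical snapshot of the send side of one BGP session."""
    queued: list            # MsgType values not yet written to the socket
    unacked: list           # MsgType values written but not yet acknowledged
    stalled_seconds: float  # time since the peer last accepted any data


def should_drop_session(state, send_hold_time=240.0, keepalive_grace=2.0):
    """Illustrative policy only: tolerate a longer stall when nothing but
    keepalives is pending, since no routing state is being withheld."""
    pending = state.queued + state.unacked
    if not pending:
        return False  # nothing waiting, so there is no send-side stall
    only_keepalives = all(m is MsgType.KEEPALIVE for m in pending)
    limit = send_hold_time * keepalive_grace if only_keepalives else send_hold_time
    return state.stalled_seconds >= limit
```

With these assumed thresholds, a session stalled for 300 seconds is kept if
only keepalives are pending and dropped if an UPDATE is stuck behind them.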

> So two questions ..
> * Should we perhaps test data plane before declaring peer's failure and
> before we reset the session ? (I understand that the paramount motivation
> is BGP consistency here though - but this is one of those cases where one
> size may not fit all).

In many of these scenarios, BFD or ping would show the interface up.  It's
the TCP session that is stalled out.
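
To make the distinction concrete, here is a small Python sketch of a
send-side progress timer.  It is only a sketch under stated assumptions:
SendProgressTimer and its methods are made-up names, the 240 second default
is the 4 minute figure quoted above, and the caller is assumed to report
enqueued and written byte counts itself.  The point is that the timer looks
only at whether the TCP session accepts data, so a passing BFD or ping check
never resets it.

```python
class SendProgressTimer:
    """Hypothetical tracker of send progress on one BGP session."""

    def __init__(self, send_hold_time=240.0):
        self.send_hold_time = send_hold_time
        self.last_progress = 0.0  # time of the last successful write
        self.pending_bytes = 0    # bytes queued but not yet written

    def on_enqueue(self, nbytes, now):
        # Start timing from the moment data first becomes pending.
        if self.pending_bytes == 0:
            self.last_progress = now
        self.pending_bytes += nbytes

    def on_written(self, nbytes, now):
        # Any bytes accepted by the socket count as progress.
        if nbytes > 0:
            self.pending_bytes = max(0, self.pending_bytes - nbytes)
            self.last_progress = now

    def expired(self, now):
        # Stalled: data is waiting and nothing has been written for too long.
        return (self.pending_bytes > 0
                and now - self.last_progress >= self.send_hold_time)
```

For example, if a 19-octet KEEPALIVE is enqueued at time 0 and the socket
never accepts it, expired() becomes true at time 240 regardless of what any
interface-level liveness check reports.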

> * Should we first withdraw received routes from our peers before resetting
> the session ? At least data plane will have a chance to converge to a
> different set of links with no sudden packet drops.

Would you describe the drain scenario with the involved parties and what the
congestion state is as part of that?  I don't think I'm understanding the
above point.

-- Jeff