Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Robert Raszuk <robert@raszuk.net> Tue, 15 December 2020 23:18 UTC

From: Robert Raszuk <robert@raszuk.net>
Date: Wed, 16 Dec 2020 00:18:36 +0100
Message-ID: <CAOj+MMFySPXpE8QxcO+7szKzQ78faQASYKnBUYg_h_aLd=P4Lg@mail.gmail.com>
To: Job Snijders <job@sobornost.net>
Cc: Christoph Loibl <c@tix.at>, John Scudder <jgs@juniper.net>, "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/V_ZJW5VKARw5bk4r8QotMcnzAjw>

Hi Job,

Putting all other concerns aside, I have a few questions ...

1. Is it BGP that should trigger the session RST or FIN, or is it TCP?

2. If it is BGP (TCP would not be aware of HOLD_SEND), how exactly do we
know that the peer's window has been 0 for the whole HOLD_SEND TIME?

3. Which TCP socket option will return an error to BGP saying that the
window for a given peer was 0 for the duration of X sec? I presume that
even if it jumped above 0 for 100 ms, the timer would be reset, indicating
that the peer is still alive?

From your bgpd example, you are not checking anything other than BGP's
ability to write to the out queue. So is the suggestion now to forget all
about the TCP layer? Simply: if I cannot write anything to a peer for over
X sec, RST the session?
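
To make that last question concrete, here is a minimal sketch of such an
application-level check. It is not the OpenBGPD patch and not taken from any
implementation; the class name SendHoldTimer, its methods and the
send_hold_time parameter are all hypothetical. It assumes the BGP speaker
calls on_write() after every write attempt on a non-blocking socket. Note
that it only observes whether the local socket accepted bytes, so a stalled
peer is noticed only once the local send buffer has filled up.

```python
import time


class SendHoldTimer:
    """Hypothetical per-peer timer: expires when the out queue makes no progress."""

    def __init__(self, send_hold_time, clock=time.monotonic):
        self.send_hold_time = send_hold_time  # seconds (assumed knob)
        self.clock = clock                    # injectable for testing
        self.last_progress = clock()

    def on_write(self, bytes_written, bytes_pending):
        # Any byte accepted by the socket counts as progress, and so does
        # an empty out queue (nothing to send means nothing is stuck).
        if bytes_written > 0 or bytes_pending == 0:
            self.last_progress = self.clock()

    def expired(self):
        # True once no progress was seen for send_hold_time seconds; the
        # caller would then tear the session down.
        return self.clock() - self.last_progress >= self.send_hold_time
```

With this shape, a window that opens even briefly and lets some bytes out
restarts the timer, which is exactly the behavior question 3 asks about.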

Hi John,

I think the suggestion is to add a second HOLD_SEND TIME, separate from
the normal HOLD TIME.

Also, there could be lots of different types of peers, so unless HOLD_SEND
were, say, 5 x HOLD, putting all peers under the same time value may be
suboptimal.
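
As an illustration only, a per-peer value could be derived from each
session's negotiated hold time rather than configured globally. The function
name, the multiplier of 5 (taken from the "say 5 x HOLD" above) and the
floor parameter are assumptions for this sketch, not a proposal.

```python
def send_hold_time(hold_time, multiplier=5, floor=0):
    """Return a per-peer send hold time in seconds, or None if disabled.

    hold_time is the negotiated BGP hold time in seconds. A negotiated
    hold time of 0 means no KEEPALIVEs are exchanged, so this sketch
    disables the send hold timer for such sessions as well.
    """
    if hold_time == 0:
        return None
    # Scale with the peer's own hold time, but never go below the floor.
    return max(hold_time * multiplier, floor)
```

For example, a peer with a 90 sec hold time would get 450 sec, while a peer
with a 3 sec hold time and a 30 sec floor would get 30 sec.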

Thx,
R.


On Tue, Dec 15, 2020 at 10:54 PM Job Snijders <job@sobornost.net> wrote:

> On Tue, Dec 15, 2020 at 09:57:47PM +0100, Christoph Loibl wrote:
> > Thanks for answering my question in more detail. Maybe I was unclear
> > (but reading your email I think we are talking about the same).
> > > On 15.12.2020, at 21:00, John Scudder <jgs@juniper.net> wrote:
> > >
> > > I think you are talking about this scenario. I’ll copy the example
> > > from Rob’s message cited above:
> > >
> > >   rtr-A                   rtr-B
> > >   (congested c-p)         (uncongested c-p)
> > >   send window: >0         send window: 0
> > >   recv window: 0          recv window: >0
> > >
> > > In this case we expect:
> > >  a) rtr-B does not send any BGP packet (KEEPALIVE/UPDATE/NOTIFICATION)
> > > to rtr-A in normal operating circumstances.
> > >  b) rtr-A does not expect any KEEPALIVE/UPDATE packets from rtr-B. The
> > > session remains established even if no packet is received in the
> > > holdtime.
> > >  c) rtr-A continues to send KEEPALIVE packets to rtr-B.
> >
> > The part I have a problem to understand is b). It is clear that rtr-A
> > will not receive any packets from rtr-B because rtr-B cannot send them
> > (send window: 0). But does "rtr-A does not expect any KEEPALIVE/UPDATE
> > packets from rtr-B" mean that rtr-A has essentially suspended its
> > hold-timer until it is ready to receive new messages and opens up its
> > recv window? If yes, why? I would expect timers to run independently
> > of the transport protocol.
>
> Yeah, I'd expect that too. We've seen congested BGP implementations
> continue to send KEEPALIVEs but not accept (or send!) other BGP
> messages. And rtr-B's attempts at KEEPALIVE just get TCP ACKed with
> zero window.
>
> I'd argue in the above scenario rtr-A is simply broken and rtr-B MUST
> proceed to close down the session towards rtr-A, rtr-B must cleanup and
> generate WITHDRAWs for any routes pointing to rtr-A. By doing the
> clean-up rtr-B does both itself and rtr-A a favor. If the issue was
> transient, rtr-A and rtr-B will re-establish a few minutes later
> (IdleHoldTimer, right?) and things will normalize.
>
> Arguably and measurably, rtr-A is operating its Loc-RIB (forwarding)
> based on stale routing information (assuming rtr-A is working at all!):
> rtr-A has not received any WITHDRAWs, UPDATEs (or somewhat less
> importantly KEEPALIVEs) from rtr-B.
>
> Rtr-B is fully aware of this stale situation, because rtr-B was not able
> to write these BGP messages to the network: the messages are still in
> OutQ. Rtr-A didn't accept any KEEPALIVE (or UPDATE/WITHDRAW) from
> rtr-B.
>
> How to solve this? Claudio Jeker took a look at what it would take in
> OpenBGPD and came up with the (tiny!) following patch, should be
> readable to most: https://marc.info/?l=openbsd-tech&m=160796802508185&w=2
>
> Ben Cox helped me create an 'EBGP peer from hell': a publicly accessible
> EBGP multihop instance which can reliably produce the undesirable
> TCP/BGP behavior we're discussing here. This 'peer from hell' will do
> the OPEN exchange but then manipulates the TCP recvwindow towards zero.
>
> All BGP implementations tested so far (5 famous ones) appear vulnerable
> because they continue to consider the BGP session healthy & stable
> (meanwhile OutQ keeps growing endlessly and zero BGP messages go across
> the wire).
>
> One network operator (with thousands of EBGP sessions in the DFZ)
> reported to me the above stalled-TCP scenario is *not* a common case on
> the Internet. On a normal day, a network operator will see no (zero)
> sessions stuck this way, which leads me to believe 'recvwind=0' ...
> *for the duration of the hold timer* is a very strong indicator for a
> really broken situation which should be attempted to automatically
> resolve.
>
> I believe BGP implementations are not helping any known deployment
> scenarios by *not* disconnecting a stuck peer; on the other hand, we
> now know about various operational examples where honoring recvwind=0
> for (hours, days) longer than $holdtimer led to global scale problems.
>
> As the 'not-at-all progressing OutQ' situation seems somewhat rare in
> the wild (yet continues to happen from time to time) I think it is worth
> discussing & documenting how implementers can attempt to avoid this
> state from happening. It might help make the Internet 1% more robust.
>
> BGP implementers (or operators wanting to test their equipment) feel
> free to contact me off-list if you'd like to set up an EBGP multihop
> session towards the 'peer from hell' testbed. Testing potential
> solutions this way is quite easy, the behavior can be triggered within a
> few seconds.
>
> Kind regards,
>
> Job
>
> ps. At this moment we have (1) an attempt at problem description, (2) a
> demonstration BGP-4 implementation of a 'problem causer', and (3) a
> different BGP-4 implementation with a 'solution'. This enables IDR to
> test interoperability & (potentially revised) protocol compliance,
> hopefully moving the problem a bit from theoretical to practical
> reality? :)
>
