Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Gyan Mishra <hayabusagsm@gmail.com> Fri, 18 December 2020 23:29 UTC

From: Gyan Mishra <hayabusagsm@gmail.com>
Date: Fri, 18 Dec 2020 18:29:14 -0500
Message-ID: <CABNhwV0C-yKumCUBJJO3ffR6Mi7cf_NhTQA0nrQM72pKCny7jQ@mail.gmail.com>
To: Jeffrey Haas <jhaas@pfrc.org>
Cc: Brian Dickson <brian.peter.dickson@gmail.com>, Greg Mirsky <gregimirsky@gmail.com>, "Jakob Heitz (jheitz)" <jheitz=40cisco.com@dmarc.ietf.org>, "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/6yYkOembm1HekopUVfNYnNrGu9U>

Hi Jeff

In-line

Thanks

Gyan

On Fri, Dec 18, 2020 at 4:49 PM Jeffrey Haas <jhaas@pfrc.org> wrote:

> Gyan,
>
>
> > On Dec 18, 2020, at 3:44 PM, Gyan Mishra <hayabusagsm@gmail.com> wrote:
> >
> > Jeffrey
> >
> > + Greg Mirsky
> >
> > Would a simple solution be to use BFD (RFC 5880) for single-hop liveness
> > detection in asynchronous mode, with BGP registered as a BFD client, so
> > that a BFD failure brings the BGP session down?
>
> BFD is used for BGP regularly.  The use of it for ISP to ISP connections,
> as in the issue described, is not very typical.  Resiliency of the session
> is far more important for ISP to ISP communication than fast failure.

    Gyan> Understood, agreed: resiliency and stability. Timers are generally
tuned for fast convergence within an operator's domain or on the PE-CE
customer side, whereas ISP to ISP it's about stability, and operators may
leave the default 90-second BGP hold timer, as in this case. However, I
believe ISPs are starting to use BFD and S-BFD between ISPs, but are very
careful, as they don't want a link flapping under tight timers to create
inter-ISP instability.

>
>
> For BFD sessions to customers, fast failure is sometimes used.

   Gyan> Agreed. BGP fast external failover works well for L3 connections,
where the peer goes down immediately as soon as the connected route is lost
on link down. The gain with BFD is on MetroE links where OAM fault
propagation of a far-end link down is not sent, or where the link stays up
because it connects to an L2 switch at an IXP/NAP peering point, so you end
up relying on the default BGP timers for convergence. Even on LAN or
under-floor L3 links, where you can rely on fast external failover, BFD or
S-BFD may still be beneficial for one-way fiber scenarios.

>
>
>
> >
> > As the application is not a file transfer between end hosts but two
> > routers running BGP, I don't know whether BGP implementations have an IPC
> > call that signals BGP to hang on and wait for the receiver, RTR-B, to
> > clear its buffer and signal a non-zero window in its ACKs. If BGP could
> > sense the zero TCP receive window via IPC, the best option would be to
> > immediately tear down BGP and send a NOTIFICATION (Hold Timer Expired).
>
> BGP implementations vary quite a bit.  Simpler implementations that pay
> attention only to basic socket APIs would simply see things like
> EWOULDBLOCK, EAGAIN or similar if doing async stuff.  If they're doing
> blocking sockets (unusual!), the implementation simply hangs.
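
To make the non-blocking case concrete, here is a minimal Python sketch (an illustration, not code from any BGP implementation) that opens a loopback TCP connection, never reads on the receiving side, and writes from a non-blocking socket until the kernel refuses more data. Once the receiver's buffer is full (so it advertises a zero window) and the sender's own send buffer has also filled, send() raises BlockingIOError carrying EAGAIN/EWOULDBLOCK; a blocking socket would instead sit inside send(). The function name and chunk size are assumptions chosen for illustration.

```python
import errno
import socket


def fill_until_would_block(chunk_size: int = 64 * 1024):
    """Write to a loopback TCP peer that never reads, until send() would block.

    Returns (bytes accepted by the kernel, errno reported by the failed send).
    """
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    sender = socket.create_connection(("127.0.0.1", port))
    receiver, _ = listener.accept()  # accepted, but deliberately never read from
    sender.setblocking(False)        # async style: fail fast instead of hanging
    payload = b"\x00" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += sender.send(payload)
            except BlockingIOError as exc:
                # Peer's receive window and our send buffer are both full.
                if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                    return total, exc.errno
                raise
    finally:
        for s in (sender, receiver, listener):
            s.close()


if __name__ == "__main__":
    accepted, err = fill_until_would_block()
    print(f"kernel accepted {accepted} bytes, then: {errno.errorcode[err]}")
```

The amount accepted before the error depends on the operating system's socket buffer sizes and autotuning, so only the error itself is predictable.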

   Gyan> Understood. I guess this could be a case where the BGP version
capability draft would be handy for troubleshooting these types of issues,
since it would show which implementation and software version the peer is
running.

https://datatracker.ietf.org/doc/draft-abraitis-bgp-version-capability/

>
> FWIW, blocked sockets for this sort of thing is usually a socket
> programmer's first introduction to things that cause zero-windowing.

   Gyan> Yep

>
>
> > During this time, until the BGP hold timer expires (90 seconds by
> > default), traffic cannot reroute onto an alternate path, and we are
> > black-holing traffic until RTR-B sends a BGP NOTIFICATION (Hold Timer
> > Expired), followed by a TCP RST, and the BGP peer session is torn down.
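
The message RTR-B is trying to send here is tiny. Per RFC 4271, a NOTIFICATION is the 19-byte common header (16-byte all-ones marker, 2-byte length, 1-byte type 3) followed by an error code and subcode, and Hold Timer Expired is error code 4. The Python sketch below builds that message and shows one possible expiry handler, a hypothetical one rather than any vendor's code, that attempts a non-blocking send and then closes the session regardless, so a peer advertising a zero window cannot keep the session alive past the hold time. The function and constant names are assumptions for illustration.

```python
import socket
import struct

BGP_MARKER = b"\xff" * 16   # RFC 4271 Section 4.1: marker is all ones
BGP_HEADER_LEN = 19         # marker (16) + length (2) + type (1)
BGP_NOTIFICATION = 3        # message type code
HOLD_TIMER_EXPIRED = 4      # NOTIFICATION error code


def build_notification(code: int, subcode: int = 0, data: bytes = b"") -> bytes:
    """Encode a BGP NOTIFICATION: common header, error code, subcode, data."""
    length = BGP_HEADER_LEN + 2 + len(data)
    header = BGP_MARKER + struct.pack("!HB", length, BGP_NOTIFICATION)
    return header + struct.pack("!BB", code, subcode) + data


def on_hold_timer_expired(sock: socket.socket) -> bool:
    """Hypothetical expiry handler: best-effort NOTIFICATION, then close anyway.

    Returns True if the whole NOTIFICATION was handed to the kernel.
    """
    msg = build_notification(HOLD_TIMER_EXPIRED)
    sock.setblocking(False)  # never wait on a peer that advertises a zero window
    try:
        return sock.send(msg) == len(msg)
    except OSError:          # includes BlockingIOError (EAGAIN/EWOULDBLOCK)
        return False
    finally:
        sock.close()         # tear the session down whether or not the send fit
```

A real implementation would also stop its timers and withdraw the routes learned over the session; this sketch only illustrates the wire format and the choice not to block on the send.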
>
> It's important for the general case to realize that, just because BGP is
> wedged up (control plane), the forwarding plane may, or may not, be fine.
> You can't tell from BGP.
>
> What you do know in the abstract is that you care about the sessions being
> healthy in particular:
> 1. If you're not able to receive updates from your peer, you may end up
> with stale forwarding via that peer.
> 2. If you have stuff to send to the peer, they may end up with stale
> forwarding to you.

   Gyan> Good point. Monitoring management-plane health (resources, memory,
CPU, etc.) is critical, and TCP state is part of the management plane, so
the BGP process is split between the TCP sockets in the management plane
and the BGP functions themselves. Because BGP uses TCP sockets, it is also
more vulnerable to DDoS, which raises security concerns and makes enabling
authentication important. The NOC is usually looking for down peers and
would not catch missing routes until a customer reports an outage.
Peer-health telemetry is critical for operators, yet it is usually just a
non-zero count of received routes; stale or missing routes are very
difficult to troubleshoot and easily missed until you get a ticket.

> In that second case, you have a better local sense as to how urgent being
> stuck is.  If you have thousands of updates queued, it's probably dire.  If
> you have a few... is it?  If it's for a low priority network, maybe not.
> If it's for Google, probably much more important.
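
Jeff's point that urgency depends on what is queued suggests one way to bound the stuck-sender case locally: track whether the outbound queue is making any progress at all, separately from the hold timer that covers received messages. The Python sketch below is a hypothetical helper, not a standardized mechanism; the class name, the byte-based accounting, and the 90-second timeout (chosen to mirror the default hold time) are all assumptions, and the caller is assumed to report how many bytes each non-blocking send() accepted.

```python
from dataclasses import dataclass


@dataclass
class SendStallWatchdog:
    """Hypothetical helper: flags a session whose outbound queue has made
    no progress for `stall_timeout` seconds (times are caller-supplied)."""
    stall_timeout: float
    queued_bytes: int = 0
    last_progress: float = 0.0

    def on_enqueue(self, nbytes: int, now: float) -> None:
        """Record newly queued UPDATE/KEEPALIVE bytes."""
        if self.queued_bytes == 0:
            self.last_progress = now  # clock starts when the queue goes non-empty
        self.queued_bytes += nbytes

    def on_write(self, nbytes: int, now: float) -> None:
        """Record bytes the kernel accepted, e.g. the return value of send()."""
        if nbytes > 0:
            self.queued_bytes = max(0, self.queued_bytes - nbytes)
            self.last_progress = now

    def stalled(self, now: float) -> bool:
        """True if data is waiting and nothing has drained for stall_timeout."""
        return (self.queued_bytes > 0
                and now - self.last_progress >= self.stall_timeout)


if __name__ == "__main__":
    dog = SendStallWatchdog(stall_timeout=90.0)
    dog.on_enqueue(4096, now=0.0)
    dog.on_write(1000, now=50.0)  # peer's window opened briefly
    print(dog.stalled(now=139.0), dog.stalled(now=140.0))  # prints: False True
```

When stalled() becomes true, an implementation could tear the session down instead of waiting indefinitely for the peer to reopen its window; choosing the timeout is the same "temporary hiccup" judgment discussed in this thread.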

  Gyan> Yep. I think operators should use BFD for any critical peering,
including inter-SP, to quickly mitigate prolonged outages where every
second counts.

>
>
> But in general, being stuck or out of sync is a problem.
>

   Gyan> Agreed. I think the stuck state is generally related to the
management-plane TCP socket, whereas stale state could have many other
causes.

>
> But similarly, in general, the cost of dropping and re-establishing a
> peering session is very high.  So, there's resistance to knocking a session
> over because it's had some level of "temporary" hiccup.  Your definition of
> "temporary" will vary, and thus part of the motivation for this
> conversation.

   Gyan> I disagree in the stuck (but not stale) case. I can see stale
state possibly recovering and normalizing on its own, whereas a stuck
session prevents convergence onto an alternate path; once the peer bounces
and normalizes, it can take traffic again. Most ISPs load-balance their
inter-ISP connections over BGP multipath ECMP paths, so if you reset the
peer you are much better off than black-holing traffic for the duration of
the hold timer.

>
>
> > In this case we are guessing that the TCP receive buffer is full because
> > the link is congested, so the NIC cannot process any more packets,
> > including BGP or BFD control packets.
>
> The fate is potentially shared, but not a guarantee.  If the congestion is
> happening because traffic is selectively dropping for your BGP session, BFD
> may behave fine.  Perhaps you have a congestion issue to your router's
> CPU, but the line card's BFD is fine.
>

Gyan> Agreed. I brought up that scenario below; I'm not sure it happened in
this particular instance, but it could, and then the question is how BFD
would help if the issue is localized to the router's management plane.

>
>
> >
> > So in this particular case, with BFD asynchronous mode enabled with, say,
> > a 50 ms interval and a multiplier of 3, as soon as the receiver RTR-B
> > misses 3 consecutive BFD control packets it pulls down the BGP session
> > within 150 ms, at which time RTR-B sends a NOTIFICATION, logs that the
> > hold time has expired, and a TCP RST is sent closing the session to RTR-A.
>
> This would be way too short for most ISP scenarios.
>

    Gyan> Agreed, that was just an example. For inter-ISP, something around
a second, such as 750 ms, is reasonable.
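
The numbers in this exchange follow from how RFC 5880 (Section 6.8.4) defines Asynchronous-mode detection time: the Detect Mult received from the remote system multiplied by the agreed interval, which is the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. A minimal Python sketch; the parameter names are illustrative, and it works in milliseconds although BFD packets carry microseconds.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          remote_desired_min_tx_ms: float,
                          local_required_min_rx_ms: float) -> float:
    """Asynchronous-mode detection time per RFC 5880, Section 6.8.4."""
    # The slower of the two sides sets the agreed interval.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


if __name__ == "__main__":
    print(bfd_detection_time_ms(3, 50, 50))    # the 50 ms x 3 example
    print(bfd_detection_time_ms(3, 250, 250))  # one way to reach ~750 ms
```

Because the slower side wins, a 50 ms transmit setting on one router is overridden if its peer requires, say, 300 ms between packets; with a multiplier of 3 the detection time is then 900 ms.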

>
> > BFD uses UDP (destination port 3784 for single-hop sessions) and checks
> > link liveness, so it would be fine and not fail if the link is not
> > congested. So if BGP is having an issue with the TCP session being in a
> > paused state, is there IPC from TCP to BGP to BFD?
>
> TCP session state is very decoupled from UDP state, so the best inference
> you can make is "BFD works, TCP hopefully can get through?"  But as I noted
> above, there's no guarantee of that.
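
For reference, a BFD control packet is small and fixed-format. The Python sketch below packs the 24-byte mandatory section defined in RFC 5880 (Section 4.1), without authentication. Single-hop BFD control packets go to UDP destination port 3784 (RFC 5881); port 6784 is the one RFC 7130 assigns to micro-BFD on LAG member links. The function and constant names are illustrative, and all flag bits (P, F, C, A, D, M) are left clear.

```python
import struct

BFD_SINGLE_HOP_PORT = 3784  # RFC 5881 UDP destination port for control packets
BFD_VERSION = 1
ADMIN_DOWN, DOWN, INIT, UP = 0, 1, 2, 3  # 2-bit session state (Sta) values


def build_bfd_control(state: int, my_disc: int, your_disc: int,
                      detect_mult: int, desired_min_tx_us: int,
                      required_min_rx_us: int, required_min_echo_rx_us: int = 0,
                      diag: int = 0) -> bytes:
    """Pack the 24-byte mandatory BFD control section (no authentication)."""
    byte0 = (BFD_VERSION << 5) | (diag & 0x1F)   # Vers (3 bits) | Diag (5 bits)
    byte1 = (state & 0x03) << 6                  # Sta (2 bits); P F C A D M = 0
    length = 24
    return struct.pack("!BBBBIIIII", byte0, byte1, detect_mult, length,
                       my_disc, your_disc, desired_min_tx_us,
                       required_min_rx_us, required_min_echo_rx_us)
```

Intervals are carried in microseconds, so the 50 ms example earlier in this thread would be encoded as 50000.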

    Gyan> Since BFD detects bidirectional liveness, if BFD control packets
are not making it through (especially given the RFC 5880 three-way
handshake, which acts as a continuity test at session establishment) and
the BFD session cannot establish, then more likely than not there is a
fiber cut or other L1 issue. S-BFD still detects bidirectional data-plane
liveness, but without the three-way handshake continuity test at session
establishment.
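
The three-way handshake referred to here is the session state machine in RFC 5880, Section 6.8.6: a session moves from Down to Init only after hearing the peer report Down, and to Up only after hearing Init or Up, so it cannot come Up unless packets flow in both directions. A minimal Python transcription of the reception rules; the names are illustrative, and timers, authentication, and Poll/Final handling are omitted.

```python
from enum import IntEnum


class State(IntEnum):
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


NEIGHBOR_SIGNALED_DOWN = 3  # Diag value used when the peer reports Down


def on_receive(local: State, remote: State, local_diag: int = 0):
    """Return (new local state, local diag) after a valid control packet."""
    if local == State.ADMIN_DOWN:
        return local, local_diag                 # packet is discarded
    if remote == State.ADMIN_DOWN:
        if local != State.DOWN:
            return State.DOWN, NEIGHBOR_SIGNALED_DOWN
        return local, local_diag
    if local == State.DOWN:
        if remote == State.DOWN:
            return State.INIT, local_diag
        if remote == State.INIT:
            return State.UP, local_diag
    elif local == State.INIT:
        if remote in (State.INIT, State.UP):
            return State.UP, local_diag
    elif remote == State.DOWN:                   # local is Up
        return State.DOWN, NEIGHBOR_SIGNALED_DOWN
    return local, local_diag
```

Tracing two peers that both start Down: each hears Down and moves to Init, then each hears Init and moves to Up. If packets flow in only one direction, the side that hears nothing never leaves Down.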

>
>
> For a different flavor of this type of problem, IS-IS doesn't use IP
> transport.  This means IP forwarding can be broken but you can get ISO
> packets through.
>

   Gyan> IS-IS can still register as a client of single-hop BFD (RFC 5881),
with the timers tuned down for fast asynchronous-mode (RFC 5880) link
failure detection, so that IS-IS adjacencies are brought down quickly for
convergence and traffic is not black-holed.

>
> > I think this second scenario where the link is not congested and TCP is
> > stuck can be easily tested in a lab with a Spirent traffic generator.
>
> I'd suggest playing with selective packet loss for a link for a busy TCP
> session.  You should find that with no more than 15% TCP packet loss your
> throughput becomes terrible, and sessions may simply fail because the TCP
> ACK necessary to advance the window may simply not get through.
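
One rough way to see why throughput collapses under loss is the well-known steady-state approximation from Mathis et al. (1997): throughput ≤ (MSS / RTT) × C / √p, with C ≈ 1.22 in its simplest form. This model is not from this thread, and it ignores retransmission timeouts, so at loss rates as high as 15% it overestimates what TCP actually achieves; treat it as an optimistic upper bound. The MSS and RTT values below are assumptions for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. steady-state TCP throughput estimate, in bits per second."""
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be in (0, 1)")
    c = math.sqrt(1.5)  # constant for the simple periodic-loss model
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    for p in (0.0001, 0.01, 0.15):
        mbps = mathis_throughput_bps(1460, 0.010, p) / 1e6
        print(f"loss {p:.2%}: at most ~{mbps:.1f} Mbit/s (MSS 1460 B, RTT 10 ms)")
```

Going from 0.01% to 15% loss cuts the bound by a factor of about 39 (√1500), before counting the timeouts that dominate at high loss.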

    Gyan> Will give it a shot

I think overall, for link congestion or failure where bidirectional
continuity needs to be detected, there is tremendous gain in using
single-hop asynchronous BFD for BGP, OSPF, or IS-IS convergence.

It might be nice if BFD, S-BFD, or IPPM IOAM could run internally on the
router's management plane to detect the control-plane health of socket
establishment. I would have to noodle on whether that's possible, but it
could be a new, innovative draft.

>
>
> -- Jeff
>
> --

<http://www.verizon.com/>

*Gyan Mishra*

*Network Solutions Architect*

*M 301 502-1347*

*13101 Columbia Pike, Silver Spring, MD*
