Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

The trade-off is (as often happens) between stability and convergence.
Given the severity, I’d prefer a formalized approach rather than an implementation artifact (at the mercy of the Product Manager in charge ;-))

Regards,
Jeff

> On Dec 11, 2020, at 15:30, Keyur Patel <keyur@arrcus.com> wrote:
> 
> One comment inlined #Keyur
> 
> On 12/11/20, 12:04 PM, "Idr on behalf of John Scudder" <idr-bounces@ietf.org on behalf of jgs=40juniper.net@dmarc.ietf.org> wrote:
> 
>    [all hats on]
> 
>    Hi Job,
> 
>    Thanks for bringing this up.
> 
>    To take the liberty of summarizing your wall of text :-) you’re saying that you believe BGP should tear down its session if it’s unable to send a message for the duration of the hold time. 
> 
>    Given that the conversation last time was inconclusive I think this is a good thing for the WG to discuss again. If you want to, you (or someone) could turn the idea into a short draft that updates RFC 4271, and we could have a WG adoption discussion about it. It might help focus the discussion but it’s not mandatory.
> 
>    I’ll point out a few things to start with —
> 
>    - Making it mandatory to apply hold time to the sending of messages would potentially make BGP peerings less stable. It clearly can’t make them *more* stable. Of course one can argue that if you haven’t been able to send a message for the hold time, the session has failed its metric of usefulness anyway, so any veneer of stability at this point is a harmful sham.
>    - If I recall correctly, RST doesn’t work (or may not work) if you’re using the MD5 TCP option. Nothing much to be done, but be aware.
>    - There is nothing stopping an implementation from doing what you describe now. The formalism that keeps you within the letter of 4271 would be that the implementation supplies a configuration option, that you set to enable the behavior. Once you’ve done that, when the implementation notices that the hold time has been exceeded in the outbound direction, it generates a ManualStop event for the session. 
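
A minimal sketch of the knob described in the last bullet above, for illustration only. Nothing here is taken from a real implementation: `SendHoldMonitor`, `enforce_send_hold`, and the string "ManualStop" (standing in for the RFC 4271 FSM event) are hypothetical names, and the clock is injected so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class SendHoldMonitor:
    """Tracks how long queued outbound BGP data has been stuck (hypothetical helper)."""

    def __init__(self, hold_time: float, enforce_send_hold: bool = False,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hold_time = hold_time                  # negotiated Hold Time, in seconds
        self.enforce_send_hold = enforce_send_hold  # the operator-set knob, off by default
        self.clock = clock
        self.last_progress = clock()                # last time TCP accepted any of our bytes

    def on_bytes_sent(self, count: int) -> None:
        """Call whenever the socket accepted at least one byte."""
        if count > 0:
            self.last_progress = self.clock()

    def poll(self, output_pending: bool) -> Optional[str]:
        """Return "ManualStop" once queued output has made no progress for the hold time."""
        now = self.clock()
        if not output_pending:
            self.last_progress = now   # nothing queued, so nothing is stuck
            return None
        # A Hold Time of 0 means the hold timer is not in use for this session.
        if not self.enforce_send_hold or self.hold_time == 0:
            return None
        if now - self.last_progress >= self.hold_time:
            return "ManualStop"
        return None
```

The caller would feed the returned event into its session state machine. With the knob left off, behavior is unchanged from today.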
> 
> #Keyur: +1 to what John said. This could very well be an implementation knob that generates a ManualStop event.
> 
> Regards,
> Keyur
> 
>    Thanks,
> 
>    —John
> 
>> On Dec 11, 2020, at 2:23 PM, Job Snijders <job@sobornost.net> wrote:
>> 
>> 
>> Dear group,
>> 
>> Not too long ago an incident [1] in one Autonomous System resulted in
>> the global Internet being unusable in many parts of the world for
>> multiple hours. Some have reported the root cause was a 'configuration
>> error', however I believe much of the observed communication blackouts
>> in the global routing system stemmed from a pre-existing condition: a
>> specific implementation property present in multiple implementations
>> currently in use in the default-free zone.
>> 
>> Usually when an incident happens in one AS, affected parties can through
>> unilateral action 'route around the problem', but the ability to 'route
>> around problems' critically depends on the ability to distribute
>> WITHDRAW or UPDATE messages. When messages are not processed, what
>> was generally assumed to be a unilaterally solvable problem now requires
>> coordination between *all* neighbors of the suffering AS.
>> 
>> The global routing system requires every participant to process BGP
>> messages, because the alternative is intervention on thousands of BGP
>> devices to manually shut down thousands of BGP sessions, disconnecting the
>> AS suffering from an incident, to help the rest of the default-free
>> zone. I speak from experience when saying that coordinating a disconnection
>> of an AS at global scale is incredibly hard and slow, and many approval
>> levels must be worked through. It takes *hours* of phone calls & email
>> chains, a time window during which internet traffic is routed towards
>> stale (now blackholing) locations.
>> 
>> In the average ISP's network design using IBGP Route Reflectors, these
>> blackout effects are aggravated when BGP sessions landing in such
>> devices are not terminated when TCP causes the BGP session to stall.
>> 
>> The problem of how TCP and BGP-4 can interact has been discussed before,
>> but I'm not sure the working group followed up with any publication
>> detailing the problem and the solution.
>> 
>>   https://mailarchive.ietf.org/arch/msg/idr/q0Sx5d3zZjfOmOQ4lO2OZAHh9Lc/
>> 
>> Does everyone agree BGP-4 sessions MUST be terminated using a TCP RST
>> (instead of a BGP-4 Cease NOTIFICATION) if the peer has indicated for
>> the duration of the Hold Timer that the TCP receive window is zero?
>> I'm fine with there being buttons to make this different, but the
>> default for routers in the global Internet routing system should be to
>> consider the remote peer to be 'a lost cause' when it won't accept new
>> BGP messages for the duration of the hold timer.
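
For readers unfamiliar with how an application forces a RST rather than an orderly FIN: on BSD-style socket APIs, one common way is to enable SO_LINGER with a zero timeout before closing. A minimal sketch, assuming a POSIX-like platform where `struct linger` is two ints (Linux behaves this way; other stacks may differ). `abort_with_rst` is a hypothetical helper name, not from any BGP implementation.

```python
import socket
import struct


def abort_with_rst(sock: socket.socket) -> None:
    """Close `sock` abortively: discard unsent data and emit a TCP RST."""
    # struct linger { int l_onoff; int l_linger; } -> lingering on, timeout 0
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

Because the zero-linger close discards whatever is still queued, it does not depend on the peer ever reopening its receive window, which is why it suits the stalled case described above.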
>> 
>> Perhaps RFC 4271 Section 6.5 should be amended as follows:
>> 
>> OLD:
>>   If a system does not receive successive KEEPALIVE, UPDATE, and/or
>>   NOTIFICATION messages within the period specified in the Hold Time
>>   field of the OPEN message, then the NOTIFICATION message with the
>>   Hold Timer Expired Error Code is sent and the BGP connection is
>>   closed.
>> 
>> NEW:
>>   If a system does not receive (or is unable to send) successive
>>   KEEPALIVE, UPDATE, and/or NOTIFICATION messages within the period
>>   specified in the Hold Time field of the OPEN message, then the
>>   NOTIFICATION message with the Hold Timer Expired Error Code is sent
>>   and the BGP connection is closed. If the NOTIFICATION message cannot
>>   be sent, the BGP connection is closed.
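
The last two sentences of the NEW text could be sketched roughly as below: try once to hand a Hold Timer Expired NOTIFICATION to TCP, and close the connection whether or not that works. The message layout (16-octet all-ones marker, 2-octet length, type 3, error code 4) follows RFC 4271. The function names, and the choice to treat a partial write as "not sent", are assumptions made for illustration. The sketch also assumes no half-written BGP message is already sitting in an application-level queue.

```python
import struct

BGP_MARKER = b"\xff" * 16     # RFC 4271: marker field is all ones
MSG_NOTIFICATION = 3          # RFC 4271 message type code
ERR_HOLD_TIMER_EXPIRED = 4    # RFC 4271 NOTIFICATION error code


def build_notification(code: int, subcode: int = 0, data: bytes = b"") -> bytes:
    """Encode a BGP NOTIFICATION message."""
    length = 19 + 2 + len(data)   # 19-octet header + error code + subcode + data
    header = struct.pack("!HBBB", length, MSG_NOTIFICATION, code, subcode)
    return BGP_MARKER + header + data


def close_on_hold_expiry(sock) -> bool:
    """Best-effort NOTIFICATION, then close. True if TCP accepted the whole message."""
    msg = build_notification(ERR_HOLD_TIMER_EXPIRED)
    sock.setblocking(False)       # never wait on a peer that advertises a zero window
    try:
        sent = sock.send(msg) == len(msg)
    except OSError:               # includes BlockingIOError when the send buffer is full
        sent = False
    finally:
        sock.close()              # the connection is closed in either case
    return sent
```

When the peer's window has been zero for a long time, the local send buffer is normally full, so the non-blocking `send` fails immediately and the function falls through to the plain close.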
>> 
>> This is an ongoing problem. I suspect the BGP Nyancat's discoloration at
>> the leftmost eye [2] might have been caused by an active TCP session
>> keeping a stale BGP session alive. The observations from "BGP Zombies:
>> an Analysis of Beacons Stuck Routes" [3] could also be explained by
>> the problematic interaction between TCP and BGP.
>> 
>> I appreciate the work the IDR working group has done to *SOFTEN* the
>> blow from implementation defects on global routing (RFC 7606 is a
>> brilliant example of this), but I fear in this case there is no subtle
>> way to say goodbye when the peer doesn't process messages in a timely
>> fashion. It might be good to document this.
>> 
>> Kind regards,
>> 
>> Job
>> 
>> [1]: https://www.reuters.com/article/level-3-communi-outages-idUSL2N1CB00C
>> [2]: https://labs.ripe.net/Members/cteusche/bgp-meets-cat
>> [3]: https://www.iij-ii.co.jp/en/members/romain/pdf/romain_pam2019.pdf
>> 
>> _______________________________________________
>> Idr mailing list
>> Idr@ietf.org
>> https://www.ietf.org/mailman/listinfo/idr