Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Gyan Mishra <hayabusagsm@gmail.com> Sat, 19 December 2020 23:32 UTC

From: Gyan Mishra <hayabusagsm@gmail.com>
Date: Sat, 19 Dec 2020 18:32:12 -0500
Message-ID: <CABNhwV1c8=OxB1B-5j3HzLPpiK17AGK5QDnuJiWh=qBqEKMSOw@mail.gmail.com>
To: Robert Raszuk <robert@raszuk.net>
Cc: Enke Chen <enchen@paloaltonetworks.com>, John Scudder <jgs=40juniper.net@dmarc.ietf.org>, William McCall <william.mccall@gmail.com>, "idr@ietf. org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/zBHnxi01IDfYmZh3pFBWVCdtx20>

I think there is one critical point I was missing; please correct me if I am wrong.

When a zero receive window has been received and the hold timer then expires,
a NOTIFICATION is sent, but because the buffer is full on the other end, the
peer's TCP socket never gets the NOTIFICATION. The connection thus sits in a
paused TCP state until the default TCP user timeout is reached.
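
To make the stall concrete, here is a minimal sketch (not taken from any BGP
implementation) of a sender writing to a loopback peer that accepts the
connection but never reads. Once the peer's receive buffer and the local send
buffer are both full, further writes are refused, which is the position a
NOTIFICATION would be in. The function name, chunk size, and settle delay are
all assumptions made for illustration.

```python
import socket
import time


def fill_until_stalled(chunk_size=65536, settle_s=0.2, limit=1 << 28):
    """Write to a peer that never reads.

    Returns (bytes_accepted, stalled). `stalled` is True when a non-blocking
    send was refused twice in a row with a short pause in between, i.e. the
    kernel would not take any more data for this connection.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()  # accepted, but deliberately never read

    sender.setblocking(False)
    chunk = b"\x00" * chunk_size
    accepted = 0
    stalled = False
    blocked_once = False
    try:
        while accepted < limit and not stalled:
            try:
                accepted += sender.send(chunk)
                blocked_once = False
            except BlockingIOError:
                if blocked_once:
                    stalled = True          # still no room after the pause
                else:
                    blocked_once = True
                    time.sleep(settle_s)    # give in-flight data time to drain
    finally:
        for s in (sender, receiver, listener):
            s.close()
    return accepted, stalled


if __name__ == "__main__":
    accepted, stalled = fill_until_stalled()
    print(f"kernel accepted {accepted} bytes, stalled={stalled}")
```

The byte count depends on the host's buffer sizing, so it is not a meaningful
number in itself; the point is only that the writer ends up unable to hand
more data to TCP.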

I completely agree that BFD cannot resolve this, and that adjusting the TCP
user timeout option is the only solution.
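
For reference, a minimal sketch of setting the user timeout on a Linux socket
from Python. This is the local TCP_USER_TIMEOUT socket option, which is a
separate thing from the on-the-wire option defined in RFC 5482. The helper
name and the 30-second value are hypothetical, and whether the timer fires
while the peer advertises a zero window depends on the stack's semantics.

```python
import socket

# Linux-specific option. Fall back to the value from <linux/tcp.h> (18) in
# case this Python build does not expose the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def set_user_timeout(sock, timeout_ms):
    """Set TCP_USER_TIMEOUT (milliseconds) and return the value read back."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # 30000 ms is an illustrative value, not a recommendation.
        print(set_user_timeout(s, 30_000))
```

A value of 0 tells the kernel to use the system default, per the Linux man
page.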

Kind Regards

Gyan

On Sat, Dec 19, 2020 at 6:20 PM Gyan Mishra <hayabusagsm@gmail.com> wrote:

>
> Hi Robert
>
> On the other thread it was not quite clear.
>
> So if this scenario involves no link congestion at all and is purely a
> management-plane / control-plane issue with TCP and BGP socket processing,
> then I agree BFD won’t help at all.
>
> I agree that this points to poor RP management-plane design, which leads
> either to the RP being overwhelmed (high CPU and memory) or possibly to a
> memory leak or other bug.
>
> Do we know which vendor?
>
> Something simpler than messing with TCP parameters: if, instead of using
> the default 90-second BGP hold timer, the timers were reduced to something
> like 10 / 30, that would limit how long traffic is black-holed before the
> hold timer expires and it is rerouted to an alternate path.
>
>
> Kind Regards
>
>
> Gyan
>
> On Sat, Dec 19, 2020 at 5:18 PM Robert Raszuk <robert@raszuk.net> wrote:
>
>> Hi Gyan,
>>
>> > Going down this path does seem a lot more complicated and riskier
>> > than using BFD.
>>
>> But BFD is not going to help at all to the problem at hand.
>>
>> BFD is in the vast majority of cases distributed (and that is a feature,
>> not a bug), and responses are handled by line cards.
>>
>> Here we are dealing with bugs in RE/RP-based subsystems, regardless of
>> whether they are in the TCP or the BGP layer.
>>
>> Thx,
>> R.
>>
>>
>>
>>
>>
>>
>> On Sat, Dec 19, 2020 at 10:36 PM Gyan Mishra <hayabusagsm@gmail.com>
>> wrote:
>>
>>>
>>> Here is RFC 5482, the TCP User Timeout Option, from the TCPM WG.
>>>
>>> https://tools.ietf.org/html/rfc5482
>>>
>>> TCPM has a bis draft updating RFC 793 that has more information than the
>>> original.
>>>
>>> https://datatracker.ietf.org/wg/tcpm/documents/
>>>
>>> https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-19#page-42
>>>
>>>
>>> From a quick read, there are caveats around devices supporting or not
>>> supporting the option.
>>>
>>> Also, I guess setting the value is tricky as well: set too low or too
>>> high, it could make matters worse by causing instability.
>>>
>>> Going down this path does seem a lot more complicated and riskier than
>>> using BFD.
>>>
>>>
>>> Kind Regards
>>>
>>> Gyan
>>>
>>>
>>> On Sat, Dec 19, 2020 at 5:38 AM William McCall <william.mccall@gmail.com>
>>> wrote:
>>>
>>>> On Fri, Dec 18, 2020 at 10:33 PM John Scudder
>>>> <jgs=40juniper.net@dmarc.ietf.org> wrote:
>>>> >
>>>> > On Dec 18, 2020, at 1:09 PM, Enke Chen <enchen@paloaltonetworks.com> wrote:
>>>> > >
>>>> > > No, I am not assuming that packets are getting somewhere. The
>>>> > > TCP_USER_TIMEOUT would work as long as there is "pending data" (either
>>>> > > unacked, or locally queued). The data can be from the local BGP
>>>> > > Keepalives or the TCP_KEEPALIVE.
>>>> >
>>>> > Apart from the other objections to relying on TCP_USER_TIMEOUT, which
>>>> > I think are sufficient, it’s not clear to me that implementations will
>>>> > provide the desired semantics. RFC 793 seems like it specifies the right
>>>> > semantics (“get this data to the peer within N seconds or close”):
>>>> >
>>>> >         The timeout, if present, permits the caller to set up a timeout
>>>> >         for all data submitted to TCP.  If data is not successfully
>>>> >         delivered to the destination within the timeout period, the TCP
>>>> >         will abort the connection.  The present global default is five
>>>> >         minutes.
>>>> >
>>>> > However the Linux man page documents different semantics:
>>>> >
>>>> >        TCP_USER_TIMEOUT (since Linux 2.6.37)
>>>> >               This option takes an unsigned int as an argument.  When the
>>>> >               value is greater than 0, it specifies the maximum amount of
>>>> >               time in milliseconds that transmitted data may remain
>>>> >               unacknowledged before TCP will forcibly close the
>>>> >               corresponding connection and return ETIMEDOUT to the
>>>> >               application.  If the option value is specified as 0, TCP will
>>>> >               use the system default.
>>>> >
>>>> > The important difference being that whereas 793 implies data written
>>>> > to the socket, the Linux man page says “transmitted” data, which seems
>>>> > like it must mean data TCP has written to the network. These are two very
>>>> > different things! If Linux (or another stack) implements what the man
>>>> > page seems to say, it’s not useful for our purposes.
>>>> >
>>>> > —John
>>>> > _______________________________________________
>>>> > Idr mailing list
>>>> > Idr@ietf.org
>>>> > https://www.ietf.org/mailman/listinfo/idr
>>>>
>>>> I was curious too. I read the manpage, relevant linux kernel code, the
>>>> RFC, and hacked up a test case (unicast me if you want the code).
>>>> Also, Cloudflare published a relevant blog entry[0]. For this specific
>>>> scenario, see under the sub-heading "Zero window ESTAB is...
>>>> forever?".
>>>>
>>>> TCP_USER_TIMEOUT doesn't appear to kick in until there is unACKed
>>>> data, meaning data that has already been transmitted from TCP's
>>>> perspective. Data sitting in the buffers because the connection is in
>>>> persist state doesn't seem to count, per the test results and the docs.
>>>> I think this confirms your reading.
>>>>
>>>> [0] https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
>>>>
>>>> --
>>>> William McCall
>>>>
>>>>
>>> --
>>>
>>> <http://www.verizon.com/>
>>>
>>> *Gyan Mishra*
>>>
>>> *Network Solutions Architect*
>>>
>>> *M 301 502-1347*
>>> 13101 Columbia Pike, Silver Spring, MD