Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Enke Chen <enchen@paloaltonetworks.com> Mon, 21 December 2020 06:52 UTC

References: <CANJ8pZ-WMDotkQvhN-NuP7ivZkPRR-9S2KJSar=6463U0VKkow@mail.gmail.com> <EFC56A31-1276-4DAB-9526-9C2F24814D2C@pfrc.org> <CANJ8pZ_LnDna_jtipcLJq9rrS3MM32rLdxRW8ntC2aEi9VvzMg@mail.gmail.com> <722A787A-5B83-4802-A9F4-AB2957BB3305@juniper.net> <CA+eZshBse4g6jUBMxs4bJiE+uvWScwv7ggLNOMJbUiL1YsaisQ@mail.gmail.com>
In-Reply-To: <CA+eZshBse4g6jUBMxs4bJiE+uvWScwv7ggLNOMJbUiL1YsaisQ@mail.gmail.com>
From: Enke Chen <enchen@paloaltonetworks.com>
Date: Sun, 20 Dec 2020 22:52:34 -0800
Message-ID: <CANJ8pZ9LfsNfqU5Sq88HTHx71BjdrfJfTrWGVyhgajKv6ACfew@mail.gmail.com>
To: William McCall <william.mccall@gmail.com>
Cc: "idr@ietf. org" <idr@ietf.org>, jgs@juniper.net, Enke Chen <enchen@paloaltonetworks.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/1nDBZE0hNMIAZaF72gsGRc_IAs0>
Subject: Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Hi, John and William:

My reading of the Linux function tcp_probe_timer() is that data queued in
the socket buffer is taken into account. More specifically, when there is
no unacked data but there is data in the socket buffer, the
"icsk_user_timeout" would be checked, and the probe timer would be set
again in tcp_send_probe0().
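For reference, icsk_user_timeout is the per-socket field that the Linux
TCP_USER_TIMEOUT socket option sets (in milliseconds). A minimal Linux-only
sketch of how a BGP speaker might set and read it on its session socket
follows; the helper names are hypothetical, and the 90000 ms value in the
usage note is only an illustrative choice matching the commonly suggested
90-second BGP Hold Time.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to abort the connection if
 * transmitted data stays unacknowledged longer than timeout_ms.
 * Linux-only option. Returns 0 on success, -1 with errno set on failure. */
static int bgp_set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}

/* Hypothetical helper: read the value back (the kernel keeps it in
 * icsk_user_timeout). Returns the timeout in ms, or -1 on error. */
static long bgp_get_user_timeout(int fd)
{
    unsigned int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &value, &len) != 0)
        return -1;
    return (long)value;
}
```

Usage, assuming fd is the BGP session socket: bgp_set_user_timeout(fd, 90000).
Per the man page text quoted later in this thread, the option is documented
in terms of transmitted, unacknowledged data, so whether it also covers data
still queued behind a zero window is exactly the open question here.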

I am not sure what could have caused the failure that William observed; we
will need someone who is familiar with the TCP code to take a look. There
might be one potential issue in tcp_check_probe_timer(), where the probe
timer is not started if some other timer is already pending (please see
below).

Thanks.   -- Enke

----------------------
Linux v5.10-rc7-149-g33dc961

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d4ef5bf..0b28af1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1328,7 +1328,8 @@ static inline unsigned long tcp_probe0_when(const struct sock *sk,
 
 static inline void tcp_check_probe_timer(struct sock *sk)
 {
-       if (!tcp_sk(sk)->packets_out && !inet_csk(sk)->icsk_pending)
+       if (!tcp_sk(sk)->packets_out &&
+           (inet_csk(sk)->icsk_pending != ICSK_TIME_PROBE0))
                tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
                                     tcp_probe0_base(sk), TCP_RTO_MAX);
 }
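The only change in the patch above is the predicate that decides whether to
arm the zero-window probe timer. The sketch below models just that predicate
outside the kernel so the two versions can be compared side by side; the enum
values are hypothetical stand-ins for the kernel's ICSK_TIME_* constants, and
nothing about tcp_reset_xmit_timer() itself is modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's ICSK_TIME_* values; only the
 * distinction between "none", "probe0" and "something else" matters here. */
enum pending_timer { TIMER_NONE = 0, TIMER_RETRANS, TIMER_PROBE0, TIMER_LOSS_PROBE };

/* Current condition: arm the zero-window probe timer only when nothing is
 * in flight AND no timer of any kind is pending. */
static bool arms_probe_old(unsigned int packets_out, enum pending_timer pending)
{
    return packets_out == 0 && pending == TIMER_NONE;
}

/* Proposed condition: arm it whenever nothing is in flight, unless the
 * probe timer itself is already the pending one. */
static bool arms_probe_new(unsigned int packets_out, enum pending_timer pending)
{
    return packets_out == 0 && pending != TIMER_PROBE0;
}
```

For example, with nothing in flight but another timer still recorded in
icsk_pending, arms_probe_old(0, TIMER_RETRANS) is false while
arms_probe_new(0, TIMER_RETRANS) is true; that is the situation in which the
current code would not start the probe timer.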

----------


static void tcp_probe_timer(struct sock *sk)
{
        struct inet_connection_sock *icsk = inet_csk(sk);
        struct sk_buff *skb = tcp_send_head(sk);
        struct tcp_sock *tp = tcp_sk(sk);
        int max_probes;

        if (tp->packets_out || !skb) {
                icsk->icsk_probes_out = 0;
                return;
        }

        /* RFC 1122 4.2.2.17 requires the sender to stay open indefinitely as
         * long as the receiver continues to respond probes. We support this by
         * default and reset icsk_probes_out with incoming ACKs. But if the
         * socket is orphaned or the user specifies TCP_USER_TIMEOUT, we
         * kill the socket when the retry count and the time exceeds the
         * corresponding system limit. We also implement similar policy when
         * we use RTO to probe window in tcp_retransmit_timer().
         */
        if (icsk->icsk_user_timeout) {
                u32 elapsed = tcp_model_timeout(sk, icsk->icsk_probes_out,
                                                tcp_probe0_base(sk));

                if (elapsed >= icsk->icsk_user_timeout)
                        goto abort;
        }
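The user-timeout check above compares icsk_user_timeout with an elapsed time
that tcp_model_timeout() estimates from the number of probes already sent.
A standalone sketch of that kind of estimate, plus the decision the excerpt
makes, follows. It assumes each probe interval doubles from a base value until
it reaches a cap (mirroring TCP_RTO_MAX, 120 s); the function names, the
millisecond units, and the exact formula are assumptions for illustration and
may differ from the kernel in detail.

```c
#include <stdbool.h>

#define RTO_MAX_MS 120000u  /* assumed cap, mirroring TCP_RTO_MAX (120 s) */

/* Modeled time (ms) covered by `probes` unanswered probes: sums
 * base_ms * 2^i for i = 0..probes, each term capped at RTO_MAX_MS. */
static unsigned long model_elapsed_ms(unsigned int probes, unsigned int base_ms)
{
    unsigned long total = 0;
    unsigned long interval = base_ms;

    for (unsigned int i = 0; i <= probes; i++) {
        total += interval < RTO_MAX_MS ? interval : RTO_MAX_MS;
        if (interval < RTO_MAX_MS)
            interval *= 2;  /* exponential backoff until the cap */
    }
    return total;
}

/* Possible outcomes of one probe-timer expiry in this simplified model. */
enum probe_action { PROBE_IDLE, PROBE_ABORT, PROBE_SEND };

/* Mirrors the excerpt: nothing to do if data is in flight or nothing is
 * queued; abort once the modeled elapsed time reaches a non-zero user
 * timeout; otherwise send another zero-window probe. The kernel code past
 * the quoted point also applies a probe-count limit (max_probes), which this
 * sketch omits. */
static enum probe_action probe_timer_decision(unsigned int packets_out,
                                              bool data_queued,
                                              unsigned int probes_out,
                                              unsigned int base_ms,
                                              unsigned int user_timeout_ms)
{
    if (packets_out || !data_queued)
        return PROBE_IDLE;
    if (user_timeout_ms &&
        model_elapsed_ms(probes_out, base_ms) >= user_timeout_ms)
        return PROBE_ABORT;
    return PROBE_SEND;
}
```

With a 200 ms base, for instance, the model gives 1400 ms after two
unanswered probes, so a 1000 ms user timeout would yield PROBE_ABORT on that
expiry, while a user timeout of 0 never aborts in this model.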


On Sat, Dec 19, 2020 at 2:38 AM William McCall <william.mccall@gmail.com> wrote:

> On Fri, Dec 18, 2020 at 10:33 PM John Scudder
> <jgs=40juniper.net@dmarc.ietf.org> wrote:
> >
> > On Dec 18, 2020, at 1:09 PM, Enke Chen <enchen@paloaltonetworks.com> wrote:
> > >
> > > No, I am not assuming that packets are getting somewhere. The
> > > TCP_USER_TIMEOUT would work as long as there is "pending data" (either
> > > unacked, or locally queued). The data can be from the local BGP
> > > Keepalives or the TCP_KEEPALIVE.
> >
> > Apart from the other objections to relying on TCP_USER_TIMEOUT, which I
> > think are sufficient, it’s not clear to me that implementations will
> > provide the desired semantics. RFC 793 seems like it specifies the right
> > semantics (“get this data to the peer within N seconds or close”):
> >
> >         The timeout, if present, permits the caller to set up a timeout
> >         for all data submitted to TCP.  If data is not successfully
> >         delivered to the destination within the timeout period, the TCP
> >         will abort the connection.  The present global default is five
> >         minutes.
> >
> > However the Linux man page documents different semantics:
> >
> >        TCP_USER_TIMEOUT (since Linux 2.6.37)
> >               This option takes an unsigned int as an argument.  When the
> >               value is greater than 0, it specifies the maximum amount of
> >               time in milliseconds that transmitted data may remain
> >               unacknowledged before TCP will forcibly close the
> >               corresponding connection and return ETIMEDOUT to the
> >               application.  If the option value is specified as 0, TCP will
> >               use the system default.
> >
> > The important difference is that whereas 793 implies data written to
> > the socket, the Linux man page says “transmitted” data, which seems like it
> > must mean data TCP has written to the network. These are two very different
> > things! If Linux (or another stack) implements what the man page seems to
> > say, it’s not useful for our purposes.
> >
> > —John
>
> I was curious too. I read the manpage, relevant linux kernel code, the
> RFC, and hacked up a test case (unicast me if you want the code).
> Also, Cloudflare published a relevant blog entry[0]. For this specific
> scenario, see under the sub-heading "Zero window ESTAB is...
> forever?".
>
> TCP_USER_TIMEOUT doesn't appear to kick in until there is unACKed
> data, meaning that it has already been transmitted from TCP's
> perspective. Stuff hanging around in the buffers due to persist state
> doesn't seem to count, per the test results and the docs. I think this
> confirms what you concluded from your reading.
>
> [0]
> https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
>
> --
> William McCall
>
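John's distinction between the two readings of the timeout, and William's
test result, can be stated as two predicates. The sketch below is a
simplified model, not code from any TCP stack: rfc793_should_abort() starts
the clock when the application hands data to TCP, while linux_should_abort()
starts it only once data has been transmitted and remains unacknowledged, as
the man page wording describes. The struct and function names are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical snapshot of a sending TCP's state; times in milliseconds. */
struct send_state {
    unsigned long now_ms;
    bool has_undelivered;            /* app data not yet ACKed (queued or sent) */
    unsigned long undelivered_since; /* when the oldest such data was written */
    bool has_unacked;                /* data actually transmitted, not yet ACKed */
    unsigned long unacked_since;     /* when the oldest such segment was sent */
};

/* RFC 793 wording: abort if data submitted to TCP is not delivered in time. */
static bool rfc793_should_abort(const struct send_state *s,
                                unsigned long timeout_ms)
{
    return s->has_undelivered &&
           s->now_ms - s->undelivered_since >= timeout_ms;
}

/* Linux man page wording: abort only if *transmitted* data stays
 * unacknowledged; data merely queued behind a zero window never counts. */
static bool linux_should_abort(const struct send_state *s,
                               unsigned long timeout_ms)
{
    return s->has_unacked && s->now_ms - s->unacked_since >= timeout_ms;
}
```

For a zero-window peer, BGP KEEPALIVEs written at time 0 stay queued rather
than counting as transmitted data, so with RFC 793's five-minute default
(300000 ms), rfc793_should_abort() is true at 600000 ms while
linux_should_abort() stays false, matching what William reports for the
persist state.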