Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Enke Chen <enchen@paloaltonetworks.com> Sat, 12 December 2020 18:52 UTC

References: <CANJ8pZ_4OasVWQ+Z7UddOXF85RgMOQGbZni9Zpivy-wa0AXj3Q@mail.gmail.com> <BYAPR11MB32077A4C76A2D5B569837698C0C90@BYAPR11MB3207.namprd11.prod.outlook.com>
In-Reply-To: <BYAPR11MB32077A4C76A2D5B569837698C0C90@BYAPR11MB3207.namprd11.prod.outlook.com>
From: Enke Chen <enchen@paloaltonetworks.com>
Date: Sat, 12 Dec 2020 10:52:00 -0800
Message-ID: <CANJ8pZ_63ar_G9fzgMG4jacbVF24h-oEmdqMSaZOiT+G7-qp0g@mail.gmail.com>
To: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
Cc: "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/O5Ni5SSDaey9Phzljbflit_zHlo>

Hi, Jakob:

It's about recovery from corner cases that occur more rarely than other
"holdtime expiration" events.

Keep it simple and robust.

Thanks.   -- Enke

On Fri, Dec 11, 2020 at 11:15 PM Jakob Heitz (jheitz) <jheitz@cisco.com>
wrote:

> Kill and restart all from scratch with BGP is a scary prospect,
> especially if it's just going to repeat.
>
> How about we graceful restart it.
> Require GR to be in effect on the session.
> Then either RST or silently drop the session and start a new one.
> The receiver should accept the new session and do GR.
> And, yeah, use a long timer. Don't restart again until at least the EOR
> is both sent and received and only then start the boom timer again.
>
> Regards,
> Jakob.
>
> *From:* Idr <idr-bounces@ietf.org> *On Behalf Of * Enke Chen
> *Sent:* Friday, December 11, 2020 10:15 PM
> *To:* idr@ietf.org
> *Cc:* Enke Chen <enchen@paloaltonetworks.com>
> *Subject:* Re: [Idr] TCP & BGP: Some don't send terminate BGP when
> holdtimer expired, because TCP recv window is 0
>
>
> Hi, Folks:
>
>
>
> There is an interesting article titled "When TCP Sockets Refuse to Die":
>
>
>       https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
>
>
>
> which recommends using the TCP keepalive option and the TCP_USER_TIMEOUT
> option together to deal with several TCP "stuck" scenarios.
>
>
>
> To strike a balance between maintaining routing stability and working
> around these corner cases, how about we recommend using these two TCP
> options with a timeout value larger than the BGP per-session holdtimer
> (e.g., 2 * bgp_holdtimer)?
>
>
>
> Thanks.   -- Enke
>
> ------
> Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired,
> because TCP recv window is 0
>
> "Jakob Heitz (jheitz)" <jheitz@cisco.com> Sat, 12 December 2020 03:29 UTC
>
> Good point Keyur.
>
> A receiver may be overwhelmed for a long time and not open its TCP window,
> whether to avoid silly window syndrome or for some other reason. The receiver
> may still be functional and able to clear its backlog, albeit over a long
> time. Resetting such a session will only make the situation worse. Telling
> the difference between this case and a receiver stuck in a bug is difficult.
>
>
>
> Regards,
>
> Jakob.
>
>
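
As a rough sketch of the suggestion quoted above (TCP keepalive plus
TCP_USER_TIMEOUT, sized at about 2 * bgp_holdtimer), the Python below shows
one way the values might be derived and set on a Linux socket. The function
names, the three-probe count, and the way the budget is split between idle
time and probes are illustrative assumptions, not something specified in this
thread. Making TCP_USER_TIMEOUT equal to idle + interval * count is one
reading of the approach the cited article describes.

```python
import socket
from typing import NamedTuple


class TcpLivenessParams(NamedTuple):
    keepidle_s: int       # idle seconds before the first keepalive probe
    keepintvl_s: int      # seconds between keepalive probes
    keepcnt: int          # unanswered probes before the connection is dropped
    user_timeout_ms: int  # TCP_USER_TIMEOUT value, in milliseconds


def liveness_params(bgp_holdtime_s: int, multiplier: int = 2,
                    probes: int = 3) -> TcpLivenessParams:
    """Derive TCP settings whose total budget is about multiplier * holdtime.

    Hypothetical helper: half of the budget is spent idle, the rest is
    divided evenly across the probes (an assumed split).
    """
    if bgp_holdtime_s < 3:
        # RFC 4271 allows a Hold Time of 0 (no hold timer) or >= 3 seconds;
        # with 0 there is no hold timer to scale from.
        raise ValueError("hold time must be at least 3 seconds")
    budget_s = multiplier * bgp_holdtime_s
    keepidle_s = budget_s // 2
    keepintvl_s = max(1, (budget_s - keepidle_s) // probes)
    # Keep the user timeout consistent with the keepalive schedule.
    user_timeout_ms = (keepidle_s + keepintvl_s * probes) * 1000
    return TcpLivenessParams(keepidle_s, keepintvl_s, probes, user_timeout_ms)


def apply_liveness(sock: socket.socket, params: TcpLivenessParams) -> None:
    """Enable keepalive and, where the platform exposes them, the TCP options."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names are Linux-oriented; skip any that this platform's
    # socket module does not define.
    for name, value in (
        ("TCP_KEEPIDLE", params.keepidle_s),
        ("TCP_KEEPINTVL", params.keepintvl_s),
        ("TCP_KEEPCNT", params.keepcnt),
        ("TCP_USER_TIMEOUT", params.user_timeout_ms),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


if __name__ == "__main__":
    print(liveness_params(90))
```

With a 90-second hold time this yields a 90-second idle period, three probes
30 seconds apart, and a 180000 ms user timeout, so the TCP layer would give up
only well after the BGP hold timer has had its chance to fire.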