Re: [Idr] TCP & BGP: Some don't send terminate BGP when holdtimer expired, because TCP recv window is 0

Enke Chen <enchen@paloaltonetworks.com> Thu, 21 January 2021 00:29 UTC

From: Enke Chen <enchen@paloaltonetworks.com>
Date: Wed, 20 Jan 2021 16:29:10 -0800
Message-ID: <CANJ8pZ8A521V7fKACGqX5L9Hpv=+JTtcWmDHaOL4wF1MVW35tg@mail.gmail.com>
To: Job Snijders <job@fastly.com>
Cc: John Scudder <jgs@juniper.net>, "idr@ietf. org" <idr@ietf.org>

Hi, Job:

Thanks for bringing up the stuck BGP issue, which led to the discovery of
the Linux kernel bug and to its fix.

Here are my comments regarding your follow-up questions:

On #1: It makes sense to me. The value for TCP_USER_TIMEOUT associated with
the BGP session should be made configurable, though.
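
As a rough illustration of what "configurable" could look like on Linux, here is a minimal sketch. It assumes a per-session setting in milliseconds that falls back to the negotiated hold time; the helper names `user_timeout_ms` and `apply_user_timeout` are hypothetical and not taken from any BGP implementation.

```python
import socket

# Linux value of TCP_USER_TIMEOUT from <linux/tcp.h>; used only if this
# Python build does not export the constant. The option is Linux-specific.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def user_timeout_ms(hold_time_s, configured_ms=None):
    """Choose the timeout: an explicit per-session setting wins,
    otherwise fall back to the hold time (an assumption of this sketch)."""
    if configured_ms is not None:
        return configured_ms
    # A result of 0 leaves the kernel's default behaviour in place.
    return hold_time_s * 1000


def apply_user_timeout(sock, timeout_ms):
    """Set TCP_USER_TIMEOUT (milliseconds) on a TCP socket and
    return the value the kernel reports back."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)
```

With a 90-second hold time and no explicit setting, `user_timeout_ms(90)` yields 90000 ms; an operator-supplied value such as `user_timeout_ms(90, 30000)` overrides it.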

On #2: My recommendation is to use Code 4 (Hold Timer Expired) and to
introduce several subcodes, something like:
            1) Hold-timer expiration - Received.
            2) Hold-timer expiration - Sent.
            3) Hold-timer expiration - transport timeout.
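
To make the suggestion concrete, here is a minimal sketch of how a NOTIFICATION carrying one of these subcodes would be encoded. The message layout (16-octet marker, 2-octet length, type 3, then error code, subcode and data) follows RFC 4271; the subcode values 1 to 3 are only the ones proposed above, not IANA assignments, and the names `PROPOSED_SUBCODES` and `build_notification` are hypothetical.

```python
import struct

BGP_MSG_NOTIFICATION = 3   # message type, RFC 4271
HOLD_TIMER_EXPIRED = 4     # error code, RFC 4271

# Proposed in this message only; NOT assigned by IANA.
PROPOSED_SUBCODES = {
    "received": 1,
    "sent": 2,
    "transport_timeout": 3,
}


def build_notification(code, subcode, data=b""):
    """Encode a BGP NOTIFICATION: 19-octet header, code, subcode, data."""
    length = 19 + 2 + len(data)
    header = b"\xff" * 16 + struct.pack("!HB", length, BGP_MSG_NOTIFICATION)
    return header + struct.pack("!BB", code, subcode) + data
```

For example, `build_notification(HOLD_TIMER_EXPIRED, PROPOSED_SUBCODES["transport_timeout"])` produces a 21-octet message whose last two octets are 4 and 3.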

Regards,   -- Enke

On Wed, Jan 20, 2021 at 11:09 AM Job Snijders <job@fastly.com> wrote:

> Dear Enke, group,
>
> On Wed, Jan 20, 2021 at 10:20:47AM -0800, Enke Chen wrote:
> > Here is an update on the TCP_USER_TIMEOUT option in Linux for the
> > zero-window case:
> >
> > 1) There is a bug in the code, and a fix has been committed to Linux's
> > networking git:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=9d9b1ee0b2d1c9e02b2338c4a4b0a062d2d3edac
> >
> > 2) A patch has been committed to Linux's man-page repo to clarify that
> > the option also covers the case that buffered data remain untransmitted:
> >
> > https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=1942e41202aa5cc39dd8970ab62cd1b288277753
> >
> > Thanks.   -- Enke
>
> Wow, thank you, this is a welcome development!
>
> Two follow-up questions for the group:
>
> 1) Should the *default* BGP HoldTimer value be the input to mechanisms
>    such as the Linux TCP_USER_TIMEOUT option? (a 'Send Hold Timer') Earlier
>    on in the thread Tony Li suggested "For robustness, it makes sense that
>    the transmitter also close the connection."
>    ref: https://mailarchive.ietf.org/arch/msg/idr/CS7VOx42V76RfGLmXhQqM8DyNjk/
>
> 2) Should we request IANA to assign a new "BGP Error Notification Code"
>    so that the reason the remote peer was shut down can be logged
>    correctly on the local side, or also use Code 4 (Hold Timer Expired)
>    for this case?
>
> John Scudder brought up that in the current version of the FSM this
> would be a 'ManualStop', but I'd like to explore the option of an
> automated termination of the connection (until IdleTimer expires).
>
> Kind regards,
>
> Job
>