Re: [Idr] Fwd: New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt

Enke Chen <enchen@paloaltonetworks.com> Thu, 04 August 2022 17:31 UTC

From: Enke Chen <enchen@paloaltonetworks.com>
Date: Thu, 04 Aug 2022 10:30:58 -0700
Message-ID: <CANJ8pZ_OG9MukAKLeCVUv_oWRwBak5-WiE9Yw6iyg-aN+wiQKw@mail.gmail.com>
To: Ben Cox <ben@benjojo.co.uk>
Cc: "idr@ietf. org" <idr@ietf.org>, Enke Chen <enchen@paloaltonetworks.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/wh3dVFGNWHI6RPJc78rFcgmoQIg>

On Thu, Aug 4, 2022 at 7:07 AM Ben Cox <ben@benjojo.co.uk> wrote:
>
> One question about TCP_USER_TIMEOUT (I've yet to test it): if we
> want to emit syslog/SNMP messages when it triggers, is there a way to
> learn from the kernel that the reason for a socket closing was
> TCP_USER_TIMEOUT and not something else?

As documented in the man page, the kernel would return the error code
ETIMEDOUT to the application.

To retrieve an error on the socket (associated with the BGP session),
just follow the standard procedures for socket APIs, such as read(),
write(), select(), poll(), epoll().
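
For illustration, a minimal Python sketch of such a check. It is not
taken from any BGP implementation, and the helper names describe_errno,
pending_error, and reason_from_exception are hypothetical. It reads the
pending error with getsockopt(SO_ERROR), or takes the errno from the
OSError raised by a failed send/recv, and maps ETIMEDOUT to a reason
that could go into a syslog message. One caveat: Linux also reports
ETIMEDOUT when ordinary retransmission gives up, so the code by itself
shows that the transport timed out, not which timer caused it.

```python
import errno
import os
import socket


def describe_errno(err: int) -> str:
    """Map an errno value to a short reason suitable for a log message."""
    if err == 0:
        return "no pending error"
    if err == errno.ETIMEDOUT:
        # tcp(7) documents ETIMEDOUT for TCP_USER_TIMEOUT expiry;
        # other TCP timeouts report the same code.
        return "ETIMEDOUT: transport timed out (data unacknowledged or untransmitted)"
    name = errno.errorcode.get(err, str(err))
    return f"{name}: {os.strerror(err)}"


def pending_error(sock: socket.socket) -> str:
    """Read (and clear) the socket's pending error via SO_ERROR."""
    return describe_errno(sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR))


def reason_from_exception(exc: OSError) -> str:
    """Classify the OSError raised by a failed send()/recv() on the socket."""
    return describe_errno(exc.errno or 0)
```

A caller would typically invoke pending_error() after poll()/epoll()
flags an error on the socket, or reason_from_exception() in the except
branch around send()/recv().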

>
> Someone also mentioned in passing that TCP_USER_TIMEOUT in Linux was
> buggy/broken until relatively recently, assuming that is the case,
> that might be a good argument alone against going that route since
> a number of vendors using older kernels could risk someone thinking they
> have implemented the feature but instead their kernel has
> silently/loudly broken it.

Yes, there have been several fixes for TCP_USER_TIMEOUT in Linux in
the last several years. To get these fixes, either upgrade the kernel
or backport the fixes. Otherwise the condition may not be detected in
all cases.

I don't think the existence of bug fixes should diminish
TCP_USER_TIMEOUT as a simple and effective solution to the issue of
"stuck" sessions.

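As a rough sketch of how little code the option needs, assuming Linux
and Python's socket module: the helper name set_user_timeout is
hypothetical, and the fallback value 18 is the Linux definition of
TCP_USER_TIMEOUT, used only if the Python build does not export the
constant. The timeout is given in milliseconds, as in tcp(7).

```python
import socket

# Python exports this constant on Linux builds; 18 is the Linux value.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def set_user_timeout(sock: socket.socket, timeout_ms: int) -> int:
    """Set TCP_USER_TIMEOUT in milliseconds and return the value read back."""
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be non-negative")
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)
```

Which value to pass is a policy choice this sketch does not settle;
tying it to the session's hold time would be one option, but that is an
assumption here rather than something stated in this thread.
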
Thanks. -- Enke

>
> On Wed, Aug 3, 2022 at 8:29 PM Enke Chen <enchen@paloaltonetworks.com> wrote:
> >
> > > if this is just queued keepalives (no BGP churn) then depending on the buffer size it may take ages to fill ..
> >
> > Indeed, in that case the local BGP speaker may not know about the
> > condition (mis-behaving remote speaker / 0-window) for a long time,
> > and that's not covered by the current draft, nor the FRR BGP patch
> > based on my reading.
> >
> > To make it complete, I believe that the detection and handling of a
> > stuck peer (i.e., "data not being transmitted") should be at the
> > transport layer.  The TCP_USER_TIMEOUT option is readily available for
> > that purpose:
> >
> > https://man7.org/linux/man-pages/man7/tcp.7.html
> >
> > TCP_USER_TIMEOUT (since Linux 2.6.37)
> >               This option takes an unsigned int as an argument.  When
> >               the value is greater than 0, it specifies the maximum
> >               amount of time in milliseconds that transmitted data may
> >               remain unacknowledged, or buffered data may remain
> >               untransmitted (due to zero window size) before TCP will
> >               forcibly close the corresponding connection and return
> >               ETIMEDOUT to the application.  If the option value is
> >               specified as 0, TCP will use the system default.
> >
> > Thanks.  -- Enke
> >
> > On Wed, Aug 3, 2022 at 2:43 AM Robert Raszuk <robert@raszuk.net> wrote:
> > >>
> > >> Even if the remote (stuck) peer does not process the FIN, the local
> > >> end will close the session and stop forwarding traffic that direction.
> > >> That is an improvement, from an operator's PoV, both for forwarding
> > >> errors and local resources.
> > >
> > >
> > > Would not the same happen if you send keepalives *ONLY* upon receiving them from a peer?
> > >
> > > If both sides support it there should be no issue of stuck sessions. Also no new timer is needed, and there could be a knob to enable/disable it.
> > >
> > > This draft assumes that one peer is bad and the other should time out when it cannot send. So I am asking: what exactly is the trigger to fire that timer when BGP cannot write to a TCP socket? Is it Error 105: No buffer space available? Something else?
> > >
> > > If this is just queued keepalives (no BGP churn) then depending on the buffer size it may take ages to fill ...
> > >
> > > Best,
> > > R.
> > >
> > > _______________________________________________
> > > Idr mailing list
> > > Idr@ietf.org
> > > https://www.ietf.org/mailman/listinfo/idr