Re: [Idr] New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt

Robert Raszuk <robert@raszuk.net> Fri, 19 August 2022 12:10 UTC

From: Robert Raszuk <robert@raszuk.net>
Date: Fri, 19 Aug 2022 14:09:53 +0200
Message-ID: <CAOj+MMHW7TDZgvPOvWEC7Wyurjb2wQSuYXs6Q1oJGvz+ETJDUw@mail.gmail.com>
To: Job Snijders <job=40fastly.com@dmarc.ietf.org>
Cc: Jeffrey Haas <jhaas@pfrc.org>, "idr@ietf.org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/8dxFnVbi2cYEsiFtOsMI8nZPbf0>
Subject: Re: [Idr] New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt

Hi Job,

> I believe the suggested FSM update and the three known implementations
> are in line with each other.

Motivated by your bringing up the implementation topic, I took a look at the
FRR diff David committed:

https://github.com/FRRouting/frr/commit/bd9fb6f368049bd5f1f6a2b7bc97fbd51c9300cc

The first problem is that it does not have much to do with either this
discussion or your draft.

What is implemented is a timeout that starts from the moment the buffer is
full, not from when the peer stopped receiving any TCP data from us (and
stopped sending ACKs for that data).
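To make the distinction concrete, here is a minimal Python sketch of a send-hold check keyed to peer progress rather than to the moment the buffer fills. It is not the FRR code or the draft's text: the class and method names are hypothetical, and it assumes the implementation can observe when the peer ACKs (drains) some of our queued data.

```python
from dataclasses import dataclass


@dataclass
class SendHoldTimer:
    """Hypothetical progress-based send-hold check.

    Expires only when data is queued for the peer AND the peer has made
    no progress (acknowledged nothing) for send_hold_time seconds.
    """
    send_hold_time: float
    last_progress: float = 0.0

    def on_progress(self, now: float) -> None:
        # Assumption: called whenever the peer ACKs some of our data,
        # i.e. whenever our socket send queue shrinks.
        self.last_progress = now

    def expired(self, now: float, bytes_queued: int) -> bool:
        # Nothing pending for the peer means nothing can be stuck.
        if bytes_queued == 0:
            return False
        return now - self.last_progress >= self.send_hold_time


timer = SendHoldTimer(send_hold_time=180.0)
timer.on_progress(100.0)
print(timer.expired(250.0, bytes_queued=4096))  # only 150 s without progress
print(timer.expired(300.0, bytes_queued=4096))  # 200 s without progress
```

With this model a slowly draining buffer keeps restarting the timer; only a peer that has acknowledged nothing for the whole interval trips it.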

As I have already shown, filling the buffer with KEEPALIVEs alone may take
hours.
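A rough back-of-the-envelope sketch of why this takes hours. It assumes a 64 KiB send buffer (a hypothetical figure; real socket buffers vary and are often autotuned), 19-octet KEEPALIVE messages (a bare BGP header per RFC 4271) and one KEEPALIVE every Hold Time / 3, and it ignores TCP/IP overhead and whatever the peer's receive window absorbs.

```python
# A KEEPALIVE is just the 19-octet BGP message header (RFC 4271).
KEEPALIVE_OCTETS = 19


def hours_to_fill(buffer_octets: int, hold_time_s: float) -> float:
    """Approximate hours of KEEPALIVE-only traffic needed to fill a buffer,
    assuming one KEEPALIVE every hold_time_s / 3 seconds."""
    keepalive_interval_s = hold_time_s / 3
    keepalives = buffer_octets // KEEPALIVE_OCTETS
    return keepalives * keepalive_interval_s / 3600


BUFFER_OCTETS = 64 * 1024  # hypothetical send-buffer size

for hold in (9, 90):
    print(f"hold time {hold:>2} s -> ~{hours_to_fill(BUFFER_OCTETS, hold):.1f} h")
```

Under these assumptions it comes to about 2.9 hours at a 9 s hold time and about 28.7 hours at 90 s.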

Moreover, when the buffer is actually being drained (the peer is accepting
KEEPALIVEs) but we have a large UPDATE that for a long time still does not
fit in the remaining buffer space, the session will be dropped
unnecessarily. That is actually quite harmful.
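The scenario above can be simulated with a short trace. This Python sketch is illustrative only: the trace, the 4096-octet UPDATE and the function name are made up, and growth in free buffer space stands in for the peer acknowledging data. It reports both how long the pending UPDATE has failed to fit (the buffer-full view) and how long the peer has made no progress.

```python
from typing import List, Optional, Tuple


def stall_times(samples: List[Tuple[float, int]],
                update_octets: int) -> Tuple[float, float]:
    """Given (time_s, free_octets) samples of our send buffer and a pending
    UPDATE of update_octets, return (blocked_for, no_progress_for) at the
    last sample:
      blocked_for     - how long the UPDATE has continuously not fit
      no_progress_for - how long free space has not grown (proxy for ACKs)
    """
    blocked_since: Optional[float] = None
    last_progress = samples[0][0]
    prev_free = samples[0][1]
    for t, free in samples:
        if free < update_octets:
            if blocked_since is None:
                blocked_since = t
        else:
            blocked_since = None
        if free > prev_free:
            last_progress = t  # the peer drained something
        prev_free = free
    now = samples[-1][0]
    blocked_for = 0.0 if blocked_since is None else now - blocked_since
    return blocked_for, now - last_progress


# Hypothetical trace: 100 octets free at t=0; the peer frees a net 500 octets
# every 10 s (after our KEEPALIVEs), while a 4096-octet UPDATE waits.
trace = [(float(t), 100 + 50 * t) for t in range(0, 61, 10)]
print(stall_times(trace, update_octets=4096))
```

On this trace the UPDATE has been blocked for 60 seconds even though the peer drained data at every sample, so a timer armed on "buffer full" with a 2 * 9 s budget would already have torn the session down.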

A timeout of 2 * holdtime is also far too aggressive: some operators set
the hold time to 9 seconds when BFD is not running.
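For reference, the arithmetic behind that concern as a trivial Python sketch (the helper name is hypothetical; 90 seconds is the HoldTime value suggested in RFC 4271):

```python
def send_hold_timeout_s(hold_time_s: float, multiplier: float = 2.0) -> float:
    """Send-hold timeout derived as multiplier * negotiated hold time."""
    return multiplier * hold_time_s


for hold in (9, 90):
    print(f"hold time {hold:>2} s -> send-hold timeout "
          f"{send_hold_timeout_s(hold):.0f} s")
```

A 9 s hold time, chosen for fast liveness detection without BFD, leaves only 18 s before the session is torn down for a slow reader.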

Now, if you are saying that the other implementations are "in line with
each other", then that is worrying.

Thx,
R.


On Fri, Aug 19, 2022 at 11:24 AM Job Snijders <job=40fastly.com@dmarc.ietf.org> wrote:

> Hi all,
>
> I’ve added some text about where to restart the Send Hold Timer:
>
> https://www.ietf.org/rfcdiff?url2=draft-spaghetti-idr-bgp-sendholdtimer-08.txt
>
> I believe the suggested FSM update and the three known implementations are
> in line with each other.
>
> Further tuning (such as the notion of asynchronously started timers) can
> happen as a Working Group document, including writing out a full copy of
> the FSM description in an Appendix (as the “update this update that add
> this” style of writing quickly becomes unwieldy).
>
> Kind regards,
>
> Job
>
> On Wed, 17 Aug 2022 at 15:26, Jeffrey Haas <jhaas@pfrc.org> wrote:
>
>>
>>
>> > On Aug 17, 2022, at 6:48 AM, tom petch <ietfc@btconnect.com> wrote:
>> >
>> > From: Idr <idr-bounces@ietf.org> on behalf of Robert Raszuk <robert@raszuk.net>
>> > Sent: 15 August 2022 22:18
>> >
>> > https://mailarchive.ietf.org/arch/msg/idr/McRvkJ6UiNwJSKvGs0GPaqDfovA/
>> >
>> > #1 - IMO it would be a pretty bad idea to apply Send_Hold_Timer to a
>> booting node. If at all this timer should fire only after receiving the
>> <EOR> marker on the session.
>> >
>> > #2 - Authors of this draft target cases where a peer is stuck for
>> "days/weeks" ... I have yet to see a BGP node take that long to boot.
>> >
>> > <tp>
>> > At a slight tangent, the I-D fails to state when the timer should be
>> running (that is, in which FSM states), which leaves much to the imagination.
>> Since this is a function of TCP and not BGP, it could be applicable as
>> soon as there is a TCP connection and so apply to most states.
>>
>> This detail is one of the things that has me uneasy about blanketly
>> applying the example feature Enke Chen highlighted from the Linux option.
>>
>> The challenge being faced is when BGP decides that the remote side is
>> "taking too long". Initial startup scenarios involving the firehose of a RIB
>> exchange are one of the places where a little forgiveness in timers may be
>> reasonable.  This is especially true if you're looking at some sort of
>> larger outage where every single device in a "region" may be trying to
>> restart simultaneously.
>>
>> Relevant to your point, Tom, figuring out the exact touch point in the
>> FSM to hook this is challenging enough.  If we permit an implementation to
>> turn it on "later" at a point subjective to the implementation, the FSM
>> isn't exactly good about that right now.  It'd become an asynchronously
>> started timer.
>>
>> -- Jeff (still cursing Alex for the FSM)
>> _______________________________________________
>> Idr mailing list
>> Idr@ietf.org
>> https://www.ietf.org/mailman/listinfo/idr
>>
>