Re: [Idr] New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt

Robert Raszuk <robert@raszuk.net> Wed, 17 August 2022 14:50 UTC

From: Robert Raszuk <robert@raszuk.net>
Date: Wed, 17 Aug 2022 16:50:06 +0200
Message-ID: <CAOj+MMFAjWsG_8TVjFtAhHROxYwjcJHDkEPF3y8bKhKRVU9kgw@mail.gmail.com>
To: Jeffrey Haas <jhaas@pfrc.org>
Cc: tom petch <ietfc@btconnect.com>, "idr@ietf. org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/Gcv1hys_MDr2sb_YZKNeXyYoAWo>
Subject: Re: [Idr] New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt

Hi Jeff,

I agree with you that, if we look at the BGP level, it does make sense to
consider starting such a timer after the initial dump (as I mentioned in the
other note, after receiving the EOR marker).

But aren't you a bit worried that BGP would now need to handle lots of TCP
internal events (errors) on a per-peer basis? Assume we are talking about a
scale of 5K+ peers here.

Take your favourite code base - what events/errors would you expect to get
from TCP in order to start the timer? And how would you know that the peer
has perhaps drained just a bit from the TCP buffer (say, a 19-octet
Keepalive), which should effectively stop/reset the timer?
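
To make the question concrete, here is a rough sketch of the per-peer
bookkeeping this would imply. It is not taken from any real code base. It
assumes BGP can periodically sample two numbers per session: the total
octets it has handed to TCP, and the octets still sitting in the send queue
(on Linux the latter could come from an ioctl such as SIOCOUTQ; other stacks
may expose nothing comparable). The names SendHoldTracker, observe and
hold_time are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SendHoldTracker:
    """Hypothetical per-peer state for a send hold timer."""

    hold_time: float                      # seconds without progress before giving up
    delivered: int = 0                    # most octets ever seen to have left the send queue
    stalled_since: Optional[float] = None  # start of the current stall, None if timer idle

    def observe(self, now: float, total_written: int, queued: int) -> bool:
        """Feed one sample; return True once the timer has expired."""
        left_queue = total_written - queued
        if left_queue > self.delivered:
            # Any progress, even a 19-octet KEEPALIVE, resets the timer.
            self.delivered = left_queue
            self.stalled_since = None
        if queued == 0:
            # Nothing pending toward the peer, so the timer does not run.
            self.stalled_since = None
            return False
        if self.stalled_since is None:
            self.stalled_since = now
        return now - self.stalled_since >= self.hold_time
```

Note that this has to be sampled for every peer, and that the sampling itself
depends on what the local TCP stack is willing to report.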

Now taking it a bit further - consider that your BGP code has to run on a few
different operating systems, each with a different TCP implementation. The
real fun starts when you have to adapt your BGP code to each stack's custom
TCP errors/events, optimistically assuming that all of them would be capable
of acting properly in this regard.

So with that, detecting sessions that have really been stuck for a reasonably
long time at the TCP level seems, IMHO, a much better option.
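
As an illustration of what "at the TCP level" could look like, Linux offers
the TCP_USER_TIMEOUT socket option (RFC 5482), which makes the kernel close
the connection when transmitted data stays unacknowledged, or buffered data
stays untransmitted because of a zero window, for longer than the configured
time. The sketch below is only an assumption about how one might use it: the
function name and the timeout value are hypothetical, and on platforms
without the option it simply reports that nothing was armed.

```python
import socket


def arm_stuck_session_timeout(sock: socket.socket, seconds: int) -> bool:
    """Ask the kernel to abort the connection after `seconds` without progress.

    Returns False when the platform does not offer TCP_USER_TIMEOUT.
    """
    option = getattr(socket, "TCP_USER_TIMEOUT", None)  # present on Linux builds
    if option is None:
        return False
    try:
        # The option value is expressed in milliseconds.
        sock.setsockopt(socket.IPPROTO_TCP, option, seconds * 1000)
    except OSError:
        return False
    return True
```

A BGP speaker could call this once per session socket, for example
arm_stuck_session_timeout(sock, 240), and then only has to handle the
resulting connection error instead of tracking TCP state itself.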

Thx,
Robert


On Wed, Aug 17, 2022 at 3:25 PM Jeffrey Haas <jhaas@pfrc.org> wrote:

>
>
> > On Aug 17, 2022, at 6:48 AM, tom petch <ietfc@btconnect.com> wrote:
> >
> > From: Idr <idr-bounces@ietf.org> on behalf of Robert Raszuk <
> robert@raszuk.net>
> > Sent: 15 August 2022 22:18
> >
> > https://mailarchive.ietf.org/arch/msg/idr/McRvkJ6UiNwJSKvGs0GPaqDfovA/
> >
> > #1 - IMO it would be a pretty bad idea to apply Send_Hold_Timer to a
> booting node. If at all this timer should fire only after receiving the
> <EOR> marker on the session.
> >
> > #2 - Authors of this draft target cases where a peer is stuck for
> "days/weeks" ... I am yet to see a BGP node taking that much to boot.
> >
> > <tp>
> > At a slight tangent, the I-D fails to state when the timer should be
> running, in which of the FSM states, which leaves much to the imagination.
> Since this is a function of TCP and not BGP, then it could be applicable as
> soon as there is a TCP connection and so apply to most state.
>
> This detail is one of the things that has me uneasy about blanketly
> applying the example feature Enke Chen highlighted from the Linux option.
>
> The challenge being faced is when BGP decides that the remote side is
> "taking too long".  Initial startup scenarios where the firehose of a RIB
> exchange are one of the places where a little forgiveness in timers may be
> reasonable.  This is especially true if you're looking at some sort of
> larger outage where every single device in a "region" may be trying to
> restart simultaneously.
>
> Relevant to your point, Tom, figuring out the exact touch point in the FSM
> to hook this is challenging enough.  If we permit an implementation to turn
> it on "later" at a point subjective to the implementation, the FSM isn't
> exactly good about that right now.  It'd become an asynchronously started
> timer.
>
> -- Jeff (still cursing Alex for the FSM)