Re: [Idr] Fwd: New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt

Robert Raszuk <robert@raszuk.net> Sun, 31 July 2022 20:06 UTC

From: Robert Raszuk <robert@raszuk.net>
Date: Sun, 31 Jul 2022 22:05:49 +0200
Message-ID: <CAOj+MMGR4f3eLEDZY++1m4Lpo9joG4L9OrWbeF6kREn-9a9onA@mail.gmail.com>
To: Jeff Tantsura <jefftant.ietf@gmail.com>
Cc: Ben Cox <ben@benjojo.co.uk>, Job Snijders <job=40fastly.com@dmarc.ietf.org>, "idr@ietf. org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/urrW4BYmYuVP1lEvMcXx0yHCDHk>

Jeff.

If you are saying that RFC7130 is not yet widely available, that is not a
convincing factor here.

But nevertheless, if you are building a high-speed DC fabric and blackholing
traffic upon last LAG member failure for up to the hold time, then this is
IMO a suboptimal design (to put it very mildly :).

To the other point brought up before: in the IBGP and IXP scenarios, failure
of your sessions to the RR or RS may not mean that the data path is bad. Some
folks proposed BGP persistence ideas to keep routes for hours if not days -
but this is a different aspect.

*Anyhow .. back to the draft:*

Is a peer sending a Receive_Window of 0 really a good trigger to bring the
session down?

Is BGP now expected to deal (and how?) with reception of a zero window from
any peer at the TCP level?
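
For readers following the thread, the kind of check being questioned here
can be sketched roughly as follows. This is a minimal illustration, not the
draft's normative text: the class name, the method names, and the 480-second
value are all assumptions made for the example. The idea is that the local
side remembers when it last managed to hand data to the transport, and treats
a long stall with data still queued as a sign that the peer is not reading.

```python
from dataclasses import dataclass


@dataclass
class SendHoldTimer:
    """Hypothetical send-side stall detector (names are illustrative only)."""

    send_hold_time: float        # seconds; assumed value chosen by the caller
    last_progress: float = 0.0   # time of the last successful write
    pending: bool = False        # True while data is waiting to be sent

    def on_queued(self, now: float) -> None:
        # Start timing only when the queue goes from empty to non-empty.
        if not self.pending:
            self.pending = True
            self.last_progress = now

    def on_bytes_sent(self, now: float, queue_empty: bool) -> None:
        # Any successful write counts as progress and restarts the clock.
        self.last_progress = now
        self.pending = not queue_empty

    def expired(self, now: float) -> bool:
        # Expire only if data is still queued and nothing moved for too long.
        return self.pending and (now - self.last_progress) >= self.send_hold_time


timer = SendHoldTimer(send_hold_time=480.0)
timer.on_queued(now=0.0)
stalled = timer.expired(now=480.0)  # True: nothing was sent for 480 seconds
```

Note that the sketch never looks at the TCP window itself. It only observes
whether writes make progress, which is one way an implementation could avoid
parsing TCP-level state from inside BGP.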

Even if so, why would the absence of received BGP keepalives not bring the
session down on the other side? I guess this is an indication of the remote
side going foo while it still sends us keepalives.

Why don't we modify RFC4271 to only allow sending keepalives when the
remote system is indicating its liveness by sending its own keepalives? That
seems to me a much more proper behaviour than a new timer.
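
The alternative floated above could look something like this. Again, this is
a sketch under stated assumptions: `KeepaliveGate` and its methods are
hypothetical names, the 90-second hold time is just an example value, and
nothing like this rule exists in RFC4271 today.

```python
class KeepaliveGate:
    """Hypothetical rule: send keepalives only while the peer sends its own."""

    def __init__(self, hold_time: float, now: float) -> None:
        self.hold_time = hold_time   # seconds; example value, not from the draft
        self.last_heard = now        # time the peer's last keepalive arrived

    def on_keepalive_received(self, now: float) -> None:
        # The peer showed signs of life; remember when.
        self.last_heard = now

    def may_send_keepalive(self, now: float) -> bool:
        # Stay silent once the peer has been quiet for a full hold time, so
        # the peer's own hold timer can expire and tear the session down.
        return (now - self.last_heard) < self.hold_time


gate = KeepaliveGate(hold_time=90.0, now=0.0)
allowed_early = gate.may_send_keepalive(now=30.0)   # peer heard recently
allowed_late = gate.may_send_keepalive(now=90.0)    # peer silent for 90 s
```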

More ...

If the RCV window is 0, how can TCP send anything to the peer, including a
TCP RST?

If the peer is saturated, what good is it to assign a new IANA error
notification code if it cannot be sent to the peer?

Section 2 provides the most interesting comment in the entire draft.
Yet the authors want it removed before publication .. why ???

Last - the article https://labs.ripe.net/author/romain_fontugne/bgp-zombies/
does not draw a clear conclusion on the nature of stuck routes, so I am not
sure if this is also the right reference for the draft.

Best,
R.


On Sun, Jul 31, 2022 at 1:45 AM Jeff Tantsura <jefftant.ietf@gmail.com>
wrote:

> Job/Ben,
>
> I support advancement of the draft; perhaps clarifications wrt BFD would
> be in order.
>
> Robert - BGP at TCP level and BFD are mostly orthogonal and don’t share the
> same path or achieve the same goals.
> In many DC BGP networks, LAG is a primary connectivity model (more
> configuration than an actual connectivity artifact). Running BFD over LAG
> has a number of serious implications and is not supported by most open
> source implementations.
>
> Cheers,
> Jeff
>
> On Jul 30, 2022, at 15:50, Robert Raszuk <robert@raszuk.net> wrote:
>
> 
> Hi Ben,
>
> Indeed as I mentioned this draft fixes some of the original RFC4271 gaps.
>
> But I am still not sure if this is the right way to fix those gaps in
> 2022.
>
> For IXPs I am not sure if reacting fast on any BGP issue to RS is really a
> good thing. BFD could be used there peer to peer with RS assistance. See:
> https://datatracker.ietf.org/doc/html/draft-ietf-idr-rs-bfd-07
>
> For iBGP sure you would need multihop BFD with all of its pros and cons.
>
> So I am not against this proposal to move fwd ... as long as it actually
> discusses BFD alternatives to accomplish the same**.
>
> **And sure, as BFD is tied to BGP, the protocol should, in the event of
> noticing a malfunction, signal locally to the data plane to bring down
> the session. Some vendors assure us this is already happening, some are silent
> ... but bad implementations should not be an excuse for more workarounds
> (at least for those who want to bring such sessions down fast - which btw
> for IBGP may or may not be a good thing).
>
> Many thx,
> R.
>
> On Sun, Jul 31, 2022 at 12:38 AM Ben Cox <ben@benjojo.co.uk> wrote:
>
>> Robert,
>>
>> I'm not sure Bidirectional Forwarding Detection and BGP hold timers
>> are solving the same problems here.
>>
>> On a lot of platforms the BGP hold timer is managed by a control
>> plane, while it's not uncommon for the BFD session (if enabled) to be
>> managed by a separate data plane. I am under the impression that there
>> are a sizable number of deployments and peerings not running BFD, with
>> either eBGP sessions (I don't know of people doing BFD with IXP route
>> servers, for example) or iBGP sessions (where there are arguments about
>> best practice on whether BFD is a good or bad thing to enable inside the
>> core).
>>
>> The core part here is that BGP's own keepalive mechanism has a
>> documented flaw that needs to be fixed as it's suspected to be the
>> cause of a handful of issues in the wild.
>>
>> BFD would also not solve one of the issues this Internet-Draft is
>> targeting (one-way congestion of a TCP queue of a signalling
>> protocol). It's easy to imagine a situation where BFD keeps being
>> exchanged, but due to some other fault BGP messages stop being read.
>>
>> This internet draft is designed to try and target those faults, and is
>> not targeting rapid detection of link breaks.
>>
>> Does that help clear up our intentions?
>> Ben Cartwright-Cox
>>
>>
>>
>> On Sat, Jul 30, 2022 at 11:17 PM Robert Raszuk <robert@raszuk.net> wrote:
>> >
>> > Hi Job,
>> >
>> > In my books we should really discourage people from using BGP
>> > keepalives and move them over to BFD protocol to determine liveness of BGP
>> > sessions.
>> >
>> > While this draft perhaps does improve RFC4271 I am not sure if we are
>> > moving in the right direction.
>> >
>> > The draft does not even mention BFD once which is disappointing.
>> >
>> > Kind regards,
>> > Robert
>> >
>> >
>> >
>> > On Sat, Jul 30, 2022 at 7:23 PM Job Snijders <job=40fastly.com@dmarc.ietf.org> wrote:
>> >>
>> >> Dear IDR,
>> >>
>> >> I’d like to bring this draft to the working group for another round of
>> >> consideration for WG adoption.
>> >>
>> >> There now are two implementations which have implemented the concept
>> >> (OpenBGPd and FRR) in releases shipping to customers.
>> >>
>> >> Our hope is that more BGP implementers take an interest to help
>> >> improve global Internet routing system stability.
>> >>
>> >> We welcome interested parties to help co-author the document,
>> >> specifically in these areas:
>> >>
>> >> - is the 4271 surgery correct?
>> >>
>> >> - graceful restart considerations?
>> >>
>> >> - do chassis/COTS router vendors feel comfortable with the suggested
>> >> timers, or is more leeway needed?
>> >>
>> >> The goal is to bring the fault stale state back from days/weeks to a
>> >> few minutes (not to race to the bottom and seek sub-minute resolution).
>> >>
>> >> We’d like to ask the IDR chairs to consider kicking off the WG
>> >> adoption process.
>> >>
>> >> Kind regards,
>> >>
>> >> Job
>> >>
>> >> ---------- Forwarded message ---------
>> >> From: <internet-drafts@ietf.org>
>> >> Date: Sat, 30 Jul 2022 at 13:06
>> >> Subject: New Version Notification for draft-spaghetti-idr-bgp-sendholdtimer-05.txt
>> >> To: Ben Cartwright-Cox <ben@benjojo.co.uk>, Job Snijders <job@fastly.com>
>> >>
>> >>
>> >>
>> >> A new version of I-D, draft-spaghetti-idr-bgp-sendholdtimer-05.txt
>> >> has been successfully submitted by Ben Cartwright-Cox and posted to the
>> >> IETF repository.
>> >>
>> >> Name:           draft-spaghetti-idr-bgp-sendholdtimer
>> >> Revision:       05
>> >> Title:          Border Gateway Protocol 4 (BGP-4) Send Hold Timer
>> >> Document date:  2022-07-30
>> >> Group:          Individual Submission
>> >> Pages:          7
>> >> URL:            https://www.ietf.org/archive/id/draft-spaghetti-idr-bgp-sendholdtimer-05.txt
>> >> Status:         https://datatracker.ietf.org/doc/draft-spaghetti-idr-bgp-sendholdtimer/
>> >> Htmlized:       https://datatracker.ietf.org/doc/html/draft-spaghetti-idr-bgp-sendholdtimer
>> >> Diff:           https://www.ietf.org/rfcdiff?url2=draft-spaghetti-idr-bgp-sendholdtimer-05
>> >>
>> >> Abstract:
>> >>    This document defines the SendHoldTimer session attribute for the
>> >>    Border Gateway Protocol (BGP) Finite State Machine (FSM).
>> >>    Implementation of a SendHoldTimer should help overcome situations
>> >>    where BGP sessions are not terminated after it has become detectable
>> >>    for the local system that the remote system is not processing BGP
>> >>    messages.  For robustness, this document specifies that the local
>> >>    system should close BGP connections and not solely rely on the remote
>> >>    system for session tear down when BGP timers have expired.  This
>> >>    document updates RFC4271.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> The IETF Secretariat
>> >>
>> >>
>> >> _______________________________________________
>> >> Idr mailing list
>> >> Idr@ietf.org
>> >> https://www.ietf.org/mailman/listinfo/idr
>> >
>>