Re: [Idr] draft-spaghetti-idr-bgp-sendholdtimer - Feedback requested

Robert Raszuk <robert@raszuk.net> Sun, 25 April 2021 11:51 UTC

References: <CAL=9YSVy+mvxvAv+maxkUSzPbe0bfnUy-XJJTtcVhi3S3bm=WQ@mail.gmail.com> <20210423212348.GB19004@pfrc.org> <CAOj+MMGH+y-gxSLaakknWSPFLEk9ikkUU1fa=3H0FjkokAbg3w@mail.gmail.com> <20210424004838.GC19004@pfrc.org> <CAOj+MMH5yzpPZjdUcfXV4cxCORqCsQY4X+niBjnwxjPfN-tsJA@mail.gmail.com> <BYAPR11MB3207E4A0BDC3367E21886C55C0439@BYAPR11MB3207.namprd11.prod.outlook.com>
In-Reply-To: <BYAPR11MB3207E4A0BDC3367E21886C55C0439@BYAPR11MB3207.namprd11.prod.outlook.com>
From: Robert Raszuk <robert@raszuk.net>
Date: Sun, 25 Apr 2021 13:50:56 +0200
Message-ID: <CAOj+MMHwK6Zmh8tGGR_OQPe=frPBsczUsiWkG+NgoZLTtE2ynA@mail.gmail.com>
To: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
Cc: Jeffrey Haas <jhaas@pfrc.org>, "idr@ietf. org" <idr@ietf.org>, Ben Cox <ben=40benjojo.co.uk@dmarc.ietf.org>
Content-Type: multipart/alternative; boundary="000000000000eddaee05c0caa4e5"
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/xq9YZ5gNpslC_fG9a-h9z0Lquno>
Subject: Re: [Idr] draft-spaghetti-idr-bgp-sendholdtimer - Feedback requested

Hey Jakob,

Yes, that was exactly my point.

However, the fundamental idea behind this proposal was to actually reset the
session and trigger withdrawals, on the suspicion that the peer is not
"feeling well".

That is, unfortunately, the problem with the completely different deployment
scenarios for BGP, which the BGP spec has so far tried to treat all in the
same way.

And IMHO the only good way to solve it is to follow what other protocols
did in similar situations and formally define the notion of BGP (deployment)
profiles.

What's good for one profile may be pretty bad for another (an Internet
Tier 1 peer vs. a stub CE, for example). Today such profiles are already in
place, except that each network creates its own flavor of them. Maybe it's
time to simplify BGP deployments and, either in IDR or in GROW, start work on
a few key deployment scenarios, so that vendors could adjust BGP behaviour
with a single per-peer profile configuration?
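
To make the per-peer profile idea a bit more concrete, here is a minimal
sketch in Python. Everything in it is hypothetical: the profile names, the
StallAction values, the retention time and the on_send_stall() helper are
illustrations only, not taken from the draft or from any implementation.
Mapping the stub CE to "retain stale" and the Internet peer to "withdraw" is
also just one possible reading of the examples in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class StallAction(Enum):
    """Hypothetical choices for handling a peer whose send side is stuck."""
    RESET_AND_WITHDRAW = "reset-and-withdraw"
    RESET_AND_RETAIN_STALE = "reset-and-retain-stale"  # GR-style retention


@dataclass(frozen=True)
class PeerProfile:
    name: str
    stall_action: StallAction
    stale_retention_s: int  # illustrative value; 0 means nothing is retained


# Assumed example profiles; real deployments would define their own.
PROFILES: Dict[str, PeerProfile] = {
    "internet-peer": PeerProfile(
        "internet-peer", StallAction.RESET_AND_WITHDRAW, 0),
    "stub-ce": PeerProfile(
        "stub-ce", StallAction.RESET_AND_RETAIN_STALE, 3600),
}


def on_send_stall(profile_name: str, routes: List[str]) -> Dict[str, object]:
    """Return what would happen to the peer's routes under its profile."""
    profile = PROFILES[profile_name]
    if profile.stall_action is StallAction.RESET_AND_WITHDRAW:
        return {"reset": True, "withdrawn": list(routes), "stale": []}
    # Session is still reset, but the received routes are kept as stale.
    return {"reset": True, "withdrawn": [], "stale": list(routes)}
```

With this, a single knob per peer (the profile name) selects the behaviour:
on_send_stall("stub-ce", ["0.0.0.0/0"]) keeps the default route as stale,
while the same call with "internet-peer" withdraws it.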

Thx,
R.


On Sun, Apr 25, 2021 at 8:01 AM Jakob Heitz (jheitz) <jheitz@cisco.com>
wrote:

> A long time of TCP zero window does not indicate a data plane
> problem, nor a problem with routes received from the stuck peer.
> The blockage is in one direction only. The local speaker is unable
> to send routes to the stuck peer, but is able to receive routes
> from the stuck peer just fine.
>
> Therefore, I would propose that the response of the local speaker
> should be to retain the routes of the stuck peer when it resets the
> session, GR style.
>
> Indications of data plane problems or inability to receive routes from
> the stuck peer are separately handled.
>
>
>
> Regards,
>
> Jakob.
>
>
>
> *From:* Idr <idr-bounces@ietf.org> *On Behalf Of * Robert Raszuk
> *Sent:* Saturday, April 24, 2021 2:28 AM
> *To:* Jeffrey Haas <jhaas@pfrc.org>
> *Cc:* idr@ietf. org <idr@ietf.org>; Ben Cox <ben=
> 40benjojo.co.uk@dmarc.ietf.org>
> *Subject:* Re: [Idr] draft-spaghetti-idr-bgp-sendholdtimer - Feedback
> requested
>
>
>
> Hi Jeff,
>
>
>
> > So what we are discussing is breaking data plane just because control
> > plane has experienced 15 min (or worse recommended 4 min) inability to
> > send keepalives.
>
> A good analogy is the negative impacts of stale routes when you use
> Graceful Restart for BGP.  Can you live with the routes in that flavor of
> stale for that long?
>
>
>
> Unfortunately the answer is "it depends". If my stale route is a single
> default to the world with a working data plane, I think the answer is
> clearly YES.
>
>
>
> So that IMHO raises the question of whether this should be the default
> behaviour.
>
>
>
> Then honestly I am not quite clear how it should be handled on sessions
> which are set up with *draft-ietf-idr-long-lived-gr-00*.
>
>
>
> > * Should we perhaps test data plane before declaring peer's failure and
> > before we reset the session? (I understand that the paramount motivation
> > is BGP consistency here though - but this is one of those cases where one
> > size may not fit all).
>
> In many of these scenarios, BFD or ping would show the interface up.  It's
> the TCP session that is stalled out.
>
>
>
> True.
>
>
>
> I was rather thinking about checking the data plane not to the subject peer
> but beyond it (through it). For iBGP, that means testing next hops. For
> eBGP, some known test servers in your network or in the Internet.
>
>
>
>
>
> > * Should we first withdraw received routes from our peers before
> > resetting the session? At least data plane will have a chance to converge
> > to a different set of links with no sudden packet drops.
>
> Would you describe the drain scenario with the involved parties and what
> the congestion state is as part of that?  I don't think I'm understanding
> the above point.
>
>
>
>
>
> If I know I will bring a BGP session to you down (in any case, not only
> with this draft), I should first withdraw, from all of my peers, the paths
> received over that session and previously advertised (as best or as
> add-paths), and only then reset it. So the control plane clears before we
> touch the data plane, which keeps end-to-end reachability with no or
> minimal data plane impact.
>
>
>
>  Thx,
>
> R.
>
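
The "withdraw first, then reset" ordering described in the quoted text above
could be sketched roughly as follows. This is only an illustration under
assumed data structures: drain_then_reset() and the shape of the
"advertised" table (other peer -> list of (prefix, source peer) pairs) are
hypothetical and not part of the draft or of any BGP implementation.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str, Optional[str]]


def drain_then_reset(
    session_peer: str,
    advertised: Dict[str, List[Tuple[str, str]]],
) -> List[Event]:
    """Return the ordered actions for taking down the session to session_peer.

    advertised maps each peer to the paths we sent it, each recorded as
    (prefix, peer the path was learned from).
    """
    events: List[Event] = []
    for peer, paths in advertised.items():
        if peer == session_peer:
            continue  # nothing to withdraw towards the peer being reset
        for prefix, source in paths:
            if source == session_peer:
                # Control plane clears first: tell other peers to stop
                # using paths that depend on the session going away.
                events.append(("withdraw", peer, prefix))
    # Only after all withdrawals are queued is the session itself reset.
    events.append(("reset", session_peer, None))
    return events
```

The point of the sketch is purely the ordering: every withdraw event comes
before the single reset event, so other peers get a chance to converge
before forwarding through the reset session is affected.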