Re: [Idr] BGP Auto-Discovery Protocol State Requirements

Robert Raszuk <robert@raszuk.net> Tue, 23 March 2021 15:19 UTC

References: <20210319135025.GK29692@pfrc.org> <CAOj+MMGndgwqLoV_Un_1Bu3F3xPkg9ZD6=4V5FmYJgQiPD_1yw@mail.gmail.com> <20210319143448.GM29692@pfrc.org> <CAOj+MMFKqpZCyzDbGr0JzZLu7sjEw9NBQ=J9rTqDOuP+Yf1mog@mail.gmail.com> <20210319144657.GO29692@pfrc.org> <CAOj+MME8GB4jo_q3kHm1jx6E60GCHeU-pz0eYy_96BJ+ak7_Bw@mail.gmail.com> <20210319152832.GP29692@pfrc.org> <BYAPR08MB549328E3379E94589DC3CE0885649@BYAPR08MB5493.namprd08.prod.outlook.com> <20210323120515.GA31047@pfrc.org> <CAOj+MMGY+sMHr29Uw4bFct9kxoBnp=fJDULVjvFQL1UxC3JYtQ@mail.gmail.com> <20210323150837.GB31047@pfrc.org>
In-Reply-To: <20210323150837.GB31047@pfrc.org>
From: Robert Raszuk <robert@raszuk.net>
Date: Tue, 23 Mar 2021 16:18:47 +0100
Message-ID: <CAOj+MMES0hiWdVy=B_HnYmobtyOR87LBrnCwEEFcJAGLwud+=Q@mail.gmail.com>
To: Jeffrey Haas <jhaas@pfrc.org>
Cc: "Fomin, Sergey (Nokia - US/Mountain View)" <sergey.fomin@nokia.com>, "idr@ietf.org" <idr@ietf.org>, "Acee Lindem (acee)" <acee=40cisco.com@dmarc.ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/RyAglTL4Mb4yWUZM0EvNxpy_y7Y>
Subject: Re: [Idr] BGP Auto-Discovery Protocol State Requirements

Hi Jeff,

While I tend to agree with some of your comments, I think they are only
valid if we are really talking about "high scales".

"High" in the context of BGP, which to me means at least 1000s.

So tell me (and all of us here): where in the DC fabric do you see this high
scale, especially if we are only talking about the underlay (as the draft
says)?

Moreover, as BGP is trying to do the IGP job here, it may be useful to take
a look at how IGPs deal with establishing adjacencies, and at whether the
problems we worry BGP may fall into would also need to be considered when
planning to use link state in the DC fabric.

That is why, IMHO, unless we clearly state the scope of this work (for
example, limiting it to peers connected on the same L2 link - same L3 subnet
- no over the top transport, directly connected ebgp, ebgp between
loopbacks, no ibgp etc ...), I am afraid we will keep playing a bit of
ping-pong here: each person's view may in fact be correct for the specific
deployment scenario that person has in mind for the new functionality under
discussion, except that those deployment scenarios are very different.

Many thx,
Robert.


On Tue, Mar 23, 2021 at 3:46 PM Jeffrey Haas <jhaas@pfrc.org> wrote:

> Robert,
>
> On Tue, Mar 23, 2021 at 02:23:32PM +0100, Robert Raszuk wrote:
> > > As noted in the thread, the important property a protocol must design
> > > for is how you handle retries.
> >
> > As this point is being brought back again, let's zoom in on it a bit here.
> > Clearly the above implies changes to "handling retries" relative to
> > current RFC4271.
> >
> > So I assume you possess some operational evidence suggesting that if a
> > peer's IP address is learned via auto discovery, we need to apply a
> > different BGP OPEN retry procedure, or at least different timers, than if
> > the same peer's IP address is pushed by cfg from a mgmt station. Can you
> > share this data?
>
> Thanks for this framing, it's exactly the point the WG needs to discuss for
> retries.
>
> If a router configures a hundred BGP sessions, it has a hundred that it
> expects to deal with.  This includes sessions that fail to connect.  But
> the general expectation when you have sessions configured is that you
> expect them to connect.
>
> With a discovery protocol, you have at best a rough bound of the number of
> interfaces that it is running on.  It may be more.  If the peers aren't
> found to be acceptable, they'll continue on indefinitely.
>
> Is that acceptable?  Your argument is roughly that "you asked for it, deal
> with the consequences".
>
> At low scales, it's probably quite fine.
>
> At high scales, with a central listen socket for incoming sessions, you may
> end up exhausting the "backlog" parameter of the socket and result in a
> denial of service of legitimate sessions.
>
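
The backlog concern quoted above can be illustrated with a toy model. This
is only a sketch under stated assumptions: ListenQueue and simulate are
hypothetical names, the queue is a plain in-memory model rather than real
kernel behaviour, and it assumes the worst case where retries from
unacceptable discovered peers arrive before the legitimate connection
attempts within one service interval.

```python
from collections import deque


class ListenQueue:
    """Toy model of a listen socket's pending-connection queue.

    Hypothetical illustration only; real kernels differ in detail.
    """

    def __init__(self, backlog):
        self.backlog = backlog
        self.pending = deque()
        self.dropped = []

    def incoming(self, peer):
        # A connection attempt is dropped once the queue holds `backlog` entries.
        if len(self.pending) >= self.backlog:
            self.dropped.append(peer)
            return False
        self.pending.append(peer)
        return True

    def accept(self, n):
        # The BGP speaker services at most n pending connections.
        count = min(n, len(self.pending))
        return [self.pending.popleft() for _ in range(count)]


def simulate(backlog, unacceptable, legitimate, accepts_per_tick):
    """Return the legitimate peers dropped during one service interval."""
    queue = ListenQueue(backlog)
    for peer in unacceptable:   # retries from peers that will never be accepted
        queue.incoming(peer)
    for peer in legitimate:     # sessions the operator actually wants
        queue.incoming(peer)
    queue.accept(accepts_per_tick)
    return [peer for peer in queue.dropped if peer in legitimate]
```

With few retrying peers nothing is lost; once the number of retrying peers
reaches the backlog, the legitimate attempts in this model are the ones that
get dropped.
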
> > If not, could you please describe how auto discovered peers would need
> > to have a different BGP FSM than, say, those configured with mistakes?
> > To me this difference is not obvious.
>
> The distinction is when you instantiate a session for a discovered peer and
> when you keep it provisioned vs. quiescing or deleting it.  Remember,
> discovery may be a denial of service on the discovering party as well.
>
> If a discovered peering session is unacceptable, why would you keep running
> the BGP state machine over and over if it's not going to improve?
>
> The options we have upon discovering a peer that isn't acceptable are:
> 1. Quiesce the discovered session and require a manual clearing event.
> 2. Keep the instantiated session running even if it never connects.
>    Analogous to explicit configuration of a misconfigured session.
> 3. The discovery protocol itself tells us that the situation has changed.
>    This gives the implementation the option to quiesce without requiring a
>    manual clearing event.
>
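
The three options listed above can be sketched as a per-session policy. This
is a minimal illustration, not anything specified by the draft or RFC4271:
Policy, DiscoveredSession and all method names are hypothetical, and the
"discovery update" event stands in for whatever signal a discovery protocol
would provide.

```python
from enum import Enum, auto


class Policy(Enum):
    QUIESCE_UNTIL_MANUAL_CLEAR = auto()      # option 1
    KEEP_RETRYING = auto()                   # option 2
    QUIESCE_UNTIL_DISCOVERY_CHANGE = auto()  # option 3


class DiscoveredSession:
    """Hypothetical bookkeeping for a session created by discovery."""

    def __init__(self, peer, policy):
        self.peer = peer
        self.policy = policy
        self.quiesced = False
        self.attempts = 0

    def on_peer_unacceptable(self):
        # Called when a connection attempt shows the peer is not acceptable.
        self.attempts += 1
        if self.policy is not Policy.KEEP_RETRYING:
            self.quiesced = True

    def should_retry(self):
        return not self.quiesced

    def on_manual_clear(self):
        # An operator clearing event always re-arms the session.
        self.quiesced = False

    def on_discovery_update(self):
        # Only option 3 lets the discovery protocol re-arm the session.
        if self.policy is Policy.QUIESCE_UNTIL_DISCOVERY_CHANGE:
            self.quiesced = False
```

Under option 2 the session behaves like an explicitly configured but
misconfigured one and keeps retrying; options 1 and 3 differ only in which
event is allowed to re-arm it.
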
> > Now let's see if the same concerns or observations apply if you limit
> > auto discovery to *only* same L2 domain.
>
> It's a matter of scale, not scope.
>
> > Last the draft says:
> >
> > "The auto-discovery mechanism will not replace or conflict with data
> > exchanged by the BGP FSM, including its OPEN message."
> >
> > But if you are thinking about modifications to the handling of BGP
> > session retries then I guess the above needs to have one exception added.
> >
> > Then we will see different OPEN behaviour depending if peer got auto
> > discovered vs provisioned ...
>
> The distinction isn't about what to do about a BGP session that has been
> created by the discovery mechanism.  It's the fact that a session is
> created by the discovery mechanism and whether it's left running in a
> broken mode or not.
>
> -- Jeff
>