Re: [Idr] BGP Auto-Discovery Protocol State Requirements

Jeffrey Haas <jhaas@pfrc.org> Tue, 23 March 2021 14:47 UTC

Date: Tue, 23 Mar 2021 11:08:37 -0400
From: Jeffrey Haas <jhaas@pfrc.org>
To: Robert Raszuk <robert@raszuk.net>
Cc: "Fomin, Sergey (Nokia - US/Mountain View)" <sergey.fomin@nokia.com>, "idr@ietf.org" <idr@ietf.org>, "Acee Lindem (acee)" <acee=40cisco.com@dmarc.ietf.org>
Message-ID: <20210323150837.GB31047@pfrc.org>
References: <20210319135025.GK29692@pfrc.org> <CAOj+MMGndgwqLoV_Un_1Bu3F3xPkg9ZD6=4V5FmYJgQiPD_1yw@mail.gmail.com> <20210319143448.GM29692@pfrc.org> <CAOj+MMFKqpZCyzDbGr0JzZLu7sjEw9NBQ=J9rTqDOuP+Yf1mog@mail.gmail.com> <20210319144657.GO29692@pfrc.org> <CAOj+MME8GB4jo_q3kHm1jx6E60GCHeU-pz0eYy_96BJ+ak7_Bw@mail.gmail.com> <20210319152832.GP29692@pfrc.org> <BYAPR08MB549328E3379E94589DC3CE0885649@BYAPR08MB5493.namprd08.prod.outlook.com> <20210323120515.GA31047@pfrc.org> <CAOj+MMGY+sMHr29Uw4bFct9kxoBnp=fJDULVjvFQL1UxC3JYtQ@mail.gmail.com>
In-Reply-To: <CAOj+MMGY+sMHr29Uw4bFct9kxoBnp=fJDULVjvFQL1UxC3JYtQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/e1F8ElvNC-Qoy5QXyFVXTaxKbag>
Subject: Re: [Idr] BGP Auto-Discovery Protocol State Requirements

Robert,

On Tue, Mar 23, 2021 at 02:23:32PM +0100, Robert Raszuk wrote:
> > As noted in the thread, the important property a protocol must design for
> > is how you handle retries.
> 
> As this point is being brought back again, let's zoom in on it a bit here.
> Clearly the above implies changes to the "handling of retries" in the
> current RFC4271.
> 
> So I assume you possess some operational evidence which would suggest
> that if a peer's IP address is learned via auto discovery we need to
> apply a different BGP OPEN retry procedure, or at least different timers,
> than if the same peer's IP address is pushed by cfg from a mgmt station.
> Can you share this data?

Thanks for this framing; it's exactly the point the WG needs to discuss
regarding retries.

If a router has a hundred BGP sessions configured, it has a hundred that it
expects to deal with.  This includes sessions that fail to connect.  But the
general expectation for configured sessions is that they will connect.

With a discovery protocol, you have at best a rough bound on the number of
peers: the number of interfaces that it is running on.  The actual number may
be more.  If the discovered peers aren't acceptable, their sessions will keep
retrying indefinitely.

Is that acceptable?  Your argument is roughly that "you asked for it, deal
with the consequences".  

At low scales, it's probably quite fine.  

At high scales, with a central listen socket for incoming sessions, you may
end up exhausting the socket's "backlog" of pending connections, resulting in
a denial of service for legitimate sessions.
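To make the scale concern concrete, one possible mitigation is for an
implementation to bound how many sessions discovery may instantiate, and to
close incoming connections from peers it has no session for, so configured
sessions are not crowded out.  This is a minimal Python sketch, not anything
from RFC 4271 or the draft; the SessionTable class and the
MAX_DISCOVERED_SESSIONS cap are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical cap; the value is illustrative, not from any specification.
MAX_DISCOVERED_SESSIONS = 64


@dataclass
class SessionTable:
    """Hypothetical per-router table of configured and discovered peers."""

    max_discovered: int = MAX_DISCOVERED_SESSIONS
    configured: set = field(default_factory=set)
    discovered: set = field(default_factory=set)

    def add_configured(self, peer: str) -> bool:
        # Configured peers are always admitted: the operator asked for them.
        self.configured.add(peer)
        return True

    def add_discovered(self, peer: str) -> bool:
        # A peer that already has a session needs no new one.
        if peer in self.configured or peer in self.discovered:
            return True
        # Refuse to instantiate more discovered sessions than the cap allows,
        # so discovery cannot consume unbounded resources.
        if len(self.discovered) >= self.max_discovered:
            return False
        self.discovered.add(peer)
        return True

    def accept_incoming(self, peer: str) -> bool:
        # Only keep connections from peers we have a session for; others are
        # closed right away instead of lingering and tying up resources.
        return peer in self.configured or peer in self.discovered


table = SessionTable(max_discovered=2)
table.add_discovered("192.0.2.1")
table.add_discovered("192.0.2.2")
print(table.add_discovered("192.0.2.3"))       # False: cap reached
print(table.add_configured("198.51.100.1"))    # True: configured always admitted
print(table.accept_incoming("203.0.113.9"))    # False: no session for this peer
```

Closing unknown connections quickly keeps the listen backlog draining, but it
does not remove the cost of admitted discovered peers that keep failing; that
remains a lifecycle question.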

> If not, could you please describe why auto discovered peers would need to
> have a different BGP FSM than, say, those configured with mistakes?  To me
> this difference is not obvious.

The distinction is when you instantiate a session for a discovered peer and
when you keep it provisioned vs. quiescing or deleting it.  Remember,
discovery may be a denial of service against the discovering party as well.

If a discovered peering session is unacceptable, why would you keep running
the BGP state machine over and over if it's not going to improve?

The options we have upon discovering a peer that isn't acceptable are:
1. Quiesce the discovered session and require a manual clearing event.
2. Keep the instantiated session running even if it never connects.
   Analogous to explicit configuration of a misconfigured session.
3. The discovery protocol itself tells us that the situation has changed.
   This gives the implementation the option to quiesce without requiring a
   manual clearing event.
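The three options above can be sketched as a lifecycle wrapper around an
otherwise unmodified BGP FSM.  This is a minimal Python sketch under stated
assumptions: the Policy, State, and DiscoveredSession names and the
MAX_CONSECUTIVE_FAILURES threshold for deciding that a session "is not going
to improve" are hypothetical, and the FSM itself is not modeled.

```python
from enum import Enum, auto


class Policy(Enum):
    QUIESCE_UNTIL_CLEARED = auto()  # option 1: quiesce, wait for manual clear
    KEEP_RUNNING = auto()           # option 2: retry forever, like bad config
    FOLLOW_DISCOVERY = auto()       # option 3: quiesce, discovery signals resume


class State(Enum):
    RUNNING = auto()
    QUIESCED = auto()


# Hypothetical threshold; not from RFC 4271 or the draft.
MAX_CONSECUTIVE_FAILURES = 5


class DiscoveredSession:
    """Hypothetical lifecycle wrapper; the BGP FSM inside is left unchanged."""

    def __init__(self, peer: str, policy: Policy) -> None:
        self.peer = peer
        self.policy = policy
        self.state = State.RUNNING
        self.failures = 0

    def should_attempt(self) -> bool:
        # The FSM is only driven while the session is not quiesced.
        return self.state is State.RUNNING

    def on_attempt_failed(self) -> None:
        self.failures += 1
        if self.policy is Policy.KEEP_RUNNING:
            return  # behave like an explicitly configured, broken session
        if self.failures >= MAX_CONSECUTIVE_FAILURES:
            self.state = State.QUIESCED

    def on_established(self) -> None:
        self.failures = 0

    def on_manual_clear(self) -> None:
        # An operator clearing event resumes the session under any policy.
        self.failures = 0
        self.state = State.RUNNING

    def on_discovery_changed(self) -> None:
        # Only option 3 lets the discovery protocol resume a quiesced session.
        if self.policy is Policy.FOLLOW_DISCOVERY:
            self.failures = 0
            self.state = State.RUNNING


s = DiscoveredSession("192.0.2.1", Policy.FOLLOW_DISCOVERY)
for _ in range(MAX_CONSECUTIVE_FAILURES):
    s.on_attempt_failed()
print(s.should_attempt())   # False: quiesced after repeated failures
s.on_discovery_changed()
print(s.should_attempt())   # True: discovery reported a change
```

The FSM and OPEN handling are untouched; only the decision whether to keep
attempting the session differs by policy.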

> Now let's see if the same concerns or observations apply if you limit auto
> discovery to *only* same L2 domain.

It's a matter of scale, not scope.

> Last the draft says:
> 
> "The auto-discovery mechanism will not replace or conflict with data
> exchanged by the BGP FSM, including its OPEN message."
> 
> But if you are thinking about modifications of handling BGP session retries
> then I guess the above needs to have one exception added.
> 
> Then we will see different OPEN behaviour depending if peer got auto
> discovered vs provisioned ...

The distinction isn't about changing how a BGP session behaves once the
discovery mechanism has created it.  It's about the fact that the session was
created by the discovery mechanism, and whether it's left running in a broken
state or not.

-- Jeff