Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

Tore Anderson <tore@fud.no> Tue, 14 March 2017 12:41 UTC

Date: Tue, 14 Mar 2017 13:41:13 +0100
From: Tore Anderson <tore@fud.no>
To: Job Snijders <job@ntt.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/grow/5-kTUuJtWELhu-VgVbRWVoY2liU>
Cc: grow@ietf.org
Subject: Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

* Job Snijders <job@ntt.net>

> TEXT:
>     In network topologies where BGP speaking routers are directly
>     attached to each other, or use fault detection mechanisms such as
>     <xref target="RFC5880">BFD</xref>, detecting and acting upon a link
>     down event (for example when someone yanks the physical connector)
>     in a timely fashion is straightforward.
> 
> So we should add something saying that, even though detection is
> straightforward and initiating action as a result of this event can be
> done in a timely manner, we cannot be sure of timely termination of
> whatever actions are taken because of the event, and therefore the
> recommendation is to shut down sessions before doing maintenance, even
> when networks are directly connected to each other.
> 
> The above matches my operational experience and aligns with how we
> perform router maintenance.
> 
> There are a number of considerations:
> 
>     - an operator may not know whether they are directly connected
>     - even if directly connected, the remote side might not be able to
>       converge in a timely fashion
> 
> Perhaps the paragraph should just be removed?

Yes.

Here's a quick suggestion or starting point for discussion (intended to
replace section 1 in its entirety):

   BGP Session Culling is the practice of ensuring BGP sessions are
   forcefully torn down before maintenance activities commence on a
   lower layer network, where those activities would otherwise affect
   the flow of data between the BGP speakers.

   BGP Session Culling ensures that network maintenance activities
   cause the minimum possible amount of disruption, by giving BGP
   speakers advance notice of an impending outage, so they may
   preemptively react to it by gracefully converging onto alternate
   paths while the forwarding plane is still fully operational.

   The grace period required for a successful implementation of BGP
   Session Culling is the sum of the time needed to detect the BGP
   session loss and the time required for the BGP speaker to converge
   on alternate paths. The first value is in the worst case governed by
   the BGP Hold Timer (section 6.5 of [RFC4271]). The second value is
   implementation-specific, but could be 15 minutes or more in the case
   of sessions where a router with a slow control plane is receiving a
   full set of Internet routes.
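
To make the arithmetic in the paragraph above concrete, here is a
minimal Python sketch. The function name and the example values are
illustrative assumptions, not text from the draft: 90 seconds is only
the Hold Time value suggested in RFC 4271, and 900 seconds is the
15-minute convergence figure mentioned above.

```python
def worst_case_grace_period(hold_time_s: float, convergence_s: float) -> float:
    """Worst-case grace period = detection time + convergence time.

    hold_time_s   -- negotiated BGP Hold Time, taken here as the upper
                     bound on how long session loss takes to detect
    convergence_s -- assumed time for the slowest peer to converge on
                     alternate paths (implementation-specific)
    """
    if hold_time_s < 0 or convergence_s < 0:
        raise ValueError("durations must be non-negative")
    return hold_time_s + convergence_s


# Hypothetical example: 90 s Hold Time, 15 minutes of convergence.
grace_s = worst_case_grace_period(hold_time_s=90, convergence_s=15 * 60)
print(f"worst-case grace period: {grace_s:.0f} s")
```

This only bounds the fixed-timer approach; it does not tell an operator
when traffic has actually moved away.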

   Operators implementing BGP Session Culling are in any case
   encouraged not to rely on a fixed grace period, but instead to
   monitor forwarding plane activity while the culling is taking place
   and to consider it complete once traffic levels have dropped to a
   minimum [Section 2.3].
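
As a rough illustration of monitoring traffic instead of waiting out a
fixed timer, the loop below polls a traffic counter until it stays under
a threshold. Everything here is an assumption made for the sketch: the
function name, the read_rate_bps callback (standing in for whatever
interface counters an operator actually has), the threshold, the number
of consecutive quiet samples, and the timeout. None of it comes from the
draft.

```python
import time
from typing import Callable


def wait_for_traffic_drain(
    read_rate_bps: Callable[[], float],  # hypothetical traffic sampler
    threshold_bps: float,                # what counts as "a minimum"
    consecutive_samples: int = 3,        # quiet polls required in a row
    poll_interval_s: float = 10.0,
    max_wait_s: float = 1800.0,          # safety cap, not a grace period
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once traffic stays at or below threshold_bps for
    consecutive_samples polls in a row; return False on timeout."""
    waited = 0.0
    quiet = 0
    while True:
        if read_rate_bps() <= threshold_bps:
            quiet += 1
            if quiet >= consecutive_samples:
                return True
        else:
            quiet = 0  # traffic came back; start counting again
        if waited >= max_wait_s:
            return False
        sleep(poll_interval_s)
        waited += poll_interval_s


# Usage with canned samples in place of real interface counters.
samples = iter([5e6, 2e6, 100.0, 50.0, 20.0])
drained = wait_for_traffic_drain(
    lambda: next(samples), threshold_bps=1000.0, sleep=lambda _s: None
)
print("culling complete" if drained else "timed out, traffic still flowing")
```

Requiring several quiet samples in a row is a design choice to avoid
declaring the culling complete on a momentary dip in traffic.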

Tore