Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

Tore Anderson <tore@fud.no> Tue, 14 March 2017 06:22 UTC

Date: Tue, 14 Mar 2017 07:22:25 +0100
From: Tore Anderson <tore@fud.no>
To: Will Hargrave <will@harg.net>
Message-ID: <20170314072225.55fdd871@echo.ms.redpill-linpro.com>
In-Reply-To: <71D584DF-94F5-40B3-BCE0-4736354ECCCB@harg.net>
References: <20170313121134.6676bd02@echo.ms.redpill-linpro.com> <71D584DF-94F5-40B3-BCE0-4736354ECCCB@harg.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/grow/hKRglcYshx6pgBHrx4tyal88s2Q>
Cc: grow@ietf.org, draft-iops-grow-bgp-session-culling@ietf.org
Subject: Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

* "Will Hargrave" <will@harg.net>

> Hello, Tore and GROW, I am new here.
> 
> On 13 Mar 2017, at 11:11, Tore Anderson wrote:
> 
> > Another logical consequence of this is that the rather imprecise «a 
> > few minutes» should ideally be expanded on, taking slow routers
> > such as the MX80 into account. While five minutes of culling would
> > be helpful, it would not be enough to avoid all disruption.  
> 
> One point of the technique is that the ‘lower layer caretaker’ looks 
> at their interface traffic counters to ensure traffic has dropped to 
> near-zero before commencing the ‘destructive’ part of the 
> maintenance. As a result no traffic is affected.

Agreed that this is a smart and obvious thing to do. However, the draft
does not include language to that effect, at least not in section 2.2.
I'll try to come up with some text within the week.
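
To make the counter check concrete, here is a minimal sketch in Python.
It is not from the draft; the function names, the 1 Mbit/s threshold,
the one-minute sampling interval and the "three consecutive samples"
rule are all assumptions of mine, picked only for illustration.

```python
def traffic_has_drained(samples_bps, threshold_bps=1_000_000, consecutive=3):
    """Return True if the most recent `consecutive` samples are all below
    `threshold_bps`. Threshold and window are illustrative assumptions."""
    if len(samples_bps) < consecutive:
        return False
    return all(s < threshold_bps for s in samples_bps[-consecutive:])


def minutes_until_drained(samples_bps, interval_s=60,
                          threshold_bps=1_000_000, consecutive=3):
    """Return the number of minutes after culling started at which the
    traffic first counted as drained, or None if it never did.

    Assumes sample number i (1-based) was taken i * interval_s seconds
    after the sessions were culled."""
    for i in range(consecutive, len(samples_bps) + 1):
        if traffic_has_drained(samples_bps[:i], threshold_bps, consecutive):
            return i * interval_s / 60
    return None


# Hypothetical per-minute readings (bit/s) from the port under maintenance.
readings = [9e9, 4e9, 2e8, 5e5, 3e5, 1e5]
ready = traffic_has_drained(readings)
waited = minutes_until_drained(readings)
```

The point is that the destructive step is gated on the observed traffic
level and not on a fixed timer, so a slow-converging peer simply
extends the wait.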

> Our operational experience at LONAP shows this does usually happen 
> within 5 minutes.
> 
> > If the operator/peer would decide to cull the session, say, 30 minutes
> > ahead of the maintenance, that would be great for me and others in my
> > situation.  
> 
> I suspect this may be unnecessary, but do not have extensive data to 
> back this up.

I believe the needed convergence time depends largely on the number of
routes that a router needs to reprogram due to the link down event. On
an IP transit circuit, that number is possibly somewhere between 650k
and 700k. I'm guessing that most members of an IXP receive a much
smaller number of prefixes from the other members of the IX and/or its
RS, thus reducing the amount of culling time needed.
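
As a back-of-the-envelope illustration of that dependency, here is a
rough sketch in Python. The linear model, the function name, the safety
factor and above all the reprogramming rate are assumptions made up for
the example; they are not measurements of the MX80 or of any other
router.

```python
def culling_lead_time_minutes(routes, fib_updates_per_second,
                              safety_factor=2.0):
    """Estimate how long before maintenance a session should be culled.

    Assumes convergence time grows linearly with the number of routes to
    reprogram. `fib_updates_per_second` is a hypothetical, operator-
    supplied rate; `safety_factor` pads the estimate."""
    if fib_updates_per_second <= 0:
        raise ValueError("fib_updates_per_second must be positive")
    seconds = routes * safety_factor / fib_updates_per_second
    return seconds / 60


# Placeholder rate of 1000 route updates per second, purely illustrative.
transit_estimate = culling_lead_time_minutes(700_000, 1000)
ixp_estimate = culling_lead_time_minutes(50_000, 1000)
```

Whatever the real rate is on a given platform, the ratio between the
two estimates is what matters: under this model, a full transit feed
needs proportionally more lead time than a typical IXP peering.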

By the way, as an IXP operator, you also have the option of simply
shutting down your members' interfaces prior to performing maintenance,
instead of doing culling. Doing so would be completely analogous to the
directly connected BGP speakers scenario discussed in section 1, where
the draft says «detecting and acting upon a link down event (for example
when someone yanks the physical connector) in a timely fashion is
straightforward». I take the fact that you're using culling anyway as
agreement that this technique is very much applicable to the directly
connected BGP speakers scenario too, and that the quoted sentence is
rather misleading.

Tore