bgp bestpath conquer

mate csaba <matecs@niif.hu> Mon, 19 October 2015 20:50 UTC

hi,
we run a network with 3 route reflectors. the third one carries only our own
and customer prefixes (about 2k routes), while the other two rrs carry the full
table. the pes advertise best-external, so the rrs have full visibility. our
policy dictates that every route has its locpref set to express active/backup.
we have observed multiple times that the small rr sends out updates much
faster than the full-table rrs. when a customer fails over from active to
backup, the active pe sends a withdraw to the rrs, and each rr selects the new
bestpath and floods it. but in this case the locpref changes from high to low,
so any pe that wants to forward traffic for this prefix has to receive the
update from all the rrs: as long as even one high-locpref copy remains, it
will be selected locally.
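
To make the waiting problem concrete, here is a toy model of the pe side. It is a sketch under simplifying assumptions: the pe compares only LOCAL_PREF among the copies it received from each rr (the real BGP decision process has more tie-breakers), and the rr names and the locpref values 200/100 are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Path:
    nexthop: str
    locpref: int


def pe_best(rib_in: Dict[str, Path]) -> Optional[Path]:
    """Highest-locpref path among the copies reflected by each RR.

    Only LOCAL_PREF is compared; on a tie max() keeps the first copy.
    """
    if not rib_in:
        return None
    return max(rib_in.values(), key=lambda p: p.locpref)


active = Path("pe-active", 200)  # hypothetical policy values
backup = Path("pe-backup", 100)

# before the failure, every RR reflects the active path
rib_in = {"rr-small": active, "rr-full-1": active, "rr-full-2": active}

# the small RR is fast and already reflects the backup path ...
rib_in["rr-small"] = backup
print(pe_best(rib_in).nexthop)  # pe-active: stale copies still win

# ... and the PE only moves once the last full-table RR has updated too
rib_in["rr-full-1"] = backup
print(pe_best(rib_in).nexthop)  # pe-active
rib_in["rr-full-2"] = backup
print(pe_best(rib_in).nexthop)  # pe-backup
```
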
the idea: what if i change the small rr to do the following? when it detects
a nexthop change during the bestpath calculation, it takes the old bestpath's
locpref, increments it by 1, and sends out the new bestpath with this
incremented locpref. if another nexthop change is detected for the same
prefix, it increments the locpref again. so every time it computes a new
bestpath for this prefix, it sends the result out with an increasing locpref,
forcing the pes to start using the new path instantly.
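
The conqueror logic above could look roughly like the following minimal sketch. `ConquerorRR` and `on_bestpath` are hypothetical names, not an existing implementation. Three behaviours are assumptions not spelled out in the proposal: the first advertisement of a prefix goes out with the received locpref, a bestpath run with an unchanged nexthop keeps the locpref already advertised, and the increment is capped at the 32-bit LOCAL_PREF maximum.

```python
from dataclasses import dataclass
from typing import Dict

# LOCAL_PREF is a four-octet unsigned integer (RFC 4271)
LOCAL_PREF_MAX = 2**32 - 1


@dataclass(frozen=True)
class Path:
    nexthop: str
    locpref: int


class ConquerorRR:
    """Hypothetical sketch of the proposed 'conqueror' rr behaviour."""

    def __init__(self) -> None:
        # last path advertised to the pes, per prefix
        self.advertised: Dict[str, Path] = {}

    def on_bestpath(self, prefix: str, best: Path) -> Path:
        """Return the path to advertise after a bestpath run for prefix."""
        old = self.advertised.get(prefix)
        if old is None:
            # first advertisement: send the locpref as received (assumption)
            out = best
        elif old.nexthop != best.nexthop:
            # nexthop changed: previous advertised locpref + 1, so the new
            # path beats the stale copies the pes still hold from slower rrs
            # (capped at the 32-bit maximum; assumption)
            out = Path(best.nexthop, min(old.locpref + 1, LOCAL_PREF_MAX))
        else:
            # same nexthop: keep the locpref already advertised (assumption)
            out = Path(best.nexthop, old.locpref)
        self.advertised[prefix] = out
        return out


rr = ConquerorRR()
prefix = "192.0.2.0/24"
print(rr.on_bestpath(prefix, Path("pe-active", 200)).locpref)  # 200
print(rr.on_bestpath(prefix, Path("pe-backup", 100)).locpref)  # 201
print(rr.on_bestpath(prefix, Path("pe-active", 200)).locpref)  # 202 (flap back)
```

The flap back to the active pe shows why the value only ever grows: each nexthop change starts from the last advertised locpref, not from the received one.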
optionally, a one-minute timer could expire the conquered prefix; the rr
would then send the prefix out again with the received locpref, because by
that time the other rrs have completed their normal flooding process. the
cost is that this would double the bgp update traffic.
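
The optional timer could be sketched as follows, assuming the caller drives it with explicit timestamps in seconds instead of real timers; `ConquestExpiry`, `conquered` and `expire` are hypothetical names. Every expiry yields a second advertisement for a change that was already sent once, which is where the extra update traffic comes from.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

HOLD_SECONDS = 60.0  # the one-minute timer from the proposal


@dataclass(frozen=True)
class Path:
    nexthop: str
    locpref: int


class ConquestExpiry:
    """Hypothetical helper: remembers conquered prefixes and, once the
    timer has run out, returns them for re-advertisement with the
    locpref as received."""

    def __init__(self) -> None:
        # prefix -> (time the conquered path was sent, path as received)
        self.pending: Dict[str, Tuple[float, Path]] = {}

    def conquered(self, prefix: str, received: Path, now: float) -> None:
        # a new conquest for the same prefix restarts its timer
        self.pending[prefix] = (now, received)

    def expire(self, now: float) -> List[Tuple[str, Path]]:
        """Pop and return every prefix whose timer has run out."""
        due = [(p, path) for p, (t, path) in self.pending.items()
               if now - t >= HOLD_SECONDS]
        for p, _ in due:
            del self.pending[p]
        return due


exp = ConquestExpiry()
exp.conquered("192.0.2.0/24", Path("pe-backup", 100), now=0.0)
print(exp.expire(now=30.0))  # []
print(exp.expire(now=60.0))  # [('192.0.2.0/24', Path(nexthop='pe-backup', locpref=100))]
```
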
this would speed up convergence in the active->backup case, because we don't
have to wait for all the rrs to finish their work. the only drawbacks i see
right now are the following:
-it requires local hot-potato routing at the backup pe to work.
-at most one conqueror rr can be used within a single cluster.
-when the active route flaps, the locpref keeps counting up toward the 32-bit limit of locpref (2^32 - 1).
-it disrupts igp-metric-based bestpath selection (locpref is compared before the igp metric) with scattered rrs or add-path.
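
As a rough bound on the flapping drawback: each nexthop change adds 1, so a full flap (active->backup->active) adds 2. The arithmetic below assumes a hypothetical starting locpref of 200 and the four-octet unsigned LOCAL_PREF from RFC 4271; the function names are illustrative only.

```python
# LOCAL_PREF is a four-octet unsigned integer (RFC 4271)
LOCAL_PREF_MAX = 2**32 - 1


def nexthop_changes_until_ceiling(start_locpref: int) -> int:
    """Number of +1 increments before the advertised locpref hits the max."""
    return LOCAL_PREF_MAX - start_locpref


def flaps_until_ceiling(start_locpref: int) -> int:
    """A full flap (active->backup->active) is two nexthop changes."""
    return nexthop_changes_until_ceiling(start_locpref) // 2


print(flaps_until_ceiling(200))  # 2147483547
```
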
my question is: do you see any other problems? any feedback is welcome!
thanks,
csaba mate
niif/as1955