Re: [Idr] Adoption of draft-varlashkin-bgp-nh-cost-02 as IDR WG document?

"UTTARO, JAMES" <> Sun, 27 November 2011 01:21 UTC


Comments In-Line..

Jim Uttaro

-----Original Message-----
From: Robert Raszuk [] 
Sent: Saturday, November 26, 2011 12:21 PM
Cc: 'Anton Elita'; List
Subject: Re: [Idr] Adoption of draft-varlashkin-bgp-nh-cost-02 as IDR WG document?


 > First off, I do not anticipate having to send more than two paths for
 > almost all cases.

Great that you said this. As a tier 1 I have 25 paths for each b_net: I 
have a path for each of my peering points with other tier 1s or tier 2s.

How do I figure out which ones are optimal for the client or group of 
clients located in the same POP from the route reflector's point of view?
[Jim U>] I believe that there should be some determination made as to what to flood.. I wasn't actually thinking of it from a topology sense but from a perspective of which prefixes should get the enhanced service definition above and beyond a traditional internet definition.. 
[Jim U>] Hmm, I guess I am not understanding this. I am under the impression that path P1 is learned from, let's say, 25 different peering points. First let's examine P1 from Peer A and Peer B, where these two peers are close together. The RR serving these two POPs would forward either P1:NH=A or P1:NH=B to RR1 based on its local BGP table, which has been statically configured (ouch). If RR1 also learns P1:NH=C, then you want RR1 to evaluate the BGP "cost" and select the NH with the best IGP cost. Is that right? OK, so RR1 has statically accumulated the cost from itself to the RR and to the egress peer? Is it really your intention to create unique tables at every point in the network, each of which would essentially be an IGP graph of the topology using the local node as the root? That is an operational nightmare and probably a catastrophe. Again, as I stated before, there is no way to adapt to a changing topology due to failure or maintenance. Can you address that?

It is not clear how sub-optimal the selection of P1 with addpaths=2 would be. Certainly there are other metrics, e.g. AS-PATH length, which will immediately eliminate paths. Again, to our example: with addpaths=2 in the RR topology you would always send the two best IGP paths based on cost. If you do this at every point, how sub-optimal could it be? Have you done an analysis of this? That would be a good addition to the draft, as it would clear up any confusion about the applicability of addpaths and the need for this draft.
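
To make the question concrete, here is a minimal sketch of the kind of analysis I have in mind. It is only an illustration: the next-hop names, the cost values and the function names are all made up, and it assumes the RR ranks paths purely by its own IGP cost to the next hop (ignoring every other best-path step). The RR advertises its N best paths, the client picks the best of those by its own cost, and the penalty is the difference from what the client would have picked with full visibility.

```python
def top_n_by_cost(costs, n):
    """Return the n next hops with the lowest cost (ties broken by name)."""
    return sorted(costs, key=lambda nh: (costs[nh], nh))[:n]


def suboptimality(rr_costs, client_costs, n):
    """Compare the client's choice under add-paths=n with its true optimum.

    rr_costs:     hypothetical IGP cost from the RR to each next hop
    client_costs: hypothetical IGP cost from the client to each next hop
    Returns (chosen next hop, optimal next hop, extra cost paid by the client).
    """
    advertised = top_n_by_cost(rr_costs, n)           # what the RR sends
    chosen = min(advertised, key=lambda nh: (client_costs[nh], nh))
    optimal = min(client_costs, key=lambda nh: (client_costs[nh], nh))
    return chosen, optimal, client_costs[chosen] - client_costs[optimal]


# Illustrative numbers only: A and B are close to the RR, C is close to the client.
rr_costs = {"A": 10, "B": 12, "C": 30}
client_costs = {"A": 40, "B": 35, "C": 5}

print(suboptimality(rr_costs, client_costs, 2))  # ('B', 'C', 30)
print(suboptimality(rr_costs, client_costs, 3))  # ('C', 'C', 0)
```

Running something like this over a real topology and a real set of external paths would show how often, and by how much, addpaths=2 misses the client's best exit.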

With add-paths or without add-paths the problem to be solved remains 
exactly the same.

Problem goes away only in the following cases:

- if you have only 1 path for all prefixes,
- if you have only 2 paths max for all prefixes,
- if you decide to send _all_ of my domain's external paths to all clients; 
in the above case, 25 paths! Sorry ... bad idea.

 > The AIGP metric is used to carry the "cost" to the NH of the path..

AIGP is completely irrelevant here. In fact it is a pretty poor design 
choice to choose a path for all clients based on the AIGP metric or IGP 
metric from the point of view of a control plane route reflector in a 
given AS.

Keep in mind that AIGP is relevant only when a number of ASes are under 
the same administration. Not that common a case in the Internet :) Usually 
folks do know how to run a single AS globally without any issues.

The problem which needs to be solved is to provide information to the RR 
in order for RR client to be able to receive optimal paths - optimal for 
example to follow hot potato routing requirements in a given AS.

If one were to follow the AIGP metric, some control plane RR clients may 
clearly get non-optimal exit paths from the point of view of the AS they 
reside in.
[Jim U>] I thought you wanted an RR-client (PE) to select the best egress NH for a given path, assuming a BGP 3107 design with AIGP (pseudo IGP).
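
For reference, the behaviour being debated (the RR choosing, per client, the exit that is closest to that client) can be sketched as below. This is a hedged illustration and not the draft's mechanism or encoding: it simply assumes a table of costs keyed by (client, next hop) already exists on the RR, and how that table gets populated and kept current is exactly the open question. All names and numbers are hypothetical.

```python
def best_path_for_client(paths, nh_cost, client):
    """Pick the next hop with the lowest cost as seen from `client`.

    paths:   next hops of the candidate paths for one prefix
    nh_cost: hypothetical table {(client, next_hop): cost} held by the RR
    Returns None when the RR has no cost information for this client,
    in which case the caller would need some fallback policy.
    """
    known = [nh for nh in paths if (client, nh) in nh_cost]
    if not known:
        return None
    return min(known, key=lambda nh: (nh_cost[(client, nh)], nh))


# Illustrative data: three exits for one prefix, two clients in different POPs.
paths = ["A", "B", "C"]
nh_cost = {
    ("PE1", "A"): 5,  ("PE1", "B"): 20, ("PE1", "C"): 50,
    ("PE2", "A"): 60, ("PE2", "B"): 25, ("PE2", "C"): 8,
}

print(best_path_for_client(paths, nh_cost, "PE1"))  # A
print(best_path_for_client(paths, nh_cost, "PE2"))  # C
print(best_path_for_client(paths, nh_cost, "PE3"))  # None
```

Note that the sketch says nothing about how the costs track topology changes, which is the operational concern raised above.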


> Ilya,
> Apologies for not responding sooner.. I do have some questions in re
> the draft..
> 1. Motivation
> "ADDPATH solves this problem by letting route-reflector to advertise
> multiple paths for given prefix.  If number of advertised paths
> sufficiently big, route-reflector clients can choose same route as
> they would in case of full-mesh.  This approach however places
> additional burden on the control plane."
> First off, I do not anticipate having to send more than two paths for
> almost all cases. Secondly this is applied on a per prefix basis so
> there is not a doubling of routing state.. So I do not think this
> concern is warranted.. There are tools and guidelines for setting the
> number of paths see
> for
> a complete description which examines a number of use cases..
> "For example, if next-hop information itself has been learned via BGP
> then simple SPF run on link-state database won't be sufficient to
> obtain cost information."
> The AIGP metric is used to carry the "cost" to the NH of the path..
> This draft was developed to address the exact issue of carrying NH
> info in bgp and losing the notion of an accumulated cost.. Why is
> that not sufficient?
> I also do not see how you deal with this except for the creation of a
> table of NHs and associated costs.. But how is that created? Does the
> new AFI/SAFI accumulate costs across multiple IGP domains?
> I would like to understand what scenarios require this solution above
> and beyond a correct addpaths setting and/or AIGP..
> 2. Next-Hop Information Base
> "  NHIB can be populated from various sources both static and
> dynamic. This document focuses on populating NHIB using BGP.  However
> it is possible that protocols other than BGP could be also used to
> populate NHIB."
> This is a bit vague.. if you are learning NHs in BGP without the
> accumulated cost then the only way to populate is to statically
> define costs to NHs.. this seems burdensome and is not easily changed
> to reflect changes in cost due to maintenance or failure..
> putting aside the difficulty in creating and maintaining this NHIB it
> seems that it cannot respond to changes in the topology..
> Lots of chatter about the sanctity of the BGP Best path selection..
> Have you spoken to folks who seem to be vehemently opposed?
> 4.
> A general note about introducing AFI/SAFIs.. This is never
> trivial, as it introduces scale, control plane complexity and
> vulnerability.. What happens if the AFI/SAFI becomes compromised? It
> would be good if you could describe that here.. My assumption would
> be that you would fall back to igp cost?
> In summary, I do not understand how addpaths and AIGP do not solve
> this problem.
> -----Original Message-----
> From: [] On Behalf Of Anton Elita
> Sent: Wednesday, November 23, 2011 11:05 AM
> To: List
> Subject: Re: [Idr] Adoption of draft-varlashkin-bgp-nh-cost-02 as IDR WG document?
>
> Support