Re: [Idr] Adoption of draft-varlashkin-bgp-nh-cost-02 as IDRWG document?

"iLya" <ilya@nobulus.com> Mon, 28 November 2011 11:28 UTC

Message-ID: <25F165FDE7154695971701696ED4F771@hnivarlas1>
From: iLya <ilya@nobulus.com>
To: "UTTARO, JAMES" <ju1738@att.com>, robert@raszuk.net
References: <7B61DC70-08D9-46C7-8C15-5DD339C61C2D@juniper.net><CAKP9Kx_k+RWqDdrh+64-G8jtyOOYJsefv_5481u_MA8_P42eOg@mail.gmail.com><B17A6910EEDD1F45980687268941550FA34B06@MISOUT7MSGUSR9I.ITServices.sbc.com><4ED12015.6060002@raszuk.net> <B17A6910EEDD1F45980687268941550FA34B6A@MISOUT7MSGUSR9I.ITServices.sbc.com>
In-Reply-To: <B17A6910EEDD1F45980687268941550FA34B6A@MISOUT7MSGUSR9I.ITServices.sbc.com>
Date: Mon, 28 Nov 2011 12:28:22 +0100
Cc: idr@ietf.org
Subject: Re: [Idr] Adoption of draft-varlashkin-bgp-nh-cost-02 as IDRWG document?

Jim,

Thanks for your review. Before I address your points, please bear in mind
that NH SAFI is not a competing technology but a complement to ADD-PATH.
Neither can solve certain problems alone, but together they provide a
comprehensive toolkit for a wide range of scenarios.

--------------------------------------------------
> > First of I do not anticipate having to send more than two paths for
> > almost all cases.
>

To amplify Robert's response: I also have a widely distributed set of ASBRs
that peer with foreign networks in Asia, Europe and America. Currently they
are fully meshed, because ADD-PATH alone would require me either to deploy
almost as many RRs as I have ASBRs today or to send 4 to 8 different paths
per prefix. Otherwise I might end up sending traffic from Hong Kong to the
US West Coast via Europe, or from the Netherlands to Germany via the US East
Coast. With NH SAFI _plus_ ADD-PATH, the solution for optimum redundant
routeing is to obtain the next-hop cost from each RR-client via NH SAFI and
to send the best and next-best routes using ADD-PATH.
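
A minimal sketch of that combination, assuming every other BGP attribute
ties so that only the next-hop cost decides. The client names, ASBR names
and cost values are invented for illustration; `client_nh_cost` and
`paths_for_client` are hypothetical names, not anything defined in the draft.

```python
from typing import Dict, List

# Hypothetical data: cost from each RR-client to each candidate next hop,
# as the RR would have learned it via NH SAFI. Values are illustrative only.
client_nh_cost: Dict[str, Dict[str, int]] = {
    "hongkong": {"asbr-tokyo": 10, "asbr-losangeles": 120, "asbr-london": 200},
    "zurich": {"asbr-tokyo": 210, "asbr-losangeles": 150, "asbr-london": 20},
}


def paths_for_client(client: str, next_hops: List[str], count: int = 2) -> List[str]:
    """Return the `count` lowest-cost next hops from this client's perspective."""
    costs = client_nh_cost[client]
    # Ignore next hops for which the client has reported no cost.
    known = [nh for nh in next_hops if nh in costs]
    # Sort by cost; break ties by name so the result is deterministic.
    return sorted(known, key=lambda nh: (costs[nh], nh))[:count]


candidates = ["asbr-tokyo", "asbr-losangeles", "asbr-london"]
for client in client_nh_cost:
    # The RR would advertise these as best and next-best using ADD-PATH.
    print(client, paths_for_client(client, candidates))
```

The point of the sketch is only that the same candidate set yields a
different best and next-best pair per client.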

> [Jim U>] I believe that there should be some determination made as to what 
> to flood.. I wasn't actually thinking of it from a topology sense but from 
> a perspective of which prefixes should get the enhanced service definition 
> above and beyond a traditional internet definition..

I am not sure what "enhanced" means in this context, but as a service
provider I want to send all traffic to the optimum exit point regardless of
how much the customer pays, because such traffic management gives me better
resource utilisation. With that in mind, I'd like my route reflectors to
advertise (flood) what is best from each RR-client's perspective.

> [Jim U>] Hmm I guess I am not understanding this. I am under the 
> impression that path P1 is learned from let's say 25 different peering 
> points.. first let's examine P1 from Peer A and Peer B where
> these two peers are close together.. The RR serving these two POPs would 
> forward either P1:NH=A and P1:NH=B to RR1 based on its local BGP Table 
> that has been statically configured (Ouch) .. If RR1 also learns P1:NH=C 
> then you want the RR1 to evaluate the BGP "Cost" and

Where does static configuration come from? NH SAFI allows the RR to
dynamically obtain cost info from each of its clients, and this information
may (though it doesn't have to) be updated when the network topology
changes. Also note that the RR learns new info only when a client has
something new to tell, which is normally rather infrequent. It's important
to realise that the RR does not query the RR-c before sending each prefix
and does not wait for the client's reply; it starts using the client's
perspective for best path selection once it has collected all the necessary
info. Likewise, the client does not need to wait for a request from the RR:
it can guesstimate which /32 or /128 prefixes in its routeing table are
potential exit points and send that info to the RR.
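
The behaviour described above can be sketched roughly as follows. This is an
illustration under assumptions, not the draft's procedure: the class and
method names are hypothetical, "all necessary info" is interpreted here as
"a cost for every candidate next hop", and the fallback to the RR's own
costs is my reading of what happens until then.

```python
class ReflectorCostTable:
    """Hypothetical per-client cost store on a route reflector."""

    def __init__(self, own_cost):
        self.own_cost = dict(own_cost)  # RR's own cost per next hop
        self.client_cost = {}           # client -> {next_hop: cost}

    def on_client_update(self, client, next_hop, cost):
        """Store a cost the client reported; no request from the RR is needed."""
        self.client_cost.setdefault(client, {})[next_hop] = cost

    def cost_view(self, client, next_hops):
        """Use the client's view only once it covers every candidate next hop."""
        reported = self.client_cost.get(client, {})
        if all(nh in reported for nh in next_hops):
            return {nh: reported[nh] for nh in next_hops}
        # Assumption: until then, keep using the RR's own perspective.
        return {nh: self.own_cost[nh] for nh in next_hops}


table = ReflectorCostTable({"fra": 30, "ams": 5})
table.on_client_update("zurich", "fra", 8)        # unsolicited report
print(table.cost_view("zurich", ["fra", "ams"]))  # still the RR's own view
table.on_client_update("zurich", "ams", 15)
print(table.cost_view("zurich", ["fra", "ams"]))  # now the client's view
```

Nothing here blocks on the client: updates arrive whenever the client sends
them, and route selection simply reads whatever view is currently available.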


> select the NH with the best IGP cost.. is that right? Ok So RR1 has 
> statically accumulated the cost from itself to RR and to the egress peer? 
> Is it really your intention to create unique tables at every point in the 
> network which would essentially be an IGP graph of the topology using the 
> local node as the root.. That is an operational nightmare and probably a 
> catastrophe.. Again as I stated before there is no way to adapt to a 
> changing topology due to failure or maintenance.. can you address that..
>

Since the RR-c can send info to the RR at any time, NH SAFI clearly has a
built-in mechanism to adapt to a changing topology. Yes, the RR needs to
maintain per-client cost-to-NH tables, but even for large networks the
number of entries in such tables is very small (<1000 for huge networks,
usually less than 50 for tier-1 class networks). On the other hand, the
number of distinct best paths is even smaller, because you'll most likely
have groups of clients that share the same preference for particular
next-hops (though their absolute costs to such NHs may vary). Considering
that NH SAFI info is stored only in RAM and not in TCAM, and that even
software-based platforms today have plenty of RAM available, I do not
foresee NH SAFI having a significant impact on RR capacity.
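
To illustrate why the number of distinct results stays small, here is a
short sketch that groups clients by the order in which they prefer the
next hops rather than by absolute cost. The function name, client names and
costs are all hypothetical, and grouping like this is my illustration of the
argument, not something the draft mandates.

```python
from collections import defaultdict


def group_by_preference(client_cost):
    """Group clients whose next hops rank in the same order."""
    groups = defaultdict(list)
    for client, costs in client_cost.items():
        # Only the ranking matters, so the key is the sorted next-hop order.
        order = tuple(sorted(costs, key=lambda nh: (costs[nh], nh)))
        groups[order].append(client)
    return dict(groups)


# Illustrative costs: zurich and geneva differ in absolute values
# but rank the three next hops identically.
client_cost = {
    "zurich": {"fra": 8, "ams": 15, "lax": 160},
    "geneva": {"fra": 12, "ams": 20, "lax": 170},
    "seattle": {"fra": 150, "ams": 140, "lax": 25},
}

for order, clients in group_by_preference(client_cost).items():
    print(order, clients)
```

With three clients the RR in this example has only two distinct preference
orders to evaluate.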


> It is not clear how sub-optimal the selection of P1 with addpaths=2 would 
> be. Certainly there are other metrics i.e AS-PATH length which will 
> immediately paths Again to our ex.. with addpaths=2 in the RR topology you 
> would always send the two best IGP paths based on cost.. If you do this at 
> every point how sub-optimal could it be?  Have you done an analysis of 
> this... That would be a good addition to the draft as it would clear up 
> any confusion in terms of addpaths applicability and the need for this 
> draft..
>

If I peer with ISP1 in 10 locations across the globe, I can have up to 10
potential next-hops. With addpath=2 I get two best paths for the RR-c to
choose from. If one of my RRs is located in Amsterdam and another in LA, an
RR-c in Switzerland will get two best paths, one via LA and one via
Amsterdam, but the path to AMS goes via Germany, where the traffic could
already exit. With NH SAFI added, the RR-c in Switzerland can tell the RRs
that Frankfurt and Amsterdam are closest, so both the AMS and LA route
reflectors will again furnish the Swiss client with two paths (using
ADD-PATH), but this time via Frankfurt and via Amsterdam.

> > The AIGP metric is used to carry the "cost" to the NH of the path..
>

Suppose AIGP info has arrived at the RR. Now the RR knows in fine detail
what the cost to Prefix A is, but sadly that is still from the RR's
perspective. From the RR-c's perspective, AIGP has not added any improvement.

> [Jim U>] I thought you wanted an RR-client ( PE ) to select the best 
> egress NH for a given path..Assuming a BGP 3107 design with AIGP ( pseudo 
> IGP )
>

Note that the best egress NH is not just for a given path, but for a given
path from a given client's perspective. In a situation where you have a
large number of peerings, you're still back to square one regarding
selection of the best egress point.

I hope I have addressed your concerns and that we can progress with the
draft adoption.

Kind regards,
iLya