Re: [Idr] Route reflectors [was: Re: Fwd:I-D ACTION:draft-pmohapat-idr-acceptown-community-01.txt]

Brian Dickson <briand@ca.afilias.info> Thu, 01 May 2008 02:50 UTC

Return-Path: <idr-bounces@ietf.org>
X-Original-To: idr-archive@megatron.ietf.org
Delivered-To: ietfarch-idr-archive@core3.amsl.com
Received: from core3.amsl.com (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id B34A53A6B1C; Wed, 30 Apr 2008 19:50:43 -0700 (PDT)
X-Original-To: idr@core3.amsl.com
Delivered-To: idr@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id C84D53A6B1C for <idr@core3.amsl.com>; Wed, 30 Apr 2008 19:50:41 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.849
X-Spam-Level:
X-Spam-Status: No, score=-5.849 tagged_above=-999 required=5 tests=[AWL=0.750, BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id URbfkqCANOHf for <idr@core3.amsl.com>; Wed, 30 Apr 2008 19:50:40 -0700 (PDT)
Received: from mx4.ca.afilias.info (vgateway.libertyrms.info [207.219.45.62]) by core3.amsl.com (Postfix) with ESMTP id 887E93A6A68 for <idr@ietf.org>; Wed, 30 Apr 2008 19:50:40 -0700 (PDT)
Received: from briand-vpn.int.libertyrms.com ([10.1.7.90]) by mx4.ca.afilias.info with esmtp (Exim 4.22) id 1JrOsb-0005lo-4d; Wed, 30 Apr 2008 22:50:41 -0400
Message-ID: <4819306E.8090106@ca.afilias.info>
Date: Wed, 30 Apr 2008 22:52:30 -0400
From: Brian Dickson <briand@ca.afilias.info>
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: "John G. Scudder" <jgs@juniper.net>
References: <20080425213001.4EB133A69E7@core3.amsl.com> <64E4CA6A-B8E4-4390-BDA6-39EF28E95AEA@tcb.net> <7000E71D8C525042A815432358B2F1240138D45E@paul.adoffice.local.de.easynet.net> <DE879141-E245-4051-A04D-9FF5CF97F892@bgp.nu> <39074353-26E5-4239-A193-E4DD84AE75A0@tcb.net> <014A2382-C5CE-4657-B4DA-FC84D7772359@bgp.nu><4686A93B-EF16-48DC-9775-1BD241575360@tcb.net><4818D897.3070804@cisco.com><DC5EBA07-BBE5-4D6D-9F3E-C40C66ACE34B@tcb.net><4818DB47.8040002@cisco.com><82B7CFF7-86CB-4DC7-BE53-29004128B5CB@tcb.net><4818DCFC.70001@cisco.com> <8D3C8A4B-B80B-4983-8DC7-A142FFA4B41C@tcb.net> <4818ECF1.1030909@juniper.net> <157A9EDC-B512-4C05-A257-72D70DFB44FA@tcb.net> <E41B60B6-6B6C-4EF1-9170-BDAAE23B9D7E@bgp.nu> <4818FEC1.4050308@ca.afilias.info> <80344ED8-5C18-48A7-ABDA-2063A63DEBCB@juniper.net>
In-Reply-To: <80344ED8-5C18-48A7-ABDA-2063A63DEBCB@juniper.net>
X-SA-Exim-Mail-From: briand@ca.afilias.info
X-SA-Exim-Scanned: No; SAEximRunCond expanded to false
Cc: idr idr <idr@ietf.org>, Danny McPherson <danny@tcb.net>
Subject: Re: [Idr] Route reflectors [was: Re: Fwd:I-D ACTION:draft-pmohapat-idr-acceptown-community-01.txt]
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/pipermail/idr>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: idr-bounces@ietf.org
Errors-To: idr-bounces@ietf.org

John G. Scudder wrote:
> I've changed the subject line since we have drifted far afield from 
> discussion of draft-pmohapat-idr-acceptown-community-01.txt.
>
> On Apr 30, 2008, at 7:20 PM, Brian Dickson wrote:
>> John G. Scudder wrote:
>>> On Apr 30, 2008, at 6:39 PM, Danny McPherson wrote:
>>>
>>
>> The red flag has gone up. Using the term "*the* bottleneck" means you 
>> don't understand the nature
>> of complex systems.
>
> Tempting though it may be to pick up the gauntlet and enter into a 
> full scale flame war, I hope you won't mind if I try to confine myself 
> to the substance of your remarks rather than their tone.
>

Thanks. My comment was using "you" in the sense of "one", rather than 
you personally. No flame war was intended. :-)

>> Yes, there will *always* be a bottleneck. However, solving for one 
>> bottleneck
>> will *always* move the bottleneck to some other place. It may be 
>> acceptable for a stable state with
>> several equally-pernicious bottlenecks affecting the system, or to 
>> move the bottleneck to a portion of
>> the solution space that scales best (e.g. one that scales as O(n log 
>> n) vs one that scales as O(n^2)).
>
> Yes, this is algorithms 101.

I wish it *were* 101; sadly, I suspect it is more along the lines of 
601, i.e. second-year grad level, based on the frequency with which I 
see poorly designed (rather than merely poorly implemented) systems.

> I was making a rather simple point regarding networks as systems, viz 
> the fact that if you've got one RR feeding a large number of clients, 
> the RR usually needs to process a lot more stuff than the clients do.  
> It has N >> 1 peers (the clients and non-client peers); they have M << 
> N peers (the RR, likely a redundant RR, and probably a smaller number 
> of external peers).  Granted that this isn't universally true, but one 
> has to design to something and this would seem to be the common case.

There's more than one way to improve the performance and scalability of 
RRs.

For instance, in most of the networks I've built and run, and many 
others I'm quite familiar with, there was no client-to-client 
reflection - both for performance reasons and for topological ones: to 
guarantee the RR makes a suitably sensible choice, it also needs to be 
in the data path.
If you have clients that see each other directly, bypassing the box 
that is the RR, then client-to-client reflection can do seriously weird 
things even when you know what you're doing.

And, it also matters where you apply policy. If you put your policy at 
the edges of your AS, then the RRs can be almost trivial, not touching 
updates beyond adding or changing the RR-specific attributes, and 
possibly gaining further optimizations from peer-groups.

My experience has been that for similar-class boxes acting as core and 
border routers, with core routers doing double duty as RRs, you get the 
best bang for the buck when N is about the same as M. And not just 
because of RR scaling issues, but also because of the proportion of 
revenue-generating ports and boxes versus overhead ports and boxes.

>
> Please feel free to redesign route reflection to be optimal in all 
> cases if you have the time to spare!  Absent a really interesting 
> point being raised, I'll probably refrain from further discussion of 
> this topic myself though, since as I mentioned, this is water under 
> the bridge.
>

Ha ha ha! :-)

(I actually did start looking into RR design alternatives, and gained a 
real respect for the original authors of BGP. The implementation is 
easy only because the underlying design got things right. Any deviation 
from the basics of how BGP works gets really hairy really fast.)

>> There is something to very much consider, however:
>> RR's are fundamentally required
>
> If you are going to insist on precision in the use of language, you 
> should be careful with expressions like "fundamentally required".

I am (careful, that is).

For IBGP, there are obvious points at which the operational cost and 
overhead of having a full mesh become excessive.
Without RRs, there are O(n^2) peering sessions, all of which must be 
properly configured to guarantee things work properly.
For a border router with (worst case) a single backhaul link to the rest 
of the AS, for every inbound EBGP update, there need to be N IBGP updates.
At the observed rate of thrash, for any given pipe size there is a 
corresponding value N at which updates more than fill the pipe, leading 
to congestive network collapse.
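For concreteness, the arithmetic above can be sketched as a 
back-of-the-envelope model (the function names and example numbers are 
mine, purely illustrative, not from this thread):

```python
def full_mesh_sessions(n):
    """Full IBGP mesh: every router peers with every other, O(n^2)."""
    return n * (n - 1) // 2

def rr_sessions(n, num_rrs):
    """Single-level RR design (illustrative): each client peers with
    every RR, and the RRs are fully meshed among themselves."""
    clients = n - num_rrs
    return clients * num_rrs + full_mesh_sessions(num_rrs)

def max_ibgp_fanout(pipe_bps, updates_per_sec, update_bits):
    """Largest N for which replicating each inbound EBGP update to N
    IBGP peers over a single backhaul link still fits in the pipe."""
    return int(pipe_bps // (updates_per_sec * update_bits))

print(full_mesh_sessions(100))                     # 4950 sessions
print(rr_sessions(100, 2))                         # 197 sessions
print(max_ibgp_fanout(10_000_000, 100, 4000))      # N = 25
```

Even a small network makes the gap visible: 100 routers need 4950 
full-mesh sessions but under 200 with a pair of RRs, and a hypothetical 
10 Mb/s backhaul carrying 100 updates/sec of 4000 bits each saturates 
at N = 25 IBGP copies.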

And, regardless of network size and thrash rate, as long as the pipe 
size is big enough to handle a single update stream without regularly 
congesting, an RR topology can be built to handle the network size and 
all updates.

That's what I mean when I say "fundamentally required" - that it is both 
necessary and sufficient, for arbitrarily large networks, to have RRs 
deployed to avoid congestive collapse.
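The "sufficient for arbitrarily large networks" half of that claim can 
be sketched with a simple model (my own construction, not from this 
thread): if no router, RR or client, is asked to serve more sessions 
than its pipe can handle, a hierarchy of reflectors covers any network 
size with only logarithmically many levels.

```python
import math

def rr_levels(n_routers, fanout):
    """Levels of a hypothetical RR hierarchy needed so that no single
    router serves more than `fanout` IBGP sessions. Depth grows only
    logarithmically with network size; real designs would also mesh
    the top-level RRs and add redundancy, which this model ignores."""
    return max(1, math.ceil(math.log(n_routers) / math.log(fanout)))

print(rr_levels(10_000, 50))   # -> 3
```

So with a per-box fan-out of 50, even 10,000 routers need only three 
levels of reflection, and the per-link update load stays bounded by the 
fan-out regardless of total network size.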

But, I'm positive you already know all of what I just wrote.
(The words were for the benefit of anyone else who might wonder what I 
meant by "fundamentally required".)

>
>> (Never mind that I'm working on something to *really* mess with those 
>> same core BGP protocols...
>
> Yes, the first draft was interesting and I look forward to seeing the 
> next draft.
>

Thanks. (I've been trying to get permission from the authors of the 
"add-paths" draft to re-use it and augment it.)

>> but with good reason. :-) )
>
> That remains to be proven though it's an intriguing hypothesis.

I agree, and I hope to be able to produce convincing demonstrations of 
it in due time.

I realize that until something of substance is available, it's academic; 
it took a bit of work to convince myself that it worked.

Brian
_______________________________________________
Idr mailing list
Idr@ietf.org
https://www.ietf.org/mailman/listinfo/idr