[Idr] Re: BGP issues

Iljitsch van Beijnum <iljitsch@muada.com> Fri, 11 January 2008 16:29 UTC


On 9 jan 2008, at 14:02, Robert Raszuk wrote:

> As we are discussing BGP i am redirecting this to idr list for IDR  
> WG to comment or work on.

Ah, I saw that you wrote a reply but I couldn't find it at first.  :-)

>> In addition to the fact that it's possible to configure iBGP such  
>> that you can have routing loops?

> Could you provide an example ?

R1e - R2i - R3i - R4e

If R1 and R4 both prefer their eBGP path towards a certain destination
and R2 prefers the path via R4 while R3 prefers the path via R1,
packets will loop between R2 and R3. Note that this configuration
only happens when applying policy to iBGP sessions or with unfortunate
route reflector placement.
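
To make the loop concrete, here is a minimal sketch in Python. It is not a BGP implementation: the NEXT_HOP table and the forward() helper are made up for illustration, and the table simply hard-codes the preferences described above for a single destination.

```python
# Hypothetical next-hop choices for one destination in the
# R1e - R2i - R3i - R4e chain. "exit" means the router hands the
# packet to its own eBGP neighbor.
NEXT_HOP = {
    "R1": "exit",  # R1 prefers its own eBGP path
    "R2": "R3",    # R2 prefers the exit at R4, which it reaches through R3
    "R3": "R2",    # R3 prefers the exit at R1, which it reaches through R2
    "R4": "exit",  # R4 prefers its own eBGP path
}


def forward(start, next_hop, max_hops=16):
    """Follow next hops from `start`; return (routers visited, outcome)."""
    trail = [start]
    current = start
    for _ in range(max_hops):
        nxt = next_hop[current]
        if nxt == "exit":
            return trail, "delivered"
        if nxt in trail:
            # The packet is back at a router it already passed through.
            return trail + [nxt], "loop"
        trail.append(nxt)
        current = nxt
    return trail, "hop limit"


print(forward("R1", NEXT_HOP))
print(forward("R2", NEXT_HOP))
```

A packet entering at R1 or R4 leaves the AS immediately, while a packet that starts at R2 or R3 bounces between the two.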

>> it suffers from some of the problems inherent to distance vector
>> routing, such as slow convergence

> BGP can converge today in subseconds.

Sure, but it can also take many minutes, especially with a
min-route-advertisement timer in effect in many places and if the
update is a withdrawal. Now I've argued that this shouldn't be too
problematic because if you're revoking your advertisement that means
you're unreachable anyway, but not everyone agrees with that. (Not
good if you are revoking a more specific and you want the aggregate
to attract the traffic.)

One reason that BGP is getting chattier is because updates tend to be  
multiplied if there are several interconnects between two ASes.

>> Another problem area for BGP is the fact that all processing happens
>> on a per-prefix basis: there is no way to communicate reachability or
>> policy changes except to update all impacted prefixes.

> Not true. There is proposal for BGP aggregate withdraw.

I'm talking about what's out there, not about what's proposed. Very  
little of what is proposed makes it to wide deployment.

>> BGP is extremely agnostic as to the underlying path selection
>> algorithm in order to accommodate as much policy control as possible.

> Accommodating policies is not a BGP protocol requirement but it  
> comes from operators using this protocol to fit their operational  
> needs.

Sure, but in my opinion BGP takes it a bit too far: it's too easy to
shoot yourself in the foot. Or, in other words: a routing protocol
that allows loops is doing something wrong.

>> Unfortunately, this makes it very hard to predict BGP's behavior and
>> the default behavior (especially with today's rather flat AS
>> hierarchy) is more often than not suboptimal. BGP allows harmful
>> policies that keep the protocol from converging to a stable state.

> Again this is not BGP which allows to configure harmful policies.  
> Policies can be harmful indeed as there is no mechanism or in any  
> way control on them. If in IGP you block LSA flooding on all links  
> of the other routers in IGP peers would you call this an OSPF fault  
> that your router is somehow not seen by the rest of them ?

The difference is that in OSPF filtering LSAs is not supported by  
reasonable implementations because that is incompatible with the SPF  
architecture, while in BGP policies are fully supported by design,  
even some harmful ones.

>> Lack of workable aggregation mechanisms means that once an address
>> block is deaggregated, it's almost impossible to get rid of the
>> resulting long prefixes, leading to excessive growth of the
>> internet's global routing table.

> Correct. I think we agree that scalable multi-homing should be solved.

And scalable traffic engineering. Remember that at least 200000  
prefixes in the DFZ are NOT the result of multihoming using the one  
AS / one prefix model.

>> Coarseness of the only available end-to-end metric (the AS path)
>> pushes operators to deaggregation for traffic engineering purposes.

> What other metrics would you recommend to add to BGP and how would
> that reduce deaggregation ?

I would like to see a value that is increased by every AS (twice, for
good measure: once on reception, once on transmission), not by one
but by a value (say) between 1 and 127, with a default in the
middle (11). So where you'd see an AS path length 3 today, you'd see a
metric value 66 if nobody in the path changed the default. But if
someone wants to make this path just a little better or a little
worse, they can make it 65 or 67. This allows finer grained control
than the AS path, so it reduces the need for deaggregation for TE.
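
The arithmetic can be sketched in a few lines of Python. This is only an illustration of the idea above, not an existing BGP attribute: the names DEFAULT_STEP and path_metric() are made up, and the sketch assumes each AS contributes one increment on reception and one on transmission.

```python
DEFAULT_STEP = 11            # the default increment suggested above
MIN_STEP, MAX_STEP = 1, 127  # the allowed range suggested above


def path_metric(steps):
    """Sum the increments; each AS contributes (on_receive, on_transmit)."""
    total = 0
    for on_receive, on_transmit in steps:
        for step in (on_receive, on_transmit):
            if not MIN_STEP <= step <= MAX_STEP:
                raise ValueError(f"step {step} outside {MIN_STEP}..{MAX_STEP}")
            total += step
    return total


# Three ASes that all leave the default alone: 3 x 2 x 11 = 66.
default_path = [(DEFAULT_STEP, DEFAULT_STEP)] * 3
print(path_metric(default_path))

# The last AS makes the path slightly more attractive: 65.
slightly_better = [(DEFAULT_STEP, DEFAULT_STEP)] * 2 + [(DEFAULT_STEP, DEFAULT_STEP - 1)]
print(path_metric(slightly_better))
```

With the AS path, the smallest possible adjustment is a whole hop, which is a change of 22 on this scale.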

>> The way BGP operates within a single AS requires an additional
>> intra-domain routing protocol

> IMHO this is a feature not a bug. In fact for convergence reasons  
> this is a very useful feature to have fast IGP.

It also adds tons of complexity, especially if you don't want to have  
a full iBGP mesh. I'm not saying you shouldn't have an IGP, I'm saying  
you shouldn't _have_ to have an IGP.

>> There is no validation of routing
>> information beyond the next hop.

> Are you alluding to S-BGP/so-BGP proposals ?

Yes, to their problem statement, at least. Not sure if drowning  
everything in sticky crypto syrup is the best solution, though.

>> Paths must be explicitly revoked, which in practice requires a BGP
>> speaker to keep track of which paths were communicated to which peer.

> Not today. Today you revoke the prefix not the path.

Path in the sense of "a prefix learned over a specific eBGP session".

> When the ability to send more than best path is added true .. paths  
> would have to be explicitly revoked. Any other proposal in RRG does  
> the very same.

It would be better if implementations didn't have to keep track of
what they sent where. However, this is probably relatively easy to fix
("start now" ... "forget all the prefixes that I haven't refreshed
since 'start now'") but the fix will probably be hard to deploy.
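
On the receiving side, the "start now" / "forget" idea amounts to a mark-and-sweep over the prefixes learned from a peer. A minimal sketch in Python, with the caveat that the PeerTable class and its method names are hypothetical and don't correspond to any existing BGP message or implementation:

```python
class PeerTable:
    """Prefixes learned from one peer, tagged with the epoch that last confirmed them."""

    def __init__(self):
        self.epoch = 0
        self.prefixes = {}  # prefix -> epoch in which it was last announced

    def announce(self, prefix):
        # A (re)announcement marks the prefix as fresh for the current epoch.
        self.prefixes[prefix] = self.epoch

    def start_now(self):
        # The peer signals that a fresh round of announcements begins.
        self.epoch += 1

    def forget_unrefreshed(self):
        # The peer signals the end of the round: drop whatever it did not
        # announce again since start_now(), and report what was dropped.
        stale = sorted(p for p, e in self.prefixes.items() if e < self.epoch)
        for prefix in stale:
            del self.prefixes[prefix]
        return stale


table = PeerTable()
table.announce("192.0.2.0/24")
table.announce("198.51.100.0/24")

table.start_now()
table.announce("192.0.2.0/24")     # refreshed, so it survives
print(table.forget_unrefreshed())  # the prefix that was not refreshed
print(sorted(table.prefixes))
```

The sender only has to replay what it currently considers valid; it never needs to remember what it told this peer earlier.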

>> BGP requires fairly extensive
>> configuration (setting up filters) before it's useful.

> It could be just one line if you like to send/accept everything to/ 
> from your peer.

Yes, and I'm happy this setup is so easy to configure every single  
time I have any use for it.  </sarcasm>

