RE: [Idr] Re: BGP issues

"Rajiv Asati (rajiva)" <rajiva@cisco.com> Fri, 11 January 2008 17:54 UTC

Subject: RE: [Idr] Re: BGP issues
Date: Fri, 11 Jan 2008 12:53:18 -0500
Message-ID: <15B86BC7352F864BB53A47B540C089B604B69BF1@xmb-rtp-20b.amer.cisco.com>
In-Reply-To: <7006D2FB-DE3D-4DB2-80FF-B16D21C2347B@muada.com>
From: "Rajiv Asati (rajiva)" <rajiva@cisco.com>
To: Iljitsch van Beijnum <iljitsch@muada.com>, raszuk@juniper.net
Cc: idr <idr@ietf.org>

Hi Iljitsch,

Good points. However, we should not mix up 'what's deployed' with
'what's allowed by the protocol'.

By all means, the BGP traffic engineering and BGP deaggregation points
are worthy of further discussion. 

> worse, they can make it 65 or 67. This allows finer grained control
> than the AS path, so it reduces the need for deaggregation for TE
> purposes.

I think that you are alluding to enabling each transit AS to attach
its relevant weight/preference for each BGP prefix. This further enables
each AS to get an end-to-end metric for each prefix, with or without the
AS_PATH. Correct?
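A minimal sketch of the additive metric in the quoted proposal, assuming each AS on the path adds a value between 1 and 127 once on reception and once on transmission, defaulting to 11. The function `path_metric` and the per-AS (receive, transmit) pairs are hypothetical illustrations; this is not an existing BGP attribute.

```python
# Hypothetical model of the proposed per-AS additive metric (not a real BGP attribute).
MIN_INCREMENT = 1
MAX_INCREMENT = 127
DEFAULT_INCREMENT = 11


def path_metric(increments):
    """Sum the metric along a path.

    `increments` holds one (on_receive, on_transmit) pair per AS;
    None means the AS left that increment at the default.
    """
    total = 0
    for pair in increments:
        for value in pair:
            value = DEFAULT_INCREMENT if value is None else value
            if not MIN_INCREMENT <= value <= MAX_INCREMENT:
                raise ValueError(f"increment {value} outside [1, 127]")
            total += value
    return total


default_path = [(None, None)] * 3                        # AS path length 3, all defaults
worse_path = [(None, None), (None, 12), (None, None)]    # one AS adds 1 extra on transmit
better_path = [(None, None), (10, None), (None, None)]   # one AS subtracts 1 on receive

print(path_metric(default_path))  # 66
print(path_metric(worse_path))    # 67
print(path_metric(better_path))   # 65
```

With three ASes all using the default, the metric is 3 * 2 * 11 = 66, matching the quoted figure; a single AS can nudge it to 65 or 67 without changing the AS path length.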

While I agree that it may help with TE, it doesn't fully solve
deaggregation. For example, a network may advertise one chunk of
prefixes (a bunch of /24s) to one AS and another chunk (another bunch of
/24s) to another AS in order to receive traffic accordingly from the
different ASes, even though it could have advertised a single aggregate
(say, a /23).
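To make the prefix arithmetic concrete, here is a small sketch using Python's standard `ipaddress` module. The prefixes 10.1.0.0/24 and 10.1.1.0/24 and the split between two upstream ASes are hypothetical, chosen so that the two halves form exactly one /23.

```python
import ipaddress

# Hypothetical address plan: two adjacent /24s that together form one /23.
to_upstream_a = [ipaddress.ip_network("10.1.0.0/24")]
to_upstream_b = [ipaddress.ip_network("10.1.1.0/24")]

# Announced everywhere, the two /24s could be collapsed into one aggregate.
aggregate = list(ipaddress.collapse_addresses(to_upstream_a + to_upstream_b))
print(aggregate)  # [IPv4Network('10.1.0.0/23')]

# Announcing each /24 only to its own upstream (to steer inbound traffic)
# means the rest of the Internet carries both more-specifics instead.
announced = to_upstream_a + to_upstream_b
print(f"{len(aggregate)} aggregate vs {len(announced)} more-specific prefixes")
```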

> >> There is no validation of routing
> >> information beyond the next hop.
> 
> > Are you alluding to S-BGP/so-BGP proposals ?
> 
> Yes, to their problem statement, at least. Not sure if drowning  
> everything in sticky crypto syrup is the best solution, though.

I agree with you on this point. Perhaps you would prefer something like
IRV:
http://www3.ietf.org/proceedings/02nov/slides/rpsec-6/index.html


Regarding the other points, please see inline.

> >> In addition to the fact that it's possible to configure iBGP such  
> >> that you can have routing loops?

Your example of creating a routing loop in BGP by (mis)using policies is
somewhat valid. But one can also create loops just by configuring static
routes. It is quite subjective to judge what makes it easy to shoot
yourself in the foot and what doesn't.
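For reference, the quoted iBGP example (topology R1e - R2i - R3i - R4e) can be modeled as a toy forwarding walk. Assumptions: the four routers form a chain inside one AS, R1 and R4 hold the eBGP exits, and each internal router forwards toward its chosen BGP exit via the IGP next hop along the chain. All names and tables are hypothetical.

```python
# IGP next hop along the chain R1 - R2 - R3 - R4, from each internal router
# toward each exit router.
IGP_NEXT_HOP = {
    "R2": {"R1": "R1", "R4": "R3"},
    "R3": {"R1": "R2", "R4": "R4"},
}

# BGP exit chosen by each router for the destination; R1 and R4 always
# use their own eBGP path.
POLICY_LOOP = {"R1": "R1", "R2": "R4", "R3": "R1", "R4": "R4"}  # R2 -> R4, R3 -> R1
HOT_POTATO = {"R1": "R1", "R2": "R1", "R3": "R4", "R4": "R4"}   # nearest exit


def forward(start, bgp_exit):
    """Follow hop-by-hop forwarding from `start`; return (path, looped)."""
    path = [start]
    current = start
    while bgp_exit[current] != current:  # stop once we reach the exit router
        current = IGP_NEXT_HOP[current][bgp_exit[current]]
        looped = current in path
        path.append(current)
        if looped:
            return path, True
    return path, False


print(forward("R2", POLICY_LOOP))  # (['R2', 'R3', 'R2'], True)
print(forward("R2", HOT_POTATO))   # (['R2', 'R1'], False)
```

The loop appears only because R2 and R3 pick exits that each lie behind the other; with nearest-exit choices the same topology forwards cleanly.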

BGP policies provide a toolkit to help make educated design choices. Yes,
they are good if used appropriately, and bad if not. That's why there
are guidelines. Policies are meant to let one tweak the result of the
protocol to one's own preference, at one's own expense.

I think the better question would be: should the protocol
claim/advertise reachability for a certain prefix when that
reachability may not exist in the forwarding plane?


> The difference is that in OSPF filtering LSAs is not supported by  
> reasonable implementations because that is incompatible with the SPF  
> architecture, while in BGP policies are fully supported by design,  
> even some harmful ones.

This is an apples-to-oranges comparison. First, the OSPF specification
(RFC 2328, Section 3) allows for non-identical link-state databases
within an AS (due to the introduction of areas).

Second, independent of LSA filtering, one can apply a policy that
prevents OSPF from installing a route in the routing table. While the
neighbors can still send traffic for that route, the router will either
drop it or forward it according to a less specific route, and one may
end up with a loop. Again, this has nothing to do with the protocol itself.
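The OSPF case can be sketched the same way. Assume a hypothetical chain R1 - R2 - R3 in which R1 originates 10.0.0.0/24, R3 reaches it through R2, and a local policy on R2 keeps the OSPF route out of R2's routing table. Longest-prefix match then either drops the packet or hands it to a less specific route, which here points back at R3. The tables and helper functions are illustrative only.

```python
import ipaddress

NET = ipaddress.ip_network


def next_hop(table, destination):
    """Longest-prefix match: next hop of the most specific covering route, or None."""
    addr = ipaddress.ip_address(destination)
    covering = [net for net in table if addr in net]
    if not covering:
        return None
    return table[max(covering, key=lambda net: net.prefixlen)]


def trace(tables, start, destination):
    """Follow next hops until delivery (router without a table), a drop, or a loop."""
    hops = [start]
    router = start
    while router in tables:
        router = next_hop(tables[router], destination)
        if router is None:
            return hops, "dropped"
        looped = router in hops
        hops.append(router)
        if looped:
            return hops, "loop"
    return hops, "delivered"


R3_TABLE = {NET("10.0.0.0/24"): "R2", NET("0.0.0.0/0"): "upstream"}

# R2 installs the OSPF route normally.
normal = {"R2": {NET("10.0.0.0/24"): "R1"}, "R3": R3_TABLE}
# Policy denies the OSPF route on R2; only its default route (toward R3) remains.
filtered_with_default = {"R2": {NET("0.0.0.0/0"): "R3"}, "R3": R3_TABLE}
# Policy denies the route and R2 has no less specific route at all.
filtered_no_default = {"R2": {}, "R3": R3_TABLE}

print(trace(normal, "R3", "10.0.0.1"))                 # (['R3', 'R2', 'R1'], 'delivered')
print(trace(filtered_with_default, "R3", "10.0.0.1"))  # (['R3', 'R2', 'R3'], 'loop')
print(trace(filtered_no_default, "R3", "10.0.0.1"))    # (['R3', 'R2'], 'dropped')
```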

> Sure, but it can also take many minutes, especially with a
> min-route-advertisement timer in effect in many places and if the
> update is a withdrawal. Now I've argued that this shouldn't be too problematic

The min-route-advertisement timer is meant to pace how quickly a speaker
reacts to each routing change, by spacing out successive updates for the
same destination to a given peer. One may argue that it is advantageous
to tune the timer according to one's preferences and one's
understanding of the network(s).
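A minimal sketch of per-destination rate limiting toward one peer, in the spirit of the MinRouteAdvertisementIntervalTimer in RFC 4271. The class name, the explicit `now` argument, and the choice to keep only the newest held change per prefix are simplifying assumptions; real implementations differ in details such as jitter and how withdrawals are treated.

```python
class MraiLimiter:
    """Hypothetical per-prefix minimum interval between updates sent to one peer."""

    def __init__(self, interval):
        self.interval = interval
        self.last_sent = {}  # prefix -> time the last update for it was sent
        self.pending = {}    # prefix -> newest change held back by the timer

    def offer(self, prefix, route, now):
        """Send the change now if the interval has elapsed; otherwise hold it."""
        last = self.last_sent.get(prefix)
        if last is None or now - last >= self.interval:
            self.last_sent[prefix] = now
            self.pending.pop(prefix, None)
            return (prefix, route)
        self.pending[prefix] = route  # a newer change replaces an older held one
        return None

    def expire(self, now):
        """Release held changes whose interval has elapsed."""
        ready = [p for p in self.pending if now - self.last_sent[p] >= self.interval]
        released = []
        for prefix in ready:
            released.append((prefix, self.pending.pop(prefix)))
            self.last_sent[prefix] = now
        return released


limiter = MraiLimiter(interval=30)
print(limiter.offer("192.0.2.0/24", "path A", now=0))   # ('192.0.2.0/24', 'path A')
print(limiter.offer("192.0.2.0/24", "path B", now=5))   # None: held by the timer
print(limiter.offer("192.0.2.0/24", "path C", now=10))  # None: replaces path B
print(limiter.expire(now=30))                           # [('192.0.2.0/24', 'path C')]
```

Path B is never sent: holding changes back suppresses intermediate churn, but it also delays the final answer, which is the trade-off being debated above.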


> It also adds tons of complexity, especially if you don't want to have
> a full iBGP mesh. I'm not saying you shouldn't have an IGP,
> I'm saying you shouldn't _have_ to have an IGP.

Agreed.
BGP mustn't mandate the use of an IGP, and it doesn't.
One can use static routes to reach the next hops and obviate the IGP
altogether.
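A minimal sketch of resolving BGP next hops through static routes alone, with no IGP. It assumes hypothetical static and BGP route tables (documentation and private prefixes) and applies the rule from RFC 4271's route resolvability condition that a route whose NEXT_HOP cannot be resolved is not usable.

```python
import ipaddress

NET, ADDR = ipaddress.ip_network, ipaddress.ip_address

# Hypothetical static routes covering the BGP next hops (no IGP running).
STATIC_ROUTES = {
    NET("198.51.100.0/30"): "eth0",
    NET("203.0.113.8/30"): "eth1",
}

# Hypothetical BGP routes: prefix -> BGP NEXT_HOP.
BGP_ROUTES = {
    NET("192.0.2.0/24"): ADDR("198.51.100.2"),
    NET("10.20.0.0/16"): ADDR("203.0.113.9"),
    NET("10.30.0.0/16"): ADDR("100.64.0.1"),  # no static route covers this next hop
}


def resolve(next_hop):
    """Outgoing interface for a next hop via the most specific static route, or None."""
    covering = [net for net in STATIC_ROUTES if next_hop in net]
    if not covering:
        return None
    return STATIC_ROUTES[max(covering, key=lambda net: net.prefixlen)]


def usable_bgp_routes():
    """Keep only BGP routes whose next hop resolves; the rest cannot be installed."""
    usable = {}
    for prefix, next_hop in BGP_ROUTES.items():
        interface = resolve(next_hop)
        if interface is not None:
            usable[prefix] = (next_hop, interface)
    return usable


for prefix, (next_hop, interface) in usable_bgp_routes().items():
    print(prefix, "via", next_hop, "on", interface)
```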
~~~~~~~~~~~~~

Cheers,
Rajiv

> -----Original Message-----
> From: Iljitsch van Beijnum [mailto:iljitsch@muada.com] 
> Sent: Friday, January 11, 2008 11:29 AM
> To: raszuk@juniper.net
> Cc: idr
> Subject: [Idr] Re: BGP issues
> 
> On 9 jan 2008, at 14:02, Robert Raszuk wrote:
> 
> > As we are discussing BGP, I am redirecting this to the idr list for
> > the IDR WG to comment on or work on.
> 
> Ah, I saw that you wrote a reply but I couldn't find it at first.  :-)
> 
> >> In addition to the fact that it's possible to configure iBGP such  
> >> that you can have routing loops?
> 
> > Could you provide an example ?
> 
> R1e - R2i - R3i - R4e
> 
> If R1 and R4 both prefer their eBGP path towards a certain destination
> and then R2 prefers the path over R4 while R3 prefers the path over
> R1, packets will loop between R2 and R3. Note that this configuration
> only happens when applying policy to iBGP sessions or in unfortunate
> route reflector placement.
> 
> >> it suffers from some of the problems inherent to distance vector
> >> routing, such as slow convergence
> 
> > BGP can converge today in subseconds.
> 
> Sure, but it can also take many minutes, especially with a
> min-route-advertisement timer in effect in many places and if the
> update is a withdrawal. Now I've argued that this shouldn't be too
> problematic because if you're revoking your advertisement that means
> you're unreachable anyway, but not everyone agrees with that. (Not good
> if you are revoking a more specific and you want the aggregate to
> attract the traffic.)
> 
> One reason that BGP is getting chattier is because updates tend to be
> multiplied if there are several interconnects between two ASes.
> 
> >> Another problem area for BGP is the fact that all processing happens
> >> on a per-prefix basis: there is no way to communicate reachability or
> >> policy changes except to update all impacted prefixes.
> 
> > Not true. There is a proposal for BGP aggregate withdraw.
> 
> I'm talking about what's out there, not about what's proposed. Very  
> little of what is proposed makes it to wide deployment.
> 
> >> BGP is extremely agnostic as to the underlying path selection
> >> algorithm in order to accommodate as much policy control as possible.
> 
> > Accommodating policies is not a BGP protocol requirement but it  
> > comes from operators using this protocol to fit their operational  
> > needs.
> 
> Sure, but in my opinion, BGP takes it a bit too far, it's too easy to
> shoot yourself in the foot. Or, in other words: a routing protocol
> that allows loops is doing something wrong.
> 
> >> Unfortunately, this makes it very hard to predict BGP's behavior and
> >> the default behavior (especially with today's rather flat AS
> >> hierarchy) is more often than not suboptimal. BGP allows harmful
> >> policies that keep the protocol from converging to a stable state.
> 
> > Again this is not BGP which allows to configure harmful policies.
> > Policies can be harmful indeed as there is no mechanism or in any
> > way control on them. If in IGP you block LSA flooding on all links
> > of the other routers in IGP peers would you call this an OSPF fault
> > that your router is somehow not seen by the rest of them?
> 
> The difference is that in OSPF filtering LSAs is not supported by  
> reasonable implementations because that is incompatible with the SPF  
> architecture, while in BGP policies are fully supported by design,  
> even some harmful ones.
> 
> >> Lack of workable aggregation mechanisms means that once an address
> >> block is deaggregated, it's almost impossible to get rid of the
> >> resulting long prefixes, leading to excessive growth of the
> >> internet's global routing table.
> 
> > Correct. I think we agree that scalable multi-homing should be solved.
> 
> And scalable traffic engineering. Remember that at least 200000  
> prefixes in the DFZ are NOT the result of multihoming using the one  
> AS / one prefix model.
> 
> >> Coarseness of the only available end-to-end metric (the AS path)
> >> pushes operators to deaggregation for traffic engineering purposes.
> 
> > What other metrics would you recommend adding to BGP, and how would
> > that reduce deaggregation?
> 
> I would like to see a value that is increased by every AS (twice, for
> good measure: once on reception, once on transmission), but not by
> one, but by a value (say) between 1 and 127, with a default in the
> middle (11). So where you'd see an AS path length 3 today, you'd see a
> metric value 66 if nobody in the path changed the default. But if
> someone wants to make this path just a little better or a little
> worse, they can make it 65 or 67. This allows finer grained control
> than the AS path, so it reduces the need for deaggregation for TE
> purposes.
> 
> >> The way BGP operates within a single AS requires an additional
> >> intra-domain routing protocol
> 
> > IMHO this is a feature not a bug. In fact for convergence reasons  
> > this is a very useful feature to have fast IGP.
> 
> It also adds tons of complexity, especially if you don't want to have
> a full iBGP mesh. I'm not saying you shouldn't have an IGP, I'm saying
> you shouldn't _have_ to have an IGP.
> 
> >> There is no validation of routing
> >> information beyond the next hop.
> 
> > Are you alluding to S-BGP/so-BGP proposals ?
> 
> Yes, to their problem statement, at least. Not sure if drowning  
> everything in sticky crypto syrup is the best solution, though.
> 
> >> Paths must be explicitly revoked, which in practice requires a BGP
> >> speaker to keep track of which paths were communicated to which peer.
> 
> > Not today. Today you revoke the prefix not the path.
> 
> Path in the sense of "a prefix learned over a specific eBGP session".
> 
> > When the ability to send more than the best path is added, true,
> > paths would have to be explicitly revoked. Any other proposal in RRG
> > does the very same.
> 
> It would be better if implementations wouldn't have to keep track of
> what they sent where. However, this is probably relatively easy to fix
> ("start now" .... "forget all the prefixes that I haven't refreshed
> since "start now"") but the fix will probably be hard to deploy.
> 
> >> BGP requires fairly extensive
> >> configuration (setting up filters) before it's useful.
> 
> > It could be just one line if you like to send/accept everything
> > to/from your peer.
> 
> Yes, and I'm happy this setup is so easy to configure every single  
> time I have any use for it.  </sarcasm>
> 
> Iljitsch
> 
> _______________________________________________
> Idr mailing list
> Idr@ietf.org
> https://www1.ietf.org/mailman/listinfo/idr
> 
