Re: [Idr] draft on virtual aggregation

Paul Francis <francis@cs.cornell.edu> Fri, 11 July 2008 10:39 UTC

Date: Fri, 11 Jul 2008 06:39:57 -0400
Message-ID: <37BC8961A005144C8F5B8E4AD226DE1109D860@EXCHANGE2.cs.cornell.edu>
In-Reply-To: <4877053B.2060403@juniper.net>
References: Your message of "Mon, 07 Jul 2008 12:52:53 EDT." <37BC8961A005144C8F5B8E4AD226DE1109D823@EXCHANGE2.cs.cornell.edu> <200807091459.m69ExflG034874@harbor.brookfield.occnc.com> <37BC8961A005144C8F5B8E4AD226DE1109D856@EXCHANGE2.cs.cornell.edu> <4877053B.2060403@juniper.net>
From: Paul Francis <francis@cs.cornell.edu>
To: raszuk@juniper.net
Cc: idr@ietf.org
Subject: Re: [Idr] draft on virtual aggregation


> -----Original Message-----
> From: Robert Raszuk [mailto:raszuk@juniper.net]
> Sent: Friday, July 11, 2008 3:01 AM
> To: Paul Francis
> Cc: curtis@occnc.com; idr@ietf.org
> Subject: Re: [Idr] draft on virtual aggregation
> 
> Hi Paul,
> 
>  > In the meantime, however, FIB size alone is an immediate problem for a
>  > lot of ISPs, because it is specifically this that forces them to
>  > upgrade hardware when they otherwise don't need to do it.
> 
> As Curtis pointed out, deploying any form of tunneling in the core,
> MPLS or IP, ultimately addresses the FIB scaling of that part of the
> network.

Yes, but only that part of the network.  As I said, from the ISPs I've talked
to, the issue is more in the edge than the core.

> 
> True that some networks do not have the core ... the network can be
> meshed edges or more specifically meshed POPs.
> 
> A very simple observation can be made that you can use a tunnel from the
> edge to one router (per POP for example), do the IP lookup there, then
> encapsulate to the exit point. In that scenario your edge routers are
> freed from carrying the full table, because the single IP lookup and the
> switching decision have moved to that one router.

This would overload that one router, as well as create a single point of
failure.  So you'd need to increase the capacity of that router, plus
replicate it for redundancy.  So in essence you are designing a core
architecture.  The point behind VA is that you don't force ISPs to change
their architecture and upgrade their routers to deal with table growth.
Plus, you can't do this anyway for your edge routers that peer with ISPs
that require the full routing table, so this fix is quite limited.

> 
> It is clearly not true that vendors today have any issues in delivering
> boxes which could keep today's Internet table and at least allow for
> 5-10 times its growth. I know at least two of them which have been
> shipping such routers for a few years now.

Yes I understand that there are routers that can hold 5x or 10x the current
table.  Your company makes them.  How long will these routers last, given the
"end-game" of IPv4 address allocation?  5 years ago the new generation of
routers looked like they could handle a lot of table growth, but now they
have run out of space and yet ISPs want to keep them.  Basically you are
telling them that the solution is simple---buy your latest product.  And to
not worry about growth because you'll have yet another product for them a few
years down the road.  This is exactly what this draft is trying to avoid.
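
As a rough way to frame the "how long will these routers last" question:
if the table grows at a constant annual rate g, a FIB with k times today's
capacity fills up after ln(k) / ln(1 + g) years.  The Python sketch below
only evaluates that formula.  The constant-rate model and the growth rates
in the loop are assumptions chosen for illustration, not measurements of
the DFZ table.

```python
import math


def years_of_headroom(headroom_factor: float, annual_growth: float) -> float:
    """Years until a table growing by `annual_growth` per year (0.20 = 20%)
    fills a FIB that holds `headroom_factor` times today's table.

    Assumes constant compound growth, which is a simplification.
    """
    if headroom_factor < 1.0:
        raise ValueError("headroom_factor must be >= 1")
    if annual_growth <= 0.0:
        raise ValueError("annual_growth must be > 0")
    return math.log(headroom_factor) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Hypothetical growth rates, purely illustrative.
    for rate in (0.10, 0.20, 0.40):
        for factor in (5, 10):
            years = years_of_headroom(factor, rate)
            print(f"{factor}x headroom at {rate:.0%}/year: {years:.1f} years")
```

The lifetime grows only logarithmically with the headroom factor, so
doubling the headroom from 5x to 10x adds a fixed number of years rather
than doubling the lifetime.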

> 
> And such architecture does not require dividing address space into any
> chunks and can be deployed today on any existing hardware without waiting
> for any new protocol extensions.

I'm not sure what you mean by "such architecture".  But many existing routers
in the field today cannot realistically use the kind of trick you mention
above to manage FIB size.  Rather, they resort to simply dropping some
fraction of routes, for instance.

To be clear, we are talking about one new attribute, zero changes to the data
plane, zero changes to the existing BGP decision process....just some rules
for automatically setting up tunnels and new address aggregates (virtual
prefixes).  Better to do this now well before the next generation of routers
runs out of FIB.
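
To make the FIB effect concrete, here is a minimal Python sketch.  It is
not taken from the draft, and it is a simplification under stated
assumptions: each virtual prefix is served by one router (called an
"aggregation point" here, a name used only for this sketch), every router
keeps the full RIB, and a router installs the specific routes under a
virtual prefix only if it is the aggregation point for it.  All other
routers install just the virtual prefix, pointing at a tunnel to the
aggregation point.  The names build_fib, RIB and VPS and all prefixes and
router names are hypothetical, and the example is IPv4 only.

```python
import ipaddress


def build_fib(rib, virtual_prefixes, my_vps=()):
    """Return the FIB one router would install under this simplified model.

    rib:              {prefix: exit router}, the full table (RIB unchanged)
    virtual_prefixes: {virtual prefix: aggregation point router}
    my_vps:           virtual prefixes this router is aggregation point for
    """
    vps = {ipaddress.ip_network(vp): apr for vp, apr in virtual_prefixes.items()}
    mine = {ipaddress.ip_network(vp) for vp in my_vps}
    fib = {}

    # For virtual prefixes handled elsewhere, one entry: tunnel to that router.
    for vp, apr in vps.items():
        if vp not in mine:
            fib[vp] = ("tunnel", apr)

    # Install a specific route only if no virtual prefix covers it,
    # or if this router is the aggregation point for a covering one.
    for prefix, exit_router in rib.items():
        net = ipaddress.ip_network(prefix)
        covering = [vp for vp in vps if net.subnet_of(vp)]
        if not covering or any(vp in mine for vp in covering):
            fib[net] = ("tunnel", exit_router)
    return fib


# Hypothetical example data.
RIB = {
    "10.1.0.0/16": "exit-1",
    "10.2.0.0/16": "exit-2",
    "172.16.5.0/24": "exit-1",
    "192.0.2.0/24": "exit-3",
}
VPS = {"10.0.0.0/8": "apr-a", "172.16.0.0/12": "apr-b"}

if __name__ == "__main__":
    print(len(build_fib(RIB, VPS)), "entries on an ordinary router")
    print(len(build_fib(RIB, VPS, ["10.0.0.0/8"])), "entries on apr-a")
```

The forwarding lookup itself is ordinary longest-prefix match in both
cases; only the set of installed entries differs between routers.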

PF


> 
> Cheers,
> R.
> 
> 
> > It is true that there are a number of routing problems that ultimately
> > need to be solved...RIB and FIB size, convergence time, security,
> > multi-homing.  RRG is working on a single grand solution to all of this,
> > but I'm not holding my breath.
> >
> > In the meantime, however, FIB size alone is an immediate problem for a
> > lot of ISPs, because it is specifically this that forces them to upgrade
> > hardware when they otherwise don't need to do it.  This is evidenced by
> > the various hacks that ISPs currently employ to extend the life of their
> > routers.  One is the "disconnected backbone" arrangement that Dan
> > Ginsburg mentioned in his email.  Another is to simply ignore /24's that
> > you don't have enough space for (see for instance
> > http://mailman.apnic.net/mailing-lists/pacnog/archive/2004/12/msg00000.html).
> > I'm even familiar with a Tier1 ISP (I'm not at liberty to say who) that
> > is actually using a hack that messes up the AS-path, and could create
> > loops if another ISP did the same thing.
> >
> > So this is already a problem that needs fixing.  Worse, as IPv4 addresses
> > run out, there is real concern that addresses will become less
> > aggregatable, and I'd like to have something in place should that happen.
> > VA represents about the simplest architecturally sound solution I can
> > imagine.
> >
> > I guess you are suggesting that, since RAM is "cheap and dense", FIB
> > really isn't the problem?  I personally don't know router architecture
> > very well, but if what you say is true, why don't router vendors simply
> > build routers with more FIB?  I suppose you could argue that they try to,
> > but that they under-estimate DFRT growth?  This doesn't sound likely.
> > Tony Li has argued (see RFC4984) that because router memories are not
> > built in volume, Moore's law doesn't really apply.  This all suggests to
> > me that there is a real cost to huge FIBs.
> >
> > As for operational complexity, I don't think that you or I know if it is
> > "too" much or not.  Really it is a question of the value ISPs get out of
> > this versus the difficulty of doing it.  I'm not going to pretend that
> > running VA is trivial, but nor does it strike me as all that bad.  It
> > strikes me for instance as simpler than route reflectors (though that
> > ain't saying much!).  And given that ISPs already deal with config
> > complexity in the hacks I mention above, there is a good chance in my
> > mind that VA will be an acceptable solution.
> >
> > Regarding some specific points in your email:
> >
> > Regarding core-router MPLS fix:  To be clear, VA isn't about reducing
> > *core* router FIB size, it is about reducing FIB size in *any* routers,
> > all of them if that's what you want to do.  Since most routers aren't
> > core routers, solutions that only address core routers are only so
> > useful.  I've been told by folks at ISPs that the greater concern is
> > edge routers.
> >
> > Regarding on-chip FIBs:  I've been told by Tony Li that he believes he
> > can fit 200K FIB entries on-chip.  VA can get you 5x reduction very
> > easily, and with some deployment creativity (which would take time and
> > experience to develop) I'll bet we could do much, much better.  But that
> > is somewhat beside the point...the goal here is not to get to
> > single-chip routers (though that might be where this leads us anyway),
> > but to allow ISPs to extend the lifetime of their routers, which is
> > something that at least some of them clearly want to be able to do (see
> > below).
> >
> > Thanks,
> >
> > PF
> >
> >
> >
> >> -----Original Message-----
> >> From: curtis@occnc.com [mailto:curtis@occnc.com]
> >> Sent: Wednesday, July 09, 2008 11:00 AM
> >> To: Paul Francis
> >> Cc: idr@ietf.org
> >> Subject: Re: [Idr] draft on virtual aggregation
> >>
> >>
> >> In message
> >> <37BC8961A005144C8F5B8E4AD226DE1109D823@EXCHANGE2.cs.cornell.edu>
> >> Paul Francis writes:
> >>>
> >>> Gang,
> >>>
> >>> At the following URL is a draft on virtual aggregation that I'm posting
> >>> to IETF (it'll show up in a day or two), and which I'll present at IDR
> >>> in Dublin.
> >>>
> >>>  http://www.cs.cornell.edu/people/francis/draft-francis-idr-intra-va-00.txt
> >>> Title and abstract are below.  I hope to create a work item on this in
> >>> IDR.  I would characterize this as falling under the general charter of
> >>> scaling BGP.
> >>>
> >>> Any comments and discussion on this prior to Dublin is of course greatly
> >>> appreciated.
> >>>
> >>> PF
> >>>
> >>>
> >>> Title:  Intra-Domain Virtual Aggregation
> >>>
> >>>
> >>>    Virtual Aggregation (VA) is a technique for shrinking the DFZ FIB
> >>>    size in routers (both IPv4 and IPv6).  This allows ISPs to extend the
> >>>    lifetime of existing routers, and allows router vendors to build FIBs
> >>>    with much less concern about the growth of the DFZ routing table.  VA
> >>>    does not shrink the size of the RIB.  VA may be deployed autonomously
> >>>    by an ISP (cooperation between ISPs is not required).  While VA can
> >>>    be deployed without changes to existing routers, doing so requires
> >>>    significant new management tasks.  This document describes changes to
> >>>    routers and BGP that greatly simplify the operation of VA.
> >>>
> >>> _______________________________________________
> >>> Idr mailing list
> >>> Idr@ietf.org
> >>> https://www.ietf.org/mailman/listinfo/idr
> >>
> >>
> >> Paul,
> >>
> >> Is there a need?
> >>
> >>   Are we still trying to do the equivalent of keeping an AGS+ with
> >>   DFRT alive somewhere?  RAM is cheap and dense.  To get to on-chip
> >>   RAM would require orders of magnitude reductions in DFRT size.
> >>
> >>   Other techniques exist for dramatically reducing core router FIB
> >>   size if that becomes a goal for a provider.
> >>
> >>   For example, MPLS (or GRE) tunneling through a BGP free core reduces
> >>   FIB size to about the size of the IGP (should easily fit in on-chip
> >>   memory).  It requires no protocol change.  Only down side is no ICMP
> >>   when tunnel faults in middle prior to the ingress knowing about it
> >>   (usually the case anyway due to VPN and VRF) and no fallback to IP
> >>   when ingress knows that the tunnel is down and hasn't yet rerouted.
> >>
> >> Is the solution worse than the problem?
> >>
> >>   This seems too operationally problematic.
> >>
> >> Curtis