Re: [Idr] draft on virtual aggregation

Paul Francis <francis@cs.cornell.edu> Thu, 10 July 2008 14:27 UTC

Date: Thu, 10 Jul 2008 10:27:37 -0400
Message-ID: <37BC8961A005144C8F5B8E4AD226DE1109D856@EXCHANGE2.cs.cornell.edu>
In-Reply-To: <200807091459.m69ExflG034874@harbor.brookfield.occnc.com>
References: Your message of "Mon, 07 Jul 2008 12:52:53 EDT." <37BC8961A005144C8F5B8E4AD226DE1109D823@EXCHANGE2.cs.cornell.edu> <200807091459.m69ExflG034874@harbor.brookfield.occnc.com>
From: Paul Francis <francis@cs.cornell.edu>
To: curtis@occnc.com
Cc: idr@ietf.org
Subject: Re: [Idr] draft on virtual aggregation

It is true that there are a number of routing problems that ultimately need
to be solved: RIB and FIB size, convergence time, security, multi-homing.
RRG is working on a single grand solution to all of this, but I'm not holding
my breath.

In the meantime, however, FIB size alone is an immediate problem for a lot of
ISPs, because it is specifically this that forces them to upgrade hardware
when they otherwise don't need to do it.  This is evidenced by the various
hacks that ISPs currently employ to extend the life of their routers.  One is
the "disconnected backbone" arrangement that Dan Ginsburg mentioned in his
email.  Another is to simply ignore /24's that you don't have enough space
for (see for instance
http://mailman.apnic.net/mailing-lists/pacnog/archive/2004/12/msg00000.html).
I'm even familiar with a Tier1 ISP (I'm not at liberty to say who) that is
actually using a hack that messes up the AS-path, and could create loops if
another ISP did the same thing.

So this is already a problem that needs fixing.  Worse, as IPv4 addresses run
out, there is real concern that addresses will become less aggregatable, and
I'd like to have something in place should that happen.  VA represents about
the simplest architecturally sound solution I can imagine.

I guess you are suggesting that, since RAM is "cheap and dense", FIB really
isn't the problem?  I personally don't know router architecture very well,
but if what you say is true, why don't router vendors simply build routers
with more FIB?  I suppose you could argue that they try to, but that they
under-estimate DFRT growth?  This doesn't sound likely.  Tony Li has argued
(see RFC4984) that because router memories are not built in volume, Moore's
law doesn't really apply.  This all suggests to me that there is a real cost
to huge FIBs.

As for operational complexity, I don't think that you or I know if it is
"too" much or not.  Really it is a question of the value ISPs get out of this
versus the difficulty of doing it.  I'm not going to pretend that running VA
is trivial, but nor does it strike me as all that bad.  It strikes me for
instance as simpler than route reflectors (though that ain't saying much!).
And given that ISPs already deal with config complexity in the hacks I
mention above, there is a good chance in my mind that VA will be an
acceptable solution.

Regarding some specific points in your email:

Regarding core-router MPLS fix:  To be clear, VA isn't about reducing *core*
router FIB size, it is about reducing FIB size in *any* routers, all of them
if that's what you want to do.  Since most routers aren't core routers,
solutions that only address core routers are only so useful.  I've been told
by folks at ISPs that the greater concern is edge routers.

Regarding on-chip FIBs:  I've been told by Tony Li that he believes he can
fit 200K FIB entries on-chip.  VA can get you 5x reduction very easily, and
with some deployment creativity (which would take time and experience to
develop) I'll bet we could do much, much better.  But that is
somewhat beside the point...the goal here is not to get to single-chip
routers (though that might be where this leads us anyway), but to allow ISPs
to extend the lifetime of their routers, which is something that at least
some of them clearly want to be able to do (see below).
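
To make the arithmetic behind the on-chip point concrete, here is a rough
sketch.  Only the 200K on-chip capacity and the 5x reduction come from the
text above; the DFRT sizes in the loop are hypothetical illustration values,
not measurements, and the function names are made up for this example.

```python
import math

ON_CHIP_CAPACITY = 200_000   # on-chip FIB entries, figure quoted above
VA_REDUCTION = 5             # "5x reduction" quoted above


def fib_after_va(dfrt_routes: int, reduction: float = VA_REDUCTION) -> int:
    """FIB entries left once the full table is shrunk by `reduction` (rounded up)."""
    return math.ceil(dfrt_routes / reduction)


def fits_on_chip(fib_entries: int, capacity: int = ON_CHIP_CAPACITY) -> bool:
    """True if the FIB fits within the assumed on-chip capacity."""
    return fib_entries <= capacity


# Hypothetical DFRT sizes, chosen only to show where the break-even falls.
for dfrt in (250_000, 500_000, 1_000_000, 2_000_000):
    reduced = fib_after_va(dfrt)
    print(f"DFRT {dfrt:>9,} -> FIB {reduced:>7,}  on-chip: {fits_on_chip(reduced)}")
```

Under these assumptions a 5x reduction keeps the FIB on-chip until the DFRT
passes one million routes; a larger reduction factor moves that point out
proportionally.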

Thanks,

PF



> -----Original Message-----
> From: curtis@occnc.com [mailto:curtis@occnc.com]
> Sent: Wednesday, July 09, 2008 11:00 AM
> To: Paul Francis
> Cc: idr@ietf.org
> Subject: Re: [Idr] draft on virtual aggregation
> 
> 
> In message
> <37BC8961A005144C8F5B8E4AD226DE1109D823@EXCHANGE2.cs.cornell.edu>
> Paul Francis writes:
> >
> >
> > Gang,
> >
> > At the following URL is a draft on virtual aggregation that I'm posting to
> > IETF (it'll show up in a day or two), and which I'll present at IDR in
> > Dublin.
> >
> >  http://www.cs.cornell.edu/people/francis/draft-francis-idr-intra-va-00.txt
> >
> > Title and abstract are below.  I hope to create a work item on this in IDR.
> > I would characterize this as falling under the general charter of scaling
> > BGP.
> >
> > Any comments and discussion on this prior to Dublin is of course greatly
> > appreciated.
> >
> > PF
> >
> >
> > Title:  Intra-Domain Virtual Aggregation
> >
> >
> >    Virtual Aggregation (VA) is a technique for shrinking the DFZ FIB
> >    size in routers (both IPv4 and IPv6).  This allows ISPs to extend the
> >    lifetime of existing routers, and allows router vendors to build FIBs
> >    with much less concern about the growth of the DFZ routing table.  VA
> >    does not shrink the size of the RIB.  VA may be deployed autonomously
> >    by an ISP (cooperation between ISPs is not required).  While VA can
> >    be deployed without changes to existing routers, doing so requires
> >    significant new management tasks.  This document describes changes to
> >    routers and BGP that greatly simplify the operation of VA.
> >
> > _______________________________________________
> > Idr mailing list
> > Idr@ietf.org
> > https://www.ietf.org/mailman/listinfo/idr
> 
> 
> 
> Paul,
> 
> Is there a need?
> 
>   Are we still trying to do the equivalent of keeping an AGS+ with
>   DFRT alive somewhere?  RAM is cheap and dense.  To get to on-chip
>   RAM would require orders of magnitude reductions in DFRT size.
> 
>   Other techniques exist for dramatically reducing core router FIB
>   size if that becomes a goal for a provider.
> 
>   For example, MPLS (or GRE) tunneling through a BGP free core reduces
>   FIB size to about the size of the IGP (should easily fit in on-chip
>   memory).  It requires no protocol change.  Only down side is no ICMP
>   when tunnel faults in middle prior to the ingress knowing about it
>   (usually the case anyway due to VPN and VRF) and no fallback to IP
>   when ingress knows that the tunnel is down and hasn't yet rerouted.
> 
> Is the solution worse than the problem?
> 
>   This seems too operationally problematic.
> 
> Curtis