Re: [rrg] Geoff Huston's BGP/DFZ research - 300k DFZ prefixes are the tip of the iceberg

Tony Li <tony.li@tony.li> Mon, 15 March 2010 21:05 UTC

Hi Paul,

> Hehe, My implicit assumption is that there's no significant change in
> allocation densities. :) Stuff like that generally seems to be an
> administrative issue, so it's maybe unlikely it can be affected through
> routing changes.

It's not clear that that's a valid assumption, especially in the face of v4
runout.  Will people just become more efficient?  What happens if carriers
do deploy CGNAT?

>> This in turn implies that BGP will take longer to converge at a
>> given node when that node has to process the full table.
> 
> That's the thing though, the most recent data doesn't seem to show
> any evidence of things like that. If per-node convergence was taking
> significantly longer (in the "scaling badly relative to prefix
> growth" sense), then Sigma(per-prefix per-node convergence) should be
> similarly increasing by at least the same amount and BGP observers
> ought to be able to see that, and so the data should show it.

Disagree.  If I understand Geoff's data correctly (and please correct me if
I don't), he's showing that the overall stability of the network hasn't
changed.  The number of updates and the number of withdrawals is scaling
nicely.  He also shows that the incremental convergence of a prefix is
roughly constant.

This is in no way contradictory to the increasing time that it will take a
router to converge.  The two measures are wholly orthogonal.

Again, consider the situation of a router rebooting somewhere in the middle
of the network.  When the router crashes, adjacent routers will select
alternate paths.  This convergence time is not going to be visible, as Geoff
is measuring the time that it takes from the first change to convergence for
that prefix.  Since the convergence time is largely dominated by MRAI
effects today, it will be difficult to perceive any increase in overall
convergence times due to scale.

Similarly, when a router comes up, it will begin learning and advertising
prefixes.  This will trigger another convergence effect, but the delay in
starting the event will not be shown in Geoff's data.  Subsequent delay in
processing by upstream routers will get lost in MRAI as before.  The router
convergence time that I'm concerned about is the time that it takes this new
router to learn and advertise the full table of routes.  Pretty clearly,
this time is linear in the number of prefixes.  Thanks to the existence of
multiple paths, this delay is not going to be seen by end nodes.
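
The distinction between the two measures can be sketched with a toy model.
This is only an illustration under assumed numbers: PER_PREFIX_COST, the
number of MRAI rounds and the peer count are hypothetical, and the 30 second
MRAI is the value RFC 4271 suggests for eBGP sessions, not a measurement.

```python
# Toy model of the two measures. All constants are illustrative assumptions.
MRAI_SECONDS = 30.0        # eBGP MinRouteAdvertisementInterval suggested by RFC 4271
PER_PREFIX_COST = 50e-6    # hypothetical seconds of work to process one prefix


def incremental_convergence(table_size: int, mrai_rounds: int = 2) -> float:
    """Time for ONE prefix to settle after a change.

    Dominated by MRAI rounds; table_size is accepted only to show that it
    does not appear in the result.
    """
    return mrai_rounds * MRAI_SECONDS + PER_PREFIX_COST


def full_table_load(table_size: int, peers: int = 2) -> float:
    """Time for a freshly booted router to learn and advertise every prefix.

    Linear in the number of prefixes (and in the assumed number of peers).
    """
    return table_size * peers * PER_PREFIX_COST


if __name__ == "__main__":
    for size in (300_000, 3_000_000):
        print(size, incremental_convergence(size), full_table_load(size))
```

In this sketch a tenfold larger table leaves the per-prefix figure untouched
while the full-table load time grows tenfold, which is why the first measure
says nothing about the second.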

To see this effect more clearly, consider a thought experiment where it
takes a router an hour (or a day, a week or a month) to boot and process all
prefixes.  What is the effect on the network?  Would Geoff's numbers change?
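
A minimal sketch of the thought experiment, assuming that a remote observer
measures per-prefix convergence as the span between the first and last update
it sees for that prefix.  The function names, the three update rounds and the
30 second spacing are hypothetical choices for illustration only.

```python
def observed_updates(boot_delay: float, mrai: float = 30.0, rounds: int = 3) -> list[float]:
    """Timestamps at which an observer sees updates for one prefix.

    The first update only appears once the router has finished booting
    (boot_delay); later updates are assumed to be spaced by MRAI.
    """
    return [boot_delay + i * mrai for i in range(rounds)]


def measured_convergence(timestamps: list[float]) -> float:
    """Per-prefix convergence as measured from first change to last change."""
    return max(timestamps) - min(timestamps)


if __name__ == "__main__":
    # Boot takes a minute, an hour, or a day: the measured figure is the same.
    for boot_delay in (60.0, 3600.0, 86400.0):
        print(boot_delay, measured_convergence(observed_updates(boot_delay)))
```

Under these assumptions the boot delay shifts every timestamp equally, so it
cancels out of the measured per-prefix convergence.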


> I'm sorry for being such an arse with my scepticism, and I'll
> understand if people reply to me as if I'm half-wit, but if scaling
> is a problem surely it should be apparent in some data somewhere over
> the last decade+ that people have been worrying about it? Where's the
> smoking gun graph, based on real data, that shows the scaling
> problem? I'm somewhat willing to take your word as authoritative, but
> ideally we'd have graphs :).

<employer hat off>
Any operator who would like to stand up and embarrass their favorite router
vendor by showing a graph of router boot convergence times is welcome to do
so.  ;-)
</employer hat off>
 
> I stress again that, despite taking this contrarian view of the
> scaling problem, I still think the work here is very important!

I'll just point out the last slide of Geoff's talk:

"Will BGP Continue to Scale?

Only if: the address system continues to maintain strong alignment with
network topology & provider based addressing policies assist in maintaining
a viable global routing infrastructure."

The whole point of the work here is to decouple addressing from identity so
that it can be more easily aligned with topology.

Tony