Re: [RADIR] proposed changes to Section 3

"John G. Scudder" <> Wed, 14 May 2008 01:32 UTC

From: "John G. Scudder" <>
To: Thomas Narten <>
Date: Tue, 13 May 2008 14:34:26 -0400
Subject: Re: [RADIR] proposed changes to Section 3

Comments inline.


On Apr 29, 2008, at 10:59 AM, Thomas Narten wrote:

> Going back through Section 3, here is what I'd propose we do:
> Better define "superlinear". I.e., to make clear we don't know exactly
> what the curve is, but that it appears to be
> quadratic/polynomial/something.

"Quadratic/polynomial/something" seems about as vague as  
"superlinear".  I don't think that the problem is that we were too  
lazy to write down which it is; I think the problem is that it's not  
known which it is.  Perhaps the best we can do is to footnote  
"superlinear" with various fits that have been suggested while noting  
that controversy exists about which (if any) of them is "right".

> The fact that it is not linear is
> problematic because historically, technology is able to keep up with
> linear growth. Some definitions:
> When multiple processors are used, you get speedup greater than the
> number of processors. I.e., you get 3.5X improvement with 3 processors
> vs. just one.
> Another example: (where parallelizing a function produces a greater
> performance improvement than the number of applied processing cores)

Much easier to find examples where parallelism produces speedup less  
than linear in the number of processors. :-(
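
One familiar illustration of sublinear speedup is Amdahl's law, which is brought in here as an assumption and is not cited in the thread. The sketch below uses a hypothetical workload in which 90% of the work parallelizes perfectly; the function name and the figures are illustrative only.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size workload."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workload: 90% of the work parallelizes, 10% stays serial.
for n in (1, 3, 8, 64):
    print(f"{n:>2} processors -> {amdahl_speedup(0.9, n):.2f}x")
```

Under these assumptions three processors give 2.5x rather than 3x, and the speedup can never reach 10x however many processors are added.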

> Superlinear/linear is also defined in the context of a converging
> series, though I'm not sure how to apply that to routing updates...
> So, do we mean something like: the cost of (or resources needed for)
> processing/managing routing updates goes up at a rate greater than linear
> in the number of prefixes

No, I don't think so.  Suppose a BGP implementation's runtime cost is  
linear in the number of prefixes (it will never be better than that of  
course).  If the number of prefixes increases superlinearly, clearly  
the runtime cost of the BGP implementation does likewise.
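
The composition argument above can be sketched numerically. Everything in this block is hypothetical: the quadratic growth curve, the constants, and the function names are chosen only to show that a cost that is linear in the prefix count still grows superlinearly over time when the prefix count does.

```python
def prefix_count(year: float) -> float:
    # Hypothetical superlinear (quadratic) growth in prefixes over time.
    return 1000.0 * year ** 2


def runtime_cost(prefixes: float, cost_per_prefix: float = 2.0) -> float:
    # Cost that is strictly linear in the number of prefixes.
    return cost_per_prefix * prefixes


# Sample the cost at doubling time steps and compare successive values.
costs = [runtime_cost(prefix_count(t)) for t in (1, 2, 4, 8)]
ratios = [later / earlier for earlier, later in zip(costs, costs[1:])]
print(costs)
print(ratios)
```

Each doubling of elapsed time quadruples the cost in this sketch, even though the implementation itself is linear in its input.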

A real analysis is quite a bit harder of course.  This kind of  
algorithms 101 big-O analysis can cover up a lot of important detail.   
In particular, the dynamics of the system in the face of specific  
events such as failures are important, but I don't think this is the  
document in which to address that.

> (though this doesn't factor in the rate of updates, which is also a
> factor).
>>   o  The overall rate of routing updates is increasing, requiring
>>      routers to process updates at an increased rate or converge more
>>      slowly if they cannot.  The rate increase is driven by a number of
>>      factors (discussed below).  It should be noted that the overall
>>      routing update rate is dependent on two factors: the number of
>>      individual prefixes and the mean per-prefix update rate.  While it
>>      is clear that the overall number of prefixes is increasing super-
>>      linearly, further study is needed to determine whether the mean
>>      per-prefix update rate is increasing as well [1].
> I think we should dig a bit deeper into the last point and engage
> folks (Geoff?) to find references and/or to encourage work to be done
> here.

One further thing that's been pointed out to me is that the mean  
update rate is very nearly irrelevant.  Whether your router's BGP  
process idles at 1% or 3% of the CPU is of academic interest only.   
What's important is the peak rate (and the width of the peaks).
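
A small sketch of why the mean hides what matters. The trace below is invented for illustration (a quiet background with one five-second burst), and the function name and window length are assumptions, not measurements from any router.

```python
def mean_and_peak(updates_per_second: list[int], window: int) -> tuple[float, float]:
    """Return (mean rate, highest rate averaged over any `window`-second span)."""
    mean = sum(updates_per_second) / len(updates_per_second)
    peak = max(
        sum(updates_per_second[i:i + window]) / window
        for i in range(len(updates_per_second) - window + 1)
    )
    return mean, peak


# Hypothetical one-minute trace: 55 quiet seconds, then a 5-second burst.
trace = [10] * 55 + [5000] * 5
mean, peak = mean_and_peak(trace, window=5)
print(f"mean={mean:.1f} updates/s, peak={peak:.1f} updates/s")
```

In this made-up trace the peak is more than ten times the mean, so a router sized for the mean would fall behind during the burst. Widening the window shows how the width of the peak changes the picture.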

>>   This super linear growth presents a scalability challenge for current
>>   and/or future routers.  There are two aspects to the challenge.  The
>>   first one is purely technical: can we build routers (i.e., hardware &
>>   software) actually capable of handling the control plane load, both
>>   today and going forward?  The second challenge is one of economics:
>>   is the cost of developing, building and deploying such routers
>>   economically sustainable, given current and realistic business models
>>   that govern how ISPs operate as businesses?
> Tony has won me over and I think we need to collapse the two
> points. We can make pigs fly. The issue is at what cost. It is the
> cost factor that is the issue. Or, the cost factor will be looming large
> if we are really bumping into technical challenges.
>> 3.1.  Technical Aspects
>>   The technical challenge of building routers relates to the resources
>>   needed to process a larger and increasingly dynamic amount of routing
>>   information.  More specifically, routers must maintain an increasing
>>   amount of associated state information in the RIB, they must be
>>   capable of populating a growing FIB, they must perform forwarding
>>   lookups at line rates (while accessing the FIB) and they must be able
>>   to initialize the RIB and FIB at boot time.  Moreover, this activity
>>   must take place within acceptable time frames (i.e., paths for
>>   individual destinations must converge and stabilize within an
>>   acceptable time period).  Finally, the hardware needed to achieve
>>   this cannot have unreasonable power consumption or cooling
>>   demands.
> Reword slightly to just list what routers have to do (technically) and
> how bigger tables/faster updates increase the challenge.

I have no problem with "reword slightly" in principle but would have  
to see the slight rewording to comment further.

>> 3.2.  Business Aspects
>>   Even if it is technically possible to build routers capable of
>>   meeting the technical and operational requirements, it is also
>>   necessary that the overall cost to build, maintain and deploy such
>>   equipment meet reasonable business expectations.  ISPs, after all,
>>   are run as businesses.  As such, they must be able to plan, develop
>>   and construct viable business plans that provide an acceptable return
>>   on investment (i.e., one acceptable to investors).
> Reword to just say that the key issue is "at what cost". (Note: by saying
> "even if it is technically possible" we are probably hitting the hot
> buttons of folk who do not doubt it can be done.)
>>   While the IETF does not (and cannot) concern itself with business
>>   models or the profitability of the ISP community, the cost of running
>>   the routing subsystem as a whole is directly influenced by the
>>   routing architecture of the Internet, which clearly is the IETF's
>>   business.  Further, because cost implications are part of each and
>>   every engineering decision, controlling or limiting the overall cost
>>   of running the routing subsystem (through architectural decisions) is
>>   part of the IETF's fundamental charter.  Consequently, having the
>>   IETF continue with an architectural model that places unbounded cost
>>   requirements on critical infrastructure represents an undue risk to
>>   the future of the Internet as a whole.
>>   One aspect of planning concerns the assumptions made about the
>>   expected usable lifetime of purchased equipment.  Businesses
>>   typically expect that once deployed, equipment can remain in use for
>>   some projected amount of time (e.g., 3-5 years).  Upgrading equipment
>>   earlier than planned is more easily justified (as an unplanned
>>   expense) when a new business opportunity is enabled as a result of an
>>   upgrade.  For example, an upgrade might be justified by an ability to
>>   support increased traffic or an increase in the number of customer
>>   connections, etc., where the upgrade can translate into increased
>>   revenue.  In contrast, it is more difficult to justify unplanned
>>   upgrades in the absence of corresponding customer benefit (and
>>   revenue) to cover the upgrade cost.  It is generally desired that
>>   deployed equipment remain usable over its planned lifetime.  An
>>   increase in the resources required to support larger or more dynamic
>>   routing tables is viewed as a sort of "unfunded mandate", in that
>>   customers do not expect to have to pay more just to retain the same
>>   level of service as before, i.e., having all destinations be
>>   reachable as was the case in the past.  This undermining of planning
>>   is particularly problematic when the increase in routing demand
>>   originates external to the ISP, and the ISP has no way to control or
>>   limit it (e.g., the increased demand comes from being part of the
>>   DFZ).
>>   From a business perspective, it is desirable to maintain or increase
>>   the useful lifespan of routing equipment, by improving the scaling
>>   properties of the routing and addressing system.
> Actually, let me suggest we scrap this entire section and try
> again. (I suspect this section is the one that people are the most
> unhappy with.)
> How about something like:
> While it seems likely that it will remain technically feasible to
> build routers that meet the technical and operational requirements of
> operating within the DFZ, the more important question is at what cost
> and what the actual usable lifetime of such routing equipment would
> be.
> One cost is the capital cost to purchase a router that can adequately
> participate in DFZ routing. As the cost rises, smaller (or more
> struggling) ISPs may find themselves priced out of the market and
> unable to fully participate in DFZ routing. Should the cost of such
> routers be high enough, we may find that only a small number of the
> largest ISPs can operate within the DFZ. At some point, we may find
> that global routing is effectively controlled by a small number of
> operators, with a high barrier to entry for newcomers. This would be a
> significant change from how routing within the Internet has
> historically been managed and may result in a reduction in the
> innovation that has historically fueled Internet growth.
> Another cost relates to how long a given piece of equipment (i.e.,
> hardware configuration) is sufficient to fully participate in routing.
> Hardware purchases are made assuming that the equipment will be
> usable for a fixed amount of time (e.g., 3-5 years) before needing to
> be upgraded and replaced. But increased load on routers stemming from
> increased routing updates external to an ISP is a special case, as
> it is not under the control of an ISP and hence is difficult to
> predict and plan for. Should the routing load increase too quickly,
> ISPs will need to replace routers earlier than predicted and budgeted.
> For businesses that are not growing, i.e., that are not expanding
> service or capacity to existing customers or increasing the size of
> their customer base, replacing hardware earlier than planned can be
> problematic, as they cannot easily pass such "unplanned" costs onto
> their customers, who do not see increased value from a price
> increase. (Selling customers on potential service-reduction if they
> don't pay more is a difficult business model to sustain.)

At first glance I'm fine with the proposed re-wording.

>> 3.3.  Alignment of Incentives
>>   Today's growth pattern is influenced by the scaling properties of the
>>   current system.  If the system had better scaling properties, we
>>   would be able to support and enable more widespread usage of certain
>>   applications such as multihoming and traffic engineering.  Currently
>>   the system does not allow everyone to multihome, as there are some
>>   barriers to multihoming due to operational practices that try to
>>   strike a balance between the amount of multihoming and preservation
>>   of routing slots.  It is desirable that the routing and addressing
>>   system exert the least possible back pressure on end user
>>   applications and deployment scenarios, to enable the broadest
>>   possible use of the Internet.
> I'd suggest being blunt here and adding a line: If everyone who
> potentially wanted to multihome were given PI space, the routing
> system would simply collapse. Hence, there is a need to say "no".


>>   One aspect of the current architecture is a misalignment of cost and
>>   benefit.  Injecting individual prefixes into the DFZ creates a small
>>   amount of "pain" for those routers that are part of the DFZ.  Each
>>   individual prefix has a small cost, but the aggregate sum of all
>>   prefixes is significant, and leads to the core problem at hand.
>>   Those that inject prefixes into the DFZ do not generally pay the cost
>>   associated with the individual prefix -- it is carried by the routers
>>   in the DFZ.  But the originator of the prefix receives the benefit.
>>   Hence, there is misalignment of incentives between those receiving
>>   the benefit and those bearing the cost of providing the benefit.
>>   Consequently, incentives are not aligned properly to produce a
>>   natural balance between the cost and benefit of maintaining routing
>>   tables.
>> 3.4.  Table Growth Targets
>>   A precise target for the rate of table size or routing update
>>   increase that should reasonably be supported going forward is
>>   difficult to state in quantitative terms.  One target might simply be
>>   to keep the growth at a stable, but manageable growth rate so that
>>   the increased router functionality can roughly be covered by
>>   improvements in technology (e.g., increased processor speeds,
>>   reductions in component costs, etc.).
> Say "something close to linear growth" would be ideal.


>>   However, it is highly desirable to significantly bring down (or even
>>   reverse) the growth rate in order to meet user expectations for
>>   specific services.  As discussed below, there are numerous pressures
>>   to deaggregate routes.  These pressures come from users seeking
>>   specific, tangible service improvements that provide "business-
>>   critical" value.  Today, some of those services simply cannot be
>>   supported to the degree that future demand can reasonably be expected
>>   because of the negative implications on DFZ table growth.  Hence,
>>   valuable services are available to some, but not all potential
>>   customers.  As the need for such services becomes increasingly
>>   important, it will be difficult to deny such services to large
>>   numbers of users, especially when some "lucky" sites are able to use
>>   the service and others are not.
> Thoughts?
> Thomas