Re: [Idr] On FIB sizing...

Jeff Wheeler <jsw@inconcepts.biz> Sun, 16 December 2012 06:18 UTC

To: Tony Li <tony.li@tony.li>
Cc: IETF IDR Working Group <idr@ietf.org>

On Sat, Dec 15, 2012 at 9:25 PM, Tony Li <tony.li@tony.li> wrote:
> As an aside, FIB sizing only varies loosely with the maximum number of
> bits in the address because the FIB can be constructed to hold only
> prefixes.  Thus, what's more important is the average and distribution of
> prefix lengths.

Certainly true, but you reduce power/heat/die area at the expense of
complexity.  I would say that vendors have been doing this long enough to
have pretty much mastered it, but there are products on the market right
now that have sub-optimal FIB scale and will deliver greater FIB scale
through software upgrades, once the software people for these boxes have
time to actually implement the needed improvements.
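
Tony's point, that storage follows how many prefix bits are kept rather than the 128-bit address width, can be illustrated with a minimal uncompressed binary trie. This is a Python sketch under stated assumptions: `PrefixTrie` is a hypothetical name, only IPv6 prefixes are inserted, and real FIBs use compressed tries, TCAM, or hashing schemes, which is where the extra complexity mentioned above comes from.

```python
import ipaddress
from typing import Optional


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children: list = [None, None]  # child for bit 0, child for bit 1
        self.next_hop: Optional[str] = None


class PrefixTrie:
    """Hypothetical uncompressed binary trie for IPv6 prefixes (illustration only)."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.node_count = 1

    @staticmethod
    def _bit(value: int, i: int) -> int:
        # i-th most significant bit of a 128-bit integer
        return (value >> (127 - i)) & 1

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv6Network(prefix)
        value = int(net.network_address)
        node = self.root
        # Only the first prefixlen bits create nodes; the remaining bits cost nothing.
        for i in range(net.prefixlen):
            bit = self._bit(value, i)
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
                self.node_count += 1
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address: str) -> Optional[str]:
        value = int(ipaddress.IPv6Address(address))
        node, best = self.root, self.root.next_hop
        # Walk until the path ends, remembering the longest matching prefix seen.
        for i in range(128):
            node = node.children[self._bit(value, i)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best


trie = PrefixTrie()
trie.insert("2001:db8::/32", "A")
trie.insert("2001:db8:1::/48", "B")
print(trie.node_count)               # 49: root + 32 nodes + 16 more for the /48
print(trie.lookup("2001:db8:1::5"))  # B (longest match)
print(trie.lookup("2001:db8:2::1"))  # A
```

A /128 host route would add up to 128 nodes along its path, which is why the many-/128 datacenter case quoted below differs from a core FIB whose remote prefixes are /64 and shorter.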

It is also the reason why people worry about whether or not their box of
choice will work right if they decide to configure /120s or whatever.  My
results have been good, but a few people claim that specific routers do not
work well in such circumstances.  They are probably not wrong.

> Thus, (for a core router) the choice of /128 is largely irrelevant when
> remote prefixes are all /64 and shorter.  For a datacenter switch that has
> to store possibly many /128's, this is more of an issue, but is offset
> because *there is no ARP cache to deal with*.

It would be nice if that were true.  Unfortunately you are not correct.

There is no router that will forward based on an L2 address encoded into a
global IPv6 L3 address, even as an optional configuration knob on the
subnet, egress interface, or adjacency table entry.  I wish such a router
would hurry up and exist, because it is a smart idea that simply has not
been implemented in any product that is shipping.
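
For hosts using SLAAC with modified EUI-64 interface identifiers, the MAC address really is recoverable from the global address, which is what such a router would exploit. The Python sketch below shows only that derivation (the inverse of the mapping in RFC 4291, Appendix A). `mac_from_eui64` is a hypothetical helper, not part of any router's forwarding code, and it cannot help with randomized, DHCPv6-assigned, or manually configured addresses, which do not embed a MAC.

```python
import ipaddress
from typing import Optional


def mac_from_eui64(address: str) -> Optional[str]:
    """Recover a MAC address from a modified EUI-64 interface ID, if there is one."""
    # The interface identifier is the low 64 bits of the address.
    iid = int(ipaddress.IPv6Address(address)) & ((1 << 64) - 1)
    b = iid.to_bytes(8, "big")
    # Modified EUI-64 places 0xFFFE between the two halves of the 48-bit MAC.
    # (A random interface ID could contain these bytes by chance; this check cannot tell.)
    if b[3] != 0xFF or b[4] != 0xFE:
        return None
    # It also inverts the universal/local bit (0x02) of the first octet; flip it back.
    mac = bytes([b[0] ^ 0x02, b[1], b[2], b[5], b[6], b[7]])
    return ":".join(f"{octet:02x}" for octet in mac)


print(mac_from_eui64("2001:db8::211:22ff:fe33:4455"))  # 00:11:22:33:44:55
print(mac_from_eui64("2001:db8::1"))                   # None
```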

Most routers even need to install link-local L3-to-L2 address mappings into
the data-plane, even though this should usually be unnecessary.  If you
think this means that the number of L3-to-L2 mappings in the FIB is
effectively doubled for subnets where the attached machines each have one
global IPv6 address and one link-local address, you are right.  It means a
router that claims to have room for 2000 ND entries can only support 1000
such machines in practice.
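
The arithmetic above, as a trivial Python sketch. The function name and parameters are hypothetical, and the model assumes each L3 address a host uses toward the router consumes exactly one neighbor entry in the data-plane.

```python
def hosts_supported(nd_table_size: int, globals_per_host: int = 1,
                    link_local_installed: bool = True) -> int:
    """Hosts that fit when every global (and, if installed, link-local) address
    of each host needs its own L3-to-L2 entry in the data-plane."""
    entries_per_host = globals_per_host + (1 if link_local_installed else 0)
    return nd_table_size // entries_per_host


print(hosts_supported(2000))                              # 1000
print(hosts_supported(2000, link_local_installed=False))  # 2000
print(hosts_supported(2000, globals_per_host=2))          # 666
```

Each additional address per host shrinks the usable host count further, since the table is sized in entries, not machines.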

If you think there is no "ARP cache" and thus fewer problems to consider
with IPv6, you should learn why Cisco has implemented a knob to limit
the number of ND entries that can be used on a per-interface basis.  ND
cache size, churn, and exhaustion attacks are serious concerns that
operators will have to deal with once IPv6 adoption grows to the point that
IPv6-based DDoS becomes common.  Most vendors are totally ignoring this
problem, and it worries many of us.  It is especially concerning on
platforms like the Cisco SUP720, where an attack of this nature will break
not only IPv6 but also IPv4.
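
Why a per-interface limit helps can be sketched in Python. This is a hypothetical model, not Cisco's implementation: it assumes one neighbor table shared by all interfaces, and treats each resolution attempt for a nonexistent address as leaving an INCOMPLETE entry. `NeighborCache` and `scan_then_learn` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class NeighborCache:
    """Hypothetical shared ND cache with an optional per-interface entry limit."""

    def __init__(self, capacity: int, per_interface_limit: Optional[int] = None) -> None:
        self.capacity = capacity
        self.per_interface_limit = per_interface_limit
        # (interface, IPv6 address) -> MAC, or None while the entry is INCOMPLETE
        self.entries: Dict[Tuple[str, str], Optional[str]] = {}
        self.count_by_interface: Dict[str, int] = defaultdict(int)

    def learn(self, interface: str, address: str, mac: Optional[str] = None) -> bool:
        key = (interface, address)
        if key in self.entries:
            if mac is not None:
                self.entries[key] = mac  # resolution completed
            return True
        if len(self.entries) >= self.capacity:
            return False  # shared table exhausted
        if (self.per_interface_limit is not None
                and self.count_by_interface[interface] >= self.per_interface_limit):
            return False  # this interface hit its own cap; others are unaffected
        self.entries[key] = mac
        self.count_by_interface[interface] += 1
        return True


def scan_then_learn(per_interface_limit: Optional[int]) -> bool:
    cache = NeighborCache(capacity=100, per_interface_limit=per_interface_limit)
    # A scan toward nonexistent addresses on one subnet leaves INCOMPLETE entries.
    for i in range(1000):
        cache.learn("attacked-if", f"2001:db8:bad::{i:x}")
    # Can a legitimate neighbor on a different interface still be learned?
    return cache.learn("server-if", "2001:db8:1::10", "00:11:22:33:44:55")


print(scan_then_learn(None))  # False: the scan consumed the whole table
print(scan_then_learn(50))    # True: the scan stopped at 50 entries
```

The cap does not protect `attacked-if` itself; it only keeps one interface from starving every other interface that shares the table.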


My point is, raising the size of the AS number to 64 bits, or some other
ridiculous size, really would not have had a data-plane cost the way IPv6
address sizes do.  The only place AS numbers are used in the data-plane is
NetFlow, and abstractions can be used in place of real AS numbers there.
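
One possible form of that abstraction, as a Python sketch: the control plane maps AS numbers of whatever width to small fixed-size indices, and only the index would need to be carried in per-flow data-plane state. `AsIndexTable` and the 16-bit index width are hypothetical choices, not how any NetFlow implementation is known to work, and the 64-bit AS number is just the hypothetical size discussed above.

```python
from typing import Dict, List


class AsIndexTable:
    """Hypothetical control-plane table mapping AS numbers of any width to small indices."""

    def __init__(self, index_bits: int = 16) -> None:
        self.max_entries = 1 << index_bits
        self.index_of: Dict[int, int] = {}
        self.asn_of: List[int] = []

    def intern(self, asn: int) -> int:
        """Return the compact index for asn, allocating one on first use."""
        if asn in self.index_of:
            return self.index_of[asn]
        if len(self.asn_of) >= self.max_entries:
            raise OverflowError("index space exhausted")
        index = len(self.asn_of)
        self.index_of[asn] = index
        self.asn_of.append(asn)
        return index

    def resolve(self, index: int) -> int:
        """Translate an index back to the real AS number, e.g. when exporting flows."""
        return self.asn_of[index]


table = AsIndexTable()
print(table.intern(64512))            # 0
print(table.intern(2**63 + 5))        # 1 (a hypothetical 64-bit AS number)
print(table.intern(64512))            # 0 again: same AS, same index
print(table.resolve(1) == 2**63 + 5)  # True
```

With this design the per-flow cost is set by `index_bits`, that is, by how many distinct ASes a box must track, not by how wide AS numbers are.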

I am not advocating any scheme that would again raise the AS number space
size.  I'm just pointing out that it would not increase the cost of FIBs.

If anyone thought that hierarchical AS number allocation was a smart idea,
and that private AS numbers should be eliminated by simply making enough
global AS numbers to satisfy every org with as many as they want (Randy
Bush's position), then hopefully those people spoke up at the time 4-octet
AS numbers were standardized.

-- 
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator  /  Innovative Network Concepts