[RAM] IPv6 /32 to /48 prefixes and Tree-Bitmap FIB speed

Robin Whittle <rw@firstpr.com.au> Thu, 19 July 2007 12:22 UTC

Return-path: <ram-bounces@iab.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1IBV19-0001p8-0w; Thu, 19 Jul 2007 08:22:03 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1IBV18-0001on-2M for ram@iab.org; Thu, 19 Jul 2007 08:22:02 -0400
Received: from gair.firstpr.com.au ([150.101.162.123]) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1IBV15-0004H0-LF for ram@iab.org; Thu, 19 Jul 2007 08:22:02 -0400
Received: from [10.0.0.8] (zita.firstpr.com.au [10.0.0.8]) by gair.firstpr.com.au (Postfix) with ESMTP id 8648259E3D; Thu, 19 Jul 2007 22:21:55 +1000 (EST)
Message-ID: <469F5757.9040700@firstpr.com.au>
Date: Thu, 19 Jul 2007 22:21:43 +1000
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
User-Agent: Thunderbird 2.0.0.4 (Windows/20070604)
MIME-Version: 1.0
To: ram@iab.org
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 6cca30437e2d04f45110f2ff8dc1b1d5
Subject: [RAM] IPv6 /32 to /48 prefixes and Tree-Bitmap FIB speed
X-BeenThere: ram@iab.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Routing and Addressing Mailing List <ram.iab.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/ram>, <mailto:ram-request@iab.org?subject=unsubscribe>
List-Archive: <http://www1.ietf.org/pipermail/ram>
List-Post: <mailto:ram@iab.org>
List-Help: <mailto:ram-request@iab.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/ram>, <mailto:ram-request@iab.org?subject=subscribe>
Errors-To: ram-bounces@iab.org

If someone can help me with a few questions, I would really
appreciate it.  One concerns IPv6 BGP practice; the other
concerns how Tree-Bitmap FIBs work with long prefixes.

When I look at the table "Root Prefix Length Distributions":

  http://bgp.potaroo.net/v6/as1221/

(not the "Prefix Length Distributions" table which precedes it),
I see a set of prefix lengths which I assume are those advertised
globally:

  /19     2
  ...
  /32   617
  /33     4
  /34     1
  /35    18
  /48    60

Is there some formal or informal agreement not to accept
advertisements for prefixes longer than /48?  I know there is
such a rule for IPv4 (no prefixes longer than /24) but am not
sure whether it is formalised anywhere.

If so, then when an end-user gets a /48 of PI space, as per ARIN
or AfriNIC policies:

  http://www.arin.net/policy/nrpm.html#six58
  http://www.afrinic.net/Registration/afsup-ipv6pi-assignments.htm

I guess this means they can't split this prefix, for instance for
load balancing in a multihomed situation, or for use with two
offices in different cities or countries which use different
providers.  This would mean the end-user needs a separate /48 for
each physical office in a different city or country.  Can someone
confirm this?


I also want to confirm or improve my understanding of how
ASIC-based FIBs (rather than those which use TCAM) actually cope
with classifying packets which match really long rules.

I understand that the CRS-1 uses the Tree Bitmap algorithm:

  http://www.firstpr.com.au/ip/sram-ip-forwarding/#Vargese_CRS_1

I have the final 2004 paper if anyone wants a copy.  In this
paper, the hardware reference design deals with the most
significant 8 bits of address in an on-chip RAM lookup table
(actually a 12 bit index).  Then the rest of the address bits are
processed 6 at a time (stride of 6, in Figure 12).

I recognise that the actual CRS-1 algorithm may differ somewhat
from this, but it still seems that matching a packet against a
/32 rule will probably take 4 external memory reads.  For
instance: ignore the first 3 bits, which are always 001; use an
internal RAM table for the next 11 bits; then handle the last 18
bits as three 6-bit strides.
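As a rough check on that arithmetic, here is a back-of-envelope
sketch (not the actual CRS-1 pipeline; the fourth read is my
assumption of a final fetch from the result / next-hop array):

```python
import math

# Bits resolved on-chip before any external memory access:
# skip the leading 001, then index an 11-bit internal RAM table.
skipped_bits = 3
internal_bits = 11

stride = 6           # bits consumed per external trie-node read
prefix_len = 32

remaining = prefix_len - skipped_bits - internal_bits  # 18 bits left
node_reads = math.ceil(remaining / stride)             # 3 trie-node reads
total_reads = node_reads + 1  # assumed extra read of the result array
print(total_reads)            # 4 external memory reads for a /32
```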

For a /48 rule, it seems that with a 6-bit stride, the number of
memory accesses goes up to 7 or so.  That is a really ugly number
of DRAM cycles to classify each packet passing through a transit
router.  Maybe on-chip caches could help, but I can't imagine how.

I find it impossible to think happily of such cumbersome
techniques, but as far as I know, strides beyond 6 bits have
various problems.  Even if the stride were 12 bits, it would still
be 3 or 4 memory accesses per packet.
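The same model generalised over stride width (again just a
sketch; the "+ 1" read of the result array is my assumption):

```python
import math

def external_reads(prefix_len, skipped=3, internal=11, stride=6):
    """Estimated external memory reads to match a prefix of the
    given length: one read per stride-sized trie node, plus an
    assumed final read of the result (next-hop) array."""
    remaining = prefix_len - skipped - internal
    return math.ceil(remaining / stride) + 1

print(external_reads(48))             # 7 with a 6-bit stride
print(external_reads(48, stride=12))  # 4 with a 12-bit stride
```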

Is the future really this grim?  TCAMs are fast but power-hungry
and a can of worms in other respects, such as worst-case update
time.  The fact that Juniper and Cisco use trie-based techniques
indicates that TCAMs can't do the job - but these trie-based
approaches seem to be so slow.

  - Robin


_______________________________________________
RAM mailing list
RAM@iab.org
https://www1.ietf.org/mailman/listinfo/ram