Re: [RAM] IPv6 /32 to /48 prefixes and Tree-Bitmap FIB speed

Robin Whittle <rw@firstpr.com.au> Fri, 20 July 2007 02:20 UTC

Message-ID: <46A01BD7.80802@firstpr.com.au>
Date: Fri, 20 Jul 2007 12:20:07 +1000
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: ram@iab.org
Subject: Re: [RAM] IPv6 /32 to /48 prefixes and Tree-Bitmap FIB speed
References: <469F5757.9040700@firstpr.com.au> <BD76EE31-CA0F-4A96-B825-7254F84717C0@muada.com>
In-Reply-To: <BD76EE31-CA0F-4A96-B825-7254F84717C0@muada.com>

Thanks Jeroen and Iljitsch for your helpful responses regarding
IPv6 prefix lengths.

I got rather lost in the Ghost Route Hunter pages, and I didn't
have time to try to understand them fully.

Thanks for answering my questions about /48s and for pointing me
to the guidelines which some or many people follow - Gert
Doering's page:

  http://www.space.net/~gert/RIPE/ipv6-filters.html

As noted on that page, the rules there have not been fully updated
for recent developments.  For instance:

 ipv6 prefix-list ipv6-ebgp-strict permit 2620::/23 ge 24 le 32

 route-filter 2620::/23 prefix-length-range /24-/32;

which should end in "48" rather than "32" (that is, "le 48" and
"/24-/48") to reflect the fact that this prefix is where ARIN is
assigning /48s to AS-end-users.  A few of these are listed in the
May 2007 report (page 19):

 http://www.space.net/~gert/RIPE/R54-v6-table.pdf

There is a regular series of such reports in that directory.
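To make the effect of the upper bound concrete, here is a minimal Python sketch of how one prefix-list line of the form "permit ENTRY ge GE le LE" evaluates a route, using the standard ipaddress module. The function name and the example /48 are hypothetical, and real router implementations have further rules (e.g. defaults when ge or le is omitted) that are not modelled here.

```python
import ipaddress


def prefix_list_permits(entry: str, ge: int, le: int, route: str) -> bool:
    """Evaluate one hypothetical 'permit ENTRY ge GE le LE' line.

    The route matches if it falls inside ENTRY and its own prefix
    length lies between GE and LE, inclusive.
    """
    block = ipaddress.ip_network(entry)
    candidate = ipaddress.ip_network(route)
    return candidate.subnet_of(block) and ge <= candidate.prefixlen <= le


# Hypothetical /48 taken from inside 2620::/23, for illustration only.
pi_48 = "2620:0:1000::/48"
print(prefix_list_permits("2620::/23", 24, 32, pi_48))  # False: limit of /32
print(prefix_list_permits("2620::/23", 24, 48, pi_48))  # True: limit of /48
```

With "le 32" every /48 from this range is silently dropped, which is why only the upper limit needs to change.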

I wrote to Gert Doering about this initially, not realising he was
in fact the person who maintains these guidelines.

I have often wondered whether there was a formal arrangement for
the way most networks (or at least enough networks to affect
global reachability) filter BGP advertisements based on their
length.  Gert's guidelines above are the closest thing there is to
a recognised agreement - and they are not very formal.  The length
limit varies according to which part of the address space it
applies to.  For instance, in areas where /32s are assigned, the
rules reject advertisements for prefixes longer than /32.
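A per-region length limit like this amounts to a small policy table in which the most specific covering block decides the limit. The sketch below assumes a drastically simplified, hypothetical two-entry table; it is only an illustration of the mechanism, not a copy of Gert's published filter set.

```python
import ipaddress

# Hypothetical, much-simplified policy: covering block -> longest prefix
# length accepted inside it. Illustrative only, not the published filter set.
MAX_LEN = {
    ipaddress.ip_network("2000::/3"): 32,   # blocks allocated in /32 units
    ipaddress.ip_network("2620::/23"): 48,  # ARIN PI range, /48 assignments
}


def accept(route: str) -> bool:
    """Apply the limit of the most specific policy block covering the route."""
    candidate = ipaddress.ip_network(route)
    covering = [blk for blk in MAX_LEN if candidate.subnet_of(blk)]
    if not covering:
        return False  # not inside any listed block: reject
    most_specific = max(covering, key=lambda blk: blk.prefixlen)
    return candidate.prefixlen <= MAX_LEN[most_specific]


print(accept("2001:db8::/32"))      # True
print(accept("2001:db8:1::/48"))    # False
print(accept("2620:0:1000::/48"))   # True
```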

Gert also told me that there is no formal agreement behind the
widely accepted practice of refusing advertisements for IPv4
prefixes longer than /24.  He wrote:

> For IPv4, there are a number of large providers that do not
> accept prefixes longer than /24, and thus, have made it a
> de-facto rule - but it's not written into any sort of "Internet
> book of rules" anywhere that this must be so.

He also wrote:

> For IPv6, people still very much disagree on what they want to
> see in the global BGP table - some just announce /64s, others
> filter everything that is more specific than a /32, except for
> well-known exceptions like the ARIN PI range, where it's known
> that the allocation is done in /48 chunks.

He said that /48s shouldn't be split and that there should be a
/48 for each "site".  However, from the point of view of keeping
the global IPv6 BGP routing table small, ideally these sites would
just get one or more /48s of PA space from their local provider.
(But this doesn't provide portability or traditional network-wide
multihoming.)


Iljitsch, you wrote, in part:

> In IPv4 there is a good distribution over the first 8 bits but
> in IPv6 there isn't, lots of stuff is still in 2001::/16. I
> think you need to look at the first 24 bits or so if you want to
> do the same here.

OK - but I think a simple array for a first "stride" of 24 bits
(2^24, or about 16.8 million entries) wouldn't fit into the on-chip
RAM of any ASIC such as Cisco's SPP, unless there was on-chip logic
to do a second lookup for that smaller subset of the address
range, which sounds feasible.
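The size problem is simple arithmetic: a flat first-stride array has 2^k entries. The sketch below assumes 4-byte entries purely for comparison; the actual entry size and on-chip RAM of the SPP are not known here.

```python
def flat_table_bytes(stride_bits: int, entry_bytes: int) -> int:
    """Size of a flat array indexed directly by the first `stride_bits` bits."""
    return (1 << stride_bits) * entry_bytes


# 4-byte entries are an assumption made only for this comparison.
for bits in (8, 11, 16, 24):
    kib = flat_table_bytes(bits, 4) // 1024
    print(f"{bits:2d}-bit first stride: {1 << bits:>10,} entries, {kib:>6,} KiB")
```

At 24 bits this comes to 64 MiB, versus 8 KiB for the 11-bit table mentioned below.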

>> I recognise that the actual CRS-1 algorithm may differ somewhat
>> from this, but still, it seems that to match a packet against a
>> /32 rule will probably take 4 external memory reads.  For
>> instance ignore the first 3 bits which are always 001, then use
>> an internal RAM table for the next 11 bits and then the last 18
>> bits as a 3 * 6 bit stride.

> Just 6 bits?

I was going by the suggested hardware implementation from the 2004
paper - but I guess in high-powered implementations such as
Cisco's SPP in the CRS-1, they may have enough RAM to do more bits
per stride.  I don't fully understand Tree-Bitmap, but perhaps
strides larger than 6 bits cause problems: the memory system may
not be wide enough to read, in a single clock cycle, all the data
needed for whichever step the lookup could take next.
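A rough way to see the trade-off, assuming the usual Tree Bitmap node layout (an internal bitmap of 2^s - 1 bits plus an extending-paths bitmap of 2^s bits, with pointers ignored), is to compare bitmap width against the number of external reads. The read count is a hypothetical simplification: one read per stride beyond the on-chip stage, plus one assumed read to fetch the result, which reproduces the 4-read estimate quoted above for a /32.

```python
import math


def node_bitmap_bits(stride: int) -> int:
    """Tree Bitmap node: internal bitmap (2**s - 1) + extending-paths bitmap (2**s)."""
    return (2**stride - 1) + 2**stride


def estimated_reads(prefix_len: int, onchip_bits: int, stride: int) -> int:
    """Rough count: one external node read per stride beyond the on-chip
    stage, plus one assumed read to fetch the result (next hop)."""
    remaining = max(0, prefix_len - onchip_bits)
    return math.ceil(remaining / stride) + 1


ONCHIP = 3 + 11  # fixed "001" bits plus an 11-bit on-chip table, as quoted above
for s in (4, 6, 8):
    print(s, node_bitmap_bits(s),
          estimated_reads(32, ONCHIP, s), estimated_reads(48, ONCHIP, s))
```

Each extra bit of stride roughly doubles the bitmap that must arrive in one memory access, which is consistent with (though it does not prove) the memory-width explanation above.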


> It would also help to optimize the address assignment policies. For
> instance, leaving 7 /48s unused between two ones that are used so there
> is room for growth is really suboptimal here. Being able to see whether
> an address will match /32 or shorter or whether it will have to match a
> /48 could also be helpful.

Yes.  Although the train is just pushing off from the station, the
seeds of it tumbling into the ravine at full speed five or ten
years from now are clearly apparent.

I suggested in a now largely defunct I-D:

  http://www.firstpr.com.au/ip/sram-ip-forwarding/

that, before the train moves any further from the station, there
be an agreement to assign PI address space in a particular area,
such as a /10, in a way that could easily be handled by a single
memory access FIB system.  (See Section 6.3 of the above I-D,
although there I explain it in terms of specific fast SRAM chips,
rather than what I would write now: as an array in the DRAM of an
existing ASIC-based FIB system.)

Then, for instance, 32 million /35s (2^25, since 35 - 10 = 25 bits)
could be mapped into that area.  The idea is that future routers
(by the time the IPv6
global routing table starts to become painfully large) will be
equipped to do single external memory cycle classification of all
packets in this /10, on the assumption that there is no rule with
a longer prefix than /35.
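Because no rule inside the /10 is longer than /35, the 25 bits below the /10 can be used directly as an array index, so classification needs one memory read. A minimal sketch, assuming a hypothetical block 3c00::/10 chosen only for illustration:

```python
import ipaddress

# Hypothetical /10 reserved for this scheme; chosen only for illustration.
BLOCK = ipaddress.IPv6Network("3c00::/10")
SLOT_LEN = 35                              # no rule longer than /35 in BLOCK
INDEX_BITS = SLOT_LEN - BLOCK.prefixlen    # 25 bits -> 2**25 slots


def slot_index(addr: str) -> int:
    """Array index for a destination inside BLOCK (one memory read)."""
    a = ipaddress.IPv6Address(addr)
    if a not in BLOCK:
        raise ValueError("address outside the directly indexed block")
    # Drop the 93 low-order bits, then keep the 25 bits below the /10.
    return (int(a) >> (128 - SLOT_LEN)) & ((1 << INDEX_BITS) - 1)


print(1 << INDEX_BITS)                  # number of /35 slots in the /10
print(slot_index("3c00:0:2000::1"))     # second /35 -> index 1
```

With a hypothetical 2-byte next-hop identifier per slot, the whole array would occupy 64 MiB of DRAM.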

This gives rise to a completely different approach to assigning
this address space to end-users.  Give each end-user one or more
/35s (on some reasonable basis - they need to justify it),
starting at the bottom of this /10.  Just fill it up from one end
to the other.  That is 32 million end-user prefixes, each of which
can be advertised in any provider network: we don't care about
route aggregation, because we have a RAM-lookup FIB.

By the time this process reaches the end of the /10, routers will
have more RAM and allocation will continue on the next /10 above
the first one.
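Filling each /10 from the bottom and then moving to the next /10 above it means the n-th assignment is simply the base address plus n times the size of a /35. A minimal sketch, again assuming the hypothetical starting block 3c00::/10:

```python
import ipaddress

FIRST_BLOCK = ipaddress.IPv6Network("3c00::/10")  # hypothetical starting /10
SLOT_LEN = 35
SLOTS_PER_BLOCK = 1 << (SLOT_LEN - FIRST_BLOCK.prefixlen)  # 2**25 per /10


def nth_assignment(n: int) -> ipaddress.IPv6Network:
    """The n-th /35 handed out (0-based). Consecutive /10s are contiguous,
    so filling one bottom-up and continuing into the next is just an offset."""
    start = int(FIRST_BLOCK.network_address) + (n << (128 - SLOT_LEN))
    return ipaddress.IPv6Network((start, SLOT_LEN))


def block_of(n: int) -> ipaddress.IPv6Network:
    """The /10 that the n-th assignment falls in."""
    return nth_assignment(n).supernet(new_prefix=FIRST_BLOCK.prefixlen)


print(nth_assignment(0), nth_assignment(1))
print(block_of(SLOTS_PER_BLOCK - 1), block_of(SLOTS_PER_BLOCK))
```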

The trouble with this idea is that it solves the FIB problem
beautifully, in a way which only makes sense if the BGP control
plane could handle millions of routes.

Maybe some scheme such as this should be kept in mind, for a
future situation where BGP or some new system can cope with such a
large number of routes.

 - Robin


_______________________________________________
RAM mailing list
RAM@iab.org
https://www1.ietf.org/mailman/listinfo/ram