Re: [RAM] IPv6 /32 to /48 prefixes and Tree-Bitmap FIB speed
Robin Whittle <rw@firstpr.com.au> Fri, 20 July 2007 02:20 UTC
Message-ID: <46A01BD7.80802@firstpr.com.au>
Date: Fri, 20 Jul 2007 12:20:07 +1000
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: ram@iab.org
Subject: Re: [RAM] IPv6 /32 to /48 prefixes and Tree-Bitmap FIB speed
References: <469F5757.9040700@firstpr.com.au> <BD76EE31-CA0F-4A96-B825-7254F84717C0@muada.com>
In-Reply-To: <BD76EE31-CA0F-4A96-B825-7254F84717C0@muada.com>
Thanks Jeroen and Iljitsch for your helpful responses regarding IPv6 prefix lengths. I got rather lost in the Ghost Route Hunter pages, and I didn't have time to try to fully understand them.

Thanks for answering my questions about /48s and for pointing me to the guidelines which some or many people follow - Gert Doering's page:

  http://www.space.net/~gert/RIPE/ipv6-filters.html

As noted in that page, the rules there are not fully updated to reflect recent developments. For example:

  ipv6 prefix-list ipv6-ebgp-strict permit 2620::/23 ge 24 le 32

  route-filter 2620::/23 prefix-length-range /24-/32;

Each of these should end in "48" to reflect the fact that this prefix is where ARIN is assigning /48s to AS-end-users. A few of these are listed in the May 2007 report (page 19):

  http://www.space.net/~gert/RIPE/R54-v6-table.pdf

There is a regular series of such reports in that directory.

I wrote to Gert Doering about this initially, not realising he was in fact the person who maintains these guidelines.

I often wondered whether there was a formal arrangement for the way most networks (or at least sufficient networks to affect global reachability) filter BGP advertisements based on their length. Gert's guidelines above are the closest thing to a recognised agreement - which is not very formal. The length limit varies according to which part of the address space it is for. For instance, in areas where /32s are assigned, the rules reject advertisements for prefixes longer than /32.

Gert also told me there is no formal agreement about the widely accepted practice of refusing to accept advertisements for IPv4 prefixes longer than /24. He wrote:

> For IPv4, there are a number of large providers that do not
> accept prefixes longer than /24, and thus, have made it a
> de-facto rule - but it's not written into any sort of "Internet
> book of rules" anywhere that this must be so.
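To make the length-based filtering discussed above concrete (both the 2620::/23 rule and the de-facto IPv4 /24 limit), here is a minimal Python sketch of the check such a filter entry expresses. The function name `passes_length_filter` is my own, the /48 used in the usage note is a made-up example rather than a known assignment, and real routers evaluate ordered lists of such entries, which this does not model.

```python
import ipaddress

def passes_length_filter(announced, covering, ge, le):
    """Return True if `announced` lies inside `covering` and its prefix
    length is within [ge, le], in the spirit of a prefix-list ge/le entry."""
    net = ipaddress.ip_network(announced)
    cover = ipaddress.ip_network(covering)
    # An IPv4 announcement can never match an IPv6 filter entry, and vice versa.
    if net.version != cover.version:
        return False
    return net.subnet_of(cover) and ge <= net.prefixlen <= le
```

With the rule as currently published (le 32), an illustrative announcement such as 2620:0:10::/48 would be rejected; with the limit raised to 48 it would be accepted. The same function expresses the IPv4 habit as `passes_length_filter(prefix, "0.0.0.0/0", 0, 24)`.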
He also wrote:

> For IPv6, people still very much disagree on what they want to
> see in the global BGP table - some just announce /64s, others
> filter everything that is more specific than a /32, except for
> well-known exceptions like the ARIN PI range, where it's known
> that the allocation is done in /48 chunks.

He said that /48s shouldn't be split and that there should be a /48 for each "site". However, from the point of view of keeping the global IPv6 BGP routing table small, ideally these sites would just get one or more /48s of PA space from their local provider. (But this doesn't provide portability or traditional network-wide multihoming.)

Iljitsch, you wrote, in part:

> In IPv4 there is a good distribution over the first 8 bits but
> in IPv6 there isn't, lots of stuff is still in 2001::/16. I
> think you need to look at the first 24 bits or so if you want to
> do the same here.

OK - but I think that a simple array for the first "stride" of 24 bits wouldn't fit into on-chip RAM of any ASIC such as Cisco's SPP, unless there was on-chip code to do a second lookup for that smaller subset of the address range, which sounds feasible.

>> I recognise that the actual CRS-1 algorithm may differ somewhat
>> from this, but still, it seems that to match a packet against a
>> /32 rule will probably take 4 external memory reads. For
>> instance ignore the first 3 bits which are always 001, then use
>> an internal RAM table for the next 11 bits and then the last 18
>> bits as a 3 * 6 bit stride.

> Just 6 bits?

I was going by the suggested hardware implementation from the 2004 paper - but I guess in high-powered implementations such as Cisco's SPP in the CRS-1, they may have enough RAM to do more bits per stride.
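The bit split I described (ignore 3 bits, 11 bits for an internal table, then three 6-bit strides, 32 bits in all) can be sketched in Python as below. This is only the index arithmetic under those assumed widths; it is not Tree-Bitmap itself, which stores bitmaps in each node, and the function name `split_for_lookup` is hypothetical.

```python
import ipaddress

def split_for_lookup(address, skip_bits=3, table_bits=11, stride_bits=6, strides=3):
    """Split the leading bits of an IPv6 address into:
    - the ignored leading bits (expected to be 0b001 for global unicast),
    - an index into an internal (on-chip) table,
    - one small index per subsequent stride."""
    value = int(ipaddress.IPv6Address(address))
    offset = 128 - skip_bits
    leading = value >> offset                      # the bits we do not look up
    offset -= table_bits
    table_index = (value >> offset) & ((1 << table_bits) - 1)
    chunks = []
    for _ in range(strides):
        offset -= stride_bits
        chunks.append((value >> offset) & ((1 << stride_bits) - 1))
    return leading, table_index, chunks
```

For example, `split_for_lookup("2001:db8::1")` yields a leading value of 1 (binary 001), a table index, and three 6-bit stride indexes that together cover the first 32 bits. Wider strides would simply mean a larger `stride_bits` and fewer entries in `chunks`.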
I don't fully understand Tree-Bitmap, but perhaps there is some problem with strides larger than 6 bits, something to do with the memory system not being wide enough to read, in a single clock cycle, all the data needed for the various things which could happen next.

> It would also help to optimize the address assignment policies. For
> instance, leaving 7 /48s unused between two ones that are used so there
> is room for growth is really suboptimal here. Being able to see whether
> an address will match /32 or shorter or whether it will have to match a
> /48 could also be helpful.

Yes. Although the train is just pushing off from the station, the seeds of it tumbling into the ravine at full speed five or ten years from now are clearly apparent.

I suggested in a now largely defunct I-D:

  http://www.firstpr.com.au/ip/sram-ip-forwarding/

that, before the train moves any further from the station, there be an agreement to assign PI address space in a particular area, such as a /10, in a way that could be easily handled by a single-memory-access FIB system. (Section 6.3 in the above I-D, although I explain it there in terms of specific fast SRAM chips, rather than what I would write now: as an array in the DRAM of an existing ASIC-based FIB system.)

Then, for instance, 32 million (2^25) /35s could be mapped into that area. The idea is that future routers (by the time the IPv6 global routing table starts to become painfully large) will be equipped to do single external memory cycle classification of all packets in this /10, on the assumption that there is no rule with a longer prefix than /35.

This gives rise to a completely different approach to assigning this address space to end-users. Give each end-user one or more /35s (on some reasonable basis - they need to justify it), starting at the bottom of this /10. Just fill it up from one end to the other.
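A rough Python sketch of the idea just described: a /10 carved into /35 slots, handed out from the bottom, with forwarding done by one direct index computed from the 25 bits between /10 and /35. Everything named here is hypothetical: the class `DirectIndexFib`, the next-hop strings, and in particular the block 4000::/10, which is an arbitrary placeholder and not a proposed allocation. A dict stands in for what would be a flat array in router memory.

```python
import ipaddress

class DirectIndexFib:
    """One-lookup FIB for a block split into equal-length slots."""

    def __init__(self, block, slot_len=35):
        self.block = ipaddress.IPv6Network(block)
        self.slot_len = slot_len
        # Number of slots: 2^(35 - 10) = 2^25 for a /10 split into /35s.
        self.slots = 1 << (slot_len - self.block.prefixlen)
        self.next_hop = {}      # sparse stand-in for a flat array of `slots` entries
        self.next_free = 0      # slots are filled from the bottom of the block

    def index_of(self, address):
        addr = ipaddress.IPv6Address(address)
        if addr not in self.block:
            return None
        # Drop the bits below the slot boundary, then mask off the block prefix.
        return (int(addr) >> (128 - self.slot_len)) & (self.slots - 1)

    def assign_next(self, next_hop):
        """Give the next unused slot to an end-user; return its prefix."""
        if self.next_free >= self.slots:
            raise RuntimeError("block is full; continue in the next block")
        i = self.next_free
        self.next_free += 1
        self.next_hop[i] = next_hop
        base = int(self.block.network_address) | (i << (128 - self.slot_len))
        return ipaddress.IPv6Network((base, self.slot_len))

    def lookup(self, address):
        i = self.index_of(address)
        return None if i is None else self.next_hop.get(i)
```

With `fib = DirectIndexFib("4000::/10")`, the first two calls to `fib.assign_next(...)` hand out the two lowest /35s in the block, and `fib.lookup(addr)` is a single index computation followed by a single read, with no dependence on how the slots are distributed among providers.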
That is 32 million end-user prefixes, each able to be advertised in any provider network, because we don't care about route aggregation - because we have a RAM-lookup FIB.

By the time this process reaches the end of the /10, routers will have more RAM and allocation will continue on the next /10 above the first one.

The trouble with this idea is that it solves the FIB problem beautifully, but in a way which only makes sense if the BGP control plane could handle millions of routes. Maybe some scheme such as this should be kept in mind, for a future situation where BGP or some new system can cope with such a large number of routes.

 - Robin