Re: [v6ops] [GROW] Deaggregation by large organizations

Owen DeLong <owen@delong.com> Thu, 16 October 2014 18:04 UTC

From: Owen DeLong <owen@delong.com>
In-Reply-To: <755DE4C3-CDDF-41AF-BA9C-E8EC5B4DFC4C@muada.com>
Date: Thu, 16 Oct 2014 10:59:24 -0700
Message-Id: <A7F6BEA0-BCDD-4197-B6CB-7EB8797ACA9C@delong.com>
References: <F5C06CAF-0AD2-4225-8EE7-FC72CE9913F0@muada.com> <755DE4C3-CDDF-41AF-BA9C-E8EC5B4DFC4C@muada.com>
To: Iljitsch van Beijnum <iljitsch@muada.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/v6ops/q_6rDdNwwl8r4cyqQXVvh2ZYcJI
Cc: v6ops@ietf.org, grow@ietf.org
Subject: Re: [v6ops] [GROW] Deaggregation by large organizations

On Oct 16, 2014, at 02:43 , Iljitsch van Beijnum <iljitsch@muada.com> wrote:

> Let me address a few points that were brought up by different people.
> 
> Renumbering:
> 
> We had great plans for making renumbering easy in the early days of IPv6. Remember A6 records and bitlabels in the DNS? But none of that went anywhere. The problem isn't so much that we can't push a prefix down the network or update the DNS (although both of these still have challenges associated with them), but that addresses tend to get hardcoded all over the place, starting with firewalls all the way up to homegrown applications. I don't think renumbering addresses this issue, although it could help steer some smaller organizations away from PI.

They never went anywhere because they focused on solving the easy part of the renumbering problem while utterly and completely ignoring the hard part.

Easy part: Changing your stuff.
Hard part: Updating all of the prefix lists, filters, firewall rules, VPN configurations, and other configuration elements in systems not under your control that contain your prefixes (partner VPNs, peer routers, etc.).

NONE of the things proposed in those days did anything at all to address the hard part.

> A prefix length limit for the IPv6 DFZ:
> 
> Someone mentioned that this didn't work in IPv6. When Sprint decided to make that /18, that didn't really work. But there's a de facto /24 limit that everyone understands. With IPv6, that would translate into a /48. Obviously no router can hold 2^48 or 2^45 prefixes, so as a backstop against accidental/malicious IPv6 routing table explosion this doesn't help. Even exploding a /28 or so into individual /48s would kill the IPv6 DFZ.
> 
> What COULD work is to have prefix length limits depending on the allocation size by the RIRs. Something like:
> 
> 2100::/16 -> /48
> 2200::/16 -> /32
> 2200::/15 -> /29

Because the 4.3 billion routes in the first category won't be a problem somehow?
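
For anyone checking the numbers, here is a quick sketch of the arithmetic (Python; the helper name more_specifics is made up for illustration and is not from the thread). It counts how many prefixes of a given length fit inside one covering block:

```python
def more_specifics(block_len: int, limit_len: int) -> int:
    """Count the distinct prefixes of length limit_len inside one block of length block_len."""
    if limit_len < block_len:
        raise ValueError("limit must not be shorter than the block")
    return 2 ** (limit_len - block_len)

# 2100::/16 filtered at /48: the "4.3 billion" figure
print(more_specifics(16, 48))   # 4294967296
# a single /28 exploded into /48s
print(more_specifics(28, 48))   # 1048576
# a /48 allocation sitting inside a reserved /44
print(more_specifics(44, 48))   # 16
```

The last line is the "16 x" worst case that comes up below when a /44 is reserved around a /48.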

Also, 2200::/16 and 2200::/15 overlap. Did you mean 2300::/16? If I apply the longest-match rule to what you've put above, that's the effective result. If you meant something else, I'm not sure how to divine that from what you wrote.

> However, for this to work well the RIRs would have to group allocations of the same size into separate blocks, with the result that it would no longer be possible to reserve space to grow an allocation. (Things like allocating a /48 but reserving a /44 reduce the opportunities for prefix length filtering because now the strictest filter you can make allows 16 x as many prefixes worst case than average case. The worst and average case need to be as close together as possible.)

I think overall, that would be worse than what we have today.

> I'd say that allowing two or three extra bits for traffic engineering for PA blocks would be good. So for the part of the IPv6 space where /29s are allocated, allow /31s or /32s. As traffic engineering incoming traffic by deaggregation requires that different parts of the aggregate all generate similar levels of incoming traffic, this wouldn't usually work for organizations using PI so I'd say don't allow deaggregating below /48.

What about multihomed customers? Do you want all multihomed customers to be forced into getting their space directly from RIRs and not from LIRs?

There are currently many cases where organizations obtain a /48 (or larger) from their ISP and subsequently choose to connect to an additional ISP and advertise the PA space as an independent route through both ISPs. Many ISPs cooperate in this process and allow this behavior. It does not change the number of routes in the routing table, but it can be less expensive for the customer if they choose to go that way.

Obviously, at their first renumbering event, it makes sense for them to renumber into PI, but this can prevent them from having to undergo renumbering while they remain connected to the initial provider without actually affecting the global routing table any more than getting PI would.

> Geographic communities:
> 
> I know this is controversial. "Topology ain't geography". Actually, most of the time there is a significant correlation. If all German cities inject a more specific, do you really need to hear those in Tokyo or Seattle? Just send the traffic to Europe as per the aggregate and let them figure it out there.

Spend much time in Asia or Africa or the Caribbean?

Sure, in the case of Germany, you probably don't need them in Tokyo or Seattle. OTOH, if you get a bunch of more specifics coming out of Rwanda, it might actually be significant to have them in Germany, Paris, and Brussels.

Where, exactly, would you draw these lines and how would you go about handling the necessary exceptions?

Europe and the US are rich with exchange points and peering density. The rest of the world is less so, to varying degrees, which results in more significant differences between topology and geography, often in ways that are not obvious.

> Compiling a list of communities that identify regions/countries/cities would allow for experimentation in this place without any downsides that I can see. Don't like this? Filter the communities. There's a handy list that you can copy and paste into your filter.

For these communities to be useful, they'd have to be transitive and people would have to agree to apply them to their prefixes. What happens in the case of "vigilante" tagging where some other AS decides to start applying these tags to my routes even though I specifically don't want that?

> Injecting an aggregate as a point of last resort:
> 
> I think this can be done today and probably is done today. But a document describing how to do it would probably be helpful. I'm thinking along the following lines:

It is already BCP for networks that have the ability to do so. It's not a separate service or anything; you just source the aggregate from one or more locations where you have the ability to forward the traffic as needed.

> The AoLR (Aggregate of Last Resort) service would entail a service provider announcing the aggregate without necessarily providing connectivity towards all the places announcing more specifics covered by the aggregate. So if ISP A announces the AoLR and ISP B provides connectivity to a more specific, ISP C would send traffic to A as per the aggregate and then A would immediately hand it over to B.

This assumes that A:
	1.	Is willing to accept the more specifics from B.
	2.	Is willing to provide (likely unbillable) transit for the customer in question.
	3.	Has a peering relationship with all ISP Bs for the given customer.

In my experience, ISPs are usually hesitant to take responsibility for forwarding traffic they can't bill in some way.

Item 3 is an even more difficult problem to solve.
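
A toy illustration of why items 1 and 3 matter, as a sketch only. The prefixes are documentation space, the tables and next-hop labels are invented, and lookup is a simplified longest-prefix match rather than a model of any real router:

```python
import ipaddress

def lookup(table, dest):
    """Longest-prefix match: next hop of the most specific covering route, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

AGG = ipaddress.ip_network("2001:db8::/32")         # the aggregate A announces
SPECIFIC = ipaddress.ip_network("2001:db8:1::/48")  # a site reachable only via B

table_c = {AGG: "ISP A"}                             # C hears only the aggregate
table_a = {AGG: "A (local)", SPECIFIC: "ISP B"}      # A accepts the /48 from B
table_a_no_b = {AGG: "A (local)"}                    # A has no session with B

dest = "2001:db8:1::25"
print(lookup(table_c, dest))       # ISP A
print(lookup(table_a, dest))       # ISP B
print(lookup(table_a_no_b, dest))  # A (local)
```

In the last case the traffic stops at A, which has nowhere to send it. That is the failure you get whenever A lacks a relationship with one of the Bs.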

Most organizations, instead of depending on such a service, simply build the necessary tunnels to provide their own AoLR capabilities as needed when they don't have internal circuits for the task.

> So as part of the AoLR service, a service provider would agree to accept all more specifics that fall under the aggregate (up to an agreed prefix length) from all the networks providing connectivity towards those more specifics. This would be an attractive service for tier-1s to provide, because presumably, they peer with everyone everywhere, so in the case where they receive the traffic over peering and need to deliver it to another service provider over peering, this could probably happen in the same city, so they wouldn't carry the traffic over long distances. But the (sub-)organization(s) in question still gets to buy connectivity from a wider range of smaller service providers.

Tier-1s, in my experience, not only don't usually peer with everyone everywhere; they often try to avoid peering with anyone they consider "beneath them" as a tactic to force those organizations to buy services from them.

Again, I don't see the ISPs wanting to do what you describe since there would be cost, but little benefit to them.

> In practice an organization would contract two or more service providers to provide the AoLR service for redundancy.
> 
> Wouldn't they just get PI:
> 
> Yes. That's why I think it's important to find a way to give these organizations what they need in a way that keeps the IPv6 DFZ growth on a workable trajectory.

I think the IPv6 DFZ growth is already on a workable trajectory. If we could eliminate the massive IPv4 routing table, then IPv6 is nowhere near outpacing router memory capacity growth for the foreseeable future.

The problem is surviving in the interim while we need IPv4 on a global basis. The reality is that I think IPv4 routing table bloat is eventually going to be the primary driver for IPv6 adoption. I think this will occur much faster than most people imagine because I believe that the fragmentation of the IPv4 table in the transfer market is going to make IPv4 routing progressively more untenable until it essentially collapses under its own weight. At that point, the remaining laggards will scramble to deploy IPv6 as quickly as possible and the IPv4 table will, largely, evaporate rather quickly.

> AS numbers:
> 
> BGP assumes that an AS always has internal connectivity. This can be accomplished using tunnels, but it's much better to simply have separate AS numbers for each subunit. Would it make sense to allocate ranges of AS numbers to enterprise LIRs? Certainly with 32-bit AS numbers there's no lack of numbers, and this would allow tools to be developed to work on CIDR-like AS number ranges in the future.

Yes... I'm all for going back to the definition of an AS as a contiguous collection of networks with an identical routing policy. With 32 bits, I think we have enough ASNs to accomplish this globally.

Owen