Re: [Rtg-dt-encap-considerations] overlay encapsulation group

"Fred Baker (fred)" <fred@cisco.com> Thu, 02 April 2015 22:18 UTC

From: "Fred Baker (fred)" <fred@cisco.com>
To: Erik Nordmark <nordmark@sonic.net>
Date: Thu, 02 Apr 2015 22:18:11 +0000
Message-ID: <D8DF4756-6170-4316-BA1F-26586252E802@cisco.com>
References: <366DD3DD-A092-421F-B1C3-03BEDF1FE126@cisco.com> <551DAB9B.3010309@sonic.net>
In-Reply-To: <551DAB9B.3010309@sonic.net>
Archived-At: <http://mailarchive.ietf.org/arch/msg/rtg-dt-encap-considerations/syFY6S39-mo1dUCGjJpY4qND0yg>
Cc: "rtg-dt-encap-considerations@ietf.org" <Rtg-dt-encap-considerations@ietf.org>
Subject: Re: [Rtg-dt-encap-considerations] overlay encapsulation group

> On Apr 2, 2015, at 1:50 PM, Erik Nordmark <nordmark@sonic.net> wrote:
> 
> We've looked briefly at the draft but I for one didn't quite understand the essence of the proposal.
> The draft seems to be about the IPv6 addressing plan for the datacenter. Is it also claiming that an encaps is insufficient if the VNI ID is encoded somewhere else (e.g., in the IPv6 address)?

Well, a little history. I started looking at OpenStack last year, noted that (at that time) it didn’t have an IPv6 capability, and wanted to create one. A couple of nanoseconds later, it occurred to me that “they didn’t use my favorite toy” is a lousy reason to do that work. What *would* make sense is to use my favorite toy to improve the offering.

In Icehouse, Juno, and Kilo, there has been work to bring IPv6 in. What the guys doing that have done - this is not a criticism, it’s an observation, and I have said it to them - is look for each use of an IPv4 address and make it “either an IPv4 or an IPv6 address”. What that does is import into the IPv6 capability a number of design decisions that were probably appropriate for IPv4, but might not be for IPv6.

The first example of that is the widespread use of customer-supplied IPv4 address space and NA(P)T. Why would one do that? One reason is that it is familiar and common in IPv4 networks, a consequence of address scarcity; another is that the data center can monetize floating IPv4 address space allocated to a tenant. It is basically a response to address scarcity, with the lipstick on the pig being that maybe the tenant can avoid the use of DNS (why?). The downside of the practice is the set of issues raised in RFC 2993: it imposes constraints consistent with the client/server model and inconsistent with a peer-to-peer model. It also has issues in routing within the data center - routing is forced to go through one’s favorite virtual router rather than using the data center’s routing.

The second is that it takes the issue of ensuring inter-tenant isolation, which is fundamentally an authorization problem, and makes it a routing problem. Making it a routing problem imposes interesting limitations: a typical tenant application has several zones - a perimeter, one or more “compute” zones, and one or more “storage” zones. Treating access control to those zones as a routing problem flattens the application or makes routing more complex. But I can treat it as an authorization problem if I can simply enumerate the zones and specify the rules under which they can communicate, as in the sketch below.
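
To make that concrete, here’s a rough sketch of the kind of authorization table I have in mind - the zone names and rules are invented for illustration, not taken from the draft:

    # Sketch: inter-zone reachability as an authorization table rather
    # than a routing topology. Zone names are hypothetical.
    ALLOWED = {
        ("perimeter", "compute"),   # front end reaches the app tier
        ("compute", "storage"),     # app tier reaches its data
        ("compute", "compute"),     # app tier talks horizontally
    }

    def may_communicate(src_zone: str, dst_zone: str) -> bool:
        """Authorization check, independent of how packets are routed."""
        return (src_zone, dst_zone) in ALLOWED

    assert may_communicate("perimeter", "compute")
    assert not may_communicate("perimeter", "storage")  # no topology tricks needed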

The third has to do with the efficiency of communication, and with the coupling between the IP domain, the underlying infrastructure, and the management of that infrastructure. Early NFV implementations layered a tenant on a VLAN. That has an obvious business limit - a data center is now limited to 4094 tenants, or has to jump through design hoops in the data center to have multiple instances of a VLAN. A common replacement for that is VxLAN, which basically replaces a 12-bit VLAN number plus flags with an IP header, a UDP header, and a little more data. Both a VxLAN and a VLAN depend on the operator having some form of OSS that remembers and manages the actual structure of the V*LAN. So we now have an operational as well as a bits-in-the-packet cost. In addition, we layer a GRE encapsulation between members of a tenant. So to send a simple IP datagram from one system in a tenant to another, I have to expend something like 60-100 bytes per packet on the mechanics of getting it there - the arithmetic is sketched below.
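
The back-of-the-envelope numbers behind that 60-100 byte figure, using the standard header sizes (the exact stack varies by deployment):

    # Per-packet encapsulation overhead, in bytes.
    VLAN_TAG = 4                  # 802.1Q: 12-bit VID plus flags
    VXLAN_V4 = 14 + 20 + 8 + 8    # outer Ethernet + IPv4 + UDP + VXLAN = 50
    VXLAN_V6 = 14 + 40 + 8 + 8    # outer Ethernet + IPv6 + UDP + VXLAN = 70
    GRE_V4   = 20 + 4             # outer IPv4 + minimal GRE header = 24

    print("802.1Q tag:     ", VLAN_TAG, "bytes")
    print("VXLAN over IPv4:", VXLAN_V4, "bytes")
    print("VXLAN over IPv6:", VXLAN_V6, "bytes")
    print("VXLAN + GRE:    ", VXLAN_V4 + GRE_V4, "bytes")  # 74, in the 60-100 range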

That’s all within a single data center. Now repeat that using the words “multi-administration multi-location hybrid data center service”.

Really?


So my fundamental premise is that if we can in some scalable way label a packet with a tenant ID, and distribute ACLs appropriately, I can solve the problem with a basic IP(v6) network without those considerations. It provides inter-tenant isolation, efficiently and scalably, and bypasses the fiction and limitations (in IPv6) of an artificial address shortage. Responding to RFC 3439, I want the data center operational process to do exactly the same thing to interconnect two VMs/containers in the same physical chassis as it would between a hybrid network’s home territory and its data center service, or between elements of a multi-location, multi-administration data center operation.
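
A minimal sketch of that premise, assuming packets carry some tenant label and every forwarding element holds the same distributed ACL - the field names, widths, and IDs are my invention for illustration:

    # Sketch: tenant-ID labeling plus a distributed ACL; default is isolation.
    from dataclasses import dataclass

    @dataclass
    class Packet:
        src_tenant: int   # label carried in the packet (e.g., 24 bits)
        dst_tenant: int

    # Pairs of tenant IDs allowed to exchange traffic.
    acl = {(7, 7), (7, 42)}          # tenant 7 internally, plus one exception

    def permit(pkt: Packet) -> bool:
        """Same check whether the two VMs share a chassis or a continent."""
        return (pkt.src_tenant, pkt.dst_tenant) in acl

    assert permit(Packet(7, 7))      # intra-tenant traffic flows
    assert not permit(Packet(7, 9))  # inter-tenant traffic is dropped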


The next obvious question is “so where do you put the label?”. In the appendix at the end of the draft I list six (or maybe five-and-a-half) different approaches. The first uses the flow label, assuming the sender of a packet can know the tenant ID of the intended receiver. That has several problems, not the least of which is that the flow label is a popular target; another is that it would be really nice to impose policy at the sender. The second approach came out of a discussion with a customer, and is pretty much obsoleted by Sebastian Jeuk’s UCL doctoral thesis, which is the third approach mentioned. He really likes his approach, but it suffers from the “only apply policy at the destination” issue. The Segment Routing guys *really*want*this*to*be*a*segment*routing* use case. They have shown me their header, which I made comments on and they fixed, but they haven’t seen fit to work on how one actually makes it such a use case.
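
For reference, the flow label is only 20 bits in the fixed IPv6 header, which already cramps the first approach - a quick illustration (my own sketch, not from the draft):

    # The flow label is the low 20 bits of the first 32-bit word of the
    # IPv6 header - too small even for a 24-bit VXLAN-sized VNI.
    import struct

    def ipv6_word0(traffic_class: int, flow_label: int) -> bytes:
        """First 32 bits of an IPv6 header: version, TC, flow label."""
        assert flow_label < 2**20, "flow label is only 20 bits"
        return struct.pack("!I", (6 << 28) | (traffic_class << 20) | flow_label)

    print(ipv6_word0(0, 0xFFFFF).hex())   # 600fffff - the whole label space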

The final two, using an RFC 4291 address and a privacy address, put the tenant ID into the IID. The more I think about that, the better I like it. It allows me to have policy at the source, it doesn’t expand the packet, and in the segment routing case, it allows me to have a separate tenant ID on each address in the sequence. The only thing it doesn’t give me, frankly, is the complexity of Sebastian’s model - and complexity is something I’m trying to avoid.
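
As an illustration - the exact layout here is mine, not something the draft mandates - one way a 24-bit tenant ID might be packed into the IID:

    # Illustrative layout only: a 24-bit tenant ID in the top bits of the
    # IID, with the remaining 40 bits for the host.
    import ipaddress

    def with_tenant_id(prefix: str, tenant_id: int, host: int) -> ipaddress.IPv6Address:
        """Build an address whose IID carries <tenant_id>:<host>."""
        net = ipaddress.IPv6Network(prefix)
        assert net.prefixlen <= 64 and tenant_id < 2**24 and host < 2**40
        return net[(tenant_id << 40) | host]

    addr = with_tenant_id("2001:db8::/64", 0xABCDEF, 0x1)
    print(addr)                                # 2001:db8::abcd:ef00:0:1
    print(hex((int(addr) >> 40) & 0xFFFFFF))   # recover the tenant ID: 0xabcdef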


In the -03 version I sent as an attachment (preparing to post, but I have some more work I want to do before doing so), I inserted a multi-datacenter multi-administration hybrid use case. You might take a look at that.


Does that help?