Re: [nvo3] Draft NVO3 WG Charter

Thomas Narten <narten@us.ibm.com> Fri, 17 February 2012 21:59 UTC

Message-Id: <201202172159.q1HLx06B030233@cichlid.raleigh.ibm.com>
To: stbryant@cisco.com
In-reply-to: <4F3EA613.5040202@cisco.com>
References: <201202171451.q1HEptR3027370@cichlid.raleigh.ibm.com> <4F3EA613.5040202@cisco.com>
Comments: In-reply-to Stewart Bryant <stbryant@cisco.com> message dated "Fri, 17 Feb 2012 19:10:11 +0000."
Date: Fri, 17 Feb 2012 16:58:59 -0500
From: Thomas Narten <narten@us.ibm.com>
Cc: nvo3@ietf.org
Subject: Re: [nvo3] Draft NVO3 WG Charter

Stewart Bryant <stbryant@cisco.com> writes:

> On 17/02/2012 14:51, Thomas Narten wrote:
> > Below is a draft charter for this effort. One detail is that we
> > started out calling this effort NVO3 (Network Virtualization Over L3),
> > but have subsequently realized that we should not focus on just "over
> > L3". One goal of this effort is to develop an overlay standard that
> > works over L3, but we do not want to restrict ourselves only to "over
> > L3". The framework and architecture that we are proposing to work on
> > should be applicable to other overlays as well (e.g., L2 over
> > L2). This is (hopefully) captured in the proposed charter.

> This worries me. It is going to be difficult to avoid getting into
> a situation of boiling oceans here, and we need to make sure we
> start in the right place. That said I think that if a simple general
> solution can be designed as opposed to yet another application
> specific encapsulation, that would be a great service to the industry
> over the long term.

Developing a general solution is exactly the thinking. There is
absolutely no intention at all to boil the ocean. That is the last
thing I want to be involved in.

But note that we have the challenge that if we narrowly say we are
doing just L2 over L3, we get sent over to L2VPN right away. Or, if we
say just L3 over L3, L3VPN applies.

Personally (though I'm probably not speaking for everyone) I'd like to
see this effort explicitly support both (and only) L2 over L3 and L3
over L3. I think that most workloads today (and going forward) are
entirely IP based. For those, why not just dispense with L2 and assume
all traffic will be IP?

But there will still be some workloads that do depend on L2
communication. For those, L2 over L3 needs to be supported.

But IMO a common approach should be able to support both.

Clearly there will be some differences in the control plane (e.g.,
different address families) but most of what is needed for either
solution can be the same. And of course packet encapsulations would
also depend on the specific technology.

So when I generalize about which layers to use, I'm not at all looking
to do "anything over anything". But I don't want to develop technology
that isn't easily extendible or reusable by those that have a need to
do so (if justified).

But such extensibility would be out of scope for this WG.

> > This WG will develop an approach to multi-tenancy that does not rely
> > on any underlying L2 mechanisms to support multi-tenancy. In
> > particular, the WG will develop an approach where multitenancy is
> > provided at the IP layer using an encapsulation header that resides
> > above IP. This effort is explicitly intended to leverage the interest
> > in L3 overlay approaches as exemplified by VXLAN
> > (draft-mahalingam-dutt-dcops-vxlan-00.txt) and NVGRE
> > (draft-sridharan-virtualization-nvgre-00.txt).

> The WG will need to consider the operations that it wants to
> encode in the "transport" layer. Encapsulation, delivery and
> multiplexing are compulsory, but are there others?

At this point I'm not aware of any others. The key one is
demultiplexing, i.e., identifying the tenant a packet belongs to.
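To make the demultiplexing point concrete, here is a small sketch of a VXLAN-style shim header sitting above the outer IP/UDP headers: 8 bytes whose only real job is carrying a 24-bit Virtual Network Identifier (VNI) that identifies the tenant. The field layout follows the VXLAN draft cited above; the helper functions and names are illustrative, not from any draft.

```python
import struct

VXLAN_FLAG_VALID_VNI = 0x08  # "I" flag: the VNI field is valid

def encap_header(vni: int) -> bytes:
    """Build the 8-byte shim header carrying a tenant's VNI."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # byte 0: flags; bytes 1-3: reserved;
    # bytes 4-6: VNI; byte 7: reserved
    return struct.pack("!BBH", VXLAN_FLAG_VALID_VNI, 0, 0) + \
           struct.pack("!I", vni << 8)

def decap_vni(header: bytes) -> int:
    """Demultiplex: recover the tenant VNI from a received header."""
    if not header[0] & VXLAN_FLAG_VALID_VNI:
        raise ValueError("VNI flag not set")
    return struct.unpack("!I", header[4:8])[0] >> 8
```

Everything else about the packet (delivery, addressing) is handled by the ordinary outer IP header; the shim exists only for tenant demultiplexing.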

> I am NOT proposing to push MPLS here, but think for a little
> while about the subtlety of that small header which encodes
> not (encap + delivery + mux), but a set of opaque
> instructions agreed between peers. So think very carefully
> about what you want hard coded and self describing and
> what flexibility you want to provide.

Would welcome some further discussion or pointers as to what in MPLS
you are specifically referring to.

> > A second work area is in the control plane, which allows an ingress
> > node to map the "inner" (tenant VM) address into an "outer"
> > (underlying transport network) address in order to tunnel a packet
> > across the data center. We propose to develop two control planes. One
> > control plane will use a learning mechanism similar to IEEE 802.1D
> > learning, and could be appropriate for smaller data centers. A second,
> > more scalable control plane would be aimed at large sites, capable of
> > scaling to hundreds of thousands of nodes.

> The WG clearly needs to solve both problems, but I think that
> it is too early to say whether you need two control planes or not
> for scaling. However the concept of mandating that the encapsulation
> layer be decoupled from the control protocol adds significantly
> to the utility of the encapsulation. The WG needs to bear in mind
> that there may in the long run be many reasons to create
> additional control protocols besides scaling.

At least among the folk I have talked to who are interested in this
work, it is generally recognized that just using "learning" to
distribute address mappings is not reliable, predictable, or robust
enough for very large data centers. One of
the goals here is scaling to at least 100K physical machines, and if
we assume 10 VMs per physical machine (a low estimate going forward),
that's a million edge VMs.

Can we make a learning based approach scale to that size? Some folk
will probably argue yes, but others will say "maybe, but why bother
when we know we can do better."

> > Both control planes will
> > need to handle the case of VMs moving around the network in a dynamic
> > fashion, meaning that they will need to support tunnel endpoints
> > registering and deregistering mappings as VMs change location and
> > ensuring that out-of-date mapping tables are only used for short
> > periods of time. Finally, the second control plane must also be
> > applicable to geographically dispersed data centers.

> I think that we need to start by figuring out the properties that are
> needed and then figure out whether we need one, two, or some other
> number of control protocols.

I would hope we don't need more than two!
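Whatever the number of control planes, the charter text above implies a common core: tunnel endpoints register and deregister inner-to-outer mappings as VMs move, and out-of-date mappings must only be usable for short periods. A minimal sketch of such a table (class name, address strings, and the 30-second lifetime are all illustrative assumptions, not from any draft):

```python
import time

class MappingTable:
    """Inner-address -> outer-address table with bounded staleness.

    Registered entries expire after `lifetime` seconds, which bounds
    how long a stale mapping can be used after a VM changes location.
    """

    def __init__(self, lifetime: float = 30.0):
        self.lifetime = lifetime
        self._entries = {}  # inner addr -> (outer addr, expiry time)

    def register(self, inner: str, outer: str) -> None:
        # Called by a tunnel endpoint when a VM arrives behind it.
        self._entries[inner] = (outer, time.monotonic() + self.lifetime)

    def deregister(self, inner: str) -> None:
        # Called when a VM leaves; missing entries are ignored.
        self._entries.pop(inner, None)

    def lookup(self, inner: str):
        entry = self._entries.get(inner)
        if entry is None:
            return None
        outer, expiry = entry
        if time.monotonic() > expiry:   # stale: the VM may have moved
            del self._entries[inner]
            return None
        return outer
```

A learning-based control plane and a directory-based one differ mainly in how `register` gets invoked (observed from data packets vs. pushed explicitly); the table and expiry behavior could be shared.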

> > The specific deliverables for this group include:
> >
> > 1) Finalize and publish the overall problem statement as an
> > Informational RFC (basis:
> > draft-narten-nvo3-overlay-problem-statement-01.txt)
> OK

> However consider producing a framework next so that
> work can proceed in parallel on the next topics with the
> framework acting as glue to hold the independent components
> together.

OK.

> > 2) Develop requirements and desirable properties for any encapsulation
> > format, and identify suitable encapsulations. Given the number of
> > already existing encapsulation formats, it is not an explicit goal of
> > this effort to choose exactly one format or to develop a new one.
> OK. When we get here we will know whether we use one or
> more existing or a new one. We need to work really hard to
> make sure the decision here is objective.

Yes.

> > 3) Produce a Standards Track control plane document that specifies how
> > to build mapping tables using a "learning" approach. This document is
> > expected to be short, as the algorithm itself will use a mechanism
> > similar to IEEE 802.1D learning.
> This is not how we should define this in the charter "similar to
> 802.1D" sets all sorts of expectations concerning the design
> and takes us into all sorts of territory we do not want to go to.

> At this stage only the first sentence is needed.

OK. But when we talk about learning, it is 802.1D learning that
defines the approach. If we just say "learning", do folk generally
know what that means?
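For concreteness, a sketch of what "802.1D-style learning" means in this setting: an endpoint learns the inner-source-to-outer-source mapping from each packet it decapsulates, exactly as a bridge learns MAC-to-port bindings, and unknown inner destinations are flooded to all endpoints in the virtual network. This is purely an illustration of the concept, not a proposed design; all names are hypothetical.

```python
class LearningEndpoint:
    """Flood-and-learn applied to overlay address mappings."""

    def __init__(self):
        self.mappings = {}  # inner address -> outer endpoint address

    def on_decap(self, inner_src: str, outer_src: str) -> None:
        # Learn from received traffic, as an 802.1D bridge learns
        # MAC -> port: the inner source is reachable via outer_src.
        self.mappings[inner_src] = outer_src

    def outer_for(self, inner_dst: str):
        # Known destination: unicast to its endpoint.
        # None means the caller must flood to all endpoints in the VN.
        return self.mappings.get(inner_dst)
```

The scaling concern above is visible even in this sketch: every unknown destination triggers a flood, and correctness after a VM move depends on new traffic overwriting the stale entry.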

> >
> > 4) Develop requirements (and later a Standards Track protocol) for a
> > more scalable control plane for managing and distributing the mappings
> > of "inner" to "outer" addresses. We will develop a reusable framework
> > suitable for use by any mapping function in which there is a need to
> > map "inner" to outer addresses. Starting point:
> > draft-kreeger-nvo3-overlay-cp-00.txt
> Conceptually the first sentence is correct, although you assume a
> lot by talking about inner and outer addresses.
> The WG will decide the starting point when it is formed.

> For the purposes of going forward we need to talk about scalability
> in the control plane, but I am not sure exactly how much solution
> oriented detail is appropriate at this stage.

FYI, draft-kreeger-nvo3-overlay-cp-00.txt says:

   5.   Highly scalable.  This means scaling to hundreds of thousands of
        OBPs and several million VNs within a single administrative
        domain.  As the number of OBPs and/or VNs within a data center
        grows, the protocol overhead at any one OBP should not increase
        significantly.

with OBP and VNs defined as:

   OBP:  Overlay Boundary Point.  This is a network entity that is on
      the edge boundary of the overlay.  It performs encapsulation to
      send packets to other OBPs across an Underlying Network for
      decapsulation.  An OBP could be implemented as part of a virtual
      switch within a hypervisor, a physical switch or router, a Network
      Service Appliance or even be embedded within an End Station.


   VN:  Virtual Network.  This is one instance of a virtual overlay
      network.  Two Virtual Networks are isolated from one another and
      may use overlapping addresses.

Thomas