Re: [nvo3] Draft NVO3 WG Charter

Thomas Narten <narten@us.ibm.com> Fri, 17 February 2012 21:59 UTC

Message-Id: <201202172159.q1HLx06B030233@cichlid.raleigh.ibm.com>
To: stbryant@cisco.com
In-reply-to: <4F3EA613.5040202@cisco.com>
References: <201202171451.q1HEptR3027370@cichlid.raleigh.ibm.com> <4F3EA613.5040202@cisco.com>
Comments: In-reply-to Stewart Bryant <stbryant@cisco.com> message dated "Fri, 17 Feb 2012 19:10:11 +0000."
Date: Fri, 17 Feb 2012 16:58:59 -0500
From: Thomas Narten <narten@us.ibm.com>
Cc: nvo3@ietf.org
Subject: Re: [nvo3] Draft NVO3 WG Charter

Stewart Bryant <stbryant@cisco.com> writes:

> On 17/02/2012 14:51, Thomas Narten wrote:
> > Below is a draft charter for this effort. One detail is that we
> > started out calling this effort NVO3 (Network Virtualization Over L3),
> > but have subsequently realized that we should not focus on just "over
> > L3". One goal of this effort is to develop an overlay standard that
> > works over L3, but we do not want to restrict ourselves only to "over
> > L3". The framework and architecture that we are proposing to work on
> > should be applicable to other overlays as well (e.g., L2 over
> > L2). This is (hopefully) captured in the proposed charter.

> This worries me. It is going to be difficult to avoid getting into
> a situation of boiling oceans here, and we need to make sure we
> start in the right place. That said I think that if a simple general
> solution can be designed as opposed to yet another application
> specific encapsulation, that would be a great service to the industry
> over the long term.

Developing a general solution is exactly the thinking. There is
absolutely no intention at all to boil the ocean. That is the last
thing I want to be involved in.

But note that we have the challenge that if we narrowly say we are
doing just L2 over L3, we get sent over to L2VPN right away. Or, if we
say just L3 over L3, L3VPN applies.

Personally (though I'm probably not speaking for everyone) I'd like to
see this effort explicitly support both (and only) L2 over L3 and L3
over L3. I think that most workloads today (and going forward) are
entirely IP based. For those, why not just dispense with L2 and assume
all traffic will be IP?

But there will still be some workloads that do depend on L2
communication. For those, L2 over L3 needs to be supported.

But IMO a common approach should be able to support both.

Clearly there will be some differences in the control plane (e.g.,
different address families) but most of what is needed for either
solution can be the same. And of course packet encapsulations would
also depend on the specific technology.

So when I generalize about which layers to use, I'm not at all looking
to do "anything over anything". But I don't want to develop technology
that isn't easily extendible or reusable by those that have a need to
do so (if justified).

But such extensibility would be out of scope for this WG.

> > This WG will develop an approach to multi-tenancy that does not rely
> > on any underlying L2 mechanisms to support multi-tenancy. In
> > particular, the WG will develop an approach where multitenancy is
> > provided at the IP layer using an encapsulation header that resides
> > above IP. This effort is explicitly intended to leverage the interest
> > in L3 overlay approaches as exemplified by VXLAN
> > (draft-mahalingam-dutt-dcops-vxlan-00.txt) and NVGRE
> > (draft-sridharan-virtualization-nvgre-00.txt).

> The WG will need to consider the operations that it wants to
> encode in the "transport" layer. Encapsulation, delivery and
> multiplexing are compulsory, but are there others?

At this point I'm not aware of any others. The key one is
demultiplexing, i.e., identifying the tenant a packet belongs to.
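To make the demultiplexing point concrete, here is a small sketch of a VXLAN-style shim header sitting above the outer IP/UDP headers: 8 bytes whose only real job is carrying a 24-bit Virtual Network Identifier (VNI) that identifies the tenant. The field layout follows the VXLAN draft cited above; the helper functions and names are illustrative, not from any draft.

```python
import struct

VXLAN_FLAG_VALID_VNI = 0x08  # "I" flag: the VNI field is valid

def encap_header(vni: int) -> bytes:
    """Build the 8-byte shim header carrying a tenant's VNI."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # byte 0: flags; bytes 1-3: reserved;
    # bytes 4-6: VNI; byte 7: reserved
    return struct.pack("!BBH", VXLAN_FLAG_VALID_VNI, 0, 0) + \
           struct.pack("!I", vni << 8)

def decap_vni(header: bytes) -> int:
    """Demultiplex: recover the tenant VNI from a received header."""
    if not header[0] & VXLAN_FLAG_VALID_VNI:
        raise ValueError("VNI flag not set")
    return struct.unpack("!I", header[4:8])[0] >> 8
```

Everything else about the packet (delivery, addressing) is handled by the ordinary outer IP header; the shim exists only for tenant demultiplexing.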

> I am NOT proposing to push MPLS here, but think for a little
> while about the subtlety of that small header which encodes
> not (encap + delivery + mux), but a set of opaque
> instructions agreed between peers. So think very carefully
> about what you want hard coded and self describing and
> what flexibility you want to provide.

Would welcome some further discussion or pointers as to what in MPLS
you are specifically referring to.

> > A second work area is in the control plane, which allows an ingress
> > node to map the "inner" (tenant VM) address into an "outer"
> > (underlying transport network) address in order to tunnel a packet
> > across the data center. We propose to develop two control planes. One
> > control plane will use a learning mechanism similar to IEEE 802.1D
> > learning, and could be appropriate for smaller data centers. A second,
> > more scalable control plane would be aimed at large sites, capable of
> > scaling to hundreds of thousands of nodes.

> The WG clearly needs to solve both problems, but I think that
> it is too early to say whether you need two control planes or not
> for scaling. However the concept of mandating that the encapsulation
> layer be decoupled from the control protocol adds significantly
> to the utility of the encapsulation. The WG needs to bear in mind
> that there may in the long run be many reasons to create
> additional control protocols besides scaling.

At least among the folk I have talked to who are interested in this
work, it is generally recognized that just using "learning" to
distribute address mappings is not reliable, predictable, or robust
enough for very large data centers. One of
the goals here is scaling to at least 100K physical machines, and if
we assume 10 VMs per physical machine (a low estimate going forward),
that's a million edge VMs.

Can we make a learning based approach scale to that size? Some folk
will probably argue yes, but others will say "maybe, but why bother
when we know we can do better."

> > Both control planes will
> > need to handle the case of VMs moving around the network in a dynamic
> > fashion, meaning that they will need to support tunnel endpoints
> > registering and deregistering mappings as VMs change location and
> > ensuring that out-of-date mapping tables are only used for short
> > periods of time. Finally, the second control plane must also be
> > applicable to geographically dispersed data centers.

> I think that we need to start by figuring out the properties that are
> needed and then figure out whether we need one, two, or some other
> number of control protocols.

I would hope we don't need more than two!
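Whatever the number of control planes, the charter text above implies a common core: tunnel endpoints register and deregister inner-to-outer mappings as VMs move, and out-of-date mappings must only be usable for short periods. A minimal sketch of such a table (class name, address strings, and the 30-second lifetime are all illustrative assumptions, not from any draft):

```python
import time

class MappingTable:
    """Inner-address -> outer-address table with bounded staleness.

    Registered entries expire after `lifetime` seconds, which bounds
    how long a stale mapping can be used after a VM changes location.
    """

    def __init__(self, lifetime: float = 30.0):
        self.lifetime = lifetime
        self._entries = {}  # inner addr -> (outer addr, expiry time)

    def register(self, inner: str, outer: str) -> None:
        # Called by a tunnel endpoint when a VM arrives behind it.
        self._entries[inner] = (outer, time.monotonic() + self.lifetime)

    def deregister(self, inner: str) -> None:
        # Called when a VM leaves; missing entries are ignored.
        self._entries.pop(inner, None)

    def lookup(self, inner: str):
        entry = self._entries.get(inner)
        if entry is None:
            return None
        outer, expiry = entry
        if time.monotonic() > expiry:   # stale: the VM may have moved
            del self._entries[inner]
            return None
        return outer
```

A learning-based control plane and a directory-based one differ mainly in how `register` gets invoked (observed from data packets vs. pushed explicitly); the table and expiry behavior could be shared.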

> > The specific deliverables for this group include:
> >
> > 1) Finalize and publish the overall problem statement as an
> > Informational RFC (basis:
> > draft-narten-nvo3-overlay-problem-statement-01.txt)
> OK

> However consider producing a framework next so that
> work can proceed in parallel on the next topics with the
> framework acting as glue to hold the independent components
> together.

OK.

> > 2) Develop requirements and desirable properties for any encapsulation
> > format, and identify suitable encapsulations. Given the number of
> > already existing encapsulation formats, it is not an explicit goal of
> > this effort to choose exactly one format or to develop a new one.
> OK. When we get here we will know whether we use one or
> more existing or a new one. We need to work really hard to
> make sure the decision here is objective.

Yes.

> > 3) Produce a Standards Track control plane document that specifies how
> > to build mapping tables using a "learning" approach. This document is
> > expected to be short, as the algorithm itself will use a mechanism
> > similar to IEEE 802.1D learning.
> This is not how we should define this in the charter "similar to
> 802.1D" sets all sorts of expectations concerning the design
> and takes us into all sorts of territory we do not want to go to.

> At this stage only the first sentence is needed.

OK. But when we talk about learning, it is 802.1D learning that
defines the approach. If we just say "learning", do folk generally
know what that means?
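For concreteness, a sketch of what "802.1D-style learning" means in this setting: an endpoint learns the inner-source-to-outer-source mapping from each packet it decapsulates, exactly as a bridge learns MAC-to-port bindings, and unknown inner destinations are flooded to all endpoints in the virtual network. This is purely an illustration of the concept, not a proposed design; all names are hypothetical.

```python
class LearningEndpoint:
    """Flood-and-learn applied to overlay address mappings."""

    def __init__(self):
        self.mappings = {}  # inner address -> outer endpoint address

    def on_decap(self, inner_src: str, outer_src: str) -> None:
        # Learn from received traffic, as an 802.1D bridge learns
        # MAC -> port: the inner source is reachable via outer_src.
        self.mappings[inner_src] = outer_src

    def outer_for(self, inner_dst: str):
        # Known destination: unicast to its endpoint.
        # None means the caller must flood to all endpoints in the VN.
        return self.mappings.get(inner_dst)
```

The scaling concern above is visible even in this sketch: every unknown destination triggers a flood, and correctness after a VM move depends on new traffic overwriting the stale entry.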

> >
> > 4) Develop requirements (and later a Standards Track protocol) for a
> > more scalable control plane for managing and distributing the mappings
> > of "inner" to "outer" addresses. We will develop a reusable framework
> > suitable for use by any mapping function in which there is a need to
> > map "inner" to outer addresses. Starting point:
> > draft-kreeger-nvo3-overlay-cp-00.txt
> Conceptually the first sentence is correct, although you assume a
> lot by talking about inner and outer addresses.
> The WG will decide the starting point when it is formed.

> For the purposes of going forward we need to talk about scalability
> in the control plane, but I am not sure exactly how much solution
> oriented detail is appropriate at this stage.

FYI, draft-kreeger-nvo3-overlay-cp-00.txt says:

   5.   Highly scalable.  This means scaling to hundreds of thousands of
        OBPs and several million VNs within a single administrative
        domain.  As the number of OBPs and/or VNs within a data center
        grows, the protocol overhead at any one OBP should not increase
        significantly.

with OBP and VNs defined as:

   OBP:  Overlay Boundary Point.  This is a network entity that is on
      the edge boundary of the overlay.  It performs encapsulation to
      send packets to other OBPs across an Underlying Network for
      decapsulation.  An OBP could be implemented as part of a virtual
      switch within a hypervisor, a physical switch or router, a Network
      Service Appliance or even be embedded within an End Station.


   VN:  Virtual Network.  This is one instance of a virtual overlay
      network.  Two Virtual Networks are isolated from one another and
      may use overlapping addresses.

Thomas