Re: [nvo3] Review of draft-ietf-nvo3-framework-02

"LASSERRE, MARC (MARC)" <marc.lasserre@alcatel-lucent.com> Thu, 28 February 2013 16:02 UTC

From: "LASSERRE, MARC (MARC)" <marc.lasserre@alcatel-lucent.com>
To: Thomas Narten <narten@us.ibm.com>, "nvo3@ietf.org" <nvo3@ietf.org>
Date: Thu, 28 Feb 2013 17:02:55 +0100

Hi Thomas,

Thanks for your thorough review.
As far as the terminology clarifications go, I'm fine with them and can make the changes if the rest of the group agrees.

See my answers inline for the other comments.

Marc

> -----Original Message-----
> From: nvo3-bounces@ietf.org [mailto:nvo3-bounces@ietf.org] On
> Behalf Of Thomas Narten
> Sent: Friday, February 15, 2013 11:34 PM
> To: nvo3@ietf.org
> Subject: [nvo3] Review of draft-ietf-nvo3-framework-02
>
> Below is a detailed review of the framework document.  Pretty much all
> of these are editorial -- I don't think I really have any issue with
> the substance. But there are a lot of suggestions for clarifying text
> and tightening up the terminology, language, etc.
>
> Thomas
>
> High level: what happened to adding text describing the "oracle"
> model, where there was a clear separation of the NVE, the oracle (and
> the notion of a federated oracle), as well as separate protocols for
> the inter/intra-oracle control vs. server-to-NVE control? Per the
> September interim meeting, this needs to be added.

This is described in section 3.1.5.1. The term 'oracle' did not get used, but a note mentioning that a logically centralized controller is sometimes referred to as an 'oracle' can be added.

Section 3.1.5.2 describes the server-to-NVE control plane.

>
>        [OVCPREQ] describes the requirements for a control
> plane protocol
>        required by overlay border nodes to exchange overlay mappings.
>
> The above document has been split into 2 documents. I think both
> should be referenced. Also, I think it would be good to not combine
> the two problem areas in the 2 drafts into one generic "control plane
> protocol" reference. They are very different problems with no overlap,
> and I think when folk talk about a "control plane", they are really
> referring to the problem in draft-kreeger-nvo3-hypervisor-nve-cp-00.txt.

The intent of the two sections 3.1.5.1 and 3.1.5.2 was to describe them separately.
A reference to both drafts will be added.

>
> >     1.2. General terminology
>
> Nit: On terminology, how about using consistent syntax/format of terms
> with abbreviation in parenthesis following the term itself, e.g.,
> something like
>
> OLD:
>
>        NVE: Network Virtualization Edge.
>
> NEW:
>       Network Virtualization Edge (NVE): ...
>
> Add the following (other documents use this term and we should define
> it here for use throughout NVO3):
>
>    Closed User Group (CUG): Another term for Virtual Network.
>

While this term is not used in the draft, I'll add a note in the VN definition.

>
> >        NVE: Network Virtualization Edge. It is a network
> entity that sits
> >        on the edge of the NVO3 network. It implements network
> >        virtualization functions that allow for L2 and/or L3 tenant
> >        separation and for hiding tenant addressing
> information (MAC and IP
> >        addresses). An NVE could be implemented as part of a
> virtual switch
> >        within a hypervisor, a physical switch or router, or
> a network
> >        service appliance.
>
> Could be improved. How about:
>
>        NVO3 Network: an overlay network that provides an L2 or L3
>        service to Tenant Systems over an L3 underlay network, using
>        the architecture and protocols as defined by the NVO3 Working
>        Group.
>
>        Network Virtualization Edge (NVE). An NVE is the network entity
>        that implements network virtualization functions and sits on
>        the boundary between an NVO3 network and an underlying network.
>        The network-facing side of the NVE uses the underlying L3
>        network to tunnel frames to and from other NVEs. The
>        server-facing side of the NVE sends and receives Ethernet
>        Frames to and from individual Tenant Systems.  An NVE could be
>        implemented as part of a virtual switch within a hypervisor, a
>        physical switch or router, a Network Service Appliance, or be
>        split across multiple devices.
>
>
> >        VN: Virtual Network. This is a virtual L2 or L3
> domain that belongs
> >        to a tenant.
>
> Better
>
>        Virtual Network (VN). A virtual network is a logical
>        abstraction of a physical network that provides network
>        services to a set of Tenant Systems.  To Tenant Systems, a
>        virtual network looks like a normal network (i.e., providing
>        unrestricted ethernet or L3 service), except that the only end
>        stations connected to the virtual network are those belonging
>        to a tenant's specific virtual network.
>
> >        VNI: Virtual Network Instance. This is one instance
> of a virtual
> >        overlay network. It refers to the state maintained
> for a given VN on
> >        a given NVE. Two Virtual Networks are isolated from
> one another and
> >        may use overlapping addresses.
>
> Better
>
>        Virtual Network Instance (VNI): A specific instance of a
>        Virtual Network.
>
> (no need to say more)
>
> >        Virtual Network Context or VN Context: Field that is
> part of the
> >        overlay encapsulation header which allows the
> encapsulated frame to
> >        be delivered to the appropriate virtual network
> endpoint by the
> >        egress NVE. The egress NVE uses this field to determine the
> >        appropriate virtual network context in which to
> process the packet.
> >        This field MAY be an explicit, unique (to the
> administrative domain)
> >        virtual network identifier (VNID) or MAY express the
> necessary
> >        context information in other ways (e.g., a locally
> significant
> >        identifier).
>
> Better:
>
>        Virtual Network Context (VN Context): Field in overlay
>        encapsulation header that identifies the specific VN the packet
>        belongs to. The egress NVE uses the VN Context to deliver the
>        packet to the correct Tenant System.  The VN Context can be a
>        locally significant identifier having meaning only in
>        conjunction with additional information, such as the
>        destination NVE address. Alternatively, the VN Context can have
>        broader scope, e.g., be unique across the entire NVO3 network.
>
>
> >        VNID:  Virtual Network Identifier. In the case where
> the VN context
> >        identifier has global significance, this is the ID
> value that is
> >        carried in each data packet in the overlay
> encapsulation that
> >        identifies the Virtual Network the packet belongs to.
>
> A VNID definition by itself doesn't seem all that helpful. (This term
> came from early on when some of us assumed that the Context ID always
> had non-local significance.)
>
> I think we may want to define some additional terms here. VNID by
> itself is not sufficient. Have a look at the terms "VN Alias", "VN
> Name" and "VN ID" in
> draft-kreeger-nvo3-hypervisor-nve-cp-00.txt. Those (or similar) terms
> should probably be moved into the framework document.
>
> >        Underlay or Underlying Network: This is the network
> that provides
> >        the connectivity between NVEs. The Underlying Network can be
> >        completely unaware of the overlay packets. Addresses
> within the
> >        Underlying Network are also referred to as "outer
> addresses" because
> >        they exist in the outer encapsulation. The
> Underlying Network can
> >        use a completely different protocol (and address
> family) from that
> >        of the overlay.
>
> We should say that for NVo3, the underlay is assumed to be IP
>
> Better:
>
>        Underlay or Underlying Network: the network that provides the
>        connectivity between NVEs and over which NVO3 packets are
>        tunneled. The Underlay Network does not need to be aware that
>        it is carrying NVO3 packets. Addresses on the Underlay Network
>        appear as "outer addresses" in encapsulated NVO3 packets. In
>        general, the Underlay Network can use a completely different
>        protocol (and address family) from that of the overlay. In the
>        case of NVO3, the underlay network will always be IP.
>
> >        Data Center (DC): A physical complex housing
> physical servers,
> >        network switches and routers, network service sppliances and
> >        networked storage. The purpose of a Data Center is
> to provide
> >        application, compute and/or storage services. One
> such service is
> >        virtualized infrastructure data center services,
> also known as
> >        Infrastructure as a Service.
>
> Should we add defn. for "network service appliance" ??
>
> >        Virtual Data Center or Virtual DC: A container for
> virtualized
> >        compute, storage and network services. Managed by a
> single tenant, a
> >        Virtual DC can contain multiple VNs and multiple
> Tenant Systems that
> >        are connected to one or more of these VNs.
>
> Tenant manages what is in the VDC. Network Admin manages all aspects
> of mapping the virtual components to physical components.
>
> Better:
>
>        Virtual Data Center (Virtual DC): A container for virtualized
>        compute, storage and network services. A Virtual DC is
>        associated with a single tenant, and can contain multiple VNs
>        and Tenant Systems connected to one or more of these VNs.
>
> >        VM: Virtual Machine. Several Virtual Machines can share the
> >        resources of a single physical computer server using
> the services of
> >        a Hypervisor (see below definition).
>
> Better (taken/adapted from RFC6820):
>
>    Virtual machine (VM):  A software implementation of a physical
>       machine that runs programs as if they were executing on a
>       physical, non-virtualized machine.  Applications (generally) do
>       not know they are running on a VM as opposed to running on a
>       "bare" host or server, though some systems provide a
>       paravirtualization environment that allows an operating system
>       or application to be aware of the presence of virtualization for
>       optimization purposes.
>
> >        Hypervisor: Server virtualization software running
> on a physical
> >        compute server that hosts Virtual Machines. The
> hypervisor provides
> >        shared compute/memory/storage and network
> connectivity to the VMs
> >        that it hosts. Hypervisors often embed a Virtual
> Switch (see below).
>
> Compute server is not defined and the term isn't used elsewhere in the
> document. How about:
>
>        Hypervisor: Software running on a server that allows multiple
>        VMs to run on the same physical server. The hypervisor provides
>        shared compute/memory/storage and network connectivity to the
>        VMs that it hosts. Hypervisors often embed a Virtual Switch
>        (see below).
>
> Also, add (for completeness)
>
>       Server: A physical end host machine that runs user
>       applications. A standalone (or "bare metal") server runs a
>       conventional operating system hosting a single tenant
>       application. A virtualized server runs a hypervisor supporting
>       one or more VMs.
>
>
> >        Virtual Switch: A function within a Hypervisor (typically
> >        implemented in software) that provides similar services to a
> >        physical Ethernet switch.  It switches Ethernet
> frames between VMs
> >        virtual NICs within the same physical server, or
> between a VM and a
> >        physical NIC card connecting the server to a
> physical Ethernet
> >        switch or router. It also enforces network isolation
> between VMs
> >        that should not communicate with each other.
>
> slightly better:
>
>        Virtual Switch (vSwitch): A function within a Hypervisor
>        (typically implemented in software) that provides similar
>        services to a physical Ethernet switch.  A vSwitch forwards
>        Ethernet frames between VMs running on the same server, or
>        between a VM and a physical NIC card connecting the server to a
>        physical Ethernet switch. A vSwitch also enforces network
>        isolation between VMs that by policy are not permitted to
>        communicate with each other (e.g., by honoring VLANs).
>
> >
> >        Tenant: In a DC, a tenant refers to a customer that
> could be an
> >        organization within an enterprise, or an enterprise
> with a set of DC
> >        compute, storage and network resources associated with it.
>
> Better:
>
>        Tenant: The customer using a virtual network and any associated
>        resources (e.g., compute, storage and network).  A tenant could
>        be an enterprise, a department, an organization within an
>        enterprise, etc.
>
>
> >        Tenant System: A physical or virtual system that can
> play the role
> >        of a host, or a forwarding element such as a router, switch,
> >        firewall, etc. It belongs to a single tenant and
> connects to one or
> >        more VNs of that tenant.
>
> Better:
>
>        Tenant System: A physical or virtual host associated
> with a specific
>        tenant.  A Tenant System can play the role of a host, or a
>        forwarding element such as a router, switch, firewall, etc. A
>        Tenant System is associated with a specific tenant and connects
>        to one or more of the tenant's VNs.
>
> >        End device: A physical system to which networking service is
> >        provided. Examples include hosts (e.g. server or
> server blade),
> >        storage systems (e.g., file servers, iSCSI storage
> systems), and
> >        network devices (e.g., firewall, load-balancer,
> IPSec gateway). An
> >        end device may include internal networking
> functionality that
> >        interconnects the device's components (e.g. virtual
> switches that
> >        interconnect VMs running on the same server). NVE
> functionality may
> >        be implemented as part of that internal networking.
>
> Better:
>
>         End device: A physical device that connects directly to the
>         data center Underlay Network. An End Device is administered by
>         the data center operator rather than a tenant and is part of
>         data center infrastructure. An End Device may implement NVO3
>         technology in support of NVO3 functions. Contrast with Tenant
>         System, which is only connected to a Virtual Network.
>         Examples include hosts (e.g. server or server blade), storage
>         systems (e.g., file servers, iSCSI storage systems), and
>         network devices (e.g., firewall, load-balancer, IPSec
>         gateway).
>
>
>
> >
> >        ELAN: MEF ELAN, multipoint to multipoint Ethernet service
>
> I'd suggest dropping these terms. They are barely used and are not
> critical to understanding the framework
>
> >
> >        EVPN: Ethernet VPN as defined in [EVPN]
>
> Remove. These terms are not used elsewhere in the document.
>
> >     1.3. DC network architecture
> >
> >        A generic architecture for Data Centers is depicted
> in Figure 1:
> >
> >                                     ,---------.
> >                                   ,'           `.
> >                                  (  IP/MPLS WAN )
> >                                   `.           ,'
> >                                     `-+------+'
> >                                  +--+--+   +-+---+
> >                                  |DC GW|+-+|DC GW|
> >                                  +-+---+   +-----+
> >                                     |       /
> >                                     .--. .--.
> >                                   (    '    '.--.
> >                                 .-.' Intra-DC     '
> >                                (     network      )
> >                                 (             .'-'
> >                                  '--'._.'.    )\ \
> >                                  / /     '--'  \ \
> >                                 / /      | |    \ \
> >                           +---+--+   +-`.+--+  +--+----+
> >                           | ToR  |   | ToR  |  |  ToR  |
> >                           +-+--`.+   +-+-`.-+  +-+--+--+
> >                            /     \    /    \   /       \
> >                         __/_      \  /      \ /_       _\__
> >                  '--------'   '--------'   '--------'   '--------'
> >                  :  End   :   :  End   :   :  End   :   :  End   :
> >                  : Device :   : Device :   : Device :   : Device :
> >                  '--------'   '--------'   '--------'   '--------'
> >
> >                  Figure 1 : A Generic Architecture for Data Centers
>
> The above is not necessarily what a DC looks like. ARMD went through
> this already, and there are many different data center network types.
> For example, the above doesn't allow for a chassis with an embedded
> switch between the End Device and ToR.

This is a logical view and does not prohibit intermediate devices from being used.

>
> This picture should be generalized to use terms like "access layer",
> and "aggregation layer" rather than specific terms like ToR. Have a
> look at RFC 6820.

In DC networks, this is a well-known and accepted term. I do not think that using generic terms helps.
But if there is a consensus, I'll be happy to do so.

>
> Feel free to grab text or just point there.
>
> >        An example of multi-tier DC network architecture is
> presented in
> >        this figure. It provides a view of physical
> components inside a DC.
>
> s/this figure/Figure 1/
>
> >        A cloud network is composed of intra-Data Center
> (DC) networks and
> >        network services, and inter-DC network and network
> connectivity
> >        services. Depending upon the scale, DC distribution,
> operations
> >        model, Capex and Opex aspects, DC networking
> elements can act as
> >        strict L2 switches and/or provide IP routing
> capabilities, including
> >        service virtualization.
>
> Do we really need to use the term "cloud network" and say what a
> "cloud network" is? The term does not seem to be used elsewhere in the
> document...

Agreed. It should read "A Data Center is usually composed of..."

>
> >        In some DC architectures, it is possible that some
> tier layers
> >        provide L2 and/or L3 services, are collapsed, and
> that Internet
> >        connectivity, inter-DC connectivity and VPN support
> are handled by a
> >        smaller number of nodes. Nevertheless, one can
> assume that the
> >        functional blocks fit in the architecture above.
>
> Per above, see how the ARMD document handled this.

Wouldn't a reference to the ARMD RFC6820 be sufficient?

>
> >     1.4. Tenant networking view
> >
> >        The DC network architecture is used to provide L2
> and/or L3 service
> >        connectivity to each tenant. An example is depicted
> in Figure 2:
> >
> >
> >                          +----- L3 Infrastructure ----+
> >                          |                            |
> >                       ,--+--.                      ,--+--.
> >                 .....( Rtr1  )......              ( Rtr2  )
> >                 |     `-----'      |               `-----'
> >                 |     Tenant1      |LAN12      Tenant1|
> >                 |LAN11         ....|........          |LAN13
> >           ..............        |        |     ..............
> >              |        |         |        |       |        |
> >             ,-.      ,-.       ,-.      ,-.     ,-.      ,-.
> >            (VM )....(VM )     (VM )... (VM )   (VM )....(VM )
> >             `-'      `-'       `-'      `-'     `-'      `-'
> >
> >             Figure 2 : Logical Service connectivity for a
> single tenant
> >
> >        In this example, one or more L3 contexts and one or
> more LANs (e.g.,
> >        one per application type) are assigned for DC tenant1.
>
> This picture is unclear. what does "tenant 1" cover in the picture?
> What is an "L3 context"? I would assume it needs to refer to specific
> VMs too...

The intent was to show what the VMs belonging to a specific tenant are logically attached to, and how they are connected to each other.

>
> >        For a multi-tenant DC, a virtualized version of this
> type of service
> >        connectivity needs to be provided for each tenant by
> the Network
> >        Virtualization solution.
>
> I would assume  NVO3 only cares about the multi-tenant case. Which
> makes me wonder what the previous example is supposed to show. A
> single tenant case? What does that mean?

Agreed, but a diagram with multiple tenants would make the picture even less clear.

>
> >
> >     2. Reference Models
> >
> >     2.1. Generic Reference Model
> >
> >        The following diagram shows a DC reference model for network
> >        virtualization using L3 (IP/MPLS) overlays where
> NVEs provide a
> >        logical interconnect between Tenant Systems that belong to a
> >        specific tenant network.
> >
>
> Should the above say "that belong to a specific tenant's Virtual
> Network"?

Agreed.

>
> >
> >              +--------+
> +--------+
> >              | Tenant +--+
> +----| Tenant |
> >              | System |  |                           (')
> | System |
> >              +--------+  |    ...................   (   )
> +--------+
> >                          |  +-+--+           +--+-+  (_)
> >                          |  | NV |           | NV |   |
> >                          +--|Edge|           |Edge|---+
> >                             +-+--+           +--+-+
> >                             / .                 .
> >                            /  .   L3 Overlay +--+-++--------+
> >              +--------+   /   .    Network   | NV || Tenant |
> >              | Tenant +--+    .              |Edge|| System |
> >              | System |       .    +----+    +--+-++--------+
> >              +--------+       .....| NV |........
> >                                    |Edge|
> >                                    +----+
> >                                      |
> >                                      |
> >                            =====================
> >                              |               |
> >                          +--------+      +--------+
> >                          | Tenant |      | Tenant |
> >                          | System |      | System |
> >                          +--------+      +--------+
>
> s/NV Edge/NVE/ for consistency
>
> >
> >           Figure 3 : Generic reference model for DC network
> virtualization
> >                            over a Layer3 infrastructure
> >
> >        A Tenant System can be attached to a Network
> Virtualization Edge
> >        (NVE) node in several ways:
>
> Each of these ways should be clearly labeled in Figure3.
>
> >
> >          - locally, by being co-located in the same device
>
> add something like: (e.g., as part of the hypervisor)
>
> >
> >          - remotely, via a point-to-point connection or a
> switched network
> >          (e.g., Ethernet)
> >
> >        When an NVE is local, the state of Tenant Systems
> can be provided
> >        without protocol assistance. For instance, the
> operational status of
> >        a VM can be communicated via a local API. When an
> NVE is remote, the
> >        state of Tenant Systems needs to be exchanged via a
> data or control
> >        plane protocol, or via a management entity.
>
> Better:
>
>        When an NVE is co-located with a Tenant System, communication
>        and synchronization between the TS and NVE takes place via
>        software (e.g., using an internal API). When an NVE and TS are
>        separated by an access link, interaction and synchronization
>        between an NVE and TS require an explicit data plane, control
>        plane, or management protocol.

Fine either way.
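
To make the co-located case concrete, here is a rough, non-normative
sketch (all names are mine, not from the draft) of a hypervisor telling
a co-located NVE about a Tenant System through an internal software
API; a remote NVE would have to learn the same state via a data plane,
control plane or management protocol instead:

    # Non-normative sketch: hypervisor-to-NVE notification when the NVE
    # and the Tenant System are co-located. Names are illustrative only.

    class LocalNVE:
        def __init__(self):
            # VAP identifier -> (virtual network name, tenant system MAC)
            self.vaps = {}

        def tenant_system_attached(self, vap_id, vn_name, ts_mac):
            # Called directly by the hypervisor (internal API, no protocol).
            self.vaps[vap_id] = (vn_name, ts_mac)

        def tenant_system_detached(self, vap_id):
            self.vaps.pop(vap_id, None)

    nve = LocalNVE()
    # The hypervisor reports a VM coming up on virtual NIC "vnic3".
    nve.tenant_system_attached("vnic3", "tenant1-vn-blue", "52:54:00:12:34:56")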

>
> >
> >        The functional components in Figure 3 do not necessarily map
> >        directly with the physical components described in Figure 1.
> >
> >        For example, an End Device can be a server blade
> with VMs and
> >        virtual switch, i.e. the VM is the Tenant System and the NVE
> >        functions may be performed by the virtual switch and/or the
> >        hypervisor. In this case, the Tenant System and NVE
> function are co-
> >        located.
> >
> >        Another example is the case where an End Device can
> be a traditional
> >        physical server (no VMs, no virtual switch), i.e.
> the server is the
> >        Tenant System and the NVE function may be performed
> by the ToR.
>
> We should not use the term "ToR" here. We should be more generic and
> say something like the "attached switch" or "access switch".

See above comment.

>
> >
> >        The NVE implements network virtualization functions
> that allow for
> >        L2 and/or L3 tenant separation and for hiding tenant
> addressing
> >        information (MAC and IP addresses), tenant-related
> control plane
> >        activity and service contexts from the underlay nodes.
>
> We should probably  define "tenant separation" earlier in the
> document and then
> just refer to that definition. Add something like the following to the
> definitions?:
>
>     Tenant Separation: Tenant Separation refers to isolating traffic
>     of different tenants so that traffic from one tenant is not
>     visible to or delivered to another tenant, except when allowed by
>     policy. Tenant Separation also refers to address space separation,
>     whereby different tenants use the same address space for different
>     virtual networks without conflict.
>

Why not.

>
> >     2.2. NVE Reference Model
>
> >
> >        One or more VNIs can be instantiated on an NVE.
> Tenant Systems
> >        interface with a corresponding VNI via a Virtual Access Point
> >        (VAP).
>
> Define VAP in the terminology section.
>
> >        An overlay module that provides tunneling overlay
> functions (e.g.,
> >        encapsulation and decapsulation of tenant traffic
> from/to the tenant
> >        forwarding instance, tenant identification and
> mapping, etc), as
> >        described in figure 4:
>
> Doesn't quite parse. Better:
>
>         An overlay module on the NVE provides tunneling overlay
>         functions (e.g., encapsulation and decapsulation of tenant
>         traffic from/to the tenant forwarding instance, tenant
>         identification and mapping, etc), as described in figure 4:
>
> >
> >                           +------- L3 Network ------+
> >                           |                         |
> >                           |       Tunnel Overlay    |
> >              +------------+---------+
> +---------+------------+
> >              | +----------+-------+ |       |
> +---------+--------+ |
> >              | |  Overlay Module  | |       | |  Overlay
> Module  | |
> >              | +---------+--------+ |       |
> +---------+--------+ |
> >              |           |VN context|       | VN context|
>        |
> >              |           |          |       |           |
>        |
> >              |  +--------+-------+  |       |
> +--------+-------+  |
> >              |  | |VNI|   .  |VNI|  |       |  | |VNI|   .
> |VNI|  |
> >         NVE1 |  +-+------------+-+  |       |
> +-+-----------+--+  | NVE2
> >              |    |   VAPs     |    |       |    |    VAPs
>  |     |
> >              +----+------------+----+
> +----+-----------+-----+
> >                   |            |                 |           |
> >
> -------+------------+-----------------+-----------+-------
> >                   |            |     Tenant      |           |
> >                   |            |   Service IF    |           |
> >                  Tenant Systems                 Tenant Systems
> >
> >                   Figure 4 : Generic reference model for NV Edge
> >
> >        Note that some NVE functions (e.g., data plane and
> control plane
> >        functions) may reside in one device or may be
> implemented separately
> >        in different devices. For example, the NVE
> functionality could
> >        reside solely on the End Devices, or be distributed
> between the End
> >        Devices and the ToRs. In the latter case we say that
> the End Device
> >        NVE component acts as the NVE Spoke, and ToRs act as
> NVE hubs.
> >        Tenant Systems will interface with VNIs maintained
> on the NVE
> >        spokes, and VNIs maintained on the NVE spokes will
> interface with
> >        VNIs maintained on the NVE hubs.
>
> Don't always assume "ToR".
>
> Also, in Figure 4, "VN context' is listed as if it were a component or
> something. But VN context is  previously defined as a field in the
> overlay header. Is this something different? If so, we should use a
> different term (to avoid confusion).
>
> >     2.3. NVE Service Types
> >
> >        NVE components may be used to provide different
> types of virtualized
> >        network services. This section defines the service types and
> >        associated attributes. Note that an NVE may be
> capable of providing
> >        both L2 and L3 services.
>
> >     2.3.1. L2 NVE providing Ethernet LAN-like service
> >
> >        L2 NVE implements Ethernet LAN emulation (ELAN), an
> Ethernet based
>
> drop the "(ELAN)" abbreviation. (its not needed and is not used
> anywhere else).

Ok

>
> >        multipoint service where the Tenant Systems appear to be
> >        interconnected by a LAN environment over a set of L3
> tunnels. It
> >        provides per tenant virtual switching instance with
> MAC addressing
> >        isolation and L3 (IP/MPLS) tunnel encapsulation
> across the underlay.
> >
> >     2.3.2. L3 NVE providing IP/VRF-like service
> >
> >        Virtualized IP routing and forwarding is similar
> from a service
> >        definition perspective with IETF IP VPN (e.g.,
> BGP/MPLS IPVPN
> >        [RFC4364] and IPsec VPNs). It provides per tenant
> routing instance
>
> should provide an RFC reference for IPsec VPNs too...
>
> s/per tenant/per-tenant/
>
> >        with addressing isolation and L3 (IP/MPLS) tunnel
> encapsulation
> >        across the underlay.
> >
> >     3. Functional components
> >
> >        This section decomposes the Network Virtualization
> architecture into
> >        functional components described in Figure 4 to make
> it easier to
> >        discuss solution options for these components.
> >
> >     3.1. Service Virtualization Components
> >
> >     3.1.1. Virtual Access Points (VAPs)
> >
> >        Tenant Systems are connected to the VNI Instance
> through Virtual
> >        Access Points (VAPs).
> >
> >        The VAPs can be physical ports or virtual ports
> identified through
> >        logical interface identifiers (e.g., VLAN ID,
> internal vSwitch
> >        Interface ID coonected to a VM).
>
> s/coonected/connected/
>
> >
> >     3.1.2. Virtual Network Instance (VNI)
> >
> >        The VNI represents a set of configuration attributes
> defining access
> >        and tunnel policies and (L2 and/or L3) forwarding functions.
> >        Per tenant FIB tables and control plane protocol
> instances are used
> >        to maintain separate private contexts between
> tenants. Hence tenants
> >        are free to use their own addressing schemes without
> concerns about
> >        address overlapping with other tenants.
>
> Not exactly. The VNI is a VN Instance. Implementing a VNI requires a
> bunch of stuff. Also, here I think it is better to talk about VNs than
> tenants.

In fact, this text is here to provide details about the "bunch of stuff" associated with a VNI.

>
> Also, in reading through the doc, we say over and over again that you
> get address space separation, etc. No need to repeat this all
> the time!

Will fix it.

>
> Better:
>
>         A VNI is a specific VN instance. Associated with each VNI is a
>         set of metadata necessary to implement the specific VN
>         service. For example, a per-VN forwarding or mapping table is
>         needed to deliver traffic to other members of the VN and to
>         ensure tenant separation between different VNs.
>
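
To illustrate the "bunch of stuff" mentioned above, a minimal,
non-normative sketch (field names are mine) of the per-VNI state an
NVE might keep:

    # Non-normative sketch of per-VNI state held by an NVE.

    class VNI:
        def __init__(self, vn_context):
            self.vn_context = vn_context  # VN Context used on the wire for this VN
            self.vaps = set()             # locally attached Virtual Access Points
            # Per-VN forwarding/mapping table: tenant address -> egress NVE
            # address. Keeping one table per VNI is what allows overlapping
            # tenant address spaces without conflict.
            self.mapping_table = {}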
> >     3.1.3. Overlay Modules and VN Context
> >
> >        Mechanisms for identifying each tenant service are
> required to allow
> >        the simultaneous overlay of multiple tenant services
> over the same
> >        underlay L3 network topology. In the data plane,
> each NVE, upon
> >        sending a tenant packet, must be able to encode the
> VN Context for
> >        the destination NVE in addition to the L3 tunnel
> information (e.g.,
> >        source IP address identifying the source NVE and the
> destination IP
> >        address identifying the destination NVE, or MPLS
> label). This allows
> >        the destination NVE to identify the tenant service
> instance and
> >        therefore appropriately process and forward the
> tenant packet.
> >
> >        The Overlay module provides tunneling overlay
> functions: tunnel
> >        initiation/termination, encapsulation/decapsulation
> of frames from
> >        VAPs/L3 Backbone and may provide for transit
> forwarding of IP
>
> s/L3 Backbone/L3 underlay
>
> >        traffic (e.g., transparent tunnel forwarding).
>
> What is this "transit forwarding"?
>
> >        In a multi-tenant context, the tunnel aggregates
> frames from/to
> >        different VNIs. Tenant identification and traffic
> demultiplexing are
> >        based on the VN Context identifier (e.g., VNID).
>
> Let's drop use of VNID here (since IDs can be locally
> significant too).

Ok.
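
For illustration only (no particular encapsulation format implied),
the data plane step described in 3.1.3 above amounts to something
like the following sketch; the field names are mine:

    # Non-normative sketch of the ingress/egress NVE data plane steps.
    # Solution drafts define the actual header layout; this only shows that
    # a VN Context plus outer underlay addressing is prepended to the
    # tenant frame, and that the egress NVE demultiplexes on the VN Context.

    def encapsulate(tenant_frame, vn_context, src_nve_ip, dst_nve_ip):
        outer_header = {
            "outer_src": src_nve_ip,   # identifies the source NVE
            "outer_dst": dst_nve_ip,   # identifies the destination NVE
            "vn_context": vn_context,  # identifies the tenant service instance
        }
        return (outer_header, tenant_frame)

    def decapsulate(packet, vni_by_context):
        outer_header, tenant_frame = packet
        vni = vni_by_context[outer_header["vn_context"]]  # tenant demultiplexing
        return vni, tenant_frame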

>
> >
> >        The following approaches can been considered:
> >
> >           o One VN Context per Tenant: A globally unique
> (on a per-DC
> >             administrative domain) VNID is used to identify
> the related
> >             Tenant instances. An example of this approach
> is the use of
> >             IEEE VLAN or ISID tags to provide virtual L2 domains.
>
> I think this is off. We are mixing "tenant" and "VN". A tenant can
> have multiple different VNIs associated with it.  Each of those VNIs
> uses different VN Contexts. Thus, the VN Context != Tenant.

This is not correct. This text explains the scope of a VN context. Obviously, a tenant can have multiple VNs.

>
> >
> >           o One VN Context per VNI: A per-tenant local value is
> >             automatically generated by the egress NVE and usually
> >             distributed by a control plane protocol to all
> the related
> >             NVEs. An example of this approach is the use of
> per VRF MPLS
> >             labels in IP VPN [RFC4364].
>
> This seems off. There could be a different VN Context for each NVE, so
> its not "per VNI".

Again, I think that you misinterpreted this text. It relates to the scope of a VN Context.

>
> >           o One VN Context per VAP: A per-VAP local value
> is assigned and
> >             usually distributed by a control plane
> protocol. An example of
> >             this approach is the use of per CE-PE MPLS
> labels in IP VPN
> >             [RFC4364].
> >
> >        Note that when using one VN Context per VNI or per VAP, an
> >        additional global identifier may be used by the
> control plane to
> >        identify the Tenant context.
>
> need a name for that global identifier term and put it in the
> terminology section.

This is optional and TBD in solutions drafts.
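
Purely as an illustration of the difference in scope between the
three options above (values made up, not tied to any solution):

    # Non-normative sketch contrasting VN Context scopes.

    # One VN Context per Tenant/VN: a single domain-wide value, the same at
    # every NVE (VLAN/ISID-like usage).
    GLOBAL_VN_CONTEXT = {"tenant1-vn-blue": 5001}

    # One VN Context per VNI or per VAP: each egress NVE (or VAP) assigns a
    # locally significant value and distributes it via the control plane, so
    # the same VN can map to different values at different NVEs
    # (MPLS-label-like usage).
    LOCAL_VN_CONTEXT = {
        ("nve-A", "tenant1-vn-blue"): 17,
        ("nve-B", "tenant1-vn-blue"): 42,
    }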

>
> >
> >     3.1.4. Tunnel Overlays and Encapsulation options
> >
> >        Once the VN context identifier is added to the
> frame, a L3 Tunnel
>
> When using term Context Identifier, capitalize it.
>
>
> >        encapsulation is used to transport the frame to the
> destination NVE.
> >        The backbone devices do not usually keep any per
> service state,
> >        simply forwarding the frames based on the outer tunnel
> >        header.
>
> don't use "backbone devices" term. use "underlay devices"?

Ok

>
> >
> >        Different IP tunneling options (e.g., GRE, L2TP,
> IPSec) and MPLS
> >        tunneling options (e.g., BGP VPN, VPLS) can be used.
> >
> >     3.1.5. Control Plane Components
>
> This section should be expanded to show the different problem areas
> for the control plane (specifically server-to-NVE and NVE-to-oracle).
>
> >
> >        Control plane components may be used to provide the
> following
> >        capabilities:
> >
> >           . Auto-provisioning/Service discovery
> >
> >           . Address advertisement and tunnel mapping
> >
> >           . Tunnel management
>
> The above really should be expanded a bit. E.g., what does "auto
> provisioning" refer to? I don't think a lot is needed, but a sentence
> or two per bullet point would help. Also, there are 3 bullet points
> here, but there are 4 subsections that follow, and they do not match
> the above bullet points.
>
> >
> >        A control plane component can be an on-net control protocol
> >        implemented on the NVE or a management control entity.
>
> What is "on-net protocol"

Will clarify this term. Would "embedded" be better?

>
> >
> >     3.1.5.1. Distributed vs Centralized Control Plane
> >
> >        A control/management plane entity can be centralized
> or distributed.
> >        Both approaches have been used extensively in the
> past. The routing
> >        model of the Internet is a good example of a
> distributed approach.
> >        Transport networks have usually used a centralized
> approach to
> >        manage transport paths.
>
> What is a "transport network"? I'm not sure what these are and why
> they are "usually"  use a centrallized network.

You missed all the good fun in MPLS when discussing MPLS-TP ;-)


>
> >        It is also possible to combine the two approaches
> i.e. using a
> >        hybrid model. A global view of network state can
> have many benefits
> >        but it does not preclude the use of distributed
> protocols within the
> >        network. Centralized controllers provide a facility
> to maintain
> >        global state, and distribute that state to the
> network which in
> >        combination with distributed protocols can aid in
> achieving greater
> >        network efficiencies, and improve reliability and
> robustness. Domain
> >        and/or deployment specific constraints define the
> balance between
> >        centralized and distributed approaches.
> >
> >        On one hand, a control plane module can reside in
> every NVE. This is
> >        how routing control plane modules are implemented in
> routers. At the
> >        same time, an external controller can manage a group
> of NVEs via an
> >        agent in each NVE. This is how an SDN controller
> could communicate
> >        with the nodes it controls, via OpenFlow [OF] for instance.
>
> Expand SDN on first usage...
>
> >
> >        In the case where a logically centralized control plane is
> >        preferred, the controller will need to be
> distributed to more than
> >        one node for redundancy and scalability in order to
> manage a large
> >        number of NVEs. Hence, inter-controller
> communication is necessary
> >        to synchronize state among controllers. It should be
> noted that
> >        controllers may be organized in clusters. The
> information exchanged
> >        between controllers of the same cluster could be
> different from the
> >        information exchanged across clusters.
>
> This section  does not really capture the oracle discussion we had at
> the interim meeting.

What do you propose to change/add?

>
> >     3.1.5.2. Auto-provisioning/Service discovery
> >
> >        NVEs must be able to identify the appropriate VNI
> for each Tenant
> >        System. This is based on state information that is
> often provided by
> >        external entities. For example, in an environment
> where a VM is a
> >        Tenant System, this information is provided by
> compute management
> >        systems, since these are the only entities that have
> visibility of
> >        which VM belongs to which tenant.
>
> Above. might be better to say this is provided by the vm orchestration
> system (and maybe define this term in terminology section?)
>
> >     3.1.5.3. Address advertisement and tunnel mapping
> >
> >        As traffic reaches an ingress NVE, a lookup is performed to
> >        determine which tunnel the packet needs to be sent
> to. It is then
>
> s/sent to/sent on/ ??
>
> >        encapsulated with a tunnel header containing the destination
> >        information (destination IP address or MPLS label)
> of the egress
> >        overlay node. Intermediate nodes (between the
> ingress and egress
> >        NVEs) switch or route traffic based upon the outer
> destination
> >        information.
>
> Use active voice above? And say which entity it is that does the work?
> E.g.
>
>         As traffic reaches an ingress NVE, the NVE performs a lookup
>         to determine which remote NVE the packet should be sent
>         to. The NVE [or Overlay Module???] then adds a tunnel
>         encapsulation header containing the destination information
>         (destination IP address or MPLS label) of the egress NVE as
>         well as an appropriate Context ID.  Nodes on the underlay
>         network (between the ingress and egress NVEs) forward traffic
>         based solely on the outer destination header information.
>
> >        One key step in this process consists of mapping a
> final destination
> >        information to the proper tunnel. NVEs are responsible for
> >        maintaining such mappings in their forwarding
> tables. Several ways
> >        of populating these tables are possible: control
> plane driven,
> >        management plane driven, or data plane driven.
>
> Better:
>
>         A key step in the above process consists of identifying the
>         destination NVE the packet is to be tunneled to. NVEs are
>         responsible for maintaining a set of forwarding or mapping
>         tables that hold the bindings between destination VM
>         and egress NVE addresses. Several ways of populating these
>         tables are possible: control plane driven, management plane
>         driven, or data plane driven.
>
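
A rough, non-normative sketch of the lookup-then-encapsulate step
described above (names are mine, and it deliberately ignores how the
mapping table was populated):

    # Non-normative sketch of the ingress NVE forwarding decision.

    def forward(tenant_frame, dest_addr, mapping_table, vn_context, local_nve_ip):
        # mapping_table: tenant destination address -> egress NVE underlay address
        dst_nve_ip = mapping_table.get(dest_addr)
        if dst_nve_ip is None:
            return None  # unknown destination: flood, drop or query, per solution
        outer = {"outer_src": local_nve_ip,
                 "outer_dst": dst_nve_ip,
                 "vn_context": vn_context}
        # Underlay nodes forward on the outer destination information only.
        return (outer, tenant_frame)

    packet = forward("frame-bytes", "52:54:00:aa:bb:cc",
                     {"52:54:00:aa:bb:cc": "192.0.2.20"}, 5001, "192.0.2.10")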
> >     3.2. Multi-homing
> >
> >        Multi-homing techniques can be used to increase the
> reliability of
> >        an nvo3 network. It is also important to ensure that
> physical
> >        diversity in an nvo3 network is taken into account
> to avoid single
> >        points of failure.
> >
> >        Multi-homing can be enabled in various nodes, from
> tenant systems
> >        into TORs, TORs into core switches/routers, and core
> nodes into DC
> >        GWs.
> >
> >        The nvo3 underlay nodes (i.e. from NVEs to DC GWs)
> rely on IP
> >        routing as the means to re-route traffic upon
> failures and/or ECMP
> >        techniques or on MPLS re-rerouting capabilities.
> >
> >        When a tenant system is co-located with the NVE on
> the same end-
> >        system, the tenant system is single homed to the NVE
> via a vport
> >        that is virtual NIC (vNIC). When the end system and the NVEs
> > are
>
> vport/vNIC terminology is not defined (and is somewhat proprietary).

Will make it more general.

>
> And shouldn't the VAP terminology be used here?
>
> >        separated, the end system is connected to the NVE
> via a logical
> >        Layer2 (L2) construct such as a VLAN. In this latter
> case, an end
> >        device or vSwitch on that device could be
> multi-homed to various
> >        NVEs. An NVE may provide an L2 service to the end
> system or a l3
> >        service. An NVE may be multi-homed to a next layer
> in the DC at
> >        Layer2 (L2) or Layer3 (L3). When an NVE provides an
> L2 service and
> >        is not co-located with the end system, techniques
> such as Ethernet
> >        Link Aggregation Group (LAG) or Spanning Tree
> Protocol (STP) can be
> >        used to switch traffic between an end system and connected
> >        NVEs without creating loops. Similarly, when the NVE
> provides L3
> >        service, similar dual-homing techniques can be used.
> When the NVE
> >        provides a L3 service to the end system, it is
> possible that no
> >        dynamic routing protocol is enabled between the end
> system and the
> >        NVE. The end system can be multi-homed to multiple
> physically-
> >        separated L3 NVEs over multiple interfaces. When one of the
> >        links connected to an NVE fails, the other
> interfaces can be used to
> >        reach the end system.
>
> The above seems to talk about what I'll call a "distributed NVE
> model", where a TES is connected to more than one NVE (or the one NVE
> that is somehow distributed). If this document is going to talk about
> that as a possibility in the context of multihoming, I think we need
> to talk more generally about a TES connected to more than one
> NVE. This document doesn't really talk about that at all.


No, this text talks about various multi-homing techniques, taking into account that a TS and an NVE may or may not be co-located.

>
> Personally, I'm not sure we should go here. A lot of complexity that I
> suspect is not worth the cost. I'd suggest sticking with a single NVE
> per TES.
>
> >        External connectivity out of an nvo3 domain can be
> handled by two or
> >        more nvo3 gateways. Each gateway is connected to a
> different domain
> >        (e.g. ISP), providing access to external networks
> such as VPNs or
> >        the Internet. A gateway may be connected to two
> nodes. When a
> >        connection to an upstream node is lost, the
> alternative connection
> >        is used and the failed route withdrawn.
>
> For external multihoming, there is no reason to say they are connected
> to "different domains". You just want redundancy.
>
> Actually, I'm not sure what point the above is trying to highlight. Is
> this a generic requirement for multihoming out of a DC that happens to
> be running NVO3 internally? If so, I don't think we need to say
> that. It's a given and mostly out of scope.

I'm fine with removing this text.

>
> If we are talking just about "nvo3 gateways", I presume that means
> getting in and out of a specific VN. For that, you just need
> multihoming for redundancy, and there is no need to talk about
> connecting those gateways to "different domains".
>
> >     3.3. VM Mobility
> >
> >        In DC environments utilizing VM technologies, an
> important feature
> >        is that VMs can move from one server to another
> server in the same
> >        or different L2 physical domains (within or across DCs) in a
> >        seamless manner.
> >
> >        A VM can be moved from one server to another in
> stopped or suspended
> >        state ("cold" VM mobility) or in running/active
> state ("hot" VM
> >        mobility). With "hot" mobility, VM L2 and L3
> addresses need to be
> >        preserved. With "cold" mobility, it may be desired
> to preserve VM L3
> >        addresses.
> >
> >        Solutions to maintain connectivity while a VM is
> moved are necessary
> >        in the case of "hot" mobility. This implies that transport
> >        connections among VMs are preserved and that ARP
> caches are updated
> >        accordingly.
> >
> >        Upon VM mobility, NVE policies that define
> connectivity among VMs
> >        must be maintained.
> >
> >        Optimal routing during VM mobility is also an
> important aspect to
> >        address. It is expected that the VM's default
> gateway be as close as
> >        possible to the server hosting the VM and triangular
> routing be
> >        avoided.
>
> What is meant by "triangular routing" above? Specifically, how is this
> a result of mobility vs. a general requirement?
>
> >     3.4. Service Overlay Topologies
> >
> >        A number of service topologies may be used to
> optimize the service
> >        connectivity and to address NVE performance limitations.
> >
> >        The topology described in Figure 3 suggests the use
> of a tunnel mesh
> >        between the NVEs where each tenant instance is one
> hop away from a
> >        service processing perspective. Partial mesh
> topologies and an NVE
> >        hierarchy may be used where certain NVEs may act as
> service transit
> >        points.
> >
> >     4. Key aspects of overlay networks
> >
> >        The intent of this section is to highlight specific
> issues that
> >        proposed overlay solutions need to address.
> >
> >     4.1. Pros & Cons
> >
> >        An overlay network is a layer of virtual network
> topology on top of
> >        the physical network.
> >
> >        Overlay networks offer the following key advantages:
> >
> >           o Unicast tunneling state management and
> association with tenant
> >             systems reachability are handled at the edge of
> the network.
> >             Intermediate transport nodes are unaware of
> such state. Note
> >             that this is not the case when multicast is
> enabled in the core
> >             network.
>
> The comment about multicast needs expansion. Multicast (in the
> underlay)  is an underlay issue and has nothing to do with
> overlays. If Tenant traffic is mapped into multicast service on the
> underlay, then there is a connection. If the latter is what is meant,
> please add text to that effect.

This is exactly the point of this text. For unicast, state is handled at the edge, unlike for multicast.
I'll replace "core" with "underlay" for clarity.

>
> >           o Tunneling is used to aggregate traffic and hide tenant
> >             addresses from the underkay network, and hence
> offer the
> >             advantage of minimizing the amount of
> forwarding state required
> >             within the underlay network
> >
> >           o Decoupling of the overlay addresses (MAC and
> IP) used by VMs
> >             from the underlay network. This offers a clear
> separation
> >             between addresses used within the overlay and
> the underlay
> >             networks and it enables the use of overlapping
> addresses spaces
> >             by Tenant Systems
> >
> >           o Support of a large number of virtual network
> identifiers
> >
> >        Overlay networks also create several challenges:
> >
> >           o Overlay networks have no controls of underlay
> networks and lack
> >             critical network information
>
> Is some text missing from the above bullet?

No. It simply says that underlays are like black boxes to overlays.

>
> >
> >                o Overlays typically probe the network to
> measure link or
> >                  path properties, such as available
> bandwidth or packet
> >                  loss rate. It is difficult to accurately
> evaluate network
> >                  properties. It might be preferable for the
> underlay
> >                  network to expose usage and performance
> >                information.
>
> I don't follow the above. Isn't the above a true statement for a host
> connected to an IP network as well?

True, it also applies to apps running over IP.

>
> >           o Miscommunication or lack of coordination
> between overlay and
> >             underlay networks can lead to an inefficient
> usage of network
> >             resources.
>
> Might be good to give an example.
>
> >           o When multiple overlays co-exist on top of a
> common underlay
> >             network, the lack of coordination between
> overlays can lead to
> >             performance issues.
>
> Can you give examples? and how is this different from what we have
> today with different hosts "not coordinating"  when they use the
> network?
>
> >           o Overlaid traffic may not traverse firewalls and NAT
>             devices.
>
> Explain why this is a challenge. I'd argue that is the point. If a FW
> is needed, it could be part of the overlay.
>
> >           o Multicast service scalability. Multicast support may be
> >             required in the underlay network to address for
> each tenant
> >             flood containment or efficient multicast
> handling. The underlay
> >             may be also be required to maintain multicast
> state on a per-
> >             tenant basis, or even on a per-individual
> multicast flow of a
> >             given tenant.
> >
> >           o Hash-based load balancing may not be optimal as
> the hash
> >             algorithm may not work well due to the limited
> number of
> >             combinations of tunnel source and destination
> addresses. Other
> >             NVO3 mechanisms may use additional entropy
> information than
> >             source and destination addresses.
> >     4.2. Overlay issues to consider
> >
> >     4.2.1. Data plane vs Control plane driven
> >
> >        In the case of an L2NVE, it is possible to
> dynamically learn MAC
> >        addresses against VAPs.
>
> rewrite? what does it mean to "learn MAC addresses against VAPs".
>
> > It is also possible that such addresses be
> >        known and controlled via management or a control
> protocol for both
> >        L2NVEs and L3NVEs.
> >
> >        Dynamic data plane learning implies that flooding of unknown
> >        destinations be supported and hence implies that
> broadcast and/or
> >        multicast be supported or that ingress replication
> be used as
> >        described in section 4.2.3. Multicasting in the
> underlay network for
> >        dynamic learning may lead to significant scalability
> limitations.
> >        Specific forwarding rules must be enforced to
> prevent loops from
> >        happening. This can be achieved using a spanning
> tree, a shortest
> >        path tree, or a split-horizon mesh.
> >
> >        It should be noted that the amount of state to be distributed is
> >        dependent upon network topology and the number of virtual machines.
> >        Different forms of caching can also be utilized to minimize state
> >        distribution between the various elements. The control plane should
> >        not require an NVE to maintain the locations of all the tenant
> >        systems whose VNs are not present on the NVE. The use of a control
> >        plane does not imply that the data plane on NVEs has to maintain all
> >        the forwarding state in the control plane.
> >
> >     4.2.2. Coordination between data plane and control plane
> >
> >        For an L2 NVE, the NVE needs to be able to determine MAC addresses
> >        of the end systems connected via a VAP. This can be achieved via
> >        dataplane learning or a control plane. For an L3 NVE, the NVE needs
> >        to be able to determine IP addresses of the end systems connected
> >        via a VAP.
>
> Better:
>
>         For an L2 NVE, the NVE needs to be able to determine MAC
>         addresses of the end systems connected via a VAP. For an L3
>         NVE, the NVE needs to be able to determine IP addresses of the
>         end systems connected via a VAP. In both cases, this can be
>         achieved via dataplane learning or a control plane.
>
> >        In both cases, coordination with the NVE control protocol is needed
> >        such that when the NVE determines that the set of addresses behind a
> >        VAP has changed, it triggers the local NVE control plane to
> >        distribute this information to its peers.
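
For illustration only, the trigger being described is roughly this (Python
sketch; advertise/withdraw are placeholder names I made up, not messages
defined anywhere):

    # When the address set behind a VAP changes, hand the delta to the local
    # control plane for distribution to peer NVEs.
    def on_vap_address_change(vni, vap_id, old_set, new_set, control_plane):
        for addr in new_set - old_set:
            control_plane.advertise(vni, vap_id, addr)
        for addr in old_set - new_set:
            control_plane.withdraw(vni, vap_id, addr)
        return new_set

    class PrintControlPlane:  # stand-in peer distribution for the example
        def advertise(self, vni, vap, addr): print("advertise", vni, vap, addr)
        def withdraw(self, vni, vap, addr): print("withdraw", vni, vap, addr)

    on_vap_address_change(10, "vap-1", {"00:00:5e:00:53:01"},
                          {"00:00:5e:00:53:01", "00:00:5e:00:53:02"},
                          PrintControlPlane())
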
> >
> >     4.2.3. Handling Broadcast, Unknown Unicast and Multicast (BUM) traffic
> >
> >        There are two techniques to support packet replication needed for
> >        broadcast, unknown unicast and multicast:
>
> s/for broadcast/for tenant broadcast/
>
> >           o Ingress replication
> >
> >           o Use of underlay multicast trees
>
> draft-ghanwani-nvo3-mcast-issues-00.txt describes a third technique.
>
> >        There is a bandwidth vs state trade-off between the two approaches.
> >        Depending upon the degree of replication required (i.e. the number
> >        of hosts per group) and the amount of multicast state to maintain,
> >        trading bandwidth for state should be considered.
> >
> >        When the number of hosts per group is large, the use of underlay
> >        multicast trees may be more appropriate. When the number of hosts is
> >        small (e.g. 2-3), ingress replication may not be an issue.
> >
> >        Depending upon the size of the data center network and hence the
> >        number of (S,G) entries, but also the duration of multicast flows,
> >        the use of underlay multicast trees can be a challenge.
> >
> >        When flows are well known, it is possible to pre-provision such
> >        multicast trees. However, it is often difficult to predict
> >        application flows ahead of time, and hence programming of (S,G)
> >        entries for short-lived flows could be impractical.
> >
> >        A possible trade-off is to use shared multicast trees in the
> >        underlay as opposed to dedicated multicast trees.
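
For what it's worth, the bandwidth/state trade-off is easy to see with a toy
ingress replication sketch (Python, my own illustration): N unicast copies per
BUM frame, but the only state needed is the per-VNI list of remote NVEs.

    # Ingress replication in a nutshell: one unicast copy per remote NVE with
    # the VN present; bandwidth cost scales with the flood list size, but no
    # (S,G) multicast state is needed in the underlay.
    def ingress_replicate(frame, vni, flood_list, send_unicast):
        for remote_nve in flood_list[vni]:
            send_unicast(remote_nve, vni, frame)

    # Example: 3 remote NVEs -> 3 copies of every BUM frame on the ingress link.
    flood_list = {10: ["192.0.2.1", "192.0.2.2", "192.0.2.3"]}
    ingress_replicate(b"bum-frame", 10, flood_list,
                      lambda nve, vni, f: print("to", nve, "vni", vni, len(f), "bytes"))
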
>
> >     4.2.4. Path MTU
> >
> >        When using overlay tunneling, an outer header is added to the
> >        original frame. This can cause the MTU of the path to the egress
> >        tunnel endpoint to be exceeded.
> >
> >        In this section, we will only consider the case of an IP overlay.
> >
> >        It is usually not desirable to rely on IP fragmentation for
> >        performance reasons. Ideally, the interface MTU as seen by a Tenant
> >        System is adjusted such that no fragmentation is needed. TCP will
> >        adjust its maximum segment size accordingly.
> >
> >        It is possible for the MTU to be configured manually or to be
> >        discovered dynamically. Various Path MTU discovery techniques exist
> >        in order to determine the proper MTU size to use:
> >
> >           o Classical ICMP-based MTU Path Discovery [RFC1191] [RFC1981]
> >
> >                o Tenant Systems rely on ICMP messages to discover the MTU of
> >                  the end-to-end path to its destination. This method is not
> >                  always possible, such as when traversing middle boxes
> >                  (e.g. firewalls) which disable ICMP for security reasons
> >
> >           o Extended MTU Path Discovery techniques such as defined in
> >             [RFC4821]
> >
> >        It is also possible to rely on the overlay layer to perform
> >        segmentation and reassembly operations without relying on the Tenant
> >        Systems to know about the end-to-end MTU. The assumption is that
> >        some hardware assist is available on the NVE node to perform such
> >        SAR operations. However, fragmentation by the overlay layer can lead
>
> expand "SAR"
>
> >        to performance and congestion issues due to TCP dynamics and might
> >        require new congestion avoidance mechanisms from the underlay
> >        network [FLOYD].
> >
> >        Finally, the underlay network may be designed in such a way that the
> >        MTU can accommodate the extra tunneling and possibly additional nvo3
> >        header encapsulation overhead.
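
On the last point, a back-of-the-envelope helper makes the sizing concrete
(Python; the numbers assume a VXLAN-like IPv4/UDP encapsulation of an L2
service with an 8-byte overlay header, which is just an example and not
something the framework mandates):

    # Tenant-visible MTU if fragmentation in the underlay is to be avoided:
    # tenant_mtu = underlay IP MTU - encapsulation overhead.
    OUTER_IPV4 = 20
    OUTER_UDP = 8
    NVO3_HEADER = 8      # e.g. an 8-byte overlay header; depends on the encap
    INNER_ETHERNET = 14  # inner MAC header carried in the tunnel (L2 service)

    def tenant_mtu(underlay_mtu):
        return underlay_mtu - (OUTER_IPV4 + OUTER_UDP + NVO3_HEADER + INNER_ETHERNET)

    print(tenant_mtu(1600))  # 1550: room for a standard 1500-byte tenant MTU
    print(tenant_mtu(1500))  # 1450: a stock underlay forces a smaller tenant MTU
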
> >
> >     4.2.5. NVE location trade-offs
> >
> >        In the case of DC traffic, traffic originated from a VM is native
> >        Ethernet traffic. This traffic can be switched by a local virtual
> >        switch or ToR switch and then by a DC gateway. The NVE function can
> >        be embedded within any of these elements.
> >
> >        There are several criteria to consider when deciding where the NVE
> >        function should happen:
> >
> >           o Processing and memory requirements
> >
> >               o Datapath (e.g. lookups, filtering,
> >                  encapsulation/decapsulation)
> >
> >               o Control plane processing (e.g. routing, signaling, OAM) and
> >                  where specific control plane functions should be enabled
>
> missing closing ")"
>
> >           o FIB/RIB size
> >
> >           o Multicast support
> >
> >               o Routing/signaling protocols
> >
> >               o Packet replication capability
> >
> >               o Multicast FIB
> >
> >           o Fragmentation support
> >
> >           o QoS support (e.g. marking, policing, queuing)
> >
> >           o Resiliency
> >
> >     4.2.6. Interaction between network overlays and underlays
> >
> >        When multiple overlays co-exist on top of a common underlay network,
> >        resources (e.g., bandwidth) should be provisioned to ensure that
> >        traffic from overlays can be accommodated and QoS objectives can be
> >        met. Overlays can have partially overlapping paths (nodes and
> >        links).
> >
> >        Each overlay is selfish by nature. It sends traffic so as to
> >        optimize its own performance without considering the impact on other
> >        overlays, unless the underlay paths are traffic engineered on a per
> >        overlay basis to avoid congestion of underlay resources.
> >
> >        Better visibility between overlays and underlays, or generally
> >        coordination in placing overlay demand on an underlay network, can
> >        be achieved by providing mechanisms to exchange performance and
> >        liveliness information between the underlay and overlay(s) or the
> >        use of such information by a coordination system. Such information
> >        may include:
> >
> >           o Performance metrics (throughput, delay, loss, jitter)
> >
> >           o Cost metrics
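
Purely as an illustration of what a coordination system might do with such
metrics (Python sketch; the metric names and weights are made up):

    # Rank candidate underlay paths for an overlay using exchanged metrics;
    # lower score is better in this toy example.
    WEIGHTS = {"delay_ms": 1.0, "loss_pct": 50.0, "cost": 0.1}

    def path_score(metrics):
        return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)

    candidates = {
        "path-A": {"delay_ms": 2.0, "loss_pct": 0.01, "cost": 10},
        "path-B": {"delay_ms": 0.5, "loss_pct": 0.20, "cost": 12},
    }
    print(min(candidates, key=lambda p: path_score(candidates[p])))
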
> >
> >     5. Security Considerations
> >
> >        Nvo3 solutions must at least consider and address the following:
> >
> >           . Secure and authenticated communication between an NVE and an
> >             NVE management system.
> >
> >           . Isolation between tenant overlay networks. The use of per-
> >             tenant FIB tables (VNIs) on an NVE is essential.
> >
> >           . Security of any protocol used to carry overlay network
> >             information.
> >
> >           . Avoiding packets from reaching the wrong NVI, especially during
> >             VM moves.
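
Regarding the per-tenant FIB point, a tiny sketch of why scoping lookups by
VNI gives isolation (illustrative Python, not from the draft):

    # Every lookup is keyed by the VNI from the received encapsulation header,
    # so an entry in tenant 10's FIB can never be hit by a packet carrying
    # VNI 20, even if the inner addresses overlap.
    fibs = {
        10: {"00:00:5e:00:53:01": "vap-1"},
        20: {"00:00:5e:00:53:01": "vap-7"},  # same MAC, different tenant
    }

    def lookup(vni, dst_mac):
        return fibs.get(vni, {}).get(dst_mac)  # miss -> drop/flood within the VNI

    print(lookup(10, "00:00:5e:00:53:01"))  # vap-1
    print(lookup(20, "00:00:5e:00:53:01"))  # vap-7
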
> >
> >
>
>