Re: [nvo3] Review of draft-ietf-nvo3-framework-02
Benson Schliesser <bensons@queuefull.net> Sat, 23 February 2013 00:33 UTC
Return-Path: <bensons@queuefull.net>
X-Original-To: nvo3@ietfa.amsl.com
Delivered-To: nvo3@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id D3F4121F8861 for <nvo3@ietfa.amsl.com>; Fri, 22 Feb 2013 16:33:19 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -3.299
X-Spam-Level:
X-Spam-Status: No, score=-3.299 tagged_above=-999 required=5 tests=[AWL=-0.300, BAYES_00=-2.599, J_CHICKENPOX_14=0.6, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id OYLJ-zL2DFHN for <nvo3@ietfa.amsl.com>; Fri, 22 Feb 2013 16:33:16 -0800 (PST)
Received: from mail-ve0-f173.google.com (mail-ve0-f173.google.com [209.85.128.173]) by ietfa.amsl.com (Postfix) with ESMTP id 18F2521F8860 for <nvo3@ietf.org>; Fri, 22 Feb 2013 16:33:16 -0800 (PST)
Received: by mail-ve0-f173.google.com with SMTP id oz10so1067815veb.4 for <nvo3@ietf.org>; Fri, 22 Feb 2013 16:33:15 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding :x-gm-message-state; bh=xqDCFngxsb+3TuE1QRdx3a5fGNuGXd7QGnD8+hv4nZo=; b=KX8xzhbXgjCMnApISug7JX1P/Wfv1ISbZvjcmdx5qQiv/NfS/VbjfgAQS88vlcGFA1 wMh8cUoFlLFN+z5ri3F3yzzbA15+tSJ8mXpNxjJCXi/4jicWL4DvK+61P3w9ibX6mK4C MTb5ga8Sgzfh7G7DxVjOCd1Kr5F383/vGaHh09xPIcu4zjCzBgwXsNh/xXhCqq5kl9hK L7zX92/CLaL+zP4Oft25XIqOYul0N87GCpuw0XckNB7Hu9bdBG48iypMWn9M/Ajbgaq7 1HNig2ka7OuZHkfeitn7URqtnbmj6/dYV563uKZjDHSkuseIu44bVQPo9cfBmRZ1enYO aQfA==
X-Received: by 10.58.221.228 with SMTP id qh4mr5301683vec.49.1361579595407; Fri, 22 Feb 2013 16:33:15 -0800 (PST)
Received: from tdickens-sslvpn-nc.jnpr.net (westford-nat.juniper.net. [66.129.232.2]) by mx.google.com with ESMTPS id i17sm6612401vdj.1.2013.02.22.16.33.13 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 22 Feb 2013 16:33:14 -0800 (PST)
Message-ID: <51280E48.30600@queuefull.net>
Date: Fri, 22 Feb 2013 19:33:12 -0500
From: Benson Schliesser <bensons@queuefull.net>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130216 Thunderbird/17.0.3
MIME-Version: 1.0
To: nvo3@ietf.org
References: <201302152233.r1FMXxdx020559@cichlid.raleigh.ibm.com>
In-Reply-To: <201302152233.r1FMXxdx020559@cichlid.raleigh.ibm.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Gm-Message-State: ALoCoQkAhIqF5SiMO8gWYcwQir40asts/AJRggBnliCSLWEn/F6LU+0UtXODCoDniK4R2iXQrif+
Subject: Re: [nvo3] Review of draft-ietf-nvo3-framework-02
X-BeenThere: nvo3@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: "Network Virtualization Overlays \(NVO3\) Working Group" <nvo3.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/nvo3>, <mailto:nvo3-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/nvo3>
List-Post: <mailto:nvo3@ietf.org>
List-Help: <mailto:nvo3-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/nvo3>, <mailto:nvo3-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Feb 2013 00:33:20 -0000
Thank you for such a thorough review, Thomas. Speaking personally, I find myself in agreement with most of your comments. But as WG co-chair, I'd really like to hear from the authors and other NVO3 contributors. Folks, please take time to read and comment on Thomas' review below (or at http://www.ietf.org/mail-archive/web/nvo3/current/msg02072.html). Cheers, -Benson On 2/15/13 5:33 PM, Thomas Narten wrote: > Below is a detailed review of the framework document. Pretty much all > of these are editorial -- I don't think I really have any issue with > the substance. But there are a lot of suggestions for clarifying text > and tightening up the terminology, language, etc. > > Thomas > > High level: what happened to adding text describing the "oracle" > model, where there was a clear separation of the NVE, the oracle (and > the notion of a federated oracle). as well as separate protocosl for > the inter/intra-oracle control vs. server-to-nve control? Per the > September interim meeting, this needs to be added. > > [OVCPREQ] describes the requirements for a control plane protocol > required by overlay border nodes to exchange overlay mappings. > > The above document has been split into 2 documents. I think both > should be referenced. Also, I think it would be good to not combine > the two problem areas in the 2 drafts into one generic "control plane > protocol" reference. They are very different problems with no overlap, > and I think when folk talk about a "control plane", they are really > referring to problem in draft-kreeger-nvo3-hypervisor-nve-cp-00.txt. > >> 1.2. General terminology > Nit: On terminology, how about using consistent syntax/format of terms > with abbreviation in parenthesis following the term itself, e.g., > something like > > OLD: > > NVE: Network Virtualization Edge. > > NEW: > Network Virtualization Edge (NVE): ... > > Add the following (other documents use this term and we should define > it here for use throughout NVO3): > > Closed User Group (CUG): Another term for Virtual Network. > > >> NVE: Network Virtualization Edge. It is a network entity that sits >> on the edge of the NVO3 network. It implements network >> virtualization functions that allow for L2 and/or L3 tenant >> separation and for hiding tenant addressing information (MAC and IP >> addresses). An NVE could be implemented as part of a virtual switch >> within a hypervisor, a physical switch or router, or a network >> service appliance. > Could be improved. How about: > > NVO3 Network: on overlay network that provides an L2 or L3 > service to Tenant Systems over an L3 underlay network, using > the architecture and protocols as defined by the NVO3 Working > Group. > > Network Virtualization Edge (NVE). An NVE is the network entity > that implements network virtualization functions and sits on > the boundary between an NVO3 network and an underlying network. > The network-facing side of the NVE uses the underlying L3 > network to tunnel frames to and from other NVEs. The > server-facing side of the NVE sends and receives Ethernet > Frames to and from individual Tenant Systems. An NVE could be > implemented as part of a virtual switch within a hypervisor, a > physical switch or router, a Network Service Appliance, or be > split across multiple devices. > > >> VN: Virtual Network. This is a virtual L2 or L3 domain that belongs >> to a tenant. > Better > > Virtual Network (VN). A virtual network is a logical > abstraction of a physical network that provides network > services to a set of Tenant Systems. 
To Tenant Systems, a > virtual network looks like a normal network (i.e., providing > unrestricted ethernet or L3 service), except that the only end > stations connected to the virtual network are those belonging > to a tenant's specific virtual network. > >> VNI: Virtual Network Instance. This is one instance of a virtual >> overlay network. It refers to the state maintained for a given VN on >> a given NVE. Two Virtual Networks are isolated from one another and >> may use overlapping addresses. > Better > > Virtual Network Instance (VNI): An specific instance of a > Virtual Network. > > (no need to say more) > >> Virtual Network Context or VN Context: Field that is part of the >> overlay encapsulation header which allows the encapsulated frame to >> be delivered to the appropriate virtual network endpoint by the >> egress NVE. The egress NVE uses this field to determine the >> appropriate virtual network context in which to process the packet. >> This field MAY be an explicit, unique (to the administrative domain) >> virtual network identifier (VNID) or MAY express the necessary >> context information in other ways (e.g., a locally significant >> identifier). > Better: > > Virtual Network Context (VN Context): Field in overlay > encapsulation header that identifies the specific VN the packet > belongs to. The egress NVE uses the VN Context to deliver the > packet to the correct Tenant System. The VM Context can be a > locally significant identifier having meaning only in > conjunction with additional information, such as the > destination NVE address. Alternatively, the VM Context can have > broader scope, e.g., be unique across the entire NVO3 network. > > >> VNID: Virtual Network Identifier. In the case where the VN context >> identifier has global significance, this is the ID value that is >> carried in each data packet in the overlay encapsulation that >> identifies the Virtual Network the packet belongs to. > A VNID definition by itself doesn't seem all that helpful. (This term > came from early on when some of us assumed that the Context ID always > had non-local significance.) > > I think we may want to define some additional terms here. VNID by > itself is not sufficient. Have a look at the terms "VN Alias", "VN > Name" and "VN ID" in > draft-kreeger-nvo3-hypervisor-nve-cp-00.txt. Those (or similar) terms > should probably be moved into the framework document. > >> Underlay or Underlying Network: This is the network that provides >> the connectivity between NVEs. The Underlying Network can be >> completely unaware of the overlay packets. Addresses within the >> Underlying Network are also referred to as "outer addresses" because >> they exist in the outer encapsulation. The Underlying Network can >> use a completely different protocol (and address family) from that >> of the overlay. > We should say that for NVo3, the underlay is assumed to be IP > > Better: > > Underlay or Underlying Network: the network that provides the > connectivity between NVEs and over which NVO3 packets are > tunneled. The Underlay Network does not need to be aware that > it is carrying NVO3 packets. Addresses on the Underlay Network > appear as "outer addresses" in encapsulated NVO3 packets. In > general, the Underlay Network can use a completely different > protocol (and address family) from that of the overlay. In the > case of NVO3, the underlay network will always be IP. 
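As a rough illustration of how these terms relate on the wire, the sketch below models an encapsulated NVO3 packet in Python. The field names and the numeric context value are hypothetical and no particular encapsulation format is implied; it only shows the inner tenant frame, the VN Context, and the underlay "outer addresses" side by side.

    # Illustrative only: a generic NVO3-style encapsulation with made-up
    # field names; the real header layout is left to NVO3 solutions.
    from dataclasses import dataclass

    @dataclass
    class OuterIPHeader:
        src_nve: str        # "outer address" of the ingress NVE (underlay IP)
        dst_nve: str        # "outer address" of the egress NVE (underlay IP)

    @dataclass
    class OverlayHeader:
        vn_context: int     # identifies the VN; locally or globally significant

    @dataclass
    class EncapsulatedPacket:
        outer: OuterIPHeader    # underlay addressing (IP in the NVO3 case)
        overlay: OverlayHeader  # consumed by the egress NVE to pick the VNI
        inner_frame: bytes      # tenant's original Ethernet frame, unmodified

    # The egress NVE uses vn_context (plus, for locally significant values,
    # additional information such as its own address) to select the VNI and
    # deliver inner_frame to the right Tenant System.
    pkt = EncapsulatedPacket(
        outer=OuterIPHeader(src_nve="192.0.2.1", dst_nve="192.0.2.2"),
        overlay=OverlayHeader(vn_context=5001),
        inner_frame=b"...tenant L2 frame...",
    )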
> >> Data Center (DC): A physical complex housing physical servers, >> network switches and routers, network service sppliances and >> networked storage. The purpose of a Data Center is to provide >> application, compute and/or storage services. One such service is >> virtualized infrastructure data center services, also known as >> Infrastructure as a Service. > Should we add defn. for "network service appliance" ?? > >> Virtual Data Center or Virtual DC: A container for virtualized >> compute, storage and network services. Managed by a single tenant, a >> Virtual DC can contain multiple VNs and multiple Tenant Systems that >> are connected to one or more of these VNs. > Tenant manages what is in the VDC. Network Admin manages all aspects > of mapping the virtual components to physical components. > > Better: > > Virtual Data Center (Virtual DC): A container for virtualized > compute, storage and network services. A Virtual DC is > associated with a single tenant, and can contain multiple VNs > and Tenant Systems connected to one or more of these VNs. > >> VM: Virtual Machine. Several Virtual Machines can share the >> resources of a single physical computer server using the services of >> a Hypervisor (see below definition). > Better (taken/adapted from RFC6820): > > Virtual machine (VM): A software implementation of a physical > machine that runs programs as if they were executing on a > physical, non-virtualized machine. Applications (generally) do > not know they are running on a VM as opposed to running on a > "bare" host or server, though some systems provide a > paravirtualization environment that allows an operating systems or > application to be aware of the presences of virtualization for > optimization purposes. > >> Hypervisor: Server virtualization software running on a physical >> compute server that hosts Virtual Machines. The hypervisor provides >> shared compute/memory/storage and network connectivity to the VMs >> that it hosts. Hypervisors often embed a Virtual Switch (see below). > Compute server is not defined and the term isn't used elsewhere in the > document. How about: > > Hypervisor: Software running on a server that allows multiple > VMs to run on the same physical server. The hypervisor provides > shared compute/memory/storage and network connectivity to the > VMs that it hosts. Hypervisors often embed a Virtual Switch > (see below). > > Also, add (for completeness) > > Server: A physical end host machine that runs user > applications. A standalone (or "bare metal") server runs a > conventional operating system hosting a single tenant > application. A virtualized server runs a hypervisor supporting > one or more VMs. > > >> Virtual Switch: A function within a Hypervisor (typically >> implemented in software) that provides similar services to a >> physical Ethernet switch. It switches Ethernet frames between VMs >> virtual NICs within the same physical server, or between a VM and a >> physical NIC card connecting the server to a physical Ethernet >> switch or router. It also enforces network isolation between VMs >> that should not communicate with each other. > slightly better: > > Virtual Switch (vSwitch): A function within a Hypervisor > (typically implemented in software) that provides similar > services to a physical Ethernet switch. A vSwitch forwards > Ethernet frames between VMs running on the same server, or > between a VM and a physical NIC card connecting the server to a > physical Ethernet switch. 
A vSwitch also enforces network > isolation between VMs that by policy are not permitted to > communicate with each other (e.g., by honoring VLANs). > >> Tenant: In a DC, a tenant refers to a customer that could be an >> organization within an enterprise, or an enterprise with a set of DC >> compute, storage and network resources associated with it. > Better: > > Tenant: The customer using a virtual network and any associated > resources (e.g., compute, storage and network). A tenant could > be an enterprise, a department, an organization within an > enterprise, etc. > > >> Tenant System: A physical or virtual system that can play the role >> of a host, or a forwarding element such as a router, switch, >> firewall, etc. It belongs to a single tenant and connects to one or >> more VNs of that tenant. > Better: > > Tenant System: A physical or virtual host associated with a specific > tenant. A Tenant System can play the role of a host, or a > forwarding element such as a router, switch, firewall, etc. A > Tenant System is associated with a specific tenant and connects > to one or more of the tenant's VNs. > >> End device: A physical system to which networking service is >> provided. Examples include hosts (e.g. server or server blade), >> storage systems (e.g., file servers, iSCSI storage systems), and >> network devices (e.g., firewall, load-balancer, IPSec gateway). An >> end device may include internal networking functionality that >> interconnects the device's components (e.g. virtual switches that >> interconnect VMs running on the same server). NVE functionality may >> be implemented as part of that internal networking. > Better: > > End device: A physical device that connects directly to the > data center Underlay Network. An End Device is administered by > the data center operator rather than a tenant and is part of > data center infrastructure. An End Device may implement NVO3 > technology in support of NVO3 functions. Contrast with Tenant > System, which is only connected to a Virtual Network. > Examples include hosts (e.g. server or server blade), storage > systems (e.g., file servers, iSCSI storage systems), and > network devices (e.g., firewall, load-balancer, IPSec > gateway). > > > >> ELAN: MEF ELAN, multipoint to multipoint Ethernet service > I'd suggest dropping these terms. They are barely used and are not > critical to understanding the framework > >> EVPN: Ethernet VPN as defined in [EVPN] > Remove. These terms are not used elswhere in the document. > >> 1.3. DC network architecture >> >> A generic architecture for Data Centers is depicted in Figure 1: >> >> ,---------. >> ,' `. >> ( IP/MPLS WAN ) >> `. ,' >> `-+------+' >> +--+--+ +-+---+ >> |DC GW|+-+|DC GW| >> +-+---+ +-----+ >> | / >> .--. .--. >> ( ' '.--. >> .-.' Intra-DC ' >> ( network ) >> ( .'-' >> '--'._.'. )\ \ >> / / '--' \ \ >> / / | | \ \ >> +---+--+ +-`.+--+ +--+----+ >> | ToR | | ToR | | ToR | >> +-+--`.+ +-+-`.-+ +-+--+--+ >> / \ / \ / \ >> __/_ \ / \ /_ _\__ >> '--------' '--------' '--------' '--------' >> : End : : End : : End : : End : >> : Device : : Device : : Device : : Device : >> '--------' '--------' '--------' '--------' >> >> Figure 1 : A Generic Architecture for Data Centers > The above is not necessarily what a DC looks like. ARMD went through > this already, and there are many different data center network types. > For example, the above doesn't allow for a chassis with an embedded > switch between the End Device and ToR. 
> > This picture should be generalized to use terms like "access layer", > and "aggregation layer" rather than specific terms like ToR. Have a > look at RFC 6820. > > Feel free to grab text or just point there. > >> An example of multi-tier DC network architecture is presented in >> this figure. It provides a view of physical components inside a DC. > s/this figure/Figure 1/ > >> A cloud network is composed of intra-Data Center (DC) networks and >> network services, and inter-DC network and network connectivity >> services. Depending upon the scale, DC distribution, operations >> model, Capex and Opex aspects, DC networking elements can act as >> strict L2 switches and/or provide IP routing capabilities, including >> service virtualization. > Do we really need to use the term "cloud network" and say what a > "cloud network" is? The term does not seem to be used elsewhere in the > document... > >> In some DC architectures, it is possible that some tier layers >> provide L2 and/or L3 services, are collapsed, and that Internet >> connectivity, inter-DC connectivity and VPN support are handled by a >> smaller number of nodes. Nevertheless, one can assume that the >> functional blocks fit in the architecture above. > Per above, see how the ARMD document handled this. > >> 1.4. Tenant networking view >> >> The DC network architecture is used to provide L2 and/or L3 service >> connectivity to each tenant. An example is depicted in Figure 2: >> >> >> +----- L3 Infrastructure ----+ >> | | >> ,--+--. ,--+--. >> .....( Rtr1 )...... ( Rtr2 ) >> | `-----' | `-----' >> | Tenant1 |LAN12 Tenant1| >> |LAN11 ....|........ |LAN13 >> .............. | | .............. >> | | | | | | >> ,-. ,-. ,-. ,-. ,-. ,-. >> (VM )....(VM ) (VM )... (VM ) (VM )....(VM ) >> `-' `-' `-' `-' `-' `-' >> >> Figure 2 : Logical Service connectivity for a single tenant >> >> In this example, one or more L3 contexts and one or more LANs (e.g., >> one per application type) are assigned for DC tenant1. > This picture is unclear. what does "tenant 1" cover in the picture? > What is an "L3 context"? I would assume it needs to refer to specific > VMs too... > >> For a multi-tenant DC, a virtualized version of this type of service >> connectivity needs to be provided for each tenant by the Network >> Virtualization solution. > I would assume NVO3 only cares about the multi-tenant case. Which > makes me wonder what the previous example is supposed to show. A > single tenant case? What does that mean? > >> 2. Reference Models >> >> 2.1. Generic Reference Model >> >> The following diagram shows a DC reference model for network >> virtualization using L3 (IP/MPLS) overlays where NVEs provide a >> logical interconnect between Tenant Systems that belong to a >> specific tenant network. >> > Should the above say "that belong to a specific tenant's Virtual > Network"? > >> >> +--------+ +--------+ >> | Tenant +--+ +----| Tenant | >> | System | | (') | System | >> +--------+ | ................... ( ) +--------+ >> | +-+--+ +--+-+ (_) >> | | NV | | NV | | >> +--|Edge| |Edge|---+ >> +-+--+ +--+-+ >> / . . >> / . L3 Overlay +--+-++--------+ >> +--------+ / . Network | NV || Tenant | >> | Tenant +--+ . |Edge|| System | >> | System | . +----+ +--+-++--------+ >> +--------+ .....| NV |........ 
>> |Edge| >> +----+ >> | >> | >> ===================== >> | | >> +--------+ +--------+ >> | Tenant | | Tenant | >> | System | | System | >> +--------+ +--------+ > s/NV Edge/NVE/ for consistency > >> >> Figure 3 : Generic reference model for DC network virtualization >> over a Layer3 infrastructure >> >> A Tenant System can be attached to a Network Virtualization Edge >> (NVE) node in several ways: > Each of these ways should be clearly labeled in Figure3. > >> - locally, by being co-located in the same device > add something like: (e.g., as part of the hypervisor) > >> - remotely, via a point-to-point connection or a switched network >> (e.g., Ethernet) >> >> When an NVE is local, the state of Tenant Systems can be provided >> without protocol assistance. For instance, the operational status of >> a VM can be communicated via a local API. When an NVE is remote, the >> state of Tenant Systems needs to be exchanged via a data or control >> plane protocol, or via a management entity. > Better: > > When an NVE is co-located with a Tenant System, communication > and synchronization between the TS and NVE takes place via > software (e.g., using an internal API). When an NVE and TS are > separated by an access link, interaction and synchronization > between an NVE and TS require an explicit data plane, control > plane, or management protocol. > >> The functional components in Figure 3 do not necessarily map >> directly with the physical components described in Figure 1. >> >> For example, an End Device can be a server blade with VMs and >> virtual switch, i.e. the VM is the Tenant System and the NVE >> functions may be performed by the virtual switch and/or the >> hypervisor. In this case, the Tenant System and NVE function are co- >> located. >> >> Another example is the case where an End Device can be a traditional >> physical server (no VMs, no virtual switch), i.e. the server is the >> Tenant System and the NVE function may be performed by the ToR. > We should not use the term "ToR" here. We should be more generic and > say something like the "attached switch" or "access switch". > >> The NVE implements network virtualization functions that allow for >> L2 and/or L3 tenant separation and for hiding tenant addressing >> information (MAC and IP addresses), tenant-related control plane >> activity and service contexts from the underlay nodes. > We should probably define "tenant separation" earlier in the document and then > just refer to that defintion. Add something like the following to the > defintions?: > > Tenant Separation: Tenant Separation refers to isolating traffic > of different tenants so that traffic from one tenant is not > visible to or delivered to another tenant, except when allowed by > policy. Tenant Separation also refers to address space separation, > whereby different tenants use the same address space for different > virtual networks without conflict. > > >> 2.2. NVE Reference Model >> One or more VNIs can be instantiated on an NVE. Tenant Systems >> interface with a corresponding VNI via a Virtual Access Point >> (VAP). > Define VAP in the terminology section. > >> An overlay module that provides tunneling overlay functions (e.g., >> encapsulation and decapsulation of tenant traffic from/to the tenant >> forwarding instance, tenant identification and mapping, etc), as >> described in figure 4: > Doesn't quite parse. 
Better: > > An overlay module on the NVE provides tunneling overlay > functions (e.g., encapsulation and decapsulation of tenant > traffic from/to the tenant forwarding instance, tenant > identification and mapping, etc), as described in figure 4: > >> +------- L3 Network ------+ >> | | >> | Tunnel Overlay | >> +------------+---------+ +---------+------------+ >> | +----------+-------+ | | +---------+--------+ | >> | | Overlay Module | | | | Overlay Module | | >> | +---------+--------+ | | +---------+--------+ | >> | |VN context| | VN context| | >> | | | | | | >> | +--------+-------+ | | +--------+-------+ | >> | | |VNI| . |VNI| | | | |VNI| . |VNI| | >> NVE1 | +-+------------+-+ | | +-+-----------+--+ | NVE2 >> | | VAPs | | | | VAPs | | >> +----+------------+----+ +----+-----------+-----+ >> | | | | >> -------+------------+-----------------+-----------+------- >> | | Tenant | | >> | | Service IF | | >> Tenant Systems Tenant Systems >> >> Figure 4 : Generic reference model for NV Edge >> >> Note that some NVE functions (e.g., data plane and control plane >> functions) may reside in one device or may be implemented separately >> in different devices. For example, the NVE functionality could >> reside solely on the End Devices, or be distributed between the End >> Devices and the ToRs. In the latter case we say that the End Device >> NVE component acts as the NVE Spoke, and ToRs act as NVE hubs. >> Tenant Systems will interface with VNIs maintained on the NVE >> spokes, and VNIs maintained on the NVE spokes will interface with >> VNIs maintained on the NVE hubs. > Don't always assume "ToR". > > Also, in Figure 4, "VN context' is listed as if it were a component or > something. But VN context is previously defined as a field in the > overlay header. Is this something different? If so, we should use a > different term (to avoid confusion). > >> 2.3. NVE Service Types >> >> NVE components may be used to provide different types of virtualized >> network services. This section defines the service types and >> associated attributes. Note that an NVE may be capable of providing >> both L2 and L3 services. >> 2.3.1. L2 NVE providing Ethernet LAN-like service >> >> L2 NVE implements Ethernet LAN emulation (ELAN), an Ethernet based > drop the "(ELAN)" abbreviation. (its not needed and is not used > anywhere else). > >> multipoint service where the Tenant Systems appear to be >> interconnected by a LAN environment over a set of L3 tunnels. It >> provides per tenant virtual switching instance with MAC addressing >> isolation and L3 (IP/MPLS) tunnel encapsulation across the underlay. >> >> 2.3.2. L3 NVE providing IP/VRF-like service >> >> Virtualized IP routing and forwarding is similar from a service >> definition perspective with IETF IP VPN (e.g., BGP/MPLS IPVPN >> [RFC4364] and IPsec VPNs). It provides per tenant routing instance > should provide an RFC reference for IPsec VPNs too... > > s/per tenant/per-tenant/ > >> with addressing isolation and L3 (IP/MPLS) tunnel encapsulation >> across the underlay. >> >> 3. Functional components >> >> This section decomposes the Network Virtualization architecture into >> functional components described in Figure 4 to make it easier to >> discuss solution options for these components. >> >> 3.1. Service Virtualization Components >> >> 3.1.1. Virtual Access Points (VAPs) >> >> Tenant Systems are connected to the VNI Instance through Virtual >> Access Points (VAPs). 
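To make the Figure 4 flow (VAP -> VNI -> overlay module) concrete, here is a minimal sketch of an ingress NVE lookup, assuming a per-VN mapping table keyed on the inner destination. All class, field and value names are hypothetical; no specific control plane or encapsulation is implied.

    # Sketch of the ingress path from Figure 4: the VAP selects the VNI,
    # the VNI's mapping table yields the egress NVE, and the overlay module
    # adds the VN Context plus the outer (underlay) IP addresses.
    class VNI:
        def __init__(self, vn_context):
            self.vn_context = vn_context
            self.mapping = {}      # inner destination (e.g. tenant MAC) -> egress NVE IP

    class NVE:
        def __init__(self, underlay_ip):
            self.underlay_ip = underlay_ip
            self.vap_to_vni = {}   # VAP id, e.g. (port, VLAN) -> VNI

        def ingress(self, vap_id, inner_dst, frame):
            vni = self.vap_to_vni[vap_id]             # tenant separation starts here
            egress_nve = vni.mapping.get(inner_dst)   # populated by control/mgmt/data plane
            if egress_nve is None:
                return None                           # unknown destination: flood or drop per policy
            return {"outer_src": self.underlay_ip,    # overlay module: encapsulate
                    "outer_dst": egress_nve,
                    "vn_context": vni.vn_context,
                    "payload": frame}

    # Example use with invented identifiers:
    nve = NVE("192.0.2.1")
    nve.vap_to_vni[("eth0", 100)] = VNI(5001)
    nve.vap_to_vni[("eth0", 100)].mapping["00:11:22:33:44:55"] = "192.0.2.2"
    print(nve.ingress(("eth0", 100), "00:11:22:33:44:55", b"frame"))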
>> >> The VAPs can be physical ports or virtual ports identified through >> logical interface identifiers (e.g., VLAN ID, internal vSwitch >> Interface ID coonected to a VM). > s/coonected/connected/ > >> 3.1.2. Virtual Network Instance (VNI) >> >> The VNI represents a set of configuration attributes defining access >> and tunnel policies and (L2 and/or L3) forwarding functions. >> Per tenant FIB tables and control plane protocol instances are used >> to maintain separate private contexts between tenants. Hence tenants >> are free to use their own addressing schemes without concerns about >> address overlapping with other tenants. > Not exactly. The VNI is a VN Instance. Implementing a VNI requires a > bunch of stuff. Also, here I think it is better to talk about VNs than > tenants. > > Also, in reading through the doc, we say over and over again that you > get address space separation, etc. No need to repeat this all the time! > > Better: > > A VNI is a specific VN instance. Associated with each VNI is a > set of metat data necessary to implement the specific VN > service. For example, a per-VN forwarding or mapping table is > needed deliver traffic to other members of the VN and ensure > tenant separation between between different VNs. > >> 3.1.3. Overlay Modules and VN Context >> >> Mechanisms for identifying each tenant service are required to allow >> the simultaneous overlay of multiple tenant services over the same >> underlay L3 network topology. In the data plane, each NVE, upon >> sending a tenant packet, must be able to encode the VN Context for >> the destination NVE in addition to the L3 tunnel information (e.g., >> source IP address identifying the source NVE and the destination IP >> address identifying the destination NVE, or MPLS label). This allows >> the destination NVE to identify the tenant service instance and >> therefore appropriately process and forward the tenant packet. >> >> The Overlay module provides tunneling overlay functions: tunnel >> initiation/termination, encapsulation/decapsulation of frames from >> VAPs/L3 Backbone and may provide for transit forwarding of IP > s/L3 Backbone/L3 underlay > >> traffic (e.g., transparent tunnel forwarding). > What is this "transit forwarding"? > >> In a multi-tenant context, the tunnel aggregates frames from/to >> different VNIs. Tenant identification and traffic demultiplexing are >> based on the VN Context identifier (e.g., VNID). > Let's drop use of VNID here (since IDs can be locally significant too). > >> The following approaches can been considered: >> >> o One VN Context per Tenant: A globally unique (on a per-DC >> administrative domain) VNID is used to identify the related >> Tenant instances. An example of this approach is the use of >> IEEE VLAN or ISID tags to provide virtual L2 domains. > I think this is off. We are mixing "tenant" and "VN". A tenant can > have multiple different VNIs associated with it. Each of those VNIs > uses different VN Contexts. Thus, the VN Context != Tenant. > >> o One VN Context per VNI: A per-tenant local value is >> automatically generated by the egress NVE and usually >> distributed by a control plane protocol to all the related >> NVEs. An example of this approach is the use of per VRF MPLS >> labels in IP VPN [RFC4364]. > This seems off. There could be a different VN Context for each NVE, so > its not "per VNI". > >> o One VN Context per VAP: A per-VAP local value is assigned and >> usually distributed by a control plane protocol. 
An example of >> this approach is the use of per CE-PE MPLS labels in IP VPN >> [RFC4364]. >> >> Note that when using one VN Context per VNI or per VAP, an >> additional global identifier may be used by the control plane to >> identify the Tenant context. > need a name for that global identifier term and put it in the > terminology section. > >> 3.1.4. Tunnel Overlays and Encapsulation options >> >> Once the VN context identifier is added to the frame, a L3 Tunnel > When using term Context Identifier, capitalize it. > > >> encapsulation is used to transport the frame to the destination NVE. >> The backbone devices do not usually keep any per service state, >> simply forwarding the frames based on the outer tunnel >> header. > don't use "backbone devices" term. use "underlay devices"? > >> Different IP tunneling options (e.g., GRE, L2TP, IPSec) and MPLS >> tunneling options (e.g., BGP VPN, VPLS) can be used. >> >> 3.1.5. Control Plane Components > This section should be expanded to show the different problem areas > for the control plane (specifically server-to-NVE and NVE-to-oracle). > >> Control plane components may be used to provide the following >> capabilities: >> >> . Auto-provisioning/Service discovery >> >> . Address advertisement and tunnel mapping >> >> . Tunnel management > The above really should be expanded a bit. E.g., what does "auto > provisioning" refer to? I don't think a lot is needed, but a sentence > or two per bullet point would help. Also, there are 3 bullet points > here, but there are 4 subsections that follow, and they do not match > the above bullet points. > >> A control plane component can be an on-net control protocol >> implemented on the NVE or a management control entity. > What is "on-net protocol" > >> 3.1.5.1. Distributed vs Centralized Control Plane >> >> A control/management plane entity can be centralized or distributed. >> Both approaches have been used extensively in the past. The routing >> model of the Internet is a good example of a distributed approach. >> Transport networks have usually used a centralized approach to >> manage transport paths. > What is a "transport network"? I'm not sure what these are and why > they are "usually" use a centrallized network. > >> It is also possible to combine the two approaches i.e. using a >> hybrid model. A global view of network state can have many benefits >> but it does not preclude the use of distributed protocols within the >> network. Centralized controllers provide a facility to maintain >> global state, and distribute that state to the network which in >> combination with distributed protocols can aid in achieving greater >> network efficiencies, and improve reliability and robustness. Domain >> and/or deployment specific constraints define the balance between >> centralized and distributed approaches. >> >> On one hand, a control plane module can reside in every NVE. This is >> how routing control plane modules are implemented in routers. At the >> same time, an external controller can manage a group of NVEs via an >> agent in each NVE. This is how an SDN controller could communicate >> with the nodes it controls, via OpenFlow [OF] for instance. > Expand SDN on first usage... > >> In the case where a logically centralized control plane is >> preferred, the controller will need to be distributed to more than >> one node for redundancy and scalability in order to manage a large >> number of NVEs. Hence, inter-controller communication is necessary >> to synchronize state among controllers. 
It should be noted that >> controllers may be organized in clusters. The information exchanged >> between controllers of the same cluster could be different from the >> information exchanged across clusters. > This section does not really capture the oracle discussion we had at > the interim meeting. > >> 3.1.5.2. Auto-provisioning/Service discovery >> >> NVEs must be able to identify the appropriate VNI for each Tenant >> System. This is based on state information that is often provided by >> external entities. For example, in an environment where a VM is a >> Tenant System, this information is provided by compute management >> systems, since these are the only entities that have visibility of >> which VM belongs to which tenant. > Above. might be better to say this is provided by the vm orchestration > system (and maybe define this term in terminology section?) > >> 3.1.5.3. Address advertisement and tunnel mapping >> >> As traffic reaches an ingress NVE, a lookup is performed to >> determine which tunnel the packet needs to be sent to. It is then > s/sent to/sent on/ ?? > >> encapsulated with a tunnel header containing the destination >> information (destination IP address or MPLS label) of the egress >> overlay node. Intermediate nodes (between the ingress and egress >> NVEs) switch or route traffic based upon the outer destination >> information. > Use active voice above? And say which entity it is that does the work? > E.g. > > As traffic reaches an ingress NVE, the NVE performs a lookup > to determine which remote NVE the packet should be sent > to. The NVE [or Overlay Module???] then adds a tunnel > encapsulation header containing the destination information > (destination IP address or MPLS label) of the egress NVE as > well as an appropriate Context ID. Nodes on the underlay > network (between the ingress and egress NVEs) forward traffic > based solely on the outer destination header information. > >> One key step in this process consists of mapping a final destination >> information to the proper tunnel. NVEs are responsible for >> maintaining such mappings in their forwarding tables. Several ways >> of populating these tables are possible: control plane driven, >> management plane driven, or data plane driven. > Better: > > A key step in the above process consists of identifying the > destination NVE the packet is to be tunneled to. NVEs are > responsible for maintaining a set of forwarding or mapping > tables that hold the bindings between between destination VM > and egress NVE addresses. Several ways of populating these > tables are possible: control plane driven, management plane > driven, or data plane driven. > >> 3.2. Multi-homing >> >> Multi-homing techniques can be used to increase the reliability of >> an nvo3 network. It is also important to ensure that physical >> diversity in an nvo3 network is taken into account to avoid single >> points of failure. >> >> Multi-homing can be enabled in various nodes, from tenant systems >> into TORs, TORs into core switches/routers, and core nodes into DC >> GWs. >> >> The nvo3 underlay nodes (i.e. from NVEs to DC GWs) rely on IP >> routing as the means to re-route traffic upon failures and/or ECMP >> techniques or on MPLS re-rerouting capabilities. >> >> When a tenant system is co-located with the NVE on the same end- >> system, the tenant system is single homed to the NVE via a vport >> that is virtual NIC (vNIC). When the end system and the NVEs >> are > vport/vNIC terminology is not defined (and is somewhat proprietary). 
> > And shouldn't the VAP terminology be used here? > >> separated, the end system is connected to the NVE via a logical >> Layer2 (L2) construct such as a VLAN. In this latter case, an end >> device or vSwitch on that device could be multi-homed to various >> NVEs. An NVE may provide an L2 service to the end system or a l3 >> service. An NVE may be multi-homed to a next layer in the DC at >> Layer2 (L2) or Layer3 (L3). When an NVE provides an L2 service and >> is not co-located with the end system, techniques such as Ethernet >> Link Aggregation Group (LAG) or Spanning Tree Protocol (STP) can be >> used to switch traffic between an end system and connected >> NVEs without creating loops. Similarly, when the NVE provides L3 >> service, similar dual-homing techniques can be used. When the NVE >> provides a L3 service to the end system, it is possible that no >> dynamic routing protocol is enabled between the end system and the >> NVE. The end system can be multi-homed to multiple physically- >> separated L3 NVEs over multiple interfaces. When one of the >> links connected to an NVE fails, the other interfaces can be used to >> reach the end system. > The above seemt to talk about what I'll call a "distributed NVE > model", where a TES is connected to more than one NVE (or the one NVE > that is somehow distributed). If this document is going to talk about > that as a possibility in the context of multihoming, I think we need > to talk more generally about a TES connected to more than one > TES. This document doesn't really talk about that at all. > > Personally, I'm not sure we should go here. A lot of complexity that I > suspect is not worth the cost. I'd suggest sticking with a single NVE > per TES. > >> External connectivity out of an nvo3 domain can be handled by two or >> more nvo3 gateways. Each gateway is connected to a different domain >> (e.g. ISP), providing access to external networks such as VPNs or >> the Internet. A gateway may be connected to two nodes. When a >> connection to an upstream node is lost, the alternative connection >> is used and the failed route withdrawn. > For external multihoming, there is no reason to say they are connected > to "different domains". You just want redundancy. > > Actually, I'm not sure what point the above is trying to highlight. Is > this a generic requirement for multihoming out of a DC that happens to > be running NVO3 internally? If so, I don't think we need to say > that. It's a given and mostly out of scope. > > If we are talking just about "nvo3 gateways", I presume that means > getting in and out of a specific VN. For that, you just need > multihoming for redundancy, and there is no need to talk about > connecting those gateways to "different domains". > >> 3.3. VM Mobility >> >> In DC environments utilizing VM technologies, an important feature >> is that VMs can move from one server to another server in the same >> or different L2 physical domains (within or across DCs) in a >> seamless manner. >> >> A VM can be moved from one server to another in stopped or suspended >> state ("cold" VM mobility) or in running/active state ("hot" VM >> mobility). With "hot" mobility, VM L2 and L3 addresses need to be >> preserved. With "cold" mobility, it may be desired to preserve VM L3 >> addresses. >> >> Solutions to maintain connectivity while a VM is moved are necessary >> in the case of "hot" mobility. This implies that transport >> connections among VMs are preserved and that ARP caches are updated >> accordingly. 
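A rough sketch of what "updated accordingly" could look like at the NVE layer is given below, with each peer NVE's state modelled as a plain dict of per-VN mapping tables. The function and variable names are hypothetical, and how the update is actually distributed (control plane, management plane, or data plane learning) is exactly what the framework leaves open.

    # Hypothetical sketch: repointing a tenant address to a VM's new
    # location after a "hot" move.
    def vm_moved(peer_nve_tables, vn_context, vm_mac, new_nve_ip):
        for tables in peer_nve_tables:                 # one entry per remote NVE
            mapping = tables.setdefault(vn_context, {})
            mapping[vm_mac] = new_nve_ip               # rebind MAC -> new egress NVE

    # Example: two peer NVEs learn that VM 00:11:22:33:44:55 now sits
    # behind the NVE at 192.0.2.7 in VN 5001.
    nve_a, nve_b = {}, {}
    vm_moved([nve_a, nve_b], 5001, "00:11:22:33:44:55", "192.0.2.7")
    # Tenant-visible state (ARP caches) is refreshed separately, e.g. via a
    # gratuitous ARP sent from the VM's new location.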
>> >> Upon VM mobility, NVE policies that define connectivity among VMs >> must be maintained. >> >> Optimal routing during VM mobility is also an important aspect to >> address. It is expected that the VM's default gateway be as close as >> possible to the server hosting the VM and triangular routing be >> avoided. > What is meant by "triangular routing" above? Specifically, how is this > a result of mobility vs. a general requirement? > >> 3.4. Service Overlay Topologies >> >> A number of service topologies may be used to optimize the service >> connectivity and to address NVE performance limitations. >> >> The topology described in Figure 3 suggests the use of a tunnel mesh >> between the NVEs where each tenant instance is one hop away from a >> service processing perspective. Partial mesh topologies and an NVE >> hierarchy may be used where certain NVEs may act as service transit >> points. >> >> 4. Key aspects of overlay networks >> >> The intent of this section is to highlight specific issues that >> proposed overlay solutions need to address. >> >> 4.1. Pros & Cons >> >> An overlay network is a layer of virtual network topology on top of >> the physical network. >> >> Overlay networks offer the following key advantages: >> >> o Unicast tunneling state management and association with tenant >> systems reachability are handled at the edge of the network. >> Intermediate transport nodes are unaware of such state. Note >> that this is not the case when multicast is enabled in the core >> network. > The comment about multicast needs expansion. Multicast (in the > underlay) is an underlay issue and has nothing to do with > overlays. If Tenant traffic is mapped into multicast service on the > underlay, then there is a connection. If the latter is what is meant, > please add text to that effect. > >> o Tunneling is used to aggregate traffic and hide tenant >> addresses from the underkay network, and hence offer the >> advantage of minimizing the amount of forwarding state required >> within the underlay network >> >> o Decoupling of the overlay addresses (MAC and IP) used by VMs >> from the underlay network. This offers a clear separation >> between addresses used within the overlay and the underlay >> networks and it enables the use of overlapping addresses spaces >> by Tenant Systems >> >> o Support of a large number of virtual network identifiers >> >> Overlay networks also create several challenges: >> >> o Overlay networks have no controls of underlay networks and lack >> critical network information > Is some text missing from the above bullet? > >> o Overlays typically probe the network to measure link or >> path properties, such as available bandwidth or packet >> loss rate. It is difficult to accurately evaluate network >> properties. It might be preferable for the underlay >> network to expose usage and performance >> information. > I don't follow the above. Isn't the above a true statement for a host > connected to an IP network as well? > >> o Miscommunication or lack of coordination between overlay and >> underlay networks can lead to an inefficient usage of network >> resources. > Might be good to give an example. > >> o When multiple overlays co-exist on top of a common underlay >> network, the lack of coordination between overlays can lead to >> performance issues. > Can you give examples? and how is this different from what we have > today with different hosts "not coordinating" when they use the > network? 
> >> o Overlaid traffic may not traverse firewalls and NAT > devices. > > Explain why this is a challenge. I'd argue that is the point. if a FW > is needed, it could be part of the overlay. > >> o Multicast service scalability. Multicast support may be >> required in the underlay network to address for each tenant >> flood containment or efficient multicast handling. The underlay >> may be also be required to maintain multicast state on a per- >> tenant basis, or even on a per-individual multicast flow of a >> given tenant. >> >> o Hash-based load balancing may not be optimal as the hash >> algorithm may not work well due to the limited number of >> combinations of tunnel source and destination addresses. Other >> NVO3 mechanisms may use additional entropy information than >> source and destination addresses. >> 4.2. Overlay issues to consider >> >> 4.2.1. Data plane vs Control plane driven >> >> In the case of an L2NVE, it is possible to dynamically learn MAC >> addresses against VAPs. > rewrite? what does it mean to "learn MAC addresses against VAPs". > >> It is also possible that such addresses be >> known and controlled via management or a control protocol for both >> L2NVEs and L3NVEs. >> >> Dynamic data plane learning implies that flooding of unknown >> destinations be supported and hence implies that broadcast and/or >> multicast be supported or that ingress replication be used as >> described in section 4.2.3. Multicasting in the underlay network for >> dynamic learning may lead to significant scalability limitations. >> Specific forwarding rules must be enforced to prevent loops from >> happening. This can be achieved using a spanning tree, a shortest >> path tree, or a split-horizon mesh. >> >> It should be noted that the amount of state to be distributed is >> dependent upon network topology and the number of virtual machines. >> Different forms of caching can also be utilized to minimize state >> distribution between the various elements. The control plane should >> not require an NVE to maintain the locations of all the tenant >> systems whose VNs are not present on the NVE. The use of a control >> plane does not imply that the data plane on NVEs has to maintain all >> the forwarding state in the control plane. >> >> 4.2.2. Coordination between data plane and control plane >> >> For an L2 NVE, the NVE needs to be able to determine MAC addresses >> of the end systems connected via a VAP. This can be achieved via >> dataplane learning or a control plane. For an L3 NVE, the NVE needs >> to be able to determine IP addresses of the end systems connected >> via a VAP. > Better: > > For an L2 NVE, the NVE needs to be able to determine MAC > addresses of the end systems connected via a VAP. For an L3 > NVE, the NVE needs to be able to determine IP addresses of the > end systems connected via a VAP. In both cases, this can be > achieved via dataplane learning or a control plane. > >> In both cases, coordination with the NVE control protocol is needed >> such that when the NVE determines that the set of addresses behind a >> VAP has changed, it triggers the local NVE control plane to >> distribute this information to its peers. >> >> 4.2.3. 
Handling Broadcast, Unknown Unicast and Multicast (BUM) traffic >> >> There are two techniques to support packet replication needed for >> broadcast, unknown unicast and multicast: > s/for broadcast/for tenant broadcast/ > >> o Ingress replication >> >> o Use of underlay multicast trees > draft-ghanwani-nvo3-mcast-issues-00.txt describes a third technique. > >> There is a bandwidth vs state trade-off between the two approaches. >> Depending upon the degree of replication required (i.e. the number >> of hosts per group) and the amount of multicast state to maintain, >> trading bandwidth for state should be considered. >> >> When the number of hosts per group is large, the use of underlay >> multicast trees may be more appropriate. When the number of hosts is >> small (e.g. 2-3), ingress replication may not be an issue. >> >> Depending upon the size of the data center network and hence the >> number of (S,G) entries, but also the duration of multicast flows, >> the use of underlay multicast trees can be a challenge. >> >> When flows are well known, it is possible to pre-provision such >> multicast trees. However, it is often difficult to predict >> application flows ahead of time, and hence programming of (S,G) >> entries for short-lived flows could be impractical. >> >> A possible trade-off is to use in the underlay shared multicast >> trees as opposed to dedicated multicast trees. >> 4.2.4. Path MTU >> >> When using overlay tunneling, an outer header is added to the >> original frame. This can cause the MTU of the path to the egress >> tunnel endpoint to be exceeded. >> >> In this section, we will only consider the case of an IP overlay. >> >> It is usually not desirable to rely on IP fragmentation for >> performance reasons. Ideally, the interface MTU as seen by a Tenant >> System is adjusted such that no fragmentation is needed. TCP will >> adjust its maximum segment size accordingly. >> >> It is possible for the MTU to be configured manually or to be >> discovered dynamically. Various Path MTU discovery techniques exist >> in order to determine the proper MTU size to use: >> >> o Classical ICMP-based MTU Path Discovery [RFC1191] [RFC1981] >> >> o >> Tenant Systems rely on ICMP messages to discover the MTU of >> the end-to-end path to its destination. This method is not >> always possible, such as when traversing middle boxes >> (e.g. firewalls) which disable ICMP for security reasons >> >> o Extended MTU Path Discovery techniques such as defined in >> [RFC4821] >> >> It is also possible to rely on the overlay layer to perform >> segmentation and reassembly operations without relying on the Tenant >> Systems to know about the end-to-end MTU. The assumption is that >> some hardware assist is available on the NVE node to perform such >> SAR operations. However, fragmentation by the overlay layer can lead > expand "SAR" > >> to performance and congestion issues due to TCP dynamics and might >> require new congestion avoidance mechanisms from then underlay >> network [FLOYD]. >> >> Finally, the underlay network may be designed in such a way that the >> MTU can accommodate the extra tunneling and possibly additional nvo3 >> header encapsulation overhead. >> >> 4.2.5. NVE location trade-offs >> >> In the case of DC traffic, traffic originated from a VM is native >> Ethernet traffic. This traffic can be switched by a local virtual >> switch or ToR switch and then by a DC gateway. The NVE function can >> be embedded within any of these elements. 
>> >> There are several criteria to consider when deciding where the NVE >> function should happen: >> >> o Processing and memory requirements >> >> o Datapath (e.g. lookups, filtering, >> encapsulation/decapsulation) >> >> o Control plane processing (e.g. routing, signaling, OAM) and >> where specific control plane functions should be enabled > missing closing ")" > >> o FIB/RIB size >> >> o Multicast support >> >> o Routing/signaling protocols >> >> o Packet replication capability >> >> o Multicast FIB >> >> o Fragmentation support >> >> o QoS support (e.g. marking, policing, queuing) >> >> o Resiliency >> >> 4.2.6. Interaction between network overlays and underlays >> >> When multiple overlays co-exist on top of a common underlay network, >> resources (e.g., bandwidth) should be provisioned to ensure that >> traffic from overlays can be accommodated and QoS objectives can be >> met. Overlays can have partially overlapping paths (nodes and >> links). >> >> Each overlay is selfish by nature. It sends traffic so as to >> optimize its own performance without considering the impact on other >> overlays, unless the underlay paths are traffic engineered on a per >> overlay basis to avoid congestion of underlay resources. >> >> Better visibility between overlays and underlays, or generally >> coordination in placing overlay demand on an underlay network, can >> be achieved by providing mechanisms to exchange performance and >> liveliness information between the underlay and overlay(s) or the >> use of such information by a coordination system. Such information >> may include: >> >> o Performance metrics (throughput, delay, loss, jitter) >> >> o Cost metrics >> >> 5. Security Considerations >> >> Nvo3 solutions must at least consider and address the following: >> >> . Secure and authenticated communication between an NVE and an >> NVE management system. >> >> . Isolation between tenant overlay networks. The use of per- >> tenant FIB tables (VNIs) on an NVE is essential. >> >> . Security of any protocol used to carry overlay network >> information. >> >> . Avoiding packets from reaching the wrong NVI, especially during >> VM moves. >> >> > _______________________________________________ > nvo3 mailing list > nvo3@ietf.org > https://www.ietf.org/mailman/listinfo/nvo3