I have a number of comments on draft-sajassi-bess-evpn-mvpn-seamless-interop.

1. It seems that the proposal does not do correct ethernet emulation. Intra-subnet multicast only sometimes preserves MAC SA and IP TTL, sometimes not, depending upon the topology. TTL handling for inter-subnet multicast seems inconsistent as well, depending upon the topology. The proposal exposes the operator's internal network structure to the user, and will cause "LAN-only" applications to break. These concerns are acknowledged, then quickly dismissed based on wishful thinking. (In my experience, wishful thinking doesn't work out very well in routing.)

2. In order to do inter-subnet multicast in EVPN, the proposal requires L3VPN/MVPN configuration on ALL the EVPN PEs. This is required even when there is no need for MVPN/EVPN interworking. This is portrayed as a "low provisioning" solution!

3. The draft claims that the exact same control plane should be used for EVPN and MVPN, despite the fact that MVPN's control plane is unaware of certain information that is very important in EVPN (e.g., EVIs, TagIDs). (This is largely responsible for point 1 above.) This is claimed to be a way of providing a "uniform solution". As we examine the problems that arise, perhaps this will be seen as more a case of "pounding square pegs into round holes". When interworking between two domains, generally one gets a more flexible and robust scheme by maintaining clean interfaces and having well-defined points of attachment, not by entangling the internal protocols of one domain with the internal protocols of the other.

4. The draft proposes to use the same tunnels for MVPN and EVPN, i.e., to have tunnels that traverse both the MVPN and the EVPN domains. Various "requirements" are stated that seem to require this solution. Somewhere along the line it was realized that this requirement cannot be met if MVPN and EVPN do not use the same tunnel types. So for this very common scenario, a completely different solution is proposed, that (a) tries to keep the EVPN control plane out of the MVPN domain, and vice versa, and (b) uses different tunnels in the two domains. Perhaps the "requirements" that suggest using a single cross-domain tunnel are not really requirements! And why would we want different solutions for different deployment scenarios? Yes, the solution needs to handle all the use cases, but we don't want to look at the use cases one at a time and design a different solution for each one.

While the authors have realized that one cannot have cross-domain tunnels when EVPN uses VxLAN and MVPN uses MPLS, they do not seem to have acknowledged the multitude of other scenarios in which cross-domain tunnels cannot be used. For instance, MVPN may be using mLDP, while EVPN is using IR. Or MVPN may be using RSVP-TE P2MP while EVPN is using AR. Etc., etc. I suspect that "different tunnel types" will be the common case, especially when trying to interwork existing MVPN and EVPN deployments.

The inability to use EVPN-specific tunnels also causes a number of specific problems when attempting to interwork with MVPN; these will be examined below.

5. A number of the draft's stated "requirements" seem to be entirely bogus.

a. In some cases, the "requirements" for optimality in one or another respect (e.g., routing, replication) are really only considerations that an operator should be able to trade off against other considerations. The real requirement is to be able to create a deployment scenario in which such optimality is achievable.
Other deployment scenarios, that optimize for other considerations, should not be prohibited.

b. Many of the "requirements" are applied very selectively, e.g., the "requirement" for MVPN and EVPN to use the same set of multicast tunnels, and the requirement for there to be no "gateways".

6. The gateway-based proposal for interworking MVPN and EVPN when they use different tunnel types is severely underspecified. One possible approach to this would be to have a single MVPN domain that includes the EVPN PEs, and to use MVPN tunnel segmentation at the boundary. While that is a complicated solution, at least it is known to work. However, that does not seem to be what is being proposed. Another approach would be to set up two independent MVPN domains and carefully assign RTs to ensure that routes are not leaked from one domain to another. One would also have to ensure that the boundary points send the proper set of routes into the "other" domain. (This includes the unicast routes as well as the multicast routes.) And one would have to include a whole bunch of applicability restrictions, such as "don't use the same RR to hold routes of both domains". I think that's what's being proposed, but there isn't enough discussion of RT and RD management to be sure, and there isn't much discussion of what information the boundary points send into each domain.

7. The proposal requires that EVPN export a host route to MVPN for each EVPN-attached multicast source. It's a good thing that there is no requirement like "do not burden existing MVPN deployments with a whole bunch of additional host routes". Wait a minute, maybe there is such a requirement. In fact, whether the host routes are necessary to achieve optimal routing depends on the topology. And this is a case where an operator might well want to sacrifice some routing optimality to reduce the routing burden on the MVPN nodes.

8. The proposal simply does not work when MVPN receivers are interested in multicast flows from EVPN sources that are attached to all-active multi-homed ethernet segments. This issue is worth examining in detail.

Suppose EVPN-PE1 and EVPN-PE2 are both attached to the same ethernet segment, using all-active multi-homing. Suppose there is a multicast source S on that segment. In such a case, (S,G1) traffic might arrive at PE1, while (S,G2) traffic might arrive at PE2. (Which PE gets a particular flow from S depends on LAG hashing algorithms over which we have no control.) Now suppose that an MVPN PE, say PE3, needs to receive (S,G1) traffic. MVPN requires PE3 to select the "Upstream PE" for the (S,G1) traffic. PE3 does this by looking at the VRF Route Import EC on its best route to S. In order to receive the (S,G1) traffic, PE3 must select PE1, rather than PE2, as the Upstream PE. However, there is absolutely nothing in the MVPN specs or in this document to ensure that PE3 selects PE1 rather than PE2. Generally, an MVPN node will select PE2 if it is closer to PE2.
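To make the failure mode concrete, here is a rough sketch (Python, with invented names; a common implementation choice rather than a mandated algorithm) of the Upstream PE selection a vanilla MVPN egress PE performs per RFC 6513: the choice is driven by the unicast route(s) to S and the VRF Route Import EC carried on the selected route, typically tie-broken toward the IGP-closest advertising PE, and never by which all-active EVPN PE the LAG actually hashed the flow to.

```python
# Hypothetical sketch of vanilla MVPN Upstream PE selection (RFC 6513).
# All class and field names here are invented for illustration.

from dataclasses import dataclass

@dataclass
class UnicastRoute:
    prefix: str              # route that C-S resolves to (subnet or host route)
    advertising_pe: str      # PE that advertised the route
    vrf_route_import: str    # VRF Route Import EC carried on the route
    igp_distance: int        # this egress PE's distance to the advertising PE

def select_upstream_pe(routes_to_source: list[UnicastRoute]) -> UnicastRoute:
    """A common tie-break among equally good routes is 'closest exit'
    (lowest IGP cost).  Nothing here knows which all-active EVPN PE the
    access LAG hashed a given (S,G) flow to."""
    return min(routes_to_source, key=lambda r: r.igp_distance)

# PE1 and PE2 both attach to the all-active segment and both advertise
# reachability to S.  Suppose the (S,G1) flow actually hashes to PE1.
routes = [
    UnicastRoute("S/32", "PE1", "VRI:PE1", igp_distance=20),
    UnicastRoute("S/32", "PE2", "VRI:PE2", igp_distance=10),
]

chosen = select_upstream_pe(routes)
print(chosen.advertising_pe)
# -> "PE2": the C-multicast Join is targeted at PE2's VRF Route Import EC,
#    but the (S,G1) traffic is arriving at PE1, so PE3 receives nothing.
```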
Perhaps the authors are under the impression that MVPN Source Active A-D routes can be used to solve this problem. That is not so. Vanilla MVPN nodes do not generally base their selection of the Upstream PE for (S,G) on the SA A-D routes. Let me explain a little about the way SA A-D routes are used.

There are two different MVPN "modes" that affect the use of SA A-D routes. In one mode (sometimes known as 'rpt-spt' mode, and described in Section 13 of RFC 6514), an SA A-D route for (S,G) is originated by a PE when that PE receives a C-multicast route for (S,G). In another mode (sometimes known as 'spt-only' mode, and described in Section 14 of RFC 6514), an SA A-D route for (S,G) is originated by a PE when that PE receives a PIM Register message for (S,G), or when that PE receives an MSDP SA message for (S,G). Note that in this mode, the PE originating the SA A-D route is not necessarily the best (or even a good) ingress PE for the flow.

- In both modes, if an egress PE receives a PIM Join (S,G) from a CE, its choice of ingress PE is never impacted by the SA A-D routes. Note that CEs send PIM Join(S,G) messages for both ASM and SSM groups.

- In spt-only mode, the SA A-D routes are used to discover sources, but not to select the ingress PE. (The selected ingress PE is not necessarily the one originating the SA A-D route.)

- The choice of ingress PE is impacted by the SA A-D routes for (S,G) only when (a) rpt-spt mode is being used, (b) the egress PE has received a PIM Join (*,G) from a CE, and (c) the egress PE has not received a PIM Join (S,G) from a CE. This is typically just a transient state, as the CE will generally emit a PIM Join(S,G) as soon as it sees any (S,G) traffic.

Bottom line: if a source is on an EVPN all-active multi-homed segment, MVPN receivers have no way to select the proper ingress PE. If the segment is n-way-homed, the MVPN PEs have just a 1/n chance of getting the traffic.

Of course, this problem could be eliminated if EVPN and MVPN didn't have to use the same tunnels. In that case, if an MVPN node selects the wrong ingress PE, the selected PE could obtain the traffic from the real ingress PE, and then relay it to the MVPN node. This might result in sub-optimal routing, but that's better than a black hole! Perhaps the gateway-based solution needs to be used whenever there is all-active multi-homing? ;-)

One could imagine modifying the MVPN installed base so that the SA A-D routes play more of a role in selecting the Upstream PE. However, I believe the requirement is to allow MVPN/EVPN interworking without modifying the existing MVPN nodes.

9. In the case where all the multicast sources for a given group are attached via EVPN, there is a very simple procedure for providing Join(*,G) functionality. This procedure makes use of EVPN-specific knowledge. Since the MVPN protocols cannot take advantage of the EVPN-specific knowledge, a more complicated procedure is needed when only MVPN protocols are used. This is explained further below.

10. Most of the problems above are the result of (a) trying to use the exact same control plane for both MVPN and EVPN, and (b) treating the case where both domains use the same tunnel type as the design center. It would be better to keep clean interfaces between EVPN and MVPN, with clearly defined points of attachment. The proposal in draft-lin-bess-evpn-irb-mcast does this, and thus does not run into the above problems. That proposal also shows how the "optimal routing" requirements can be met, and how they can be traded off against other considerations. (In fairness, it must be acknowledged that both proposals are still works in progress. It's also worth noting that the two proposals have a lot in common.)
BESS Working Group                                            A. Sajassi
Internet Draft                                                 S. Thoria
Category: Standard Track                                   N. Fazlollahi
                                                                   Cisco
                                                                A. Gupta
                                                            Avi Networks
Expires: January 2, 2017                                    July 2, 2017

       Seamless Multicast Interoperability between EVPN and MVPN PEs
           draft-sajassi-bess-evpn-mvpn-seamless-interop-00.txt

Abstract

Ethernet Virtual Private Network (EVPN) solution is becoming pervasive for Network Virtualization Overlay (NVO) services in data center (DC) networks and as the next generation VPN services in service provider (SP) networks. As service providers transform their networks in their COs toward next generation data center with Software Defined Networking (SDN) based fabric and Network Function Virtualization (NFV), they want to be able to maintain their offered services including multicast VPN (MVPN) service between their existing network and their new SPDC network seamlessly without the use of gateway devices. They want to have such seamless interoperability between their new SPDCs and their existing networks for a) reducing cost, b) having optimum forwarding, and c) reducing provisioning. This document describes a unified solution based on RFC 6513 for seamless interoperability of multicast VPN between EVPN and MVPN PEs. Furthermore, it describes how the proposed solution can be used as a routed multicast solution in data centers with EVPN-IRB PEs per [EVPN-IRB].

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Requirements Language
   3. Terminology
   4. Requirements
      4.1. Optimum Forwarding
      4.2. Optimum Replication
      4.3. All-Active and Single-Active Multi-Homing
      4.4. Inter-AS Tree Stitching
      4.5. EVPN Service Interfaces
      4.6. Distributed Anycast Gateway
      4.7. Selective & Aggregate Selective Tunnels
      4.8. Tenants' (S,G) or (*,G) states
   5. Solution
      5.1. Operational Model for Homogenous EVPN IRB NVEs
         5.1.1 Control Plane Operation
         5.1.2 Data Plane Operation
            5.1.2.1 Sender and Receiver in same MAC-VRF
            5.1.2.2 Sender and Receiver in different MAC-VRF
      5.2. Operational Model for Heterogeneous EVPN IRB PEs
      5.3. All-Active Multi-Homing
         5.3.1. Source and receivers in same ES but on different subnets
         5.3.2. Source and some receivers in same ES and on same subnet
      5.4. Mobility for Tenant's sources and receivers
      5.5. Single-Active Multi-Homing
   6. DCs with only EVPN NVEs
      6.1 Setup of overlay multicast delivery
      6.3 Data plane considerations
   7. Handling of different encapsulations
      7.1 MPLS Encapsulation
      7.2 VxLAN Encapsulation
      7.3 Other Encapsulation
   8. DCI with MPLS in WAN and VxLAN in DCs
      8.1 Control plane inter-connect
      8.2 Data plane inter-connect
      8.3 Multi-homing among DCI gateways
   9. Inter-AS Operation
   10. Use Cases
      10.1 DCs with only IGMP/MLD hosts w/o tenant router
      10.2 DCs with mixed of IGMP/MLD hosts & multicast routers running PIM-SSM
      10.3 DCs with mixed of IGMP/MLD hosts & multicast routers running PIM-ASM
      10.4 DCs with mixed of IGMP/MLD hosts & multicast routers running PIM-Bidir
   11. IANA Considerations
   12. Security Considerations
   13. Acknowledgements
   14. References
      14.1. Normative References
      15.2. Informative References
   15. Authors' Addresses

1. Introduction

Ethernet Virtual Private Network (EVPN) solution is becoming pervasive for Network Virtualization Overlay (NVO) services in data center (DC) networks and as the next generation VPN services in service provider (SP) networks.
As service providers transform their networks in their COs toward next generation data center with Software Defined Networking (SDN) based fabric and Network Function Virtualization (NFV), they want to be able to maintain their offered services including multicast VPN (MVPN) service between their existing network and their new SPDC network seamlessly without the use of gateway devices.

**** "Gateway devices" needs to be defined. If a "gateway device" is, e.g.,
**** a device that participates in both MVPN and EVPN protocols, this draft
**** doesn't eliminate gateway devices. Rather, it forces EVERY EVPN PE to
**** be a gateway device.
**** Are providers demanding to run MVPN protocols (and to deploy L3VPN/MVPN
**** configuration) on all their EVPN devices whenever they want to do
**** inter-subnet multicast within a single EVPN tenant domain? I don't
**** think so.
**** Of course it is true that some providers do not want to be limited to
**** having one or two gateways per DC. Certainly they may not want to have
**** one or two gateways per DC that have to maintain all the EVPN state for
**** the DC. However, there is no proposal that imposes such a limit.
**** On the other hand, some providers may well want a small number of
**** MVPN-aware nodes to perform a "gateway" function for a larger number of
**** EVPN nodes; I don't see why one would want to prohibit that.
**** Operators should have the freedom to decide where they want this
**** control plane gateway function to be performed, and to be able to
**** control the trade-offs.
**** But perhaps "gateway device" means "device that moves data between MVPN
**** tunnels and EVPN tunnels". In that case, the requirement to have no
**** gateway devices means that MVPN and EVPN cannot interoperate at all
**** when they use different tunnel types. I don't think there is any such
**** requirement. In fact, this draft does advocate the use of "gateway
**** devices" for the case where MVPN uses VXLAN and for interoperating with
**** EVPN PEs that don't do MVPN protocol. So even the draft authors don't
**** seem to take this requirement seriously.
**** The draft doesn't seem to acknowledge that "different tunnel types"
**** doesn't just mean "MPLS and VxLAN", it also means "MVPN-using-mLDP and
**** EVPN-using-IR", "MVPN-using-RSVP and EVPN-using-mLDP", "MVPN-using-IR
**** and EVPN-using-AR", etc., etc.

There are several reasons for having such seamless interoperability between their new DCs and their existing networks:

**** Note that "seamless interoperability" is not defined either.
**** If "seamless interoperability" means "MVPN and EVPN use the same
**** tunnels", then this is not a requirement, it's a proposed solution. I
**** don't think there's actually a requirement for MVPN and EVPN to use the
**** same tunnel types in order to interwork.
**** Note that MVPN/EVPN interworking will invariably involve legacy EVPN
**** nodes that support unenhanced RFC7432, and hence that cannot use the
**** tunnel types that are typically found in MVPN deployments.
**** If "seamless interoperability" means "MVPN and EVPN use the same
**** control protocol", then again this is not a requirement but a proposed
**** solution, which needs to be evaluated based on costs and benefits.

- Lower Cost: gateway devices need to have very high scalability to handle VPN services for their DCs and as such need to handle large number of VPN instances (in tens or hundreds of thousands) and very large number of routes (e.g., in millions).
For the same speed and feed, these high scale gateway boxes are relatively much more expensive than their TOR devices that support much lower number of routes and VPN instances.

**** The desire for deployment options that do not require one or two boxes
**** per DC to support state for a large number of EVPN domains is
**** undisputed. However, there is no proposal that has such a requirement.

- Optimum Forwarding: in a given CO, both EVPN PEs and MVPN PEs can be connected to the same network (e.g., same IGP domain). In such scenarios, the service providers want to have optimum forwarding

**** I don't really see what "same network (e.g., same IGP domain)" has to
**** do with a desire for optimum forwarding. In general, one would not
**** expect all the PEs to be in the same IGP domain anyway.

among these PE devices without the use of gateway devices. Because if gateway devices are used, then the multicast traffic between an EVPN and MVPN PEs can no longer be optimum and is some case, it may even get tromboned. Furthermore, when an SPDC network spans across multiple LATA (multiple geographic areas) and gateways are used between EVPN and MVPN PEs, then with respect to multicast traffic, only one GW can be designated forwarder (DF) between EVPN and MVPN PEs.

**** This isn't so. If a given subnet is contained within a single
**** geographic area, only the gateways attached to that area need to
**** advertise routes to that subnet. Even if a given subnet is stretched
**** across geographical areas, judicious advertising of host routes by the
**** gateways can eliminate the need for any tromboning. So multicast
**** traffic originating in one area does not need to be sent to the other
**** area before going on to the MVPN domain.
**** For the case where a source is in the MVPN domain but there are
**** receivers in different EVPN "areas", it is possible for each area's
**** gateway to pull the traffic and then redistribute it only to other PEs
**** in the same geographical area.

Such scenarios not only results in non-optimum forwarding but also it can result in tromboing of multicast traffic between the two LATAs when both source and destination PEs are in the same LATA and the DF gateway is elected to be in a different LATA.

**** A requirement to avoid tromboning is sensible, but that does not imply
**** that every EVPN PE has to support MVPN procedures.

- Less Provisioning: If gateways are used, then the operator need to configure per-tenant info. In other words, for each tenant that is configured, one (or maybe two) additional touch points are needed.

**** Don't forget that some operators want inter-subnet multicast within a
**** tenant's EVPN domain, but don't necessarily want MVPN interworking. If
**** you tell those operators "no problem, just configure L3VPN/MVPN on all
**** your nodes", they probably won't regard that as a "low provisioning"
**** solution.
**** Entangling MVPN and EVPN like this is certainly no simplification if
**** different tunnel types are used, or if different administrations
**** control the MVPN and EVPN deployments, or if EVPN protocols continue to
**** develop while MVPN protocols remain stable.
**** With regard to optimum forwarding, usually this is one of the factors
**** to be traded off against the need to upgrade all one's software and
**** re-configure all one's nodes. A solution should allow operators to
**** make these trade-offs.
**** Also, "one or two additional touchpoints" doesn't seem like that big a
**** deal.
**** It's not as if the gateways need to be configured with all the
**** BDs of the tenant.

This document describes a unified solution based on [RFC6513] and [RFC6514] for seamless interoperability of multicast VPN between EVPN and MVPN PEs. Furthermore, it describes how the proposed solution can be used as a routed multicast solution for EVPN-only applications in data centers (e.g., routed multicast VPN only among EVPN PEs).

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in [RFC2119] only when they appear in all upper case. They may also appear in lower or mixed case as English words, without any normative meaning.

3. Terminology

ARP: Address Resolution Protocol
BEB: Backbone Edge Bridge
B-MAC: Backbone MAC Address
CE: Customer Edge
C-MAC: Customer/Client MAC Address
ES: Ethernet Segment
ESI: Ethernet Segment Identifier
IRB: Integrated Routing and Bridging
LSP: Label Switched Path
MP2MP: Multipoint to Multipoint
MP2P: Multipoint to Point
ND: Neighbor Discovery
NA: Neighbor Advertisement
P2MP: Point to Multipoint
P2P: Point to Point
PE: Provider Edge
EVPN: Ethernet VPN
EVI: EVPN Instance
RT: Route Target

Single-Active Redundancy Mode: When only a single PE, among a group of PEs attached to an Ethernet segment, is allowed to forward traffic to/from that Ethernet Segment, then the Ethernet segment is defined to be operating in Single-Active redundancy mode.

All-Active Redundancy Mode: When all PEs attached to an Ethernet segment are allowed to forward traffic to/from that Ethernet Segment, then the Ethernet segment is defined to be operating in All-Active redundancy mode.

4. Requirements

This section describes the requirements specific in providing seamless multicast VPN service between MVPN and EVPN capable networks.

4.1. Optimum Forwarding

The solution SHALL support optimum multicast forwarding between EVPN and MVPN PEs within a network. The network can be confined to a CO or it can span across multiple LATAs. The solution SHALL support optimum multicast forwarding with both ingress replication tunnels and P2MP tunnels.

**** Agreed that it should be possible to deploy in such a way that traffic
**** from a multicast source to a multicast receiver follows an optimum
**** path. However, this is a factor that operators may want to trade off
**** against other factors.

4.2. Optimum Replication

For EVPN PEs with IRB capability, the solution SHALL use only a single multicast tunnel among EVPN and MVPN PEs for IP multicast traffic.

**** This does not seem like a requirement, as it strictly prohibits
**** interworking between MVPN domains and EVPN domains that use different
**** tunnel types.
**** Even in the case where the EVPN nodes support the same tunnel type that
**** the vanilla MVPN nodes support, I don't see why it is a requirement to
**** use only a single multicast tunnel among both kinds of nodes.

Multicast tunnels can be either ingress replication tunnels

**** Note that ingress replication NEVER provides optimum replication, and
**** thus is ruled out altogether if "optimum replication" is really a
**** requirement ;-)

or P2MP tunnels.
The solution MUST support optimum replication for both Intra-subnet and Inter-subnet IP multicast traffic:

- Non-IP traffic SHALL be forwarded per EVPN baseline [RFC7432] or [OVERLAY]

- If a Multicast VPN spans across both Intra and Inter subnets, then for Ingress replication regardless of whether the traffic is Intra or Inter subnet, only a single copy of multicast traffic SHALL be sent from the source PE to the destination PE.

**** Note that this requirement cannot be met if the inter-subnet multicast
**** domain includes nodes that support only RFC7432. Certainly one does
**** want to avoid having a source send the same packet twice to the same
**** destination, unless there is some advantage in doing so. But it will
**** be hard to avoid this if doing inter-subnet multicast between two
**** subnets that have presence on an RFC7432 node.

- If a Multicast VPN spans across both Intra and Inter subnets, then for P2MP tunnels regardless of whether the traffic is Intra or Inter subnet, only a single copy of multicast data SHALL be transmitted by the source PE. Source PE can be either EVPN or MVPN PE and receiving PEs can be a mix of EVPN and MVPN PEs - i.e., a multicast VPN can be spread across both EVPN and MVPN PEs.

**** As already remarked, this "requirement" prohibits interworking between
**** domains that use different tunnel types. It also prevents new tunnel
**** types from being deployed in EVPN unless they are also deployed in
**** MVPN, and vice versa.
**** This "requirement" is responsible for the problem that arises when
**** there are MVPN receivers for an EVPN multicast source that is on an
**** all-active multi-homed ethernet segment.
**** This really is just not a requirement. A better requirement would be
**** "allow MVPN/EVPN interworking without requiring the same tunnel types
**** (or even the same tunnels of a given type) to be used in both domains".
**** An even better requirement would be "keep the interface between MVPN
**** and EVPN clean, clear, and properly layered, to avoid unnecessary
**** entanglements and dependencies, and to allow each domain to provide its
**** proper functionality internally."

4.3. All-Active and Single-Active Multi-Homing

The solution MUST support multi-homing of source devices and receivers that are sitting in the same subnet (e.g., VLAN) and are multi-homed to EVPN PEs. The solution SHALL allow for both Single-Active and All-Active multi-homing.

**** As already discussed, the solution in this document will not work with
**** all-active multi-homing, as the vanilla MVPN PEs will not be able to
**** properly select the upstream PE for (S,G) when S is on an all-active
**** multi-homed segment.

The solution MUST prevent loop during steady and transient states just like EVPN baseline solution [RFC7432] and [OVERLAY] for all multi-homing types.

4.4. Inter-AS Tree Stitching

The solution SHALL support multicast tree stitching when the tree spans across multiple Autonomous Systems.

**** But this is not "seamless"! (Sorry, one of my pet peeves is schemes
**** that purport to be seamless and then talk about stitching things
**** together; by definition, stitching creates a seam ;-))
**** Note that MVPN's tunnel segmentation procedures are not restricted to
**** deployment at AS boundaries. See, e.g., RFC 7524.
4.5. EVPN Service Interfaces

The solution MUST support all EVPN service interfaces listed in section 6 of [RFC7432]:

- VLAN-based service interface
- VLAN-bundle service interface
- VLAN-aware bundle service interface

4.6. Distributed Anycast Gateway

The solution SHALL support distributed anycast gateways for tenant workloads on NVE devices operating in EVPN-IRB mode.

**** It's not completely clear what this means, as "distributed anycast
**** gateways" are a unicast feature that seems orthogonal to the topic of
**** the current draft.

4.7. Selective & Aggregate Selective Tunnels

The solution SHALL support selective and aggregate selective P-tunnels as well as inclusive and aggregate inclusive P-tunnels. When selective tunnels are used, then multicast traffic SHOULD only be forwarded to the remote PE which have receivers - i.e., if there are no receivers at a remote PE, the multicast traffic SHOULD NOT be forwarded to that PE and if there are no receivers on any remote PEs, then the multicast traffic SHOULD NOT be forwarded to the core.

**** Note that even when selective tunnels are used, MVPN procedures do not
**** ensure that a given PE only gets traffic for which it has
**** receivers. MVPN is explicitly designed to allow multiple PMSIs to share
**** a single tunnel, so that operators can trade off state against
**** optimality.

4.8. Tenants' (S,G) or (*,G) states

The solution SHOULD store (C-S,C-G) and (C-*,C-G) states only on PE devices that have interest in such states hence reducing memory and processing requirements - i.e., PE devices that have sources and/or receivers interested in such multicast groups.

**** This "requirement" makes inter-AS tunnel segmentation impossible, even
**** though such segmentation has already been stated as a requirement.
**** This "requirement" also seems to rule out the use of RRs to propagate
**** the various BGP multicast-related routes. And it rules out the use of
**** MVPN's Source Active A-D routes.
**** I'd agree though that the solution should avoid creating state where
**** that state is not useful.

5. Solution

[EVPN-IRB] describes the operation for EVPN PEs in IRB mode for unicast traffic. The same EVPN PE model, where an IP-VRF is attached to one or more MAC-VRF via virtual IRB interfaces, is also applicable here. However, there are some noticeable differences between the IRB mode operation for unicast traffic described in [EVPN-IRB] versus for multicast traffic described here. For unicast traffic, the intra-subnet traffic, is bridged within the MAC-VRF associated with that subnet (i.e., a lookup based on MAC-DA is performed); whereas, the inter-subnet traffic is routed in the corresponding IP-VRF (ie, a lookup based on IP-DA is performed). A given tenant can have one or more IP-VRFs; however, without loss of generality, this document assumes one IP-VRF per tenant. For multicast traffic, the intra-subnet traffic is bridged for non-IP traffic and it is Layer-2 switched for IP traffic. The differentiation between bridging and L2-switching for multicast traffic is that the former uses MAC-DA lookup for forwarding the traffic; whereas, the latter uses IP-DA lookup for forwarding the multicast traffic where the forwarding states are built using IGMP/MLD snooping. The inter-subnet multicast traffic is always routed in the corresponding IP-VRF.

**** I'm not sure I understand the above paragraph.
**** It's true that for both
**** intra-subnet and inter-subnet distribution of an (S,G) IP multicast
**** packet, the distribution of the traffic is based on the IP multicast
**** states ((*,G) or (S,G)) rather than on the MAC DA. But I'm not sure I
**** understand the point that is being made -- you can't ever tell from the
**** MAC DA of a multicast frame where the packet needs to be sent. Also,
**** the multicast states may be built from PIM snooping or from full
**** participation in PIM, not just from IGMP/MLD snooping.
**** Perhaps all that is meant is that "bridging" doesn't involve any (S,G)
**** lookups or states?

This section describes a multicast VPN solution based on [MVPN] for EVPN PEs operating in IRB mode that want to perform seamless interoperability with their counterparts MVPN PEs.

5.1. Operational Model for Homogenous EVPN IRB NVEs

In this section, we consider the scenario where all EVPN PEs have IRB capability and operating in IRB mode for both unicast and multicast traffic (e.g., all EVPN PEs are homogenous in terms of their capabilities and operational modes). In this scenario, the EVPN PEs terminate IGMP/MLD messages from tenant host devices or PIM messages from tenant routers on their IRB interfaces, thus avoid sending these messages over MPLS/IP core. A tenant virtual/physical router (e.g., CE) attached to an EVPN PE becomes a multicast routing adjacency of that PE and the multicast routing protocol on the PE-CE link link is presumed to be PIM-SM with both the ASM and the SSM service models per [RFC6513].

**** Is the suggestion here that there has to be a tenant router running
**** PIM? Or are you saying that IF there is a tenant router running PIM, it
**** has to become a PIM adjacency of all the EVPN PEs that attach to the
**** same BD as the tenant router?
**** In EVPN, the CE is usually a bridge, not a router, so one can't really
**** talk of the "multicast routing protocol on the PE-CE link".
**** Additionally, if the PE is to become a PIM adjacency to a tenant
**** router, the PE and the tenant router need to also be unicast routing
**** adjacencies (because RPF selection depends upon the unicast routing).
**** This is not necessarily the case in EVPN deployments.

Furthermore, the PE uses MVPN BGP protocol and procedures per [RFC6513] and [RFC6514]. With respect to tenant PIM protocol, PIM-SM with Any Source Multicast (ASM) mode, PIM-SM with Source Specific Multicast (SSM) mode, and PIM Bidirectional (BIDIR) mode are all supported per [RFC6513].

**** RFC6513 does have a rather sketchy mention of BIDIR-PIM, but doesn't
**** actually specify the procedures for it. What it does specify is
**** unlikely to be compatible with any existing BIDIR-PIM deployment by the
**** tenant.

Support of PIM-DM (Dense Mode) is excluded in this document per [RFC6513].

The EVPN PEs use MVPN BGP routes [RFC 6514] to convey tenant (S,G) or (*,G) states to other MVPN or EVPN PEs and to set up overlay trees (inclusive or selective) for a given MVPN. The leaves and roots of these overlay trees are composed of Provider Multicast Service Interface (PMSI) and it can be Inclusive-PMSI (I-PMSI) or Selective-PMSI (S-PMSI) per [RFC6513].

**** In RFC 6513, a PMSI is actually a multicast service that delivers
**** packets from a PE transmitter to a set of PE receivers. A PMSI is
**** instantiated by one or more multicast tunnels (P-tunnels). Sending on
**** a VPN's I-PMSI sends to all PEs in that VPN, sending on an S-PMSI sends
**** to only the set of PEs that explicitly decide that they need to
**** receive traffic from that S-PMSI.
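Since the draft keeps blurring these terms, a minimal data-model sketch (Python, with invented class and field names) of the layering as RFC 6513 defines it may help: tenant C-multicast trees ride on PMSIs, PMSIs are instantiated by P-tunnels, and an "aggregate" tunnel is one shared across VPNs.

```python
# Minimal sketch of the RFC 6513 layering; all names here are invented
# purely for illustration.

from dataclasses import dataclass

@dataclass
class PTunnel:                 # underlay transport: P2MP LSP, ingress replication, ...
    tunnel_type: str
    tunnel_id: str

@dataclass
class PMSI:                    # service layer: "deliver from one PE to a set of PEs"
    kind: str                  # "I-PMSI" (all PEs of the VPN) or "S-PMSI" (opt-in PEs)
    receiver_pes: frozenset
    instantiated_by: PTunnel   # one (or more) P-tunnels instantiate the PMSI

@dataclass
class CMulticastTree:          # tenant (C-S,C-G) or (C-*,C-G) tree
    flow: tuple
    carried_over: PMSI         # the C-tree is carried through the core by a PMSI

# Per RFC 6513 Section 6.3, an "aggregate" tunnel is a P-tunnel shared by
# PMSIs of *different* VPNs, not merely by several PMSIs of one VPN:
shared = PTunnel("P2MP mLDP", "lsp-42")
vpn_a_spmsi = PMSI("S-PMSI", frozenset({"PE3", "PE5"}), shared)
vpn_b_ipmsi = PMSI("I-PMSI", frozenset({"PE3", "PE4", "PE5"}), shared)
```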
A given PMSI is associated with a single IP-VRF of an EVPN PE and/or a MVPN PE for that MVPN - e.g., a MVPN PMSI is never associated with a MAC-VRF of an EVPN PE.

**** It's not clear what's being said here. What is this "association"?

Overlay-trees

**** "Overlay-trees"? Is that what is called "C-multicast trees" in MVPN,
**** i.e., multicast trees that exist in the customer (tenant) domain? Or
**** does "overlay-trees" mean what "P-tunnels" means in MVPN. Either way,
**** I can't make any sense of the following sentence:

are instantiated by underlay provider tunnels (P-tunnels) - e.g., P2MP, MP2MP, or unicast tunnels per [RFC 6513].

**** C-multicast trees are carried through the core by PMSIs, and PMSIs are
**** instantiated by P-tunnels.

When there are many-to-one mapping of PMSIs to a P-tunnel (e.g. mapping many S-PMSIs or many I-PMSI to a single P-tunnel), the tunnel is referred to as aggregate tunnel.

**** Per section 6.3 of RFC 6513, an aggregate tunnel is one that carries
**** the traffic of multiple VPNs, not one that carries the traffic of
**** multiple PMSIs within the same VPN.

Figure-1 below depicts a scenario where a tenant's MVPN spans across both EVPN and MVPN PEs; where all EVPN PEs have IRB capability. An EVPN PE (with IRB capability) can be modeled as a MVPN PE where the virtual IRB interface of an EVPN PE (virtual interface between MAC-VRF and IP-VRF) can be considered as an attachment circuit (AC) for the MVPN PE. In other words, an EVPN PE can be modeled as a PE that consists of a MVPN PE whose ACs are replaced with IRB interfaces connecting each IP-VRF of the MVPN PE to a set of MAC-VRFs. Similar to a MVPN PE where an attachment circuit serves as a routed multicast interface for an IP-VRF associated with a MVPN instance, an IRB interface serves as a routed multicast interface for the IP-VRF associated with the MVPN instance. Since EVPN PEs run MVPN protocols (e.g., [RFC6513] and [RFC6514]), for all practical purposes, they look just like MVPN PEs to other PE devices. Such modeling of EVPN PEs, transforms the multicast VPN operation of EVPN PEs to that of [MVPN] and thus simplifies the interoperability between EVPN and MVPN PEs to that of running a single unified solution based on [MVPN].

**** This paragraph runs a lot of things together.
**** If we want to develop an MVPN/EVPN interworking scheme that doesn't
**** require software upgrades to the MVPN nodes, then, of course, some set
**** of EVPN nodes will have to present themselves as MVPN nodes to the
**** vanilla MVPN nodes. The stuff about "simplifying interoperability" and
**** "running a single unified solution" is not very meaningful, and doesn't
**** seem to follow from the rest of the paragraph.
**** I'd point out that "simplifying interoperability" between two schemes
**** is generally facilitated by maintaining clean interfaces between them,
**** not by tangling them up with each other. Also, "single unified
**** solution" is sometimes code for "just push the square pegs into the
**** round holes".
**** I agree that the vanilla MVPN nodes cannot have any EVPN-specific
**** knowledge, but it's worth pointing out some of the difficulties this
**** causes.
**** If EVPN-PE1 and EVPN-PE2 are both attached to subnet S (but
**** each to a different ethernet segment of subnet S), and each PE exports
**** a subnet route for S, the MVPN PEs will have no way to know which of
**** PE1 or PE2 is a possible ingress node for a particular source on that
**** subnet. PE1 and PE2 may each have an IRB interface to BD1, but if
**** those IRB interfaces lead to different ethernet segments, the vanilla
**** MVPN nodes cannot tell. (This is not the same issue as the
**** multi-homing issue discussed previously.)
**** To get around this, EVPN PEs would have to export unicast host routes
**** to all multicast sources. It's a good thing the "requirements" don't
**** include stuff like "do not burden the vanilla MVPN nodes with lots of
**** host routes that they don't really need" ;-)
**** The gateway-based solution discussed in the draft seems to require that
**** only the gateways export unicast routes to MVPN, but says nothing about
**** how they know which routes to export.
**** Note that this problem can be ameliorated if the EVPN domain connects
**** to the MVPN domain at only a few points. Then only those points would
**** need to advertise unicast routes to MVPN, and, depending upon the
**** topology, it might be perfectly fine for them to advertise subnet
**** routes for subnets containing multicast sources.
**** In practice, I think operators will want to be able to trade off
**** "optimal routing" against "increased amount of state in vanilla MVPN
**** node", and the procedures in this draft do not seem to allow that
**** trade-off.

[Figure-1: Homogenous EVPN NVEs -- ASCII figure not reproduced. It shows EVPN PE1 (MAC-VRF1 with Src1 and Rcvr1, MAC-VRF2 with Rcvr2, one IP-VRF) and EVPN PE2 (MAC-VRF1 with Rcvr3, MAC-VRF3 with Rcvr4, one IP-VRF) connected over the MPLS/IP core to MVPN PE1 (IP-VRF, serving Rcvr5) and MVPN PE2 (IP-VRF, serving Rcvr6).]

Although modeling an EVPN PE as a MVPN PE, conceptually simplifies the operation to that of a solution based on [MVPN], the following operational aspects of EVPN are impacted and needs to be factored in the solution:

1) All-Active multi-homing of IP multicast sources and receivers
2) Mobility for Tenant's sources and receivers
3) Unicast route advertisements for IP multicast source
4) non-IP multicast traffic handling

The first bullet, All-Active multi-homing of IP multicast source and receivers, is described in section 5.3. The second bullet is described in section 5.4. Third and fourth bullets are described next.

When an IP multicast source is attached to an EVPN PE, the unicast route for that IP multicast source needs to be advertised.

**** Do you mean a host route needs to be advertised? MVPN does not depend
**** upon host routes, so this is not an MVPN procedure; it's an "MVPN/EVPN
**** gateway" procedure.

This unicast route is advertised with VRF Route Import extended community which in turn is used as the Route Target for Join (S,G) messages sent toward the source PE by the remote MVPN PEs. The EVPN PE advertises this unicast route using EVPN route type 5 or IPVPN unicast route or both along with VRF Route Import extended community.

**** MVPN procedures do not recognize RT-5 routes, so if we're just using
**** MVPN procedures it hardly matters what attributes are carried by the
**** RT-5 routes. This seems like an EVPN-only adaptation of MVPN
**** procedures. There's nothing wrong with EVPN-specific adaptations of
**** MVPN procedures, but let's not pretend we're just using MVPN
**** procedures. Also, the adaptation is not properly specified above. One
**** needs to be a lot more specific than simply saying "used as the RT for
**** Join (S,G) messages sent toward the source PE".
**** BTW, I don't see why RT-5s are required, since RT-2s can also specify
**** IP addresses.
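For readers who want the RFC 6514 mechanism being invoked spelled out, here is a rough sketch (Python, all names invented for illustration): the PE attached to the source advertises a unicast route to the source carrying a VRF Route Import extended community, and a remote PE that needs (C-S,C-G) builds a C-multicast Source Tree Join whose Route Target is exactly that EC, so only the advertising PE imports the Join.

```python
# Hedged sketch of RFC 6514 C-multicast Join targeting; field names invented.

from dataclasses import dataclass

@dataclass
class UnicastRouteToSource:
    prefix: str                  # e.g. a host route for the EVPN-attached source
    rd: str
    next_hop: str
    vrf_route_import_ec: str     # e.g. "vri:<PE1-loopback>:17"
    source_as_ec: str

@dataclass
class SourceTreeJoin:            # MCAST-VPN route type 7, i.e. C-multicast (C-S,C-G) Join
    rd: str
    source_as: str
    c_source: str
    c_group: str
    route_target: str            # set to the upstream PE's VRF Route Import EC

def build_join(best_route: UnicastRouteToSource, c_s: str, c_g: str) -> SourceTreeJoin:
    """A remote MVPN PE copies the VRF Route Import EC from its best route to
    C-S into the RT of the Source Tree Join, so that only the selected
    upstream PE imports and acts on the Join."""
    return SourceTreeJoin(
        rd=best_route.rd,
        source_as=best_route.source_as_ec,
        c_source=c_s,
        c_group=c_g,
        route_target=best_route.vrf_route_import_ec,
    )
```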
When unicast routes are advertised by MVPN PEs, they are advertised using IPVPN unicast route along with VRF Route Import extended community per [RFC6514].

**** Plus whatever other attributes RFCs 6513/6514 (and perhaps RFC 7900 as
**** well) requires the unicast routes to carry.

Link local multicast traffic (e.g. addressed to 224.0.0.x in case of IPv4) as well as IP protocols such as OSPF, and non-IP multicast/broadcast traffic are sent per EVPN [RF7432] BUM procedures and does not get routed via IP-VRF for multicast addresses.

**** "IP protocols such as OSPF"? Presumably PIM/IGMP/MLD aren't "IP
**** protocols such as OSPF"? This needs a bit more detail.

So, such BUM traffic will be limited to a given EVI/VLAN (e.g., a give subnet); whereas, IP multicast traffic, will be locally switched for local interfaces attached on the same subnet and will be routed for local interfaces attached on a different subnet or for forwarding traffic to other EVPN PEs (refer to section 5.1.1 for data plane operation).

**** As already discussed, this produces incorrect results (improper
**** ethernet emulation with regard to MAC SA and IP TTL) when a source and
**** a receiver are in different segments of the same BD, and each segment
**** is attached to a different PE.
**** This also produces inconsistent results when source and receiver are on
**** different BDs, depending upon whether the source and receiver are
**** attached to the same PE or not.
**** Intra-subnet multicasts should have MAC SA and IP TTL unchanged.
**** Inter-subnet multicasts should have TTL decremented by 1. The way the
**** EVPN infrastructure has been put together by the operator should not be
**** visible to the tenants.
**** Failure to do the ethernet emulation correctly will lead to a plethora
**** of tenant complaints like "why doesn't my single subnet application
**** work any more, it always worked on real ethernet?", "my application
**** uses TTL for multicast scoping, why doesn't it work anymore?", etc.,
**** etc. I see that this concern is quickly dismissed, but the basis for
**** its dismissal seems to be nothing more than wishful thinking.

5.1.1 Control Plane Operation

Just like a MVPN PE, an EVPN PE runs a separate tenant multicast routing instance (VPN-specific) per MVPN instance and the following tenant multicast routing instances are supported:

- PIM Sparse Mode (PIM-SM) with the ASM service model
- PIM Sparse Mode with the SSM service model
- PIM Bidirectional Mode (BIDIR-PIM), which uses bidirectional tenant-trees to support the ASM service model

A given tenant's PIM join messages, (C-*, C-G) or (C-S, C-G), are processed by the corresponding tenant multicast routing protocol and they are advertised over MPLS/IP network using Shared Tree Join route (route type 6) and Source Tree Join route (route type 7) respectively of MCAST-VPN NLRI per [RFC6514].
**** As specified above, this assumes that the tenants are running PIM
**** routers, and that the PIM routers are unicast routing adjacencies of
**** the EVPN PEs. Both of these assumptions seem problematic.
**** I think this draft is very unclear about the various ways a PE might
**** learn that there is interest in receiving a particular multicast flow,
**** and I never quite know which parts of the draft presume IGMP/MLD, which
**** parts presume PIM, and which parts don't care.

The following NLRIs from [RFC6514] SHOULD be used for forming Underlay/Core tunnels inside a data center.

Intra-AS I-PMSI A-D route is used to form default tunnel (also called inclusive tunnel) for a tenant VRF. The tunnel attributes are indicated using PMSI attribute with this route.

S-PMSI A-D route is used to form Customer flow specific underlay tunnels. This enables selective delivery of data to PEs having active receivers and optimizes fabric bandwidth utilization. The tunnel attributes are indicated using PMSI attribute with this route.

Source Active A-D route is used by source connected PE in order to announce active multicast source. This enables PEs having active receivers for the flow to join the tunnels and switch to Shortest Path tree.

**** As discussed, this is just not an accurate description of what SA A-D
**** routes do in MVPN.

Each EVPN PE supporting a specific MVPN discovers the set of other PEs in its AS that are attached to sites of that MVPN using Intra-AS I-PMSI A-D route (route type 1) per [RFC6514]. It can also discover the set of other ASes that have PEs attached to sites of that MVPN using Inter-AS I-PMSI A-D route (route type 2) per [RFC6514]. After the discovery of PEs that are attached to sites of the MVPN, an inclusive overlay tree (I-PMSI) can be setup for carrying tenant multicast flows for that MVPN; however, this is not a requirement per [RFC6514] and it is possible to adopt a policy in which all tenant flows are carried on S-PMSIs.

**** I don't believe there is any significant deployment of Inter-AS I-PMSI
**** A-D routes, and I wouldn't suggest their use. I don't really think
**** that Inter-AS I-PMSIs are needed.

An EVPN PE also sets up a multipoint-to-multipoint (MP2MP) tree per EVI using Inclusive Multicast Ethernet Tag route (route type 3) of EVPN NLRI per [RFC7432].

**** In MVPN terms, this is an MI-PMSI, not a MP2MP tree. The only ways to
**** set up MP2MP trees are mLDP (with MP2MP FEC) and BIDIR-PIM.

This MP2MP tree can be instantiated using unicast tunnels or P2MP tunnels. In [RFC7432], this tree is used for transmission of all BUM traffic including IP multicast traffic. However, for multicast traffic handling in EVPN-IRB PEs, this tree is used for all broadcast, unknown-unicast and non-IP multicast traffic - i.e., it is used for all BUM traffic except IP multicast user traffic.

**** Shouldn't the BUM tunnels also be used for link-local IP
**** multicast traffic, except perhaps for IGMP messages and PIM J/P
**** messages (assuming that PIM/IGMP will be proxied via BGP). Things
**** like PIM Hello messages need to go over the BUM tunnels, unless there
**** is a way of advertising via BGP that one wants PIM Hellos. (Whether an
**** IP multicast packet is link-local is determined by its IP destination
**** address.)

Therefore, an EVPN-IRB PE sends a customer IP multicast flow only on a single tunnel that is instantiated for MVPN I-PMSI or S-PMSI. In other words, IP multicast traffic sent over MPLS/IP network are not sent off of MAC-VRF but rather IP-VRF.
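Since several of the paragraphs above lean on "the tunnel attributes are indicated using PMSI attribute with this route", here is a hedged sketch (Python, illustrative field names only, not the draft's) of what an RFC 6514 S-PMSI A-D route and its PMSI Tunnel Attribute carry; the Leaf Information Required flag is what later triggers Leaf A-D routes from interested egress PEs.

```python
# Hedged sketch of an RFC 6514 S-PMSI A-D route; names are illustrative only.

from dataclasses import dataclass

@dataclass
class PMSITunnelAttribute:
    tunnel_type: str          # e.g. "Ingress Replication", "P2MP mLDP", "RSVP-TE P2MP"
    tunnel_identifier: str    # type-specific: root/loopback address, LSP FEC, session...
    mpls_label: int           # label bound to the tunnel (0 if none)
    leaf_info_required: bool  # if True, interested egress PEs answer with Leaf A-D routes

@dataclass
class SPMSIADRoute:           # MCAST-VPN route type 3
    rd: str
    c_source: str             # C-S (or a wildcard)
    c_group: str              # C-G (or a wildcard)
    originating_router: str
    pmsi: PMSITunnelAttribute

spmsi = SPMSIADRoute(
    rd="65000:1", c_source="10.1.1.7", c_group="232.1.1.1",
    originating_router="PE1",
    pmsi=PMSITunnelAttribute("Ingress Replication", "PE1-loopback", 3001, True),
)
```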
If a tenant host device is multi-homed to two or more EVPN PEs using All-Active multi-homing, then IGMP join and leave messages are synchronized between these EVPN PEs using EVPN IGMP Join Synch route (route type 7) and EVPN IGMP Leave Synch route (route type 8).

**** This just means that the DF of the multi-homed segment will maintain
**** all the IGMP state for the segment. True enough, but I don't see what
**** it has to do with the rest of the paragraph.

There is no need to use EVPN Selective Multicast Tag route (SMET route) because the IGMP messages are terminated by the EVPN-IRB PE and tenant (S,G) or (*,G) join messages are sent via MVPN Source/Shared Tree Join messages.

**** This is incorrect. The MVPN C-multicast Source Tree Join and Shared
**** Tree Join routes cannot replace the SMET routes, because the
**** C-multicast routes do not have an "originating router" field in the
**** NLRI. To eliminate the need for SMET routes, you need the MVPN Leaf
**** A-D routes as well. Since MVPN Leaf A-D routes are only sent in
**** response to x-PMSI A-D routes with the Leaf Info Required flag set,
**** those routes are needed as well.
**** Suppose an EVPN PE is interested in receiving (*,G) traffic, where all
**** of G's sources are within the EVPN domain, but all the receivers are
**** not on the same BD. One can imagine two possible procedures:
**** 1. Originate a single SMET route for (*,G), with an RT that takes it to
**** all the EVPN PEs attached to ethernet segments of the same tenant.
**** 2. a. Set up a PIM RP for G on one of the tenant's BDs. (Or more
**** likely, tell the tenant that he has to set this up.)
**** b. Originate a C-multicast Shared Tree Join with an RT that takes it
**** to the one PE attached to that RP.
**** c. Then have the PE attached to the RP originate a C-multicast Source
**** Tree Join (S,G) for every PE that is attached to a source for G.
**** d. Then have each such PE originate a Source Active A-D route for (S,G)
**** and an S-PMSI A-D route for (S,G).
**** e. Then have each PE attached to a receiver originate n Leaf A-D
**** routes, one for each PE attached to a source.
**** It's true that the second procedure (the one proposed in the draft)
**** eliminates the need for SMET routes. But I don't think one could argue
**** that the second procedure is simpler. In the case where we need
**** inter-subnet multicast within EVPN, but do not need MVPN interworking,
**** the second procedure seems somewhat non-optimal.
**** Anyway, for inter-subnet multicasts that are entirely within an EVPN
**** domain, I don't see why one would want to replace an existing deployed
**** mechanism like the SMET routes with something else. The SMET routes
**** are well-adapted for their purpose. Furthermore, if some EVPN nodes
**** support SMET routes but do not support MVPN, support for the SMET
**** routes will definitely be needed anyway.
**** A useful enhancement would be to allow SMET (S,G) routes to be
**** "targeted" (via the RT) to a particular ingress PE.
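For what it's worth, a back-of-the-envelope tally of the BGP routes each procedure requires, taking the enumeration above at face value and counting one source per source-attached PE (Python, purely illustrative):

```python
# Rough route count for the two (*,G) procedures enumerated above.
# n_src = PEs attached to sources of G, n_rcv = PEs attached to receivers.

def routes_with_smet(n_src: int, n_rcv: int) -> int:
    # Procedure 1: each PE with receivers originates one SMET (*,G) route.
    return n_rcv

def routes_with_mvpn_only(n_src: int, n_rcv: int) -> int:
    shared_tree_joins = n_rcv          # step b: one (*,G) Join per receiver PE, toward the RP's PE
    source_tree_joins = n_src          # step c: RP-attached PE joins each source-attached PE
    sa_ad_routes      = n_src          # step d: one SA A-D route per source-attached PE
    s_pmsi_ad_routes  = n_src          # step d: one S-PMSI A-D route per source-attached PE
    leaf_ad_routes    = n_rcv * n_src  # step e: each receiver PE answers each source PE
    return (shared_tree_joins + source_tree_joins + sa_ad_routes
            + s_pmsi_ad_routes + leaf_ad_routes)

print(routes_with_smet(4, 10))       # 10
print(routes_with_mvpn_only(4, 10))  # 10 + 4 + 4 + 4 + 40 = 62
```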
5.1.2 Data Plane Operation

**** This paragraph seems to presume IGMP, and doesn't mention PIM.

When an EVPN-IRB PE receives an IGMP/MLD join message over one of its Attachment Circuits (ACs), it adds that AC to its Layer-2 (L2) OIF list.

**** Even if that AC attaches to a BD on all-active multi-homed segment for
**** which this PE is not the DF? Is this a consequence of the decision to
**** use local bias to prevent duplication on all-active ethernet segments?
**** If a PIM or IGMP Join is received by PE1 on an AC attaching to a BD on
**** all-active multi-homed segment for which PE1 is not the DF, does PE1
**** originate a C-multicast Join, or does only the DF originate the
**** C-multicast Join? If PE1 pulls the multicast traffic, does it deliver
**** the traffic even though it is not the DF? If PE1 does not deliver the
**** traffic, the optimum routing "requirement" might be violated. (PE1 might
**** be closer to the source of the traffic than the DF is.) If PE1 does
**** deliver the traffic, we might get duplicate delivery (unless the two
**** PEs attached to the segment run PIM Assert on the BD).

This L2 OIF list is associated with the MAC-VRF corresponding to the subnet of the tenant device that sent the IGMP/MLD join. Therefore, tenant (S,G) or (*,G) forwarding entries are created/updated for the corresponding MAC-VRF based on these source and group IP addresses.

**** Why are the multicast states associated with a MAC-VRF rather than with
**** a Bridge Table?

Furthermore, the IGMP/MLD join message is propagated over the corresponding IRB interface and it is processed by the tenant multicast routing instance which creates the corresponding tenant (S,G) or (*,G) Layer-3 (L3) forwarding entries. It adds this IRB interface to the L3 OIF list. An IRB is removed as a L3 OIF when all L2 tenant (S,G) or (*,G) forwarding states is removed for the MAC-VRF associated with that IRB. Furthermore, tenant (S,G) or (*,G) L3 forwarding state is removed when all of its L3 OIFs are removed - i.e., all the IRB interfaces associated with that tenant (S,G) or (*,G) are removed.

**** If this is a description of how the multicast states are constructed,
**** you surely need to specify how the IIF is set, not just the OIFlist.

When an EVPN-IRB PE receives IP multicast traffic, if it has any attached receivers for that subnet, it does L2 switching for such intra-subnet traffic.

**** By "L2 switching" here, do you mean forwarded as indicated in the
**** matching layer 2 (x,G) multicast state? Since those states are
**** per-MAC-VRF rather than per-BD, this will deliver the traffic to any
**** ACs of other BDs that are in the same MAC-VRF. Is that the intention?
**** Probably not.

It then sends the multicast traffic over the corresponding IRB interface. The multicast traffic then gets routed over IRB interfaces that are included in the OIF list for that

**** In the OIF list of the layer 3 state, right?
**** A clearer distinction between L2 and L3 multicast states is needed.

multicast traffic (and TTL gets decremented). When the multicast traffic is received on an IRB interface by the MAC-VRF corresponding to that interface, it gets L2 switched and sent over ACs that belong to the L2 OIF list.

**** It seems to me that if the OIFlist of the per-MAC-VRF (S,G) state has
**** ACs that are not in the source BD, this procedure will cause the
**** traffic to be delivered twice to the local ACs of those non-source
**** BDs. I know that's not the intention; probably the L2 states need to be
**** per-BD rather than per-MAC-VRF.
**** The usual semantics of sending a packet down an IRB interface to a
**** particular BD are that it gets tunneled to other PEs attached to the
**** same BD. Those semantics are modified for multicast, but the draft
Those semantics are modified for multicast, but the draft **** hasn't to this point said that explicitly. (Modification of the **** semantics of the IRB interface is casually mentioned deep inside **** section 6.) Furthermore, the multicast traffic gets sent over I-PMSI or S-PMSI associated with that multicast flow to other PE devices that are participating in that MVPN. 5.1.2.1 Sender and Receiver in same MAC-VRF Rcvr1 in Figure 1 is connected to PE1 in MAC-VRF1 (same as Src1) and sends IGMP join for (C-S, C-G), IGMP snooping will record this state in local bridging entry. A routing entry will be formed as well which will point to MAC-VRF1 as RPF for Src1. We assume that Src1 is known via ARP or similar procedures. **** What will happen if there is no route to Src1 in either the MAC-VRF or **** the IP-VRF? Do (S,G) packets need to be dropped if there is no route to **** S? Rcvr1 will get a locally bridged copy of multicast traffic from Src1. Rcvr3 is also connected in MAC-VRF1 but to PE2 and hence would send IGMP join which will be recorded at PE2. **** This seems off-topic given the section title. **** I don't think a single MAC-VRF can exist on two PEs, but that's what **** the above text suggests. You need to spell out what you really mean. PE2 will also form routing entry and RPF will be assumed as Tenant Tunnel "Tenant1" formed beforehand using MVPN procedures. **** I don't follow this. What is "Tenant Tunnel Tenant1 formed **** beforehand using MVPN procedures"? Also this would cause multicast control plane to initiate a BGP MCAST-VPN type 7 route which would include VRI for PE1 and hence be accepted on PE1. PE1 will include Tenant1 tunnel as Outgoing Interface (OIF) in the routing entry. **** What is "Tenant1 tunnel"? **** By "Tenant1 tunnel", do you mean "whatever tunnel is the match for **** reception (RFC 6625) for (S,G)"? Now, since it has knowledge of remote receivers via MVPN control plane it will encapsulate original multicast traffic in Tenant1 tunnel towards **** If this is supposed to interwork with MVPN, the ingress PE will need to **** use an S-PMSI A-D route to announce the tunnel on which it is sending **** the packets. (I assume you don't want to use the I-PMSI, as this will **** violate the optimal replication "requirement".) Other PEs may need to **** join the tunnel by originating Leaf A-D routes. Patel, et al. Expires January 2, 2017 [Page 12] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 core. On PE2, since C-S falls in the MAC-VRF1 subnet, MAC-VRF1 Outgoing interface is treated as Ingress MAC-VRF bridging. Hence no rewrite is performed on the received customer data packet while forwarding towards Rcvr3. **** I'm not sure what this is trying to say. To get the packet from PE1 to **** PE2, it has to travel on an MVPN tunnel, which means the ethernet **** encapsulation has been stripped. Presumably it needs to be **** encapsulated in ethernet again to be sent towards Rcvr3. Thus I'm not **** sure what is meant by "no rewrite is performed". Do you just mean that **** the IP TTL handling is influenced by whether C-S is in the MAC-VRF? **** I'd say that's a pretty significant layering violation. (Of course, if **** a packet is intra-subnet, it doesn't much matter whether its TTL is **** decremented by 1 or by 2, both are wrong.) 5.1.2.2 Sender and Receiver in different MAC-VRF Rcvr2 in Figure 1 is connected to PE1 in MAC-VRF2 and hence PE2 will record its membership in MAC-VRF2. 
Since MAC-VRF2 is enabled with IRB, it gets added as another OIF to routing entry formed for (C-S, C-G). Rcvr3 and Rcvr4 are also in different MAC-VRFs than multicast speaker Src1 and hence need Inter-subnet forwarding. PE2 will form local bridging entry in MAC-VRF2 due to IGMP joins received from Rcvr3 and Rcvr4 respectively. PE2 now adds another OIF 'MAC-VRF2' to its existing routing entry. But there is no change in control plane states since its already sent MVPN route and no further signaling is required. Also since Src1 is not part of MAC-VRF2 subnet, it is treated as routing OIF and hence MAC header gets modified as per normal procedures for routing. PE3 forms routing entry very similar to PE2. It is to be noted that PE3 does not have MAC-VRF1 configured locally but still can receive the multicast data traffic over Tenant1 tunnel formed due to MVPN procedures 5.2. Operational Model for Heterogeneous EVPN IRB PEs **** I want to see how this is handled without violating any of the **** requirements given in Section 4! 5.3. All-Active Multi-Homing EVPN solution [RFC7432] uses ESI MPLS label for split-horizon filtering of Broadcast/Unknown unicast/multicast (BUM) traffic from an All-Active multi-homing Ethernet Segment to ensure that BUM traffic doesn't get loop back to the same Ethernet Segment that it came from. In MVPN, there is no concept of ESI label and split- horizon filtering because there is no support for All-Active multi- homing; however, EVPN NVEs rely on this function to prevent loop for an access Ethernet Segment. Figure-2 depicts a source sitting behind an All-Active dual-homing Ethernet Segment. The following scenarios needs special considerations: Patel, et al. Expires January 2, 2017 [Page 13] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 EVPN PE1 +------------+ Rcvr1 +----|(MAC-VRF1) | MVPN PE1 | \ | +---------+ +--------+ | (IP-VRF)|----| |---|(IP-VRF)|--- Rcvr4 | / | | | +--------+ +---|(MAC-VRF2) | | | Src1 | +------------+ | | (ES1) | | MPLS/ | Rcvr6 | EVPN PE2 | IP | (*,G) | +------------+ | | +---|(MAC-VRF2) | | | MVPN PE2 | \ | | | +--------+ | (IP-VRF)|----| |---|(IP-VRF)|--- Rcvr5 | / | +---------+ +--------+ Rcvr2 +----|(MAC-VRF3) | +------------+ Figure-2: Multi-homing 5.3.1. Source and receivers in same ES but on different subnets If the tenant multicast source sits on a different subnet than its receivers, then EVPN DF election procedure for multi-homing ES is sufficient and there will be no need to do any split-horizon filtering for that Ethernet Segment because with IGMP/MLD snooping enabled on VLANs for the multi-homing ES, only the VLANs for which IGMP/MLD join have been received are placed in OIF list for that (S,G) or (*,G) on that ES. Therefore, multicast traffic will not be loop backed on the source subnet (because there is no receiver on that subnet) and for other subnets that the multicast traffic is loop backed, the DF election ensures only a single copy of the multicast traffic is sent on that subnet. 5.3.2. Source and some receivers in same ES and on same subnet If the tenant multicast source sits on the same subnet and the same ES as some of its receivers and those receivers have interest in (*,G), then Besides DF election mechanism, there needs to be split- horizon filtering to ensure that the multicast traffic originated from that is not loop backed to itself. The existing split-horizon filtering as specified in [RFC7432] cannot be used because the received VPN label identifies the multicast IP-VRF and not MAC-VRF. 
Therefore, egress PE doesn't know for which EVI/BD it needs to perform split-horizon filtering and for which EVI/BDs Patel, et al. Expires January 2, 2017 [Page 14] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 belonging to the the same ES, it needs not to perform split-horizon filtering. This issue is resolved by extending the local-bias solution per [OVERLAY] to MPLS tunnels. **** It would be good to see some discussion of what is lost by forcing MPLS **** to use local bias instead of ESI label. My understanding is that using **** the ESI label is more robust during transitional periods, especially **** when there is single-active multi-homing. **** It's difficult to see why one should have to change an internal EVPN **** mechanism just to allow interworking with MVPN. This is what happens **** when one entangles the two different control procedures instead of **** providing clean interfaces between them. There are two cases to consider here: a) Ingress-replication tunnels used for the multicast traffic and b) P2MP tunnels used for the multicast traffic. If ingress-replication tunnels are used, then each PE in the multi- homing group instead of advertising an ESI label, it advertises to each PE in the multi-homing group a downstream assigned label identifying that PE, so that when it receives a packet with this label, it know who the originating PE is. Once the egress PE can identify the originating PE for a packet, then it can execute local- bias procedure per [OVERLAY] for each of its EVI/BDs corresponding to that IP-VRF. **** I don't understand the procedure by which these labels are advertised. **** By saying "instead of advertising an ESI label," the draft suggests **** that if one has multicast, one can't use ESI label for unicast either. **** Is that the intention? Or do you just mean that the ESI label won't be **** applied to multicast packets? **** There is a procedure in MVPN for advertising per-ingress labels in IR. **** A given ingress needs to advertise a (C-*,C-*) S-PMSI A-D route with **** the Leaf Info Required bit set, and the egress nodes respond with Leaf **** A-D routes targeted to the given ingress. I can't tell from the above **** whether the proposal is to use that procedure, or something else. If P2MP tunnels are used (e.g., mLDP, RSVP-TE, or BIER), the tunnel label identifies the tunnel and thus the originating PE. **** In BIER, the BFIR-id together with the BIER-MPLS label identifies the **** originating PE. **** The PE to be identified should be learned from the originating router **** field of the x-PMSI A-D route that advertises the tunnel. The IP **** address specified there may be different than the IP address that **** appears in (or is referenced in) the tunnel encapsulation header. Since the originating PE can be identified, the local-bias procedure per [OVERLAY] is applied to prevent multicast data to be sent on the Ethernet Segments in common with the originating PE. The difference between the local-bias procedure in here versus the one described in [OVERLAY] is that the multicast traffic in [OVERLAY] is only intended for one subnet (and thus one BD) whereas the multicast traffic in Figure-2 can span across multiple subnets (and thus multiple BDs). Therefore, local-bias procedure in [OVERLAY] is expanded to perform local bias across all the BDs of that tenant. 
In other words, the same local-bias procedure is applied to all BDs of that tenant in both the originating EVPN NVE as well as all other EVPN NVEs that share the Ethernet Segment with the originating EVPN NVE. **** Suppose a gateway (section 8) receives (S,G) packets from an MVPN/MPLS **** tunnel, and has to put them on an EVPN/VXLAN tunnel. And suppose the **** gateway also is attached to a multi-homed segment, ES1, one of whose **** BDs is BD1. If there are local receivers for (S,G) in BD1, and those **** receivers are attached to ES1, then the gateway needs to send the (S,G) **** packets to BD1/ES1, even if it is not BD1's DF on ES1. This needs to **** be stated clearly. (If this is not done, application of local bias by **** the DF will result in a black hole.) **** I don't think I could explain from the draft all the details of the **** control protocol that makes the local bias procedure work in this sort **** of case. Presumably the EVPN/VXLAN tunnel is advertised by the gateway **** in an x-PMSI A-D route whose "originating router id" field (in the **** NLRI) identifies the gateway, not the actual MVPN ingress PE? **** Note that this section does not mention the problem with all-active **** multihoming that has already been discussed. 5.4. Mobility for Tenant's sources and receivers 5.5. Single-Active Multi-Homing 6. DCs with only EVPN NVEs As mentioned earlier, the proposed solution can be used as a routed multicast solution for EVPN-only applications in data centers (e.g., routed multicast VPN only among EVPN PEs). It should be noted that the scope of intra-subnet, forwarding for the solution described in this document, is limited to a single EVPN-IRB PE. In other words, the IP multicast traffic that needs to be forwarded from one PE to another is always routed (L3 forwarded) regardless of whether the traffic is intra-subnet or inter-subnet. As the result, the TTL value for intra-subnet traffic that spans across two or more PEs get Patel, et al. Expires January 2, 2017 [Page 15] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 decremented. **** In other words, the ethernet emulation doesn't work correctly. Based on past experiences with MVPN over last dozen years for supported IP multicast applications, layer-3 forwarding of intra-subnet multicast traffic should be fine. **** Since MVPN does not do layer 3 forwarding (or any other kind of **** forwarding) of intra-subnet traffic, years of MVPN experience do not **** allow us to infer anything whatsoever about the needs of intra-subnet **** traffic. However, if there are applications that require intra-subnet multicast traffic to be L2 forwarded (e.g., without decrementing TTL value), then [EVPN-IRB- MCAST] proposes a solution to accommodate such applications. **** So if one needs both intra-subnet and inter-subnet multicast on the **** same BD, one has to deploy two different solutions? That doesn't seem **** to make much sense. Also, it is far from clear that the two solutions **** can actually co-exist on the same BD. 6.1 Setup of overlay multicast delivery It must be emphasized that this solution poses no restriction on the setup of the tenant BDs and that neither the source PE, nor the receiver PEs do not need to know/learn about the BD configuration on other PEs in the MVPN. **** I'm not sure what the point is here, an EVPN PE already knows which **** other EVPN PEs attach to the same BDs. 
Perhaps the point is just that **** the ingress and egress PEs do not have to be configured with all the **** tenant's BDs? The Reverse Path Forwarder (RPF) is selected per the tenant multicast source and the IP-VRF in compliance with the procedures in [RFC6514], using the incoming IP Prefix route (route type 5) of EVPN NLRI per [RFC7432]. **** Why not RT-2 routes? Or VPN-IP routes? Or regular IP routes? Those **** might also be in the IP-VRF. The VRF Route Import (VRI) extended community that is carried with the IP-VPN routes in [RFC6514] MUST be carried via the EVPN unicast routes instead. The construction and processing of the VRI are consistent with [RFC6514]. The VRI MUST uniquely identify the PE which is advertising a multicast source and the IP-VRF it resides in. **** I think the ECs that RFC6514 requires on unicast routes need to be on **** all the IP unicast routes exported from an EVPN-PE, no matter what the **** AFI/SAFI or route type is. VRI is constructed as following: - The 4-octet Global Administrator field MUST be set to an IP address of the PE. This address SHOULD be common for all the IP-VRFs on the PE (e.g., this address may be the PE's loopback address). - The 2-octet Local Administrator field associated with a given IP-VRF contains a number that uniquely identifies that IP-VRF within the PE that contains the IP-VRF. Every PE which detects a local receiver via a local IGMP join or a local PIM join for a specific source (overlay SSM mode) **** PIM Joins for a specific source occur in both SSM mode and ASM mode. MUST terminate the IGMP/PIM signaling at the IP-VRF and generate a (C-S,C- G) via the BGP MCAST-VPN route type 7 per [RFC6514] if and only if the RPF for the source points to the fabric. **** I guess "points to the fabric" means "is a remote PE"? If the RPF points to a local multicast source on the same MAC-VRF or a different MAC-VRF on that PE, the MCAST-VPN MUST NOT be advertised and data traffic will be locally routed/bridged to the receiver as detailed in section 6.2. **** It's not completely clear whether this is intended to be a summary of **** RFC 6513/6514 procedures for "Upstream PE" selection, or whether this **** is intended to be a replacement for them, or just an extension of them **** so that Upstream PE selection can take into account EVPN routes. The VRI received with EVPN route type 5 NLRI from source PE will be appended as an export route-target extended community. More details about handling of various types of local receivers are in section 10. The PE which has advertised the unicast route with VRI, will import the incoming MCAST-VPN NLRI in the IP-VRF with the same import route- Patel, et al. Expires January 2, 2017 [Page 16] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 target extended-community and other PEs SHOULD ignore it. Following such procedure the source PE learns about the existence of at least one remote receiver in the tenant overlay **** The ingress PE learns from the C-multicast routes that there is a **** receiver, but not who that receiver is. and programs data plane accordingly so that a single copy of multicast data is forwarded into the core VRF using tenant VRF tunnel. **** What is "the core VRF"? **** What is the "tenant VRF tunnel"? If the multicast source is unknown (overlay ASM mode), the MCAST-VPN route type 6 (C-*,C-G) join SHOULD be targeted towards the designated overlay Rendezvous Point (RP) by appending the received RP VRI as an export route-target extended community. 
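**** To make sure I'm reading the VRI/RT machinery above correctly, here is
**** a minimal sketch (Python, with hypothetical helper and field names --
**** nothing here is meant to be normative). The point it illustrates is
**** that the RT on the C-multicast route is copied from whichever unicast
**** route the receiver PE happens to select as its best route to C-S (or
**** to the C-RP), which is exactly why Upstream PE selection matters so
**** much for the all-active multi-homing case discussed earlier.
****
****     def vri(pe_loopback, ipvrf_local_admin):
****         # VRF Route Import EC: Global Admin = an IP address of the PE,
****         # Local Admin = a number identifying the IP-VRF within that PE.
****         return ("VRI", pe_loopback, ipvrf_local_admin)
****
****     def export_unicast_route(pe, ipvrf, prefix):
****         # Source-PE side: every tenant unicast route exported (whatever
****         # its AFI/SAFI or route type) carries the advertising PE's VRI.
****         return {"prefix": prefix, "vri": vri(pe.loopback, ipvrf.local_admin)}
****
****     def c_multicast_join(ipvrf, c_source_or_rp, c_group, route_type):
****         # Receiver-PE side: route_type 7 = Source Tree Join,
****         # route_type 6 = Shared Tree Join toward the RP.
****         best = ipvrf.best_unicast_route(c_source_or_rp)
****         if best is None or best.get("vri") is None:
****             return None   # no usable route, so the join cannot be targeted
****         # The received VRI becomes the export RT, so the join is imported
****         # only by the PE/IP-VRF that advertised the selected unicast route.
****         return {"type": route_type, "group": c_group,
****                 "source_or_rp": c_source_or_rp, "export_rt": [best["vri"]]}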
**** Note that this implies knowledge of the tenant's group-to-RP mappings, **** which may imply participation in a tenant's BSR/Auto-RP infrastructure. Every PE which detects a local source, registers with its RP PE. **** Every PE performs First Hop Router (FHR) functionality? If there are **** tenant PIM routers on a particular BD, one of them may win the PIM DR **** election, and then the tenant router will perform the FHR **** functionality. **** In the gateway-based variant, who performs the FHR functionality, and **** how does it know to do so? That is how the RP learns about the tenant source(s) and group(s) within the MVPN. Once the overlay RP PE receives either the first remote (C-RP,C-G) join or a local IGMP join or a local PIM join, it will trigger an MCAST-VPN route type 7 (C-S,C-G) towards the actual source PE for which it has received PIM register message in full compliance with regular PIM procedures. **** More accurately, the RP originates a PIM Join(S,G) that propagates to **** the RP's ingress PE. Then that ingress PE originates a C-multicast **** Source Tree Join for (S,G), with an RT identifying the ingress PE for **** S. This process is repeated for each (S,G) when G is an ASM group. This involves the source PE to advertise the MCAST-VPN Source Active A-D route (MCAST-VPN route-type 5) towards all PEs. The Source Active A-D route is used to inform the active multicast source to all PEs in the Overlay so they can potentially switch from RP-Shared-Tree to Shortest-Path-Tree. **** Note that the SA A-D routes for (S,G) propagate to nodes that don't **** necessarily have interest in (S,G), which violates the stated **** "requirement" that (S,G) state not be present in nodes that don't send **** or receive (S,G) traffic ;-) **** I thought the PEs were part of the underlay, not part of the overlay. **** Since you're talking about the "RP-Shared-Tree", I have to assume you **** mean the tenant's (*,G) tree. The SA A-D routes have nothing to do **** with the choice made in the overlay between using the tenant's (*,G) **** shared tree and using source-specific trees. The above procedure is optional per [RFC6514], and user SHALL enable an auto-discovery mode where the temporary RP-Shared-Tree is not involved. In this mode, the source PE MUST advertise the MCAST-VPN Source Active A-D route (type 5) as soon as it detects data traffic from the local tenant multicast source. **** There is no such mode in the MVPN standards. Hence the PEs at different sites of the same MVPN will directly join the Shortest-Path-Tree once they receive the MCAST-VPN Source Active A-D route. **** This synopsis of MVPN is somewhat mangled ;-( 6.3 Data plane considerations Data-center fabrics are implemented using variety of core technologies but predominant ones are IP/VXLAN Ingress Replication, IP/VXLAN PIM and MPLS LSM. IP and MPLS have been predominant choice for MVPN core as well hence all existing procedures for forming tunnels for these technologies are applicable in EVPN as well. Also as described in earlier section, each PE acts as PIM DR in its locally connected Bridge Domain, **** Only if no other router is elected as PIM DR. we MUST NOT forward post-routed traffic out of IRB interfaces towards the core. **** I don't understand that sentence. Does it just mean that when Layer 2 **** receives a packet from Layer 3 via an IRB interface, the packet is not **** sent to any other PEs? 
If so, I agree, but this is an important point **** about the semantics of IRB interfaces, and shouldn't be buried so late **** in the document. **** The term "post-routed traffic out of IRB interfaces" is a bit dense ;-) 7 Handling of different encapsulations Just as in [RFC6514] the A-D routes are used to form the overlay multicast tunnels and signal the tunnel type using the P-Multicast Service Interface Tunnel (PMSI Tunnel) attribute. Patel, et al. Expires January 2, 2017 [Page 17] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 7.1 MPLS Encapsulation The [RFC6514] assumes MPLS/IP core and there is no modification to the signaling procedures and encoding for PMSI tunnel formation therein. Also, there is no need for a gateway to inter-operate with non-EVPN PEs supporting [RFC6514] based MVPN over IP/MPLS. **** This just isn't true. If EVPN is using IR and MVPN is using mLDP, both **** are using MPLS but the gateway is still needed. 7.2 VxLAN Encapsulation In order to signal VXLAN, the corresponding BGP encapsulation extended community [TUNNEL-ENCAP] SHOULD be appended to the A-D routes. **** If we were to decide to do something like this, it might be better to **** use the Tunnel Encapsulation attribute, in order to have the ability to **** signal other details of the encapsulation. The MPLS label in the PMSI Tunnel Attribute MUST be the Virtual Network Identifier (VNI) associated with the customer MVPN. The supported PMSI tunnel types with VXLAN encapsulation are: PIM-SSM Tree, PIM-SM Tree, BIDIR-PIM Tree, Ingress Replication [RFC6514]. Further details are in [OVERLAY]. **** If one of these routes makes it to a vanilla MVPN PE, the encapsulation **** EC will be ignored and trouble will ensue. How is this to be **** prevented? **** Why use an EC rather than new tunnel types or a new flag in the PMSI **** Tunnel attribute? In this case, a gateway is needed for inter-operation between the EVPN-IRB PEs and non-EVPN MVPN PEs. The gateway should re-originate the control plane signaling with the relevant tunnel encapsulation on either side. **** If an A-D route originated by an EVPN-PE is carrying the RT **** derived from the VRI of a given unicast route, and that unicast route **** was originated by an MVPN-PE, the RT will cause the route to be **** distributed to that MVPN-PE. I don't see how the gateway is going to **** get that route. **** In other words, this document has so far not specified the control **** plane procedures needed to make this gateway work. I see this is **** somewhat addressed in section 8, but the specification there is rather **** sketchy and has numerous issues (see my comments on section 8). **** BTW, by using this sort of gateway, you violate a number of the **** "requirements" given in Section 4. In the data plane, the gateway terminates the tunnels formed on either side and performs the relevant stitching/re- encapsulation on data packets. 7.3 Other Encapsulation In order to signal a different tunneling encapsulation such as NVGRE, VXLAN-GPE or MPLSoGRE the corresponding BGP encapsulation extended community [TUNNEL-ENCAP] SHOULD be appended to the A-D routes. If the Tunnel Type field in the encapsulation extended-community is set to a type which requires Virtual Network Identifier (VNI), e.g., VXLAN-GPE or NVGRE [TUNNEL-ENCAP], then the MPLS label in the PMSI Tunnel Attribute MUST be the VNI associated with the customer MVPN. Same as in VXLAN case, a gateway is needed for inter-operation between the EVPN-IRB PEs and non-EVPN MVPN PEs. 
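**** To illustrate the earlier concern about a vanilla MVPN PE ignoring the
**** encapsulation extended community, here is a small sketch (Python, with
**** invented field names; this is just my reading of the risk, not a
**** procedure from the draft) of how the label field of the PMSI Tunnel
**** attribute gets mis-interpreted by a receiver that does not implement
**** this draft:
****
****     def interpret_pmsi(pmsi_attr, ext_communities, implements_draft):
****         encap = None
****         if implements_draft:
****             encap = next((ec.tunnel_type for ec in ext_communities
****                           if ec.kind == "encapsulation"), None)
****         if encap in ("VXLAN", "VXLAN-GPE", "NVGRE"):
****             # Per the draft, the "MPLS label" field actually carries a VNI.
****             return {"encap": encap, "vni": pmsi_attr.mpls_label}
****         # Plain RFC 6514 behaviour: the field is treated as an MPLS label.
****         return {"encap": "MPLS", "label": pmsi_attr.mpls_label}
****
**** Nothing in the route itself forces an unmodified PE into the first
**** branch, which is why some explicit segregation (RTs or a gateway) is
**** needed to keep these routes away from vanilla MVPN PEs.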
**** Same comments as for Section 7.2 8. DCI with MPLS in WAN and VxLAN in DCs This section describers the inter-operation between MVPN MPLS WAN with MVPN-EVPN in a data-center which runs on VxLAN. Since the tunnel encapsulation between these networks are different, we must have at least one gateway in between. Usually, two or more are required for redundancy and load balancing purpose. Some aspects of the multi- homing between VxLAN DC networks and MPLS WAN is in common with [INTERCON-EVPN]. Herein, only the differences are described. **** While one needs to have a set of procedures that handle all the use **** cases, it doesn't make sense to have a separate set of procedures for **** each use case. The same sort of considerations apply if MVPN uses mLDP **** and EVPN uses IR; or if MVPN uses RSVP-TE P2MP and EVPN uses mLDP. **** It's not just MPLS/VxLAN that is important. 8.1 Control plane inter-connect Patel, et al. Expires January 2, 2017 [Page 18] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 The gateway(s) MUST be setup with the inclusive set of all the IP- VRFs that span across the two domains. On each gateway, there will be at least two BGP sessions: one towards the DC side and the other towards the WAN side. **** Is the idea that the MVPN routes and the EVPN routes are distinguished **** not by anything inherent in the routes, but only by the BGP sessions **** over which they arrive? That is very fragile; it will be difficult to **** prevent route leaks, and route leaks will result in disasters. It also **** seems to rule out the use of a single RR to handle both sets of **** UPDATEs; ruling that out doesn't seem very wise. The distribution of **** routes needs to be controlled by the RTs. **** If unicast routes get exchanged between MVPN and EVPN, the RTs of the **** multicast routes are computed from the VRIs of the unicast routes. So **** one needs to require that the only EVPN nodes that get unicast routes **** from MVPN are the gateways, and that the only EVPN nodes that export **** unicast routes to MVPN are the gateways. Also, a rather more thorough **** discussion of how the RTs and RDs are to be managed is needed. **** Changing tunnel types at a domain boundary of some sort is very similar **** to MVPN tunnel segmentation (as discussed in RFCs 6514 and 7524.) **** However, the draft doesn't seem to be proposing to treat this as MVPN **** segmentation. MVPN segmentation is rather complicated, but it does **** address all the RT issues. **** Of course, it is probably simpler just to keep the MVPN and EVPN **** tunnels (and the related BGP routes) separate and move data from one **** tunnel to another at the points of contact. **** It is worth noting that when MVPN segmentation is used, a single RR **** with the PEs as clients can handle all the C-multicast and SA A-D **** routes; only the S-PMSI A-D routes and Leaf A-D routes need to pass **** through the segmentation points. I'm not sure how the proposal in the **** draft would function in an environment like that. Usually for redundancy purpose, more sessions are setup on each side. The unicast route propagation follows the exact same procedures in [INTERCON-EVPN]. Hence, a multicast host located in either domain, is advertised with the gateway IP address as the next-hop to the other domain. **** The issue is not how the sessions are set up, it's how the RTs are **** assigned such that the routes get distributed to the right set of PEs **** and/or gateways. 
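**** To make that concrete, here is the sort of RT-based segregation I would
**** expect the draft to spell out (a rough Python sketch with invented RT
**** values and field names; this is my guess at the intent, not something
**** the draft actually specifies): each domain gets its own import/export
**** RT set, only the gateways are configured with both, and a route whose
**** RTs say it belongs to the wrong domain is dropped by the RT filters
**** rather than by hoping the right BGP session was used.
****
****     WAN_RT = "target:65000:100"    # carried only by MVPN/WAN routes
****     DC_RT  = "target:65000:200"    # carried only by EVPN/DC routes
****
****     def reoriginate(route, from_domain, gw_vrf):
****         expected = WAN_RT if from_domain == "wan" else DC_RT
****         if expected not in route.route_targets:
****             return None            # leaked or mis-targeted route: drop it
****         out = route.copy()
****         # Re-originate with the gateway's own RD, the other domain's RT,
****         # and the gateway as next hop / originating router.
****         out.rd = gw_vrf.rd
****         out.route_targets = [DC_RT if from_domain == "wan" else WAN_RT]
****         out.next_hop = gw_vrf.gateway_address
****         out.originating_router = gw_vrf.gateway_address
****         return out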
**** Also, it's not just the next hop field that needs to be modified, it's **** the VRI, perhaps the "originating router" fields of the NLRIs, etc. **** You also need to address the translation of all the other kinds of MVPN **** routes. As a result, PEs view the hosts in the other domain as directly attached to the gateway and all inter-domain multicast signaling is directed towards the gateway(s). Received MVPN routes type 1-7 from either side of the gateway(s), MUST NOT be reflected back to the same side **** That doesn't make sense; it should be possible to configure the gateway **** to be a RR and have it reflect routes back to the same side. Maybe you **** mean "MUST NOT be sent to the other side?" Again, this is really a **** matter of managing the Route Targets, and the draft needs to say more **** than "set the Route Targets so the right thing happens." but processed locally and re-advertised (if needed) to the other side: - Intra-AS I-PMSI A-D Route: these are distributed within each domain to form the overlay tunnels which terminate at gateway(s). They are not passed to the other side of the gateway(s). **** This just doesn't seem right. If the gateway-based scheme requires the **** MVPN nodes to see the gateway but not to see the non-gateway EVPN **** nodes, the gateway would not pass through the Intra-AS I-PMSI A-D **** routes from the non-gateway nodes. Also, there aren't any MVPN **** procedures that allow alteration of the PMSI Tunnel attribute in **** Intra-AS I-PMSI A-D routes. **** I see you didn't mention the Inter-AS I-PMSI A-D routes in this section **** ;-) - C-Multicast Route: joins are imported into the corresponding IP-VRF on each gateway and advertised as a new route to the other side with the following modifications (the rest of NLRI fields and path attributes remain on-touched): * Route-Distinguisher is set to that of the IP-VRF * Route-target is set to the exported route-target list on IP-VRF **** I thought the "exported route-target list on the IP-VRF" would include **** the RTs that cause the routes to be distributed within the EVPN domain **** as well. Or are you assuming that a gateway cannot have EVPN ACs? If **** so, that would be a rather unwise assumption. * The PMSI tunnel attribute and BGP Encapsulation extended community will be modified according to section 8 **** Neither of these is carried on a C-multicast route. * Next-hop will be set to the IP address which represents the gateway on either domain **** Next hop of a C-multicast route is of no importance, and in fact is **** typically modified by the RRs. - Source Active A-D Route: same as joins **** If you want the MVPN nodes to see the gateway as the ingress PE for a **** given source, the gateway has to originate SA A-D routes for that **** source; it cannot pass along the same routes it receives from the EVPN **** nodes. **** Note that if an SA A-D route for (S,G) is exported by a given PE (or **** gateway), that PE (or gateway) must also export a unicast route (not **** necessarily a host route) to S, and both routes must have the same **** RD. There is insufficient discussion of RD management in this document **** for me to tell whether this requirement can be met when gateways are **** involved. - S-PMSI A-D Route: these are passed to the other side to form selective PMSI tunnels per every (C-S,C-G) from the gateway to the PEs in the other domain provided it contains receivers for the given (C-S, C-G). Similar modifications made to joins are made to the newly originated S-PMSI. 
**** You need to look at every field, attribute, and extended community of **** the S-PMSI A-D routes AND the Leaf A-D routes, and describe the **** modifications. Oh, don't forget to take account of the flags. In addition, the Originating Router's IP address is set to GW's IP address. Multicast signaling from/to hosts on local ACs on the gateway(s) are generated and propagated in both domains (if needed) per the procedures in section 7 in this document and in [RFC6514] with no change. It must be noted that for a locally attached source, the gateway will program an OIF per every domain from which it receives a remote join in its forwarding plane and different Patel, et al. Expires January 2, 2017 [Page 19] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 encapsulation will be used on the data packets. **** This violates the "requirements" of section 4 ;-) Other point to notice is that if there are multiple gateways in an ESI which peer with each other, each one will receive two sets of the local MCAST-VPN routes from the other gateway: 1) the WAN set 2) the DC set. Following the same procedure as in [INTERCON-EVPN], the WAN set SHALL be given a higher priority. **** I don't really understand what's being said here. Did you mean "EVI" **** rather than "ESI"? How will a gateway distinguish one set of MCAST-VPN **** routes from another? Why would it prefer the external set of routes to **** the internal ones? 8.2 Data plane inter-connect Traffic forwarding procedures on gateways are same as those described for PEs in section 5 and 6 except that, unlike a non-border leaf PE, the gateway will not only route or bridge the incoming traffic from one side to its local receivers, but will also send it to the remote receivers in the the other domain after de-capsulation and appending the right encapsulation. The OIF and IIF are programmed in FIB based on the received joins from either side and the RPF calculation to the source or RP. The de-capsulation and encapsulation actions are programmed based on the received I-PMSI or S-PMSI A-D routes from either sides. If there are more than one gateway between two domains, the multi- homing procedures described in the following section must be considered so that incoming traffic from one side is not looped back to the other gateway. The multicast traffic from local hosts on each gateway flows to the other gateway with the preferred encapsulation (WAN encapsulation is preferred as described in previous section). 8.3 Multi-homing among DCI gateways Just as in [INTERCON-EVPN] every set of multi-homed gateways between the WAN and a given DC are assigned a unique ESI. 9. Inter-AS Operation **** What is the relevance of AS boundaries? In MVPN, AS boundaries turned **** out to be rather irrelevant, which is one of the issues that gave rise **** to RFC 7524. 10. Use Cases 10.1 DCs with only IGMP/MLD hosts w/o tenant router In a EVPN network consisting of only IGMP/MLD hosts, PE's will receive IGMP (*, G) or (S, G) joins from their locally attached host and would originate MVPN C-Multicast Route Type 6 and 7 NLRI's respectively. As described in RFC 6514 these NLRI's are directed towards RP-PE for Type 6 or Source-PE for Type 7. In case of (*, G) join a Shared-Path Tree will be built in the core from RP-PE towards Patel, et al. Expires January 2, 2017 [Page 20] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 all Receiver-PE's. 
**** A (*,G) S-PMSI is not the same thing as a (*,G) PIM tree; I'm not sure **** just which notion this is about. Once a Source starts to send Multicast data to specified multicast-group, the PE directly connected to Source will do PIM-registration with RP. **** If the source is on an all-active multi-homed ES, is the FHR function **** for (S,G) done by the PE receiving the (S,G) traffic or by the DF? Since there are existing receivers for the Group, RP will originate a PIM (S, G) join towards Source. This will be converted to MVPN Type 7 NLRI by RP-PE. Please note that since there are no other routers RP-PE would be the PE configured as RP **** By "since there are no other routers", do you mean "if there is no **** tenant multicast router performing the RP function for G"? using static configuration or by using BSR or Auto-RP procedures. **** If BSR or Auto-RP is used, how do you ensure that the PE is selected as **** the RP? The detailed working of such protocols is beyond the scope of this document. Upon receiving Type 7 NLRI, Source-PE will include MVPN Tunnel in its Outgoing Interface List. Furthermore, Source-PE will follow the procedures in RFC-6514 to originate MVPN SA-AD route (RT 5) to avoid duplicate traffic and allow all Receiver-PE's to shift from Share-Tree to Shortest-Path-Tree rooted at Source-PE. Section 13 of RFC6514 describes it. However a network operator can chose to have only Shortest-Path-Tree built in MVPN core as described in RFC6513. To achieve this, all PE's can act as RP for its locally connected hosts and thus avoid sending any Shared-Tree Join (MVPN Type 6) into the core. In this scenario, there will be no PIM registration needed since all PE's are first-hop router as well as acting RP. One a source starts to send multicast data, the PE directly connected to it originates Source-Active AD (RT 5) to all other PE's in network. Upon Receiving Source-Active AD route a PE must cache it in its local database and also look for any matching interest for (*, G) where G is the multicast group described in received Source-Active AD route. If it finds any such matching entry, it must originate a C-Multicast route (RT 7) in order to start receiving traffic from Source-PE. **** Note that in this mode (spt-only), this C-multicast route is not **** necessarily targeted to the originator of the SA A-D route. This procedure must be repeated on reception of any further Source-Active AD routes. 10.2 DCs with mixed of IGMP/MLD hosts & multicast routers running PIM- SSM This scenario has multicast routers which can send PIM SSM (S, G) joins. Upon receiving these joins and if source described in join is learnt to be behind a MVPN peer PE, local PE will originate C- Multicast Join (RT 7) towards Source-PE. It is expected that PIM SSM group ranges are kept separate from ASM range for which IGMP hosts can send (*, G) joins. Hence both ASM and SSM groups shall operate without any overlap. There is no RP needed for SSM range groups and Shortest Path tree rooted at Source is built once a receiver interest is known. **** PIM (S,G) Joins are sent in both ASM and SSM mode. I don't understand **** the relevance of the stuff about keeping the group ranges separate. 10.3 DCs with mixed of IGMP/MLD hosts & multicast routers running PIM- ASM This scenario includes reception of PIM (*, G) joins on PE's local AC. These joins are handled similar to IGMP (*, G) join as explained Patel, et al. 
Expires January 2, 2017 [Page 21] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 in sections above. Another interesting case can arise here is when one of the tenant routers can act as RP for some of the ASM Groups. In such scenario, a Upstream Multicast Hop (UMH) will be elected by other PE's in order to send C-Multicast Routes (RT 6). All procedures described in RFC 6513 with respect to UMH should be used to avoid traffic duplication due to incoherent selection of RP-PE by different Receiver-PE's. **** "incoherent" --> "inconsistent", I think. 10.4 DCs with mixed of IGMP/MLD hosts & multicast routers running PIM- Bidir Creating Bidirectional (*, G) trees is useful when a customer wants least amount of control state in network. But on downside all receivers for a particular multicast group receive traffic from all sources sending to that group. **** You're saying that the "downside" of bidir is that all the receivers of **** the group receive traffic from all the sources?? But that's always the **** case in ASM, even when bidir is not being used. I'm not sure what the **** above is intending to say. However for the purpose of this document, all procedures as described in RFC 6513 and RFC 6514 apply when PIM-Bidir is used. **** BIDIR-PIM support is not fully specified in RFCs 6513/6514. 11. IANA Considerations There is no additional IANA considerations for PBB-EVPN beyond what is already described in [RFC7432]. **** PBB-EVPN?? Is this a cut and paste error ;-) 12. Security Considerations All the security considerations in [RFC7432] apply directly to this document because this document leverages [RFC7432] control plane and their associated procedures. 13. Acknowledgements The authors would like to thank Samir Thoria, Ashutosh Gupta, Niloofar Fazlollahi, and Aamod Vyavaharkar for their discussions and contributions. 14. References **** Has the reference section been pasted from a different document? ;-) 14.1. Normative References [RFC7024] Jeng, H., Uttaro, J., Jalil, L., Decraene, B., Rekhter, Y., and R. Aggarwal, "Virtual Hub-and-Spoke in BGP/MPLS VPNs", RFC 7024, October 2013. **** This is not a normative reference, and is not even referenced at all. **** However, the MVPN specs should be normative references. Patel, et al. Expires January 2, 2017 [Page 22] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 [RFC7432] A. Sajassi, et al., "BGP MPLS Based Ethernet VPN", RFC 7432 , February 2015. 15.2. Informative References [RFC7080] A. Sajassi, et al., "Virtual Private LAN Service (VPLS) Interoperability with Provider Backbone Bridges", RFC 7080, December 2013. **** Not referenced. [RFC7209] D. Thaler, et al., "Requirements for Ethernet VPN (EVPN)", RFC 7209, May 2014. **** Not referenced. [RFC4389] A. Sajassi, et al., "Neighbor Discovery Proxies (ND Proxy)", RFC 4389, April 2006. **** Not referenced. [RFC4761] K. Kompella, et al., "Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling", RFC 4761, Jauary 2007. **** Not referenced. [OVERLAY] A. Sajassi, et al., "A Network Virtualization Overlay Solution using EVPN", draft-ietf-bess-evpn-overlay-01, work in progress, February 2015. **** Normative reference? [RFC6514] R. Aggarwal, et al., "BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", RFC6514, February 2012. **** Normative reference. [RFC6513] E. Rosen, et al., "Multicast in MPLS/BGP IP VPNs", RFC6513, February 2012. **** Normative reference? [INTERCON-EVPN] J. 
Rabadan, et al., "Interconnect Solution for EVPN Overlay networks", https://tools.ietf.org/html/draft-ietf- bess-dci-evpn-overlay-04, September 2016 **** Normative reference? [TUNNEL-ENCAPS] E. Rosen, et al. "The BGP Tunnel Encapsulation Attribute", https://tools.ietf.org/html/draft-ietf-idr- tunnel-encaps-06, work in progress, June 2017. **** Normative reference? 15. Authors' Addresses Ali Sajassi Cisco 170 West Tasman Drive San Jose, CA 95134, US Email: sajassi@cisco.com Patel, et al. Expires January 2, 2017 [Page 23] INTERNET DRAFT Seamless Interop between EVPN & MVPN PEs July 2, 2017 Samir Thoria Cisco 170 West Tasman Drive San Jose, CA 95134, US Email: sthoria@cisco.com Niloofar Fazlollahi Cisco 170 West Tasman Drive San Jose, CA 95134, US Email: nifazlol@cisco.com Ashutosh Gupta Avi Networks Email: ashutosh@avinetworks.com Patel, et al. Expires January 2, 2017 [Page 24]