Re: [bess] Comments on draft-sajassi-bess-evpn-mvpn-seamless-interop

"Ali Sajassi (sajassi)" <sajassi@cisco.com> Thu, 28 June 2018 21:23 UTC

From: "Ali Sajassi (sajassi)" <sajassi@cisco.com>
To: Eric C Rosen <erosen@juniper.net>, Bess WG <bess@ietf.org>
Date: Thu, 28 Jun 2018 21:23:37 +0000
Message-ID: <0A6CB14F-993C-4BCB-8678-26C3AE0AFE52@cisco.com>
References: <d1e53751-289d-6ac9-d019-2fe07cc33602@juniper.net>
In-Reply-To: <d1e53751-289d-6ac9-d019-2fe07cc33602@juniper.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/MBWlZGeqsOk3l1O0YxMWC5OlHjA>
Subject: Re: [bess] Comments on draft-sajassi-bess-evpn-mvpn-seamless-interop

Eric, 

Please see my responses to your comments inline marked w/ "Ali>"

On 11/9/17, 9:42 AM, "BESS on behalf of Eric C Rosen" <bess-bounces@ietf.org on behalf of erosen@juniper.net> wrote:

    I have a number of comments on 
    draft-sajassi-bess-evpn-mvpn-seamless-interop.
    
    1. It seems that the proposal does not do correct ethernet emulation.  
    Intra-subnet multicast only sometimes preserves MAC SA and IP TTL, 
    sometimes not, depending upon the topology.  TTL handling for 
    inter-subnet multicast seems inconsistent as well, depending upon the 
    topology.  The proposal exposes the operator's internal network 
    structure to the user, and will cause "LAN-only" applications to break.  
    These concerns are acknowledged, then quickly dismissed based on wishful 
    thinking.  (In my experience, wishful thinking doesn't work out very 
    well in routing.)

Ali> EVPN doesn't provide LAN service per IEEE 802.1Q but rather an emulation of LAN service. This document defines what that emulation means with respect to intra-subnet and inter-subnet IP multicast traffic. I added section 5.1 to expand on that. BTW, TTL handling for inter-subnet IP multicast traffic is done consistently!
    
    2. In order to do inter-subnet multicast in EVPN, the proposal requires 
    L3VPN/MVPN configuration on ALL the EVPN PEs.  This is required even 
    when there is no need for MVPN/EVPN interworking. This is portrayed as a 
    "low provisioning" solution!

Ali> Using MVPN constructs doesn't require any additional configuration on EVPN PEs beyond the multicast configuration needed for IRB-mcast operation.
    
    3. The draft claims that the exact same control plane should be used for 
    EVPN and MVPN, despite the fact that MVPN's control plane is unaware of 
    certain information that is very important in EVPN (e.g., EVIs, 
    TagIDs).  (This is largely responsible for point 1 above.) This is 
    claimed to be a way of providing a "uniform solution".  As we examine 
    the problems that arise, perhaps this will be seen as more a case of 
    "pounding square pegs into round holes".  When interworking between two 
    domains, generally one gets a more flexible and robust scheme by 
    maintaining clean interfaces and having well-defined points of 
    attachment, not by entangling the internal protocols of one domain with 
    the internal protocols of the other.
    
Ali> The IP multicast described in the draft is done at the tenant level (IP-VRF), not at the BD level!! So BD-level info such as TagIDs is not relevant.

    4. The draft proposes to use the same tunnels for MVPN and EVPN, i.e., 
    to have tunnels that traverse both the MVPN and the EVPN domains.  
    Various "requirements" are stated that seem to require this solution.  
    Somewhere along the line it was realized that this requirement cannot be 
    met if MVPN and EVPN do not use the same tunnel types.  So for this very 
    common scenario, a completely different solution is proposed, that (a) 
    tries to keep the EVPN control plane out of the MVPN domain, and vice 
    versa, and (b) uses different tunnels in the two domains.  Perhaps the 
    "requirements" that suggest using a single cross-domain tunnel are not 
    really requirements!  And why would we want different solutions for 
    different deployment scenarios?  Yes, the solution needs to handle all 
    the use cases, but we don't want to look at the use cases one at a time 
    and design a different solution for each one.

Ali> There are SPDCs with MPLS underlay and there are SPDCs with VxLAN underlay. We need a solution that is optimal for both, just as we need both ASBRs and GWs to optimize connectivity in inter-AS scenarios.
    
    While the authors have realized that one cannot have cross-domain 
    tunnels when EVPN uses VxLAN and MVPN uses MPLS, they do not seem to 
    have acknowledged the multitude of other scenarios in which cross-domain 
    tunnels cannot be used.  For instance, MVPN may be using mLDP, while 
    EVPN is using IR.  Or MVPN may be using RSVP-TE P2MP while EVPN is using 
    AR.  Etc., etc.  I suspect that "different tunnel types" will be the 
    common case, especially when trying to interwork existing MVPN and EVPN 
    deployments.

Ali> This will be captured in the next rev., and that's why both GWs and ASBRs are needed.
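The point both sides converge on here (a single tunnel can only span the two domains when they agree on tunnel technology; otherwise a boundary node is needed) can be written down as a tiny decision function. This is a deliberately simplified sketch: the enum values and the function name `interworking_mode` are hypothetical, and the rule ignores MVPN tunnel segmentation and the ASBR option discussed elsewhere in the thread.

```python
from enum import Enum

class Underlay(Enum):
    MPLS = "MPLS"
    VXLAN = "VxLAN"

class TunnelType(Enum):
    IR = "ingress replication"
    AR = "assisted replication"
    MLDP = "mLDP P2MP"
    RSVP_TE_P2MP = "RSVP-TE P2MP"

def interworking_mode(evpn_underlay, evpn_tunnel, mvpn_underlay, mvpn_tunnel):
    """Hypothetical, simplified rule: one tunnel can span both domains only
    if the underlay and the tunnel type match; otherwise use a gateway."""
    if evpn_underlay == mvpn_underlay and evpn_tunnel == mvpn_tunnel:
        return "cross-domain tunnel"
    return "gateway"

# Scenarios mentioned in the thread (VxLAN vs MPLS, IR vs mLDP, AR vs RSVP-TE P2MP).
print(interworking_mode(Underlay.VXLAN, TunnelType.IR, Underlay.MPLS, TunnelType.IR))
print(interworking_mode(Underlay.MPLS, TunnelType.IR, Underlay.MPLS, TunnelType.MLDP))
print(interworking_mode(Underlay.MPLS, TunnelType.AR, Underlay.MPLS, TunnelType.RSVP_TE_P2MP))
# The one case where a shared tunnel is possible.
print(interworking_mode(Underlay.MPLS, TunnelType.MLDP, Underlay.MPLS, TunnelType.MLDP))
```

Under this rule, three of the four example combinations fall back to a gateway, which is consistent with Eric's expectation that "different tunnel types" will be the common case.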
    
    The inability to use EVPN-specific tunnels also causes a number of 
    specific problems when attempting to interwork with MVPN; these will be 
    examined below.
    
    
    5. A number of the draft's stated "requirements" seem to be entirely bogus.
    
    a. In some cases, the "requirements" for optimality in one or another 
    respect (e.g., routing, replication) are really only considerations that 
    an operator should be able to trade off against other considerations.  
    The real requirement is to be able to create a deployment scenario in 
    which such optimality is achievable.  Other deployment scenarios, that 
    optimize for other considerations, should not be prohibited.
    
Ali> What deployment scenarios do you think are prohibited ?

    b. Many of the "requirements" are applied very selectively, e.g., the 
    "requirement" for MVPN and EVPN to use the same set of multicast 
    tunnels, and the requirement for there to be no "gateways".
   
Ali> That has been explained in the context of SPDCs.
 
    
    6. The gateway-based proposal for interworking MVPN and EVPN when they 
    use different tunnel types is severely underspecified.

Ali> Agreed. This will be covered in the subsequent revisions.
    
    One possible approach to this would be to have a single MVPN domain that 
    includes the EVPN PEs, and to use MVPN tunnel segmentation at the 
    boundary. While that is a complicated solution, at least it is known to 
    work. However, that does not seem to be what is being proposed.

Ali> It is not clear to me exactly what you are suggesting here. At the boundary, is there any mcast address lookup or not? 
    
    Another approach would be to set up two independent MVPN domains and 
    carefully assign RTs to ensure that routes are not leaked from one 
    domain to another.  One would also have to ensure that the boundary 
    points send the proper set of routes into the "other" domain.  (This 
    includes the unicast routes as well as the multicast routes.)  And one 
    would have to include a whole bunch of applicability restrictions, such 
    as "don't use the same RR to hold routes of both domains".  I think 
    that's what's being proposed, but there isn't enough discussion of RT 
    and RD management to be sure, and there isn't much discussion of what 
    information the boundary points send into each domain.

Ali> I will expand on that, including the RD and RT management aspects. The intention is to have a single MVPN domain in which both EVPN and MVPN PEs participate.
    
    7. The proposal requires that EVPN export a host route to MVPN for each 
    EVPN-attached multicast source.  It's a good thing that there is no 
    requirement like "do not burden existing MVPN deployments with a whole 
    bunch of additional host routes".  Wait a minute, maybe there is such a 
    requirement.
    
Ali> :-)

    In fact, whether the host routes are necessary to achieve optimal 
    routing depends on the topology.  And this is a case where an operator 
    might well want to sacrifice some routing optimality to reduce the 
    routing burden on the MVPN nodes.
  
Ali> If there is mobility, then there is host route advertisement :-) If there is no mobility, then prefixes can be advertised.
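Read literally, Ali's rule of thumb is: export a host route per source when hosts can move, otherwise export the covering subnet prefix. The sketch below encodes only that reading; `routes_to_export` is a hypothetical helper, and a real deployment may, as Eric notes above, choose to trade routing optimality for a smaller route count in the MVPN domain.

```python
import ipaddress

def routes_to_export(host_addrs, subnet, mobility):
    """Hypothetical helper: with host mobility, export one host route per
    multicast source; without mobility, export the subnet prefix instead."""
    net = ipaddress.ip_network(subnet)
    if not mobility:
        return [net]
    routes = []
    for h in host_addrs:
        addr = ipaddress.ip_address(h)
        if addr not in net:
            raise ValueError(f"{addr} is not in {net}")
        # /32 for IPv4, /128 for IPv6
        routes.append(ipaddress.ip_network(f"{addr}/{addr.max_prefixlen}"))
    return routes

print(routes_to_export(["10.1.1.5", "10.1.1.6"], "10.1.1.0/24", mobility=True))
print(routes_to_export(["10.1.1.5", "10.1.1.6"], "10.1.1.0/24", mobility=False))
```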

    8. The proposal simply does not work when MVPN receivers are interested 
    in multicast flows from EVPN sources that are attached to all-active 
    multi-homed ethernet segments.

Ali> This issue has been addressed in the new revision. 
    
    This issue is worth examining in detail. 
    
    Suppose EVPN-PE1 and EVPN-PE2 are both attached to the same ethernet 
    segment, using all-active multi-homing.  Suppose there is a multicast 
    source S on that segment.  In such a case, (S,G1) traffic might arrive 
    at PE1, while (S,G2) traffic might arrive at PE2. (Which PE gets a 
    particular flow from S depends on LAG hashing algorithms over which we 
    have no control.)  Now suppose that an MVPN PE, say PE3, needs to 
    receive (S,G1) traffic.
    
    MVPN requires PE3 to select the "Upstream PE" for the (S,G1) traffic.  
    PE3 does this by looking at the VRF Route Import EC on its best route to S.
    
    In order to receive the (S,G1) traffic, PE3 must select PE1, rather than 
    PE2, as the Upstream PE.  However, there is absolutely nothing in the 
    MVPN specs or in this document to ensure that PE3 selects PE1 rather 
    than PE2. Generally, an MVPN node will select PE2 if it is closer to PE2.
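The mismatch described above can be reproduced with two small functions: one standing in for the CE's LAG hash, one for the MVPN egress PE's Upstream PE selection. Both are hypothetical simplifications. The CRC32 hash is an arbitrary stand-in for a vendor-specific hash, "best route" is reduced to lowest IGP cost, and the VRF Route Import EC (which really carries an IP address plus a local administrator field) is reduced to a PE name.

```python
import zlib

def lag_member_pe(source, group, pes):
    """Hypothetical LAG hash: the CE picks one multihomed PE per (S,G) flow.
    Real CEs use hash functions outside the PEs' control."""
    key = f"{source}|{group}".encode()
    return pes[zlib.crc32(key) % len(pes)]

def select_upstream_pe(routes_to_s):
    """Simplified MVPN egress behaviour: take the best route to S (here the
    lowest IGP cost) and use the PE in its VRF Route Import EC."""
    best = min(routes_to_s, key=lambda r: r["igp_cost"])
    return best["vrf_route_import"]

# PE3's view: both PE1 and PE2 advertise a route to S, and PE2 is closer.
routes = [
    {"vrf_route_import": "PE1", "igp_cost": 20},
    {"vrf_route_import": "PE2", "igp_cost": 10},
]
pes = ["PE1", "PE2"]
for group in ["G1", "G2", "G3", "G4"]:
    ingress = lag_member_pe("S", group, pes)
    upstream = select_upstream_pe(routes)
    status = "received" if ingress == upstream else "missed"
    print(f"(S,{group}): arrives at {ingress}, PE3 joins via {upstream} -> {status}")
```

The Upstream PE is the same for every group, because it depends only on the routes to S. Which groups are missed depends only on the hash, which the MVPN side cannot see.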
    
    Perhaps the authors are under the impression that MVPN Source Active A-D 
    routes can be used to solve this problem.  That is not so. Vanilla MVPN 
    nodes do not generally base their selection of the Upstream PE for (S,G) 
    on the SA A-D routes.
    
    Let me explain a little about the way SA A-D routes are used.  There are 
    two different MVPN "modes" that affect the use of SA A-D routes.
    
    In one mode (sometimes known as 'rpt-spt' mode, and described in Section 
    13 of RFC 6514), an SA A-D route for (S,G) is originated by a PE when 
    that PE receives a C-multicast route for (S,G).
    
    In another mode (sometimes known as 'spt-only' mode, and described in 
    Section 14 of RFC 6514), an SA A-D route for (S,G) is originated by a PE 
    when that PE receives a PIM Register message for (S,G), or when that PE 
    receives an MSDP SA message for (S,G).  Note that in this mode, the PE 
    originating the SA A-D route is not necessarily the best (or even a 
    good) ingress PE for the flow.
    
    - In both modes, if an egress PE receives a PIM Join (S,G) from a CE, 
    its choice of ingress PE is never impacted by the SA A-D routes.  Note 
    that CEs send PIM Join(S,G) messages for both ASM and SSM groups.
    
    - In spt-only mode, the SA A-D routes are used to discover sources, but 
    not to select the ingress PE. (The selected ingress PE is not 
    necessarily the one originating the SA A-D route.)
    
    - The choice of ingress PE is impacted by the SA A-D routes for (S,G) 
    only when (a) rpt-spt mode is being used, (b) the egress PE has received 
    a PIM Join (*,G) from a CE, and (c) the egress PE has not received a PIM 
    Join (S,G) from a CE.  This is typically just a transient state, as the 
    CE will generally emit a PIM Join(S,G) as soon as it sees any (S,G) traffic.
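Eric's three bullets reduce to a single predicate: SA A-D routes influence the choice of ingress PE only in rpt-spt mode, and only while the egress PE has a (*,G) Join but no (S,G) Join. The sketch below encodes that summary as written in this message. It is not a transcription of RFC 6514, and the names are hypothetical.

```python
from enum import Enum

class MvpnMode(Enum):
    RPT_SPT = "rpt-spt"    # RFC 6514, Section 13
    SPT_ONLY = "spt-only"  # RFC 6514, Section 14

def sa_ad_affects_ingress_choice(mode, have_star_g_join, have_s_g_join):
    """True only in the (usually transient) state described above."""
    return mode is MvpnMode.RPT_SPT and have_star_g_join and not have_s_g_join

# Once the CE sends Join(S,G), SA A-D routes no longer matter.
print(sa_ad_affects_ingress_choice(MvpnMode.RPT_SPT, True, False))
print(sa_ad_affects_ingress_choice(MvpnMode.RPT_SPT, True, True))
print(sa_ad_affects_ingress_choice(MvpnMode.SPT_ONLY, True, False))
```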
    
    Bottom line: if a source is on an EVPN all-active multi-homed segment, 
    MVPN receivers have no way to select the proper ingress PE.  If the 
    segment is n-way-homed, the MVPN PEs have just a 1/n chance of getting 
    the traffic.
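The 1/n figure assumes that the CE's hash spreads flows uniformly over the n PEs and that the MVPN egress always picks the same Upstream PE. Under those stated assumptions, a short Monte Carlo run (hypothetical helper `fraction_received`) converges to 1/n.

```python
import random

def fraction_received(n_pes, n_flows, seed=0):
    """Share of (S,G) flows an MVPN egress PE receives when flows are
    spread uniformly over n_pes PEs but the egress always picks PE 0."""
    rng = random.Random(seed)
    upstream = 0
    hits = sum(1 for _ in range(n_flows) if rng.randrange(n_pes) == upstream)
    return hits / n_flows

for n in (2, 3, 4):
    print(n, round(fraction_received(n, 100_000), 3), round(1 / n, 3))
```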
    
    Of course, this problem could be eliminated if EVPN and MVPN didn't have 
    to use the same tunnels.  In that case, if an MVPN node selects the 
    wrong ingress PE, the selected PE could obtain the traffic from the real 
    ingress PE, and then relay it to the MVPN node.  This might result in 
    sub-optimal routing, but that's better than a black hole!
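The relay alternative can be contrasted with the shared-tunnel case in a few lines. This is a hypothetical model of the forwarding outcome only (`deliver` and the PE names are illustrative). It shows that relaying turns a black hole into an extra hop.

```python
def deliver(flow_ingress_pe, selected_upstream_pe, relay_enabled):
    """Return the PE path a flow takes to the MVPN egress, or None if the
    flow is black-holed (hypothetical model)."""
    if flow_ingress_pe == selected_upstream_pe:
        return [flow_ingress_pe, "egress"]
    if relay_enabled:
        # The selected PE pulls the flow from the real ingress PE and relays it.
        return [flow_ingress_pe, selected_upstream_pe, "egress"]
    return None

print(deliver("PE1", "PE2", relay_enabled=False))  # shared tunnels, no relay
print(deliver("PE1", "PE2", relay_enabled=True))   # relay: one extra hop
```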
    
    Perhaps the gateway-based solution needs to be used whenever there is 
    all-active multi-homing? ;-)
    
    One could imagine modifying the MVPN installed base so that the SA A-D 
    routes play more of a role in selecting the Upstream PE. However, I 
    believe the requirement is to allow MVPN/EVPN interworking without 
    modifying the existing MVPN nodes.
    
    9. In the case where all the multicast sources for a given group are 
    attached via EVPN, there is a very simple procedure for providing 
    Join(*,G) functionality.  This procedure makes use of EVPN-specific 
    knowledge.  Since the MVPN protocols cannot take advantage of the 
    EVPN-specific knowledge, a more complicated procedure is needed when 
    only MVPN protocols are used.  This is explained further in the in-line 
    comments.
    
    10. Most of the problems above are the result of (a) trying to use the 
    exact same control plane for both MVPN and EVPN, and (b) treating the 
    case where both domains use the same tunnel type as the design center.  
    It would be better to keep clean interfaces between EVPN and MVPN, with 
    clearly defined points of attachment.  The proposal in 
    draft-lin-bess-evpn-irb-mcast does this, and thus does not run into the 
    above problems.  That proposal also shows how the "optimal routing" 
    requirements can be met, and how they can be traded off against other 
    considerations. (In fairness, it must be acknowledged that both 
    proposals are still works in progress.  It's also worth noting that the 
    two proposals have a lot in common.)
    
Ali> The proposal in evpn-irb-mcast is not ruled out. 

    A number of additional comments can be found in-line in the attachment.  
    (I realize that some of them are repetitive, sorry.) Look for lines 
    beginning "****".  The above comments are also repeated at the front of 
    the attachment.

Ali> I will go over your additional comments and address them separately.

Cheers,
Ali