Re: [bess] Comments on draft-sajassi-bess-evpn-mvpn-seamless-interop

"Kesavan Thiruvenkatasamy (kethiruv)" <kethiruv@cisco.com> Mon, 15 July 2019 18:28 UTC

From: "Kesavan Thiruvenkatasamy (kethiruv)" <kethiruv@cisco.com>
To: Eric C Rosen <erosen@juniper.net>, "Ali Sajassi (sajassi)" <sajassi@cisco.com>, Bess WG <bess@ietf.org>
Date: Mon, 15 Jul 2019 18:27:58 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/Pyqdttzqdo6b-Ee516lj5BDzsqU>

Hi Eric,

Thanks for your comments. Please see the inline responses below.

Regards,
Kesavan

From: BESS <bess-bounces@ietf.org> on behalf of Eric C Rosen <erosen@juniper.net>
Date: Monday, September 10, 2018 at 10:39 AM
To: "Ali Sajassi (sajassi)" <sajassi@cisco.com>, Bess WG <bess@ietf.org>
Subject: Re: [bess] Comments on draft-sajassi-bess-evpn-mvpn-seamless-interop

Eric> 1. It seems that the proposal does not do correct ethernet emulation.  Intra-subnet multicast only sometimes preserves MAC SA and IP TTL, sometimes not, depending upon the topology.

Ali> EVPN doesn't provide LAN service per IEEE 802.1Q but rather an emulation of LAN service. This document defines what that emulation means

The fact that the proposal doesn't do correct ethernet emulation cannot be resolved by having the proposal redefine "emulation" to mean "whatever the proposal does".

EVPN needs to ensure that whatever works on a real ethernet will work on the emulated ethernet as well; the externally visible service characteristics on which the higher layers may depend must be properly offered by the emulation.  This applies to both unicast and multicast equally.

Otherwise anyone attempting to replace a real ethernet with EVPN will find that not every application and/or protocol working on the real ethernet will continue to work on the EVPN.

Kesavan>> A solution has been proposed in the new revision to preserve the MAC SA and IP TTL for intra-subnet traffic.


Eric> TTL handling for inter-subnet multicast seems inconsistent as well, depending upon the topology.

Ali> BTW, TTL handling for inter-subnet IP multicast traffic is done consistently!

Consider the following in a pure MVPN environment:

- Source S is on subnet1, which is attached to PE1.

- Receivers R1 and R2 are on subnet2, which is attached to both PE1 and PE2.

- Subnet1 and subnet2 are different subnets.

Now every (S,G) packet will follow the same path: either (a) subnet1-->PE1-->subnet2 or (b) subnet1-->PE1-->PE2-->subnet2.

Both paths cannot be used at the same time, because L3 multicast will not allow both PE1 and PE2 to transmit the (S,G) flow to subnet2.  So an (S,G) packet received by R1 will always have the same TTL as the same packet received by R2.  TTL scoping will therefore work consistently; depending on the routing, and from the perspective of any given flow, the two subnets are either one hop away from each other, or two hops away from each other.

In the so-called "seamless-mcast" scheme, on the other hand, if R1 and R2 get the same (S,G) packet, each may see a different TTL.  Suppose R1 is on an ES attached to PE1 but not to PE2, S is on an ES attached to PE1 but not to PE2, and R2 is on an ES attached to PE2 but not to PE1.  Then a given (S,G) packet received by R1 will have a smaller TTL than the same packet received by R2, even though R1 and R2 are on the same subnet.

Note that the seamless-mcast proposal does not provide the behavior that would be provided by MVPN, despite the claim that it is "just MVPN".

This user-visible inconsistency may break any use of TTL scoping, and is just the sort of thing that tends to generate a stream of service calls from customers who pay attention to this sort of stuff.

In general, TTL should be decremented by 0 for intra-subnet and by 1 (within the EVPN domain) for inter-subnet.  Failure to handle the TTL decrement properly will break anything that depends upon RFC 3682 ("The Generalized TTL Security Mechanism").  Have you concluded that no use of multicast together with RFC 3682 will, now or in the future, ever need to run over EVPN?  I'd like to know how that conclusion is supported.  You may also wish to do a google search for "multicast ttl scoping".

A related issue is that the number of PEs through which a packet passes should not be inferrable by a tenant.  Any sort of multicast traceroute tool used by a tenant will give unexpected results if TTL is not handled properly; at the very least this will result in service calls.

The OISM proposal (as described in the irb-mcast draft) will decrement TTL by 1 when packets go from one subnet to another, as an IP multicast frame is distributed unchanged to the PEs that need it, and its TTL is decremented by 1 if an egress PE needs to deliver it to a subnet other than its source subnet.

Kesavan>> TTL is handled very much as it is for inter-subnet unicast traffic. (In the EVPN-IRB model, TTL is decremented once for hosts that are attached to the same PE, and twice if the hosts are connected behind two different PEs.)
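
The two inter-subnet TTL behaviours described above can be compared with a minimal Python sketch. The function names, the initial TTL of 64, and the receiver placement (S on subnet1 behind PE1, R1 and R2 on subnet2 behind PE1 and PE2 respectively) are illustrative assumptions, not text from either draft.

```python
# Sketch only: models the decrement counts as stated in this thread.

def ttl_decrement_irb_model(ingress_pe: str, egress_pe: str) -> int:
    """EVPN-IRB style handling as described by Kesavan: decrement once when
    source and receiver are behind the same PE, twice when they are behind
    two different PEs."""
    return 1 if ingress_pe == egress_pe else 2


def ttl_decrement_oism(source_subnet: str, receiver_subnet: str) -> int:
    """OISM handling as described by Eric: the frame is distributed unchanged
    between PEs, and the egress PE decrements by 1 only when delivering to a
    subnet other than the source subnet."""
    return 0 if source_subnet == receiver_subnet else 1


INITIAL_TTL = 64                        # assumed starting TTL
receivers = {"R1": "PE1", "R2": "PE2"}  # assumed: both receivers on subnet2

irb_ttl = {
    name: INITIAL_TTL - ttl_decrement_irb_model("PE1", pe)
    for name, pe in receivers.items()
}
oism_ttl = {
    name: INITIAL_TTL - ttl_decrement_oism("subnet1", "subnet2")
    for name in receivers
}

print("IRB model:", irb_ttl)
print("OISM:     ", oism_ttl)
```

Under these assumptions the two receivers on the same subnet see different TTLs in the first model and the same TTL in the second, which is the inconsistency being debated.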


The draft still makes the following peculiar claim:

   "Based on past experiences with MVPN over last dozen years for supported IP multicast applications, layer-3 forwarding of intra-subnet multicast traffic should be fine."

Since MVPN does not do intra-subnet multicast, experience with MVPN has no bearing whatsoever on the needs of intra-subnet multicast.

Kesavan>> The above-mentioned statement has been removed in the new revision.


Eric> 2. In order to do inter-subnet multicast in EVPN, the proposal requires L3VPN/MVPN configuration on ALL the EVPN PEs.  This is required even when there is no need for MVPN/EVPN interworking. This is portrayed as a "low provisioning" solution!

Ali> Using MVPN constructs doesn't require additional configuration on EVPN PEs beyond the multicast configuration needed for IRB-mcast operation.

I think you'll find that if you don't reconfigure all the BGP sessions to carry AFI/SAFIs 1/128, 2/128, 1/5, and 2/5, you'll have quite a bit of trouble running any of the native MVPN procedures ;-) This is perhaps the simplest example of additional configuration that is needed.
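
As a rough illustration of the address-family point, the Python sketch below lists the four AFI/SAFI pairs named above and checks them against what a BGP session has negotiated. The helper names and the example EVPN-only session are hypothetical; the AFI/SAFI code points are the IANA-assigned values.

```python
# Sketch only: which of the AFI/SAFIs named above are missing from a session.

AFI_NAMES = {1: "IPv4", 2: "IPv6", 25: "L2VPN"}
SAFI_NAMES = {5: "MCAST-VPN", 70: "EVPN", 128: "MPLS-labeled VPN address"}

# The four families listed in the paragraph above.
REQUIRED_FOR_MVPN = [(1, 128), (2, 128), (1, 5), (2, 5)]


def describe(afi: int, safi: int) -> str:
    """Human-readable label for an AFI/SAFI pair."""
    return f"{afi}/{safi} ({AFI_NAMES[afi]}, {SAFI_NAMES[safi]})"


def missing_families(negotiated):
    """Return the required AFI/SAFI pairs that the session does not carry."""
    return [pair for pair in REQUIRED_FOR_MVPN if pair not in negotiated]


# Hypothetical session on an EVPN PE that carries only L2VPN/EVPN (25/70).
evpn_only_session = {(25, 70)}

for afi, safi in missing_families(evpn_only_session):
    print("must be enabled:", describe(afi, safi))
```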

If doing MVPN/EVPN interworking, one needs to go to every EVPN PE and set up all the RTs used to control the distribution of routes within the L3VPN domain.  One has to consider whether the RDs already used by EVPN are distinct from the RDs already used by L3VPN.  One has to enable the tunneling mechanisms that are used in the L3VPN domain (hopefully the EVPN PEs can support those tunneling techniques).  If the L3VPN deployment has been set up with particular routing policies (special communities carried, or whatever), these need to be configured on every EVPN PE.  One needs to take account of whether the L3VPN deployment uses segmented P-tunnels or non-segmented P-tunnels, and whether it depends upon the use of (C-*,C-*) S-PMSI A-D routes or not.  One needs to configure whether the L3VPN is expecting procedures of RFC6514 Section 13 ("rpt-spt") or whether it is expecting procedures of RFC6514 Section 14 ("spt-only").  I think there are quite a few other configuration items (various timers, and additional stuff that I probably don't even know about) that may need to be coordinated with the L3VPN deployment with which one is attempting to interwork.

To do interworking between EVPN and L3VPN/MVPN, the L3VPN/MVPN stuff obviously needs to be configured at the interworking points.  The REQUIREMENT to do ALL this configuration at EVERY single EVPN PE is what seems excessive.

Even if one is not doing MVPN/EVPN interworking, all this stuff still has to be configured; one just wouldn't have to worry in that case about coordinating with a pre-existing MVPN deployment.  But no one ever called L3VPN a "low provisioning solution".  EVPN (unlike MVPN) has a fair amount of auto-provisioning built in, and one loses the advantages of that if one has to do MVPN provisioning on every PE.

Kesavan>> L3VPN configuration is not needed if there is no need for MVPN/EVPN interworking. But MVPN configuration is still required.
BTW, auto-provisioning can still be used. The MVPN config and RTs can be auto-configured in the EVPN fabric.

Eric> 3. The draft claims that the exact same control plane should be used for EVPN and MVPN, despite the fact that MVPN's control plane is unaware of certain information that is very important in EVPN (e.g., EVIs, TagIDs).

Ali> IP multicast described in the draft is done at the tenant's level (IP-VRF) and not BD level !! So, BD level info such as tagIDs are not relevant.

The failure to carry BD level info is what causes the ethernet emulation to be done incorrectly.  Remember that most of EVPN is taken from L3VPN, with modifications to add stuff that is needed to correctly emulate the ethernet service.

Certainly if you look at the control plane used by EVPN to distribute unicast IP addresses, you'll see that it does not "just use L3VPN", but instead has lots of EVPN-specific stuff.

It's also worth pointing out that the draft does not really use the exact same control plane as MVPN, as it seems to require that each IP host address be advertised in two routes (an EVPN-specific route and a VPN-IP route), and the EVPN-specific routes (types 2 or 5) are now required to carry attributes that are typically carried only by the VPN-IP routes.  Also, there are the intra-ES tunnels (discussed below), something that doesn't exist in MVPN.  And then there are those under-specified EVPN-specific 'gateways' (discussed below) that are used to connect tunnels of different types.

Kesavan>> W.r.t. multicast route handling, the same signaling procedures are used between EVPN and MVPN PEs. Yes, additional changes are required in the EVPN control plane. (Even OISM proposes changes to the existing EVPN control plane to accommodate the OISM solution.)


Eric> 4. The draft proposes to use the same tunnels for MVPN and EVPN, i.e., to have tunnels that traverse both the MVPN and the EVPN domains.  Various "requirements" are stated that seem to require this solution.  Somewhere along the line it was realized that this requirement cannot be met if MVPN and EVPN do not use the same tunnel types.  So for this very common scenario, a completely different solution is proposed, that (a) tries to keep the EVPN control plane out of the MVPN domain, and vice versa, and (b) uses different tunnels in the two domains.  Perhaps the "requirements" that suggest using a single cross-domain tunnel are not really requirements!

Ali> There are SPDCs with MPLS underlay and there are SPDCs with VxLAN underlay. We need a solution that is optimum for both. Just the same way that we need both ASBR and GWs to optimize connectivity for inter-AS scenarios.

My point is that the document states "requirements", but applies them very selectively and very inconsistently.  There is a "requirement" to "use the same tunnels for MVPN and EVPN", but there are many deployment scenarios in which this "requirement" simply cannot be met.  If the "requirement" were stated as "only use tunnels that provide value",  I'd have no problem with it.  It seems that many of the specified requirements were reverse engineered from the solution as it was originally proposed, and then are silently ignored whenever it is discovered that they can't be met.

Kesavan>> The requirements section has been updated to state that optimum replication shall be provided when both technologies use the same tunnel type.


Eric> 5a. In some cases, the "requirements" for optimality in one or another respect (e.g., routing, replication) are really only considerations that an operator should be able to trade off against other considerations.  The real requirement is to be able to create a deployment scenario in which such optimality is achievable.  Other deployment scenarios, that optimize for other considerations, should not be prohibited.

Ali> What deployment scenarios do you think are prohibited ?

The draft does not appear to support scenarios in which the MVPN/EVPN interworking procedures are confined to a subset of the EVPN PEs, and not even visible to the majority of the EVPN PEs.

Kesavan>> The latest revision covers the above-mentioned use case.


Eric> While the authors have realized that one cannot have cross-domain tunnels when EVPN uses VxLAN and MVPN uses MPLS, they do not seem to have acknowledged the multitude of other scenarios in which cross-domain tunnels cannot be used.  For instance, MVPN may be using mLDP, while EVPN is using IR.  Or EVPN may be using "Assisted Replication", which does not exist in MVPN.  Or MVPN may be using PIM while EVPN is using RSVP-TE P2MP.  Etc., etc.  I suspect that "different tunnel types" will be the common case, especially when trying to interwork existing MVPN and EVPN deployments.

I note that the latest rev of the draft still does not take this into account.

Eric> The gateway-based proposal for interworking MVPN and EVPN when they use different tunnel types is severely underspecified.

Ali> Agreed. This will be covered in the subsequent revisions.

It doesn't seem to be in the latest revision.

Eric> One possible approach to this would be to have a single MVPN domain that includes the EVPN PEs, and to use MVPN tunnel segmentation at the boundary. While that is a complicated solution, at least it is known to work. However, that does not seem to be what is being proposed.

Ali> It is not clear to me exactly what you are suggesting here. At the boundary, is there any mcast address lookup or not?

If I were working on a proposal like the one in seamless-multicast, I would consider whether the MVPN inter-AS segmented P-tunnels feature could be leveraged at the border nodes between domains that use different tunnel types.  After all, one of the main purposes of MVPN inter-AS segmentation is to connect domains that use different tunnel types.  Done properly, that does not require any IP lookups at the ASBRs.  The draft seems to be trying to reinvent MVPN P-tunnel segmentation from scratch.  This is a very intricate part of the MVPN specs and you can't just make it up as you go along.

Here is just a selection of some of the problems with section 10.1 ("Control Plane Interconnect") of the -02 revision:

- Much of the document seems to assume that the RTs used in the MVPN domain will be the same as the RTs used in the EVPN domain.  If that is the case, all the A-D routes from one domain will propagate into the other.  This does not appear to be compatible with the sketchy description of "gateway" behavior given in Section 10.

- Section 10.1 states that the RD in a Source Active A-D route needs to be changed when such a route is re-originated by a gateway.  Unfortunately, MVPN requires that the SA A-D route for (S,G) have the same RD as the unicast route for S.  So you would need to block all the IPVPN routes at the gateway and re-originate them with new RDs.  The spec fails to mention this.  Note that this is not even possible if the EVPN PEs share the RTs of the MVPN domain.

- Interesting effects could arise if an EVPN PE chooses a gateway as the UMH, but the gateway chooses an EVPN PE as the UMH.  Can you demonstrate that this is impossible?

- Section 10.1 says that the C-multicast routes originated by the gateway carry the "exported RT list on the IP-VRF".  In MVPN, C-multicast routes do not carry the exported RT list, they carry an RT created from the VRF Route Import EC of the Selected UMH route.

- Section 10.1 talks about putting the BGP Encapsulation EC on the C-multicast routes sent into the MVPN domain.  However, MVPN does not make any use of this EC.

- Section 10.1 states that the S-PMSI A-D routes just propagate from one domain to the other, but with some unspecified "modifications".

- Leaf A-D routes are not discussed at all, nor is the setting of the LIR flag in the PMSI Tunnel attribute.

- Inter-AS I-PMSI A-D routes are not discussed.

This section is still severely underspecified.  It seems to be inventing a new way of interconnecting two L3VPN/MVPN domains, but it's not "option A", "option B", or "option C", and it's not "segmented P-tunnels".  So what is it exactly, and how do we know it works?

Have you thought about cases where multiple domains (i.e., more than 2) using different tunnel types are interconnected, perhaps in a cycle?

I think the issue of how to interwork domains that use different tunnel types is quite important.  If one wants to interwork an MVPN domain that uses mLDP-based P2MP LSPs with an EVPN domain that uses IR, I don't think one wants to tell customers that interoperability requires them to start using mLDP inside the EVPN domain.  If one is using assisted replication (AR) within the EVPN domain, I don't think anyone will want to hear "sorry, AR is not supported by MVPN".  I don't think the interworking between two domains can be called "seamless" if one has to change the tunnel types  of either domain.  But the details for how to do the interworking between different tunnel types just don't seem to be present.

Furthermore, it is pretty clear that some sort of gateway is going to be needed to provide interoperability with RFC 7432 nodes that do not implement MVPN; this needs to be addressed as well.

Kesavan>> The latest revision covers the gateway-based proposal. Some updates are still required w.r.t. MVPN-EVPN interworking, which will be taken care of in the next revision.

Eric> Another approach would be to set up two independent MVPN domains and carefully assign RTs to ensure that routes are not leaked from one domain to another.  One would also have to ensure that the boundary points send the proper set of routes into the "other" domain.  (This includes the unicast routes as well as the multicast routes.)  And one would have to include a whole bunch of applicability restrictions, such as "don't use the same RR to hold routes of both domains".  I think that's what's being proposed, but there isn't enough discussion of RT and RD management to be sure, and there isn't much discussion of what information the boundary points send into each domain.

Ali> I will expand on that with the RD and RT management aspects.  But the intention is with a single MVPN domain where both EVPN and MVPN PEs participate.

Note that the use of a single RT by both MVPN and EVPN nodes will cause routes to be distributed throughout the "single MVPN domain", with no opportunity for a gateway to modify the routes.  But section 10.1 does seem to require a gateway to modify routes in order to connect tunnels of different types.

Kesavan>> Yes, the gateway needs to re-originate routes to connect different tunnel types. Please check the next version.


Eric> 7. The proposal requires that EVPN export a host route to MVPN for each EVPN-attached multicast source.  It's a good thing that there is no requirement like "do not burden existing MVPN deployments with a whole bunch of additional host routes".  Wait a minute, maybe there is such a requirement.

Eric> In fact, whether the host routes are necessary to achieve optimal routing depends on the topology.  And this is a case where an operator might well want to sacrifice some routing optimality to reduce the routing burden on the MVPN nodes.

Ali> If there is mobility, then there is host route advertisement. If there is no mobility, then prefixes can be advertised.

It seems to me that this is simply not true.  Consider the following example:

- BD1 has subnet 192.168.168.0/24.

- BD1 exists on ES1, which is attached to PE1.

- BD2 exists on ES2, which is attached to PE2.  (ES1 and ES2 are not the same ES.)

- On BD1/ES1, there are hosts 192.168.168.1, 192.168.168.103, 192.168.168.204.

- On BD2/ES2 there are hosts 192.168.168.2, 192.168.168.104, 192.168.1.203.

Assume there is no mobility.

In this scenario, I don't see how either PE1 or PE2 can advertise any prefix shorter than a /32.  And I don't see how one will prevent all these /32 routes from being distributed to all the MVPN nodes.

The fundamental issue here is that while IP addresses can be aggregated on a per-BD basis, they cannot be aggregated on a per-ES basis.
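
The aggregation argument can be checked with a short Python sketch using the standard `ipaddress` module. The host addresses are copied as written in the example above; the variable and function names are illustrative assumptions.

```python
# Sketch only: per-PE (per-ES) host sets do not aggregate, the BD subnet does.
import ipaddress

bd1_subnet = ipaddress.ip_network("192.168.168.0/24")

# Hosts as listed in the example, grouped by the PE their ES is attached to.
hosts_pe1 = ["192.168.168.1", "192.168.168.103", "192.168.168.204"]
hosts_pe2 = ["192.168.168.2", "192.168.168.104", "192.168.1.203"]


def exact_aggregate(hosts):
    """Shortest prefixes that cover exactly these hosts and nothing else."""
    nets = [ipaddress.ip_network(h) for h in hosts]  # each becomes a /32
    return list(ipaddress.collapse_addresses(nets))


def smallest_covering_prefix(hosts):
    """Single shortest-mask-growth prefix that contains every listed host."""
    addrs = [ipaddress.ip_address(h) for h in hosts]
    net = ipaddress.ip_network(hosts[0])
    while not all(a in net for a in addrs):
        net = net.supernet()
    return net


pe1_exact = exact_aggregate(hosts_pe1)
pe1_cover = smallest_covering_prefix(hosts_pe1)
# Hosts behind PE2 that a single covering prefix from PE1 would also attract.
pe2_overlap = [h for h in hosts_pe2 if ipaddress.ip_address(h) in pe1_cover]

print("PE1 exact aggregation:", pe1_exact)
print("PE1 single covering prefix:", pe1_cover)
print("PE2 hosts inside that prefix:", pe2_overlap)
```

In this sketch exact aggregation of PE1's hosts yields only /32s, while the one prefix that covers them all also covers hosts reachable via PE2, so PE1 cannot advertise it without attracting traffic for hosts it is not attached to.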

I don't think you get "seamless" interworking by requiring all the MVPN nodes to receive an unbounded number of host routes.

Kesavan>> With the seamless interop model, a single copy of data traffic serves both MVPN and EVPN PEs. But host routes need to be advertised to the MVPN PEs that are directly attached to the fabric.
In the gateway model, summarized routes can be advertised to the MVPN PEs.


Eric> 8. The proposal simply does not work when MVPN receivers are interested in multicast flows from EVPN sources that are attached to all-active multi-homed ethernet segments.

Ali> This issue has been addressed in the new revision.

Yes, this is an improvement.

Suppose PE1 receives an (S,G) IP multicast frame over a local AC from BD1/ES1.  And suppose PE2,...,PEn are also attached to ES1.  Per the new revision, PE1 transmits a copy of the frame on an EVPN-specific tunnel to PE2,...,PEn (an "intra-ES1" tunnel), as well as transmitting a copy of the contained IP datagram on whatever MVPN tunnel it uses to carry (S,G) packets.  Now any EVPN PE attached to the source ES can be selected as the UMH by an MVPN node, because all such EVPN PEs get the (S,G) frames and can forward them to MVPN receivers.

It's good to see the draft recognizing that IP multicast frames do need to be transmitted as frames on EVPN-specific tunnels, in addition to being transmitted as packets on MVPN tunnels.  (Of course this solution violates the stated "requirement" that a given IP multicast packet not be transmitted on two different tunnels. Sigh, another example of "requirements" being applied inconsistently.)

However, there are still several problems with this solution.

No control plane is described to support this intra-ES tunneling.  Is that "for the next revision"? ;-)

There's a suggestion that this solution is trivial, because no one would ever home an ES to more than two PEs, and therefore you just have to unicast a copy to the other PE.

But the PE receiving a frame has to figure out whether the frame was sent to it on an intra-ES tunnel or not, and if so, which ES the tunnel is associated with.  It is not clear how the receiving PE is supposed to make this determination.  One needs to say something more than "just use ingress replication".

The draft also suggests that "multi-homed" always means "dual-homed", which I don't think is acceptable.

Note also that a scheme like this causes EVERY (S,G) frame to get sent to EVERY PE that is attached to S's source ES.  This happens even if there are NO receivers anywhere interested in (S,G) at all.  In effect, the LAG hashing algorithm is defeated.  If a switch is multi-homed to n PEs, it uses a LAG hashing algorithm to ensure that any given packet is sent to just one of those PEs.  Then one EVPN PE gets the packet and sends it to the other n-1 PEs, who have to treat the packet as if it had just arrived on the AC from the multi-homed switch.  It would be better to have a "pull model" where PEx gets the (S,G) packet from PE1 only if some MVPN PE has sent a C-multicast (S,G) or (*,G) route to PEx.
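
To make the push/pull difference concrete, here is a minimal Python sketch.  It is illustrative only: the function name and the idea of passing in the set of "interested" PEs (those that have received a C-multicast (S,G) or (*,G) route) are assumptions made for this example, not procedures from the draft.

```python
def intra_es_copies(es_pes, ingress_pe, interested_pes=None):
    """Return the PEs that receive an intra-ES copy of one (S,G) frame.

    es_pes:         all PEs attached to the source ES.
    ingress_pe:     the PE that the LAG hash delivered the frame to.
    interested_pes: None models the push scheme (copy to every other PE);
                    a set models a hypothetical pull scheme, where only PEs
                    holding a C-multicast (S,G) or (*,G) route get a copy.
    """
    others = set(es_pes) - {ingress_pe}
    if interested_pes is None:
        return others
    return others & set(interested_pes)


es1 = ["PE1", "PE2", "PE3", "PE4"]

# Push: n-1 copies per frame, even with no receivers anywhere.
print(len(intra_es_copies(es1, "PE1")))

# Pull: no copies when no PE has C-multicast state for the flow.
print(len(intra_es_copies(es1, "PE1", interested_pes=set())))
```

With a push scheme the copy count depends only on how many PEs attach to the ES; with a pull scheme it depends on which of those PEs actually have C-multicast state for the flow.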

In addition, the latest rev of the draft is still confused about the way UMH selection is done.  It seems to assume all the PEs will select the same "Upstream PE" for a given (S,G).  While this is one possible option (generally referred to as Single Forwarder Selection), it is not required, and I believe the most common deployment scenario is to use the "Installed UMH Route" as the "Selected UMH Route".  (See section 5.1.3 of RFC 6513.)  This means that it is always possible for a PE to receive more than one copy of an (S,G) packet, and the PE must therefore always be able to apply the "discard from the wrong PE" procedures of RFC 6513 Section 9.1.1.

Suppose for example that EVPN-PE1 transmits its IP multicast frames on an I-PMSI that is instantiated by a P2MP LSP.  EVPN-PE2 will have to join that I-PMSI.  If PE1 and PE2 are both attached to BD1/ES1, then when PE1 gets an (S,G) IP multicast frame from BD1/ES1, PE2 will get two copies: one on the intra-ES1 tunnel from PE1 and one on the I-PMSI tunnel from PE1.  PE2 will probably choose itself as the "Upstream PE" for (S,G), in which case it needs to discard the copy that arrives on the I-PMSI tunnel from PE1, while accepting the copy that arrives on the intra-ES1 tunnel from PE1.  (If PE2 for some reason chose PE1 as the Upstream PE for (S,G), it would have to discard the copy arriving on the intra-ES1 tunnel and accept the copy arriving on the I-PMSI tunnel.)  The draft seems to imply, incorrectly, that the "discard from the wrong PE" procedure is not necessary.
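
The two cases in the example above can be sketched as a small acceptance check.  This is only a sketch under stated assumptions: the `Copy` type, the tunnel labels "intra-ES1" and "I-PMSI", and the `accept` function are hypothetical names for illustration, and the logic covers just this two-PE example rather than the full RFC 6513 procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    ingress_pe: str  # PE that put the packet on the tunnel
    tunnel: str      # hypothetical labels: "intra-ES1" or "I-PMSI"


def accept(copy: Copy, local_pe: str, upstream_pe: str) -> bool:
    """Decide whether local_pe keeps this copy of an (S,G) packet.

    If local_pe chose itself as the Upstream PE, only the intra-ES copy is
    kept (it stands in for a frame arriving on the local AC).  Otherwise
    only the I-PMSI copy from the chosen Upstream PE is kept.
    """
    if upstream_pe == local_pe:
        return copy.tunnel == "intra-ES1"
    return copy.tunnel == "I-PMSI" and copy.ingress_pe == upstream_pe


# PE2 receives two copies of the same (S,G) packet, both sent by PE1.
copies = [Copy("PE1", "intra-ES1"), Copy("PE1", "I-PMSI")]

for upstream in ("PE2", "PE1"):
    kept = [c.tunnel for c in copies if accept(c, "PE2", upstream)]
    print(upstream, kept)
```

Either way exactly one copy must be dropped, which is the point: the egress PE needs a discard rule regardless of which Upstream PE it selects.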

The "discard from the wrong PE" procedures are also needed to handle the case where the source is at a site homed to two or more MVPN PEs, and there are MVPN receivers that do not do single forwarder selection.  This may cause some packets to appear on multiple I-PMSIs, and each EVPN-PE will have to join all the I-PMSIs, of course.

(The use of S-PMSIs rather than I-PMSIs does not eliminate this problem.  A given S-PMSI from PE1 might carry a flow that PE2 needs from PE1, and it might also carry a flow that PE2 is getting on an S-PMSI from PE3.)

Please note that if MPLS ingress replication is being used, the "discard from the wrong PE" functionality requires that the egress PE be able to tell from a packet's encapsulation when a packet is from the wrong ingress PE.

If the MVPN nodes are using the "extranet" feature (RFC 7900), "discard from the wrong PE" is not actually sufficient; one needs to "discard from the wrong ingress VRF".

Since there is no clean layering between MVPN and EVPN protocols in this proposal, every little nit and corner case of MVPN has to be examined to make sure it will also work in the EVPN domain.

Another problem: according to the draft, if an EVPN PE, say PE1, learns of a source via a locally attached all-active multi-homed ES, it will originate an IP route for that source.  Consider another PE, say PE2, attached to the same multi-homed ES.  When PE2 receives that IP route from PE1, PE2 will then originate its own IP route for that source.  Since PE1 receives PE2's route, it is not clear how the route ever gets withdrawn.  If PE1 stops seeing the local traffic, it will still see PE2's route, and hence will still originate its own route.  One might think this is easily fixed by attaching to PE1's route an EC that declares that route to be "authoritative"; PE2's route would not have that EC.  Note though that the adding or removal of this "authoritative" EC will cause some churn that will be visible to the MVPN-only nodes, even though it does not provide them with any useful information.
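
The withdrawal problem can be seen in a toy model.  The following Python sketch is an assumption-laden simplification: two PEs, synchronous rounds, and a boolean standing in for the hypothetical "authoritative" EC.  None of these names come from the draft.

```python
def converge(local_traffic, routes, use_authoritative_ec=False, rounds=10):
    """Apply the origination rules until the set of routes is stable.

    local_traffic: set of PEs currently seeing the source on a local AC.
    routes: dict mapping each originating PE to True if its route is
            "authoritative" (originated because of local traffic).  The
            flag is only consulted when use_authoritative_ec is True.
    """
    pes = ("PE1", "PE2")
    for _ in range(rounds):
        new = {}
        for pe in pes:
            peers = {p: auth for p, auth in routes.items() if p != pe}
            if pe in local_traffic:
                new[pe] = True
            elif use_authoritative_ec:
                # Re-originate only in response to an authoritative route.
                if any(peers.values()):
                    new[pe] = False
            elif peers:
                # Re-originate in response to any peer route.
                new[pe] = False
        if new == routes:
            break
        routes = new
    return routes


# PE1 sees the source locally; PE2 then originates its own route.
up = converge({"PE1"}, {})

# PE1 stops seeing the source.
stuck = converge(set(), up)                                # no EC
cleared = converge(set(), up, use_authoritative_ec=True)   # with EC
print(sorted(stuck), sorted(cleared))
```

Without the EC, each PE's route keeps the other's alive after the local traffic stops; with it, both routes drain, at the cost of the extra update churn noted above.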

Kesavan>>   Will cover intra-ES tunneling procedure in the next revision.

I would also like to take note of the following issue.  From the draft:

       "The EVPN PEs terminate ...  PIM messages from tenant routers on their IRB interfaces, thus avoid sending these messages over MPLS/IP core."

A PIM control message from a given PIM router needs to reach whichever other PIM router is a possible unicast next hop for any multicast source or RP.  The scheme of having each EVPN PE terminate the PIM messages presupposes that each tenant router will have the nearest EVPN PE as its unicast next hop towards the multicast source or RP.

This is likely to be a common scenario, but it certainly is not the only scenario.  A tenant might have several PIM routers on a given BD, where each PIM router is attached to a different PE.  The PIM routers could be IGP neighbors in the tenant's IGP, and may be exchanging IGP updates with each other.  In this case, PIM control messages from one tenant PIM router on the BD need to reach the other tenant routers on the BD.

Kesavan>>   IGPs are usually terminated at the PE in the EVPN fabric.  This is the typical deployment model.

For example, suppose Tenant Router R1 on BD1 attaches to PE1, and Tenant Router R2 on a different ES of BD1 attaches to PE2.  If R1 and R2 are IGP neighbors, R2 may see R1 as the next hop to a given source S.  In that case, R2 may choose to target a PIM Join(S,G) to R1.

In this scenario, the PIM control messages between R1 and R2 have to be sent between PE1 and PE2.  Since PIM control messages have a TTL of 1, they would have to be sent on BD1's BUM tunnels rather than on the IP multicast tunnels.

Now the question is, if R2 sends PIM Join(S,G) to R1, how does R2 get the (S,G) traffic from R1?  Either PE1 has to send it on BD1's BUM tunnel, or else PE2 has to figure out that it needs to pull (S,G) traffic from PE1 on an IP multicast tunnel.  The spec needs to explain how this situation is handled.  If the (S,G) traffic travels on BD1's BUM tunnel, the spec also has to make it clear how that traffic gets to other BDs.

BTW, section 6.5 of the draft says that any frame containing an IP packet whose destination address is in the range 224/8 is sent as a BUM frame.  I suspect that 224.0.0/24 is what is meant, as that seems to be the IPv4 multicast link-local address space.
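
The difference between the two ranges is easy to check with Python's standard `ipaddress` module.  The `sent_as_bum` helper is a hypothetical name for this illustration; the sample addresses are chosen only to show one address in each region.

```python
import ipaddress

# 224/8 as the draft text reads, versus the link-local block 224.0.0.0/24
# (written "224.0.0/24" above).
DRAFT_RANGE = ipaddress.ip_network("224.0.0.0/8")
LINK_LOCAL_RANGE = ipaddress.ip_network("224.0.0.0/24")


def sent_as_bum(dst: str, bum_range) -> bool:
    """True if a frame whose IP destination is dst falls inside bum_range."""
    return ipaddress.ip_address(dst) in bum_range


# 224.0.0.13 is ALL-PIM-ROUTERS (link-local); 224.1.1.1 is routable
# multicast that still falls inside 224/8; 239.1.1.1 is outside both.
for dst in ("224.0.0.13", "224.1.1.1", "239.1.1.1"):
    print(dst, sent_as_bum(dst, DRAFT_RANGE), sent_as_bum(dst, LINK_LOCAL_RANGE))
```

Read literally, 224/8 would push ordinary routable groups such as 224.1.1.1 onto the BUM tunnels, which is presumably not the intent.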

One more thing.  The draft says that SPT-ONLY (RFC 6514 section 14) mode should be the default configuration.  This has several problems:

- SPT-ONLY mode requires each PE to function as an RP, which creates a considerable amount of additional work for the PE (handling the register messages and maintaining a large number of (S,G) states).  It also requires the PE to originate a Source Active A-D route for each (S,G), a route that would not otherwise be needed.

- If the tenant or MVPN customer already has a multicast infrastructure with Rendezvous Points (RPs), it may be impossible to use SPT-ONLY mode, as this mode may not be compatible with the customer/tenant's infrastructure.  However, it may still be desirable to have RP-free operation for multicast groups whose sources and receivers are all in the EVPN domain.

- SPT-ONLY mode can sometimes be made compatible with an existing tenant/customer multicast infrastructure by having the PEs participate in the BSR or Auto-RP protocols, and/or by having the PEs participate in MSDP.  This would not generally be regarded as a simplification.

- If one is interworking with an MVPN whose PEs are configured to use RPT-SPT mode (RFC 6514 section 13), one must configure the EVPN-PEs to use RPT-SPT mode as well, because the two modes are not interoperable.  I believe most MVPN deployments use RPT-SPT mode.

So I don't see the grounds for recommending the SPT-ONLY mode as the default.  The choice between SPT-ONLY mode and RPT-SPT mode depends on many factors and requires knowledge of (a) a particular tenant's deployment scenario, and (b) if MVPN interworking is being done, the mode that is being used by the MVPN nodes.

Kesavan>> Using SPT-ONLY mode has advantages over RPT-SPT mode in an EVPN-only fabric; hence it is recommended as the default.  BTW, the solution supports RPT-SPT mode as well, which can be used when interoperating with an existing MVPN network that uses RPT-SPT mode.