Re: [bess] Comments on draft-sajassi-bess-evpn-mvpn-seamless-interop

"Ali Sajassi (sajassi)" <sajassi@cisco.com> Thu, 28 June 2018 21:23 UTC

From: "Ali Sajassi (sajassi)" <sajassi@cisco.com>
To: Eric C Rosen <erosen@juniper.net>, Bess WG <bess@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/MBWlZGeqsOk3l1O0YxMWC5OlHjA>

Eric,

Please see my responses to your comments inline marked w/ "Ali>"

On 11/9/17, 9:42 AM, "BESS on behalf of Eric C Rosen" <bess-bounces@ietf.org on behalf of erosen@juniper.net> wrote:

I have a number of comments on
draft-sajassi-bess-evpn-mvpn-seamless-interop.

1. It seems that the proposal does not do correct ethernet emulation.
Intra-subnet multicast only sometimes preserves MAC SA and IP TTL,
sometimes not, depending upon the topology. TTL handling for
inter-subnet multicast seems inconsistent as well, depending upon the
topology. The proposal exposes the operator's internal network
structure to the user, and will cause "LAN-only" applications to break.
These concerns are acknowledged, then quickly dismissed based on wishful
thinking. (In my experience, wishful thinking doesn't work out very
well in routing.)

Ali> EVPN doesn't provide LAN service per IEEE 802.1Q but rather an emulation of LAN service. This document defines what that emulation means for intra-subnet and inter-subnet IP multicast traffic. I added section 5.1 to expand on that. BTW, TTL handling for inter-subnet IP multicast traffic is done consistently!

2. In order to do inter-subnet multicast in EVPN, the proposal requires
L3VPN/MVPN configuration on ALL the EVPN PEs. This is required even
when there is no need for MVPN/EVPN interworking. This is portrayed as a
"low provisioning" solution!

Ali> Using MVPN constructs doesn't require additional configuration on EVPN PEs beyond the multicast configuration needed for IRB-mcast operation.

3. The draft claims that the exact same control plane should be used for
EVPN and MVPN, despite the fact that MVPN's control plane is unaware of
certain information that is very important in EVPN (e.g., EVIs,
TagIDs). (This is largely responsible for point 1 above.) This is
claimed to be a way of providing a "uniform solution". As we examine
the problems that arise, perhaps this will be seen as more a case of
"pounding square pegs into round holes". When interworking between two
domains, generally one gets a more flexible and robust scheme by
maintaining clean interfaces and having well-defined points of
attachment, not by entangling the internal protocols of one domain with
the internal protocols of the other.

Ali> IP multicast as described in the draft is done at the tenant level (IP-VRF), not at the BD level! So BD-level info such as TagIDs is not relevant.

4. The draft proposes to use the same tunnels for MVPN and EVPN, i.e.,
to have tunnels that traverse both the MVPN and the EVPN domains.
Various "requirements" are stated that seem to require this solution.
Somewhere along the line it was realized that this requirement cannot be
met if MVPN and EVPN do not use the same tunnel types. So for this very
common scenario, a completely different solution is proposed, that (a)
tries to keep the EVPN control plane out of the MVPN domain, and vice
versa, and (b) uses different tunnels in the two domains. Perhaps the
"requirements" that suggest using a single cross-domain tunnel are not
really requirements! And why would we want different solutions for
different deployment scenarios? Yes, the solution needs to handle all
the use cases, but we don't want to look at the use cases one at a time
and design a different solution for each one.

Ali> There are SPDCs with an MPLS underlay and there are SPDCs with a VxLAN underlay. We need a solution that is optimal for both, in the same way that we need both ASBRs and GWs to optimize connectivity for inter-AS scenarios.

While the authors have realized that one cannot have cross-domain
tunnels when EVPN uses VxLAN and MVPN uses MPLS, they do not seem to
have acknowledged the multitude of other scenarios in which cross-domain
tunnels cannot be used. For instance, MVPN may be using mLDP, while
EVPN is using IR. Or MVPN may be using RSVP-TE P2MP while EVPN is using
AR. Etc., etc. I suspect that "different tunnel types" will be the
common case, especially when trying to interwork existing MVPN and EVPN
deployments.

Ali> This will be captured in the next rev., and that's why both GWs and ASBRs are needed.

The inability to use EVPN-specific tunnels also causes a number of
specific problems when attempting to interwork with MVPN; these will be
examined below.

5. A number of the draft's stated "requirements" seem to be entirely bogus.

a. In some cases, the "requirements" for optimality in one or another
respect (e.g., routing, replication) are really only considerations that
an operator should be able to trade off against other considerations.
The real requirement is to be able to create a deployment scenario in
which such optimality is achievable. Other deployment scenarios, that
optimize for other considerations, should not be prohibited.

Ali> What deployment scenarios do you think are prohibited ?

b. Many of the "requirements" are applied very selectively, e.g., the
"requirement" for MVPN and EVPN to use the same set of multicast
tunnels, and the requirement for there to be no "gateways".

Ali> That has been explained in the context of SPDC.

6. The gateway-based proposal for interworking MVPN and EVPN when they
use different tunnel types is severely underspecified.

Ali> Agreed. This will be covered in the subsequent revisions.

One possible approach to this would be to have a single MVPN domain that
includes the EVPN PEs, and to use MVPN tunnel segmentation at the
boundary. While that is a complicated solution, at least it is known to
work. However, that does not seem to be what is being proposed.

Ali> It is not clear to me exactly what you are suggesting here. At the boundary, is there any mcast address lookup or not?

Another approach would be to set up two independent MVPN domains and
carefully assign RTs to ensure that routes are not leaked from one
domain to another. One would also have to ensure that the boundary
points send the proper set of routes into the "other" domain. (This
includes the unicast routes as well as the multicast routes.) And one
would have to include a whole bunch of applicability restrictions, such
as "don't use the same RR to hold routes of both domains". I think
that's what's being proposed, but there isn't enough discussion of RT
and RD management to be sure, and there isn't much discussion of what
information the boundary points send into each domain.

Ali> I will expand on that with the RD and RT management aspects. The intention is a single MVPN domain in which both EVPN and MVPN PEs participate.

7. The proposal requires that EVPN export a host route to MVPN for each
EVPN-attached multicast source. It's a good thing that there is no
requirement like "do not burden existing MVPN deployments with a whole
bunch of additional host routes". Wait a minute, maybe there is such a
requirement.

Ali> :-)

In fact, whether the host routes are necessary to achieve optimal
routing depends on the topology. And this is a case where an operator
might well want to sacrifice some routing optimality to reduce the
routing burden on the MVPN nodes.

Ali> If there is mobility, then there is host route advertisement :-) If there is no mobility, then prefixes can be advertised.

8. The proposal simply does not work when MVPN receivers are interested
in multicast flows from EVPN sources that are attached to all-active
multi-homed ethernet segments.

Ali> This issue has been addressed in the new revision.

This issue is worth examining in detail.

Suppose EVPN-PE1 and EVPN-PE2 are both attached to the same ethernet
segment, using all-active multi-homing. Suppose there is a multicast
source S on that segment. In such a case, (S,G1) traffic might arrive
at PE1, while (S,G2) traffic might arrive at PE2. (Which PE gets a
particular flow from S depends on LAG hashing algorithms over which we
have no control.) Now suppose that an MVPN PE, say PE3, needs to
receive (S,G1) traffic.

MVPN requires PE3 to select the "Upstream PE" for the (S,G1) traffic.
PE3 does this by looking at the VRF Route Import EC on its best route to S.

In order to receive the (S,G1) traffic, PE3 must select PE1, rather than
PE2, as the Upstream PE. However, there is absolutely nothing in the
MVPN specs or in this document to ensure that PE3 selects PE1 rather
than PE2. Generally, an MVPN node will select PE2 if it is closer to PE2.
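
The selection problem described above can be illustrated with a small sketch. This is only a toy model, not MVPN code: the PE names come from the example, while the IGP costs, the flow placement table, and the function names are assumptions invented for illustration. It assumes the egress PE simply picks the closest PE that advertises a route to S, as suggested above.

```python
# Toy model of Upstream PE selection; all values are illustrative assumptions.

# Which EVPN PE actually receives each flow from source S. In practice this
# is decided by LAG hashing on the attached device, outside the PEs' control.
flow_arrives_at = {("S", "G1"): "PE1", ("S", "G2"): "PE2"}

# Hypothetical IGP cost from the MVPN egress PE3 to each EVPN PE that
# advertises a route to S.
igp_cost_from_pe3 = {"PE1": 20, "PE2": 10}


def select_upstream_pe(candidates):
    """Pick the closest PE advertising a route to S (the choice ignores the flow)."""
    return min(candidates, key=lambda pe: (candidates[pe], pe))


def receives_flow(flow):
    """True if PE3's selected Upstream PE is the PE that holds the flow."""
    return select_upstream_pe(igp_cost_from_pe3) == flow_arrives_at[flow]


for flow in flow_arrives_at:
    print(flow, "received" if receives_flow(flow) else "black-holed")
```

With these assumed costs, PE3 selects PE2 for every flow from S, so (S,G2) is received while (S,G1) is black-holed.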

Perhaps the authors are under the impression that MVPN Source Active A-D
routes can be used to solve this problem. That is not so. Vanilla MVPN
nodes do not generally base their selection of the Upstream PE for (S,G)
on the SA A-D routes.

Let me explain a little about the way SA A-D routes are used. There are
two different MVPN "modes" that affect the use of SA A-D routes.

In one mode (sometimes known as 'rpt-spt' mode, and described in Section
13 of RFC 6514), an SA A-D route for (S,G) is originated by a PE when
that PE receives a C-multicast route for (S,G).

In another mode (sometimes known as 'spt-only' mode, and described in
Section 14 of RFC 6514), an SA A-D route for (S,G) is originated by a PE
when that PE receives a PIM Register message for (S,G), or when that PE
receives an MSDP SA message for (S,G). Note that in this mode, the PE
originating the SA A-D route is not necessarily the best (or even a
good) ingress PE for the flow.

- In both modes, if an egress PE receives a PIM Join (S,G) from a CE,
its choice of ingress PE is never impacted by the SA A-D routes. Note
that CEs send PIM Join(S,G) messages for both ASM and SSM groups.

- In spt-only mode, the SA A-D routes are used to discover sources, but
not to select the ingress PE. (The selected ingress PE is not
necessarily the one originating the SA A-D route.)

- The choice of ingress PE is impacted by the SA A-D routes for (S,G)
only when (a) rpt-spt mode is being used, (b) the egress PE has received
a PIM Join (*,G) from a CE, and (c) the egress PE has not received a PIM
Join (S,G) from a CE. This is typically just a transient state, as the
CE will generally emit a PIM Join(S,G) as soon as it sees any (S,G) traffic.
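
The three bullets above amount to a single condition, which can be written out as a minimal sketch. The names `Mode` and `sa_ad_affects_ingress_choice` are hypothetical and do not come from RFC 6514 or any implementation; the function only restates the rule as described in this message.

```python
from enum import Enum
from itertools import product


class Mode(Enum):
    RPT_SPT = "rpt-spt"    # described in Section 13 of RFC 6514
    SPT_ONLY = "spt-only"  # described in Section 14 of RFC 6514


def sa_ad_affects_ingress_choice(mode, has_star_g_join, has_s_g_join):
    """Return True only in the one case where SA A-D routes for (S,G)
    influence the egress PE's choice of ingress PE: rpt-spt mode, a PIM
    Join(*,G) received from a CE, and no PIM Join(S,G) received from a CE."""
    return mode is Mode.RPT_SPT and has_star_g_join and not has_s_g_join


# Enumerate every combination to show how narrow the case is.
for mode, star_g, s_g in product(Mode, (False, True), (False, True)):
    print(mode.value, star_g, s_g, sa_ad_affects_ingress_choice(mode, star_g, s_g))
```

Of the eight combinations, only one returns True, and the text above notes that even that state is typically transient.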

Bottom line: if a source is on an EVPN all-active multi-homed segment,
MVPN receivers have no way to select the proper ingress PE. If the
segment is n-way-homed, the MVPN PEs have just a 1/n chance of getting
the traffic.
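
The 1/n figure can be checked with a short enumeration. This sketch rests on two assumptions that are not stated explicitly above: LAG hashing places a flow on each of the n attached PEs with equal likelihood, and the egress PE's selection is fixed and independent of where the flow lands. The function name `delivery_fraction` is hypothetical.

```python
from fractions import Fraction


def delivery_fraction(n):
    """Fraction of equally likely flow placements for which the egress PE's
    fixed choice of ingress PE matches the PE that holds the flow."""
    selected = 0  # the egress PE's flow-independent choice (assumed fixed)
    hits = sum(1 for arrival in range(n) if arrival == selected)
    return Fraction(hits, n)


for n in (2, 3, 4):
    print(n, delivery_fraction(n))
```

Under these assumptions a dual-homed segment delivers half of the flows and a 4-way-homed segment delivers a quarter.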

Of course, this problem could be eliminated if EVPN and MVPN didn't have
to use the same tunnels. In that case, if an MVPN node selects the
wrong ingress PE, the selected PE could obtain the traffic from the real
ingress PE, and then relay it to the MVPN node. This might result in
sub-optimal routing, but that's better than a black hole!

Perhaps the gateway-based solution needs to be used whenever there is
all-active multi-homing? ;-)

One could imagine modifying the MVPN installed base so that the SA A-D
routes play more of a role in selecting the Upstream PE. However, I
believe the requirement is to allow MVPN/EVPN interworking without
modifying the existing MVPN nodes.

9. In the case where all the multicast sources for a given group are
attached via EVPN, there is a very simple procedure for providing
Join(*,G) functionality. This procedure makes use of EVPN-specific
knowledge. Since the MVPN protocols cannot take advantage of the
EVPN-specific knowledge, a more complicated procedure is needed when
only MVPN protocols are used. This is explained further in the in-line
comments.

10. Most of the problems above are the result of (a) trying to use the
exact same control plane for both MVPN and EVPN, and (b) treating the
case where both domains use the same tunnel type as the design center.
It would be better to keep clean interfaces between EVPN and MVPN, with
clearly defined points of attachment. The proposal in
draft-lin-bess-evpn-irb-mcast does this, and thus does not run into the
above problems. That proposal also shows how the "optimal routing"
requirements can be met, and how they can be traded off against other
considerations. (In fairness, it must be acknowledged that both
proposals are still works in progress. It's also worth noting that the
two proposals have a lot in common.)

Ali> The proposal in evpn-irb-mcast is not ruled out.

A number of additional comments can be found in-line in the attachment.
(I realize that some of them are repetitive, sorry.) Look for lines
beginning "****". The above comments are also repeated at the front of
the attachment.

Ali> I will go over your additional comments and address them separately.

Cheers,
Ali
