Re: [bess] draft-mohanty-bess-evpn-bum-opt-00 - clarification on problem description

"Ali Sajassi (sajassi)" <sajassi@cisco.com> Sat, 24 March 2018 17:13 UTC

From: "Ali Sajassi (sajassi)" <sajassi@cisco.com>
To: "Satya Mohanty (satyamoh)" <satyamoh@cisco.com>, Sandy Breeze <sandy.breeze@eu.clara.net>, "bess@ietf.org" <bess@ietf.org>
CC: "Ali Sajassi (sajassi)" <sajassi@cisco.com>
Thread-Topic: [bess] draft-mohanty-bess-evpn-bum-opt-00 - clarification on problem description
Thread-Index: AQHTwUI3pKI1nWgbp0ukIa0wdFUT3aPbhOcA///87ICAAI6wgIADJRgA
Date: Sat, 24 Mar 2018 17:13:50 +0000
Message-ID: <196DA827-7AD4-4366-AAF1-C065914470C6@cisco.com>
References: <ACCB9010-6A78-42E6-BA47-372E9E4F3002@cisco.com> <A1D7C338-C665-40A7-B124-378695DE949D@cisco.com> <783960F6-3EAA-4EC5-BB9F-72138ECCB9F4@cisco.com> <DD65A92C-9429-4D8E-A215-859928D4741F@cisco.com>
In-Reply-To: <DD65A92C-9429-4D8E-A215-859928D4741F@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/XsAQpyJ0XBHHg5MvR7M49KcZTbU>
Subject: Re: [bess] draft-mohanty-bess-evpn-bum-opt-00 - clarification on problem description

Hi Satya,

Wrt BW utilization in the core, in EVPN we have always tried to be cognizant of it and to handle it in the best way possible. That’s why we have:

  1.  ARP suppression – to suppress flooding of ARP messages over the core
  2.  Capability to turn off flooding in the core altogether
  3.  IGMP suppression – to suppress flooding of IGMP messages via IGMP proxy
  4.  Avoid unnecessary transmission of multicast streams – send mcast flows only to the PEs that have receivers

The best solution for your use case is (option-A):

  1.  Enable DF election on per mcast flow – gives the best load-balancing for DF election among multicast flows and avoids FAT VLAN issue
  2.  Enable IGMP proxy and SMET route – avoid unnecessary transmission of mcast flows over core

The 2nd best solution for your use case is (option-B):

  1.  Enable DF election per VLAN and turn off factoring the ESI into the HRW hash
  2.  Enable IGMP proxy and SMET route – avoid unnecessary transmission of mcast flows over core

The 3rd solution for your use case is (option-C):

  1.  Enable DF election per VLAN and turn off factoring the ESI into the HRW hash
  2.  Stop sending IMET route for non-DF BD

I guess you can guess which one I am in favor of ☺

Cheers,
Ali


From: "Satya Mohanty (satyamoh)" <satyamoh@cisco.com>
Date: Wednesday, March 21, 2018 at 11:35 PM
To: Cisco Employee <sajassi@cisco.com>, Sandy Breeze <sandy.breeze@eu.clara.net>, "bess@ietf.org" <bess@ietf.org>
Subject: Re: [bess] draft-mohanty-bess-evpn-bum-opt-00 - clarification on problem description

Hi Ali,

Your suggestion of the default/non-default mode will really help here.
Let me explain with reference to #2 that you posted in the email-chain (and I quote below) because I think that captures the reasoning well.

"Even for multi-homing with >2 PEs in the redundancy group, the chances of a PE not becoming a DF across all ES's in a BD is extremely low. We need to keep in mind that number of ES's are much larger than number of PEs !! And HRW algorithm in our df-framework draft takes into account the ES-id in its hash algorithm which means for the same BD, different PEs can become DF for different ES's !!”

The above is very true for the general case. No doubt about it.
However, there may be a case for not spreading the DF in special topologies and for specific reasons.

Consider the topology below.

ES1 - - - - - - - - - PE1
                |
ES2 - - - - - - - - - PE2
                |
ES3  - - - - - - - - - PE3

Here all the PEs host all the ESes.
That is, ES1, ES2 and ES3 are each connected to all of PE1, PE2 and PE3, and all of them have the exact same set of vlans configured. By setting the ESI to zero in the HRW hash function, the carving of the vlans is the same for each ES; it is independent of the ESI, similar to the base modulo (ESI-oblivious) algorithm.

  1.  When we combine the above with the proposed IMET suppression approach, we save bandwidth in the core, because the same PE is the DF for an EVI (vlan) regardless of the ESI (no spraying of the DF for the same vlan). Only this PE then attracts the BUM traffic.
  2.  As you know, HRW ranks the PEs per vlan. When the DF goes down, the new DF for a vlan is still a unique one, regardless of the ES. This may help give the user “some control” over the DF. This is in fact a matter of independent interest, separate from the current discussion. It performs similarly to the base-modulo DF election as far as carving is concerned and still has the benefits of HRW when a PE goes down or comes up, in the sense that it minimizes the number of (vlan, DF) re-mappings.
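
The re-mapping claim in point 2 can be illustrated with a rough Python sketch. Everything here is an assumption made for illustration: `weight` is a stand-in SHA-256 hash rather than the weight function of the DF election framework draft, the ESI is left out (as in the ESI=0 case), the modulo election sorts addresses as strings, and the PE addresses and vlan range are made up.

```python
import hashlib

def weight(vlan, pe_ip):
    # Stand-in hash over (vlan, PE IP); the ESI is omitted, as with ESI=0.
    data = f"{vlan}|{pe_ip}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def hrw_df(vlan, pes):
    # Highest Random Weight: the PE with the largest weight is the DF.
    return max(pes, key=lambda pe: weight(vlan, pe))

def modulo_df(vlan, pes):
    # Base modulo election: index into the ordered PE list by vlan mod N.
    ordered = sorted(pes)
    return ordered[vlan % len(ordered)]

def remapped(elect, vlans, before, after):
    # Vlans whose DF differs between the two PE sets.
    return [v for v in vlans if elect(v, before) != elect(v, after)]

pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # hypothetical PE addresses
failed = "192.0.2.3"
survivors = [pe for pe in pes if pe != failed]
vlans = range(1, 101)

hrw_moved = remapped(hrw_df, vlans, pes, survivors)
mod_moved = remapped(modulo_df, vlans, pes, survivors)
print("HRW re-mappings:   ", len(hrw_moved))
print("modulo re-mappings:", len(mod_moved))
```

With HRW, the only vlans that move are those whose DF was the failed PE; with the modulo election, vlans whose DF is still up can move as well, because the divisor changes.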

At least, that was my thinking when we had internal discussions with Sandy regarding IMET suppression.
So, as you can see, the main goal in (1) is to restrict the BUM flooding rather than to distribute the PE access-side utilization uniformly, which is point #2 that you mentioned in your email.

I will let Sandy comment on whether he can have two ESes in his BD on the EVPN GW. Right now he has one.
Then this will be more useful for his case.
Once he is here, we can discuss with him.

Thanks,
—Satya


From: "Ali Sajassi (sajassi)" <sajassi@cisco.com>
Date: Thursday, March 22, 2018 at 5:05 AM
To: satyamoh <satyamoh@cisco.com>, Sandy Breeze <sandy.breeze@eu.clara.net>, "bess@ietf.org" <bess@ietf.org>
Subject: Re: [bess] draft-mohanty-bess-evpn-bum-opt-00 - clarification on problem description

Hi Satya,

As I mentioned earlier in another thread among the co-authors, we need to have a default mode of operation, and for the default mode the ESI MUST be factored into the hash function (e.g., (vlan, ESI, PE’s IP address)). Also, as discussed earlier, we can capture in the draft the “option” of not factoring in the ESI (ESI=0 in the hash algorithm) – i.e., PEs in a redundancy group MAY all be configured to set ESI=0 in the hash algorithm.

BTW, do you have certain scenarios/use-cases in mind for setting ESI=0?

Cheers,
Ali

From: "Satya Mohanty (satyamoh)" <satyamoh@cisco.com>
Date: Wednesday, March 21, 2018 at 3:16 PM
To: Cisco Employee <sajassi@cisco.com>, Sandy Breeze <sandy.breeze@eu.clara.net>, "bess@ietf.org" <bess@ietf.org>
Subject: Re: [bess] draft-mohanty-bess-evpn-bum-opt-00 - clarification on problem description

We will take the feedback and revise the next version with the EVPN GW case as the primary use case.
Also, we will make it informational.

I need to mention again what I said at the mic because I think it may not have been clear to everyone.
In the DF election framework draft, the weight is now a function of the tuple (vlan, Esid, PE’s IP).
If we set the Esid to 0, then as long as each ES has the exact same set of vlans, the carving of vlans by the algorithm is the same for every ES.
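
A minimal Python sketch of the point above, under stated assumptions: `hrw_weight` is a stand-in SHA-256 hash over (vlan, ESI, PE IP), not the weight function from the DF election framework draft, and the PE addresses, ESI values, and the `use_esi` switch are hypothetical. It only illustrates that once the ESI is replaced by zero, every ES elects the same DF for a given vlan.

```python
import hashlib

ZERO_ESI = bytes(10)  # an ESI is 10 octets; all zeros models "ESI=0"

def hrw_weight(vlan, esi, pe_ip):
    # Stand-in weight: hash of (vlan, ESI, PE IP). Not the draft's function.
    data = vlan.to_bytes(4, "big") + esi + pe_ip.encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def elect_df(vlan, esi, pes, use_esi=True):
    # The PE with the highest weight for this (vlan, ES) is the DF.
    key = esi if use_esi else ZERO_ESI
    return max(pes, key=lambda pe: hrw_weight(vlan, key, pe))

pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]          # hypothetical PEs
esis = [bytes([0] * 9 + [n]) for n in (1, 2, 3)]        # hypothetical ES1..ES3

for vlan in (10, 20, 30):
    default_mode = [elect_df(vlan, esi, pes) for esi in esis]
    esi_zero = [elect_df(vlan, esi, pes, use_esi=False) for esi in esis]
    print(vlan, "default:", default_mode, "ESI=0:", esi_zero)
```

In the default mode the three ESes may or may not agree on a DF for a vlan; with `use_esi=False` the hash input no longer depends on the ES, so the three elections always agree.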

Thanks,
—Satya

From: BESS <bess-bounces@ietf.org> on behalf of "Ali Sajassi (sajassi)" <sajassi@cisco.com>
Date: Wednesday, March 21, 2018 at 6:27 PM
To: Sandy Breeze <sandy.breeze@eu.clara.net>, "bess@ietf.org" <bess@ietf.org>
Subject: Re: [bess] draft-mohanty-bess-evpn-bum-opt-00 - clarification on problem description

Hi Sandy,

The key point here is that the proposal is intended for EVPN GWs (and not PEs). By talking about PEs and NVEs at BESS yesterday, a lot of people got confused. Although this proposal makes better sense for EVPN GWs, for EVPN PEs it doesn’t make much sense because:

  1.  The vast majority (if not all) of TOR/PE multi-homing is dual-homing, which gives us zero benefit
  2.  Even for multi-homing with >2 PEs in the redundancy group, the chances of a PE not becoming a DF across all ES's in a BD is extremely low. We need to keep in mind that number of ES's are much larger than number of PEs !! And HRW algorithm in our df-framework draft takes into account the ES-id in its hash algorithm which means for the same BD, different PEs can become DF for different ES's !!
  3.  As soon as there is a stub node (e.g., a single-homed CE) connected to any PE, all bets are off and that PE needs to send an IMET route and receive mcast traffic
  4.  As soon as there is a link/ES failure, we end up with (3) above for the dual-homing scenario, and the PE with the active link needs to send an IMET route and receive mcast traffic
  5.  For mcast flows (*,G) or (S,G), the solution described in the igmp-proxy draft is the most optimal
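
Point 2 can be made concrete with a back-of-the-envelope calculation in Python. It assumes (a simplification, not something stated in the df-framework draft) that the hash picks the DF for each ES independently and uniformly among the k PEs of the redundancy group; under that assumption the probability that a given PE is DF for none of the n ESes in a BD is (1 - 1/k)^n. The function name and the sample values are illustrative.

```python
def prob_never_df(num_pes, num_es):
    # Probability that one given PE is DF for none of the ESes in a BD,
    # assuming each ES picks its DF independently and uniformly.
    return (1 - 1 / num_pes) ** num_es

for num_es in (1, 5, 20, 50):
    print(f"3 PEs, {num_es:2d} ESes: {prob_never_df(3, num_es):.6f}")
```

The probability shrinks geometrically with the number of ESes, which is why IMET suppression rarely applies to a PE once the number of ESes is much larger than the number of PEs.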

So, I would suggest to do the following:

  1.  In the problem statement of the draft, capture the below use case clearly.
  2.  Change the name of the draft to “bum optimization for EVPN gateways”
  3.  Capture briefly why the proposal is not intended for EVPN PEs/NVEs because of the above reasons.

Cheers,
Ali

From: BESS <bess-bounces@ietf.org> on behalf of Sandy Breeze <sandy.breeze@eu.clara.net>
Date: Wednesday, March 21, 2018 at 8:58 AM
To: "bess@ietf.org" <bess@ietf.org>
Subject: [bess] draft-mohanty-bess-evpn-bum-opt-00 - clarification on problem description

After some discussion, we acknowledge the problem description needs further clarification so that this does not come across as too specific a use case. Consider the following example from our existing live deployments:

[Attached image: image001.png]


The main points to articulate here are:

  *   PE[1..4] are at the boundary of an EVPN/MPLS domain (core side) and an EVPN/VXLAN domain (datacentre fabric side)
  *   They are responsible for L2VNI VTEP from ToR and MPLS L2VPN in core.
  *   From their point of view, 1 BD = 1 L2VNI (=1 ES).
  *   For any given DF type (modulo/HRW/etc.) they distribute DFs per ES between them.
  *   Therefore, all non-DF PEs attract BUM for ESes they’re not allowed to forward on, hence the waste of bandwidth in the EVPN core and of cycles.
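
The waste described in the last bullet can be counted with a small Python sketch. It assumes ingress replication in the core, so a remote PE sends one copy of each BUM frame to every gateway that advertised an IMET route for the BD; the PE names and the `bum_copies` helper are hypothetical.

```python
def bum_copies(imet_advertisers, df):
    # Returns (copies sent across the core, copies dropped by non-DF PEs)
    # for one BUM frame in one BD with a single ES.
    delivered = len(imet_advertisers)
    useful = 1 if df in imet_advertisers else 0
    return delivered, delivered - useful

gateways = {"PE1", "PE2", "PE3", "PE4"}

# Today: every gateway advertises IMET, only the DF forwards.
print(bum_copies(gateways, df="PE1"))
# With IMET suppression on the non-DF gateways: only the DF attracts BUM.
print(bum_copies({"PE1"}, df="PE1"))
```

Under these assumptions, three of the four copies of each BUM frame cross the core only to be dropped, and suppressing IMET on the non-DF gateways removes them.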

In our case, the solution we propose works very well. We also showed yesterday that this does no harm for the more typical use case of EVPN multihoming at the PE, and that held up to technical scrutiny.

Sandy