Re: [bess] FW: WG Last Call (including implem status & shepherd) for draft-ietf-bess-evpn-vpws-03

Sami Boutros <sboutros@vmware.com> Wed, 18 May 2016 17:01 UTC

From: Sami Boutros <sboutros@vmware.com>
To: "Jeffrey (Zhaohui) Zhang" <zzhang@juniper.net>
Thread-Topic: [bess] FW: WG Last Call (including implem status & shepherd) for draft-ietf-bess-evpn-vpws-03
Date: Wed, 18 May 2016 17:01:46 +0000
Message-ID: <67A01B47-48C4-4899-9D87-F64E379DA174@vmware.com>
References: <572B2655.3020605@alcatel-lucent.com> <D3095B2F-072B-42CA-B160-DB4888DA02A7@alcatel-lucent.com> <57303CEE.8040104@orange.com> <7A432DD5-E28B-4670-B53E-2137A0A6E445@alcatel-lucent.com> <57309395.8090408@orange.com> <SN1PR0501MB1709149BA36C410EF7E46E52C7700@SN1PR0501MB1709.namprd05.prod.outlook.com> <SN1PR0501MB1709DD24B90B52F014AA58CFC7700@SN1PR0501MB1709.namprd05.prod.outlook.com> <BLUPR0501MB17151371F2D0A16368B63FFAD4720@BLUPR0501MB1715.namprd05.prod.outlook.com> <CAFKBPj58Q_cDi3GhAtgCQ0XfAaYtdNeCzzuWN3eJURc47y5sWg@mail.gmail.com> <0D4E38AC-C11C-4DB9-8D61-8778FA2852B4@vmware.com> <BLUPR0501MB1715DB345CF2E66870572148D4770@BLUPR0501MB1715.namprd05.prod.outlook.com>
In-Reply-To: <BLUPR0501MB1715DB345CF2E66870572148D4770@BLUPR0501MB1715.namprd05.prod.outlook.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/bess/5qVX80t45YTd05Xhr4daG_P4TGM>
Cc: "bess@ietf.org" <bess@ietf.org>
Subject: Re: [bess] FW: WG Last Call (including implem status & shepherd) for draft-ietf-bess-evpn-vpws-03
List-Id: BGP-Enabled ServiceS working group discussion list <bess.ietf.org>

Hi Jeffrey,

Please see my responses inline, marked “Sami:” (in green in the original message).

From: "Jeffrey (Zhaohui) Zhang" <zzhang@juniper.net>
Date: Monday, May 16, 2016 at 2:18 PM
To: Sami Boutros <sboutros@vmware.com>
Cc: "bess@ietf.org" <bess@ietf.org>
Subject: RE: [bess] FW: WG Last Call (including implem status & shepherd) for draft-ietf-bess-evpn-vpws-03

Hi Sami,

Please see zzh> below. I trimmed some text.


   ... eliminates the
   need for single-segment and multi-segment PW signaling,

This is not discussed later in the draft. It should either be discussed or removed from the abstract and introduction.
[Sami]
We do have Section 5 comparing PW signaling and EVPN. Actually, the draft proposes a mechanism that eliminates PW signaling for P2P services using EVPN, so I am not sure how we can remove this.

Zzh> It’s the mention of “single-segment and multi-segment” that led to my comment. If that were not there, it would not have been an issue. For example, I had expected some text on how a multi-segment PW is realized in EVPN-VPWS. If you don’t plan to talk about that, then just don’t bring it up; for example, simply say “eliminates the need for the traditional way of PW signaling”.

   ... and provides
   fast protection using data-plane prefix independent convergence upon
   node or link failure.

This is not discussed in the draft either. Can you elaborate? Is it really enabled by EVPN or actually independent of EVPN?

[Sami]
Please have a look at section 5.
[Sami]

Zzh> When it comes to PIC, the closest thing I can find in section 5 is the following:

   Upon link or node failure, EVPN can trigger failover with the
   withdrawal of a single BGP route per EVPL service or multiple EVPL
   services, whereas with VPWS PW redundancy, the failover sequence
   requires exchange of two control plane messages: one message to
   deactivate the group of primary PWs and a second message to activate
   the group of backup PWs associated with the access link. Finally,
   EVPN may employ data plane local repair mechanisms not available in
   VPWS.

The first part (before “Finally”) talks about the two control plane messages, so that’s not “data-plane” (the text I had a question about talks about “data-plane prefix independent convergence”). The second part talks about “data plane local repair” but has none of the details I was looking for.

Sami:
The local repair here is done by the primary PE: upon local AC failure, it directs the traffic to the backup PE using the label that the backup PE advertised in its per-EVI Ethernet A-D route. I will add this text to clarify; is that OK?
Sami:
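A minimal Python sketch of the local repair described above, for illustration only: the class, field names, AC name and label values are hypothetical, and it assumes the backup PE forwards traffic that arrives with its advertised per-EVI A-D label toward the CE.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VpwsForwardingEntry:
    """Hypothetical primary-PE state for one VPWS instance on a multi-homed ES."""
    local_ac: str                 # local attachment circuit (illustrative name)
    ac_up: bool                   # operational state of that AC
    backup_pe: Optional[str]      # backup PE attached to the same ES
    backup_label: Optional[int]   # label from the backup PE's per-EVI Ethernet A-D route


def forward_from_core(entry: VpwsForwardingEntry) -> Tuple:
    """Choose where the primary PE sends a frame it receives from the core."""
    if entry.ac_up:
        # Normal case: hand the frame to the local AC.
        return ("ac", entry.local_ac)
    if entry.backup_pe is not None and entry.backup_label is not None:
        # Local repair: send the frame to the backup PE with its per-EVI A-D label,
        # while the per-ES route withdrawal propagates to the remote PEs.
        return ("mpls", entry.backup_pe, entry.backup_label)
    # Neither the AC nor a backup path is available: drop.
    return ("drop",)
```

The point of the sketch is that the switchover happens on the primary PE's own forwarding state, before any remote PE has processed a withdrawal.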

It seems that the term Ethernet Segment is not used consistently. From RFC 7432:

   Ethernet Segment (ES): When a customer site (device or network) is
      connected to one or more PEs via a set of Ethernet links, then
      that set of links is referred to as an 'Ethernet segment'.

My understanding is that an ES is at the link/port level and refers to a set of links, while an AC could be at the VLAN level. In the paragraph below,

   [EVPN] has the ability to forward customer traffic to/from a given
   customer Attachment Circuit (AC), aka Ethernet Segment in EVPN
   terminology, without any MAC lookup.

Here we're referring to the AC as an ES, which does not seem rigorous.
It's better to remove "aka Ethernet Segment in EVPN terminology" to avoid inconsistency.

Sami: Sure will remove.

[Sami]

There is no inconsistency; the text you refer to in [7432] is part of the explanation of the ESI (Ethernet Segment Identifier).
On page 31, [7432] mentions exactly what we are referring to above: “On the other hand, a unique label per <ESI, Ethernet tag> allows an egress PE to forward a packet that it receives from another PE, to the connected CE, after looking up only the MPLS labels without having to perform a MAC lookup.”

Zzh> My point is about the relationship between ES and AC/link/port (see below) and not about whether mac lookup is done or not.

[Sami]

   ... [MEF] defines Ethernet
   Virtual Private Line (EVPL) service as p2p service between a pair of
   ACs (designated by VLANs) and Ethernet Private Line (EPL) service, in
   which all traffic flows are between a single pair of ESes.

Perhaps change "ESes" to "links or ESes"? The definition of ES in RFC7432 is "set of links". See below about generalization.

[Sami]

From one PE's point of view, an ES is a single link; [7432] does not deny this.

Zzh> My original point was that RFC 7432 defines an ES as a “set of links” for the multi-homing case, so “between a single pair of ESes” here may be a little misleading. I am not going to split hairs on this, but please see more below to understand my intention and a simple fix (again, it’s up to you and I am not insisting).

[Sami]

   ... EVPL can
   be considered as a VPWS with only two ACs.

Both EVPL and EPL can be considered as VPWS with only two ACs (I assume an AC is not necessarily at vlan level).

   ... In a VPWS service,  the traffic from an
   originating Ethernet Segment can be forwarded only to a single
   destination Ethernet Segment; hence, no MAC lookup is needed and the
   MPLS label associated with the per-EVI Ethernet AD route can be used
   in forwarding user traffic to the destination AC.

I can understand that ES here is generalized, but it's better to point that out at the beginning.

Zzh> So ES here is loosened to refer to both the multi-homing and the single-homing case (again, this has nothing to do with whether a MAC lookup is done). A search in RFC 7432 reveals that 7432 also uses it in a somewhat loose manner (e.g., section 6.1). I am not going to split hairs, but given that you have a terminology section for terms like MPLS/MAC/EVI, you might as well add ES there and point out that it may be used for the single-homing case as well. BTW, it would be better to have the terminology section before (or right at the beginning of) the introduction; currently it’s after the introduction paragraphs.

Sami: Will add to the terminology section that an ES on a PE refers to the link attached to it; this link can be part of a set of links attached to different PEs in multi-homed cases, or a single link in single-homed cases. How does that sound?
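To make the “no MAC lookup” point concrete, a hypothetical Python sketch of the disposition PE: each label it advertised in a per-EVI Ethernet A-D route maps straight to one local AC. The table contents and function name are illustrative assumptions.

```python
from typing import Dict, Optional


def dispose(label_to_ac: Dict[int, str], label: int, frame: bytes) -> Optional[str]:
    """Return the AC for a frame arriving with `label`, or None to drop it.

    Only the MPLS label is consulted; the frame's destination MAC is never
    read, because in VPWS each label maps to exactly one local AC.
    """
    return label_to_ac.get(label)


# Hypothetical labels this PE advertised in its per-EVI Ethernet A-D routes.
LABELS = {3001: "ac-vlan100", 3002: "ac-vlan200"}
```

Whether the AC is one link of a multi-homed ES or a single-homed link makes no difference to this lookup.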


   For a multi-homed CE, in an advertised Ethernet A-D per EVI route the
   ESI field is set to the CE's ESI and the Ethernet Tag field is set to
   the VPWS service instance identifier, which MUST have the same value
   on all PEs attached to that ES.

What if you receive a set of per EVI routes with the same Ethernet Tag field but different ESIs? The spec should define the behavior.

[Sami]
Those will be different routes as per [7432]. We are not defining any new behavior.

Zzh> Since the Ethernet Tag field identifies the service instance and the two ends of the same PW could use different values, we cannot correlate routes using [7432] criteria. However, I will withdraw my original question, since “MUST have the same value” in the above text could refer to both the ESI and Tag fields (originally I thought that it referred to the Tag field only).

Sami: Only the Ethernet Tag field identifies both ends of the PW (the VPWS instance). If one side of the PW is multi-homed, we set the ESI field on that side only.
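A hypothetical Python model of the per-EVI Ethernet A-D route fields discussed in this exchange. It assumes each PE is configured with a local and a remote VPWS service instance identifier (the two ends may differ, as noted above), advertises its local one in the Ethernet Tag field, and sets the ESI to the CE's ESI only when multi-homed. The ESI is the 10-octet field from RFC 7432; all names and values are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

ZERO_ESI = bytes(10)  # ESI is a 10-octet field; all zeros means no ESI (single-homed)


@dataclass(frozen=True)
class PerEviAdRoute:
    originator: str
    esi: bytes          # CE's ESI when multi-homed, otherwise ZERO_ESI
    ethernet_tag: int   # advertising PE's local VPWS service instance identifier
    label: int          # label remote PEs use to reach this AC


def advertise(pe: str, local_service_id: int, label: int,
              ce_esi: Optional[bytes] = None) -> PerEviAdRoute:
    """Build the route a PE sends for one VPWS instance (hypothetical helper)."""
    esi = ce_esi if ce_esi is not None else ZERO_ESI
    if len(esi) != 10:
        raise ValueError("ESI must be 10 octets")
    return PerEviAdRoute(pe, esi, local_service_id, label)


def routes_for_pw(received: List[PerEviAdRoute],
                  remote_service_id: int) -> List[PerEviAdRoute]:
    """Correlate received routes with a local PW by Ethernet Tag alone.

    The ESI does not identify the PW; a non-zero ESI only indicates that the
    far end is multi-homed (possibly several routes, one per attached PE).
    """
    return [r for r in received if r.ethernet_tag == remote_service_id]
```

For example, two PEs multi-homing the same CE for service 100 both yield routes that match a remote PW configured with remote service ID 100, while a single-homed service 200 carries an all-zero ESI.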

[Sami]

   In all cases traffic follows the transport paths, which
   may be asymmetric.

"follows the transport paths" is hard to understand. Perhaps this sentence could be deleted.

[Sami]
Transport paths carry the service traffic over the MPLS transport network; we are simply mentioning this. Those paths can certainly be asymmetric, as per the nature of MPLS LSPs. I am not really sure we should delete this.

Zzh> In all cases (way beyond EVPN), all traffic will always follow the transport path ☺ That’s what was causing trouble for me. My personal habit is to remove text that only adds confusion (I did not see any other purpose for that text in this case), so I said “perhaps delete it”, but I am not going to split hairs here. Consider my comment addressed/withdrawn.

Sami: Ok.

[Sami]


In the following paragraph:

   For EVPL
   service, the Ethernet frames transported over an MPLS/IP network MUST
   remain tagged with the originating VID and any VID translation is
   performed at the disposition PE. For EPL service, the Ethernet frames
   are transported as is and the tags are not altered.

"remain tagged" is a little unclear to me and RFC 7432 does not talk about it either. Is it that incoming tagging is not changed at all (e.g. double tagged) or is it single tagged with normalized VIDs? Is it that for both services, the frames are transported as is across the core, and the tag alteration is only happening at the disposition PE in case of EVPL?

[Sami]

We will change the MUST to a SHOULD to be consistent with the VLAN-based services section (6.1) of [7432].

Zzh> It’s not the MUST that I am having trouble with. It’s the “remain tagged” that is not clear to me (7432 is not clear to me either). As I mentioned in the other exchange with Jorge on this – is it that in both cases the Ethernet frames are transported in the core as is?


If both cases are “transported as is” (in the core – I understand that in EVPL case translation may be done on the egress PE), then it is confusing to me to say “remain tagged” for EVPL but “as is” for EPL. Better use consistent wording.

If “remain tagged” means that traffic in the core is using “normalized VID” (e.g., double tagged incoming packets become single tagged), then the word “remain” is not clear enough.

Sami: Shall we say that, for EVPL, at least one VID should be present on the packet, and that VID manipulation is a local matter?

[Sami]
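A Python sketch of the VID handling proposed just above, assuming that “local matter” means each PE may rewrite tags by local policy as long as EVPL frames keep at least one VID in the core. The function names and the `normalize`/`translate` hooks are hypothetical.

```python
from typing import Callable, Dict, List, Optional

VidStack = List[int]  # outermost VID first


def ingress_evpl(vids: VidStack,
                 normalize: Optional[Callable[[VidStack], VidStack]] = None) -> VidStack:
    """EVPL at the ingress PE: VID manipulation is local, but one VID must remain."""
    out = normalize(vids) if normalize else list(vids)
    if not out:
        raise ValueError("EVPL frame must carry at least one VID across the core")
    return out


def ingress_epl(vids: VidStack) -> VidStack:
    """EPL: the frame is transported as is; its tags (if any) are not altered."""
    return list(vids)


def egress_evpl(vids: VidStack, translate: Dict[int, int]) -> VidStack:
    """EVPL at the disposition PE: optional outer-VID translation, a local matter."""
    out = list(vids)
    out[0] = translate.get(out[0], out[0])
    return out
```

Written this way, “transported as is” (EPL) and “at least one VID present” (EVPL) are separate, explicit invariants, which is the wording consistency Jeffrey asked for.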

   5. Also, multiple EVPL service VLANs on the same trunk could belong
   to the same EVPN instance (EVI), or they could belong to different
   EVIs. This should be purely an administrative choice of the network
   operator.

   6. A given access trunk could have hundreds of EVPL services, and a
   given PE could have thousands of EVPLs configured. It must be
   possible to configure multiple EVPL services within the same EVI.

Aren't the above two the same?
[Sami]
They are not the same: [5] talks about multiple EVPLs on the same trunk interface belonging to different EVIs, and [6] talks about EVPLs on multiple access interfaces belonging to the same EVI.

Zzh> The first sentence of [5] talks about “multiple EVPL service VLANs on the same trunk could belong  to the same EVPN instance (EVI)”. The second sentence talks about them belonging to different EVIs. Both cases (same or different EVIs) are covered.

Zzh> The first sentence of [6] talks about “A given access trunk …”, i.e., the “same trunk interface” covered in [5]. If the intention is to emphasize EVPLs on different access trunks, then perhaps delete the “A given access trunk …” text:

   6. A given PE could have thousands of EVPLs configured. It must be
   possible to configure multiple EVPL services within the same EVI,
   regardless of whether they’re on the same or multiple access interfaces.

Sami: Sure will update the text.

[Sami]

   ... For this service interface, each VLAN is
   presented by a single VID which means no VLAN translation is allowed.

Perhaps change to "all VLANs are represented by ..."?
[Sami]

Not sure I get that, you mean all VLANs are presented by different VIDs?

Zzh> I misunderstood the text earlier. All set on this one.

Sami: Cool.
[Sami]


   ... Finally,
   EVPN may employ data plane local repair mechanisms not available in
   VPWS.

Can you elaborate on the above? What is different from non-EVPN VPWS with respect to local repair?

[Sami]
In EVPN, on failure we can direct traffic to the backup PE using a backup label signaled, PW redundancy doesn’t have that.

Zzh> If I understand it correctly, switching to a backup label (either single-active or all-active) is triggered by the withdraw of the per-ES route. It’s PIC for sure but that does not look like “data plane local repair” to me. Besides, that’s already talked about before the “finally” sentence, so the “finally” sentence would be redundant.

Sami: As mentioned above, the primary PE, upon detecting a local AC failure, can direct the traffic to the backup PE using the label from the backup PE's per-EVI Ethernet A-D route for the same VPWS instance; the primary PE still sends the withdrawal of the per-ES route to the remote PEs.

[Sami]

[Sami]
The draft already explicitly mentions that in single-active mode, traffic will not start flowing until the remote PE receives, from at least one of the PEs in the single-active redundancy group, an Ethernet A-D route with the P bit set. Identifying the backup PE helps the remote PE fail over more quickly. We have spent weeks as co-authors on this section before, and to be quite honest, I am not sure any new text on this will clarify it!

Zzh> The fact that we’re here again shows the need for some clarification. Being someone who does not have perfect understanding of every detail of EVPN (I assume this would be the profile of most people, especially for some developers who may implement this feature for another vendor in the future), my confusion came from the following, and I hope the document will clear that right off the bat:


1.  The fact that the primary PE in the single-active case is determined by DF election is only casually mentioned in RFC 7432, in one place (I missed that before), and not in “8.5.  Designated Forwarder Election”:

14.1.1.  Single-Active Redundancy Mode

   …  It should be noted that the primary PE for a given <ES,
   VLAN> (or <ES, VLAN bundle>) is the DF for that <ES, VLAN> (or <ES,
   VLAN bundle>).

2.  It was not clear that multiple P=1 routes occur only transiently, and it gave me the impression that the receiving router can pick any one of them to send traffic to (there may be heuristics that can lead to a good choice).

3.  The current text does not explain how a P=B=0 route is used.

Sami: The authors are agreeing on text to clarify the DF election usage here and the P/B bit assignment; we will update the draft with it shortly.
[Sami]
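To illustrate the three points above, a hypothetical Python sketch of the RFC 7432 default DF election (PEs ordered by IP address; the DF for V is the PE with ordinal V mod N), which picks the single-active primary, and of how a remote PE might read the P/B bits. Treating V as the service's VLAN ID for VPWS, ignoring P=B=0 routes, and returning all P=1 routes rather than picking one are assumptions, since the clarifying text was still being agreed.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


def default_df(pe_ips: List[str], v: int) -> str:
    """RFC 7432 default DF election: order PEs by IP, pick ordinal V mod N."""
    ordered = sorted(pe_ips, key=ipaddress.ip_address)
    return ordered[v % len(ordered)]


@dataclass(frozen=True)
class AdRoute:
    originator: str
    label: int
    p: bool  # P bit: set by the primary PE (the DF in single-active)
    b: bool  # B bit: set by the backup PE


def classify(routes: List[AdRoute]) -> Tuple[List[AdRoute], List[AdRoute]]:
    """Split received per-EVI A-D routes into primaries (P=1) and backups (B=1).

    Routes with P=B=0 are not used for forwarding here (an assumption).
    More than one P=1 route in single-active mode is expected only
    transiently; this sketch returns them all rather than choosing one.
    """
    primaries = [r for r in routes if r.p]
    backups = [r for r in routes if r.b and not r.p]
    return primaries, backups


def may_forward(routes: List[AdRoute]) -> bool:
    # Traffic starts only after at least one P=1 route has been received.
    primaries, _ = classify(routes)
    return bool(primaries)
```

Note that the ordering is numeric (via `ipaddress`), so 192.0.2.9 sorts before 192.0.2.10.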


It looks like a PE includes the community for the other end to request BW enforcement/accounting from the PSN. Should/could both ends include the community? Does it make sense for the two ends to signal different BWs? I suppose so?

[Sami]
This BW will be symmetric, we can make that explicit in the doc.

Zzh> I suppose there could be use cases where the BW is asymmetric. There is no need to restrict that.

Sami: I will add text to allow that.

[Sami]

   In the case where PSN resources are not available, the PE receiving
   this attribute MUST re-send its local Ethernet AD routes for this
   EVPL service with the ESI Bandwidth = All FFs to declare that the
   "PSN Resources Unavailable".

Shouldn't we use a different indication that the requested BW is not granted (see comments above)?
[Sami]
Not sure I get this. If BW can’t be granted in one direction, are you saying the other direction may still get it?

Zzh> Right. RSVP-TE BW is one direction. It could be that a service only needs 4M in one direction and 10M in the other, and the BW reservation in the network could be satisfied in one direction but not the other.

Sami: However, if one side’s BW is not granted, we still bring the PW up.

[Sami]

   The scope of the ESI Bandwidth is limited to only one Autonomous
   System.

The BW request might be used in two ways:

- request the PSN to "guarantee" the requested BW
- request the other PE to shape/limit the traffic that it sends towards this PE

For the first case, the BW may not be guaranteed across ASes and I assume that's the reason for the limitation in the above quoted text (better to explain it). For the second case, it would be valid even if it's across ASes. In fact, draft-boutros-bess-evpn-vpws-service-edge-gateway has the following:

      - Auto-provision features such as QOS access lists (ACL), tunnel
        preference, bandwidth, L3VPN on a per head-end interface basis

I assumed that the "auto-provision ... bandwidth" in that draft is more about traffic shaping/limiting by the PE than about bandwidth guarantee by the PSN. If that's the case, then it should be valid even if it's across ASes.

It would help a lot if the draft can elaborate on the use case of the bandwidth community.

[Sami]
We are not proposing any shaping or filters for traffic entering the EVPN-VPWS service; it is a simple BW negotiation that can be provisioned single-sided.

Zzh> Using the draft-boutros-bess-evpn-vpws-service-edge-gateway auto-provisioning example (at least that’s how I understand it) – it can be provisioned on the AN side so that the SN can shape/limit the traffic entering the EVPN-VPWS. In that case, is it necessary that “The scope of the ESI Bandwidth is limited to only one Autonomous  System”?

Zzh> Basically, my points about BW are:

- It seems that it does not have to be limited to BW reservation in the network; it could simply be traffic shaping/limiting on the PE into the network. Therefore it does not have to be limited to only one AS. For example, the network could be using LDP, where there is no way to reserve BW, but shaping is still desired on the PEs.

Sami: The issue here is that we are using the IDR BW draft, which does not define shaping, and we don’t plan to define a new attribute for this. Perhaps we can add that in a new draft?

- It seems that the BW signaling does not have to be symmetric, and even if it’s symmetric and it’s for BW reservation, failure in one direction should be independent of the other direction, so the failure indication should be done separately. For example, originally both ends signal 10M. PE1 fails the reservation but PE2 succeeds. Or it could be that initially both succeed but later the TE tunnel from PE1 fails. Is it desirable for PE1 to signal BW=FF, which (I assume) would lead PE2 to cancel its successful BW reservation? If a different failure indication is used, we won’t have this problem.

Sami: Agreed, the failure indication should be different, given that we are bringing up the PW even if the BW can’t be granted.

[Sami]
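A hypothetical Python sketch of the per-direction bandwidth handling this exchange converges on: each direction is evaluated on its own, a failure is reported only for the affected direction rather than by re-advertising an all-FFs ESI Bandwidth, and the PW comes up regardless. The status enum, units (Mbps) and function names are illustrative; no encoding for the separate failure indication is implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class BwStatus(Enum):
    NOT_REQUESTED = "not-requested"
    GRANTED = "granted"
    UNAVAILABLE = "unavailable"  # hypothetical per-direction failure indication


@dataclass(frozen=True)
class DirectionResult:
    requested_mbps: Optional[float]  # None if this direction signaled no bandwidth
    status: BwStatus


def reserve(requested_mbps: Optional[float], available_mbps: float) -> DirectionResult:
    """Evaluate one direction on its own; the other direction is never consulted."""
    if requested_mbps is None:
        return DirectionResult(None, BwStatus.NOT_REQUESTED)
    ok = requested_mbps <= available_mbps
    return DirectionResult(requested_mbps, BwStatus.GRANTED if ok else BwStatus.UNAVAILABLE)


def evaluate_pw(req_ab: Optional[float], avail_ab: float,
                req_ba: Optional[float], avail_ba: float) -> Dict[str, object]:
    """Hypothetical PW-level outcome for directions PE-A to PE-B and PE-B to PE-A."""
    return {
        # Per the thread, the PW comes up even if bandwidth cannot be granted.
        "pw_up": True,
        "a_to_b": reserve(req_ab, avail_ab),
        "b_to_a": reserve(req_ba, avail_ba),
    }
```

With Jeffrey's example of 4M in one direction and 10M in the other, `evaluate_pw(4, 5, 10, 8)` grants the first direction and marks only the second as unavailable, so PE-A never cancels a reservation that succeeded.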

7.2 Multi-Homed CEs

   For a faster convergence in multi-homed scenarios with either Single-
   Active Redundancy or All-active redundancy, mass withdraw technique
   as per [EVPN] baseline is used. A PE previously advertising an
   Ethernet A-D per ES route, can withdraw this route signaling to the
   remote PEs to switch all the VPWS service instances associated with
   this multi-homed ES to the backup PE

Does the PE also withdraw the individual per-EVI AD routes? I assume not; better make it clear.
[Sami]
It will withdraw as per [7432], we are not defining any new behavior.

Zzh> Now that you mention it, I did a search in RFC 7432, and the only place I can find about withdrawing the per-EVI route is the following:


   When an Ethernet tag is decommissioned on an Ethernet segment, then
   the PE MUST withdraw the Ethernet A-D per EVI route(s) announced for
   the <ESI, Ethernet tags> that are impacted by the decommissioning.

All other mentioning of withdrawing are about mac/ip and per-ES AD route. I think it’s better to explicitly point out in this section that per-EVI routes are not withdrawn (so that when the link comes back only a single per-ES route is advertised) when the link goes down.

Sami: Actually, per-EVI Ethernet A-D routes must be withdrawn too.

[Sami]
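A hypothetical Python sketch of mass withdraw at a remote PE: one per-ES Ethernet A-D route withdrawal switches every VPWS instance on that ES to its backup PE, without waiting for the per-EVI withdrawals that follow. The data model is an illustrative assumption; giving single-homed ESes a non-zero ESI, as discussed below, would let the same logic apply to them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VpwsPath:
    esi: bytes                 # ESI of the far-end (multi-homed) CE
    active_pe: str             # PE currently used to reach that CE
    backup_pe: Optional[str]   # PE that advertised B=1 for the same instance


def on_per_es_withdraw(paths: Dict[int, VpwsPath], esi: bytes,
                       withdrawing_pe: str) -> List[int]:
    """Handle one per-ES Ethernet A-D route withdrawal from `withdrawing_pe`.

    Every VPWS instance whose active PE is the withdrawing PE on that ESI is
    moved to its backup in one step. Returns the affected service IDs.
    """
    switched = []
    for service_id, path in paths.items():
        if path.esi == esi and path.active_pe == withdrawing_pe and path.backup_pe:
            # Assumption: no new backup is known until another B=1 route arrives.
            path.active_pe, path.backup_pe = path.backup_pe, None
            switched.append(service_id)
    return switched
```

A single BGP withdrawal thus moves any number of instances, which is the convergence argument made in section 7.2.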

If that's the case, is it desirable to assign non-zero ESIs even for the single-homed case and advertise per-ES A-D routes so that mass withdraw can be done?

[Sami]
Correct, an operator may choose that. We are not precluding this.

Zzh> The spec should allow it so that a vendor can implement it so that an operator can do it ☺

Sami: Will add to the terminology section that the ESI can be set for the single-homed case too.

Thanks,

Sami