[bess] draft-ietf-bess-evpn-prefix-advertisement-05 comments

"Jeffrey (Zhaohui) Zhang" <zzhang@juniper.net> Fri, 13 October 2017 03:20 UTC

From: "Jeffrey (Zhaohui) Zhang" <zzhang@juniper.net>
To: "Rabadan, Jorge (Nokia - US/Mountain View)" <jorge.rabadan@nokia.com>, "draft-ietf-bess-evpn-prefix-advertisement@tools.ietf.org" <draft-ietf-bess-evpn-prefix-advertisement@tools.ietf.org>, BESS <bess@ietf.org>
Date: Fri, 13 Oct 2017 03:20:11 +0000
Message-ID: <DM5PR05MB31455D67D3F259C157889ABBD4480@DM5PR05MB3145.namprd05.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/SBkPOrkyZVKcaP2V--xH-4R6N_E>

Hi Jorge and other co-authors,

I am doing another round of review as the document shepherd before I do the shepherd write-up.
Please see some nits/comments/questions below.

Thanks!
Jeffrey

--------------------------------------------------------------------------------------

   EVPN provides a flexible control plane that allows intra-subnet
   connectivity in an IP/MPLS and/or an NVO-based network.

Isn't NVO based on IP? There is no pure-IP based EVPN, right? So perhaps either "in an IP/MPLS based overlay network" or "in an MPLS and/or NVO-based network"?

   EVI: EVPN Instance spanning the NVE and PE devices that are
      participating on that EVPN.

"NVE/PE"?

   IP-VRF: A VPN Routing and Forwarding table for IP addresses on an
      NVE/PE, similar to the VRF concept defined in [RFC4364], however,
      in this document, the IP routes are always populated by the EVPN
      address family.

Do we really want to distinguish between the IP-VRF in RFC4364 and the one in this document? I think it's really the same IP-VRF: routes could be populated from both the EVPN and IP-VPN address families, especially on the DGWs.

   If we use the term Tenant System (TS) to designate a physical or
   virtual system identified by MAC and IP addresses, and connected to a
   MAC-VRF by an Attachment Circuit, the following considerations apply:
         ...
        o Although these VAs provide IP connectivity to VMs and subnets
          behind them, they do not always have their own IP interface
          connected to the EVPN NVE, e.g. layer-2 firewalls are examples
          of VAs not supporting IP interfaces.

In the above two paragraphs, the first one says the TS is identified by
MAC "and IP addresses", then the second one says the VAs "do not always
have their own IP interface". Should "and IP addresses" be changed to
"and maybe IP addresses as well"?

   o TS2 and TS3 are Virtual Appliances (VA) that generate/receive
     traffic from/to the subnets and hosts sitting behind them

s/generate/send/

   o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3 have
     their own IP addresses that belong to the EVI-10 subnet too. These
     IRB interfaces connect the EVI-10 subnet to Virtual Routing and
     Forwarding (IP-VRF) instances that can route the traffic to other
     connected subnets for the same tenant (within the DC or at the
     other end of the WAN).

s/connected subnets/subnets/

   One example of such use cases is the "floating IP" example described
   in section 2.1. In this example we need to decouple the advertisement
   of the prefixes from the advertisement of the floating IP (vIP23 in
   Figure 1) and MAC associated to it, otherwise the solution gets
   highly inefficient and does not scale.

I understand what the above is trying to say, but had trouble parsing the sentence before "otherwise". I think it would be better to say "decouple ... from the advertisement of the MAC address of either M2 or M3", as we're advertising with the floating IP as the overlay index (but not the MAC).

   o The GW IP (Gateway IP Address) will be a 32 or 128-bit field (ipv4
     or ipv6), and will encode an overlay IP index for the IP Prefixes.

s/encode an overlay IP index/encode an IP address as an overlay index/

   o The MPLS Label field is encoded as 3 octets, where the high-order
     20 bits contain the label value. When sending, the label value
     SHOULD be zero to indicate that recursive resolution is needed. If
     the received MPLS Label value is zero, the route MUST contain an
     Overlay Index and the ingress NVE/PE MUST do recursive resolution
     to find the egress NVE/PE. If the received Label value is non-zero,
     the route will not be used for recursive resolution unless a local
     policy says so.

How about changing the second sentence to the following:

    ... SHOULD be zero if recursive resolution based on overlay index is used.

Notice the "if".
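
As a rough illustration of the encoding under discussion, here is a minimal Python sketch of a 3-octet field whose high-order 20 bits carry the label value, plus the zero check applied on reception. The function names are mine, not from the draft, and I am simply assuming the low-order 4 bits are sent as zero.

```python
LABEL_BITS = 20  # per the quoted text: the high-order 20 bits hold the label


def encode_label_field(label: int) -> bytes:
    """Pack a 20-bit label into the high-order bits of a 3-octet field."""
    if not 0 <= label < (1 << LABEL_BITS):
        raise ValueError("label must fit in 20 bits")
    # Shift left by 4 so the label occupies the top 20 of the 24 bits.
    # Assumption: the remaining low-order 4 bits are left as zero.
    return (label << 4).to_bytes(3, "big")


def decode_label_field(field: bytes) -> int:
    """Extract the 20-bit label value from a 3-octet field."""
    if len(field) != 3:
        raise ValueError("label field must be exactly 3 octets")
    return int.from_bytes(field, "big") >> 4


def needs_recursive_resolution(field: bytes) -> bool:
    """A received zero label means the route relies on an Overlay Index."""
    return decode_label_field(field) == 0
```

With the proposed "if" wording, a sender would pass 0 to encode_label_field only when it actually uses an overlay index.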

   o An Overlay Index can be an ESI, IP address in the address space of
     the tenant or MAC address and it is used by an NVE as the next-hop
     for a given IP Prefix.

I like that a MAC address can be used as an overlay index, but I don't see how a MAC address is encoded as an overlay index.
I see the following later:

   *   MAC with Zero value means no Router's MAC extended community is
       present along with the RT-5. Non-Zero indicates that the extended
       community is present and carries a valid MAC address. Examples of
       invalid MAC addresses are broadcast or multicast MAC addresses.

It would be good to point out up front (right where the RT-5 format is given) that a Router's MAC EC may be attached to the RT-5.

     It is important to note that recursive
     resolution of the Overlay Index applies upon installation into an
     IP-VRF, and not upon BGP propagation.

What does the above sentence mean? Why is it important to note? Nothing is resolved upon propagation, right?

   o Irrespective of the recursive resolution, if there is no IGP or BGP
     route to the BGP next-hop of an RT-5, BGP may fail to install the
     RT-5 even if the Overlay Index can be resolved.

May? Should? Must?

   The indirection provided by the Overlay Index and its recursive
   lookup resolution is required to achieve fast convergence in case of
   a failure of the object represented by the Overlay Index. For
   instance: in Figure 1, let's assume NVE2/NVE3 advertise 1k RT-5
   routes associated to the floating IP address (GWIP=vIP23) and NVE2
   advertises an RT-2 claiming the ownership of the floating IP, i.e.
   NVE2 encodes vIP23 and M2 in the RT-2. When the floating IP owner
   changes from M2 to M3, a single RT-2 withdraw/update is required to
   indicate the change. The remote DGW will not change any of the 1k
   prefixes associated to vIP23, but will only update the ARP resolution
   entry for vIP23 (now pointing at M3).

The "for instance" part is a repetition of section 2.2. How about simply referring to section 2.2?
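
For what it's worth, the indirection argument can be shown in a few lines. This is only a toy model in Python: the prefix strings are made up, and the two dictionaries stand in for the IP-VRF routes and the ARP resolution entry learned from the RT-2. vIP23, M2 and M3 are the names from the draft's Figure 1.

```python
# 1k prefixes, all pointing at the same overlay index (the floating IP).
# The prefix values are hypothetical, chosen only to get 1000 unique keys.
prefix_to_overlay_index = {
    f"10.{i // 256}.{i % 256}.0/24": "vIP23" for i in range(1000)
}

# Stand-in for the ARP resolution entry derived from the RT-2 for vIP23.
arp_entry = {"vIP23": "M2"}


def resolve(prefix: str) -> str:
    """Recursive resolution: prefix -> overlay index -> MAC."""
    return arp_entry[prefix_to_overlay_index[prefix]]


def change_owner(floating_ip: str, new_mac: str) -> None:
    """Model a single RT-2 update; no prefix entry is touched."""
    arp_entry[floating_ip] = new_mac
```

Calling change_owner("vIP23", "M3") rewrites one entry, and every one of the 1k prefixes then resolves to M3.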

   +----------+----------+----------+------------+----------------+
   | ESI      | GW-IP    | MAC*     | Label      | Overlay Index  |
   |--------------------------------------------------------------|
   | Non-Zero | Zero     | Zero     | Don't Care | ESI            |
   | Non-Zero | Zero     | Non-Zero | Don't Care | ESI            |
   | Zero     | Non-Zero | Zero     | Don't Care | GW-IP          |
   | Zero     | Zero     | Non-Zero | Zero       | MAC            |
   | Zero     | Zero     | Non-Zero | Non-Zero   | MAC or None**  |
   | Zero     | Zero     | Zero     | Non-Zero   | None(IP NVO)***|
   +----------+----------+----------+------------+----------------+

It seems that a MAC address is a more specific overlay index, so if an ESI is also present, shouldn't the MAC address be used as the overlay index?

The fifth row is like a variation of the fourth row; why isn't there a corresponding variation for each of the first three rows? The following paragraph, mentioned earlier, seems to apply to all situations.

   o The MPLS Label field is encoded as 3 octets, where the high-order
     20 bits contain the label value. When sending, the label value
     SHOULD be zero to indicate that recursive resolution is needed. If
     the received MPLS Label value is zero, the route MUST contain an
     Overlay Index and the ingress NVE/PE MUST do recursive resolution
     to find the egress NVE/PE. If the received Label value is non-zero,
     the route will not be used for recursive resolution unless a local
     policy says so.

I struggled with the "IP NVO" in the sixth row because clearly this is an MPLS tunnel, not an IP tunnel. Then I realized that "IP" here refers to the payload, not the tunnel type:

   IP NVO tunnel: it refers to Network Virtualization Overlay tunnels
      with IP payload (no MAC header in the payload).  

I have to say that "IP NVO tunnel" is a little misleading.
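
To make the table discussion concrete, here is how I read the six rows as a lookup, sketched in Python. The function name and the boolean arguments are my own; each argument is True when the corresponding field is non-zero. Combinations that the table does not list are reported as such rather than guessed.

```python
def overlay_index(esi: bool, gw_ip: bool, mac: bool, label: bool) -> str:
    """Return the Overlay Index column for a given combination of fields.

    Each argument is True when that field is non-zero in the RT-5
    (mac refers to the Router's MAC extended community being present).
    """
    if esi and not gw_ip:
        # Rows 1 and 2: MAC may be zero or non-zero, label is "Don't Care".
        return "ESI"
    if not esi and gw_ip and not mac:
        # Row 3: label is "Don't Care".
        return "GW-IP"
    if not esi and not gw_ip and mac:
        # Row 4 (zero label) and row 5 (non-zero label).
        return "MAC or None" if label else "MAC"
    if not esi and not gw_ip and not mac and label:
        # Row 6.
        return "None (IP NVO)"
    # Anything else does not appear in the table as quoted.
    return "not in table"
```

Written this way, it is easy to see that only the MAC rows depend on the label, which is the asymmetry I am asking about.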

4. IP Prefix Overlay Index use-cases
4.1 TS IP address Overlay Index use-case

If you compare the two section titles above, you may realize the first one is a little misleading ("IP Prefix" used as overlay index?). Perhaps change to "4. Overlay Index use-cases"?

In section 4.1:

        o Based on the MAC-VRF10 route-target in DGW1 and DGW2, the IP
          Prefix route is also imported and SN1/24 is added to the IP-
          VRF with Overlay Index IP2 pointing at the local MAC-VRF10. We
          assume the RT-5 from NVE2 is preferred over the RT-5 from
          NVE3. Should ECMP be enabled in the IP-VRF and both routes
          equally preferable, SN1/24 would also be added to the routing
          table with Overlay Index IP3.

The last two sentences seem contradictory: one says "preferred over" and the other says "equally preferable".
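
If the intent is that these are two separate cases, something like the following Python sketch would capture it. This is only my reading, not text from the draft: the numeric preference is a hypothetical stand-in for whatever the BGP decision process yields, with lower values winning.

```python
from typing import List, Tuple


def select_overlay_indexes(routes: List[Tuple[str, int]], ecmp: bool) -> List[str]:
    """Pick the overlay indexes to install for one prefix.

    routes holds (overlay_index, preference) pairs; a lower preference
    value wins. The preference number is purely illustrative.
    """
    if not routes:
        return []
    best = min(pref for _, pref in routes)
    winners = [idx for idx, pref in routes if pref == best]
    # Without ECMP only one next hop is installed, even on a tie.
    return winners if ecmp else winners[:1]
```

With the RT-5 from NVE2 strictly preferred, only IP2 is installed regardless of ECMP; IP3 is added only when the two routes tie and ECMP is enabled.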

   (5) When the packet arrives at NVE2:
        o Based on the tunnel information (VNI for the VXLAN case), the
          MAC-VRF10 context is identified for a MAC lookup.
        o Encapsulation is stripped-off and based on a MAC lookup
          (assuming MAC forwarding on the egress NVE), the packet is
          forwarded to TS2, where it will be properly routed.

If the destination is actually on the TS3 side, how does TS2 send traffic to the final destination? Unless the topology is actually like the one in section 4.2, won't traffic get blackholed? But then, is the only difference between 4.1 and 4.2 whether two overlay indexes (in 4.1, with ECMP) or one overlay index (in 4.2) is used?

In section 4.3:

             . Destination inner MAC = M2 (this MAC will be obtained
               from the Router's MAC Extended Community received along
               with the RT-5 for SN1).

My understanding is that section 4 is descriptive (use cases). The above really should be "specified" somewhere else, not "described" here. OK, as I read further, the section becomes more and more "specificative".

I do see some text about the Router's MAC EC in 4.4.1, but should that be pulled out to somewhere that covers all cases (not just 4.4.1)?

BTW - it's important to emphasize that the Router's MAC EC here is used to carry TS MAC address not the "Router's MAC address" :-)

Section 4.4:

   In order to provide connectivity for (1), MAC/IP routes (RT-2) are
   needed so that IRB or TS MACs and IPs can be distributed.
   Connectivity type (2) is accomplished by the exchange of IP Prefix
   routes (RT-5) for IPs and subnets sitting behind certain Overlay
   Indexes, e.g. GW IP or ESI.

"e.g. GW IP or ESI or TS MAC"

   ... If
   no recursive resolution is needed, the core EVI may not be needed and
   the IP-VRFs may be connected directly by Ethernet or IP NVO tunnels.

Even if the core EVI is needed, the tunnels are still Ethernet tunnels, right? Perhaps the last sentence should really be "... connected directly by tenant (non-core) EVIs"?

   Depending on the existence and characteristics of the core-facing IRB
   interface in the core EVI, there are three different IP-VRF-to-IP-VRF
   scenarios identified and described in this document:

   1) Interface-less model
   2) Interface-ful with core-facing IRB model
   3) Interface-ful with unnumbered core-facing IRB model

I once commented that the terms "interface-less" and "interface-ful" here are convoluted. They really indicate whether a core EVI and core-VRF IRBs are used. While I am not requesting a change of terms, it would be good to point out what they really mean. Proposed new text:

   Depending on the existence and characteristics of the core EVI and
   IRB interfaces for the core-VRFs, there are three different IP-VRF-to-IP-VRF
   scenarios identified and described in this document:

   1) Interface-less model: no core EVI, no overlay index.
   2) Interface-ful with core-VRF IRB model: core EVI, IP address as overlay index.
   3) Interface-ful with unnumbered core-VRF IRB model: core EVI, mac address as overlay index.
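
Put differently, the three models reduce to two attributes each. A small Python sketch of that mapping follows; the class and function names are mine, and the values come straight from the proposed text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    core_evi: bool                 # is a core EVI used?
    overlay_index: Optional[str]   # None means no overlay index


MODELS = [
    Model("Interface-less", False, None),
    Model("Interface-ful with core-VRF IRB", True, "IP address"),
    Model("Interface-ful with unnumbered core-VRF IRB", True, "MAC address"),
]


def classify(core_evi: bool, overlay_index: Optional[str]) -> Optional[str]:
    """Return the model name for a combination, or None if none matches."""
    for model in MODELS:
        if (model.core_evi, model.overlay_index) == (core_evi, overlay_index):
            return model.name
    return None
```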

BTW, I would still prefer to rename the "core EVI" to "Supplemental BD" for two reasons:

- The "core" wording is confusing/misleading, because all the EVIs go over the core.
- The "core EVI" is really the same as the "Supplemental BD" in draft-lin.

So why not take this opportunity to use the proper name?

4.4.2:

   d) The core EVI is composed of the NVE/DGW MAC-VRFs and may contain
      other MAC-VRFs without IRB interfaces. Those non-IRB MAC-VRFs will
      typically connect TSes that need layer-3 connectivity to remote
      subnets.

Can you elaborate on the "other MAC-VRFs w/o IRB interfaces"? Two things confuse me here:
- You already mention "NVE/DGW MAC-VRFs", so what are the "other" MAC-VRFs?
- If you want to say some MAC-VRFs do not have IRB interfaces, perhaps just say:

   d) The core EVI is composed of NVE/DGW MAC-VRFs w/ or w/o IRB interfaces.

But how does remote traffic get to those NVEs w/o core-VRF IRBs in this model?

        o Label value SHOULD be zero since the RT-5 route requires a
          recursive lookup resolution to an RT-2 route. The MPLS label
          or VNI to be used when forwarding packets will be derived from
          the RT-2's MPLS Label1 field. The RT-5's Label field will be
          ignored on reception.

Perhaps swap the last two sentences:

        o Label value SHOULD be zero since the RT-5 route requires a
          recursive lookup resolution to an RT-2 route. It is ignored on
          reception, and the MPLS label or VNI from the RT-2's MPLS
          Label1 field is used when forwarding packets.
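
The behavior described by the swapped sentences can be sketched as follows. This is a simplified Python model, not an implementation: the record types and the rt2_by_ip table are hypothetical, and I am assuming the overlay index in this model is the GW IP, resolved against the IP carried in an RT-2.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RT2:
    ip: str
    mac: str
    label1: int   # MPLS Label1 field (or VNI)


@dataclass
class RT5:
    prefix: str
    gw_ip: str     # overlay index (assumed to be the GW IP here)
    label: int = 0  # SHOULD be zero on sending


def forwarding_label(rt5: RT5, rt2_by_ip: Dict[str, RT2]) -> int:
    """Resolve the RT-5 recursively and return the label used to forward.

    The RT-5's own label is never consulted; a KeyError signals that the
    overlay index cannot be resolved to an RT-2.
    """
    rt2 = rt2_by_ip[rt5.gw_ip]
    return rt2.label1
```

Whatever value arrives in the RT-5's Label field, the result depends only on the RT-2 that the GW IP resolves to.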

Section 5:

   c) Allows a flexible implementation where the prefix can be linked to
      different types of Overlay Indexes: overlay IP address, overlay
      MAC addresses, overlay ESI, underlay BGP next-hops, etc.

Perhaps:

   c) Allows a flexible implementation where the prefix can be linked to
      different types of Overlay/Underlay Indexes: overlay IP address, overlay
      MAC addresses, overlay ESI, underlay BGP next-hops, etc.