Re: [bess] draft-ietf-bess-evpn-prefix-advertisement-05 comments

"Rabadan, Jorge (Nokia - US/Mountain View)" <> Mon, 16 October 2017 15:31 UTC

From: "Rabadan, Jorge (Nokia - US/Mountain View)" <>
To: "Jeffrey Zhang (Zhaohui)" <>, "" <>, BESS <>
Thread-Topic: draft-ietf-bess-evpn-prefix-advertisement-05 comments
Date: Mon, 16 Oct 2017 15:31:26 +0000
Subject: Re: [bess] draft-ietf-bess-evpn-prefix-advertisement-05 comments
List-Id: BGP-Enabled ServiceS working group discussion list <>

Hi Jeffrey,

Thank you very much for yet another good look at it. I took most of your suggestions. By the way, I changed the terms MAC-VRF and Core-EVI to BD and SBD as you suggested. I agree it is better.
Please find my responses below, and check out rev 06, which is already posted.

Thank you.

On 10/13/17, 5:20 AM, "Jeffrey (Zhaohui) Zhang" <> wrote:

    Hi Jorge and other co-authors,
    I am giving another round of review as the document shepherd before I do the shepherd write-up.
    Please see some nits/comments/questions below.
       EVPN provides a flexible control plane that allows intra-subnet
       connectivity in an IP/MPLS and/or an NVO-based network.
    Isn't NVO based on IP? There is no pure-IP based EVPN, right? So perhaps either "in an IP/MPLS based overlay network" or "in an MPLS and/or NVO-based network"?
[JORGE] ok
       EVI: EVPN Instance spanning the NVE and PE devices that are
          participating on that EVPN.
[JORGE] ok
       IP-VRF: A VPN Routing and Forwarding table for IP addresses on an
          NVE/PE, similar to the VRF concept defined in [RFC4364], however,
          in this document, the IP routes are always populated by the EVPN
          address family.
    Do we really want to distinguish the IP-VRF in RFC4364 and the one in this document? I think it's really the same IP-VRF - routes could be populated from both EVPN and IP-VPN address family, especially on the DGWs.
[JORGE] ok, changed.
       If we use the term Tenant System (TS) to designate a physical or
       virtual system identified by MAC and IP addresses, and connected to a
       MAC-VRF by an Attachment Circuit, the following considerations apply:
            o Although these VAs provide IP connectivity to VMs and subnets
              behind them, they do not always have their own IP interface
              connected to the EVPN NVE, e.g. layer-2 firewalls are examples
              of VAs not supporting IP interfaces.
    In the above two paragraphs, the first one says the TS is identified by
    MAC "and IP addresses", then the second paragraph says "do not always
    have their own IP interface". Should "and IP addresses" be changed to
    "and maybe IP address as well"?
[JORGE] ok, changed.
       o TS2 and TS3 are Virtual Appliances (VA) that generate/receive
         traffic from/to the subnets and hosts sitting behind them
[JORGE] ok, changed.
       o Integrated Routing and Bridging interfaces IRB1, IRB2 and IRB3 have
         their own IP addresses that belong to the EVI-10 subnet too. These
         IRB interfaces connect the EVI-10 subnet to Virtual Routing and
         Forwarding (IP-VRF) instances that can route the traffic to other
         connected subnets for the same tenant (within the DC or at the
         other end of the WAN).
    s/connected subnets/subnets/
[JORGE] ok, changed.
       One example of such use cases is the "floating IP" example described
       in section 2.1. In this example we need to decouple the advertisement
       of the prefixes from the advertisement of the floating IP (vIP23 in
       Figure 1) and MAC associated to it, otherwise the solution gets
       highly inefficient and does not scale.
    I understand what the above is trying to say, but had trouble parsing the sentence before "otherwise". I think it would be better to say "decouple ... from the advertisement of MAC address of either M2 or M3", as we're advertising with the floating IP as the overlay index (but not the MAC).
[JORGE] ok, changed.
       o The GW IP (Gateway IP Address) will be a 32 or 128-bit field (ipv4
         or ipv6), and will encode an overlay IP index for the IP Prefixes.
    s/encode an overlay IP index/encode an IP address as an overlay index/
[JORGE] ok, changed.
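
As a rough illustration of the "32 or 128-bit field" wording above, the following sketch packs an IPv4 or IPv6 address into a fixed-width field and reads it back. The helper names `encode_gw_ip` and `decode_gw_ip` are hypothetical, and the sketch covers only the address octets, not the rest of the RT-5 route.

```python
import ipaddress


def encode_gw_ip(address: str) -> bytes:
    """Pack an IPv4 address into 4 octets or an IPv6 address into 16 octets."""
    return ipaddress.ip_address(address).packed


def decode_gw_ip(field: bytes) -> str:
    """Read a 32-bit or 128-bit GW IP field back into textual form."""
    if len(field) not in (4, 16):
        raise ValueError("GW IP field must be 4 or 16 octets")
    return str(ipaddress.ip_address(field))
```

The documentation addresses `192.0.2.1` and `2001:db8::1` are convenient inputs for trying this out.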
       o The MPLS Label field is encoded as 3 octets, where the high-order
         20 bits contain the label value. When sending, the label value
         SHOULD be zero to indicate that recursive resolution is needed. If
         the received MPLS Label value is zero, the route MUST contain an
         Overlay Index and the ingress NVE/PE MUST do recursive resolution
         to find the egress NVE/PE. If the received Label value is non-zero,
         the route will not be used for recursive resolution unless a local
         policy says so.
    How about changing the second sentence to the following:
        ... SHOULD be zero if recursive resolution based on overlay index is used.
    Notice the "if".
[JORGE] ok, changed.
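
To make the label encoding concrete, here is a minimal sketch of reading the 20-bit label value out of the 3-octet field and testing it for zero. The function names are hypothetical, and leaving the low-order 4 bits as zero on encoding is an assumption of this sketch, since the quoted text does not say how they are set.

```python
def label_from_field(field: bytes) -> int:
    """Return the label value held in the high-order 20 bits of 3 octets."""
    if len(field) != 3:
        raise ValueError("MPLS Label field must be exactly 3 octets")
    return int.from_bytes(field, "big") >> 4


def field_from_label(label: int) -> bytes:
    """Place a label value in the high-order 20 bits of a 3-octet field.

    Assumption: the remaining low-order 4 bits are left as zero.
    """
    if not 0 <= label < (1 << 20):
        raise ValueError("label value must fit in 20 bits")
    return (label << 4).to_bytes(3, "big")


def label_is_zero(field: bytes) -> bool:
    """True when the received label value is zero, the case in which the
    quoted text requires an Overlay Index and recursive resolution."""
    return label_from_field(field) == 0
```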
       o An Overlay Index can be an ESI, IP address in the address space of
         the tenant or MAC address and it is used by an NVE as the next-hop
         for a given IP Prefix.
    I like that a MAC address can be used as an overlay index, but I don't see how the MAC address is encoded as an overlay index.
    I see the following later:
       *   MAC with Zero value means no Router's MAC extended community is
           present along with the RT-5. Non-Zero indicates that the extended
           community is present and carries a valid MAC address. Examples of
           invalid MAC addresses are broadcast or multicast MAC addresses.
    It would be good to point out up front (right where the RT-5 format is given) that Router's MAC EC may be attached to the RT-5.

[JORGE] ok, added:
“An IP Prefix Route MAY be sent along with a Router's MAC Extended Community (defined in [EVPN-INTERSUBNET]).”
         It is important to note that recursive
         resolution of the Overlay Index applies upon installation into an
         IP-VRF, and not upon BGP propagation.
    What does the above sentence mean? Why is it important to note? Nothing is upon propagation, right?
[JORGE] I think I added this based on a comment from Eric, where he was suggesting to clarify that no recursive resolution is needed on ABR/ASBRs that process the route but don’t install IP Prefixes in their FIBs (no IP-VRF). I added: “(for instance, on an ASBR)”.
       o Irrespective of the recursive resolution, if there is no IGP or BGP
         route to the BGP next-hop of an RT-5, BGP may fail to install the
         RT-5 even if the Overlay Index can be resolved.
    May? Should? Must?
[JORGE] changed to should.
       The indirection provided by the Overlay Index and its recursive
       lookup resolution is required to achieve fast convergence in case of
       a failure of the object represented by the Overlay Index. For
       instance: in Figure 1, let's assume NVE2/NVE3 advertise 1k RT-5
       routes associated to the floating IP address (GWIP=vIP23) and NVE2
       advertises an RT-2 claiming the ownership of the floating IP, i.e.
       NVE2 encodes vIP23 and M2 in the RT-2. When the floating IP owner
       changes from M2 to M3, a single RT-2 withdraw/update is required to
       indicate the change. The remote DGW will not change any of the 1k
       prefixes associated to vIP23, but will only update the ARP resolution
       entry for vIP23 (now pointing at M3).
    The "for instance" part is a repetition of section 2.2. How about simply referring to section 2.2?
[JORGE] OK, I replaced it with: “(see the example described in section 2.2).”
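
The indirection argument in the quoted paragraph can be sketched with two lookup tables: one mapping prefixes to an Overlay Index (the RT-5 state) and one mapping the Overlay Index to a MAC (the RT-2 state). The table layout, the prefix values and the function names below are illustrative assumptions; only vIP23, M2, M3 and the 1k prefix count come from the text.

```python
def build_rt5_table(count: int, gw_ip: str) -> dict:
    """Build `count` made-up /24 prefixes that all point at one GW IP."""
    return {f"10.{i // 256}.{i % 256}.0/24": gw_ip for i in range(count)}


def resolve(prefix: str, rt5_table: dict, rt2_table: dict) -> str:
    """Recursively resolve a prefix: prefix -> Overlay Index -> MAC."""
    return rt2_table[rt5_table[prefix]]


rt5_table = build_rt5_table(1000, "vIP23")   # 1k RT-5 routes, GWIP=vIP23
rt2_table = {"vIP23": "M2"}                  # RT-2 from NVE2 owns vIP23

macs_before = {resolve(p, rt5_table, rt2_table) for p in rt5_table}

# The floating IP moves: a single RT-2 update, no RT-5 entry is touched.
rt2_table["vIP23"] = "M3"

macs_after = {resolve(p, rt5_table, rt2_table) for p in rt5_table}
```

After the single update every prefix resolves to M3, while `rt5_table` itself is unchanged.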
       | ESI      | GW-IP    | MAC*     | Label      | Overlay Index  |
       | Non-Zero | Zero     | Zero     | Don't Care | ESI            |
       | Non-Zero | Zero     | Non-Zero | Don't Care | ESI            |
       | Zero     | Non-Zero | Zero     | Don't Care | GW-IP          |
       | Zero     | Zero     | Non-Zero | Zero       | MAC            |
       | Zero     | Zero     | Non-Zero | Non-Zero   | MAC or None**  |
       | Zero     | Zero     | Zero     | Non-Zero   | None(IP NVO)***|
    It seems that mac address is a more specific overlay index, so if ESI is also present then the mac address should be used as the overlay index?
[JORGE] I wouldn’t say the mac is more specific. If MAC and ESI are present, ESI is the overlay index. The table includes the combinations that are allowed in the document and described in the use-cases. Rows 1 and 2 are described in section 4.3.
    The fifth row is like a variation of the fourth row;  why isn't there a corresponding variation for each of the first three rows? The following paragraph mentioned earlier seems to apply to all situations.
[JORGE] in rows 4 and 5, the label value 0 or non-0 has a meaning. In the first three rows, the label doesn’t have any meaning. 
       o The MPLS Label field is encoded as 3 octets, where the high-order
         20 bits contain the label value. When sending, the label value
         SHOULD be zero to indicate that recursive resolution is needed. If
         the received MPLS Label value is zero, the route MUST contain an
         Overlay Index and the ingress NVE/PE MUST do recursive resolution
         to find the egress NVE/PE. If the received Label value is non-zero,
         the route will not be used for recursive resolution unless a local
         policy says so.
    I struggled with the "IP NVO" in the sixth row because clearly this is MPLS tunnel not IP tunnel. Then I realized that "IP" here refers to the payload not the tunnel type:
       IP NVO tunnel: it refers to Network Virtualization Overlay tunnels
          with IP payload (no MAC header in the payload).  
    I have to say that "IP NVO tunnel" is a little misleading.
[JORGE] well, that’s why we put it in the terminology in section 1. Let me know if you think the description requires clarification. I’ll leave it as it is for the time being.
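
For readers following the table discussion, the six rows can be written as a small selection function. This is only a sketch of the table as quoted in this thread: the function name and boolean parameters are hypothetical, and raising an error for combinations the table does not list is an assumption based on the remark that the table holds the allowed combinations.

```python
def overlay_index(esi: bool, gw_ip: bool, mac: bool, label: bool) -> str:
    """Return the Overlay Index for a received RT-5.

    Each argument is True when the corresponding field is non-zero.
    """
    if esi and not gw_ip:
        # Rows 1 and 2: ESI wins whether or not a MAC is present.
        return "ESI"
    if not esi and gw_ip and not mac:
        # Row 3.
        return "GW-IP"
    if not esi and not gw_ip and mac:
        # Rows 4 and 5: only here does the label value matter.
        return "MAC or None" if label else "MAC"
    if not esi and not gw_ip and not mac and label:
        # Row 6: no Overlay Index (IP NVO).
        return "None"
    raise ValueError("combination not listed in the table")
```

For example, `overlay_index(True, False, True, False)` returns `"ESI"`, matching the answer above that ESI is the Overlay Index when both MAC and ESI are present.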
    4. IP Prefix Overlay Index use-cases
    4.1 TS IP address Overlay Index use-case
    If you compare the two section titles above, you may realize the first one is a little misleading ("IP Prefix" used as overlay index?). Perhaps change to "4. Overlay Index use-cases"?
[JORGE] ok, changed.
    In section 4.1:
            o Based on the MAC-VRF10 route-target in DGW1 and DGW2, the IP
              Prefix route is also imported and SN1/24 is added to the IP-
              VRF with Overlay Index IP2 pointing at the local MAC-VRF10. We
              assume the RT-5 from NVE2 is preferred over the RT-5 from
              NVE3. Should ECMP be enabled in the IP-VRF and both routes
              equally preferable, SN1/24 would also be added to the routing
              table with Overlay Index IP3.
    The last two sentences seem contradictory. One says "preferred over" and the other says "equally preferable".
[JORGE] ok, I clarified it with this sentence:
“In this example, we assume the RT-5 from NVE2 is preferred over the RT-5 from NVE3. If both routes were equally preferable and ECMP enabled, SN1/24 would also be added to the routing table with Overlay Index IP3.”
       (5) When the packet arrives at NVE2:
            o Based on the tunnel information (VNI for the VXLAN case), the
              MAC-VRF10 context is identified for a MAC lookup.
            o Encapsulation is stripped-off and based on a MAC lookup
              (assuming MAC forwarding on the egress NVE), the packet is
              forwarded to TS2, where it will be properly routed.
    If the destination is actually on the TS3 side, how does TS2 send traffic to the final destination? Unless the topology is actually like the one in section 4.2, won't traffic get blackholed?
[JORGE] yes the topology for SN1 is the same. But we wanted to add more subnets and hosts. I added: “We assume SN1/24 is dual-homed to NVE2 and NVE3.”

    But then, is the only difference between 4.1 and 4.2 whether two overlay indexes (in 4.1, with ECMP) or one overlay index (in 4.2) is used?
[JORGE] the difference is the use of a floating IP, which was something we wanted to highlight. 

    In section 4.3:
                 . Destination inner MAC = M2 (this MAC will be obtained
                   from the Router's MAC Extended Community received along
                   with the RT-5 for SN1).
    My understanding is that section 4 is descriptive (use cases). The above really should be "specified" somewhere else, not "described" here. OK, as I read on further it becomes more and more "specificative".
    I do see some text about the Router's MAC EC in 4.4.1, but should that be pulled out to somewhere that covers all cases (not just 4.4.1)?
[JORGE] I added the Router’s MAC EC at the end of section 3.1.
    BTW - it's important to emphasize that the Router's MAC EC here is used to carry TS MAC address not the "Router's MAC address" :-)
[JORGE] True. Added: “Note that the Router's MAC Extended Community is used in this case to carry the TS' MAC address, as opposed to the NVE/PE's MAC address.”
    Section 4.4:
       In order to provide connectivity for (1), MAC/IP routes (RT-2) are
       needed so that IRB or TS MACs and IPs can be distributed.
       Connectivity type (2) is accomplished by the exchange of IP Prefix
       routes (RT-5) for IPs and subnets sitting behind certain Overlay
       Indexes, e.g. GW IP or ESI.
    "e.g. GW IP or ESI or TS MAC"
[JORGE] OK, added.
       ... If
       no recursive resolution is needed, the core EVI may not be needed and
       the IP-VRFs may be connected directly by Ethernet or IP NVO tunnels.
    Even if the core EVI is needed, the tunnels are still Ethernet tunnels, right? Perhaps the last sentence should really be "... connected directly by tenant (non-core) EVIs"?
[JORGE] hmm... the IP-VRFs are connected by tunnels that can have an Ethernet payload (e.g. VXLAN) or an IP payload. I think adding “tenant EVIs” here to connect IP-VRFs and not TSes may be confusing. I prefer to leave it as it is.
       Depending on the existence and characteristics of the core-facing IRB
       interface in the core EVI, there are three different IP-VRF-to-IP-VRF
       scenarios identified and described in this document:
       1) Interface-less model
       2) Interface-ful with core-facing IRB model
       3) Interface-ful with unnumbered core-facing IRB model
    I once commented that the "interface-less" and "interface-ful" terms here are convoluted. They really come down to whether a core EVI and core-VRF IRBs are used. While I am not requesting to change the terms, it would be good to point out what they really mean. Proposed new text:
       Depending on the existence and characteristics of the core EVI and
       IRB interfaces for the core-VRFs, there are three different IP-VRF-to-IP-VRF
       scenarios identified and described in this document:
       1) Interface-less model: no core EVI, no overlay index.
       2) Interface-ful with core-VRF IRB model: core EVI, IP address as overlay index.
       3) Interface-ful with unnumbered core-VRF IRB model: core EVI, mac address as overlay index.
[JORGE] OK, I changed it.
    BTW, I would still prefer to rename the "core EVI" to "Supplemental BD" for two reasons:
    - The "core" wording is confusing/misleading, because all the EVIs go over the core.
    - The "core EVI" is really the same as the "Supplemental BD" in draft-lin.
    So why not take this opportunity to use the proper name?
[JORGE] I think you are right. I changed core EVI to SBD. I also changed all the EVI and MAC-VRFs in the diagrams to BDs, so that we make it more generic for the 3 different service models.
       d) The core EVI is composed of the NVE/DGW MAC-VRFs and may contain
          other MAC-VRFs without IRB interfaces. Those non-IRB MAC-VRFs will
          typically connect TSes that need layer-3 connectivity to remote
    Can you elaborate the "other MAC-VRFs w/o IRB interfaces"? I have two confusions here:
    - you already mention "NVE/DGW MAC-VRFs", so what are "other" MAC-VRFs?
    - If you want to say some MAC-VRFs do not have IRB interfaces, perhaps just say:
       d) The core EVI is composed of NVE/DGW MAC-VRFs w/ or w/o IRB interfaces.
    But how to get remote traffic to those NVEs w/o core-VRF IRBs using this model?
[JORGE] you are right that it is confusing. I removed that piece.
            o Label value SHOULD be zero since the RT-5 route requires a
              recursive lookup resolution to an RT-2 route. The MPLS label
              or VNI to be used when forwarding packets will be derived from
              the RT-2's MPLS Label1 field. The RT-5's Label field will be
              ignored on reception.
    Perhaps swap the last two sentences:
            o Label value SHOULD be zero since the RT-5 route requires a
              recursive lookup resolution to an RT-2 route. It is ignored on
              reception, and the MPLS label or VNI from the RT-2's MPLS
              Label1 field is used when forwarding packets.
[JORGE] ok, I modified it to improve readability.
    Section 5:
       c) Allows a flexible implementation where the prefix can be linked to
          different types of Overlay Indexes: overlay IP address, overlay
          MAC addresses, overlay ESI, underlay BGP next-hops, etc.
       c) Allows a flexible implementation where the prefix can be linked to
          different types of Overlay/Underlay Indexes: overlay IP address, overlay
          MAC addresses, overlay ESI, underlay BGP next-hops, etc.
[JORGE] ok, changed.