Re: [bess] Benjamin Kaduk's Discuss on draft-ietf-bess-evpn-inter-subnet-forwarding-09: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Fri, 30 October 2020 00:26 UTC
Date: Thu, 29 Oct 2020 17:26:10 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: "Ali Sajassi (sajassi)" <sajassi@cisco.com>
Cc: The IESG <iesg@ietf.org>, "draft-ietf-bess-evpn-inter-subnet-forwarding@ietf.org" <draft-ietf-bess-evpn-inter-subnet-forwarding@ietf.org>, "bess-chairs@ietf.org" <bess-chairs@ietf.org>, "bess@ietf.org" <bess@ietf.org>, Zhaohui Zhang <zzhang@juniper.net>
Message-ID: <20201030002610.GA39170@kduck.mit.edu>
References: <159476040701.14459.4825957938068100547@ietfa.amsl.com> <00556735-AFE3-4F2D-9280-6B3CC2348F22@cisco.com>
In-Reply-To: <00556735-AFE3-4F2D-9280-6B3CC2348F22@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/FZVblkeKUlBIXY-ztoME29h6RK0>
Hi Ali,

Sorry for the delayed response.  Comments inline.

On Thu, Sep 03, 2020 at 06:17:01AM +0000, Ali Sajassi (sajassi) wrote:
> Hi Ben,
> 
> Thanks very much for your review and comments. Please refer to my replies inline marked with [AS].
> 
> On 7/14/20, 2:00 PM, "Benjamin Kaduk via Datatracker" <noreply@ietf.org> wrote:
> 
>     Benjamin Kaduk has entered the following ballot position for
>     draft-ietf-bess-evpn-inter-subnet-forwarding-09: Discuss
> 
>     When responding, please keep the subject line intact and reply to all
>     email addresses included in the To and CC lines. (Feel free to cut this
>     introductory paragraph, however.)
> 
> 
>     Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
>     for more information about IESG DISCUSS and COMMENT positions.
> 
> 
>     The document, along with other ballot positions, can be found here:
>     https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-inter-subnet-forwarding/
> 
> 
> 
>     ----------------------------------------------------------------------
>     DISCUSS:
>     ----------------------------------------------------------------------
> 
>     (1) Possibly a "discuss discuss", but ...
>     if I'm understanding correctly, the symmetric IRB case over an Ethernet
>     NVO tunnel (not MPLS or IP NVO) described in this document is
>     introducing a new scenario where traffic using router (PE) MAC addresses
>     as source and destination is comingled on the same tunnel with traffic
>     using tenant system MAC addresses as source and destination.  This
>     places an obligation on the tunnel endpoints to properly isolate and
>     process such "internal" tunnel traffic without hampering the ability of
>     tenant systems to communicate.  In a world where tenant systems can
>     appear at any time, using previously unknown MAC addresses, this
>     represents a rather challenging problem: how will the PEs be able to
>     pick (and advertise) MAC addresses that they know will not conflict with
>     any present or future customer systems?  (A similar dilemma led to quite
>     a delay in the processing of draft-ietf-bfd-vxlan, which in that case
>     was resolved by limiting the BFD operation to just the "management VNI"
>     which is not subject to MAC address conflict with customer systems.)  In
>     this document's case, we seem to be using a "well-known"/reserved MAC
>     address range from RFC 5798; in principle, this should be enough to
>     avoid conflicts, if customer systems are known to not squat on this
>     reserved range.  However, this document has some text in Section 4.1
>     that indicates that there must be external knowledge that auto-derived
>     MAC addresses in the RFC 5798 ranges "[do] not collide with any customer
>     MAC address".  So I'm left uncertain whether this potential
>     problem is addressed or not.  (Also, I don't know if the limit on 255
>     distinct such MAC addresses presents a scaling issue.)
> 
> [AS] Because the MAC addresses based on RFC 5798 are from a reserved range, there should be no conflict with customer MAC addresses, which typically use their corresponding vendor OUI. If there were a conflict, then we would have the same issue with VRRP. However, this document doesn't want to give a blank check for auto-derivation of MAC addresses, and that's why it includes the statement about "... auto-derived MAC does not collide with any customer MAC address".

Okay, thanks for the extra explanation.

>     Also, is there any risk that the EVPN IRB setup might also want to use
>     VRRPv3, and thus collide on the MAC addresses in a different manner?
> 
> [AS] No, because the redundancy procedure in EVPN replaces the VRRP protocol.

Should we say something like that in the document ("because EVPN provides a
superset of VRRP functionality, VRRP MUST NOT be used when EVPN is used")?

>     (1.1) I'm less sure whether there's a similar collision risk for IP
>     addresses -- we assign IP addresses to NVEs (e.g., for use as BGP Next
>     Hop addresses) and these are used as VTEP addresses when encapsulating
>     packets that are going inter-subnet.  I think that at this point in the
>     packet processing we may already know that we're in the "IRB tunnel"
>     scope and that may be enough to de-conflict any potential IP address
>     collision between NVE and customer addresses.
> 
> [AS] customer IP addresses are from overlay-space; whereas, NVE addresses are from underlay space. Each customer has its own overlay address space that can overlap with other customers or underlay space but that doesn't cause an issue because they are kept in their own set of VRFs (its own set of routing tables).  

Thanks for confirming.

>     (2) Section 7 appears to reference (in a normative fashion)
>     [IRB-EXT-MOBILITY] but there is no such reference.
> 
> [AS] It is fixed now.
> 
>     Similarly (as Murray notes), there are a couple apparent references to
>     [TUNNEL-ENCAP] that are also arguably normative, but the target of the
>     reference is not forthcoming (and the IANA registry does not show a "BGP
>     Encapsulation Extended Community" that is supposedly defined by
>     [TUNNEL-ENCAP]).

Thanks for updating the references.  I want to ask again about the status
as informative vs normative, though -- it seems that we say things like
"The procedures in [EXTENDED-MOBILITY] must be exercised" and require the
use of the [TUNNEL-ENCAP] Extended Communit(ies) alongside the RT-2 EVPN
NLRI.  That seems to make them required reading in order to actually
implement/use this functionality.


> [AS]. "Encapsulation Extended Community" is in IANA registry: https://www.iana.org/assignments/bgp-extended-communities/bgp-extended-communities.xhtml. 

Oops, my mistake.  I can only assume that I searched for the string with
leading "BGP", which in hindsight doesn't make much sense.

> 
>     There is also not a listed reference for [EVPN-PREFIX], though one could
>     perhaps surmise that it is intended to be
>     [I-D.ietf-bess-evpn-prefix-advertisement] (given that the latter does
>     allocate EVPN route type 5 for carrying IP Prefix routes, etc.).
>     However, given that this document has text like "MUST advertise local
>     subnet routes as RT-5", this needs to be a normative (not informative)
>     reference.  (We may also want to explicitly reference [EVPN-PREFIX]
>     where those normative requirements are made.)
> 
> [AS] Agreed. Moved it to normative.
> 
>     (3) I'm not sure whether we are modifying the error-handling semantics
>     for RT-2 from what is specified in RFC 7432 (and, furthermore, whether
>     the changes are backwards-compatible).  If so, it seems like we may need
>     an Updates: relationship.  The text in question is in Section 9.1.1
>     (which itself seems problematic, since this section is advertised as an
>     (example) operational model/scenario):
> 
> [AS] We are not modifying the error-handling semantics for RT-2 in RFC 7432. Section 9.1.1 is about IRB and RFC 7432 does not cover IRB - it is for pure L2.  

I guess I can see how anything attempting IRB could be an
error/treat-as-withdraw for a pure 7432 implementation.  But I think the
underlying point remains that we are changing the semantics for the RT-2
EVPN NLRI structure -- we assign semantics to the Label2 field, and we
allow Label1/Label2 to be VNI values in addition to MPLS labels.  Shouldn't
this be indicated somehow, whether an Updates: 7432 relationship, or an
additional reference for the EVPN Route Types registry entry?

>        If the receiving NVE receives a RT-2 with only Label-1 and only a
>        single Route Target corresponding to IP-VRF, or if it receives a RT-2
>        with only a single Route Target corresponding to MAC-VRF but with
>        both Label-1 and Label-2, or if it receives a RT-2 with MAC Address
>        Length of zero, then it MUST treat the route as withdraw [RFC7606]
>        and log an error message.
> 
>     Are all of these treat-as-withdraw behaviors specified in RFC 7432?
> 
> [AS] No, because they are related to malformed IRB advertisements; whereas, RFC 7432 is for pure L2.
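To make the three treat-as-withdraw conditions quoted from Section 9.1.1 concrete, here is a minimal sketch in Python.  The `Rt2Route` type and its boolean fields are hypothetical simplifications (a real implementation would inspect the NLRI and the attached Route Targets); only the three conditions themselves come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rt2Route:
    """Hypothetical, simplified view of a received RT-2 (MAC/IP Advertisement)."""
    mac_len_bits: int      # MAC Address Length field
    has_label1: bool       # Label-1 present
    has_label2: bool       # Label-2 present
    has_mac_vrf_rt: bool   # a Route Target matching a MAC-VRF is attached
    has_ip_vrf_rt: bool    # a Route Target matching an IP-VRF is attached


def must_treat_as_withdraw(route: Rt2Route) -> bool:
    """Return True if any of the three quoted conditions applies."""
    only_label1 = route.has_label1 and not route.has_label2
    both_labels = route.has_label1 and route.has_label2
    only_ip_vrf_rt = route.has_ip_vrf_rt and not route.has_mac_vrf_rt
    only_mac_vrf_rt = route.has_mac_vrf_rt and not route.has_ip_vrf_rt
    return (
        (only_label1 and only_ip_vrf_rt)       # Label-1 only, IP-VRF RT only
        or (only_mac_vrf_rt and both_labels)   # MAC-VRF RT only, both labels
        or route.mac_len_bits == 0             # MAC Address Length of zero
    )
```

A route with both labels and both Route Targets, and a non-zero MAC Address Length, is not flagged by this check.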
> 
>     (4) Let's discuss whether we need more generally a "backwards
>     compatibility" section (I mention a couple other places where this might
>     come into play, in the COMMENT).

Part of why I asked about this was because the behavior of a system
containing a pure 7432 node and a node that attempts to use IRB
functionality was not clear to me.  The above discussion suggests that IRB
will just fail to operate for the legacy node, as opposed to some kind of
degraded L2-only service, but the discussion below suggests that the
L2-only service continues to work.  It seems like it would be worth
mentioning which of these cases would apply, for a mixed deployment.

> 
>     ----------------------------------------------------------------------
>     COMMENT:
>     ----------------------------------------------------------------------
> 
>     The shepherd writeup links to the IPR search page, which finds a
>     disclosure, yet also claims "no discussions".  In the absence of
>     explicit discussion, it's not clear that the WG actively concluded that
>     it's appropriate to publish despite the potential encumbrance, though
>     I'm told that this is fairly common in the routing area (hence only a COMMENT).
> 
>     Please use the BCP 14 boilerplate as given in RFC 8174 (i.e., including
>     "NOT RECOMMENDED" as a keyword).
> 
> [AS] done.
> 
>     I'm not entirely sure that the mobility procedures are fully specified
>     for producing correct operation, but I'm not confident enough in my
>     understanding of the setup to ballot Discuss over it.  I have lots of
>     inline comments for Section 7 (and subsections).
> 
>     Section 2
> 
>        EVPN [RFC7432] provides an extensible and flexible multi-homing VPN
>        solution over an MPLS/IP network for intra-subnet connectivity among
>        Tenant Systems (TSes) and End Devices that can be physical or
>        virtual; where an IP subnet is represented by an EVI for a VLAN-based
>        service or by an (EVI, VLAN) for a VLAN-aware bundle service.
> 
>     In the terminology section (Broadcast Domain definition) we considered
>     three classes of model: VLAN-bundle, VLAN-based, and VLAN-aware.  Should
>     we mention VLAN-bundle here as well?
> 
> [AS] Nice observation! But when talking about intra-subnet forwarding, a subnet is mapped to a VLAN, that's why we mention VLAN-based and VLAN-aware bundle service.

Ah, thanks for the explanation.

>        The inter-subnet communication is traditionally achieved at
>        centralized L3 Gateway (L3GW) devices where all the inter-subnet
>        forwarding are performed and all the inter-subnet communication
>        policies are enforced.  When two TSes belonging to two different
>     [...]
>        In order to overcome the drawback of centralized layer-3 GW approach,
>        IRB functionality is needed on the PEs (also referred to as EVPN
>        NVEs) attached to TSes in order to avoid inefficient forwarding of
>        tenant traffic (i.e., avoid back-hauling and hair-pinning).  When a
> 
>     It feels like there's a bit of jargon in here that wasn't captured in
>     the terminology section (or references).  None of it seems particularly
>     inscrutable to me, personally, but perhaps it's worth another thought.
> 
>        PE with IRB capability receives tenant traffic over an Attachment
>        Circuit (AC), it can not only locally bridge the tenant intra-subnet
>        traffic but also can locally route the tenant inter-subnet traffic on
>        a packet by packet basis thus meeting the requirements for both intra
>        and inter-subnet forwarding and avoiding non-optimum traffic
>        forwarding associated with centralized layer-3 GW approach.
> 
>     nit(?): maybe "avoiding the non-optimal traffic forwarding"?
> 
> [AS] OK. Changed "non-optimum" to "non-optimal".
> 
>     Section 3
> 
>        BD).  If service interfaces for an EVPN PE are configured in VLAN-
>        Based mode (i.e., section 6.1 of RFC7432), then there is only a
>        single BT per MAC-VRF (per EVI) - i.e., there is only one tenant VLAN
>        per EVI.  However, if service interfaces for an EVPN PE are
>        configured in VLAN-Aware Bundle mode (i.e., section 6.3 of RFC7432),
>        then there are several BTs per MAC-VRF (per EVI) - i.e., there are
>        several tenant VLANs per EVI.
> 
>     [Is VLAN-bundle excluded again here?  No need to repeat the response,
>     just noting it as similar.]
> 
> [AS] It is intentionally excluded because in VLAN-bundle, we cannot identify individual VLAN (e.g., it is not VLAN-aware).
> 
>     Section 4
> 
>     I think Figures 2 and 3 would benefit from some visual indication that
>     the "top half" is in "IP space" but the "bottom half" is in "Ethernet
>     space".
> 
>        In symmetric IRB as shown in figure-2, the inter-subnet forwarding
>        between two PEs is done between their associated IP-VRFs.  Therefore,
>        the tunnel connecting these IP-VRFs can be either IP-only tunnel (in
>        case of MPLS or GENEVE encapsulation) or Ethernet NVO tunnel (in case
>        of VxLAN encapsulation).  [...]
> 
>     [editorial]: there's nothing forbidding an Ethernet tunnel for this case
>     when using MPLS or GENEVE, right, it's just inefficient?  This text
>     could be read as saying that it's forbidden.
> 
> [AS] Correct! I added "e.g.," in () so that the readers know that these encaps are examples. 
> 
>        to each BT via its associated IRB interface.  Each BT on a PE is
>        associated with a unique VLAN (e.g., with a BD) where in turn is
>        associated with a single MAC-VRF in case of VLAN-Based mode or a
>        number of BTs can be associated with a single MAC-VRF in case of
>        VLAN-Aware Bundle mode.  Whether the service interface on a PE is
> 
>     nit: missing "it", probably for "where in turn it is associated".
> 
> [AS] added "it".
> 
>        VLAN-Based or VLAN-Aware Bundle mode does not impact the IRB
>        operation and procedures.  It mainly impacts the setting of Ethernet
> 
>     [noting another non-mention of VLAN-bundle mode]
> 
> [AS] same explanation as before.
> 
>        tag field in EVPN BGP routes as described in [RFC7432].
> 
>     Perhaps this is clear to most people, but I think I would benefit from a
>     section reference within 7432 or another sentence indicating how this
>     works.
> 
> [AS] added section number.
> 
>     Figure 4 has a line for "Mx/IPx" in PE1 but no corresponding line in
>     PE2.  Since the IP address of the VRF is anycast, we can presume that
>     PE2 also uses IPx, but I'm not entirely sure what the "Mx" represents
>     for PE1 and what the corresponding behavior for PE2 would be.
> 
> [AS] Mx corresponds to MACx. Because there is no room in the figure to spell out MAC, M is used instead. Since we are using M1 and M2 to correspond to MAC1 and MAC2 for TS1 and TS2 respectively in the figure, Mx follows that convention. I changed "Mx/IPx" to "IPx/Mx" to be consistent with the "IP1/M1" and "IP2/M2" notations.

Thanks, both for the update and the explanation.

>     Section 4.1
> 
>        It is worth noting that if the applications that are running on the
>        TSes are employing or relying on any form of MAC security, then
>        either the first model (i.e. using anycast MAC address) should be
>        used to ensure that the applications receive traffic from the same
>        IRB interface MAC address that they are sending to, or if the second
>        model is used, then the IRB interface MAC address MUST be the one
>        used in the initial ARP reply or ND Neighbor Advertisement (NA)for
>        that TS.
> 
>     nit: the "either" here seems redundant.  There are only two choices, so
>     to say "either do (1) or, if you do (2), also do this" is equivalent to
>     just saying "if you do (2), also do this", optionally with a note that
>     (1) also works and is simpler.  E.g.,
> 
> [AS] Done.
> 
>     % It is worth noting that if the applications that are running on the
>     % TSes are employing or relying on any form of MAC security, then when
>     % using the second model (using per-PE MAC addresses), the IRB interface
>     % MAC address MUST be the one used in the initial ARP reply or ND
>     % Neighbor Advertisement (NA) for that TS.
> 
>        Although both of these options are equally applicable to both
>        [...]
>        the VLANs for that tenant.  Furthermore, it simplifies the operation
>        as there is no need for Default Gateway extended community
>        advertisement and its associated MAC aliasing procedure.  Yet another
>        advantage is that following host mobility, the host does not need to
>        refresh the default GW ARP/ND entry.
> 
>     We're really promoting option-1 here, almost to the extent that we
>     should ask if option-2 needs to be mentioned at all.  The only benefit I
>     can think of so far is that for option-1 you have to share MACsec keys
>     all over the place, which option-2 avoids (to be fair, this is a
>     significant advantage in some cases); are there other benefits?
> 
> [AS] that's about it.
> 
>        Where the last octet is generated based on a configurable Virtual
>        Router ID (VRID, range 1-255)).  If not explicitly configured, the
>        default value for the VRID octet is '01'.  Auto-derivation of the
> 
>     We could perhaps not mix decimal and (presumed) hex in the same
>     paragraph.
> 
> [AS] changed "01" to "1".
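For readers following the VRID discussion, here is a small sketch of the auto-derivation in Python.  It assumes the RFC 5798 virtual router MAC ranges (00-00-5E-00-01-{VRID} for IPv4 and 00-00-5E-00-02-{VRID} for IPv6); the function names and the collision helper are hypothetical and only illustrate the "does not collide with any customer MAC address" caveat.

```python
from typing import Iterable


def anycast_irb_mac(vrid: int = 1, ipv6: bool = False) -> str:
    """Derive an anycast MAC from a VRID (range 1-255, default 1).

    Assumes the RFC 5798 ranges: 00-00-5E-00-01-{VRID} for IPv4 and
    00-00-5E-00-02-{VRID} for IPv6.
    """
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    family_octet = 0x02 if ipv6 else 0x01
    octets = (0x00, 0x00, 0x5E, 0x00, family_octet, vrid)
    return "-".join(f"{octet:02X}" for octet in octets)


def collides_with_customer(mac: str, customer_macs: Iterable[str]) -> bool:
    """Hypothetical check against the customer MACs known at this moment.

    It cannot rule out a customer MAC that appears later.
    """
    known = {m.replace(":", "-").upper() for m in customer_macs}
    return mac.upper() in known
```

With the default VRID of 1, `anycast_irb_mac()` yields `00-00-5E-00-01-01`.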
> 
>        anycast MAC can only be used if there is certainty that the auto-
>        derived MAC does not collide with any customer MAC address.
> 
>     Per the Discuss point, how might such certainty be obtained?  I note
>     that there does not seem to be anything preventing new customer MAC
>     addresses from appearing at any time.  (It also seems like this risk of
>     collision would hold however the PE MAC addresses are selected?)
> 
> [AS] explained previously. 
> 
>        Irrespective of using only the anycast address or both anycast and
>        non-anycast addresses on the same IRB, when a TS sends an ARP request
>        or ND Neighbor Solicitation (NS) to the PE that is attached to, the
>        request is sent for the anycast IP address of the IRB interface
>        associated with the TS's subnet and the reply will use anycast MAC
>        address (in both Source MAC in the Ethernet header and Sender
>        hardware address in the payload).  [...]
> 
>     nit(?): I don't really see how the declarative "the [ARP or NS] request
>     is sent for the anycast IP address" can place such a restriction on the
>     TS behavior; are we intending to instead say that "when a [ARP or NS]
>     request is sent for the anycast IP address of [the IRB interface], the
>     reply will use the anycast MAC address", with a more clear cause/effect
>     relationship?
> 
> [AS] changed it from " the reply will use anycast MAC..." to "then the reply will use anycast MAC..."
> 
>        Traffic routed from IP-VRF1 to TS1 SHOULD use the anycast MAC address
>        as source MAC address.
> 
>     When would this SHOULD be violated?
> 
> [AS] changed "SHOULD use" to "uses".
> 
>     Section 5.1
> 
>        When a PE (e.g., PE1 in figure 4 above) learns MAC and IP address of
>        a TS (via an ARP request), it adds the MAC address to the
> 
>     nit: in Section 6.1 this parenthetical is "e.g., via an ARP request".
> 
> [AS] Added "e.g.," here for consistency.  
> 
>        o  Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet
>           Tag ID, MAC Address Length, MAC Address, IP Address Length, IP
>           Address, and MPLS Label1 fields MUST be set per [RFC7432] and
>           [RFC8365].
> 
>     (side note) I would have an easier time understanding what is needed if
>     we phrased this in terms of "the same values as used for [some existing
>     advertisement]", but I'm also not in the target audience for this
>     document, so maybe that is not a good idea.
> 
> [AS] We are following the same language as RFC7432 and other EVPN RFCs here.
> 
>        o  The MPLS Label2 field is set to either an MPLS label or a VNI
>           corresponding to the tenant's IP-VRF.  In case of an MPLS label,
>           this field is encoded as 3 octets, where the high-order 20 bits
>           contain the label value.
> 
>     How do I choose between MPLS label and VNI?
> 
> [AS] The route is sent with an Extended Community that specifies encapsulation type. If the encap type is VxLAN, then the Label2 field is a VNI as described in RFC 8365.

It seems so obvious now that you said it; thanks :)
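As an illustration of the Label2 encoding just discussed, here is a minimal Python sketch.  The MPLS case follows the quoted text (3 octets, label in the high-order 20 bits); treating the VNI as filling all 24 bits, and zeroing the low-order 4 bits in the MPLS case, are assumptions of this sketch.  The `is_vni` flag stands in for the encapsulation type learned from the Extended Community.

```python
def encode_label2(value: int, is_vni: bool) -> bytes:
    """Encode a Label2 field as 3 octets.

    is_vni=True  -> value is a VNI, assumed to fill all 24 bits.
    is_vni=False -> value is an MPLS label in the high-order 20 bits;
                    the low-order 4 bits are left as zero in this sketch.
    """
    if is_vni:
        if not 0 <= value < 2**24:
            raise ValueError("VNI must fit in 24 bits")
        return value.to_bytes(3, "big")
    if not 0 <= value < 2**20:
        raise ValueError("MPLS label must fit in 20 bits")
    return (value << 4).to_bytes(3, "big")


def decode_label2(field: bytes, is_vni: bool) -> int:
    """Inverse of encode_label2; the caller supplies the encapsulation type."""
    if len(field) != 3:
        raise ValueError("Label2 field must be exactly 3 octets")
    raw = int.from_bytes(field, "big")
    return raw if is_vni else raw >> 4
```

The same three octets decode to different values depending on `is_vni`, which is why the receiver needs the encapsulation type from the route.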

>     Section 5.2
> 
>        The inclusion of MPLS label2 field in this route signals to the
>        receiving PE that this route is for symmetric IRB mode and MPLS
>        label2 needs to be installed in forwarding path to identify the
>        corresponding IP-VRF.
> 
>     How does the receiving PE know whether the label2 field is a label or a
>     VNI?  (Maybe a section reference to RFC 8365 would be useful, not
>     necessarily right here, but perhaps from earlier in this document.)
> 
> [AS] Please refer to the previous comment.
> 
>        If the receiving PE receives this route with both the MAC-VRF and IP-
>        VRF route targets but the MAC/IP Advertisement route does not include
>        MPLS label2 field and if the receiving PE supports asymmetric IRB
>        mode, then the receiving PE installs the MAC address in the
>        corresponding MAC-VRF and (IP, MAC) association in the ARP table for
>        that tenant (identified by the corresponding IP-VRF route target).
> 
>     [a little surprising to see this given that Section 5 is about
>     "Symmetric IRB Procedures", but I guess it makes sense to say something,
>     so I shouldn't quibble about what form that something takes.]
> 
> [AS] (

(I assume there wasn't supposed to be anything earth-shattering here, since
I was basically just making a side note.)

>        If the receiving PE receives this route with both the MAC-VRF and IP-
>        VRF route targets and if the receiving PE does not support either
>        asymmetric or symmetric IRB modes, then if it has the corresponding
>        MAC-VRF, it only imports the MAC address; otherwise, if it doesn't
>        have the corresponding MAC-VRF, it MUST not import this route.
> 
>     I can't quite tell if this MUST NOT (nit: "NOT" should be capital, too)
>     is just re-expressing the preexisting behavior for non-IRB EVPN behavior
>     or if it's a new normative requirement.
> 
> [AS] changed it to "must not" as it is not a new behavior for L2-only PEs. 
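The two Section 5.2 paragraphs quoted above can be read as a small decision procedure for a route carrying both the MAC-VRF and IP-VRF route targets.  The Python sketch below covers only those two cases; the function name, parameters, and action strings are hypothetical, and any other combination is deliberately left unhandled because the quoted text does not describe it.

```python
from typing import List


def rt2_import_actions(
    has_label2: bool,
    supports_symmetric: bool,
    supports_asymmetric: bool,
    has_mac_vrf: bool,
) -> List[str]:
    """Actions for an RT-2 received with both MAC-VRF and IP-VRF route targets."""
    if not supports_symmetric and not supports_asymmetric:
        # L2-only PE: import only the MAC address, and only if the
        # corresponding MAC-VRF exists; otherwise do not import the route.
        return ["import MAC into MAC-VRF"] if has_mac_vrf else []
    if not has_label2 and supports_asymmetric:
        # Asymmetric IRB handling: MAC into the MAC-VRF, and the (IP, MAC)
        # association into the tenant's ARP table.
        return ["install MAC in MAC-VRF", "install (IP, MAC) in ARP table"]
    raise NotImplementedError("case not covered by the quoted paragraphs")
```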
> 
>     Section 5.3
> 
>        only if that PE has locally attached hosts in that subnet.  In order
>        to enable inter-subnet routing across PEs in a deployment where all
>        subnets are not provisioned at all PEs participating in an EVPN IRB
>        instance, PEs MUST advertise local subnet routes as RT-5.  These
> 
>     nit: "where all subnets are not provisioned at all PEs participating"
>     has a slightly different meaning than "where not all subnets are
>     provisioned at all PEs participating".  I suspect the latter is what's
>     intended, though I am not 100% sure.
> 
> [AS] changed it to the latter one as it reads better for the intended purpose.
> 
>        Consider a subnet A that is locally attached to PE1 and subnet B that
>        is locally attached to PE2 and to PE3.  Host A in subnet A, that is
>        attached to PE1 initiates a data packet destined to host B in subnet
>        B that is attached to PE3.  If host B's (MAC, IP) has not yet been
> 
>     Could we perhaps use different names/symbols for subnets and hosts
>     within them?
> 
> [AS] Since we are talking about two subnets A & B and only two hosts (one in each subnet), I think the current naming is easy to follow - i.e., host A belongs to subnet A and host B belongs to subnet B.

Okay, it's your call.

>        subnet B locally attached.  Once the packet is received at PE2 OR
>        PE3, and the route lookup yields a glean result, an ARP request is
>        triggered and flooded across the layer-2 overlay.  This ARP request
>        would be received and replied to by host B, resulting in host B (MAC,
>        IP) learning at PE3, and its advertisement across the EVPN network.
> 
>     (side note) When I first read this, the "layer-2 overlay" (vs. "layer-3
>     overlay") confused me, but I think I figured it out.  To check: the ARP
>     request on the layer-2 overlay is needed in order to cover all of subnet
>     B (and thus, to meet the criterion that the packet from PE1 has to be
>     able to be processed whether it arrives at PE2 or PE3).  The ARP request
>     triggers the PE to which host B is attached in order to advertise the
>     EVPN route over BGP, so that PE1 learns about host B's (MAC, IP) and can
>     send directly to PE3 in the future.
> 
> [AS] Exactly! I am impressed, as this section and the concept described within it are among the more complex ones.
> 
>     Section 5.4
> 
>        routed.  Hence, the ingress PE performs an IP lookup in the
>        associated IP-VRF table.  The lookup identifies BGP next hop of
> 
>     (This is a lookup of the destination IP address, right?)
> 
> [AS] correct!
> 
>     Also, nit: "the BGP next hop" (and "the egress PE" going into the next
>     line).
> 
> [AS] It looks OK in my ".txt"
> 
>        label is set to the received label2 in the route.  In case of
>        Ethernet NVO tunnel type, VNI may be set one of two ways:
>        [...]
>        PE's may be configured to operate in one of these two modes depending
>        on the administrative domain boundaries across PEs participating in
>        the NVO, and PE's capability to support downstream VNI mode.
> 
>     Is it safe to mix these two ways in a single deployment, or does the
>     operator need to be careful to provide homogeneous configuration?
> 
> [AS] It can be mixed although it is not typical. 

Okay, so we don't need to put a warning in the document, then.

>     Section 6.2
> 
>        o  Using MAC-VRF route target, it identifies the corresponding MAC-
> 
>     Can a non-zero Ethernet Tag also come into play here (as was the case
>     for Symmetric operation)?
> 
> [AS] yes, Ethernet Tag in here can be either zero or non-zero.

Okay.  (I was obliquely suggesting that the phrasing "Using MAC-VRF route
target (and Ethernet Tag if different from zero)" could be used, which would
closely match the language in Section 5.2.  This is not required by any
means, but can be helpful to the reader in giving a uniform/parallel
structure to the document for portions that are similar to each other.)

>           An implementation may choose to consolidate the lookup at the
>           ingress PE's IP-VRF with the lookup at the ingress PE's
>           destination subnet MAC-VRF.  Consideration for such consolidation
>           of lookups is an implementation exercise and thus its
>           specification is outside the scope of this document.
> 
>     (This feels a little redundant with the earlier text in Section 4 (where
>     we described the operation with "collapsed" vs. "consolidate" here).)
> 
> [AS] Agreed. Removed this couple of sentences. 
> 
>        If the receiving PE receives the MAC/IP Advertisement route with MPLS
>        label2 field and it can support symmetric IRB mode, then it should
>        use the MAC-VRF route target to identify its corresponding MAC-VRF
>        table and import the MAC address.  It should use the IP-VRF route
>        target to identify the corresponding IP-VRF table and import the IP
>        address, as specified in symmetric IRB handling.  It MUST NOT import
>        (IP, MAC) association into its ARP table.
> 
>     I don't think I understand why this is a MUST NOT, especially since we
>     do store the (IP, MAC) association in the ARP table in the previous
>     paragraph's case.
> 
> [AS] That's the contrast between asymmetric and symmetric IRB. In asymmetric IRB (previous paragraph), <IP,MAC> association is stored in ARP table; however, in symmetric IRB, this association need not be stored in ARP table. That's why symmetric IRB scales much better than asymmetric IRB. 

Thanks for explaining that symmetric is better/preferred, it helps me
understand the intent here.  That said, this paragraph ~starts with "if ...
it can *support* symmetric IRB mode" (emphasis mine).  It seems like
something of a mismatch to have a MUST NOT use attached only to a "can
support" predicate -- I would have expected the predicate to be "is using
symmetric IRB mode" rather than "can support".

>     Section 7
> 
>     The mobility procedures in Section 15 of RFC 7432 on paper don't have
>     very great security properties -- at any remote signal indicating a
>     mobility event for a given MAC address, a PE advertising reachability to
>     that MAC will withdraw the route.  (Though, this is probably something
>     of a "the entire system is trusted" effect.)  Perhaps local learning
>     would cause it to be reintroduced if the MAC in question had not actually
>     moved away, but there seems to be some risk of flapping and triggering
>     the MAC Duplication detection. 
> 
> [AS] That's why we have MAC duplication detection mechanism and sequence numbering for mobility procedure in case the host moves too fast.

Do you think it is worth mentioning the risk and mitigation in the security
considerations of this document?  ("No" is an okay answer, as it does not
really seem to be a new risk with this document, though of course I
wouldn't mention it if I thought that "no" was the clear/correct answer.)

>     The concrete advice we give in Section
>     7.1 to send a local ARP probe is good, but how rigid does the sequencing
>     need to be amongst (receive EVPN MAC/IP Advertisement, send local ARP
>     probe/wait for response, and withdraw EVPN MAC/IP Advertisement)?  If
>     there was a way to avoid the need to withdraw+readvertise step, it seems
>     like that might be preferable.
> 
> [AS] If the reply to the local ARP probe is positive, then the source PE doesn't withdraw the MAC/IP but rather readvertises it with a higher sequence number and performs MAC duplication detection.

The current text does not give me that impression.  I would prefer if we
could reword it somehow to clarify, perhaps "It then sends an ARP probe
locally to ensure that the MAC is gone, and withdraws the EVPN MAC/IP
Advertisement route upon confirmation that the MAC is gone".
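The sequencing described in this exchange could be sketched as follows in Python.  This is only an illustration of the behavior Ali describes, not text from the draft: the function name and the `arp_probe` callback are hypothetical, and using the received sequence number plus one as the "higher sequence number" is an assumption.

```python
from typing import Callable, Optional, Tuple


def on_remote_mac_ip_advertisement(
    remote_seq: int,
    arp_probe: Callable[[], bool],
) -> Tuple[str, Optional[int]]:
    """Source-PE reaction to a remote MAC/IP Advertisement for a local host.

    arp_probe() sends a local ARP probe and returns True if the host answers.
    """
    if arp_probe():
        # Host is still local: do not withdraw.  Readvertise with a higher
        # sequence number; MAC duplication detection would also run here.
        return ("readvertise", remote_seq + 1)
    # No answer: the MAC is confirmed gone, so withdraw the local route.
    return ("withdraw", None)
```

In this sketch the withdraw happens only after the probe confirms that the host has left, which is the ordering the proposed rewording is meant to make explicit.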

>     Can we confirm that the backwards- and forwards-compatibility story is
>     okay for mixed deployments with "stock EVPN" vs. IRB-enhanced EVPN (this
>     document)?  It looks like we are using the MAC Mobility Extended
>     Community to signal both MAC and IP mobility events, but I am not sure
>     that I can reason through all the cases and confirm that the right thing
>     happens.
> 
> [AS] Since in this document for a host there is one MAC address and one global IP address (in the context of the customer's IP address space), the MAC and IP mobility is one and the same. There is another draft that covers many MACs to one IP address binding as well as many IP addresses to a single MAC binding and the mobility procedures for them.

Okay, so it sounds like we are backwards-compatible in that a non-updated
node will still process the MAC bit and properly ignore the additional IP
information that happens to be present, good.

>     Also, do all NVEs need to be prepared to cope with all three classes of
>     TS behavior at all times?  That seems like something worth stating more
>     clearly.
> 
> [AS]. Sure, I added a sentence to that effect.
> 
>     Section 7.2
> 
>        If EVPN-IRB NVEs are configured to advertise MAC-only routes in
>        addition to MAC-and-IP EVPN routes, then the following steps are
>        taken:
> 
>     Does this configuration need to be globally consistent across all PEs?
>     If so, where do we state this requirement?
> 
> [AS] It is NVE specific - i.e., some NVE can be operating in L2-only mode and thus advertising MAC-only routes; whereas, some other NVEs are configured in IRB mode and thus advertising both routes.

Ah, I think I see now.  Thanks.

>        o  The target NVE upon learning this MAC address in data-plane,
>           updates this MAC address entry in the corresponding MAC-VRF with
>           the local adjacency information (e.g., local interface).  It also
>           recognizes that this MAC has moved and initiates MAC mobility
>           procedures per [RFC7432] by advertising an EVPN MAC/IP
>           Advertisement route with only the MAC address filled in along with
>           MAC Mobility Extended Community with the sequence number
>           incremented by one.
> 
>     The lead-in had "in addition to MAC-and-IP EVPN routes", but this
>     step only does the MAC EVPN route.  It looks like the MAC-and-IP route
>     will be added later, in the third step; would it make sense to switch
>     this to a numbered list and mention that the additional route will be
>     added in step 3?
> 
> [AS] The MAC-and-IP EVPN route that gets sent in the step 3 is the result of ARP request sent in step 2. So, the sequence of the events described in these bullets are good and step-3 cannot be mentioned in step-1 without explaining how it happens (which is step-2).

Thank you for checking.

>           advertised such route previously).  Furthermore, it searches for
>           the corresponding MAC-IP entry and sends an ARP probe for this
>           (MAC,IP) pair.  The ARP request message is sent both locally to
>           all attached TSes in that subnet as well as it is sent to other
>           NVEs participating in that subnet including the target NVE.  Note
> 
>     I think a comment about the sequencing of this ARP request, similar to
>     my comment above, may apply.  But here the ARP request is sent both
>     locally and remotely, whereas previously it was only needed locally.  I
>     confess I don't understand what the difference is between the cases that
>     makes it okay for the Section-7.1 scenario to only send the ARP probe
>     locally.
> 
>     Also, what happens if we get a response back from the local TSes (in
>     addition to the actions from the target NVE covered in the next bullet
>     point)?
> 
> [AS] In section 7.1 the reason the source NVE sends ARP request to only locally attached TSes is because the target NVE has already received an ARP message (gratuitous ARP); however, in section 7.2, the target NVE hasn't received any ARP messages. Therefore, the source NVE sends the ARP request both locally and remotely so that it triggers an ARP response message by moved TS to be received by the target NVE.

Thanks for the explanation of why the remote ARP request is needed.
Could we consider rewording here (as well) to avoid implying that the
withdrawal occurs even when the ARP probe receives a local response and the
NVE will be re-advertising the route?  (It seems a bit more complicated to
write something for this case, so I don't have a concrete suggestion right
at hand.)

>           that the PE would need to maintain a correlation between MAC and
>           MAC-IP route entries in the MAC-VRF to accomplish this.
> 
>     I find this "would need to" language very concerning.  If this is
>     required for correct operation, isn't it a MUST?
> 
> [AS] changed "would need" to "needs".
> 
>        o  All other remote NVE devices upon receiving the MAC/IP
>           advertisement route with MAC Mobility extended community compare
> 
>     Does this only happen when they receive the EVPN MAC/IP Advertisement
>     route with both MAC and IP address information (vs. the earlier one with
>     just MAC information)?
> 
> [AS] it happens with both advertisements. 
> 
>        If EVPN-IRB NVEs are configured not to advertise MAC-only routes,
>        then upon receiving the first data packet, it learns the MAC address
> 
>     (editorial) The description for the MAC,MAC+IP case is quite involved and
>     has many separate steps, but the MAC+IP-only case in this paragraph is
>     much simpler, and so the description for it does not have a parallel
>     structure to the MAC,MAC+IP case.  Perhaps splitting into subsections
>     might help make it clear that, despite the difference in complexity, the
>     procedures in question are actually quite parallel in the nature of what
>     role they play.
> 
> [AS] thanks for the suggestion but the vendors who have implemented these procedures are OK with the text.

Okay.

>     Section 7.3
> 
>        On the source NVE, an age-out timer (for the silent host that has
>        moved) is used to trigger an ARP probe.  This age-out timer can be
>        either ARP timer or MAC age-out timer and this is an implementation
>        choice.  The ARP request gets sent both locally to all the attached
>        TSes on that subnet as well as it gets sent to all the remote NVEs
>        (including the target NVE) participating in that subnet.  The source
>        NVE also withdraw the EVPN MAC/IP Advertisement route with only the
>        MAC address (if it has previously advertised such a route).
> 
>     Does this route withdraw occur at the same time that the ARP probe is
>     sent, or only if there is not a response?
> 
> [AS] The withdraw happens upon age-out timer expiration.

Sticking to the theme of my past few comments, I think it would be good to
state this clearly in the document.

>     Also, does this need for a timer-driven full broadcast ARP probe impose
>     any scaling limits on the number of subnets that can be joined via IRB
>     for intra-subnet traffic?  (I don't expect there to be fundamentally
>     different scaling properties than for more typical ARP usage, but it's a
>     question about whether the joining of subnets will cause the limit to be
>     reached before other scaling limits would be reached.)
> 
> [AS] The ARP probe is sent on a single subnet corresponding to the MAC-VRF for that TS.
> 
>        The target NVE passes the ARP request to its locally attached TSes
>        and when it receives the ARP response, it updates its MAC-VRF, IP-
>        VRF, and ARP table with the host (MAC, IP) and local adjacency
>        information (e.g., local interface).  It also sends an EVPN MAC/IP
>        advertisement route with both the MAC and IP address fields filled in
>        along with MAC Mobility Extended Community with the sequence number
>        incremented by one.
> 
>     Do we need to mention the possibility of an EVPN MAC/IP Advertisement
>     route with only the MAC address here, as well?
> 
> [AS] no, MAC-only advertisement is the result of data-plane learning a MAC address; whereas, MAC+IP advertisement is the result of learning both MAC and IP addresses when the NVE receives an ARP message from a TS.
> 
>        All other remote NVE devices upon receiving the MAC/IP Advertisement
>        route with MAC Mobility extended community compare the sequence
> 
>     [this is the one with both MAC and IP filled in, right?]
> 
> [AS] correct.
> 
>     Section 8.1
> 
>        This extended community is used to carry the PE's MAC address for
>        symmetric IRB scenarios and it is sent with RT-2.
> 
>     It's only needed for specifically the Ethernet NVO tunnel (not MPLS or
>     IP-only NVO) symmetric IRB scenarios, right?
> 
> [AS] That's correct.

Perhaps we should say "for Ethernet NVO symmetric IRB scenarios", then?

>     Section 9
> 
>     I see that we only give examples/operational models for the symmetric
>     IRB case and for scenarios with IP subnets behind tenant systems.  Is
>     there any desire to give a similar example for asymmetric IRB?
> 
> [AS] symmetric IRB is a more general case. Once it is covered, one can deduce the scenarios for asymmetric IRB. Besides, these are just examples.

Okay, thank you for the explanation.

>     Section 9, 9.1
> 
>        procedures.  In the following scenarios, without loss of generality,
>        it is assumed that a given tenant is represented by a single IP-VPN
>        instance.  [...]
>        In this scenario, without loss of generality, it is assumed that NVEs
>        operate in VLAN-based service interface mode with one Bridge
>        Table (BT) per MAC-VRF.  [...]
> 
>     Just to check my understanding: the "more general" case would have,
>     e.g., multiple IP-VPN instances for a single tenant, or ... multiple
>     BTs per MAC-VRF?  
> 
> [AS] Correct!
> 
>     I guess for the former case the IP-VPN instances would
>     be logically separate (absent some central L3GW), but I'm not really
>     sure what the latter case would look like.  Though maybe the VLAN-aware
>     bundling case in the following sentences (with multiple BTs per MAC-VRF)
>     is exactly the generalization being assumed here?
> 
> [AS] Exactly! VLAN-aware bundling case is the case for multiple BTs per MAC-VRF
> 
>     Section 9.1.1
> 
>        o  If the route carries the new Router's MAC Extended Community, and
>           if the receiving NVE uses Ethernet NVO tunnel, then the receiving
>           NVE imports the IP address into IP-VRF with NVE's MAC address
> 
>     What does the receiving NVE do if it uses Ethernet NVO tunnels but the
>     route does not carry the Router's MAC Extended Community?
> 
> [AS] It should get discarded.
> 
>        o  If the receiving NVE ration MPLS encapsulation, then the receiving
>           NVE imports the IP address into IP-VRF with BGP Next Hop address
> 
>     What does "ration" mean here?
> 
> [AS] Thanks for catching it. I replaced it with "uses"
> 
>        If the receiving NVE receives a RT-2 with only Label-1 and only a
>        single Route Target corresponding to IP-VRF, or if it receives a RT-2
>        with only a single Route Target corresponding to MAC-VRF but with
>        both Label-1 and Label-2, or if it receives a RT-2 with MAC Address
>        Length of zero, then it MUST treat the route as withdraw [RFC7606]
>        and log an error message.
> 
>     Is this changing the error handling for the existing RT-2 advertisement?
>     How is such a change backwards compatible?
> 
> [AS] These combinations are invalid in both RFC 7432 and in this document.
> 
>     Also, a "MUST log" without rate-limiting risks mandating a DoS channel.
> 
> [AS] added "SHOULD log"

Thanks.
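
For reference, the three invalid RT-2 combinations quoted above, together
with a rate-limited log, could be checked roughly as in the Python sketch
below.  All names (Rt2Route, must_treat_as_withdraw, RateLimitedLog) are
hypothetical; the sketch assumes that the MAC-VRF and IP-VRF Route Target
sets are disjoint and that a missing Label-2 is represented as None.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Rt2Route:
    mac_address_length: int
    label1: int
    label2: Optional[int]          # None when only Label-1 is present
    route_targets: FrozenSet[str]


def must_treat_as_withdraw(route: Rt2Route, mac_vrf_rts: Set[str],
                           ip_vrf_rts: Set[str]) -> bool:
    """Return True for the invalid combinations listed in the quoted text."""
    if route.mac_address_length == 0:
        return True
    if len(route.route_targets) != 1:
        return False
    (rt,) = route.route_targets
    if route.label2 is None and rt in ip_vrf_rts:
        return True   # only Label-1, single RT that belongs to an IP-VRF
    if route.label2 is not None and rt in mac_vrf_rts:
        return True   # both labels, single RT that belongs to a MAC-VRF
    return False


class RateLimitedLog:
    """Emit at most one message per min_interval_s seconds."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self.emitted: List[str] = []
        self._last: Optional[float] = None

    def log(self, message: str, now: float) -> bool:
        if self._last is not None and now - self._last < self.min_interval_s:
            return False  # suppressed, so bad routes cannot flood the log
        self._last = now
        self.emitted.append(message)
        return True
```

The caller passes the current time explicitly (for example from
time.monotonic()), which keeps the limiter easy to test.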

>     Section 9.1.2
> 
>        o  On the egress NVE, if the packet arrives on Ethernet NVO tunnel
>           (e.g., it is VxLAN encapsulated), then the NVO tunnel header is
> 
>     (side note) I can't decide whether it's good or bad that we use VxLAN as
>     the example here vs. the explicit MPLS/etc. mentions in the previous
>     point.  On the one hand, it lets us mention all the various different
>     implementation technologies, but on the other hand we don't have a
>     coherent thread to tie the steps together.
> 
> [AS] VxLAN processing is more involved than MPLS and thus it is emphasized more.
> 
>     Section 9.2
> 
>        their next hop.  The receiving NVEs perform recursive route
>        resolution to resolve the subnet prefix with its associated ingress
>        NVE so that they know which NVE to forward the packets to when they
>        are destined for that subnet prefix.
> 
>     nit(?): is the "ingress" part important here?  The "forward the packets
>     to" operation would seem to be describing a case where that NVE is
>     acting as egress, at least...
> 
> [AS] You're right! However, since we are talking about control plane, I changed it to the control plane terminology - i.e., I changed "associated ingress NVE" to "advertising NVE". 

Thanks!

>        and MAC-VRF3 across two NVEs.  There are four TSes associated with
>        these three MAC-VRFs - i.e., TS1, TS5 are connected to MAC-VRF1 on
> 
>     [there is no TS5 in the figure]
> 
> [AS] Good catch! Removed TS5.
> 
>     Section 9.2.1
> 
>        o  Label = 0
> 
>     [draft-ietf-bess-evpn-prefix-advertisement seems to describe this field
>     as the "MPLS Label".]
> 
> [AS] Changed it to "MPLS Label"
> 
>        This RT-5 is advertised with one or more Route Targets that have been
>        configured as "export route targets" of the IP-VRF from which the
>        route is originated.
> 
>     This is the only place in this document where we use the phrase "export
>     route target" and no reference is provided; please clarify what is
>     meant.
> [AS] sometimes less is more. I removed "that have been configured as "export route targets""

:)

>        o  It imports the IP prefix into its corresponding IP-VRF that is
>           configured with an import RT that is one of the RTs being carried
>           by the RT-5 route along with the IP address of the associated TS
>           as its next hop.
> 
>     The phrase "that is one of the RTs being carried by the RT-5 route"
>     seems like it leaves a significant degree of freedom to the receiving
>     NVE.  Do we want to make this more precisely specified?
> 
> [AS] an IP-VRF can be associated with more than one RT and as long as one of them match, the receiving PE can import the route.
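
The "any one RT matches" import rule described in the reply above amounts to
a set intersection.  A minimal Python sketch, with a hypothetical function
name and made-up VRF names and Route Target values used purely for
illustration:

```python
from typing import Dict, List, Set


def importing_ip_vrfs(route_rts: Set[str],
                      import_rts_by_vrf: Dict[str, Set[str]]) -> List[str]:
    """Return the IP-VRFs whose import RTs share at least one RT with the route."""
    return [name for name, rts in import_rts_by_vrf.items() if rts & route_rts]


# Hypothetical configuration: tenant-a imports two RTs, tenant-b imports one.
vrfs = {"tenant-a": {"65000:1", "65000:2"}, "tenant-b": {"65000:3"}}
matching = importing_ip_vrfs({"65000:2", "65000:9"}, vrfs)
```

Here only tenant-a imports the route, because it shares 65000:2 with the
RTs carried by the RT-5.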
> 
>     Section 9.2.2
> 
>        The following description of the data-plane operation describes just
>        the logical functions and the actual implementation may differ.  Lets
>        consider data-plane operation when a host on SN1 sitting behind TS1
>        wants to send traffic to a host sitting behind SN3 behind TS3.
> 
>        o  TS1 send a packet with MAC DA corresponding to the MAC-VRF1 IRB
>           interface of NVE1, and VLAN-tag corresponding to MAC-VRF1.
> 
>     [I guess the step where there's an ethernet frame from SN1 to TS1 is
>     boring and okay to skip?]
> 
> [AS] correct (
> 
>     Section 11
> 
>     I agree with the secdir reviewer's comment that incorporating by
>     reference the security considerations of the core protocol RFCs would be
>     worthwhile.
> 
>     Similarly, a reminder seems appropriate (a la RFC 4365) that this VPN
>     scheme does not provide the usual quartet of security properties that we
>     ask about (confidentiality protection, source authentication, integrity
>     protection, replay protection), and that if such functionality is needed
>     it must be provided in some other manner.
> 
> [AS] Added " The VPN scheme described in this document does not provide 
>    the quartet of security properties mentioned in <xref target="RFC4365"/>
>    (confidentiality protection, source authentication, integrity
>    protection, replay protection), and if these are desired, they must be 
>    provided by mechanisms that are outside the scope of the VPN mechanisms."

Thank you.

>        Furthermore, the security consideration for layer-3 routing is this
>        document follows that of [RFC4365] with the exception for application
> 
>     The Security Considerations of RFC 4365 notes that RFC 4111 provides a
>     template "that may be used to evaluate and summarize how a given PPVPN
>     approach (solution) measures up against the PPVPN Security Framework".
>     Given that the IP-layer inter-subnet routing introduced by this document
>     is in some sense a new L3VPN technology, would it be appropriate to fill
>     out that template as it applies here?  It's unfortunate that RFC 7432
>     does not itself fill out the template from RFC 4111, as it would be
>     useful to have that information readily available as well (though I
>     understand that the L2-only parts of the mechanisms described in this
>     document are essentially unchanged from RFC 7432 and it is only our
>     responsibility to document otherwise-undocumented critical security
>     flaws).
> 
> [AS] Yes, the L2-only parts of this document (MAC-VRF) are basically the same as RFC 7432.

But the L3 parts are new.  Shouldn't we at least document that part?

>     I think we should note that the asymmetric IRB scheme leaves tenant MAC
>     addresses visible in cleartext (absent other cryptographic protections)
>     over the backbone/underlay, which has potential privacy consequences.
> 
> [AS] The text that I added in this section should cover it. Both tenant MAC and IP addresses are visible in clear text.

Okay.

Thanks for these and all the other updates, as well as the explanations and
confirmation of my explanations for the parts I wasn't sure of.

-Ben

>     The mechanisms in this document bring additional exposure/usage for
>     per-NVE IP and MAC addresses, at least some of which may be (locally)
>     automatically generated in some cases; it might be appropriate to
>     discuss how the system would fail if there were collisions in these
>     identifiers between NVEs.
> 
> [AS] This comment is addressed at the beginning.
> 
>     As alluded to in a couple of my previous comments, we may also wish to
>     discuss any scaling issues (and consequent DoS risks) that may arise due
>     to the need for NVEs to store additional state in ARP tables and
>     IP-VRFs.  (If such risks are negligible, then maybe not.)
> 
>     There probably aren't any additional mobility-related security
>     considerations other than what RFC 7432 discusses (but mentioning
>     mobility concerns specifically may still be worthwhile).
> 
>     Section 14.1
> 
>     RFC 8214 is listed as a normative reference but not cited anywhere.
>     Should it be removed, or citations added?
> 
> [AS]  it is already done.
> 
>     Section 14.2
> 
>     RFC 7606 seems like it should be normative, since we use it to define
>     what "treat the route as withdraw" means, which itself is behavior
>     mandated at a MUST level.
> 
> [AS] OK. 
> 
>