[bess] Benjamin Kaduk's Discuss on draft-ietf-bess-evpn-inter-subnet-forwarding-09: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 14 July 2020 21:00 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-bess-evpn-inter-subnet-forwarding@ietf.org, bess-chairs@ietf.org, bess@ietf.org, Zhaohui Zhang <zzhang@juniper.net>, zzhang@juniper.net
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <159476040701.14459.4825957938068100547@ietfa.amsl.com>
Date: Tue, 14 Jul 2020 14:00:07 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/OyzcwwpX2Ia6dcspNu6zoGzCxAQ>
Subject: [bess] Benjamin Kaduk's Discuss on draft-ietf-bess-evpn-inter-subnet-forwarding-09: (with DISCUSS and COMMENT)

Benjamin Kaduk has entered the following ballot position for
draft-ietf-bess-evpn-inter-subnet-forwarding-09: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-inter-subnet-forwarding/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

(1) Possibly a "discuss discuss", but ...
if I'm understanding correctly, the symmetric IRB case over an Ethernet
NVO tunnel (not MPLS or IP NVO) described in this document is
introducing a new scenario where traffic using router (PE) MAC addresses
as source and destination is comingled on the same tunnel with traffic
using tenant system MAC addresses as source and destination.  This
places an obligation on the tunnel endpoints to properly isolate and
process such "internal" tunnel traffic without hampering the ability of
tenant systems to communicate.  In a world where tenant systems can
appear at any time, using previously unknown MAC addresses, this
represents a rather challenging problem: how will the PEs be able to
pick (and advertise) MAC addresses that they know will not conflict with
any present or future customer systems?  (A similar dilemma led to quite
a delay in the processing of draft-ietf-bfd-vxlan, which in that case
was resolved by limiting the BFD operation to just the "management VNI"
which is not subject to MAC address conflict with customer systems.)  In
this document's case, we seem to be using a "well-known"/reserved MAC
address range from RFC 5798; in principle, this should be enough to
avoid conflicts, if customer systems are known to not squat on this
reserved range.  However, this document has some text in Section 4.1
that indicates that there must be external knowledge that auto-derived
MAC addresses in the RFC 5798 ranges "[do] not collide with any customer
MAC address".  So I'm left uncertain whether this potential
problem is addressed.  (Also, I don't know if the limit of 255
distinct such MAC addresses presents a scaling issue.)

Also, is there any risk that the EVPN IRB setup might also want to use
VRRPv3, and thus collide on the MAC addresses in a different manner?

(1.1) I'm less sure whether there's a similar collision risk for IP
addresses -- we assign IP addresses to NVEs (e.g., for use as BGP Next
Hop addresses) and these are used as VTEP addresses when encapsulating
packets that are going inter-subnet.  I think that at this point in the
packet processing we may already know that we're in the "IRB tunnel"
scope and that may be enough to de-conflict any potential IP address
collision between NVE and customer addresses.

(2) Section 7 appears to reference (in a normative fashion)
[IRB-EXT-MOBILITY] but there is no such reference.
Similarly (as Murray notes), there are a couple apparent references to
[TUNNEL-ENCAP] that are also arguably normative, but the target of the
reference is not forthcoming (and the IANA registry does not show a "BGP
Encapsulation Extended Community" that is supposedly defined by
[TUNNEL-ENCAP]).

There is also not a listed reference for [EVPN-PREFIX], though one could
perhaps surmise that it is intended to be
[I-D.ietf-bess-evpn-prefix-advertisement] (given that the latter does
allocate EVPN route type 5 for carrying IP Prefix routes, etc.).
However, given that this document has text like "MUST advertise local
subnet routes as RT-5", this needs to be a normative (not informative)
reference.  (We may also want to explicitly reference [EVPN-PREFIX]
where those normative requirements are made.)

(3) I'm not sure whether we are modifying the error-handling semantics
for RT-2 from what is specified in RFC 7432 (and, furthermore, whether
the changes are backwards-compatible).  If so, it seems like we may need
an Updates: relationship.  The text in question is in Section 9.1.1
(which itself seems problematic, since this section is advertised as an
(example) operational model/scenario):

   If the receiving NVE receives a RT-2 with only Label-1 and only a
   single Route Target corresponding to IP-VRF, or if it receives a RT-2
   with only a single Route Target corresponding to MAC-VRF but with
   both Label-1 and Label-2, or if it receives a RT-2 with MAC Address
   Length of zero, then it MUST treat the route as withdraw [RFC7606]
   and log an error message.

Are all of these treat-as-withdraw behaviors specified in RFC 7432?

(4) Let's discuss whether we need more generally a "backwards
compatibility" section (I mention a couple other places where this might
come into play, in the COMMENT).


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

The shepherd writeup links to the IPR search page, which finds a
disclosure, yet also claims "no discussions".  In the absence of
explicit discussion, it's not clear that the WG actively concluded that
it's appropriate to publish despite the potential encumbrance, though
I'm told that this is fairly common in the routing area (hence only a COMMENT).

Please use the BCP 14 boilerplate as given in RFC 8174 (i.e., including
"NOT RECOMMENDED" as a keyword).

I'm not entirely sure that the mobility procedures are fully specified
for producing correct operation, but I'm not confident enough in my
understanding of the setup to ballot Discuss over it.  I have lots of
inline comments for Section 7 (and subsections).

Section 2

   EVPN [RFC7432] provides an extensible and flexible multi-homing VPN
   solution over an MPLS/IP network for intra-subnet connectivity among
   Tenant Systems (TSes) and End Devices that can be physical or
   virtual; where an IP subnet is represented by an EVI for a VLAN-based
   service or by an (EVI, VLAN) for a VLAN-aware bundle service.

In the terminology section (Broadcast Domain definition) we considered
three classes of model: VLAN-bundle, VLAN-based, and VLAN-aware.  Should
we mention VLAN-bundle here as well?

   The inter-subnet communication is traditionally achieved at
   centralized L3 Gateway (L3GW) devices where all the inter-subnet
   forwarding are performed and all the inter-subnet communication
   policies are enforced.  When two TSes belonging to two different
[...]
   In order to overcome the drawback of centralized layer-3 GW approach,
   IRB functionality is needed on the PEs (also referred to as EVPN
   NVEs) attached to TSes in order to avoid inefficient forwarding of
   tenant traffic (i.e., avoid back-hauling and hair-pinning).  When a

It feels like there's a bit of jargon in here that wasn't captured in
the terminology section (or references).  None of it seems particularly
inscrutable to me, personally, but perhaps it's worth another thought.

   PE with IRB capability receives tenant traffic over an Attachment
   Circuit (AC), it can not only locally bridge the tenant intra-subnet
   traffic but also can locally route the tenant inter-subnet traffic on
   a packet by packet basis thus meeting the requirements for both intra
   and inter-subnet forwarding and avoiding non-optimum traffic
   forwarding associated with centralized layer-3 GW approach.

nit(?): maybe "avoiding the non-optimal traffic forwarding"?

Section 3

   BD).  If service interfaces for an EVPN PE are configured in VLAN-
   Based mode (i.e., section 6.1 of RFC7432), then there is only a
   single BT per MAC-VRF (per EVI) - i.e., there is only one tenant VLAN
   per EVI.  However, if service interfaces for an EVPN PE are
   configured in VLAN-Aware Bundle mode (i.e., section 6.3 of RFC7432),
   then there are several BTs per MAC-VRF (per EVI) - i.e., there are
   several tenant VLANs per EVI.

[Is VLAN-bundle excluded again here?  No need to repeat the response,
just noting it as similar.]

Section 4

I think Figures 2 and 3 would benefit from some visual indication that
the "top half" is in "IP space" but the "bottom half" is in "Ethernet
space".

   In symmetric IRB as shown in figure-2, the inter-subnet forwarding
   between two PEs is done between their associated IP-VRFs.  Therefore,
   the tunnel connecting these IP-VRFs can be either IP-only tunnel (in
   case of MPLS or GENEVE encapsulation) or Ethernet NVO tunnel (in case
   of VxLAN encapsulation).  [...]

[editorial]: there's nothing forbidding an Ethernet tunnel for this case
when using MPLS or GENEVE, right, it's just inefficient?  This text
could be read as saying that it's forbidden.

   to each BT via its associated IRB interface.  Each BT on a PE is
   associated with a unique VLAN (e.g., with a BD) where in turn is
   associated with a single MAC-VRF in case of VLAN-Based mode or a
   number of BTs can be associated with a single MAC-VRF in case of
   VLAN-Aware Bundle mode.  Whether the service interface on a PE is

nit: missing "it", probably for "where in turn it is associated".

   VLAN-Based or VLAN-Aware Bundle mode does not impact the IRB
   operation and procedures.  It mainly impacts the setting of Ethernet

[noting another non-mention of VLAN-bundle mode]

   tag field in EVPN BGP routes as described in [RFC7432].

Perhaps this is clear to most people, but I think I would benefit from a
section reference within 7432 or another sentence indicating how this
works.

Figure 4 has a line for "Mx/IPx" in PE1 but no corresponding line in
PE2.  Since the IP address of the VRF is anycast, we can presume that
PE2 also uses IPx, but I'm not entirely sure what the "Mx" represents
for PE1 and what the corresponding behavior for PE2 would be.

Section 4.1

   It is worth noting that if the applications that are running on the
   TSes are employing or relying on any form of MAC security, then
   either the first model (i.e. using anycast MAC address) should be
   used to ensure that the applications receive traffic from the same
   IRB interface MAC address that they are sending to, or if the second
   model is used, then the IRB interface MAC address MUST be the one
   used in the initial ARP reply or ND Neighbor Advertisement (NA)for
   that TS.

nit: the "either" here seems redundant.  There are only two choices, so
to say "either do (1) or, if you do (2), also do this" is equivalent to
just saying "if you do (2), also do this", optionally with a note that
(1) also works and is simpler.  E.g.,

% It is worth noting that if the applications that are running on the
% TSes are employing or relying on any form of MAC security, then when
% using the second model (using per-PE MAC addresses), the IRB interface
% MAC address MUST be the one used in the initial ARP reply or ND
% Neighbor Advertisement (NA) for that TS.

   Although both of these options are equally applicable to both
   [...]
   the VLANs for that tenant.  Furthermore, it simplifies the operation
   as there is no need for Default Gateway extended community
   advertisement and its associated MAC aliasing procedure.  Yet another
   advantage is that following host mobility, the host does not need to
   refresh the default GW ARP/ND entry.

We're really promoting option-1 here, almost to the extent that we
should ask if option-2 needs to be mentioned at all.  The only benefit I
can think of so far is that for option-1 you have to share MACsec keys
all over the place, which option-2 avoids (to be fair, this is a
significant advantage in some cases); are there other benefits?

   Where the last octet is generated based on a configurable Virtual
   Router ID (VRID, range 1-255)).  If not explicitly configured, the
   default value for the VRID octet is '01'.  Auto-derivation of the

We could perhaps not mix decimal and (presumed) hex in the same
paragraph.

   anycast MAC can only be used if there is certainty that the auto-
   derived MAC does not collide with any customer MAC address.

Per the Discuss point, how might such certainty be obtained?  I note
that there does not seem to be anything preventing new customer MAC
addresses from appearing at any time.  (It also seems like this risk of
collision would hold however the PE MAC addresses are selected?)
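
To make the collision question concrete, here is a minimal sketch,
assuming the auto-derived anycast MAC is the RFC 5798 IPv4 virtual
router MAC 00-00-5E-00-01-{VRID}.  The function names and the set of
customer MACs are hypothetical.  Note that the check can only consult
MAC addresses known at the time it runs, which is exactly the gap
raised above, and that the VRID octet caps the space at 255 values.

```python
# Hypothetical helper names; prefix assumed from RFC 5798 (IPv4 virtual router MAC).
VRRP_V4_PREFIX = bytes([0x00, 0x00, 0x5E, 0x00, 0x01])

def anycast_mac(vrid: int = 1) -> str:
    """Build 00-00-5E-00-01-{VRID}; the VRID is the last octet, range 1-255."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return "-".join(f"{b:02X}" for b in VRRP_V4_PREFIX + bytes([vrid]))

def collides(vrid: int, known_customer_macs: set) -> bool:
    """True if the derived MAC matches a customer MAC learned so far."""
    normalized = {m.upper().replace(":", "-") for m in known_customer_macs}
    return anycast_mac(vrid) in normalized

print(anycast_mac())      # decimal VRID 1 renders as hex octet 01
print(anycast_mac(255))   # decimal VRID 255 renders as hex octet FF
print(collides(1, {"00:00:5e:00:01:01"}))
```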

   Irrespective of using only the anycast address or both anycast and
   non-anycast addresses on the same IRB, when a TS sends an ARP request
   or ND Neighbor Solicitation (NS) to the PE that is attached to, the
   request is sent for the anycast IP address of the IRB interface
   associated with the TS's subnet and the reply will use anycast MAC
   address (in both Source MAC in the Ethernet header and Sender
   hardware address in the payload).  [...]

nit(?): I don't really see how the declarative "the [ARP or NS] request
is sent for the anycast IP address" can place such a restriction on the
TS behavior; are we intending to instead say that "when an [ARP or NS]
request is sent for the anycast IP address of [the IRB interface], the
reply will use the anycast MAC address", with a clearer cause/effect
relationship?

   Traffic routed from IP-VRF1 to TS1 SHOULD use the anycast MAC address
   as source MAC address.

When would this SHOULD be violated?

Section 5.1

   When a PE (e.g., PE1 in figure 4 above) learns MAC and IP address of
   a TS (via an ARP request), it adds the MAC address to the

nit: in Section 6.1 this parenthetical is "e.g., via an ARP request".

   o  Route Distinguisher (RD), Ethernet Segment Identifier, Ethernet
      Tag ID, MAC Address Length, MAC Address, IP Address Length, IP
      Address, and MPLS Label1 fields MUST be set per [RFC7432] and
      [RFC8365].

(side note) I would have an easier time understanding what is needed if
we phrased this in terms of "the same values as used for [some existing
advertisement]", but I'm also not in the target audience for this
document, so maybe that is not a good idea.

   o  The MPLS Label2 field is set to either an MPLS label or a VNI
      corresponding to the tenant's IP-VRF.  In case of an MPLS label,
      this field is encoded as 3 octets, where the high-order 20 bits
      contain the label value.

How do I choose between MPLS label and VNI?

Section 5.2

   The inclusion of MPLS label2 field in this route signals to the
   receiving PE that this route is for symmetric IRB mode and MPLS
   label2 needs to be installed in forwarding path to identify the
   corresponding IP-VRF.

How does the receiving PE know whether the label2 field is a label or a
VNI?  (Maybe a section reference to RFC 8365 would be useful, not
necessarily right here, but perhaps from earlier in this document.)
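
A minimal sketch of why the question matters, assuming the label
layout quoted from Section 5.1 (20-bit label in the high-order bits of
a 3-octet field) and assuming, from my reading of RFC 8365, that a VNI
fills all 24 bits of the same field.  The function names are
hypothetical.  The same three octets decode to different values
depending on which interpretation the receiver applies, so the receiver
needs some other signal (e.g., the encapsulation type) to choose.

```python
def encode_mpls_label(label: int) -> bytes:
    """Place a 20-bit label in the high-order 20 bits of a 3-octet field."""
    if not 0 <= label < 1 << 20:
        raise ValueError("MPLS label must fit in 20 bits")
    return (label << 4).to_bytes(3, "big")

def encode_vni(vni: int) -> bytes:
    """Assumed VNI layout: the 24-bit VNI fills all three octets."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return vni.to_bytes(3, "big")

def decode_as_label(field: bytes) -> int:
    """Interpret the field as an MPLS label (drop the low-order 4 bits)."""
    return int.from_bytes(field, "big") >> 4

def decode_as_vni(field: bytes) -> int:
    """Interpret the field as a 24-bit VNI."""
    return int.from_bytes(field, "big")

field = encode_mpls_label(1000)
print(field.hex())             # 003e80
print(decode_as_label(field))  # 1000
print(decode_as_vni(field))    # 16000
```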

   If the receiving PE receives this route with both the MAC-VRF and IP-
   VRF route targets but the MAC/IP Advertisement route does not include
   MPLS label2 field and if the receiving PE supports asymmetric IRB
   mode, then the receiving PE installs the MAC address in the
   corresponding MAC-VRF and (IP, MAC) association in the ARP table for
   that tenant (identified by the corresponding IP-VRF route target).

[a little surprising to see this given that Section 5 is about
"Symmetric IRB Procedures", but I guess it makes sense to say something,
so I shouldn't quibble about what form that something takes.]

   If the receiving PE receives this route with both the MAC-VRF and IP-
   VRF route targets and if the receiving PE does not support either
   asymmetric or symmetric IRB modes, then if it has the corresponding
   MAC-VRF, it only imports the MAC address; otherwise, if it doesn't
   have the corresponding MAC-VRF, it MUST not import this route.

I can't quite tell if this MUST NOT (nit: "NOT" should be capital, too)
is just re-expressing the preexisting behavior for non-IRB EVPN or if
it's a new normative requirement.

Section 5.3

   only if that PE has locally attached hosts in that subnet.  In order
   to enable inter-subnet routing across PEs in a deployment where all
   subnets are not provisioned at all PEs participating in an EVPN IRB
   instance, PEs MUST advertise local subnet routes as RT-5.  These

nit: "where all subnets are not provisioned at all PEs participating"
has a slightly different meaning than "where not all subnets are
provisioned at all PEs participating".  I suspect the latter is what's
intended, though I am not 100% sure.
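
The two readings can be written as predicates over a provisioning map;
this is only an illustration, with a hypothetical map and function
names.  With subnet A on both PEs and subnet B on only one, the
stricter reading is false while the (presumably intended) reading is
true.

```python
def every_subnet_missing_somewhere(prov, subnets):
    """Strict reading: each subnet is absent from at least one PE."""
    return all(not all(s in have for have in prov.values()) for s in subnets)

def some_subnet_missing_somewhere(prov, subnets):
    """'Not all subnets are provisioned at all PEs': at least one gap exists."""
    return not all(s in have for have in prov.values() for s in subnets)

# Hypothetical deployment: PE name -> subnets provisioned on it.
prov = {"PE1": {"A", "B"}, "PE2": {"A"}}
subnets = {"A", "B"}
print(every_subnet_missing_somewhere(prov, subnets))  # False
print(some_subnet_missing_somewhere(prov, subnets))   # True
```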

   Consider a subnet A that is locally attached to PE1 and subnet B that
   is locally attached to PE2 and to PE3.  Host A in subnet A, that is
   attached to PE1 initiates a data packet destined to host B in subnet
   B that is attached to PE3.  If host B's (MAC, IP) has not yet been

Could we perhaps use different names/symbols for subnets and hosts
within them?

   subnet B locally attached.  Once the packet is received at PE2 OR
   PE3, and the route lookup yields a glean result, an ARP request is
   triggered and flooded across the layer-2 overlay.  This ARP request
   would be received and replied to by host B, resulting in host B (MAC,
   IP) learning at PE3, and its advertisement across the EVPN network.

(side note) When I first read this, the "layer-2 overlay" (vs. "layer-3
overlay") confused me, but I think I figured it out.  To check: the ARP
request on the layer-2 overlay is needed in order to cover all of subnet
B (and thus, to meet the criterion that the packet from PE1 has to be
able to be processed whether it arrives at PE2 or PE3).  The ARP request
triggers the PE to which host B is attached to advertise the
EVPN route over BGP, so that PE1 learns about host B's (MAC, IP) and can
send directly to PE3 in the future.

Section 5.4

   routed.  Hence, the ingress PE performs an IP lookup in the
   associated IP-VRF table.  The lookup identifies BGP next hop of

(This is a lookup of the destination IP address, right?)

Also, nit: "the BGP next hop" (and "the egress PE" going into the next
line).

   label is set to the received label2 in the route.  In case of
   Ethernet NVO tunnel type, VNI may be set one of two ways:
   [...]
   PE's may be configured to operate in one of these two modes depending
   on the administrative domain boundaries across PEs participating in
   the NVO, and PE's capability to support downstream VNI mode.

Is it safe to mix these two ways in a single deployment, or does the
operator need to be careful to provide homogeneous configuration?

Section 6.2

   o  Using MAC-VRF route target, it identifies the corresponding MAC-

Can a non-zero Ethernet Tag also come into play here (as was the case
for Symmetric operation)?

      An implementation may choose to consolidate the lookup at the
      ingress PE's IP-VRF with the lookup at the ingress PE's
      destination subnet MAC-VRF.  Consideration for such consolidation
      of lookups is an implementation exercise and thus its
      specification is outside the scope of this document.

(This feels a little redundant with the earlier text in Section 4 (where
we described the operation with "collapsed" vs. "consolidate" here).)

   If the receiving PE receives the MAC/IP Advertisement route with MPLS
   label2 field and it can support symmetric IRB mode, then it should
   use the MAC-VRF route target to identify its corresponding MAC-VRF
   table and import the MAC address.  It should use the IP-VRF route
   target to identify the corresponding IP-VRF table and import the IP
   address, as specified in symmetric IRB handling.  It MUST NOT import
   (IP, MAC) association into its ARP table.

I don't think I understand why this is a MUST NOT, especially since we
do store the (IP, MAC) association in the ARP table in the previous
paragraph's case.

Section 7

The mobility procedures in Section 15 of RFC 7432 on paper don't have
very great security properties -- at any remote signal indicating a
mobility event for a given MAC address, a PE advertising reachability to
that MAC will withdraw the route.  (Though, this is probably something
of a "the entire system is trusted" effect.)  Perhaps local learning
would cause it to be reintroduced if the MAC in question had not actually
moved away, but there seems to be some risk of flapping and triggering
the MAC Duplication detection.  The concrete advice we give in Section
7.1 to send a local ARP probe is good, but how rigid does the sequencing
need to be amongst (receive EVPN MAC/IP Advertisement, send local ARP
probe/wait for response, and withdraw EVPN MAC/IP Advertisement)?  If
there were a way to avoid the withdraw+readvertise step, it seems
like that might be preferable.

Can we confirm that the backwards- and forwards-compatibility story is
okay for mixed deployments with "stock EVPN" vs. IRB-enhanced EVPN (this
document)?  It looks like we are using the MAC Mobility Extended
Community to signal both MAC and IP mobility events, but I am not sure
that I can reason through all the cases and confirm that the right thing
happens.

Also, do all NVEs need to be prepared to cope with all three classes of
TS behavior at all times?  That seems like something worth stating more
clearly.

Section 7.2

   If EVPN-IRB NVEs are configured to advertise MAC-only routes in
   addition to MAC-and-IP EVPN routes, then the following steps are
   taken:

Does this configuration need to be globally consistent across all PEs?
If so, where do we state this requirement?

   o  The target NVE upon learning this MAC address in data-plane,
      updates this MAC address entry in the corresponding MAC-VRF with
      the local adjacency information (e.g., local interface).  It also
      recognizes that this MAC has moved and initiates MAC mobility
      procedures per [RFC7432] by advertising an EVPN MAC/IP
      Advertisement route with only the MAC address filled in along with
      MAC Mobility Extended Community with the sequence number
      incremented by one.

The lead-in had "in addition to MAC-and-IP EVPN routes", but this
step only does the MAC EVPN route.  It looks like the MAC-and-IP route
will be added later, in the third step; would it make sense to switch
this to a numbered list and mention that the additional route will be
added in step 3?
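
For reference, the first step as I read it can be sketched as below.
The record type and function name are hypothetical, and only the
MAC-only advertisement from the quoted bullet is modeled; the later
MAC-and-IP route is deliberately left out since its sequencing is what
I am asking about.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MacIpAdvert:
    mac: str
    ip: Optional[str]   # None models a route with only the MAC address filled in
    mobility_seq: int   # sequence number in the MAC Mobility Extended Community

def mac_only_move_advert(mac: str, last_seen_seq: int) -> MacIpAdvert:
    """Target NVE, on data-plane learning: MAC-only route, sequence + 1."""
    return MacIpAdvert(mac=mac, ip=None, mobility_seq=last_seen_seq + 1)

advert = mac_only_move_advert("00:11:22:33:44:55", last_seen_seq=4)
print(advert)
```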

      advertised such route previously).  Furthermore, it searches for
      the corresponding MAC-IP entry and sends an ARP probe for this
      (MAC,IP) pair.  The ARP request message is sent both locally to
      all attached TSes in that subnet as well as it is sent to other
      NVEs participating in that subnet including the target NVE.  Note

I think a comment about the sequencing of this ARP request, similar to
my comment above, may apply.  But here the ARP request is sent both
locally and remotely, whereas previously it was only needed locally.  I
confess I don't understand what the difference is between the cases that
makes it okay for the Section-7.1 scenario to only send the ARP probe
locally.

Also, what happens if we get a response back from the local TSes (in
addition to the actions from the target NVE covered in the next bullet
point)?

      that the PE would need to maintain a correlation between MAC and
      MAC-IP route entries in the MAC-VRF to accomplish this.

I find this "would need to" language very concerning.  If this is
required for correct operation, isn't it a MUST?

   o  All other remote NVE devices upon receiving the MAC/IP
      advertisement route with MAC Mobility extended community compare

Does this only happen when they receive the EVPN MAC/IP Advertisement
route with both MAC and IP address information (vs. the earlier one with
just MAC information)?

   If EVPN-IRB NVEs are configured not to advertise MAC-only routes,
   then upon receiving the first data packet, it learns the MAC address

(editorial) The description for the MAC,MAC+IP case is quite involved and
has many separate steps, but the MAC+IP-only case in this paragraph is
much simpler, and so the description for it does not have a parallel
structure to the MAC,MAC+IP case.  Perhaps splitting into subsections
might help make it clear that, despite the difference in complexity, the
procedures in question are actually quite parallel in the nature of what
role they play.

Section 7.3

   On the source NVE, an age-out timer (for the silent host that has
   moved) is used to trigger an ARP probe.  This age-out timer can be
   either ARP timer or MAC age-out timer and this is an implementation
   choice.  The ARP request gets sent both locally to all the attached
   TSes on that subnet as well as it gets sent to all the remote NVEs
   (including the target NVE) participating in that subnet.  The source
   NVE also withdraw the EVPN MAC/IP Advertisement route with only the
   MAC address (if it has previously advertised such a route).

Does this route withdraw occur at the same time that the ARP probe is
sent, or only if there is not a response?

Also, does this need for a timer-driven full broadcast ARP probe impose
any scaling limits on the number of subnets that can be joined via IRB
for intra-subnet traffic?  (I don't expect there to be fundamentally
different scaling properties than for more typical ARP usage, but it's a
question about whether the joining of subnets will cause the limit to be
reached before other scaling limits would be reached.)

   The target NVE passes the ARP request to its locally attached TSes
   and when it receives the ARP response, it updates its MAC-VRF, IP-
   VRF, and ARP table with the host (MAC, IP) and local adjacency
   information (e.g., local interface).  It also sends an EVPN MAC/IP
   advertisement route with both the MAC and IP address fields filled in
   along with MAC Mobility Extended Community with the sequence number
   incremented by one.

Do we need to mention the possibility of an EVPN MAC/IP Advertisement
route with only the MAC address here, as well?

   All other remote NVE devices upon receiving the MAC/IP Advertisement
   route route with MAC Mobility extended community compare the sequence

[this is the one with both MAC and IP filled in, right?]

Section 8.1

   This extended community is used to carry the PE's MAC address for
   symmetric IRB scenarios and it is sent with RT-2.

It's only needed for specifically the Ethernet NVO tunnel (not MPLS or
IP-only NVO) symmetric IRB scenarios, right?

Section 9

I see that we only give examples/operational models for the symmetric
IRB case and for scenarios with IP subnets behind tenant systems.  Is
there any desire to give a similar example for asymmetric IRB?

Section 9, 9.1

   procedures.  In the following scenarios, without loss of generality,
   it is assumed that a given tenant is represented by a single IP-VPN
   instance.  [...]
   In this scenario, without loss of generality, it is assumed that NVEs
   operate in VLAN-based service interface mode with one Bridge
   Table (BT) per MAC-VRF.  [...]

Just to check my understanding: the "more general" case would have,
e.g., multiple IP-VPN instances for a single tenant, or ... multiple
BTs per MAC-VRF?  I guess for the former case the IP-VPN instances would
be logically separate (absent some central L3GW), but I'm not really
sure what the latter case would look like.  Though maybe the VLAN-aware
bundling case in the following sentences (with multiple BTs per MAC-VRF)
is exactly the generalization being assumed here?

Section 9.1.1

   o  If the route carries the new Router's MAC Extended Community, and
      if the receiving NVE uses Ethernet NVO tunnel, then the receiving
      NVE imports the IP address into IP-VRF with NVE's MAC address

What does the receiving NVE do if it uses Ethernet NVO tunnels but the
route does not carry the Router's MAC Extended Community?

   o  If the receiving NVE ration MPLS encapsulation, then the receiving
      NVE imports the IP address into IP-VRF with BGP Next Hop address

What does "ration" mean here?

   If the receiving NVE receives a RT-2 with only Label-1 and only a
   single Route Target corresponding to IP-VRF, or if it receives a RT-2
   with only a single Route Target corresponding to MAC-VRF but with
   both Label-1 and Label-2, or if it receives a RT-2 with MAC Address
   Length of zero, then it MUST treat the route as withdraw [RFC7606]
   and log an error message.

Is this changing the error handling for the existing RT-2 advertisement?
How is such a change backwards compatible?

Also, a "MUST log" without rate-limiting risks mandating a DoS channel.
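
A minimal sketch of the three quoted conditions plus one hypothetical
way to bound the logging.  The field names, the boolean modeling of
"single Route Target", and the rate limiter are all my assumptions, not
something the draft specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rt2:
    has_label2: bool       # Label-1 is assumed always present in an RT-2
    has_mac_vrf_rt: bool   # carries a Route Target matching a MAC-VRF
    has_ip_vrf_rt: bool    # carries a Route Target matching an IP-VRF
    mac_addr_len: int      # MAC Address Length field, in bits

def treat_as_withdraw(r: Rt2) -> bool:
    """Apply the three malformed-route conditions from the quoted text."""
    only_ip_vrf_rt = r.has_ip_vrf_rt and not r.has_mac_vrf_rt
    only_mac_vrf_rt = r.has_mac_vrf_rt and not r.has_ip_vrf_rt
    return (
        (not r.has_label2 and only_ip_vrf_rt)
        or (r.has_label2 and only_mac_vrf_rt)
        or r.mac_addr_len == 0
    )

class RateLimitedLog:
    """Record at most one message per `interval` seconds; count the rest."""
    def __init__(self, interval: float):
        self.interval = interval
        self.last = None
        self.suppressed = 0
        self.lines = []

    def error(self, now: float, msg: str) -> None:
        if self.last is None or now - self.last >= self.interval:
            self.lines.append(f"{msg} (suppressed {self.suppressed} since last)")
            self.last, self.suppressed = now, 0
        else:
            self.suppressed += 1

log = RateLimitedLog(interval=10.0)
bad = Rt2(has_label2=False, has_mac_vrf_rt=False, has_ip_vrf_rt=True, mac_addr_len=48)
for t in (0.0, 1.0, 2.0, 10.0):
    if treat_as_withdraw(bad):
        log.error(t, "malformed RT-2")
print(log.lines)
```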

Section 9.1.2

   o  On the egress NVE, if the packet arrives on Ethernet NVO tunnel
      (e.g., it is VxLAN encapsulated), then the NVO tunnel header is

(side note) I can't decide whether it's good or bad that we use VxLAN as
the example here vs. the explicit MPLS/etc. mentions in the previous
point.  On the one hand, it lets us mention all the various different
implementation technologies, but on the other hand we don't have a
coherent thread to tie the steps together.

Section 9.2

   their next hop.  The receiving NVEs perform recursive route
   resolution to resolve the subnet prefix with its associated ingress
   NVE so that they know which NVE to forward the packets to when they
   are destined for that subnet prefix.

nit(?): is the "ingress" part important here?  The "forward the packets
to" operation would seem to be describing a case where that NVE is
acting as egress, at least...
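
My mental model of the recursive resolution, as a sketch: the subnet
prefix resolves to the TS address used as its next hop, and that
address in turn resolves to the NVE the TS sits behind.  The table
contents, addresses, and names are hypothetical.

```python
import ipaddress

# Hypothetical tables: subnet prefix -> TS address, TS address -> NVE.
prefix_routes = {
    ipaddress.ip_network("192.0.2.0/24"): ipaddress.ip_address("198.51.100.3"),
}
host_routes = {
    ipaddress.ip_address("198.51.100.3"): "NVE2",
}

def resolve_nve(dst: str):
    """Longest-prefix match on the prefix table, then resolve the next hop."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in prefix_routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return host_routes.get(prefix_routes[best])

print(resolve_nve("192.0.2.7"))
print(resolve_nve("203.0.113.1"))
```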

   and MAC-VRF3 across two NVEs.  There are four TSes associated with
   these three MAC-VRFs - i.e., TS1, TS5 are connected to MAC-VRF1 on

[there is no TS5 in the figure]

Section 9.2.1

   o  Label = 0

[draft-ietf-bess-evpn-prefix-advertisement seems to describe this field
as the "MPLS Label".]

   This RT-5 is advertised with one or more Route Targets that have been
   configured as "export route targets" of the IP-VRF from which the
   route is originated.

This is the only place in this document where we use the phrase "export
route target" and no reference is provided; please clarify what is
meant.

   o  It imports the IP prefix into its corresponding IP-VRF that is
      configured with an import RT that is one of the RTs being carried
      by the RT-5 route along with the IP address of the associated TS
      as its next hop.

The phrase "that is one of the RTs being carried by the RT-5 route"
seems like it leaves a significant degree of freedom to the receiving
NVE.  Do we want to make this more precisely specified?
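
One reading of the quoted rule, sketched as a set intersection between
the RTs carried by the route and each IP-VRF's configured import RTs.
The names and RT values are hypothetical.  Under this reading more than
one IP-VRF can match, which is part of why I ask for more precision.

```python
def matching_ip_vrfs(route_rts, vrf_import_rts):
    """Return the IP-VRFs whose import RTs overlap the RTs on the route."""
    carried = set(route_rts)
    return sorted(v for v, imports in vrf_import_rts.items() if imports & carried)

# Hypothetical configuration: IP-VRF name -> configured import RTs.
vrf_import_rts = {
    "IP-VRF1": {"65000:100"},
    "IP-VRF2": {"65000:200"},
    "IP-VRF3": {"65000:300"},
}
print(matching_ip_vrfs(["65000:100", "65000:200"], vrf_import_rts))
```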

Section 9.2.2

   The following description of the data-plane operation describes just
   the logical functions and the actual implementation may differ.  Lets
   consider data-plane operation when a host on SN1 sitting behind TS1
   wants to send traffic to a host sitting behind SN3 behind TS3.

   o  TS1 send a packet with MAC DA corresponding to the MAC-VRF1 IRB
      interface of NVE1, and VLAN-tag corresponding to MAC-VRF1.

[I guess the step where there's an ethernet frame from SN1 to TS1 is
boring and okay to skip?]

Section 11

I agree with the secdir reviewer's comment that incorporating by
reference the security considerations of the core protocol RFCs would be
worthwhile.

Similarly, a reminder seems appropriate (a la RFC 4365) that this VPN
scheme does not provide the usual quartet of security properties that we
ask about (confidentiality protection, source authentication, integrity
protection, replay protection), and that if such functionality is needed
it must be provided in some other manner.

   Furthermore, the security consideration for layer-3 routing is this
   document follows that of [RFC4365] with the exception for application

The Security Considerations of RFC 4365 notes that RFC 4111 provides a
template "that may be used to evaluate and summarize how a given PPVPN
approach (solution) measures up against the PPVPN Security Framework".
Given that the IP-layer inter-subnet routing introduced by this document
is in some sense a new L3VPN technology, would it be appropriate to fill
out that template as it applies here?  It's unfortunate that RFC 7432
does not itself fill out the template from RFC 4111, as it would be
useful to have that information readily available as well (though I
understand that the L2-only parts of the mechanisms described in this
document are essentially unchanged from RFC 7432 and it is only our
responsibility to document otherwise-undocumented critical security
flaws).

I think we should note that the asymmetric IRB scheme leaves tenant MAC
addresses visible in cleartext (absent other cryptographic protections)
over the backbone/underlay, which has potential privacy consequences.

The mechanisms in this document bring additional exposure/usage for
per-NVE IP and MAC addresses, at least some of which may be (locally)
automatically generated in some cases; it might be appropriate to
discuss how the system would fail if there were collisions in these
identifiers between NVEs.

As alluded to in a couple of my previous comments, we may also wish to
discuss any scaling issues (and consequent DoS risks) that may arise due
to the need for NVEs to store additional state in ARP tables and
IP-VRFs.  (If such risks are negligible, then maybe not.)

There probably aren't any additional mobility-related security
considerations other than what RFC 7432 discusses (but mentioning
mobility concerns specifically may still be worthwhile).

Section 14.1

RFC 8214 is listed as a normative reference but not cited anywhere.
Should it be removed, or citations added?

Section 14.2

RFC 7606 seems like it should be normative, since we use it to define
what "treat the route as withdraw" means, which itself is behavior
mandated at a MUST level.