[bess] John Scudder's No Objection on draft-ietf-bess-evpn-optimized-ir-11: (with COMMENT)

John Scudder via Datatracker <noreply@ietf.org> Thu, 06 January 2022 22:33 UTC

From: John Scudder via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-bess-evpn-optimized-ir@ietf.org, bess-chairs@ietf.org, bess@ietf.org, Matthew Bocci <matthew.bocci@nokia.com>, matthew.bocci@nokia.com
Reply-To: John Scudder <jgs@juniper.net>
Message-ID: <164150840664.31494.16640346005965398794@ietfa.amsl.com>
Date: Thu, 06 Jan 2022 14:33:27 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/lwxgw7T6SMaSzkNiQHXt0e024ZA>

John Scudder has entered the following ballot position for
draft-ietf-bess-evpn-optimized-ir-11: No Objection

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-optimized-ir/



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Thanks for the updates!

Overall comments:

1. This document suffers from what I think is an overuse of
   abbreviations.  See
   https://www.psychologicalscience.org/observer/alienating-the-audience-how-abbreviations-hamper-scientific-communication
   for one perspective on why this is problematic.  Any individual one of these
   doesn't rise to the level of being objectionable, but in aggregate at some
   point it makes the document a lot less accessible to anyone who isn't part
   of the in-group who has memorized the abbreviations. 4r xm, <- ... is !@? to
   rd, 4r no gd rn, [see terminology section below] even though anyone who goes
   to the effort of looking up the terminology can decode it.  I would really
   prefer it if this were improved; I think it's not that much work for the
   authors and will make the resulting spec more usable.  I had intended to
   offer an example edit that expands many of the abbreviations, but have run
   out of time; I'd still be willing to do it later if requested, let me know.

   (Consider also the contrast with RFC 6514; for instance instead of
   referring to "the L-flag", when mentioning that flag it says "the Leaf
   Information Required flag".  Since we don't pay by the byte for publishing
   our documents, it seems to be worth spending a few more keystrokes to
   make it easier to read them.)

2. The document starts in the middle.  It jumps right from the requirements
   to the tunnel attribute diagram, with no overview or outline of the
   solution.  This is related to Pascal's review comment, mentioned by Éric
   Vyncke.

Terminology:

   4r: for
   xm: example
   <-: this
   ...: sentence
   !@?: difficult
   rd: read
   gd: good
   rn: reason

Detailed review:

I’ve done my comments in the form of an edited copy of the draft.  I
don't think the datatracker tooling allows me to use attachments, so
I'll follow up to this with an email with attached edited copy, as well
as a PDF of the rfcdiff output for your convenience if you’d like to use
it. I’ve also pasted a traditional diff below to capture the comments
for the record and in case you want to use it for in-line reply. I’d
appreciate feedback regarding whether you found this a useful way to
receive my comments as compared to a more traditional numbered list of
comments with selective quotation from the draft.

*** draft-ietf-bess-evpn-optimized-ir-09.txt    2021-10-20 13:48:15.000000000 -0400
--- draft-ietf-bess-evpn-optimized-ir-09-jgs-markup.txt 2021-10-20 20:39:39.000000000 -0400
***************
*** 19,25 ****

  Abstract

!    Network Virtualization Overlay (NVO) networks using EVPN as control
     plane may use Ingress Replication (IR) or PIM (Protocol Independent
     Multicast) based trees to convey the overlay Broadcast, Unknown
     unicast and Multicast (BUM) traffic.  PIM provides an efficient
--- 19,25 ----

  Abstract

!    Network Virtualization Overlay (NVO) networks using EVPN as their control
     plane may use Ingress Replication (IR) or PIM (Protocol Independent
     Multicast) based trees to convey the overlay Broadcast, Unknown
     unicast and Multicast (BUM) traffic.  PIM provides an efficient
***************
*** 105,111 ****

     Ethernet Virtual Private Networks (EVPN) may be used as the control
     plane for a Network Virtualization Overlay (NVO) network.  Network
!    Virtualization Edge (NVE) devices and Provider Edges (PEs) that are

--- 105,111 ----

     Ethernet Virtual Private Networks (EVPN) may be used as the control
     plane for a Network Virtualization Overlay (NVO) network.  Network
!    Virtualization Edge (NVE) and Provider Edge (PE) devices that are

***************
*** 182,187 ****
--- 182,191 ----
     "OPTIONAL" in this document are to be interpreted as described in BCP
     14 [RFC2119] [RFC8174] when, and only when, they appear in all
     capitals, as shown here.
+
+ Is there any logic to the order in which the terms are presented? If so,
+ it escaped me. The document would have been much easier to read if the
+ terms had been given in alphabetical order, for obvious reasons.

     The following terminology is used throughout the document:

***************
*** 236,241 ****
--- 240,247 ----
        Replicator-AR route.  It is used to identify the ingress packets
        that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR
        case.
+
+ A reference to section 8 would be helpful in the above.

     -  IR-VNI: VNI advertised along with the RT-3 for IR.

***************
*** 288,296 ****
--- 294,313 ----
     hereafter) meets the following requirements:

     a.  It provides an IR optimization for BM (Broadcast and Multicast)
+
+ Thank you for expanding "BM", but... you've already defined it in your
+ Terminology section, so maybe you don't need to define it again. (But see
+ also my overall comment on abbreviations; depending on how we resolve
+ that, this comment may be overtaken by events.)
+
         traffic without the need for PIM, while preserving the packet
         order for unicast applications, i.e., known and unknown unicast
         traffic should follow the same path.  This optimization is
+
+ ... the same path as what? If you mean unknown should follow the same path
+ as known, then use "... i.e., unknown unicast traffic should follow the same
+ path as known unicast traffic". If you mean something different, what is it?
+
         required in low-performance NVEs.

     b.  It reduces the flooded traffic in NVO networks where some NVEs do
***************
*** 361,369 ****
--- 378,403 ----

     The Flags field is 8 bits long.  This document defines the use of 4
     bits of this Flags field:
+
+ It would be quite helpful to include a diagram of the Flags field as in
+ RFC 6514 §5:
+
+    The Flags field has the following format:
+
+        0 1 2 3 4 5 6 7
+       +-+-+-+-+-+-+-+-+
+       |  reserved   |L|
+       +-+-+-+-+-+-+-+-+
+
+ except of course with all the new and previously-defined flags filled
+ in too.

     -  bits 3 and 4, forming together the Assisted-Replication Type (T)
        field
+
+ Up here you call it the Assisted-Replication Type field.  Just a few lines
+ later you call it the AR Type field.  Can you make up your mind and use
+ one or the other, please?

     -  bit 5, called the Broadcast and Multicast (BM) flag
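
To make the bit positions discussed in this hunk concrete, here is a minimal
sketch of decoding the Flags octet. It assumes bit 0 is the most significant
bit (as in the RFC 6514 diagram quoted above), takes bits 3-4 as the T field
and bit 5 as the BM flag from the quoted text, and takes bit 7 as the L flag
per RFC 6514. Placing the U flag at bit 6 is an assumption made for
illustration, as are the names PtaFlags and decode_pta_flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PtaFlags:
    """Decoded view of the 8-bit Flags field (hypothetical helper type)."""
    ar_type: int              # bits 3-4: Assisted-Replication Type (T)
    bm: bool                  # bit 5: Broadcast and Multicast flag
    u: bool                   # bit 6: Unknown unicast flag (position assumed)
    leaf_info_required: bool  # bit 7: L flag from RFC 6514


def decode_pta_flags(octet: int) -> PtaFlags:
    """Split one Flags octet into its fields; bit 0 is the most significant bit."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the Flags field is a single octet")
    return PtaFlags(
        ar_type=(octet >> 3) & 0b11,              # bits 3 and 4
        bm=bool(octet & 0b0000_0100),             # bit 5
        u=bool(octet & 0b0000_0010),              # bit 6 (assumed position)
        leaf_info_required=bool(octet & 0b0000_0001),  # bit 7
    )
```

A diagram in the draft would serve the same purpose for readers; the sketch
only shows how the positions map onto shifts and masks.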

***************
*** 406,411 ****
--- 440,448 ----
     -  Flag L is an existing flag defined in [RFC6514] (L=Leaf
        Information Required) and it will be used only in the Selective AR
        Solution.
+
+ I think it would be nice to provide the bit position for this flag, as in
+ "(L=Leaf Information Required, bit 7)"

     Please refer to Section 11 for the IANA considerations related to the
     PTA flags.
***************
*** 420,436 ****
--- 457,497 ----
        address that we denominate IR-IP in this document.  When
        advertised by an AR-LEAF node, the Regular-IR route SHOULD be
        advertised with type T= AR-LEAF.
+
+ Your use of SHOULD implies there is at least one case where a reasonable
+ implementation could choose to advertise a Regular-IR route from an
+ AR-LEAF node with a different type.  I am left to guess what the case is,
+ and what value it should choose then.  Maybe it would use RNVE instead?
+ Please say something about this.  On the other hand if there isn't any
+ such case, this should be a MUST.

     -  Replicator-AR route: this route is used by the AR-REPLICATOR to
        advertise its AR capabilities, with the fields set as follows:

        o  Originating Router's IP Address MUST be set to an IP address of
           the PE that should be common to all the EVIs on the PE (usually
+
+ What's "the PE" in this context?  I'm assuming it means "the advertising
+ router".  If that's right, please say that instead of "the PE".
+
           this is the PE's loopback address).  The Tunnel Identifier and
           Next-Hop SHOULD be set to the same IP address as the
           Originating Router's IP address when the NVE/PE originates the
           route.  The Next-Hop address is referred to as the AR-IP and
           SHOULD be different than the IR-IP for a given PE/NVE.
+
+ Similar question to my earlier one about the two SHOULDs above.  I guess
+ in the case of the second SHOULD, it MAY be the same in the case of a
+ router unable to support two different IP addresses for this purpose, in
+ which case the procedures of Section 8 MUST be applied?  If that's right,
+ please add language to that effect.
+
+ As for the first SHOULD, does this imply that the Tunnel Identifier and
+ Next-Hop MAY be set to the IP address of some other router?
+
+ Also, "when the NVE/PE originates the route" -- in this section aren't
+ we always talking about the NVE/PE originating the route?  This clause
+ makes me think there is another case, but I can't figure out what it is.

        o  Tunnel Type = Assisted-Replication Tunnel.  Section 11 provides
           the allocated type value.
***************
*** 440,446 ****
        o  L (Leaf Information Required) = 0 (for non-selective AR) or 1
           (for selective AR).

!    In addition, this document also uses the Leaf A-D route (RT-11)
     defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the

--- 501,507 ----
        o  L (Leaf Information Required) = 0 (for non-selective AR) or 1
           (for selective AR).

!    In addition, this document also uses the Leaf Auto-Discovery (A-D) route (RT-11)
     defined in [I-D.ietf-bess-evpn-bum-procedure-updates] in case the

***************
*** 452,457 ****
--- 513,522 ----

     selective AR mode is used.  The Leaf A-D route MAY be used by the AR-
     LEAF in response to a Replicator-AR route (with the L flag set) to
+
+ The above is ambiguous.  Maybe "An AR-LEAF MAY send a Leaf A-D route in
+ response to reception of a Replicator-AR route whose L flag is set."?
+
     advertise its desire to receive the BM traffic from a specific AR-
     REPLICATOR.  It is only used for selective AR and its fields are set
     as follows:
***************
*** 459,466 ****
--- 524,538 ----

        o  Originating Router's IP Address is set to the advertising PE's
+
+ What's "the PE" in this context?  I'm assuming it means "the advertising
+ router".  If that's right, please say that instead of "the PE".
+
           IP address (same IP used by the AR-LEAF in regular-IR routes).
           The Next-Hop address is set to the IR-IP.
+
+ ... and the IR-IP is different from the "advertising PE's IP address" I
+ guess?

        o  Route Key is the "Route Type Specific" NLRI of the Replicator-
           AR route for which this Leaf A-D route is generated.
***************
*** 477,483 ****

        o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
           the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
!          Identifier set to the IP of the advertising AR-LEAF.  The PMSI
           Tunnel attribute MUST carry a downstream-assigned MPLS label or
           VNI that is used by the AR-REPLICATOR to send traffic to the
           AR-LEAF.
--- 549,555 ----

        o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
           the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
!          Identifier set to the IP address of the advertising AR-LEAF.  The PMSI
           Tunnel attribute MUST carry a downstream-assigned MPLS label or
           VNI that is used by the AR-REPLICATOR to send traffic to the
           AR-LEAF.
***************
*** 488,494 ****

     Each node attached to the BD may understand and process the BM/U
     flags.  Note that these BM/U flags may be used to optimize the
!    delivery of multi-destination traffic and its use SHOULD be an
     administrative choice, and independent of the AR role.

     Non-optimized-IR nodes will be unaware of the new PMSI attribute flag
--- 560,566 ----

     Each node attached to the BD may understand and process the BM/U
     flags.  Note that these BM/U flags may be used to optimize the
!    delivery of multi-destination traffic and their use SHOULD be an
     administrative choice, and independent of the AR role.

     Non-optimized-IR nodes will be unaware of the new PMSI attribute flag
***************
*** 512,518 ****
     AR function is enabled.  Three different roles are defined for a
     given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE).  The
     solution is called "non-selective" because the chosen AR-REPLICATOR
!    for a given flow MUST replicate the BM traffic to 'all' the NVE/PEs
     in the BD except for the source NVE/PE.

                             (           )
--- 584,590 ----
     AR function is enabled.  Three different roles are defined for a
     given BD: AR-REPLICATOR, AR-LEAF and RNVE (Regular NVE).  The
     solution is called "non-selective" because the chosen AR-REPLICATOR
!    for a given flow MUST replicate the BM traffic to all the NVE/PEs
     in the BD except for the source NVE/PE.

                             (           )
***************
*** 567,572 ****
--- 639,658 ----
     An AR-REPLICATOR is defined as an NVE/PE capable of replicating
     ingress BM (Broadcast and Multicast) traffic received on an overlay
     tunnel to other overlay tunnels and local Attachment Circuits (ACs).
+
+ This is different from the definition you have in the terminology section,
+ which is:
+
+    -  AR-REPLICATOR: Assisted Replication - REPLICATOR, refers to an
+       NVE/PE that can replicate Broadcast or Multicast traffic received
+       on overlay tunnels to other overlay tunnels.
+
+ In the definition here, you mention local attachment circuits, in §2 you
+ don't.  Probably you should harmonize these definitions.  Having done so,
+ it's not clear to me that you need to repeat the definition here (though
+ if you think you need to remind the reader of what you already told them,
+ it's OK).
+
     The AR-REPLICATOR signals its role in the control plane and
     understands where the other roles (AR-LEAF nodes, RNVEs and other AR-
     REPLICATORs) are located.  A given AR-enabled BD service may have
***************
*** 584,608 ****
         generate a Regular-IR route if it does not have local attachment
         circuits (AC).  If the Regular-IR route is advertised, the AR
         Type field is set to zero.

     c.  The Replicator-AR and Regular-IR routes are generated according
         to section 3.  The AR-IP and IR-IP used by the AR-REPLICATOR are
         different routable IP addresses.

     d.  When a node defined as AR-REPLICATOR receives a BM packet on an
!        overlay tunnel, it will do a tunnel destination IP lookup and
         apply the following procedures:

!        o  If the destination IP is the AR-REPLICATOR IR-IP Address the
            node will process the packet normally as in [RFC7432].

!        o  If the destination IP is the AR-REPLICATOR AR-IP Address the
            node MUST replicate the packet to local ACs and overlay
            tunnels (excluding the overlay tunnel to the source of the
            packet).  When replicating to remote AR-REPLICATORs the tunnel
!           destination IP will be an IR-IP.  That will be an indication
            for the remote AR-REPLICATOR that it MUST NOT replicate to
!           overlay tunnels.  The tunnel source IP used by the AR-
            REPLICATOR MUST be its IR-IP when replicating to either AR-
            REPLICATOR or AR-LEAF nodes.

--- 670,705 ----
         generate a Regular-IR route if it does not have local attachment
         circuits (AC).  If the Regular-IR route is advertised, the AR
         Type field is set to zero.
+
+ Do you mean "... the AR Type field of the Replicator-AR route MUST be
+ set to zero"?  If so, please say that.

     c.  The Replicator-AR and Regular-IR routes are generated according
         to section 3.  The AR-IP and IR-IP used by the AR-REPLICATOR are
+
+ I think you mean Section 4?
+
         different routable IP addresses.
+
+ I think you'll find that "routable IP address" isn't a well-defined
+ term (for example I'm sure you're NOT talking specifically about non-
+ RFC 1918 addresses).  Can you choose different language here to say
+ what you mean?

     d.  When a node defined as AR-REPLICATOR receives a BM packet on an
!        overlay tunnel, it will do a tunnel destination IP address lookup and
         apply the following procedures:

!        o  If the destination IP address is the AR-REPLICATOR IR-IP Address the
            node will process the packet normally as in [RFC7432].

!        o  If the destination IP address is the AR-REPLICATOR AR-IP Address the
            node MUST replicate the packet to local ACs and overlay
            tunnels (excluding the overlay tunnel to the source of the
            packet).  When replicating to remote AR-REPLICATORs the tunnel
!           destination IP address will be an IR-IP.  That will be an indication
            for the remote AR-REPLICATOR that it MUST NOT replicate to
!           overlay tunnels.  The tunnel source IP address used by the AR-
            REPLICATOR MUST be its IR-IP when replicating to either AR-
            REPLICATOR or AR-LEAF nodes.
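
The lookup in item d can be sketched as follows. This is an illustration
under stated assumptions, not the draft's procedure: the names ArReplicator
and replicate_bm are hypothetical, remote nodes are identified by their
IR-IP only, and the sender is recognized by the tunnel source IP address.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ArReplicator:
    ir_ip: str  # address advertised in the Regular-IR route
    ar_ip: str  # address advertised in the Replicator-AR route


def replicate_bm(
    node: ArReplicator,
    tunnel_src_ip: str,
    tunnel_dst_ip: str,
    local_acs: List[str],
    remote_ir_ips: List[str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (ACs to flood, [(tunnel source IP, tunnel destination IP), ...])."""
    if tunnel_dst_ip == node.ir_ip:
        # Regular ingress replication: deliver to local ACs only,
        # never back onto overlay tunnels.
        return list(local_acs), []
    if tunnel_dst_ip == node.ar_ip:
        # Assisted replication: local ACs plus every overlay tunnel except
        # the one leading back to the sender. Copies are sourced from the
        # replicator's IR-IP and addressed to the remote node's IR-IP, so a
        # remote AR-REPLICATOR will not replicate them again.
        tunnels = [
            (node.ir_ip, ip) for ip in remote_ir_ips if ip != tunnel_src_ip
        ]
        return list(local_acs), tunnels
    raise ValueError("destination is neither the IR-IP nor the AR-IP")
```

Keeping the two addresses distinct is what lets a single destination lookup
select the behavior, which is why the case of a router with only one usable
address needs the separate treatment asked about earlier.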

***************
*** 628,642 ****
        and remote NVE/PEs), skipping the non-BM overlay tunnels.

     -  When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
!       it will check the destination IP of the underlay IP header and:

!       o  If the destination IP matches its AR-IP, the AR-REPLICATOR will
           forward the BM packet to its flooding list (ACs and overlay
           tunnels) excluding the non-BM overlay tunnels.  The AR-
!          REPLICATOR will do source squelching to ensure the traffic is
           not sent back to the originating AR-LEAF.

!       o  If the destination IP matches its IR-IP, the AR-REPLICATOR will
           skip all the overlay tunnels from the flooding list, i.e.  it
           will only replicate to local ACs.  This is the regular IR
           behavior described in [RFC7432].
--- 725,742 ----
        and remote NVE/PEs), skipping the non-BM overlay tunnels.

     -  When an AR-REPLICATOR receives a BM packet on an overlay tunnel,
!       it will check the destination IP address of the underlay IP header and:

!       o  If the destination IP address matches its AR-IP, the AR-REPLICATOR will
           forward the BM packet to its flooding list (ACs and overlay
           tunnels) excluding the non-BM overlay tunnels.  The AR-
!          REPLICATOR will ensure the traffic is
           not sent back to the originating AR-LEAF.
+
+ Above, I suggested the removal of "do source squelching" since AFAICT
+ it removes jargon while leaving the intention clear.

!       o  If the destination IP address matches its IR-IP, the AR-REPLICATOR will
           skip all the overlay tunnels from the flooding list, i.e.  it
           will only replicate to local ACs.  This is the regular IR
           behavior described in [RFC7432].
***************
*** 645,650 ****
--- 745,754 ----
        is different for BM traffic, as far as Unknown unicast traffic
        forwarding is concerned, AR-LEAF nodes behave exactly in the same
        way as AR-REPLICATORs do.
+
+ I'm unclear why you're defining the behavior of AR-LEAF nodes here, when
+ you started by saying "An AR-REPLICATOR will follow..."  Surely, defining
+ AR-LEAF behavior here is misplaced?

     -  The AR-REPLICATOR/LEAF nodes will build an Unknown unicast flood-
        list composed of ACs and overlay tunnels to the IR-IP Addresses of
***************
*** 655,660 ****
--- 759,767 ----
        o  When an AR-REPLICATOR/LEAF receives an unknown packet on an AC,
           it will forward the unknown packet to its flood-list, skipping
           the non-U overlay tunnels.
+
+ Possibly the term "unknown packet" is well-understood by the target
+ audience, but I think it needs either an explanation or a reference here.

        o  When an AR-REPLICATOR/LEAF receives an unknown packet on an
           overlay tunnel will forward the unknown packet to its local ACs
***************
*** 688,696 ****
     b.  In this non-selective AR solution, the AR-LEAF MUST advertise a
         single Regular-IR inclusive multicast route as in [RFC7432].  The
         AR-LEAF SHOULD set the AR Type field to AR-LEAF.  Note that
!        although this flag does not make any difference for the egress
         nodes when creating an EVPN destination to the AR-LEAF, it is
!        RECOMMENDED to use this flag for an easy operation and
         troubleshooting of the BD.

     c.  In a service where there are no AR-REPLICATORs, the AR-LEAF MUST
--- 795,803 ----
     b.  In this non-selective AR solution, the AR-LEAF MUST advertise a
         single Regular-IR inclusive multicast route as in [RFC7432].  The
         AR-LEAF SHOULD set the AR Type field to AR-LEAF.  Note that
!        although this field does not make any difference for the egress
         nodes when creating an EVPN destination to the AR-LEAF, it is
!        RECOMMENDED to use this field for an easy operation and
         troubleshooting of the BD.

     c.  In a service where there are no AR-REPLICATORs, the AR-LEAF MUST
***************
*** 701,706 ****
--- 808,816 ----
         IGP or any other detection mechanism).  Ingress replication MUST
         use the forwarding information given by the remote Regular-IR
         Inclusive Multicast Routes as described in [RFC7432].
+
+ I found the above paragraph to be confusing.  Does it boil down to,
+ if there are no AR-REPLICATORS, use regular IR?

     d.  In a service where there is one or more AR-REPLICATORs (based on
         the received Replicator-AR routes for the BD), the AR-LEAF can
***************
*** 709,720 ****
         o  A single AR-REPLICATOR MAY be selected for all the BM packets
            received on the AR-LEAF attachment circuits (ACs) for a given
            BD.  This selection is a local decision and it does not have
!           to match other AR-LEAF's selection within the same BD.

         o  An AR-LEAF MAY select more than one AR-REPLICATOR and do
            either per-flow or per-BD load balancing.

!        o  In case of a failure on the selected AR-REPLICATOR, another
            AR-REPLICATOR will be selected.

         o  When an AR-REPLICATOR is selected, the AR-LEAF MUST send all
--- 819,830 ----
         o  A single AR-REPLICATOR MAY be selected for all the BM packets
            received on the AR-LEAF attachment circuits (ACs) for a given
            BD.  This selection is a local decision and it does not have
!           to match other AR-LEAFs' selections within the same BD.

         o  An AR-LEAF MAY select more than one AR-REPLICATOR and do
            either per-flow or per-BD load balancing.

!        o  In case of a failure of the selected AR-REPLICATOR, another
            AR-REPLICATOR will be selected.

         o  When an AR-REPLICATOR is selected, the AR-LEAF MUST send all
***************
*** 752,757 ****
--- 862,874 ----
         to the AR-REPLICATOR and be programmed.  While the AR-REPLICATOR-
         activation-time is running, the AR-LEAF node will use regular
         ingress replication.
+
+ Probably you should say something about the case where a router has
+ selected its preferred AR-REPLICATOR from the set that are available,
+ and then a new AR-REPLICATOR shows up that is more preferable.  Should
+ the router shift to the new, preferred replicator?  Should it stick
+ with the one it was already using even though less-preferred?  Is it a
+ matter of local policy?

     An AR-LEAF will follow a data path implementation compatible with the
     following rules:
***************
*** 849,874 ****
         REPLICATORs will fall back to non-selective AR mode.

     c.  The Selective AR-REPLICATOR MUST follow the procedures described
!        in section Section 5.1, except for the following differences:

         o  The Replicator-AR route MUST include L=1 (Leaf Information
            Required) in the Replicator-AR route.  This flag is used by
            the AR-REPLICATORs to advertise their 'selective' AR-
            REPLICATOR capabilities.  In addition, the AR-REPLICATOR auto-
            configures its IP-address-specific import route-target as
!           described in section Section 4.

         o  The AR-REPLICATOR will build a 'selective' AR-LEAF-set with
            the list of nodes that requested replication to its own AR-IP.
            For instance, assuming NVE1 and NVE2 advertise a Leaf A-D
            route with PE1's IP-address-specific route-target and NVE3
            advertises a Leaf A-D route with PE2's IP-address-specific
!           route-target, PE1 MUST only add NVE1/NVE2 to its selective AR-
!           LEAF-set for BD-1, and exclude NVE3.

!        o  When a node defined and operating as Selective AR-REPLICATOR
            receives a packet on an overlay tunnel, it will do a tunnel
!           destination IP lookup and if the destination IP is the AR-
            REPLICATOR AR-IP Address, the node MUST replicate the packet
            to:

--- 966,997 ----
         REPLICATORs will fall back to non-selective AR mode.

     c.  The Selective AR-REPLICATOR MUST follow the procedures described
!        in Section 5.1, except for the following differences:

         o  The Replicator-AR route MUST include L=1 (Leaf Information
            Required) in the Replicator-AR route.  This flag is used by
            the AR-REPLICATORs to advertise their 'selective' AR-
            REPLICATOR capabilities.  In addition, the AR-REPLICATOR auto-
            configures its IP-address-specific import route-target as
!           described in the third bullet of the procedures for Leaf A-D
!           route in Section 4.

         o  The AR-REPLICATOR will build a 'selective' AR-LEAF-set with
            the list of nodes that requested replication to its own AR-IP.
            For instance, assuming NVE1 and NVE2 advertise a Leaf A-D
            route with PE1's IP-address-specific route-target and NVE3
            advertises a Leaf A-D route with PE2's IP-address-specific
!           route-target, PE1 will only add NVE1/NVE2 to its selective AR-
!           LEAF-set for BD-1, and exclude NVE3.  Likewise, PE2 will only
!           add NVE3 to its selective AR-LEAF-set for BD-1, and exclude
!           NVE1/NVE2.
!
! I changed the MUST to "will" above -- it's an example, it's inappropriate
! to use RFC 2119 type keywords in it.
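
The NVE1/NVE2/NVE3 example above amounts to grouping Leaf A-D routes by the
replicator whose IP-address-specific route-target they carry. A minimal
sketch, assuming each received route is reduced to a (leaf, replicator) pair;
the function name build_leaf_sets and that representation are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_leaf_sets(
    leaf_ad_routes: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Group leaves by the replicator named in each route's route-target.

    Each input item is (advertising leaf, replicator targeted by the
    IP-address-specific route-target), a simplification for illustration.
    """
    leaf_sets: Dict[str, Set[str]] = defaultdict(set)
    for leaf, replicator in leaf_ad_routes:
        leaf_sets[replicator].add(leaf)
    return dict(leaf_sets)


routes = [("NVE1", "PE1"), ("NVE2", "PE1"), ("NVE3", "PE2")]
selective_sets = build_leaf_sets(routes)
# PE1's set holds NVE1 and NVE2; PE2's set holds only NVE3.
```

Because the result is descriptive of the example rather than a requirement,
plain "will" reads more naturally than a normative keyword.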

!        o  When a node defined and operating as a Selective AR-REPLICATOR
            receives a packet on an overlay tunnel, it will do a tunnel
!           destination IP lookup and if the destination IP address is the AR-
            REPLICATOR AR-IP Address, the node MUST replicate the packet
            to:

***************
*** 878,893 ****
               overlay tunnel to the source AR-LEAF).

            +  overlay tunnels to the RNVEs if the tunnel source IP is the
!              IR-IP of an AR-LEAF (in any other case, the AR-REPLICATOR
!              MUST NOT replicate the BM traffic to remote RNVEs).  In
               other words, only the first-hop selective AR-REPLICATOR
               will replicate to all the RNVEs.

            +  overlay tunnels to the remote Selective AR-REPLICATORs if
!              the tunnel source IP is an IR-IP of its own AR-LEAF-set (in
               any other case, the AR-REPLICATOR MUST NOT replicate the BM
!              traffic to remote AR-REPLICATORs), where the tunnel
!              destination IP is the AR-IP of the remote Selective AR-
               REPLICATOR.  The tunnel destination IP AR-IP will be an

--- 1001,1016 ----
               overlay tunnel to the source AR-LEAF).

            +  overlay tunnels to the RNVEs if the tunnel source IP is the
!              IR-IP of an AR-LEAF.  In any other case, the AR-REPLICATOR
!              MUST NOT replicate the BM traffic to remote RNVEs.  In
               other words, only the first-hop selective AR-REPLICATOR
               will replicate to all the RNVEs.

            +  overlay tunnels to the remote Selective AR-REPLICATORs if
!              the tunnel source IP address is an IR-IP of its own AR-LEAF-set.  In
               any other case, the AR-REPLICATOR MUST NOT replicate the BM
!              traffic to remote AR-REPLICATORs.  When doing this replication, the tunnel
!              destination IP address is the AR-IP of the remote Selective AR-
               REPLICATOR.  The tunnel destination IP AR-IP will be an

***************
*** 911,916 ****
--- 1034,1042 ----
            destination IP addresses.  Some of those overlay tunnels MAY
            be flagged as non-BM receivers based on the BM flag received
            from the remote nodes in the BD.
+
+ It's not clear to me why you'd include "overlay tunnels ... flagged as
+ non-BM receivers" in a flood-list that's used for flooding BM traffic?

        2.  Flood-list #2 - composed of ACs, a Selective AR-LEAF-set and a
            Selective AR-REPLICATOR-set, where:
***************
*** 928,945 ****
     -  When a Selective AR-REPLICATOR receives a BM packet on an AC, it
        will forward the BM packet to its flood-list #1, skipping the non-
        BM overlay tunnels.

     -  When a Selective AR-REPLICATOR receives a BM packet on an overlay
        tunnel, it will check the destination and source IPs of the
        underlay IP header and:

!       o  If the destination IP matches its AR-IP and the source IP
           matches an IP of its own Selective AR-LEAF-set, the AR-
           REPLICATOR will forward the BM packet to its flood-list #2, as
           long as the list of AR-REPLICATORs for the BD matches the
           Selective AR-REPLICATOR-set.  If the Selective AR-REPLICATOR-
           set does not match the list of AR-REPLICATORs, the node reverts
           back to non-selective mode and flood-list #1 is used.

        o  If the destination IP matches its AR-IP and the source IP does
           not match any IP of its Selective AR-LEAF-set, the AR-
--- 1054,1104 ----
     -  When a Selective AR-REPLICATOR receives a BM packet on an AC, it
        will forward the BM packet to its flood-list #1, skipping the non-
        BM overlay tunnels.
+
+ It sure seems like it would have been cleaner to have expressed this by
+ naming a list (Flood-list #3, whatever) that doesn't include the non-BM
+ overlay tunnels to begin with, and then saying that's the list used in
+ this case. I guess this also relates to my previous comment/question --
+ basically, why are the non-BM overlay tunnels even included?

     -  When a Selective AR-REPLICATOR receives a BM packet on an overlay
        tunnel, it will check the destination and source IPs of the
        underlay IP header and:

!       o  If the destination IP address matches its AR-IP and the source IP address
           matches an IP of its own Selective AR-LEAF-set, the AR-
           REPLICATOR will forward the BM packet to its flood-list #2, as
           long as the list of AR-REPLICATORs for the BD matches the
           Selective AR-REPLICATOR-set.  If the Selective AR-REPLICATOR-
           set does not match the list of AR-REPLICATORs, the node reverts
           back to non-selective mode and flood-list #1 is used.
+
+ Presumably this time the non-BM overlay tunnels are NOT excluded?
+
+ Also, I guess the language above is where the answer to the "fall back to
+ non-selective AR mode" puzzle from point b, above, is hidden. It requires
+ that I make some assumptions:
+
+ - The "list of AR-REPLICATORS for the BD" is derived from the set of
+   AR-REPLICATOR advertisements for the BD. (This is not intuitively
+   obvious; "list" is very generic and could be, for example, configured
+   or something.)
+ - The Selective AR-REPLICATOR-set is all the members of the above list
+   that have advertised L=1.
+ - Ergo, if the sets aren't identical, some of them must have advertised
+   L=0.
+
+ It seems to me as though it would be more understandable to say something
+ like:
+
+ --
+       o  If the destination IP address matches its AR-IP and the source IP address
+          matches an IP of its own Selective AR-LEAF-set, the AR-
+          REPLICATOR will forward the BM packet to its flood-list #2,
+          unless some AR-REPLICATOR within the BD has advertised L=0.
+          In the latter case, the node reverts
+          back to non-selective mode and flood-list #1 is used.
+ --

        o  If the destination IP matches its AR-IP and the source IP does
           not match any IP of its Selective AR-LEAF-set, the AR-
***************
*** 960,970 ****
           This is the regular-IR behavior described in [RFC7432].

     -  In any case, non-BM overlay tunnels are excluded from flood-lists
!       and, also, source squelching is always done in order to ensure the
        traffic is not sent back to the originating source.  If the
!       encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not
        the bottom of the stack, the AR-REPLICATOR MUST copy the rest of
        the labels when forwarding them to the egress overlay tunnels.

  6.2.  Selective AR-LEAF procedures

--- 1119,1142 ----
           This is the regular-IR behavior described in [RFC7432].

     -  In any case, non-BM overlay tunnels are excluded from flood-lists
!
! That seems inconsistent with what point 1, above, says -- the place where
! I asked why you'd include non-BM receivers.  In any case, there you say
! they can be part of flood-list #1. Here you say they "are excluded".
! Which is it?
!
!       and, also,
        traffic is not sent back to the originating source.  If the
!       encapsulation is MPLSoGRE or MPLSoUDP and the BD label is not
        the bottom of the stack, the AR-REPLICATOR MUST copy the rest of
        the labels when forwarding them to the egress overlay tunnels.
+
+ Above, I removed "source squelching" again since it seemed not to add
+ anything, as previously.
+
+ Reference needed for "BD label".  I also wonder, is the requirement that
+ the replicator copy the rest of the labels a new one introduced here, or
+ are you just repeating an existing requirement from an underlying spec?

  6.2.  Selective AR-LEAF procedures

***************
*** 991,996 ****
--- 1163,1174 ----
     b.  The AR-LEAF MAY advertise a Regular-IR route if there are RNVEs
         in the BD.  The Selective AR-LEAF MUST advertise a Leaf A-D route
         after receiving a Replicator-AR route with L=1.  It is
+
+ "after receiving" -- so, does this mean it MUST NOT advertise a Leaf A-D
+ route prior to receiving any Replicator-AR route with L=1?  That would also
+ imply that if all Replicator-AR routes with L=1 are withdrawn, the Leaf A-D
+ route MUST be withdrawn?
+
         RECOMMENDED that the Selective AR-LEAF waits for a AR-LEAF-join-
         wait-timer (in seconds, default value is 3) before sending the
         Leaf A-D route, so that the AR-LEAF can collect all the
***************
*** 998,1004 ****
         route.

     c.  In a service where there is more than one Selective AR-
!        REPLICATORs the Selective AR-LEAF MUST locally select a single
         Selective AR-REPLICATOR for the BD.  Once selected:

--- 1176,1182 ----
         route.

     c.  In a service where there is more than one Selective AR-
!        REPLICATOR the Selective AR-LEAF MUST locally select a single
         Selective AR-REPLICATOR for the BD.  Once selected:

***************
*** 1021,1026 ****
--- 1199,1211 ----
         o  In case of a failure on the selected AR-REPLICATOR, another
            AR-REPLICATOR will be selected and a new Leaf A-D update will
            be issued for the new AR-REPLICATOR.  This new route will
+
+ What does "in case of a failure on the selected AR-REPLICATOR" mean,
+ practically speaking?  How is this detected?  I presume the failure
+ is detected when the relevant route becomes infeasible as the result
+ of any of the relevant underlying BGP mechanisms (next-hop unresolvability,
+ hold-time expiry, route withdrawal, etc.).
+
            update the selective list in the new Selective AR-REPLICATOR.
            In case of failure on the active Selective AR-REPLICATOR, it
            is RECOMMENDED for the Selective AR-LEAF to revert to IR
***************
*** 1030,1035 ****
--- 1215,1223 ----
            AR mode with the new Selective AR-REPLICATOR.  The AR-
            REPLICATOR-activation-timer MAY be the same configurable
            parameter as in Section 5.2.
+
+ What happens if a new AR-REPLICATOR is learned by the AR-LEAF, and the
+ new replicator is preferred over the currently-selected one?

     All the AR-LEAFs in a BD are expected to be configured as either
     selective or non-selective.  A mix of selective and non-selective AR-
***************
*** 1045,1051 ****

        1.  Flood-list #1 - composed of ACs and the overlay tunnel to the
            selected AR-REPLICATOR (using the AR-IP as the tunnel
!           destination IP).

        2.  Flood-list #2 - composed of ACs and overlay tunnels to the
            remote IR-IP Addresses.
--- 1233,1239 ----

        1.  Flood-list #1 - composed of ACs and the overlay tunnel to the
            selected AR-REPLICATOR (using the AR-IP as the tunnel
!           destination IP address).

        2.  Flood-list #2 - composed of ACs and overlay tunnels to the
            remote IR-IP Addresses.
***************
*** 1054,1061 ****
        there is any selected AR-REPLICATOR.  If there is, flood-list #1
        will be used.  Otherwise, flood-list #2 will.

!    -  When an AR-LEAF receives a BM packet on an overlay tunnel, will
!       forward the BM packet to its local ACs and never to an overlay
        tunnel.  This is the regular IR behavior described in [RFC7432].

--- 1242,1249 ----
        there is any selected AR-REPLICATOR.  If there is, flood-list #1
        will be used.  Otherwise, flood-list #2 will.

!    -  When an AR-LEAF receives a BM packet on an overlay tunnel, it will
!       forward the packet to its local ACs and never to an overlay
        tunnel.  This is the regular IR behavior described in [RFC7432].
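+
+ My reading of the AR-LEAF forwarding rules in this hunk, as a small
+ Python sketch. The function and parameter names are hypothetical. I am
+ assuming the flood-list choice applies to BM packets received on an AC,
+ and I don't model split-horizon on the ingress AC.

```python
from typing import List, Optional


def ar_leaf_egress(
    received_on_overlay: bool,
    local_acs: List[str],
    selected_replicator_ar_ip: Optional[str],
    remote_ir_ips: List[str],
) -> List[str]:
    """Return the egress targets for a BM packet at an AR-LEAF."""
    acs = [f"ac:{ac}" for ac in local_acs]
    if received_on_overlay:
        # Regular IR behavior: local ACs only, never another overlay tunnel.
        return acs
    if selected_replicator_ar_ip is not None:
        # Flood-list #1: ACs plus the tunnel to the selected AR-REPLICATOR,
        # using its AR-IP as the tunnel destination IP address.
        return acs + [f"tunnel:{selected_replicator_ar_ip}"]
    # Flood-list #2: ACs plus tunnels to the remote IR-IP addresses.
    return acs + [f"tunnel:{ip}" for ip in remote_ir_ips]
```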

***************
*** 1071,1076 ****
--- 1259,1267 ----
     In addition to AR, the second optimization supported by this solution
     is the ability for the all the BD nodes to signal Pruned-Flood-Lists
     (PFL).  As described in section 3, an EVPN node can signal a given
+
+ I guess you meant Section 4?
+
     value for the BM and U PFL flags in the IR Inclusive Multicast
     Routes, where:

***************
*** 1085,1090 ****
--- 1276,1286 ----
     PFL flag and remove the sender from the corresponding flood-list.  A
     given BD node receiving BUM traffic on an overlay tunnel MUST
     replicate the traffic normally, regardless of the signaled PFL flags.
+
+ What exactly does "replicate the traffic normally" mean, in the context
+ of this specification?  I guess you should say something like "replicate
+ the traffic according to [reference]".  Also, I don't get it: what are the
+ flags FOR, if they're ignored when receiving on an overlay tunnel?
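+
+ My best guess at the intent: the flags prune the *sender's* flood-lists,
+ and the receive path never looks at them. The Python sketch below shows
+ that guess. It assumes a flag value of 1 means "prune me from this
+ flood-list"; the names are mine, it ignores AR entirely, and the flag
+ values in the usage note are chosen only to match the unknown unicast
+ examples later in this section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PflFlags:
    """PFL flags signaled by a remote node (assumed: 1 = prune me)."""

    bm: int = 0
    u: int = 0


def build_flood_lists(remote_flags: Dict[str, PflFlags]) -> Dict[str, List[str]]:
    """Build the sender-side BM and U flood-lists from the received flags."""
    return {
        # Nodes that asked to be pruned (flag = 1) are left out.
        "BM": sorted(n for n, f in remote_flags.items() if f.bm == 0),
        "U": sorted(n for n, f in remote_flags.items() if f.u == 0),
    }
```

+ From PE1's point of view, with NVE1 signaling U=1 and NVE3 signaling
+ BM=1 and U=1, this gives a U flood-list of NVE2 and PE2 only. If that
+ is what you mean, "replicate the traffic normally" needs a reference
+ that says which lists are used on the receive side.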

     This optimization MAY be used along with the AR solution.

***************
*** 1123,1128 ****
--- 1319,1328 ----

         NVE2, but not to NVE3.  PE2 and NVE2 will replicate the BM
+
+ "but not to NVE3".  What happened to "MUST replicate the traffic normally"?
+ To me, these two pieces of text seem to contradict one another.
+
         packets to their local ACs but we will avoid NVE3 having to
         replicate unnecessarily those BM packets to VM31 and VM32.

***************
*** 1135,1147 ****
--- 1335,1357 ----
         NVE3 to NVE2, PE1 and PE2 but not NVE1.  The solution avoids the
         unnecessary replication to NVE1, since the destination of the
         unknown traffic cannot be at NVE1.
+
+ It's not clear to me why the destination can't be at NVE1.

     4.  Any Unknown unicast packet sent from TS1 will be forwarded by PE1
         to the WAN link, PE2 and NVE2 but not to NVE1 and NVE3, since the
         target of the unknown traffic cannot be at those NVEs.
+
+ Similarly, I don't get why this is the case.

  8.  AR Procedures for single-IP AR-REPLICATORS

+ I'm curious why the design choice was made to specify two different ways to
+ do the same thing.  You motivate why not all routers can use distinguished
+ IP addresses for the two different functional modes; however, presumably all
+ routers could make use of distinguished VNIs as you do here.  I'd appreciate
+ a few words about why you didn't choose to just always use the VNI approach.
+
     The procedures explained in sections Section 5 and Section 6 assume
     that the AR-REPLICATOR can use two local routable IP addresses to
     terminate and originate NVO tunnels, i.e. IR-IP and AR-IP addresses.
***************
*** 1184,1201 ****
  9.  AR Procedures and EVPN All-Active Multi-homing Split-Horizon

     This section extends the procedures for the cases where AR-LEAF nodes
!    or AR-REPLICATOR nodes are attached to the the same Ethernet Segment
     in the BD.  The case where one (or more) AR-LEAF node(s) and one (or
     more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment
     is out of scope.

  9.1.  Ethernet Segments on AR-LEAF nodes

     If VXLAN or NVGRE are used, and if the Split-horizon is based on the
     tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split-
     horizon check will not work if there is an Ethernet-Segment shared
!    between two AR-LEAF nodes, and the AR-REPLICATOR changes the tunnel
     IP SA of the packets with its own AR-IP.

     In order to be compatible with the IP SA split-horizon check, the AR-
     REPLICATOR MAY keep the original received tunnel IP SA when
--- 1394,1418 ----
  9.  AR Procedures and EVPN All-Active Multi-homing Split-Horizon

     This section extends the procedures for the cases where AR-LEAF nodes
!    or AR-REPLICATOR nodes are attached to the same Ethernet Segment
     in the BD.  The case where one (or more) AR-LEAF node(s) and one (or
     more) AR-REPLICATOR node(s) are attached to the same Ethernet Segment
     is out of scope.
+
+ I just can't understand what this paragraph is telling me. :-(  Apart from
+ anything else, to the casual reader the second sentence seems to contradict
+ the first.

  9.1.  Ethernet Segments on AR-LEAF nodes

     If VXLAN or NVGRE are used, and if the Split-horizon is based on the
     tunnel IP SA and "Local-Bias" as described in [RFC8365], the Split-
     horizon check will not work if there is an Ethernet-Segment shared
!    between two AR-LEAF nodes, and the AR-REPLICATOR replaces the tunnel
     IP SA of the packets with its own AR-IP.
+
+ I changed "changes" to "replaces"; it's my best guess as to what you meant.
+ If that's wrong, please help me understand what you did mean.

     In order to be compatible with the IP SA split-horizon check, the AR-
     REPLICATOR MAY keep the original received tunnel IP SA when
***************
*** 1203,1209 ****
     LEAF nodes to apply Split-horizon check procedures for BM packets,
     before sending them to the local Ethernet-Segment.  Even if the AR-
     LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the
!    AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to
     other AR-REPLICATORs.

     When EVPN is used for MPLS over GRE (or UDP), the ESI-label based
--- 1420,1426 ----
     LEAF nodes to apply Split-horizon check procedures for BM packets,
     before sending them to the local Ethernet-Segment.  Even if the AR-
     LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the
!    AR-REPLICATOR MUST always use its IR-IP as the IP SA when replicating to
     other AR-REPLICATORs.

     When EVPN is used for MPLS over GRE (or UDP), the ESI-label based
***************
*** 1220,1226 ****

  9.2.  Ethernet Segments on AR-REPLICATOR nodes

!    Ethernet Segments associated to one or more AR-REPLICATOR nodes
     SHOULD follow "Local-Bias" procedures for EVPN all-active multi-
     homing, as follows:

--- 1437,1443 ----

  9.2.  Ethernet Segments on AR-REPLICATOR nodes

!    Ethernet Segments associated with one or more AR-REPLICATOR nodes
     SHOULD follow "Local-Bias" procedures for EVPN all-active multi-
     homing, as follows:

***************
*** 1240,1245 ****
--- 1457,1464 ----
        it had been received on a local AC that is part of the ES and will
        be forwarded to all local ES, irrespective of their DF or NDF
        state.
+
+ Please define/expand "ES".

     -  BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP
        as the IP DA, will follow regular [RFC8365] "Local-Bias" rules and
***************
*** 1254,1259 ****
--- 1473,1483 ----
     In addition, the procedures introduced by this document may bring
     some new risks for the successful delivery of BM traffic.  Unicast
     traffic is not affected by this document.  The forwarding of
+
+ If unicast traffic isn't affected, what's the U flag even for?  It sure
+ seems as though it's intended to affect the forwarding of (unknown)
+ unicast traffic.
+
     Broadcast and Multicast (BM) traffic is modified though, and BM
     traffic from the AR-LEAF nodes will be attracted by the existance of
     AR-REPLICATORs in the BD.  An AR-LEAF will forward BM traffic to its
***************
*** 1262,1270 ****

     An implementation following the procedures in this document should
     not create BM loops, since the AR-REPLICATOR will always forward the
     BM traffic using the correct tunnel IP Destination Address that
     indicates the remote nodes how to forward the traffic.  This is true
!    in both, the Non-Selective and Selective modes defined in this
     document.

     The Selective mode provides a multi-staged replication solution,
--- 1486,1503 ----

     An implementation following the procedures in this document should
     not create BM loops, since the AR-REPLICATOR will always forward the
+
+ Instead of "should not create BM loops" I suggest "will not create" or
+ if you can't actually promise that, "is not expected to create".  I assume
+ you're using "should" in the sense of weak expectation, and not like an
+ RFC 2119 SHOULD.
+
     BM traffic using the correct tunnel IP Destination Address that
     indicates the remote nodes how to forward the traffic.  This is true
!
! Instead of "indicates", try instructs, cues, or directs?
!
!    in both the Non-Selective and Selective modes defined in this
     document.

     The Selective mode provides a multi-staged replication solution,