[bess] Benjamin Kaduk's No Objection on draft-ietf-bess-evpn-optimized-ir-09: (with COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 28 October 2021 02:53 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-bess-evpn-optimized-ir@ietf.org, bess-chairs@ietf.org, bess@ietf.org, Matthew Bocci <matthew.bocci@nokia.com>, matthew.bocci@nokia.com
X-Test-IDTracker: no
X-IETF-IDTracker: 7.39.0
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <163538958283.18208.14529066116844758492@ietfa.amsl.com>
Date: Wed, 27 Oct 2021 19:53:03 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/6Xxt8v5yJxby14LJcEkyLJ_G-qw>
Subject: [bess] Benjamin Kaduk's No Objection on draft-ietf-bess-evpn-optimized-ir-09: (with COMMENT)

Benjamin Kaduk has entered the following ballot position for
draft-ietf-bess-evpn-optimized-ir-09: No Objection

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-optimized-ir/



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Thanks to Derek Atkins for the secdir review.

Thanks as well to John Scudder for his detailed review; I support his
discuss position and have omitted a few of my comments that he has
already covered.  (There are probably a few more that I could have
omitted, but I did not do an exhaustive check.  Feel free to just point
to his ballot thread instead of repeating the explanation to me.)

I think it would be very helpful to clearly state early on what the
difference between the "selective" and "non-selective" setups is.  The
first description I see is not until §6.2 (I comment below where it
appears as well).

Section 2

   -  AR-IP: IP address owned by the AR-REPLICATOR and used to
      differentiate the ingress traffic that must follow the AR
      procedures.

From context I infer that the AR-IP is advertised along with the
Replicator-AR RT-3 route.  Since we talk about other defined values as
being advertised along with such RT-3 routes, should we also say that
this IP is advertised along with the corresponding RT-3 route?

   -  AR-VNI: VNI advertised by the AR-REPLICATOR along with the
      Replicator-AR route.  It is used to identify the ingress packets
      that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR
      case.

This phrasing seems ambiguous: please distinguish whether this is used
only in the single-IP AR-REPLICATOR case or it identifies packets that
sometimes follow AR procedures (in the single-IP AR-REPLICATOR case) and
sometimes do not.

   -  PTA: PMSI Tunnel Attribute

PMSI is not marked as "well-known" at
https://www.rfc-editor.org/materials/abbrev.expansion.txt and should be
expanded on first use or otherwise defined.

   -  EVI: EVPN Instance.  An EVPN instance spanning the Provider Edge
      (PE) devices participating in that EVPN

This seems rather circular.  Can we define "EVPN Instance" without
reference to "EVPN instance"?

Section 3

   c.  The solution is compatible with [RFC7432] and [RFC8365] and has
       no impact on the EVPN procedures for BM traffic.  In particular,

I do not think that "no impact on the EVPN procedures" is what was
intended -- it obviously has impact on the procedures, since it is
implemented differently.  Perhaps it has no impact on the CE, but that's
not what this text seems to say.

Section 4

I agree with the directorate reviewer that splitting the RT-3 NLRI
layout and the PTA general format into separate figures is quite
worthwhile.  I would also suggest naming the first one as the NLRI of
the RT-3 route type, rather than leaving that implicit.

   The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel
   Attribute's (PTA) general format used in [RFC7432] are shown below:

I suggest referencing RFC 6514 as the source of the PTA format.

   The Flags field is 8 bits long.  This document defines the use of 4
   bits of this Flags field:

That's half of the flag bits!  Why is it better to allocate so many
flags than to move more structure into the tunnel identifier portion of
the PTA?  I guess RFC 7902 does provision for extended tunnel attribute
flags, but the question of whether these all belong as flags still seems
valid.

   -  Regular-IR route: in this route, Originating Router's IP Address,
      Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used
      as described in [RFC7432] when Ingress Replication is in use.  The
      NVE/PE that advertises the route will set the Next-Hop to an IP
      address that we denominate IR-IP in this document.  When
      advertised by an AR-LEAF node, the Regular-IR route SHOULD be
      advertised with type T= AR-LEAF.

When would I violate this SHOULD (and what other behaviors would be
usable)?

      o  Originating Router's IP Address MUST be set to an IP address of
         the PE that should be common to all the EVIs on the PE (usually
         this is the PE's loopback address).  The Tunnel Identifier and

Is it really the usual case that a PE has only one loopback address (so
that the definite article "the" applies)?  This seems particularly
poignant since we assume that AR-REPLICATORs will have multiple
addresses available for use, to distinguish inbound IR and AR traffic.

         Next-Hop SHOULD be set to the same IP address as the
         Originating Router's IP address when the NVE/PE originates the
         route.  [...]

Should we say anything about what they are set to when the NVE/PE does
not originate the route?

                It is only used for selective AR and its fields are set
   as follows:

The antecedent for "its fields" seems to be "the Leaf A-D route
(RT-11)"; I suggest using the precise terminology that the fields of the
"route type specific portion of the route" are what are described.
Precise use of terminology makes the documents much more approachable to
unfamiliar readers that rely on textual search to correlate the relevant
parts of the various documents in question.

   -  Replicator-AR route: this route is used by the AR-REPLICATOR to
      advertise its AR capabilities, with the fields set as follows:

      o  Originating Router's IP Address MUST be set to an IP address of
         the PE that should be common to all the EVIs on the PE (usually
         this is the PE's loopback address).  The Tunnel Identifier and

I note that the guidance in RFC 7432 for constructing what we in this
document refer to as the "Regular-IR route" also has text about "the
PE's loopback address" being useful for what we would call the IR-IP,
but this address here is the AR-IP and (if we keep reading) SHOULD be
different than the IR-IP.  I think we need to say something about
whether PEs are really expected to only have one ("the") loopback
address vs multiple, and if there is only one how to decide whether to
use it as AR-IP or IR-IP.  To use language ("the PE's loopback address")
that implies there is only one, while strongly suggesting that it be
used for two different purposes and also strongly suggesting that those
two different purposes have different addresses, seems to be internally
inconsistent.

      o  The AR-LEAF constructs an IP-address-specific route-target as
         indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
         placing the IP address carried in the Next-Hop field of the
         received Replicator-AR route in the Global Administrator field
         of the Community, with the Local Administrator field of this
         Community set to 0.  [...]

The analogous text in draft-ietf-bess-evpn-bum-procedure-updates also
mentions "setting the Extended Communities attribute of the Leaf A-D
route to that Community"; would that be useful to include here as well?
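
To make the construction quoted above concrete, here is a minimal sketch.
It assumes an IPv4 Next-Hop and the transitive IPv4-address-specific Route
Target encoding from RFC 4360 (type 0x01, sub-type 0x02); the helper name
leaf_ad_route_target is hypothetical and not taken from the draft.

```python
import ipaddress
import struct

# Transitive IPv4-address-specific extended community, Route Target
# sub-type, as laid out in RFC 4360.
IPV4_ADDR_SPECIFIC = 0x01
SUBTYPE_ROUTE_TARGET = 0x02


def leaf_ad_route_target(replicator_next_hop: str) -> bytes:
    """Build the 8-octet route target for a Leaf A-D route (hypothetical helper)."""
    # Global Administrator: the Next-Hop of the received Replicator-AR route.
    global_admin = ipaddress.IPv4Address(replicator_next_hop).packed
    # Local Administrator: the quoted text sets this field to 0.
    local_admin = 0
    return struct.pack("!BB4sH", IPV4_ADDR_SPECIFIC, SUBTYPE_ROUTE_TARGET,
                       global_admin, local_admin)


# 192.0.2.1 is a documentation address standing in for the Next-Hop.
print(leaf_ad_route_target("192.0.2.1").hex())
```

The resulting Community would then be placed in the Extended Communities
attribute of the Leaf A-D route, which is the step the quoted text leaves
implicit.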

      o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
         the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel
         Identifier set to the IP of the advertising AR-LEAF.  The PMSI
         Tunnel attribute MUST carry a downstream-assigned MPLS label or
         VNI that is used by the AR-REPLICATOR to send traffic to the
         AR-LEAF.

This seems to be the only place where we specify the actual
format/contents (i.e., including Tunnel Identifier contents) of the "AR"
PTA tunnel type.  I would have expected a more explicit, declarative
definition that the IANA registration could point to.

Section 5.1

It's a bit unfortunate that there's so much overlap between the list of
"considerations" and the "rules" that an implementation must be
compatible with, but it may be too risky to try to coalesce them at this
time.

   b.  An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY
       advertise a Regular-IR route.  The AR-REPLICATOR MUST NOT
       generate a Regular-IR route if it does not have local attachment
       circuits (AC).  If the Regular-IR route is advertised, the AR
       Type field is set to zero.

This seems to merit some more substantial discussion, since the value of
zero in the AR type field is otherwise avoided in this document.  That
is, we have specific values for "leaf" and "acting as replicator", but
the value zero is normally "does not support optimized-ir".  Except here
it's also used for "replicator advertising as non-replicator role"; it's
probably appropriate to not abuse "leaf" for this case, but using zero
seems to in some sense be a different abuse.  Would the '11' value have
been usable to indicate this distinction?
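
The overloading of zero can be sketched as follows.  This is only an
illustration of the values as they are discussed in this review; the
helper regular_ir_ar_type and the role strings are hypothetical, and the
position of the field inside the Flags octet is deliberately left out
since it is not stated here.

```python
from enum import IntEnum


class ARType(IntEnum):
    """Two-bit AR type (T) values as discussed in this review."""
    NOT_AR = 0b00         # normally read as "does not support optimized-IR"
    AR_REPLICATOR = 0b01
    AR_LEAF = 0b10
    # 0b11 has no meaning assigned in the text discussed here.


def regular_ir_ar_type(role: str) -> ARType:
    """AR type carried in a Regular-IR route, by sender role (hypothetical helper)."""
    if role == "AR-LEAF":
        return ARType.AR_LEAF  # Section 4: SHOULD advertise as AR-LEAF
    # Section 5.1 b: an AR-REPLICATOR sets the field to zero in its
    # Regular-IR route, the same value a node without AR support sends.
    return ARType.NOT_AR


for role in ("AR-LEAF", "AR-REPLICATOR", "RNVE"):
    print(role, format(regular_ir_ar_type(role).value, "02b"))
```

A receiver looking only at the Regular-IR route cannot tell the
AR-REPLICATOR and the RNVE apart, which is the ambiguity raised above.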

Section 6

                                                                   The
   solution is called "selective" because a given AR-REPLICATOR MUST
   replicate the BM traffic to only the AR-LEAF that requested the
   replication (as opposed to all the AR-LEAF nodes) and MAY replicate
   the BM traffic to the RNVEs.  [...]

I'm not sure I understand the motivation behind MAY, here.  If we don't
replicate the BM traffic to RNVEs isn't that data loss?

Section 6.1

       o  When a node defined and operating as Selective AR-REPLICATOR
          receives a packet on an overlay tunnel, it will do a tunnel
          destination IP lookup and if the destination IP is the AR-
          REPLICATOR AR-IP Address, the node MUST replicate the packet
          to:
       [...]
          +  overlay tunnels to the remote Selective AR-REPLICATORs if
             the tunnel source IP is an IR-IP of its own AR-LEAF-set (in
             any other case, the AR-REPLICATOR MUST NOT replicate the BM
             traffic to remote AR-REPLICATORs), where the tunnel
             destination IP is the AR-IP of the remote Selective AR-
             REPLICATOR.  The tunnel destination IP AR-IP will be a

It seems like it would require less cognitive burden on the reader if we
disambiguated "tunnel source IP" as it relates to the incoming tunnel on
which the packet in question was received vs the outgoing tunnel to
which it is being replicated.  ("tunnel destination IP" is arguably
already disambiguated by the lead-in text that talks about doing a
lookup based on the tunnel the packet was received on.)  Given that the
"rules" that appear later specifically say that it checks both
destination and source of the underlay IP header, it seems reasonable to
say something similar here when listing the "considerations".
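
A minimal sketch of the check with the two addresses named explicitly as
fields of the received packet's underlay header.  It covers only the
remote-replicator bullet quoted above (the other replication targets are
elided in the quote and are not modeled); the function and parameter
names are hypothetical.

```python
from typing import Set


def replicate_to_remote_replicators(rx_outer_src: str, rx_outer_dst: str,
                                    own_ar_ip: str,
                                    own_leaf_set_ir_ips: Set[str]) -> bool:
    """Decide from the RECEIVED underlay header whether to replicate onward."""
    if rx_outer_dst != own_ar_ip:
        # Not addressed to the AR-IP, so AR procedures do not apply.
        return False
    # Only traffic sourced by an IR-IP of this replicator's own AR-LEAF-set
    # may be replicated to remote Selective AR-REPLICATORs.
    return rx_outer_src in own_leaf_set_ir_ips


# Documentation addresses standing in for real IR-IP / AR-IP values.
leaf_set = {"192.0.2.11", "192.0.2.12"}
print(replicate_to_remote_replicators("192.0.2.11", "198.51.100.1",
                                      "198.51.100.1", leaf_set))
print(replicate_to_remote_replicators("203.0.113.7", "198.51.100.1",
                                      "198.51.100.1", leaf_set))
```

Naming the inputs rx_outer_src and rx_outer_dst keeps them distinct from
the destination AR-IP used on the outgoing tunnel.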

          +  The Selective AR-REPLICATOR-set is composed of the overlay
             tunnels to all the AR-REPLICATORs that send a Replicator-AR
             route with L=1.  The AR-IP addresses are used as tunnel
             destination IP.

I'm not sure why the "that send a Replicator-AR route with L=1" clause
is needed -- if there are AR-REPLICATORs that send with L=0 then aren't
we required to fall back to the non-selective procedures?

   -  In any case, non-BM overlay tunnels are excluded from flood-lists
      and, also, source squelching is always done in order to ensure the
      traffic is not sent back to the originating source.  If the
      encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not
      the bottom of the stack, the AR-REPLICATOR MUST copy the rest of
      the labels when forwarding them to the egress overlay tunnels.

I'm not sure that I understand which labels "the rest of the labels"
are in this context.

Section 6.2

   In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective
   AR-LEAFs.  NVE1 selects PE1 as its Selective AR-REPLICATOR.  If that
   is so, NVE1 will send all its BM traffic for BD-1 to PE1.  If other
   AR-LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic
   from PE1.  These are the differences in the behavior of a Selective
   AR-LEAF compared to a non-selective AR-LEAF:

I think this might be the first time we concretely say what makes the
"selective" procedures earn that name (the combined selectivity of all
BM traffic, in both directions, between leaf and replicator, as opposed
to the non-selective case where leafs pick only the replicator that they
send to, and must receive from everywhere).  This seems like something
that would be useful to have much earlier in the document, e.g., in the
introduction.  (It's also somewhat different than the sense in which RFC
6513 contrasts selective and inclusive tunnels, though I expect it's
probably too late to try to change the terminology used here.)

Section 8

   -  An AR-REPLICATOR will perform IR or AR forwarding mode for the
      incoming Overlay packets based on an ingress VNI lookup, as
      opposed to the tunnel IP DA lookup.  Note that, when replicating
      to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI
      advertised by the egress node will determine the IR or AR
      forwarding mode at the subsequent AR-REPLICATOR.

Does this implicitly put a requirement on all AR-REPLICATOR
implementations to support the VNI-based scheme, since they might be
called upon to forward to another replicator using it?
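
For comparison with the tunnel IP DA lookup, the VNI-based decision in
the quoted text might look like the sketch below.  The function name, the
example VNI values, and the error raised for an unknown VNI are
assumptions made for illustration only.

```python
def forwarding_mode(ingress_vni: int, ir_vni: int, ar_vni: int) -> str:
    """Pick IR or AR forwarding from the ingress VNI (hypothetical helper)."""
    if ingress_vni == ar_vni:
        return "AR"
    if ingress_vni == ir_vni:
        return "IR"
    # ASSUMPTION: handling of any other VNI is not covered by the quoted text.
    raise ValueError("VNI %d matches neither IR-VNI nor AR-VNI" % ingress_vni)


# Example values only; the sending replicator picks whichever VNI the
# egress replicator advertised for the mode it wants applied there.
print(forwarding_mode(ingress_vni=20001, ir_vni=10001, ar_vni=20001))
```

A sending replicator that wants this behavior at the next hop has to be
able to encapsulate with the advertised AR-VNI or IR-VNI, which is where
the question about an implicit requirement comes from.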

Section 9.1

   In order to be compatible with the IP SA split-horizon check, the AR-
   REPLICATOR MAY keep the original received tunnel IP SA when
   replicating packets to a remote AR-LEAF or RNVE.  This will allow AR-
   LEAF nodes to apply Split-horizon check procedures for BM packets,
   before sending them to the local Ethernet-Segment.  Even if the AR-
   LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the
   AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to
   other AR-REPLICATORs.

It seems unfortunate that an AR-LEAF node needs to have knowledge of the
configuration in use at remote AR-REPLICATORs in order to know if the
split-horizon check will be effective.  Is there no way to always
require certain replicator behavior and give the leafs reliable
knowledge in split-horizon scenarios?
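
The dependency on replicator configuration can be sketched as below.  The
preserve_sa knob, the role strings, and the function name are
hypothetical; the fallback to the IR-IP when the source is not preserved
is an assumption, since the quoted text only says what MAY and MUST
happen.

```python
def outer_source_ip(target_role: str, received_sa: str,
                    own_ir_ip: str, preserve_sa: bool) -> str:
    """Choose the tunnel IP SA for a replicated copy (hypothetical helper)."""
    if target_role == "AR-REPLICATOR":
        # MUST always use the replicator's own IR-IP toward other replicators.
        return own_ir_ip
    if preserve_sa:
        # MAY keep the original SA toward a remote AR-LEAF or RNVE.
        return received_sa
    # ASSUMPTION: without preservation the replicator uses its own IR-IP.
    return own_ir_ip


leaf_ip, replicator_ir_ip = "192.0.2.11", "198.51.100.10"
for knob in (True, False):
    print(knob, outer_source_ip("AR-LEAF", leaf_ip, replicator_ir_ip, knob))
```

The receiving AR-LEAF sees a different source address depending on a
setting it cannot observe, so its IP SA split-horizon check only works in
the first case.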

Section 9.2

   Ethernet Segments associated to one or more AR-REPLICATOR nodes
   SHOULD follow "Local-Bias" procedures for EVPN all-active multi-
   homing, as follows:

Is it really the "ethernet segments" that would follow local-bias
procedures, or the EVPN nodes attached to them?  Is putting SHOULD-level
guidance to this effect in effect updating the core EVPN specification
to privilege one way of handling multi-homing over others?  (Maybe not,
since the requirements only come into play when AR-REPLICATORs are
involved and we disclaim applicability to cases where AR-REPLICATOR and
AR-LEAF are on the same ethernet segment ... we might consider saying
that as some part of why that case is out of scope.)

Also, if we know of procedures other than local-bias that will still be
effective, we might mention them as some justification for why this is
only a SHOULD and not a MUST.

Section 10

Since we use the Leaf A-D route from [bum-procedure-update], we might
want to pull in its security considerations as well.

I feel like there may be some more considerations to mention that are
specific to the multi-homing case, but I don't think I understand that
scenario well enough to be able to state them, myself.

We might also mention that AR-REPLICATORs are, by design, using more
bandwidth than stock RFC7432 PEs would, and if they exceed their local
bandwidth that will cause service disruption.

The text that's here already does do a pretty good job of capturing the
important topics for the common case, though -- thanks!

   An implementation following the procedures in this document should
   not create BM loops, since the AR-REPLICATOR will always forward the
   BM traffic using the correct tunnel IP Destination Address that
   indicates the remote nodes how to forward the traffic.  This is true
   in both, the Non-Selective and Selective modes defined in this
   document.

(In the vein of my earlier comment,) what about the case when the tunnel
destination is expecting to use VNI to determine how to forward the
traffic?

Section 14.2

Having SHOULD-level guidance to use the "local bias" procedures detailed
in RFC 8365 might require that document to be promoted to a normative
reference; see
https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/

NITS

Section 1

   Section 3 lists the requirements of the combined optimized-IR
   solution, whereas Section 5 and Section 6 describe the Assisted-
   Replication (AR) solution, and Section 7 the Pruned-Flood-Lists (PFL)
   solution.

I suggest mentioning that sections 5 and 6 differ in that they cover the
selective and non-selective cases.

Section 2

   -  Regular-IR: Refers to Regular Ingress Replication, where the
      source NVE/PE sends a copy to each remote NVE/PE part of the BD.

s/part of/that is part of/

Section 4

   -  Regular-IR route: in this route, Originating Router's IP Address,
      Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used
      as described in [RFC7432] when Ingress Replication is in use.  The
      NVE/PE that advertises the route will set the Next-Hop to an IP
      address that we denominate IR-IP in this document.  When
      advertised by an AR-LEAF node, the Regular-IR route SHOULD be
      advertised with type T= AR-LEAF.

Hmm, down near the end of page 9 we say that AR-enabled nodes MUST
signal the proper AR type (1 or 2) according to their administrative
choice -- how is that MUST compatible with the SHOULD here?

Also, if we're going to write out T = 01 (AR-REPLICATOR) just a few lines
later, we should write out T = 10 (AR-LEAF) here.

      o  The AR-LEAF constructs an IP-address-specific route-target as
         indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by
         placing the IP address carried in the Next-Hop field of the

Pedantically, "as indicated in [bum-procedure-update]" would involve
"placing the IP address carried in the Next Hop of the received I/S-PMSI
A-D route in the Global Administrator field of the Community", which is
obviously not going to be applicable in this case.  So "analogously to"
might be more appropriate than "as indicated".

         received Replicator-AR route in the Global Administrator field
         of the Community, with the Local Administrator field of this
         Community set to 0.  Note that the same IP-address-specific
         import route-target is auto-configured by the AR-REPLICATOR
         that sent the Replicator-AR, in order to control the acceptance
         of the Leaf A-D routes.

This "Note that ... is auto-configured by" phrasing suggests to me that
there is some more detailed text elsewhere laying out a requirement to
do this (and any needed procedures, though I suspect there are no real
procedures to document).  However, later on §6.1 refers back to §4 (here)
for "the AR-REPLICATOR auto-configures its IP-address-specific import
route-target as described in section Section 4."  Maybe we could write
this in a way that's more clearly a specification and binding on the
AR-REPLICATOR?

      o  The Leaf A-D route MUST include the PMSI Tunnel attribute with
         the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel

"type" here seems to refer to the new T field in the PTA flags, and
should probably be referenced using consistent terminology.

   Each AR-enabled node MUST understand and process the AR type field in
   the PTA (Flags field) of the routes, and MUST signal the

(same point about consistent terminology for T/AR-type)

   corresponding type (1 or 2) according to its administrative choice.

I suggest writing "01" and "10" to match the previous treatment of the
two-bit field.

Section 5.1

   -  When an AR-REPLICATOR receives a BM packet on an AC, it will
      forward the BM packet to its flooding list (including local ACs
      and remote NVE/PEs), skipping the non-BM overlay tunnels.

I assume that it goes without saying that the AR-REPLICATOR does not
flood the packet back to the AC it came in on.  (The "rules" later in
the section do specify source squelching.)

Section 5.2

   b.  In this non-selective AR solution, the AR-LEAF MUST advertise a
       single Regular-IR inclusive multicast route as in [RFC7432].  The
       AR-LEAF SHOULD set the AR Type field to AR-LEAF.  Note that
       although this flag does not make any difference for the egress
       nodes when creating an EVPN destination to the AR-LEAF, it is

egress, or ingress?