[bess] Benjamin Kaduk's Discuss on draft-ietf-bess-evpn-igmp-mld-proxy-13: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 21 October 2021 00:48 UTC

Benjamin Kaduk has entered the following ballot position for
draft-ietf-bess-evpn-igmp-mld-proxy-13: Discuss

Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-igmp-mld-proxy/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

(1) Apparently each PE is supposed to store version flags for each other PE
in the EVI (I guess on a per-route basis?), but this is mentioned just
once, in passing, in step 2 of the Leave Group procedures in §4.1.2.
Similarly, §6.1 defines, somewhat in passing, some "local IGMP
Membership Request (x,G) state" that must be maintained in some cases.
Let's discuss whether it's appropriate/useful to have a general introductory
section that covers what new state PEs are expected to retain as part of
supporting IGMP/MLD proxying.  Maybe the answer is "no", but I would
like to have the conversation.
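To support that conversation, here is a minimal Python sketch of the kind of inventory such a section might provide. Everything in it is hypothetical. The class and field names are invented, keying the version flags per (remote PE, SMET route) is only a guess, and the Join Synch trigger mirrors the §6.1 text quoted further below.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical key types; a source of None stands for the wildcard "*".
Group = Tuple[Optional[str], str]   # (x, G)
EsBd = Tuple[str, int]              # (ES identifier, BD identifier)


@dataclass
class ProxyState:
    """Hypothetical inventory of per-PE state for IGMP/MLD proxying."""
    # Version flags last received from each remote PE, keyed per SMET route.
    remote_flags: Dict[Tuple[str, Group], int] = field(default_factory=dict)
    # "Local IGMP Membership Request (x,G) state", kept per (ES, BD).
    local_state: Dict[EsBd, Set[Group]] = field(default_factory=dict)

    def record_remote(self, pe: str, grp: Group, flags: int) -> None:
        """Remember the version flags a remote PE advertised for (x,G)."""
        self.remote_flags[(pe, grp)] = flags

    def add_local(self, es_bd: EsBd, grp: Group) -> bool:
        """Return True if the (x,G) state is new, i.e. a Join Synch route is needed."""
        groups = self.local_state.setdefault(es_bd, set())
        if grp in groups:
            return False
        groups.add(grp)
        return True
```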

(2) I am not sure if the body text is consistent with what is being
allocated from IANA.  §8 describes PEs that are not using ingress
replication as being identifiable as """any PE that has advertised an
Inclusive Multicast Tag route for the BD without the "IGMP Proxy
Support" flag""", but the IANA considerations allocate flags for both
IGMP Proxy Support and MLD Proxy Support.  Is a PE that advertises MLD
Proxy Support but not IGMP Proxy Support to be treated as not using
ingress replication, as the literal interpretation of this text would
require?  Similarly, §9.2.1 and §9.3.1 include restrictions on
indication of support for "IGMP Proxy" with no mention of "MLD Proxy".
I do see that there is a generic disclaimer at the end of Section 3, but
the way it is written does not actually seem to cover this usage.
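A minimal Python sketch of why this matters. The two booleans stand in for the IANA-allocated flags (their bit positions are not modeled), the names are hypothetical, and the per-family variant is just one possible alternative reading, not something the draft specifies.

```python
from dataclasses import dataclass


@dataclass
class ImetRoute:
    """Hypothetical view of the proxy flags on an Inclusive Multicast Tag route."""
    igmp_proxy: bool   # "IGMP Proxy Support" flag
    mld_proxy: bool    # "MLD Proxy Support" flag


def treated_as_non_ir_literal(route: ImetRoute) -> bool:
    # Literal reading of the §8 text: only the IGMP flag is consulted.
    return not route.igmp_proxy


def treated_as_non_ir_per_family(route: ImetRoute, ipv6: bool) -> bool:
    # Hypothetical alternative: consult the flag matching the address family.
    return not (route.mld_proxy if ipv6 else route.igmp_proxy)


mld_only = ImetRoute(igmp_proxy=False, mld_proxy=True)
print(treated_as_non_ir_literal(mld_only))                 # True
print(treated_as_non_ir_per_family(mld_only, ipv6=True))   # False
```

Under the literal reading, a PE that advertises only MLD Proxy Support is classified as not using ingress replication.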


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

As one of the directorate reviewers noted (and Éric promoted to a
DISCUSS), this document does not really give any specific description of
how an EVPN PE should construct outgoing IGMP/MLD messages to send out
on its ACs as a result of receiving EVPN information over BGP.  From a
brief examination of the relevant IGMP messages, it seems that the EVPN
messages might actually contain information to populate literally all
the IGMP fields, but this is probably worth mentioning explicitly.  In
particular, guidance might be interesting for (e.g.) IGMPv3, that lets
multiple Group Records be included in a single Membership Report.
(Pedantically, such IGMPv3 multiplexing might also require phrasing
changes for the reverse process, taking IGMP and constructing EVPN
routes, since we refer to (e.g.) "the Group address of the IGMP
Membership Report" in places, and that is not a well-defined concept in
the absence of some text indicating group-by-group processing.)
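To illustrate what group-by-group processing could look like in the IGMP-to-EVPN direction, here is a minimal Python sketch that turns the Group Records of one IGMPv3 Membership Report into one SMET descriptor per (x,G). The record type values come from RFC 3376. The mapping of INCLUDE/EXCLUDE onto the IE flag, the descriptor type, and the function name are assumptions, and ALLOW_NEW_SOURCES/BLOCK_OLD_SOURCES records are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Group Record types from RFC 3376.
MODE_IS_INCLUDE, MODE_IS_EXCLUDE = 1, 2
CHANGE_TO_INCLUDE_MODE, CHANGE_TO_EXCLUDE_MODE = 3, 4


@dataclass
class GroupRecord:
    record_type: int
    group: str
    sources: List[str]


@dataclass(frozen=True)
class SmetKey:
    """Hypothetical (x, G, IE) descriptor for one SMET route; None means '*'."""
    source: Optional[str]
    group: str
    exclude: bool


def report_to_smet(records: List[GroupRecord]) -> List[SmetKey]:
    """Process an IGMPv3 Membership Report one Group Record at a time."""
    out: List[SmetKey] = []
    for rec in records:
        if rec.record_type in (MODE_IS_INCLUDE, CHANGE_TO_INCLUDE_MODE):
            # INCLUDE{} expresses no interest; INCLUDE{S1, S2, ...} yields one (S,G) each.
            out.extend(SmetKey(s, rec.group, False) for s in rec.sources)
        elif rec.record_type in (MODE_IS_EXCLUDE, CHANGE_TO_EXCLUDE_MODE):
            if not rec.sources:
                # EXCLUDE{} means "all sources", i.e. (*,G).
                out.append(SmetKey(None, rec.group, True))
            else:
                out.extend(SmetKey(s, rec.group, True) for s in rec.sources)
        # ALLOW_NEW_SOURCES and BLOCK_OLD_SOURCES are deliberately not handled.
    return out
```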

Abstract

   This document describes how to support efficiently endpoints running
   IGMP for the above services over an EVPN network by incorporating
   IGMP proxy procedures on EVPN PEs.

I see Lars already noted the dangling reference to "above services".
That really needs to be fixed before approval, and even looking at the
diff from -12 to -13 does not give me a clear picture of what to suggest
as a rewrite.

Section 1

I strongly suggest mentioning and referencing some of the core
technologies that readers are assumed to be familiar with (e.g., RFC
7432 for EVPN, RFC 6514 for various tunnel types including Ingress
Replication).  At present the document is quite unfriendly to a reader
from an outside field, who has little to no indication as to what
background material is required in order to be able to make sense of
this document.

   In DC applications, a point of delivery (POD) can consist of a

"DC" (Data Center) is not marked as "well-known" at
https://www.rfc-editor.org/materials/abbrev.expansion.txt and needs to
be expanded on first use.

   2.  Distributed anycast multicast proxy: it is desirable for the EVPN
       network to act as a distributed anycast multicast router with

I honestly don't know what a "distributed anycast multicast router" is
supposed to be.  Google finds only a handful of instances of that
(quoted) phrase, most of which can be traced back to this document.
There is a similar phrase in §4.2 that perhaps clarifies that the
collection of EVPN PEs is intended to function as a distributed
multicast router (that is perhaps in some sense transparent to the CEs).
But how does the "anycast" part come into play?  How is the anycast IP
address assigned, and which protocol messages is it conveyed in?

Section 3

I suggest adding SMET to the terminology listed here.

   o  Ethernet Segment (ES): When a customer site (device or network) is
      connected to one or more PEs via a set of Ethernet links.

That looks like an extremely unconventional definition for "Ethernet
Segment".

   Membership Report too.  Similarly, text for IGMPv2 applies to MLDv1
   and text for IGMPv3 applies to MLDv2.  IGMP / MLD version encoding in
   BGP update is stated in Section 9

I suggest stating explicitly that this equivalence is possible because
the indicated versions provide analogous functionality for IPv4 and
IPv6, respectively.

Section 4.1.1

       is considered as a new BGP route advertisement.  When different
       version of IGMP join are received, final state MUST be as per
       section 5.1 of [RFC3376].  At the end of route processing local
       and remote group record state MUST be as per section 5.1 of
       [RFC3376].

I interpret "different version of IGMP join" as "join messages from
different IGMP protocol versions", which makes this reference to RFC
3376 make no sense to me -- the referenced section does not talk about
multiple protocol versions at all.  Please clarify what behavior from
RFC 3376 is being referenced.

       logged.  If the v3 flag is set (in addition to v2), then the IE
       flag MUST indicate "exclude".  If not, then an error SHOULD be
       logged.  [...]

It's great to say that this is an error condition and should be logged.
What does the recipient actually do while processing the message?
An RFC 7606 named behavior would be nice.
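For instance, a receiver could map the condition onto one of the RFC 7606 behaviors. The sketch below picks treat-as-withdraw purely as a hypothetical example and applies the check whenever v3 is set. The flag masks assume the §9.1 bit layout (bit 7 = v1, bit 6 = v2, bit 5 = v3, bit 4 = IE, with bit 0 the most significant), and the function name is invented.

```python
import logging
from enum import Enum

# Assumed Flags-octet masks (bit 0 is the most significant bit).
V1, V2, V3, IE_EXCLUDE = 0x01, 0x02, 0x04, 0x08


class Action(Enum):
    ACCEPT = "accept"
    TREAT_AS_WITHDRAW = "treat-as-withdraw"   # RFC 7606 terminology


def check_star_g_flags(flags: int) -> Action:
    """Validate a received (*,G) SMET route's flags; the error action is a hypothetical choice."""
    if (flags & V3) and not (flags & IE_EXCLUDE):
        logging.warning("(*,G) SMET route with v3 set but IE != exclude: 0x%02x", flags)
        return Action.TREAT_AS_WITHDRAW
    return Action.ACCEPT
```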

Section 4.2

   As mentioned in the previous sections, each PE MUST have proxy
   querier functionality for the following reasons:

I'm not really sure which previous mentions this is supposed to refer
to.

Section 6.2.1

Just to confirm: the PE receiving a BGP Leave Synch route does *not*
produce local IGMP Query messages, on the assumption that the PE that
did receive the Leave locally has already done so?  (I don't think this
necessarily needs to be written out in the document itself; I just want
to confirm my understanding.)

Section 6.3

   A PE which has received an IGMP Membership Request would have synced
   the IGMP Join by the procedure defined in section 6.1.  If a PE with
   local join state goes down or the PE to CE link goes down, it would
   lead to a mass withdraw of multicast routes.  Remote PEs (PEs where

Can we have greater clarity on "would lead to"?  Are there actually
routes that will be withdrawn and we are just ignoring the consequences
of that for the purposes of local state, using some heuristic (as
mentioned later) for detecting whether a mass-withdraw is due to a
failure at a peer?  Or is the mass withdraw a hypothetical scenario that
the procedures described here fully avoid?

   these routes were remote IGMP Joins) SHOULD NOT remove the state
   immediately; instead General Query SHOULD be generated to refresh the
   states.  There are several ways to detect failure at a peer, e.g.
   using IGP next hop tracking or ES route withdraw.

Does each PE initiate the General Query, in this scenario?

Section 7

   Note that to facilitate state synchronization after failover, the PEs
   attached to a multihomed ES operating in Single-Active redundancy
   mode SHOULD also coordinate IGMP Join (x,G) state.  In this case all

What are the drawbacks of not performing such synchronization?
Alternatively, in what cases does it make sense to not perform
synchronization (so that the guidance is SHOULD rather than MUST)?

Section 9.1

It might be nice to mention that the length fields are measured in bits
here in this section, where the NLRI format is laid out, in addition to
§9.1.1 where the procedures for constructing it are laid out.
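A minimal Python encoder sketch showing where the bit-valued lengths enter. The field order (RD, Ethernet Tag ID, then length-prefixed source, group, and originator addresses, then Flags) is my reading of §9.1. The assumption that all three lengths are in bits (0 for a wildcard) should be checked against the draft, and the function names are hypothetical.

```python
import ipaddress
import struct
from typing import Optional


def encode_addr(addr: Optional[str]) -> bytes:
    """Length octet (in bits) followed by the address; a wildcard has length 0."""
    if addr is None:
        return bytes([0])
    ip = ipaddress.ip_address(addr)
    return bytes([ip.max_prefixlen]) + ip.packed   # 32 for IPv4, 128 for IPv6


def encode_smet_nlri(rd: bytes, tag: int, source: Optional[str],
                     group: Optional[str], originator: str, flags: int) -> bytes:
    """Hypothetical SMET NLRI encoder following the assumed field order."""
    if len(rd) != 8:
        raise ValueError("RD must be 8 octets")
    return (rd + struct.pack("!I", tag) + encode_addr(source)
            + encode_addr(group) + encode_addr(originator) + bytes([flags]))
```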

   o  If route is used for IPv6 (MLD) then bit 7 indicates support for
      MLD version 1.  The second least significant bit, bit 6 indicates

How does the receiver know if the route is being used for IPv6?  (This
also applies to §9.2 and §9.3.)
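A minimal Python decoding sketch that shows why this matters: the same Flags octet yields different versions depending on an address-family input that the receiver has to obtain from somewhere. The IPv4 bit assignments are assumed from §9.1. The meaning of bit 6 for MLD, which is truncated in the quote above, is assumed to be MLDv2, and the function names are hypothetical.

```python
from typing import List


def bit(n: int) -> int:
    """Mask for "bit n" when bit 0 is the most significant bit of the octet."""
    return 1 << (7 - n)


def decode_versions(flags: int, ipv6: bool) -> List[str]:
    """Decode version flags; the ipv6 input is exactly what the receiver must determine."""
    if ipv6:
        names = {7: "MLDv1", 6: "MLDv2"}          # bit 6 = MLDv2 is assumed
    else:
        names = {7: "IGMPv1", 6: "IGMPv2", 5: "IGMPv3"}
    return [name for n, name in sorted(names.items(), reverse=True) if flags & bit(n)]
```

For example, decode_versions(0x02, ipv6=False) gives ["IGMPv2"], while decode_versions(0x02, ipv6=True) gives ["MLDv2"]. One candidate source for the ipv6 input would be the Multicast Group Length (32 vs. 128 bits), but that is an assumption, not something the quoted text states.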

Section 9.1.1

Is there any requirement for consistency about using IPv4 vs IPv6
addresses in all three address fields?  The description given here would
seem to allow mixing address families, but I don't really expect that to
work in practice.

   version and any source filtering for a given group membership.  All
   EVPN SMET routes are announced with per- EVI Route Target extended
   communities.

Is there a good reference for discussion of these associated ECs?

Section 9.1.2

   PE2 to receive multicast traffic.  In this case PE2 MUST originate a
   (*,*) SMET route to receive all of the multicast traffic in the EVPN
   domain.  To generate Wildcards (*,*) routes, the procedure from
   [RFC6625] SHOULD be used.

Is the PE expected to identify this case based on protocol messages
received at runtime (e.g., any PIM at all), or is this external
configuration?

Section 9.3.1

   Maximum Response Time is value to be used while sending query as
   defined in [RFC2236]

Is it actually right to describe this as "while sending query
[messages]"?  My understanding is that a PE receiving this route over
BGP would in fact *not* actually send IGMP Query messages, but simply
use the time to set a timer and potentially clear up state if certain
conditions are met at the end of the period in question.
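A minimal Python sketch of the receiver-side behavior as I understand it: no Query is sent. The PE only starts a timer and removes the state at expiry unless it has been refreshed. The LeaveSynchTimer class, the delta parameter, and the assumption that the field uses the RFC 2236 unit of tenths of a second are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaveSynchTimer:
    """Hypothetical receiver-side handling of a Leave Synch route (no Query is sent)."""
    expires_at: float
    refreshed: bool = False

    @classmethod
    def start(cls, now: float, max_resp_tenths: int, delta: float) -> "LeaveSynchTimer":
        # Assumes the Maximum Response Time uses RFC 2236 units (tenths of a second).
        return cls(expires_at=now + max_resp_tenths / 10.0 + delta)

    def on_join_synch(self) -> None:
        # A Report or Join Synch route arrived before the timer expired.
        self.refreshed = True

    def should_remove_state(self, now: float) -> Optional[bool]:
        """None while the timer runs; afterwards, True if the state should be removed."""
        if now < self.expires_at:
            return None
        return not self.refreshed
```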

Section 10

Just to confirm my understanding here: in the immediate leave case, the
Leave Synch route will be advertised just for the "delta" period of time
described in §6.2 and then withdrawn?

   IGMP MAY be configured with immediate leave option.  This allows the

Is there a suitable reference for "immediate leave"?  I did not see much
relevant in RFCs 2236 and 3376.

Section 12

I support Roman's point about detailing which aspects are covered in
which referenced RFCs.

I also noted that the "delta" value used in the Last Member Query
process must be configured on each node, and to the same value.  Such
requirement for identical configuration opens up the chance for skew,
and sometimes any such skew is security-relevant and must be documented
in the security considerations.  However, I'm not sure that is the
case here, as it seems that skew would mostly only serve to cause a
brief "blip" where a PE drops its group state only to recreate it when a
report shows up later.  Is there a scenario where the skew goes the
other way, and a PE leaves group state in place indefinitely that should
have been dropped?
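A worked example of the benign direction of the skew, written as a small Python calculation. It assumes the removal deadline is Last Member Query Count times Last Member Query Interval plus delta. It uses the RFC 3376 defaults (Robustness Variable 2, hence a Last Member Query Count of 2, and a Last Member Query Interval of 1 second), and the function names are hypothetical.

```python
# Defaults from RFC 3376: Last Member Query Count = Robustness Variable = 2,
# Last Member Query Interval = 1 second.
LMQ_COUNT = 2
LMQ_INTERVAL_S = 1.0


def removal_deadline(leave_time: float, delta: float) -> float:
    """Assumed: state is dropped LMQ_COUNT * LMQ_INTERVAL_S + delta after the Leave."""
    return leave_time + LMQ_COUNT * LMQ_INTERVAL_S + delta


def skew_window(delta_a: float, delta_b: float) -> float:
    """Time during which two PEs with different delta values disagree about the state."""
    return abs(removal_deadline(0.0, delta_a) - removal_deadline(0.0, delta_b))
```

With delta = 1 s on one PE and 3 s on another, the deadlines fall 3 s and 5 s after the Leave, so the PEs disagree for 2 s. A Report arriving in that window recreates the state on the first PE, which is the brief "blip" described above. The calculation says nothing about the opposite direction asked about.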

Section 16.1

Since we only reference RFC 4684 to say that its procedures are not
applicable to what we describe, it seems like it could be classified as
only an informative reference.

NITS

We seem quite inconsistent about whether we write "BGP Leave Synch
route" or "IGMP Leave Synch route" (but I believe these are both
supposed to be the same thing).

Section 1

   communication and orchestration.  However, EVPN is used as standard
   way of inter-POD communication for both intra-DC and inter-DC.  A

intra-DC and inter-DC are both adjectives that need to modify some noun.
Please supply such a noun (e.g., "traffic").

   These hosts express their interests in multicast groups on a given
   subnet/VLAN by sending IGMP Membership Reports (Joins) for their
   interested multicast group(s).  [...]

I think that this phrase "IGMP Membership Reports (Joins)" is intended
to serve some cross-protocol clarification role (e.g., "Join" is used by
IGMPv3 and MLD but not IGMPv2).  Since this is the first place where we
use that formulation, some additional text to clarify the shorthand
seems in order.

Section 3

   o  BD: Broadcast Domain.  As per [RFC7432], an EVI consists of a
      single or multiple BDs.  In case of VLAN-bundle and VLAN-aware

RFC 7432 spells "VLAN Bundle" with no hyphen.

   o  Single-Active Redundancy Mode: When only a single PE, among all
      the PEs attached to an Ethernet segment, is allowed to forward
      traffic to/from that Ethernet segment for a given VLAN, then the
      Ethernet segment is defined to be operating in Single-Active
      redundancy mode.

   o  All-Active Redundancy Mode: When all PEs attached to an Ethernet
      segment are allowed to forward known unicast traffic to/from that
      Ethernet segment for a given VLAN, then the Ethernet segment is
      defined to be operating in All-Active redundancy mode.

Is it important that the second definition only covers "unicast traffic"
but the former uses the unqualified term "traffic"?

   o  OIF: Outgoing Interface for multicast.  It can be physical
      interface, virtual interface or tunnel.

s/physical/a physical/

Section 4

   The IGMP Proxy mechanism is used to reduce the flooding of IGMP
   messages over an EVPN network similar to ARP proxy used in reducing

"similarly to how ARP proxy is used"

   speakers.  The information is again translated back to IGMP message
   at the recipient EVPN speaker.  Thus it helps create an IGMP overlay

"IGMP messages" plural, to match the previous sentence.

Section 4.1.1

   1.  When the first hop PE receives several IGMP Membership Reports
       (Joins), belonging to the same IGMP version, from different
       attached hosts for the same (*,G) or (S,G), it SHOULD send a
       single BGP message corresponding to the very first IGMP
       Membership Request (BGP update as soon as possible) for that
       (*,G) or (S,G).  [...]

What is an "IGMP Membership Request"?  Is this just a typo for Report?

                        This is because BGP is a stateful protocol and
       no further transmission of the same report is needed.  If the
       IGMP Membership Request is for (*,G), then multicast group
       address MUST be sent along with the corresponding version flag
       (v2 or v3) set.  [...]

(ditto)

                                   If the IGMP Join is for (S,G), then
       besides setting multicast group address along with the version
       flag v3, the source IP address and the IE flag MUST be set.  It

"setting the multicast group address" (add "the").
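For the coalescing behavior in item 1, here is a minimal Python sketch that assumes one BGP update per (x,G) and IGMP version. The JoinCoalescer class and its key are hypothetical. A later Report for the same (x,G) with a different version returns True because the version flags carried in the advertisement would change, not necessarily because a separate route is created.

```python
from typing import Optional, Set, Tuple

# (source or None for "*", group, IGMP version)
Key = Tuple[Optional[str], str, int]


class JoinCoalescer:
    """Hypothetical first-hop PE logic: one BGP update per (x,G) and version."""

    def __init__(self) -> None:
        self._advertised: Set[Key] = set()

    def on_report(self, source: Optional[str], group: str, version: int) -> bool:
        """Return True only for the first such Report, i.e. when an update should be sent."""
        key = (source, group, version)
        if key in self._advertised:
            return False   # BGP is stateful; no need to re-advertise
        self._advertised.add(key)
        return True
```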

   2.  When the first hop PE receives an IGMPv3 Join for (S,G) on a
       given BD, it SHOULD advertise the corresponding EVPN Selective
       Multicast Ethernet Tag (SMET) route regardless of whether the

Forward reference Section 9.1, please?

   4.  When the first hop PE receives an IGMP version-X Join first for
       (*,G) and then later it receives an IGMPv3 Join for the same
       multicast group address but for a specific source address S, then
       the PE MUST advertise a new EVPN SMET route with v3 flag set (and
       v2 reset).  The IE flag also need to be set accordingly.  Since

What does "v2 reset" mean?  "The v2 flag is not set" or "the v2 flag is
cleared"?  I recommend not using the word "reset" in this context as
it's ambiguous.

   7.  Upon receiving EVPN SMET route(s) and before generating the
       corresponding IGMP Membership Request(s), the PE checks to see

"Membership Request" again.

       whether it has any CE multicast router for that BD on any of its
       ES's . The PE provides such a check by listening for PIM Hello
       messages on that AC (i.e, ES,BD).  If the PE does have the
       router's ACs, then the generated IGMP Membership Request(s) are
       sent to those ACs.  If it doesn't have any of the router's AC,
       then no IGMP Membership Request(s) needs to be generated.  [...]

The writing here seems rather jumbled, though perhaps I just
misunderstand the terminology in question.  Assuming that a PE router
has one or more ACs connecting it to one or more CE routers (possibly in
a many-to-many fashion), then I don't see how we can write about
the PE "have[ing] [any of] the router's ACs" -- wouldn't the relevant
criterion be that the AC has CE routers participating in multicast?
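A minimal Python sketch of the criterion suggested above: generate Reports only toward ACs in the BD on which a PIM Hello has been heard recently. Several simplifying assumptions apply. The (ES, BD) tuple used to identify an AC and the function name are hypothetical, and the sketch uses the RFC 7761 default Hello Holdtime (105 seconds) in place of the Holdtime carried in each Hello.

```python
from typing import Dict, List, Tuple

# RFC 7761 default Hello Holdtime: 3.5 x the 30-second Hello period.
PIM_HELLO_HOLDTIME_S = 105.0

Ac = Tuple[str, int]   # hypothetical (ES, BD) identifier for an AC


def router_facing_acs(last_hello: Dict[Ac, float], bd: int, now: float) -> List[Ac]:
    """ACs in the given BD on which a PIM Hello was heard within the holdtime."""
    return sorted(ac for ac, t in last_hello.items()
                  if ac[1] == bd and now - t <= PIM_HELLO_HOLDTIME_S)
```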

Section 4.1.2

   2.  When a PE receives an EVPN SMET route for a given (*,G), it
       compares the received version flags from the route with its per-
       PE stored version flags.  If the PE finds that a version flag
       associated with the (*,G) for the remote PE is reset, then the PE

[same comment about the word "reset" as above]

       MUST generate IGMP Leave for that (*,G) toward its local
       interface (if any) attached to the multicast router for that

Probably "router(s)" since there could be more than one.
And "interface(s)" as well?

       multicast group.  It should be noted that the received EVPN route
       MUST at least have one version flag set.  If all version flags
       are reset, it is an error because the PE should have received an

["reset" again]
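A minimal Python sketch of step 2 as I read it. It also makes explicit the per-PE stored version flags mentioned in DISCUSS point (1). The flag masks (bit 7 = v1, bit 6 = v2, bit 5 = v3, with bit 0 the most significant) are assumed from §9.1, and the function name and storage layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed Flags-octet masks (bit 0 is the most significant bit).
V1, V2, V3 = 0x01, 0x02, 0x04


def on_smet_update(stored: Dict[Tuple[str, str], int], pe: str, group: str,
                   flags: int) -> List[int]:
    """Return the version flags that were previously set but are now cleared.

    A non-empty result corresponds to generating an IGMP Leave toward the local
    router-facing interface(s); an update with no version flag set is an error.
    """
    versions = flags & (V1 | V2 | V3)
    if versions == 0:
        raise ValueError("SMET route with no version flag set")
    previous = stored.get((pe, group), 0)
    stored[(pe, group)] = versions
    return [v for v in (V1, V2, V3) if (previous & v) and not (versions & v)]
```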

Section 5

   Consider the EVPN network of Figure-1, where there is an EVPN
   instance configured across the PEs shown in this figure (namely PE1,
   PE2, and PE3).  Let's consider that this EVPN instance consists of a
   single bridge domain (single subnet) with all the hosts, sources, and

This is the only instance of the word "bridge" in this document (but
"broadcast domain" appears as a defined term).  Is "BD" intended?

Section 5.1

   all these local ports are associated with the hosts.  PE1 sends an
   EVPN Multicast Group route corresponding to this join for (*,G1) and
   setting v2 flag.  This EVPN route is received by PE2 and PE3 that are

s/setting/sets the/

   information.  However, when it receives the IGMPv3 Join from H3 for
   the same (*,G1).  Besides adding the corresponding port to its OIF

incomplete sentence; could add ", EVPN messaging is required" to connect
to the next sentence.

Section 6

   either DF or non-DF; i.e., different IGMP Membership Request messages

"Membership Request" again.

   needed.  All-Active multihoming PEs for a given ES MUST support IGMP
   synchronization procedures described in this section if they need to
   perform IGMP proxy for hosts connected to that ES.

Can we unpack the actual requirement here?  Is it: "if a given ES uses
all-active multihoming, in order for IGMP proxying to be used on that
ES, all the PEs on that segment must support the synchronization
procedures described in the following subsections"?
The analogous text in §6.2 seems more clear to me on what the
preconditions are.

Also, s/MUST support/MUST support the/ and s/IGMP proxy/IGMP proxying/

Section 6.1

   belongs.  If the PE doesn't already have local IGMP Membership
   Request (x,G) state for that BD on that ES, it MUST instantiate local
   IGMP Membership Request (x,G) state and MUST advertise a BGP IGMP

"Membership Request", albeit perhaps defensible since it is "state" and
not a message being sent.

   Join Synch route for that (ES,BD).  Local IGMP Membership Request
   (x,G) state refers to IGMP Membership Request (x,G) state that is
   created as a result of processing an IGMP Membership Report for
   (x,G).

It's typically easier for the reader when the new term is defined before
it is used, rather than after.  Especially so when the defined term is
similar to an existing, well-established, term that means something
else.

Section 9.1

   o  This EVPN route type is used to carry tenant IGMP multicast group
      information.  The flag field assists in distributing IGMP
      Membership Report of a given host for a given multicast route.
      The version bits help associate IGMP version of receivers
      participating within the EVPN domain.

   o  The include/exclude bit helps in creating filters for a given
      multicast route.

Are "assists" and "helps" really the terminology we want to use when this
information is literally required in order to construct the relevant
IGMP messages?  (Similarly for the subsequent subsections.)

Section 9.1.1

   The Originator Router Address is the IP address of router originating
   this route.  The SMET Originator Router IP address MUST match that of
   the IMET (or S-PMSI AD) route originated for the same EVI by the same
   downstream PE.

References for IMET and S-PMSI AD might be nice.

   The Flags field indicates the version of IGMP protocol from which the
   Membership Report was received.  It also indicates whether the

Probably "version(s)" and "Report(s)" since we encourage coalescing.

Section 9.3.1

   Maximum Response Time is value to be used while sending query as
   defined in [RFC2236]

"the value to be used while sending queries" (though see the non-nit
comment).