[bess] Benjamin Kaduk's No Objection on draft-ietf-bess-evpn-optimized-ir-09: (with COMMENT)
Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 28 October 2021 02:53 UTC
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-bess-evpn-optimized-ir@ietf.org, bess-chairs@ietf.org, bess@ietf.org, Matthew Bocci <matthew.bocci@nokia.com>, matthew.bocci@nokia.com
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Date: Wed, 27 Oct 2021 19:53:03 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/6Xxt8v5yJxby14LJcEkyLJ_G-qw>
Benjamin Kaduk has entered the following ballot position for draft-ietf-bess-evpn-optimized-ir-09: No Objection

When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/ for more information about how to handle DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here: https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-optimized-ir/

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Thanks to Derek Atkins for the secdir review.

Thanks as well to John Scudder for his detailed review; I support his discuss position and have omitted a few of my comments that he has already covered. (There are probably a few more that I could have omitted, but I did not do an exhaustive check. Feel free to just point to his ballot thread instead of repeating the explanation to me.)

I think it would be very helpful to clearly state early on what the difference between the "selective" and "non-selective" setups is. The first description I see is not until §6.2 (I comment below where it appears as well).

Section 2

   - AR-IP: IP address owned by the AR-REPLICATOR and used to differentiate the ingress traffic that must follow the AR procedures.

From context I infer that the AR-IP is advertised along with the Replicator-AR RT-3 route. Since we talk about other defined values as being advertised along with such RT-3 routes, should we also say that this IP is advertised along with the corresponding RT-3 route?

   - AR-VNI: VNI advertised by the AR-REPLICATOR along with the Replicator-AR route. It is used to identify the ingress packets that must follow AR procedures ONLY in the Single-IP AR-REPLICATOR case.
This phrasing seems ambiguous: please distinguish whether this is used only in the single-IP AR-REPLICATOR case or it identifies packets that sometimes follow AR procedures (in the single-IP AR-REPLICATOR case) and sometimes do not.

   - PTA: PMSI Tunnel Attribute

PMSI is not marked as "well-known" at https://www.rfc-editor.org/materials/abbrev.expansion.txt and should be expanded on first use or otherwise defined.

   - EVI: EVPN Instance. An EVPN instance spanning the Provider Edge (PE) devices participating in that EVPN

This seems rather circular. Can we define "EVPN Instance" without reference to "EVPN instance"?

Section 3

   c. The solution is compatible with [RFC7432] and [RFC8365] and has no impact on the EVPN procedures for BM traffic. In particular,

I do not think that "no impact on the EVPN procedures" is what was intended -- it obviously has impact on the procedures, since it is implemented differently. Perhaps it has no impact on the CE, but that's not what this text seems to say.

Section 4

I agree with the directorate reviewer that splitting the RT-3 NLRI layout and the PTA general format into separate figures is quite worthwhile. I would also suggest naming the first one as the NLRI of the RT-3 route type, rather than leaving that implicit.

   The Inclusive Multicast Ethernet Tag route (RT-3) and its PMSI Tunnel Attribute's (PTA) general format used in [RFC7432] are shown below:

I suggest referencing RFC 6514 as the source of the PTA format.

   The Flags field is 8 bits long. This document defines the use of 4 bits of this Flags field:

That's half of the flag bits! Why is it better to allocate so many flags than to move more structure into the tunnel identifier portion of the PTA? I guess RFC 7902 does provision for extended tunnel attribute flags, but the question of whether these all belong as flags still seems valid.
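To make the flag-allocation question concrete, here is a minimal Python sketch of packing and reading the two-bit AR type (T) field next to the RFC 6514 Leaf Information Required (L) flag in the one-octet PTA Flags field. The T values 00 (no AR role), 01 (AR-REPLICATOR) and 10 (AR-LEAF) follow the text quoted in this review; the position of T (mask 0x18) is an assumption for illustration only and should be checked against the draft's figure, and all helper names are hypothetical.

```python
from enum import IntEnum

# ASSUMPTION: T occupies the bits selected by 0x18 (bits 3-4, bit 0 = MSB).
# Verify against the Flags figure in the draft before relying on this.
T_SHIFT = 3
T_MASK = 0b11 << T_SHIFT      # 0x18
L_FLAG = 0x01                 # Leaf Information Required (RFC 6514), low-order bit


class ARType(IntEnum):
    """Two-bit AR type (T) values as discussed in the review."""
    NONE = 0b00            # no AR role signaled (e.g., an RNVE)
    AR_REPLICATOR = 0b01
    AR_LEAF = 0b10
    # 0b11 has no meaning assigned in the text under review


def set_ar_type(flags: int, ar_type: ARType) -> int:
    """Return a copy of the PTA Flags octet with T replaced by ar_type."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("PTA Flags is a single octet")
    cleared = flags & ~T_MASK & 0xFF
    return cleared | (int(ar_type) << T_SHIFT)


def get_ar_type(flags: int) -> ARType:
    """Extract T from the PTA Flags octet, rejecting the unassigned value 11."""
    raw = (flags & T_MASK) >> T_SHIFT
    if raw == 0b11:
        raise ValueError("T = 11 is not assigned in the text under review")
    return ARType(raw)


def leaf_info_required(flags: int) -> bool:
    """True if L is set; in a Replicator-AR route, L=1 marks selective mode
    (per the Section 6.1 text quoted later in this review)."""
    return bool(flags & L_FLAG)


# A Selective AR-REPLICATOR's Replicator-AR route: T = 01 and L = 1.
flags = set_ar_type(L_FLAG, ARType.AR_REPLICATOR)
print(hex(flags), get_ar_type(flags).name, leaf_info_required(flags))  # prints: 0x9 AR_REPLICATOR True
```

Keeping T as a two-bit enumeration (rather than two independent flags) is what makes the unused value 11 visible, which is relevant to the Section 5.1 question about the zero value.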
   - Regular-IR route: in this route, Originating Router's IP Address, Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used as described in [RFC7432] when Ingress Replication is in use. The NVE/PE that advertises the route will set the Next-Hop to an IP address that we denominate IR-IP in this document. When advertised by an AR-LEAF node, the Regular-IR route SHOULD be advertised with type T= AR-LEAF.

When would I violate this SHOULD (and what other behaviors would be usable)?

      o Originating Router's IP Address MUST be set to an IP address of the PE that should be common to all the EVIs on the PE (usually this is the PE's loopback address). The Tunnel Identifier and

Is it really the usual case that a PE has only one loopback address (so that the definite article "the" applies)? This seems particularly poignant since we assume that AR-REPLICATORs will have multiple addresses available for use, to distinguish inbound IR and AR traffic.

      Next-Hop SHOULD be set to the same IP address as the Originating Router's IP address when the NVE/PE originates the route. [...]

Should we say anything about what they are set to when the NVE/PE does not originate the route?

      It is only used for selective AR and its fields are set as follows:

The antecedent for "its fields" seems to be "the Leaf A-D route (RT-11)"; I suggest using the precise terminology that the fields of the "route type specific portion of the route" are what are described. Precise use of terminology makes the documents much more approachable to unfamiliar readers that rely on textual search to correlate the relevant parts of the various documents in question.

   - Replicator-AR route: this route is used by the AR-REPLICATOR to advertise its AR capabilities, with the fields set as follows:

      o Originating Router's IP Address MUST be set to an IP address of the PE that should be common to all the EVIs on the PE (usually this is the PE's loopback address).
      The Tunnel Identifier and

I note that the guidance in RFC 7432 for constructing what we in this document refer to as the "Regular-IR route" also has text about "the PE's loopback address" being useful for what we would call the IR-IP, but this address here is the AR-IP and (if we keep reading) SHOULD be different than the IR-IP. I think we need to say something about whether PEs are really expected to only have one ("the") loopback address vs multiple, and if there is only one how to decide whether to use it as AR-IP or IR-IP. To use language ("the PE's loopback address") that implies there is only one, while strongly suggesting that it be used for two different purposes and also strongly suggesting that those two different purposes have different addresses, seems to be internally inconsistent.

      o The AR-LEAF constructs an IP-address-specific route-target as indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by placing the IP address carried in the Next-Hop field of the received Replicator-AR route in the Global Administrator field of the Community, with the Local Administrator field of this Community set to 0. [...]

The analogous text in draft-ietf-bess-evpn-bum-procedure-updates also mentions "setting the Extended Communities attribute of the Leaf A-D route to that Community"; would that be useful to include here as well?

      o The Leaf A-D route MUST include the PMSI Tunnel attribute with the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel Identifier set to the IP of the advertising AR-LEAF. The PMSI Tunnel attribute MUST carry a downstream-assigned MPLS label or VNI that is used by the AR-REPLICATOR to send traffic to the AR-LEAF.

This seems to be the only place where we specify the actual format/contents (i.e., including Tunnel Identifier contents) of the "AR" PTA tunnel type. I would have expected something more declarative of a declaration, that the IANA registration could point to.
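For readers less familiar with IP-address-specific route targets, the following minimal Python sketch builds the community described in the quoted AR-LEAF text, plus the matching check an AR-REPLICATOR's auto-configured import route-target would perform. It assumes the IPv4 case and uses the RFC 4360 IPv4 Address Specific Extended Community encoding (type 0x01, sub-type 0x02 for Route Target); the function names are hypothetical, and IPv6 next hops (RFC 5701 format) are deliberately not handled.

```python
import ipaddress
import struct

# RFC 4360, IPv4 Address Specific Extended Community (transitive).
IPV4_ADDR_SPECIFIC = 0x01
ROUTE_TARGET = 0x02


def ip_address_specific_rt(replicator_next_hop: str) -> bytes:
    """Build the 8-octet route target an AR-LEAF would attach to its Leaf A-D route.

    Global Administrator = Next-Hop IP of the received Replicator-AR route,
    Local Administrator = 0, as in the text quoted above.
    """
    addr = ipaddress.ip_address(replicator_next_hop)
    if addr.version != 4:
        raise ValueError("sketch covers IPv4 only; IPv6 needs the RFC 5701 format")
    return struct.pack("!BB4sH", IPV4_ADDR_SPECIFIC, ROUTE_TARGET, addr.packed, 0)


def rt_matches(community: bytes, own_address: str) -> bool:
    """Check whether a community is the IP-address-specific RT for own_address,
    i.e., what the AR-REPLICATOR's auto-configured import RT accepts."""
    return community == ip_address_specific_rt(own_address)


rt = ip_address_specific_rt("192.0.2.1")
print(rt.hex())  # prints: 0102c00002010000
```

Because both sides derive the same eight octets from one address, the import filter reduces to a byte comparison, which is why it matters exactly which address the AR-REPLICATOR places in the Replicator-AR route's Next-Hop.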
Section 5.1

It's a bit unfortunate that there's so much overlap between the list of "considerations" and the "rules" that an implementation must be compatible with, but it may be too risky to try to coalesce them at this time.

   b. An AR-REPLICATOR MUST advertise a Replicator-AR route and MAY advertise a Regular-IR route. The AR-REPLICATOR MUST NOT generate a Regular-IR route if it does not have local attachment circuits (AC). If the Regular-IR route is advertised, the AR Type field is set to zero.

This seems to merit some more substantial discussion, since the value of zero in the AR type field is otherwise avoided in this document. That is, we have specific values for "leaf" and "acting as replicator", but the value zero is normally "does not support optimized-ir". Except here it's also used for "replicator advertising as non-replicator role"; it's probably appropriate to not abuse "leaf" for this case, but using zero seems to in some sense be a different abuse. Would the '11' value have been usable to indicate this distinction?

Section 6

   The solution is called "selective" because a given AR-REPLICATOR MUST replicate the BM traffic to only the AR-LEAF that requested the replication (as opposed to all the AR-LEAF nodes) and MAY replicate the BM traffic to the RNVEs. [...]

I'm not sure I understand the motivation behind MAY, here. If we don't replicate the BM traffic to RNVEs, isn't that data loss?

Section 6.1

   o When a node defined and operating as Selective AR-REPLICATOR receives a packet on an overlay tunnel, it will do a tunnel destination IP lookup and if the destination IP is the AR-REPLICATOR AR-IP Address, the node MUST replicate the packet to:

   [...]

      + overlay tunnels to the remote Selective AR-REPLICATORs if the tunnel source IP is an IR-IP of its own AR-LEAF-set (in any other case, the AR-REPLICATOR MUST NOT replicate the BM traffic to remote AR-REPLICATORs), where the tunnel destination IP is the AR-IP of the remote Selective AR-REPLICATOR.
      The tunnel destination IP AR-IP will be a

It seems like it would require less cognitive burden on the reader if we disambiguated "tunnel source IP" as it relates to the incoming tunnel on which the packet in question was received vs the outgoing tunnel to which it is being replicated. ("tunnel destination IP" is arguably already disambiguated by the lead-in text that talks about doing a lookup based on the tunnel the packet was received on.) Given that the "rules" that appear later specifically say that it checks both destination and source of the underlay IP header, it seems reasonable to say something similar here when listing the "considerations".

      + The Selective AR-REPLICATOR-set is composed of the overlay tunnels to all the AR-REPLICATORs that send a Replicator-AR route with L=1. The AR-IP addresses are used as tunnel destination IP.

I'm not sure why the "that send a Replicator-AR route with L=1" clause is needed -- if there are AR-REPLICATORS that send with L=0 then aren't we required to fall back to the non-selective procedures?

   - In any case, non-BM overlay tunnels are excluded from flood-lists and, also, source squelching is always done in order to ensure the traffic is not sent back to the originating source. If the encapsulation is MPLSoGRE (or MPLSoUDP) and the BD label is not the bottom of the stack, the AR-REPLICATOR MUST copy the rest of the labels when forwarding them to the egress overlay tunnels.

I'm not sure that I understand which labels "the rest of the labels" are in this context.

Section 6.2

   In the example of Figure 1, we consider NVE1/NVE2/NVE3 as Selective AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR. If that is so, NVE1 will send all its BM traffic for BD-1 to PE1. If other AR-LEAF/REPLICATORs send BM traffic, NVE1 will receive that traffic from PE1.
   These are the differences in the behavior of a Selective AR-LEAF compared to a non-selective AR-LEAF:

I think this might be the first time we concretely say what makes the "selective" procedures earn that name (the combined selectivity of all BM traffic, in both directions, between leaf and replicator, as opposed to the non-selective case where leafs pick only the replicator that they send to, and must receive from everywhere). This seems like something that would be useful to have much earlier in the document, e.g., in the introduction. (It's also somewhat different than the sense in which RFC 6513 contrasts selective and inclusive tunnels, though I expect it's probably too late to try to change the terminology used here.)

Section 8

   - An AR-REPLICATOR will perform IR or AR forwarding mode for the incoming Overlay packets based on an ingress VNI lookup, as opposed to the tunnel IP DA lookup. Note that, when replicating to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI advertised by the egress node will determine the IR or AR forwarding mode at the subsequent AR-REPLICATOR.

Does this implicitly put a requirement on all AR-REPLICATOR implementations to support the VNI-based scheme, since they might be called upon to forward to another replicator using it?

Section 9.1

   In order to be compatible with the IP SA split-horizon check, the AR-REPLICATOR MAY keep the original received tunnel IP SA when replicating packets to a remote AR-LEAF or RNVE. This will allow AR-LEAF nodes to apply Split-horizon check procedures for BM packets, before sending them to the local Ethernet-Segment. Even if the AR-LEAF's IP SA is preserved when replicating to AR-LEAFs or RNVEs, the AR-REPLICATOR MUST always use its IR-IP as IP SA when replicating to other AR-REPLICATORs.

It seems unfortunate that an AR-LEAF node needs to have knowledge of the configuration in use at remote AR-REPLICATORs in order to know if the split-horizon check will be effective.
Is there no way to always require certain replicator behavior and give the leafs reliable knowledge in split-horizon scenarios?

Section 9.2

   Ethernet Segments associated to one or more AR-REPLICATOR nodes SHOULD follow "Local-Bias" procedures for EVPN all-active multi-homing, as follows:

Is it really the "ethernet segments" that would follow local-bias procedures, or the EVPN nodes attached to them? Is putting SHOULD-level guidance to this effect in effect updating the core EVPN specification to privilege one way of handling multi-homing over others? (Maybe not, since the requirements only come into play when AR-REPLICATORs are involved and we disclaim applicability to cases where AR-REPLICATOR and AR-LEAF are on the same ethernet segment ... we might consider saying that as some part of why that case is out of scope.) Also, if we know of procedures other than local-bias that will still be effective, we might mention them as some justification for why this is only a SHOULD and not a MUST.

Section 10

Since we use the Leaf A-D route from [bum-procedure-update], we might want to pull in its security considerations as well.

I feel like there may be some more considerations to mention that are specific to the multi-homing case, but I don't think I understand that scenario well enough to be able to state them, myself.

We might also mention that AR-REPLICATORs are, by design, using more bandwidth than stock RFC7432 PEs would, and if they exceed their local bandwidth that will cause service disruption.

The text that's here already does do a pretty good job of capturing the important topics for the common case, though -- thanks!

   An implementation following the procedures in this document should not create BM loops, since the AR-REPLICATOR will always forward the BM traffic using the correct tunnel IP Destination Address that indicates the remote nodes how to forward the traffic. This is true in both, the Non-Selective and Selective modes defined in this document.
(In the vein of my earlier comment,) what about the case when the tunnel destination is expecting to use VNI to determine how to forward the traffic?

Section 14.2

Having SHOULD-level guidance to use the "local bias" procedures detailed in RFC 8365 might require that document to be promoted to a normative reference; see https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/

NITS

Section 1

   Section 3 lists the requirements of the combined optimized-IR solution, whereas Section 5 and Section 6 describe the Assisted-Replication (AR) solution, and Section 7 the Pruned-Flood-Lists (PFL) solution.

I suggest mentioning that sections 5 and 6 differ in that they cover the selective and non-selective cases.

Section 2

   - Regular-IR: Refers to Regular Ingress Replication, where the source NVE/PE sends a copy to each remote NVE/PE part of the BD.

s/part of/that is part of/

Section 4

   - Regular-IR route: in this route, Originating Router's IP Address, Tunnel Type (0x06), MPLS Label and Tunnel Identifier MUST be used as described in [RFC7432] when Ingress Replication is in use. The NVE/PE that advertises the route will set the Next-Hop to an IP address that we denominate IR-IP in this document. When advertised by an AR-LEAF node, the Regular-IR route SHOULD be advertised with type T= AR-LEAF.

Hmm, down near the end of page 9 we say that AR-enabled nodes MUST signal the proper AR type (1 or 2) according to its administrative choice -- how is that MUST compatible with the SHOULD here? Also, if we're going to write out T = 01 (AR-REPLICATOR) just a few lines later, we should write out T = 10 (AR-LEAF) here.
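As a way to frame the Section 8 and Section 10 questions above, the following minimal Python sketch shows how an AR-REPLICATOR might choose AR or IR forwarding for an incoming overlay packet: by tunnel destination IP when it has distinct AR-IP and IR-IP addresses, or by ingress VNI in the Single-IP case. The configuration structure and names are hypothetical, and modeling the Single-IP case as "AR-IP equal to IR-IP" is an assumption of the sketch rather than text from the draft.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AR = "assisted replication"
    IR = "regular ingress replication"


@dataclass(frozen=True)
class ReplicatorAddrs:
    """Hypothetical per-BD addressing of one AR-REPLICATOR."""
    ir_ip: str
    ar_ip: str     # ASSUMPTION: ar_ip == ir_ip models the Single-IP case
    ir_vni: int
    ar_vni: int


def forwarding_mode(cfg: ReplicatorAddrs, tunnel_dst_ip: str, vni: int) -> Mode:
    """Classify an incoming overlay packet as AR or IR traffic."""
    if tunnel_dst_ip not in (cfg.ir_ip, cfg.ar_ip):
        raise ValueError("packet is not addressed to this AR-REPLICATOR")
    if cfg.ar_ip != cfg.ir_ip:
        # Distinct addresses: the tunnel IP DA lookup alone decides.
        return Mode.AR if tunnel_dst_ip == cfg.ar_ip else Mode.IR
    # Single-IP case: the ingress VNI lookup decides.
    if vni == cfg.ar_vni:
        return Mode.AR
    if vni == cfg.ir_vni:
        return Mode.IR
    raise ValueError("VNI is neither the AR-VNI nor the IR-VNI")
```

Seen this way, a replicator sending toward a Single-IP peer must pick the peer's AR-VNI or IR-VNI on egress, because the destination IP alone no longer conveys the mode; that is the interoperability point raised for Section 8.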
      o The AR-LEAF constructs an IP-address-specific route-target as indicated in [I-D.ietf-bess-evpn-bum-procedure-updates], by placing the IP address carried in the Next-Hop field of the

Pedantically, "as indicated in [bum-procedure-update]" would involve "placing the IP address carried in the Next Hop of the received I/S-PMSI A-D route in the Global Administrator field of the Community", which is obviously not going to be applicable in this case. So "analogously to" might be more appropriate than "as indicated".

      received Replicator-AR route in the Global Administrator field of the Community, with the Local Administrator field of this Community set to 0. Note that the same IP-address-specific import route-target is auto-configured by the AR-REPLICATOR that sent the Replicator-AR, in order to control the acceptance of the Leaf A-D routes.

This "Note that ... is auto-configured by" phrasing suggests to me that there is some more detailed text elsewhere laying out a requirement to do this (and any needed procedures, though I suspect there are no real procedures to document). However, later on §6.1 refers back to §4 (here) for "the AR-REPLICATOR auto-configures its IP-address-specific import route-target as described in section Section 4." Maybe we could write this in a way that's more clearly a specification and binding on the AR-REPLICATOR?

      o The Leaf A-D route MUST include the PMSI Tunnel attribute with the Tunnel Type set to AR, type set to AR-LEAF and the Tunnel

"type" here seems to refer to the new T field in the PTA flags, and should probably be referenced using consistent terminology.

   Each AR-enabled node MUST understand and process the AR type field in the PTA (Flags field) of the routes, and MUST signal the

(same point about consistent terminology for T/AR-type)

   corresponding type (1 or 2) according to its administrative choice.

I suggest writing "01" and "10" to match the previous treatment of the two-bit field.
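Relating the AR type discussion to what each node actually advertises, here is a minimal Python sketch of the inclusive multicast (RT-3) routes per role, following the rules quoted in this review (Section 4, Section 5.1 item b, Section 5.2 item b). An AR-REPLICATOR always sends a Replicator-AR route (T = 01, with L=1 in selective mode) and sends a Regular-IR route (T = 00) only if it has local ACs; a non-selective AR-LEAF sends a single Regular-IR route with T = 10. Treating an RNVE as sending T = 00, and treating the AR-LEAF's SHOULD as always followed, are assumptions; the types and names are hypothetical, and selective Leaf A-D routes are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Role(Enum):
    AR_REPLICATOR = "AR-REPLICATOR"
    AR_LEAF = "AR-LEAF"
    RNVE = "RNVE"


@dataclass(frozen=True)
class Rt3Route:
    kind: str                  # "Regular-IR" or "Replicator-AR"
    ar_type: int               # two-bit T field: 0b00, 0b01 or 0b10
    leaf_info_required: bool = False


def rt3_routes(role: Role, has_local_acs: bool,
               want_regular_ir: bool = True, selective: bool = False) -> List[Rt3Route]:
    """Return the RT-3 routes a node would advertise for one BD (sketch)."""
    if role is Role.AR_REPLICATOR:
        # MUST advertise Replicator-AR; L=1 marks a Selective AR-REPLICATOR.
        routes = [Rt3Route("Replicator-AR", 0b01, leaf_info_required=selective)]
        # MAY advertise Regular-IR, but MUST NOT without local ACs; T is zero.
        if want_regular_ir and has_local_acs:
            routes.append(Rt3Route("Regular-IR", 0b00))
        return routes
    if role is Role.AR_LEAF:
        # Single Regular-IR route; the SHOULD to set T = AR-LEAF is followed here.
        return [Rt3Route("Regular-IR", 0b10)]
    # ASSUMPTION: an RNVE sends an unmodified RFC 7432 route, i.e. T = 00.
    return [Rt3Route("Regular-IR", 0b00)]
```

Note that the AR-REPLICATOR's Regular-IR route and the RNVE's route come out identical in this model, which is the ambiguity behind the Section 5.1 question about reusing the zero value.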
Section 5.1

   - When an AR-REPLICATOR receives a BM packet on an AC, it will forward the BM packet to its flooding list (including local ACs and remote NVE/PEs), skipping the non-BM overlay tunnels.

I assume that it goes without saying that the AR-REPLICATOR does not flood the packet back to the AC it came in on. (The "rules" later in the section do specify source squelching.)

Section 5.2

   b. In this non-selective AR solution, the AR-LEAF MUST advertise a single Regular-IR inclusive multicast route as in [RFC7432]. The AR-LEAF SHOULD set the AR Type field to AR-LEAF. Note that although this flag does not make any difference for the egress nodes when creating an EVPN destination to the AR-LEAF, it is

egress, or ingress?
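The AC-side behavior discussed for Section 5.1 can be sketched as a filter over the flooding list. This minimal Python example assumes each flood-list entry already carries a flag saying whether it is one of the draft's "non-BM overlay tunnels" (how that classification is made is not modeled), and all names are hypothetical; it makes explicit the source squelching that the comment expects to go without saying.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FloodEntry:
    name: str
    is_overlay: bool = False     # remote NVE/PE tunnel vs local AC
    non_bm: bool = False         # ASSUMPTION: precomputed "non-BM overlay tunnel" flag


def flood_from_ac(ingress_ac: str, flood_list: List[FloodEntry]) -> List[str]:
    """Destinations for a BM packet an AR-REPLICATOR received on ingress_ac."""
    out = []
    for entry in flood_list:
        if not entry.is_overlay and entry.name == ingress_ac:
            continue  # source squelching: never flood back to the arrival AC
        if entry.is_overlay and entry.non_bm:
            continue  # non-BM overlay tunnels are skipped for BM traffic
        out.append(entry.name)
    return out


flood_list = [
    FloodEntry("ac1"),
    FloodEntry("ac2"),
    FloodEntry("nve1", is_overlay=True),
    FloodEntry("tunnel-x", is_overlay=True, non_bm=True),
]
print(flood_from_ac("ac1", flood_list))  # prints: ['ac2', 'nve1']
```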