[mpls] Benjamin Kaduk's Discuss on draft-ietf-mpls-lsp-ping-lag-multipath-06: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 12 March 2019 20:30 UTC

Return-Path: <noreply@ietf.org>
X-Original-To: mpls@ietf.org
Delivered-To: mpls@ietfa.amsl.com
Received: from ietfa.amsl.com (localhost [IPv6:::1]) by ietfa.amsl.com (Postfix) with ESMTP id 02911131304; Tue, 12 Mar 2019 13:30:31 -0700 (PDT)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-mpls-lsp-ping-lag-multipath@ietf.org, mpls-chairs@ietf.org, loa@pi.nu, mpls@ietf.org
X-Test-IDTracker: no
X-IETF-IDTracker: 6.93.0
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <155242263093.20169.3838924661112115545.idtracker@ietfa.amsl.com>
Date: Tue, 12 Mar 2019 13:30:30 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/mpls/e_4XfWvf6zLA-iqC-R8GtblPFa4>
Subject: [mpls] Benjamin Kaduk's Discuss on draft-ietf-mpls-lsp-ping-lag-multipath-06: (with DISCUSS and COMMENT)
X-BeenThere: mpls@ietf.org
X-Mailman-Version: 2.1.29
List-Id: Multi-Protocol Label Switching WG <mpls.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/mpls>, <mailto:mpls-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/mpls/>
List-Post: <mailto:mpls@ietf.org>
List-Help: <mailto:mpls-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/mpls>, <mailto:mpls-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 12 Mar 2019 20:30:36 -0000

Benjamin Kaduk has entered the following ballot position for
draft-ietf-mpls-lsp-ping-lag-multipath-06: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:


Thanks for this document; it's clearly going to provide value, and it
gives a pretty well-readable description of how things are expected to
work.  That said, there's a number of details that need to be worked out
before publication...

This document has six listed authors; per RFC 7322, documents listing
more than five authors are unusual, and six is greater than five.  Let's
talk about the author count.

"optional TLV" seems to be used in MPLS contexts as a technical term for
"comprehension-optional", not in the "optional to send" sense; it's the
counterpart of "mandatory TLV", and this terminology is even documented
in the MPLS TLVs registry.  It's my understanding that the LSP
Capabilities TLV and the Detailed Interface and Label Stack TLV are
intended to be comprehension-required (i.e., "mandatory TLVs"), and thus
that we must not use the phrase "optional TLV" in their description.
This would be made clear if we listed which range of values we intended
to allocate a TLV type from, in the IANA considerations, but that
remains unspecified at this time.

In a similar vein (the "comprehension-required" behavior), there are a
few places (Section 3 (twice), and Section 6; also Section 3 for the LAG
Description Indicator flag; see COMMENT) where we state new normative
language ("MUST") that is unenforceable, since it would need to apply to
MPLS implementations that do not implement this specification.
Fortunately, the comprehension-required TLV ranges provide this
functionality for us without the need to use normative language.

Section 4.2 has some conflicting "MUST"s about the ordering/presence of
sub-TLVs in the DDMAP TLV (see COMMENT, and also the "MANDATORY"
langauge in Figure 2).

If I'm reading Section 5.1.2 correctly, the described procedure is only
intended to apply when the "G" flag is present (as well as the "I" flag,
the requirement for which is explicitly stated), but this is not
explicitly stated.  In particular, the text as written says it applies
to all responders that "understand the LAG Description Indicator flag"
with no mention of runtime check for the presence of that flag.

I also worry that Sections 8, 9, and 10 are insufficiently clear about the
encoding of the interface index value -- is it an integer in NBO, an
opaque bitstring, or something else?


Am I reading this correctly that the "LSR Capability TLV" created here
is basically only applicable to properties of MPLS echo request/reply
exchanges?  If so, then perhaps the moniker "LSR Capability" is overly

Section 1.2

   o  Label switching over some member links of the LAG is successful,
      but will be failed over other member links of the LAG.

nit: there's some verb tense inconsistency here; maybe "but fails over
other member links" would help.

Section 2

   This document defines an optional TLV to discover the capabilities of
   a responder LSR and extensions for use with the MPLS LSP Ping and

nit: is it only the *responder* LSR we care about, or are there
potentially intermediaries in play?

   The solution consists of the MPLS echo request containing a DDMAP TLV
   and the optional LSR capability TLV to indicate that separate load

nit: Is this "optional capability TLV" the same "optional TLV" that we are
defining in this document?  If so, calling it "the new optional LSR
capability" would aid clarity.  (Also, we seem to use both the
"capability" and "Capability" capitzliations for describing this TLV.)

Section 3

   Capability TLV in the MPLS echo request message.  When the responder
   LSR receives an MPLS echo request message with the LSR Capability TLV
   included, if it knows the LSR Capability TLV, then it MUST include
   the LSR Capability TLV in the MPLS echo reply message with the LSR
   Capability TLV describing features and extensions supported by the
   local LSR.  Otherwise, an MPLS echo reply MUST be sent back to the
   initiator LSR with the return code set to "One or more of the TLVs
   was not understood".  [...]

This last MUST seems like something that this document cannot mandate
(as it attempts to apply to non-implementations of this document); it
could only work if it was an existing requirement from the core MPLS
spec, in which case lower-case "must" is appropriate (possibly with
section reference).

   o  If the responder LSR does not understand the "LAG Description
      Indicator flag":

      *  Clear both the "Downstream LAG Info Accommodation flag" and the
         "Upstream LAG Info Accommodation flag".

Similarly, if an LSR does not understand a flag, it cannot give special
treatment to packets containing that flag.  (Why an LSR would not
understand a flag defined in the same document that defines the
Capabilities TLV is beyond me, but you seem to want to allow for that

   If the responder does not know the LSR Capability TLV, an MPLS echo
   reply with the return code set to "One or more of the TLVs was not
   understood" MUST be sent back to the initiator LSR.

This duplicates the content I quoted at the top of this section's

Section 4.1

   Once the initiator LSR knows the capabilities that a responder
   supports, then it sends an MPLS echo request carrying a DDMAP with
   the "LAG Description Indicator flag" (G) set to the responder LSR.

nit: I think the key point is not that the initiator knows the
capabilities of the peer, but rather that the initiator knows that the
peer supports this specific mechanism.
Also, is this "a DDMAP TLV"?

Section 4.2

Reading this makes it sound like the "LAG Description Indicator flag" is
completely orthogonal to regular DDMAP functionality, so that if I set
the LAG description indicator flag but do not otherwise request detailed
descriptions, only the LAG interfaces are returned.  However, the flag
in question has to be inside a DDMAP TLV (and we should really say so,
perhaps as "[flag] set *in the DDMAP TLV*"!), so it seems like in practice
the LAG information can only be obtained in conjunction with full
detailed description information.  (The specific suggestion here is to
note clearly that the flag is se in the DDMAP TLV.)

      *  The responder LSR MUST include a DDMAP TLV when sending the
         MPLS echo reply.

I got confused on my first pass through this section; adding "There is a
single DDMAP TLV for the LAG interface, with member links described
using sub-TLVs" here would have kept me from veering onto the wrong path.
(I thought there was going to be one DDMAP TLV per member link, plus one
for the LAG aggregate, with lots of duplication.)

   Based on the procedures described above, every LAG member link will
   have a Local Interface Index Sub-TLV and a Multipath Data Sub-TLV
   entries in the DDMAP TLV.  The order of the Sub-TLVs in the DDMAP TLV
   for a LAG member link MUST be Local Interface Index Sub-TLV
   immediately followed by Multipath Data Sub-TLV.  A LAG member link
   MAY also have a corresponding Remote Interface Index Sub-TLV.  When a
   Local Interface Index Sub-TLV, a Remote Interface Index-Sub-TLV and a
   Multipath Data Sub-TLV are placed in the DDMAP TLV to describe a LAG
   member link, they MUST be placed in the order of Local Interface
   Index Sub-TLV, Remote Interface Index-Sub-TLV and Multipath Data Sub-

This last MUST contradicts the previous MUST.  I suggest some more text
to clarify under which conditions each requirement applies.

I'd also suggest not using the figure for conveying a (apparent?) mandatory
requirement, and instead adding some text at the end: "The block of local
interface index, (optional remote interface index) and multipath data
sub-TLVs for each member link MUST appear adjacent to each other in
order of increasing local interface index."

It's confusing to label the multiplath data sub-TLV as "MANDATORY" when
the body text says "MUST add an [sic] Multipath data Sub-TLV [...] if
the received DDMAP TLV requested multipath information", i.e., it is not
always present, based on the contents of the echo request.

   When none of the received multipath information maps to a particular
   LAG member link, then the responder LSR MUST still place the Local
   Interface Index Sub-TLV and the Multipath Data Sub-TLV for that LAG
   member link in the DDMAP TLV, the value of Multipath Length field of
   the Multipath Data Sub-TLV is set to zero.

nit: the last comma is a comma splice.

Section 5.1.2

It seems like this places a slightly stronger burden on the responder
than stock 8029 does, in that there is a MUST-level requirement to act
on the 'I' flag in combination with the 'G' flag, whereas for the 'I'
flag alone 8029 only has a SHOULD-level requirement to respond
accordingly.  We may want to call this out as a change in behavior.

Section 5.1.3

   Expectation is that there's a relationship between the interface
   index of the outgoing LAG member link at TTL=n and the interface
   index of the incoming LAG member link at TTL=n+1 for all discovered
   entropies.  In other words, set of entropies that load balances to

nit: "The expectation"
nit: "discovered entropies" is a strange wording given how the entropy
label works; is this more like "all entropies examined"?

   The initiator LSR sends two MPLS echo request messages to traverse
   the two LAG member links at TTL=n+1:

How does the initiator know (which entropy label values?) to get the echo
requests to traverse different member links?

   o  Error case:

      *  One MPLS echo request message reaches TTL=n+1 on an LAG member
         link 1.

      *  The other MPLS echo request message also reaches TTL=n+1 on an
         LAG member link 1.

      One or two MPLS echo request messages sent by the initiator LSR
      cannot reach the immediate downstream LSR, or the two MPLS echo
      request messages reach at the immediate downstream LSR from the
      same LAG member link.

The description paragraph doesn't seem to match up the scenario
described in the sub-bullets, which leaves me confused as to what
scenario(s) are intended to be considered as the "error case".

Section 6

   This document defines a new optional TLV which is referred to as the
   "LSR Capability TLV.  [...]

Is this optional to use or comprehension-optional?  (We elsewhere rely
on comprehension-mandatory behavior, so in the RFC 8029 terminology this
seems to be a "mandatory TLV" even if it is not mandatory to use.)

   included in the MPLS echo reply message.  Otherwise, if the responder
   does not know the LSR Capability TLV, an MPLS echo reply with the
   return code set to "One or more of the TLVs was not understood" MUST
   be sent back to the initiator LSR.

This duplicates a requirement stated elsewhere (that really ought to
just be using existing required behavior from 8029 anyway).

   LSR Capability TLV Type is TBD1.  Length is 4.  The value field of

In a hypothetical future where we need to allocate a 33rd capability
flag, would we rather create a new "extended capabilities" TLV (burning
another 32 bits for type+length) or leave flexibility now for 'length'
to increase in multiples of 4 with receivers ignoring bits past what
they know about?

      The "LSR Capability Flags" field is 4 octets in length, this
      document defines the following flags:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     |                  Must Be Zero (Reserved)                  |U|D|

I'm wary of describing these as "Must Be Zero", since they're just
unallocated but not required to be zero other than by the current state
of bit allocations.  (I guess this is conventional for MPLS, though?  So
maybe it's not a problem.)  Just "Reserved" would be fine.

      This document defines two flags.  The remaining flags MUST be set
      to zero when sending and ignored on receipt.  Both the U and the D
      flag MUST be cleared in the MPLS echo request message when
      sending, and ignored on receipt.  Neither, either or both the U
      and the D flag MAY be set in the MPLS echo reply message.

Similarly, I think we want to say "unallocated flags" rather than "the
remaining flags".

It also seems like the semantics being described here are "the TLV value
is ignored in the echo request and only has meaning in the response"
(i.e., this is a "tell me your capabilities" exchange, not a "here are
mine; what are yours?" exchange).  As written, that applies only to the
U and D bits, and is not a general property.  That may well be a fine
choice if we think we might want the two-way exchange for some future
allocated bit, but if we do think it's a general property of the
mechanism, I'd suggest rewording the text.

Section 7

(Per IANA, bits 4 and 5 are already assigned but are not reflected in the

Section 10

   The Detailed Interface and Label Stack TLV format is derived from the
   Interface and Label Stack TLV format (from [RFC8029]).  Two changes
   are introduced.  The first is that the label stack is converted into
   a sub-TLV.  The second is that a new sub-TLV is added to describe an
   interface index.  These fields of Detailed Interface and Label Stack
   TLV have the same use and meaning as in [RFC8029].  A summary of
   these fields is as below:

nit: Are "These fields" just "The other fields"?

         The Address Type indicates if the interface is numbered or
         unnumbered.  It also determines the length of the IP Address
         and Interface fields.  The resulting total length of the
         initial part of the TLV is listed as "K Octets".  The Address
         Type is set to one of the following values:

nit: is it better to list the currently allocated values or just refer
to the IANA registry?

I seee that this "index assigned to the interface" language for
unnumbered address types originates from RFC 8029, but both this
document and 8029 leave me confused as to the encoding used for this
index (for either/both IPv4 or IPv6 address types).

[My comment about "Must Be Zero" and allocatable bits also applies here]

Section 12

This document also adds a capability "negotiation" mechanism for MPLS
echo request/reply exchanges.  As MPLS does not offer integrity
protection for its messages, this negotiation is only as trustworthy as
the network from which messages are accepted as valid, and initiators
have no recourse but to accept faithfully the echo replies received.

   To prevent leakage of vital information to untrusted users, a
   responder LSR MUST only accept MPLS echo request messages from
   designated trusted sources via filtering source IP address field of
   received MPLS echo request messages.  [...]

I'd prefer to see a note added here about how "source IP address
filtering provides only a weak form of access control and is not, in
general, a reliable security mechanism.  Nonetheless, it is required
here in the absence of any more robust mechanism that might be used."

Section 13

Should we ask IANA to update the "Interface and Label Stack Address
Types" registry to also refer to this document, since we are sharing the
registry between the Interface and Label Stack TLV and the Detailed

Section 13.2

The range 4-31743 spans two different allocation procedures (Standards
Action and Specification Required - Experimental RFC needed); I assume
that we want the 4-16383 range for Standards Action - mandatory TLVs.

Section 13.4

We don't say what range we want this to be allocated from -- I assume
4-16383 since this is a negotiated feature and sending it outside the
negotiated scenario should be an error.

Section 13.4.1

(I note that some existing sub-TLV registries have "Experimental RFC
required" for the experimental ranges, but that is arguably more
stringent than necessary; Specification Required seems more appropriate
for this case.)

Appendix A

I was kind of hoping for some more guidance on what an initiator could
do in these scenarios to try to get better visibility into the switch
behavior (and, hopefully, a workaround to get reliable echo responses
that cover all LAG member links).  AFAICT there's not a better option
than "try a bunch of entropy labels and see what responses you can get
back" and that's the same remedy in all the described topologies, but it
would be nice to have this stated explicitly for the reader.

Section A.1

                    In the worst case, MPLS echo request messages with
   specific entropies to exercise every LAG members from R1 to S1 can
   all reach R2 via same LAG member.  [...]

I'm confused by the phrase "specific entropies to exercise every LAG
members [sic] from R1 to S1" -- is there some deterministic algorithm by
which a specific entropy label value will force a specific LAG member
link?  My understanding was that the entropy was used as input to an
opaque hash function and that trial-and-error was the only way to
explore the mapping from entropy value to LAG member link taken.