Re: [mpls] Benjamin Kaduk's Discuss on draft-ietf-mpls-lsp-ping-lag-multipath-06: (with DISCUSS and COMMENT)

Mach Chen <> Wed, 13 March 2019 07:06 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id 2C537130EC7; Wed, 13 Mar 2019 00:06:57 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -4.2
X-Spam-Status: No, score=-4.2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id 348HNrLMPV7x; Wed, 13 Mar 2019 00:06:54 -0700 (PDT)
Received: from ( []) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id C13D9130EA7; Wed, 13 Mar 2019 00:06:53 -0700 (PDT)
Received: from (unknown []) by Forcepoint Email with ESMTP id A33DDAAE26A43FD64B7C; Wed, 13 Mar 2019 07:06:51 +0000 (GMT)
Received: from ( by ( with Microsoft SMTP Server (TLS) id 14.3.408.0; Wed, 13 Mar 2019 07:06:51 +0000
Received: from ( by ( with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1591.10; Wed, 13 Mar 2019 07:06:51 +0000
Received: from ( by ( with Microsoft SMTP Server (version=TLS1_0, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA_P256) id 15.1.1591.10 via Frontend Transport; Wed, 13 Mar 2019 07:06:50 +0000
Received: from ([]) by ([]) with mapi id 14.03.0415.000; Wed, 13 Mar 2019 15:06:43 +0800
From: Mach Chen <>
To: Benjamin Kaduk via Datatracker <>, The IESG <>
CC: "" <>, "" <>, "" <>, "" <>
Thread-Topic: Benjamin Kaduk's Discuss on draft-ietf-mpls-lsp-ping-lag-multipath-06: (with DISCUSS and COMMENT)
Thread-Index: AQHU2RJ3vF0A2kVnzkeAK0NcOKNWfKYI0+dw
Date: Wed, 13 Mar 2019 07:06:42 +0000
Message-ID: <>
References: <>
In-Reply-To: <>
Accept-Language: en-US, zh-CN
Content-Language: zh-CN
x-originating-ip: []
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
X-CFilter-Loop: Reflected
Archived-At: <>
Subject: Re: [mpls] Benjamin Kaduk's Discuss on draft-ietf-mpls-lsp-ping-lag-multipath-06: (with DISCUSS and COMMENT)
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Multi-Protocol Label Switching WG <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Wed, 13 Mar 2019 07:06:57 -0000

Hi Benjamin,

Thanks for your comments!

Please see some replies inline...

> -----Original Message-----
> From: Benjamin Kaduk via Datatracker []
> Sent: Wednesday, March 13, 2019 4:31 AM
> To: The IESG <>
> Cc:;;
> Subject: Benjamin Kaduk's Discuss on draft-ietf-mpls-lsp-ping-lag-multipath-
> 06: (with DISCUSS and COMMENT)
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-mpls-lsp-ping-lag-multipath-06: Discuss
> When responding, please keep the subject line intact and reply to all email
> addresses included in the To and CC lines. (Feel free to cut this introductory
> paragraph, however.)
> Please refer to
> for more information about IESG DISCUSS and COMMENT positions.
> The document, along with other ballot positions, can be found here:
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> Thanks for this document; it's clearly going to provide value, and it gives a
> pretty well-readable description of how things are expected to work.  That


> said, there's a number of details that need to be worked out before
> publication...


> This document has six listed authors; per RFC 7322, documents listing more
> than five authors are unusual, and six is greater than five.  Let's talk about the
> author count.

Loa has answered the question. 

Please refer to the Shepherd writeup where it is noted support of the six authors

> "optional TLV" seems to be used in MPLS contexts as a technical term for
> "comprehension-optional", not in the "optional to send" sense; it's the
> counterpart of "mandatory TLV", and this terminology is even documented in
> the MPLS TLVs registry.  It's my understanding that the LSP Capabilities TLV
> and the Detailed Interface and Label Stack TLV are intended to be
> comprehension-required (i.e., "mandatory TLVs"), and thus that we must
> not use the phrase "optional TLV" in their description.
> This would be made clear if we listed which range of values we intended to
> allocate a TLV type from, in the IANA considerations, but that remains
> unspecified at this time.

Good catch on the LSP Capabilities TLV, it should be specified as assigned from the range 0-16383 as a response is required. The TLV is optional, as noted this range also covers optional TLVs for which a response is required. Per RFC8029, the Downstream Detailed Mapping TLV which is used in this RFC is also optional.

> In a similar vein (the "comprehension-required" behavior), there are a few
> places (Section 3 (twice), and Section 6; also Section 3 for the LAG Description
> Indicator flag; see COMMENT) where we state new normative language
> ("MUST") that is unenforceable, since it would need to apply to MPLS
> implementations that do not implement this specification.
> Fortunately, the comprehension-required TLV ranges provide this
> functionality for us without the need to use normative language.

OK, this is addressed with the below CMMENT.

> Section 4.2 has some conflicting "MUST"s about the ordering/presence of
> sub-TLVs in the DDMAP TLV (see COMMENT, and also the "MANDATORY"
> langauge in Figure 2).

OK, this is addressed with the COMMENT. 

> If I'm reading Section 5.1.2 correctly, the described procedure is only
> intended to apply when the "G" flag is present (as well as the "I" flag, the
> requirement for which is explicitly stated), but this is not explicitly stated.  In
> particular, the text as written says it applies to all responders that
> "understand the LAG Description Indicator flag"
> with no mention of runtime check for the presence of that flag.

How about add a condition at the beginning:

> I also worry that Sections 8, 9, and 10 are insufficiently clear about the
> encoding of the interface index value -- is it an integer in NBO, an opaque
> bitstring, or something else?
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> Am I reading this correctly that the "LSR Capability TLV" created here is
> basically only applicable to properties of MPLS echo request/reply exchanges?
> If so, then perhaps the moniker "LSR Capability" is overly generic.

It's now defined for LSP Ping echo request/reply exchanges, but may be used to return general LSR capabilities in the future. I incline to keep it as is.  Are you suggesting  to change to another name?
> Section 1.2
>    o  Label switching over some member links of the LAG is successful,
>       but will be failed over other member links of the LAG.
> nit: there's some verb tense inconsistency here; maybe "but fails over other
> member links" would help.


> Section 2
>    This document defines an optional TLV to discover the capabilities of
>    a responder LSR and extensions for use with the MPLS LSP Ping and
> nit: is it only the *responder* LSR we care about, or are there potentially
> intermediaries in play?

Only the responder LSR cares. 

>    The solution consists of the MPLS echo request containing a DDMAP TLV
>    and the optional LSR capability TLV to indicate that separate load
> nit: Is this "optional capability TLV" the same "optional TLV" that we are
> defining in this document?  If so, calling it "the new optional LSR capability"
> would aid clarity.  (Also, we seem to use both the "capability" and
> "Capability" capitzliations for describing this TLV.)


And as suggested above, maybe we could remove the "optional" prefix. Will change it to "the new LSP capability". Is it OK for you?

> Section 3
>    Capability TLV in the MPLS echo request message.  When the responder
>    LSR receives an MPLS echo request message with the LSR Capability TLV
>    included, if it knows the LSR Capability TLV, then it MUST include
>    the LSR Capability TLV in the MPLS echo reply message with the LSR
>    Capability TLV describing features and extensions supported by the
>    local LSR.  Otherwise, an MPLS echo reply MUST be sent back to the
>    initiator LSR with the return code set to "One or more of the TLVs
>    was not understood".  [...]
> This last MUST seems like something that this document cannot mandate (as
> it attempts to apply to non-implementations of this document); it could only
> work if it was an existing requirement from the core MPLS spec, in which
> case lower-case "must" is appropriate (possibly with section reference).


>    o  If the responder LSR does not understand the "LAG Description
>       Indicator flag":
>       *  Clear both the "Downstream LAG Info Accommodation flag" and the
>          "Upstream LAG Info Accommodation flag".
> Similarly, if an LSR does not understand a flag, it cannot give special
> treatment to packets containing that flag.  (Why an LSR would not
> understand a flag defined in the same document that defines the Capabilities
> TLV is beyond me, but you seem to want to allow for that
> case.)

>    If the responder does not know the LSR Capability TLV, an MPLS echo
>    reply with the return code set to "One or more of the TLVs was not
>    understood" MUST be sent back to the initiator LSR.
> This duplicates the content I quoted at the top of this section's comments.


> Section 4.1
>    Once the initiator LSR knows the capabilities that a responder
>    supports, then it sends an MPLS echo request carrying a DDMAP with
>    the "LAG Description Indicator flag" (G) set to the responder LSR.
> nit: I think the key point is not that the initiator knows the capabilities of the
> peer, but rather that the initiator knows that the peer supports this specific
> mechanism.

OK, how about :

"Once the initiator LSR knows that a responder can support this mechanism, then it sends an MPLS echo request carrying a DDMAP TLV with the "LAG Description Indicator flag" (G) set to the responder LSR.

> Also, is this "a DDMAP TLV"?


> Section 4.2
> Reading this makes it sound like the "LAG Description Indicator flag" is
> completely orthogonal to regular DDMAP functionality, so that if I set the
> LAG description indicator flag but do not otherwise request detailed
> descriptions, only the LAG interfaces are returned.  However, the flag in
> question has to be inside a DDMAP TLV (and we should really say so, perhaps
> as "[flag] set *in the DDMAP TLV*"!), so it seems like in practice the LAG
> information can only be obtained in conjunction with full detailed description
> information.  (The specific suggestion here is to note clearly that the flag is se
> in the DDMAP TLV.)

OK, then the first sentence will like below:

"When a responder LSR received an MPLS echo request message with the "LAG Description Indicator flag" set in the DDMAP TLV."

>       *  The responder LSR MUST include a DDMAP TLV when sending the
>          MPLS echo reply.
> I got confused on my first pass through this section; adding "There is a single
> DDMAP TLV for the LAG interface, with member links described using sub-
> TLVs" here would have kept me from veering onto the wrong path.
> (I thought there was going to be one DDMAP TLV per member link, plus one
> for the LAG aggregate, with lots of duplication.)


>    Based on the procedures described above, every LAG member link will
>    have a Local Interface Index Sub-TLV and a Multipath Data Sub-TLV
>    entries in the DDMAP TLV.  The order of the Sub-TLVs in the DDMAP TLV
>    for a LAG member link MUST be Local Interface Index Sub-TLV
>    immediately followed by Multipath Data Sub-TLV.  A LAG member link
>    MAY also have a corresponding Remote Interface Index Sub-TLV.  When a
>    Local Interface Index Sub-TLV, a Remote Interface Index-Sub-TLV and a
>    Multipath Data Sub-TLV are placed in the DDMAP TLV to describe a LAG
>    member link, they MUST be placed in the order of Local Interface
>    Index Sub-TLV, Remote Interface Index-Sub-TLV and Multipath Data Sub-
>    TLV.
> This last MUST contradicts the previous MUST.  I suggest some more text to
> clarify under which conditions each requirement applies.

How about adding the following sentence before "The order of ....", the it will look like:

"When a Local Interface Index Sub-TLV and a Multipath Data Sub-TLV are placed in the DDMAP TLV to describe a LAG member link, the order of the Sub-TLVs in the DDMAP TLV..."

> I'd also suggest not using the figure for conveying a (apparent?) mandatory
> requirement, and instead adding some text at the end: "The block of local
> interface index, (optional remote interface index) and multipath data sub-
> TLVs for each member link MUST appear adjacent to each other in order of
> increasing local interface index."


> It's confusing to label the multiplath data sub-TLV as "MANDATORY" when
> the body text says "MUST add an [sic] Multipath data Sub-TLV [...] if the
> received DDMAP TLV requested multipath information", i.e., it is not always
> present, based on the contents of the echo request.

Since it's an example, will  remove the "MANDATORY" and "OPTIONAL" prefix. 

>    When none of the received multipath information maps to a particular
>    LAG member link, then the responder LSR MUST still place the Local
>    Interface Index Sub-TLV and the Multipath Data Sub-TLV for that LAG
>    member link in the DDMAP TLV, the value of Multipath Length field of
>    the Multipath Data Sub-TLV is set to zero.
> nit: the last comma is a comma splice.
> Section 5.1.2
> It seems like this places a slightly stronger burden on the responder than
> stock 8029 does, in that there is a MUST-level requirement to act on the 'I'
> flag in combination with the 'G' flag, whereas for the 'I'
> flag alone 8029 only has a SHOULD-level requirement to respond accordingly.
> We may want to call this out as a change in behavior.

I am fine to change to "SHOULD", then it can align with 8029. 

> Section 5.1.3
>    Expectation is that there's a relationship between the interface
>    index of the outgoing LAG member link at TTL=n and the interface
>    index of the incoming LAG member link at TTL=n+1 for all discovered
>    entropies.  In other words, set of entropies that load balances to
> nit: "The expectation"

> nit: "discovered entropies" is a strange wording given how the entropy label
> works; is this more like "all entropies examined"?

How about  "all entropies discovered"?

>    The initiator LSR sends two MPLS echo request messages to traverse
>    the two LAG member links at TTL=n+1:
> How does the initiator know (which entropy label values?) to get the echo
> requests to traverse different member links?
>    o  Error case:
>       *  One MPLS echo request message reaches TTL=n+1 on an LAG member
>          link 1.
>       *  The other MPLS echo request message also reaches TTL=n+1 on an
>          LAG member link 1.
>       One or two MPLS echo request messages sent by the initiator LSR
>       cannot reach the immediate downstream LSR, or the two MPLS echo
>       request messages reach at the immediate downstream LSR from the
>       same LAG member link.
> The description paragraph doesn't seem to match up the scenario described
> in the sub-bullets, which leaves me confused as to what
> scenario(s) are intended to be considered as the "error case".
> Section 6
>    This document defines a new optional TLV which is referred to as the
>    "LSR Capability TLV.  [...]
> Is this optional to use or comprehension-optional?  (We elsewhere rely on
> comprehension-mandatory behavior, so in the RFC 8029 terminology this
> seems to be a "mandatory TLV" even if it is not mandatory to use.)

Yes, it's a "mandatory TLV" according to RFC8029.

>    included in the MPLS echo reply message.  Otherwise, if the responder
>    does not know the LSR Capability TLV, an MPLS echo reply with the
>    return code set to "One or more of the TLVs was not understood" MUST
>    be sent back to the initiator LSR.
> This duplicates a requirement stated elsewhere (that really ought to just be
> using existing required behavior from 8029 anyway).

OK, will try to remove some duplicate text.

>    LSR Capability TLV Type is TBD1.  Length is 4.  The value field of
> In a hypothetical future where we need to allocate a 33rd capability flag,
> would we rather create a new "extended capabilities" TLV (burning another
> 32 bits for type+length) or leave flexibility now for 'length'
> to increase in multiples of 4 with receivers ignoring bits past what they know
> about?

32 seems reasonable enough for future use. If there will be a 33rd capability in a hypothetical future, a sub-TLV may be created then. 

>       The "LSR Capability Flags" field is 4 octets in length, this
>       document defines the following flags:
>       0                   1                   2                   3
>       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>      |                  Must Be Zero (Reserved)                  |U|D|
>      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> I'm wary of describing these as "Must Be Zero", since they're just unallocated
> but not required to be zero other than by the current state of bit allocations.
> (I guess this is conventional for MPLS, though?  So maybe it's not a problem.)
> Just "Reserved" would be fine.

Yes, it's conventional for MPLS. I incline to keep it as is.

>       This document defines two flags.  The remaining flags MUST be set
>       to zero when sending and ignored on receipt.  Both the U and the D
>       flag MUST be cleared in the MPLS echo request message when
>       sending, and ignored on receipt.  Neither, either or both the U
>       and the D flag MAY be set in the MPLS echo reply message.
> Similarly, I think we want to say "unallocated flags" rather than "the
> remaining flags".


> It also seems like the semantics being described here are "the TLV value is
> ignored in the echo request and only has meaning in the response"
> (i.e., this is a "tell me your capabilities" exchange, not a "here are mine; what
> are yours?" exchange).  As written, that applies only to the U and D bits, and
> is not a general property.  That may well be a fine choice if we think we might
> want the two-way exchange for some future allocated bit, but if we do think
> it's a general property of the mechanism, I'd suggest rewording the text.

For now, it's designed as unidirectional capabilities advertisement.  Define it as a more general property may require more considerations. I incline to keep it as-is. We can make an update if we have some specific use case in the future.

> Section 7
> (Per IANA, bits 4 and 5 are already assigned but are not reflected in the
> diagram.)

OK, will fix it.

> Section 10
>    The Detailed Interface and Label Stack TLV format is derived from the
>    Interface and Label Stack TLV format (from [RFC8029]).  Two changes
>    are introduced.  The first is that the label stack is converted into
>    a sub-TLV.  The second is that a new sub-TLV is added to describe an
>    interface index.  These fields of Detailed Interface and Label Stack
>    TLV have the same use and meaning as in [RFC8029].  A summary of
>    these fields is as below:
> nit: Are "These fields" just "The other fields"?


>          The Address Type indicates if the interface is numbered or
>          unnumbered.  It also determines the length of the IP Address
>          and Interface fields.  The resulting total length of the
>          initial part of the TLV is listed as "K Octets".  The Address
>          Type is set to one of the following values:
> nit: is it better to list the currently allocated values or just refer to the IANA
> registry?

Here, it does not define new registry, it just lists how the the purpose is the 

> I seee that this "index assigned to the interface" language for unnumbered
> address types originates from RFC 8029, but both this document and 8029
> leave me confused as to the encoding used for this index (for either/both
> IPv4 or IPv6 address types).

> [My comment about "Must Be Zero" and allocatable bits also applies here]

OK. Will change to "Reserved".

> Section 12
> This document also adds a capability "negotiation" mechanism for MPLS echo
> request/reply exchanges.  As MPLS does not offer integrity protection for its
> messages, this negotiation is only as trustworthy as the network from which
> messages are accepted as valid, and initiators have no recourse but to accept
> faithfully the echo replies received.
>    To prevent leakage of vital information to untrusted users, a
>    responder LSR MUST only accept MPLS echo request messages from
>    designated trusted sources via filtering source IP address field of
>    received MPLS echo request messages.  [...]
> I'd prefer to see a note added here about how "source IP address filtering
> provides only a weak form of access control and is not, in general, a reliable
> security mechanism.  Nonetheless, it is required here in the absence of any
> more robust mechanism that might be used."

OK, will add.

> Section 13
> Should we ask IANA to update the "Interface and Label Stack Address Types"
> registry to also refer to this document, since we are sharing the registry
> between the Interface and Label Stack TLV and the Detailed version?


> Section 13.2
> The range 4-31743 spans two different allocation procedures (Standards
> Action and Specification Required - Experimental RFC needed); I assume that
> we want the 4-16383 range for Standards Action - mandatory TLVs.


> Section 13.4
> We don't say what range we want this to be allocated from -- I assume
> 4-16383 since this is a negotiated feature and sending it outside the
> negotiated scenario should be an error.


> Section 13.4.1
> (I note that some existing sub-TLV registries have "Experimental RFC
> required" for the experimental ranges, but that is arguably more stringent
> than necessary; Specification Required seems more appropriate for this case.)

OK. Change to "Specification Required.
> Appendix A
> I was kind of hoping for some more guidance on what an initiator could do in
> these scenarios to try to get better visibility into the switch behavior (and,
> hopefully, a workaround to get reliable echo responses that cover all LAG
> member links).  AFAICT there's not a better option than "try a bunch of
> entropy labels and see what responses you can get back" and that's the
> same remedy in all the described topologies, but it would be nice to have this
> stated explicitly for the reader.

I am fine to add such a note in this section as you suggested, how about:

"Note: AFAICT there's not a better option than "try a bunch of entropy labels and see what responses you can get back" and that's the same remedy in all the described topologies below."

> Section A.1
>                     In the worst case, MPLS echo request messages with
>    specific entropies to exercise every LAG members from R1 to S1 can
>    all reach R2 via same LAG member.  [...]
> I'm confused by the phrase "specific entropies to exercise every LAG
> members [sic] from R1 to S1" -- is there some deterministic algorithm by
> which a specific entropy label value will force a specific LAG member link?

No, at least for my understanding.

> My understanding was that the entropy was used as input to an opaque hash
> function and that trial-and-error was the only way to explore the mapping
> from entropy value to LAG member link taken.

Yes, we have the same understanding.

Thanks again for your detailed and valuable comments!

Best regards,