Thomas, Bob,

Some questions below for you. Some old, and some new.


Sorry for the late response.
Please see zzh> below. I trimmed some points that we agree on.

From: Greg Mirsky
Sent: Wednesday, December 5, 2018 1:38 PM
To: Jeffrey (Zhaohui) Zhang
Cc: EXT -; Robert Kebler;; BESS
Subject: Re: [bess] WGLC, IPR and implementation poll for draft-ietf-bess-mvpn-fast-failover

Hi Jeffrey,
thank you for the review, detailed questions and helpful comments. Please find my notes, answers in-line tagged GIM>>.


On Fri, Nov 30, 2018 at 5:14 PM Jeffrey (Zhaohui) Zhang wrote:

I have the following questions/comments:

   The procedure described here is an OPTIONAL procedure that consists
   of having a downstream PE take into account the status of P-tunnels
   rooted at each possible upstream PEs, for including or not including
   each given PE in the list of candidate UMHs for a given (C-S,C-G)
   state.  The result is that, if a P-tunnel is "down" (see
   Section 3.1), the PE that is the root of the P-tunnel will not be
   considered for UMH selection, which will result in the downstream PE
   to failover to the upstream PE which is next in the list of

Is it possible that a p2mp tunnel is considered up by some leaves but down by others, leading them to choose different UMHs? In that case, the procedures described in Section 9.1.1 ("Discarding Packets from Wrong PE") of RFC 6513 must be used. I see that this is actually pointed out later in section 6, but it would be good to have a pointer to it right here.
GIM>> Would the following new text that follows the quoted text address your concern:
   If rules to determine the state of the P-tunnel are not
   consistent across all PEs, then some may arrive at a different
   conclusion regarding the state of the tunnel. In such a scenario,
   procedures described in Section 9.1.1 of [RFC 6513] MUST be used.

Zzh> It’s not that the “rules … are not consistent”. It’s that by nature some PEs may think the tunnel is down while the others may think the tunnel is still up (because they’re on different tunnel branches), even when they follow the same rules. Traffic duplication in this case is also only with inclusive tunnels – so how about the following?

   Because PEs may arrive at different
   conclusions regarding the state of the tunnel,
   procedures described in Section 9.1.1 of [RFC 6513] MUST be used
   when using inclusive tunnels.

Additionally, the text in section 3 seems biased toward Single Forwarder Election choosing the UMH with the highest IP address. Section 5 of RFC6513 also describes two other options: hashing, or selection based on the “installed UMH route” (aka unicast-based). It is not clear how the text in this document applies to hashing-based selection, and I don’t see how the text applies to unicast-based selection. Some rewording/clarification is needed here.
GIM>> How would the use of an alternative UMH selection algorithm change the documented use of p2mp BFD? Do you think that if the Upstream PE is selected using, for example, hashing, then the defined use of BGP-BFD and p2mp BFD itself is no longer applicable?

Zzh> It’s not that the alternative UMH selection algorithm changes the documented use of p2mp BFD. It’s the other way around: tunnel state changes the selection result. I guess hashing can still be used (this document only controls what goes into the candidate pool). For unicast-based selection I thought it would no longer work, but then I noticed the following:

   o  second, the UMH candidates that advertise a PMSI bound to a tunnel
      that is "down" -- these will thus be used as a last resort to
      ensure a graceful fallback to the basic MVPN UMH selection
      procedures in the hypothetical case where a false negative would
      occur when determining the status of all tunnels

Zzh> So this should still work, although ideally the PE advertising the next best route should be considered before going to the last resort (of using the PE advertising the best route but whose tunnel is down).
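
To make the point about the candidate pool concrete, here is a minimal Python sketch. It is not taken from the draft or from any implementation; the `Candidate` type, the `select_umh` function, and the addresses are hypothetical, and it assumes all PE addresses belong to one address family. Tunnel status only filters the pool; the selection rule (highest IP by default, or any other key such as a hash or unicast route preference) is applied afterwards, and candidates whose tunnel is down are used only when no other candidate remains.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical UMH candidate as seen by one downstream PE."""
    pe_addr: str      # address of the upstream PE
    tunnel_up: bool   # this PE's local view of the P-tunnel status


def highest_ip(c: Candidate):
    # Default selection rule used in this sketch: highest PE address wins.
    return ip_address(c.pe_addr)


def select_umh(
    candidates: List[Candidate],
    key: Callable[[Candidate], object] = highest_ip,
) -> Optional[Candidate]:
    """Pick a UMH, preferring candidates whose tunnel is considered up.

    Candidates with a tunnel that is "down" are only used as a last
    resort, when no candidate with an up tunnel exists.
    """
    up = [c for c in candidates if c.tunnel_up]
    pool = up if up else candidates
    if not pool:
        return None
    return max(pool, key=key)
```

With candidates 192.0.2.3 (tunnel down), 192.0.2.2 (up) and 192.0.2.1 (up), the sketch selects 192.0.2.2; if every tunnel is down it falls back to 192.0.2.3. Passing a different `key` models hashing or unicast-based preference without touching the filtering step.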

   For P-tunnels of type P2MP MPLS-TE, the status of the P-tunnel is
   considered up if one or more of the P2MP RSVP-TE LSPs, identified by
   the P-tunnel Attribute, are in Up state.

Why is “one or more of …” used in the above text?
GIM>> Would s/one or more of/at least one of/ address your concern?

Zzh> Still confused. From the tunnel head, it could indeed set up multiple (sub-)LSPs, one for each leaf. But from the egress point of view, there is only one LSP for an egress PE, right?

   If tracking of the P-tunnel by using a p2mp BFD session is to be
   enabled after the P-tunnel has been already signaled, the
   procedure described above MUST be followed.

What if the tracking is to be enabled before the P-tunnel has been signaled? The text implies different behavior?
GIM>> Not really, I guess. I think that the second sentence is important:
   Note that x-PMSI A-D Route MUST be re-sent with exactly the same attributes as before and
   the BGP-BFD Attribute included.

Zzh> In that case, how about changing the paragraph and the next one to the following:

   If tracking of the P-tunnel by using a p2mp BFD session is enabled
   after the x-PMSI A-D route has already been advertised, the x-PMSI
   A-D Route MUST be re-sent with exactly the same attributes as before
   and the BGP-BFD Attribute included.

   If the x-PMSI A-D route is advertised with P-tunnel status tracked using
   the p2mp BFD session and it is desired to stop tracking P-tunnel
   status using BFD, then:

zzh> BTW, the same applies to 3.1.7 as well.

   When such a procedure is used, in the context where fast restoration
   mechanisms are used for the P-tunnels, leaf PEs should be configured
   to wait before updating the UMH, to let the P-tunnel restoration
   mechanism happen.  A configurable timer MUST be provided for this
   purpose, and it is recommended to provide a reasonable default value
   for this timer.

What does “such a procedure” refer to?
GIM>> Would s/When such a procedure is used/In such a scenario/ address your concern?

Zzh> I looked at the surrounding (new) text:

   If the Downstream PE's P-tunnel is already up, its state being
   monitored by the p2mp BFD session, and the Downstream PE receives the
   new x-PMSI A-D Route without the BGP-BFD Attribute, the Downstream

   o  MUST accept the x-PMSI A-D Route;

   o  MUST stop receiving BFD control packets for this p2mp BFD session;

   o  SHOULD delete the p2mp BFD session associated with the P-tunnel;

   o  SHOULD NOT switch the traffic to the Standby Upstream PE.

   In such a scenario, in the context where fast restoration mechanisms
   are used for the P-tunnels, leaf PEs should be configured to wait
   before updating the UMH, to let the P-tunnel restoration mechanism

Zzh> Now I have the following two questions:
Zzh> a) Should the “MUST stop receiving BFD control packets for this p2mp BFD session” be removed? How would you “stop receiving BFD control packets”? Isn’t it implied by the next bullet point already?
Zzh> b) What does the last clause “to let the P-tunnel restoration mechanism happen” mean? The scenario is that an x-PMSI route update is received w/o the BGP-BFD attribute – where does the tunnel restoration come from?
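
As an illustration of question a), here is a minimal Python sketch of one possible reading of the four bullets quoted above. It is an assumption made for illustration, not the draft's specified behavior; `DownstreamPE`, `on_xpmsi_update`, and the field names are hypothetical. In this reading, deleting the local p2mp BFD session is what causes the PE to stop processing BFD control packets, so no separate "stop receiving" step appears.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DownstreamPE:
    """Hypothetical downstream PE state, keyed by x-PMSI A-D route."""
    # route key -> BFD discriminator carried in the BGP-BFD Attribute (or None)
    routes: Dict[str, Optional[int]] = field(default_factory=dict)
    # route key -> discriminator of the active p2mp BFD tail session
    bfd_sessions: Dict[str, int] = field(default_factory=dict)
    umh: str = "primary"

    def on_xpmsi_update(self, route_key: str, discriminator: Optional[int]) -> None:
        # The route is accepted whether or not it carries the attribute.
        self.routes[route_key] = discriminator
        if discriminator is None:
            # No BGP-BFD Attribute: delete the session. With no session,
            # incoming BFD control packets have nothing to be matched to.
            self.bfd_sessions.pop(route_key, None)
            # The UMH is deliberately left unchanged: withdrawing BFD
            # tracking is not treated as a tunnel failure.
        else:
            self.bfd_sessions[route_key] = discriminator
```

In this sketch the session removal and the absence of a switchover are the only observable effects of the update, which is why the second bullet looks redundant from the egress PE's side.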

3.1.7.  Per PE-CE link BFD Discriminator

   The following approach is defined for the fast failover in response
   to the detection of PE-CE link failures, in which UMH selection for a
   given C-multicast route takes into account the state of the BFD
   session associated with the state of the upstream PE-CE link.

   Upstream PE Procedures

   For each protected PE-CE link, the upstream PE initiates a multipoint
   BFD session [I-D.ietf-bfd-multipoint] as MultipointHead toward
   downstream PEs.  A downstream PE monitors the state of the p2mp
   session as MultipointTail and MAY interpret transition of the BFD
   session into Down state as the indication of the associated PE-CE
   link being down.

Since the BFD packets are sent over the P2MP tunnel not the PE-CE link, my understanding is that the BFD discriminator is still for the tunnel and not tied to the PE-CE link; but different from the previous case, the root will stop sending BFD messages when it detects the PE-CE link failure. As far as the egress PEs are concerned, they don’t know if it is the tunnel failure or PE-CE link failure.

If my understanding is correct, the wording should be changed.
GIM>> There are ways, other than stopping transmission of BFD control packets, for the egress PE to distinguish the two conditions. For example, the MultipointHead MAY set the State to AdminDown and continue sending BFD control packets. If and when the PE-CE link is restored to Up, the MultipointHead can set the State back to Up in the BFD control packet.
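
The distinction suggested here can be sketched in a few lines of Python. This is only an illustration of the idea in this reply, not behavior defined by the draft: the `classify` function and its return labels are hypothetical, and mapping AdminDown to "PE-CE link down" is an assumption taken from the suggestion above. The state values are the ones RFC 5880 assigns to the State field.

```python
from enum import IntEnum
from typing import Optional


class BfdState(IntEnum):
    """BFD State field values as defined in RFC 5880."""
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


def classify(last_rx_state: Optional[BfdState], detection_timer_expired: bool) -> str:
    """Hypothetical egress-PE interpretation of the p2mp BFD session.

    - No packets within the detection time: the tunnel (or the head)
      has failed; the egress PE cannot tell which.
    - Packets still arriving with State = AdminDown: assumed here to
      mean the head is signaling a PE-CE link failure.
    """
    if detection_timer_expired:
        return "tunnel-or-head-failure"
    if last_rx_state == BfdState.ADMIN_DOWN:
        return "pe-ce-link-down"
    if last_rx_state == BfdState.UP:
        return "ok"
    return "unknown"
```

Either failure label would remove the upstream PE from the candidate pool; the sketch only shows that the two conditions become distinguishable if the head keeps transmitting.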

   …  If the route to the
   src/RP changes such that the RPF interface is changed to be a new PE-
   CE interface, then the upstream PE will update the S-PMSI A-D route
   with included BGP-BFD Attribute so that value of the BFD
   Discriminator is associated with the new RPF link.

If the RPF interface changes on the upstream PE, why should it update the route to send a new discriminator? As long as there is a new RPF interface couldn’t the upstream PE do nothing but start tracking the new RPF interface?
GIM>> I'll defer this one to Thomas and Rob.

Zzh> I re-read section 3.1.6 and 3.1.7 and have more questions 😊
Zzh> 3.1.6 seems to be about tracking the tunnel itself while 3.1.7 is about tracking PE-CE interfaces. From an egress point of view, (how) does it know if the discriminator is for the tunnel or for PE-CE interface 1 or PE-CE interface 2? Does it even care? It seems to me that an egress PE would not need to care. If so, why are there different procedures for 3.1.6/3.1.7 (at least for the egress PE behavior)? Even for the upstream PE behavior, shouldn’t the 3.1.6 procedures apply to 3.1.7 as well?

Regardless of which way (the currently described way or my imagined way), some text should be added to discuss how the downstream PE avoids switching to another upstream PE when the primary PE is just going through an RPF change.
GIM>>  Would appending the following text be acceptable to address your concern:
   To avoid an unwarranted switchover, a downstream PE MUST gracefully handle the
   updated S-PMSI A-D route and switch to the use of the associated BFD
   Discriminator value.

4.  Standby C-multicast route

   The procedures described below are limited to the case where the site
   that contains C-S is connected to exactly two PEs. The procedures
   require all the PEs of that MVPN to follow the single forwarder PE
   selection, as specified in [RFC6513].

Why would it not work with more than two upstream PEs?
Why is it limited to single forwarder selection? What about unicast based selection?
GIM>> Again, asking for Thomas and Rob to help.

   If at some later point the local PE determines that C-S is no longer
   reachable through the Primary Upstream PE, the Standby Upstream PE
   becomes the Upstream PE, and the local PE re-sends the C-multicast
   route with RT that identifies the Standby Upstream PE, except that
   now the route does not carry the Standby PE BGP Community (which
   results in replacing the old route with a new route, with the only
   difference between these routes being the presence/absence of the
   Standby PE BGP Community).

Additionally the LOCAL_PREF should also change?
GIM>> Like normative SHOULD?

Zzh> I meant that there should also be text talking about changing LOCAL_PREF.

5.  Hot leaf standby

   The mechanisms defined in Section 4 and Section 3 can be
   used together as follows.

This section is a little confusing to me. It seems that it really should be how a leaf should behave when hot root standby is used, not that there is a “hot leaf” mode. A leaf is just a leaf, not a cold/warm/hot/primary/standby leaf.
GIM>> Would re-naming the section to "Use of Standby C-multicast Route" better reflect the content of the section?

Zzh> It seems to me that the title should really be changed to “Hot Root Standby”. Bob/Thomas?

Zzh> Thanks!
Zzh> Jeffrey



Hello Working Group,

This email starts a two-week Working Group Last Call on draft-ietf-bess-mvpn-fast-failover-04  [1]

This poll runs until *the 6th of December*.

We are also polling for knowledge of any undisclosed IPR that applies to this Document, to ensure that IPR has been disclosed in compliance with IETF IPR rules (see RFCs 3979, 4879, 3669 and 5378 for more details).

If you are listed as an Author or a Contributor of this Document please respond to this email and indicate whether or not you are aware of any relevant undisclosed IPR. The Document won't progress without answers from all the Authors and Contributors.

Currently two IPRs have been disclosed against this Document.

If you are not listed as an Author or a Contributor, then please explicitly respond only if you are aware of any IPR that has not yet been disclosed in conformance with IETF rules.

We are also polling for any existing implementation as per [2].

    Thank you,

    Stephane & Matthew




