Eric Rosen <erosen@cisco.com> Fri, 12 March 2010 16:27 UTC

To: l3vpn@ietf.org
Subject:
Date: Fri, 12 Mar 2010 11:13:28 -0500
Message-ID: <32177.1268410408@erosen-linux>
From: Eric Rosen <erosen@cisco.com>
Precedence: list
Reply-To: erosen@cisco.com

A few questions and comments on draft-morin-l3vpn-mvpn-fast-failover-04.

The proposed UMH selection procedure is:

   o  first, the UMH candidates that either (a) advertise a PMSI bound
      to a tunnel that is "up", or (b) do not advertise any I- or S-
      PMSI applicable to the said (C-S,C-G) but have associated a VRF
      Route Import BGP attribute to the unicast VPN route for S (this is
      necessary to avoid considering invalid some UMH PEs that use a
      policy where no I-PMSI is advertised for a said VRF and where only
      S-PMSI are used, the S-PMSI advertisement being possibly done only
      after the upstream PE receives a C-multicast route for (C-S,
      C-G)/(C-*, C-G) to be carried over the advertised S-PMSI)

I think condition (a) really has to be "advertise a PMSI bound to a tunnel,
where the specified tunnel is not known to be down".  Saying "Not known to
be down" is not the same as saying "up".

Otherwise, consider the following:

- PE-R sees:

  * a route to C-S via PE-S1

  * a route to C-S via PE-S2

  where the route via PE-S2 is preferable

- PE-S1 has bound (C-S,C-G) to an S-PMSI instantiated by P-tunnel P1

- PE-S2 has bound (C-S,C-G) to an S-PMSI instantiated by P-tunnel P2

- PE-R happens to be already joined to P-tunnel P1, and is actively
  receiving traffic on it

- PE-R is not already joined to P-tunnel P2

Clearly PE-R will regard P1 as being up.  But what about P2?  PE-R cannot
know whether P2 is up, as PE-R is not joined to it.  Yet we certainly don't
want PE-R to pick PE-S1 as the UMH for (C-S,C-G) in this case.
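
To make the difference concrete, here is a rough sketch in Python.  It is
not taken from the draft; the three-valued tunnel state, the numeric
"preference" field (higher meaning a better unicast route to C-S) and all
of the names are assumptions made up for illustration.  It just runs the
scenario above under both readings of condition (a).

```python
from dataclasses import dataclass
from enum import Enum


class TunnelState(Enum):
    UP = "up"            # PE-R is joined and sees the tunnel working
    DOWN = "down"        # PE-R has positive knowledge of a failure
    UNKNOWN = "unknown"  # PE-R is not joined, so it cannot tell


@dataclass(frozen=True)
class Candidate:
    pe: str
    preference: int      # hypothetical: higher means a better route to C-S
    tunnel: TunnelState


def select_umh(candidates, eligible):
    """Return the most preferred candidate passing the test, or None."""
    ok = [c for c in candidates if eligible(c)]
    return max(ok, key=lambda c: c.preference, default=None)


def is_up(c):
    # Reading 1: the tunnel must be positively known to be up.
    return c.tunnel is TunnelState.UP


def not_known_down(c):
    # Reading 2: the tunnel must merely not be known to be down.
    return c.tunnel is not TunnelState.DOWN


# The scenario above: PE-S2 is preferable, but PE-R is joined only to P1.
candidates = [
    Candidate("PE-S1", preference=100, tunnel=TunnelState.UP),
    Candidate("PE-S2", preference=200, tunnel=TunnelState.UNKNOWN),
]

strict = select_umh(candidates, is_up)
relaxed = select_umh(candidates, not_known_down)
print(strict.pe, relaxed.pe)
```

Under the "up" reading the less preferable PE-S1 is chosen, simply because
PE-R has no information about P2; under the "not known to be down" reading
PE-S2 is chosen.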

A consequence is that PE-R has to keep track, for each (C-S, C-G), of the
set of UMH candidates whose tunnels are known to be down.  This is necessary
so that PE-R can redo the UMH selection whenever one of those tunnels is no
longer known to be down.  (Assuming of course that one wants to reoptimize
the routing when a tunnel comes back up.)  The draft should say something
about how one knows to return to the optimal route when a tunnel comes up.
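
As a sketch of the bookkeeping this implies (again in Python, and again
with hypothetical names and a made-up numeric preference, none of it from
the draft), PE-R could keep one record per (C-S,C-G) holding the candidate
set and the subset whose tunnels are known to be down, and rerun the
selection whenever that subset changes:

```python
class UmhTracker:
    """State for one (C-S,C-G): candidates and those known to be down."""

    def __init__(self, candidates):
        # candidates: dict of PE name -> preference (higher is better).
        self.candidates = dict(candidates)
        self.known_down = set()
        self.current = self._select()

    def _select(self):
        # Eligible means "tunnel not known to be down".
        usable = [pe for pe in self.candidates if pe not in self.known_down]
        return max(usable, key=self.candidates.get, default=None)

    def tunnel_down(self, pe):
        """Record a failure and redo the selection."""
        self.known_down.add(pe)
        self.current = self._select()
        return self.current

    def tunnel_no_longer_down(self, pe):
        """Forget a failure and redo the selection (the reoptimization)."""
        self.known_down.discard(pe)
        self.current = self._select()
        return self.current


tracker = UmhTracker({"PE-S1": 100, "PE-S2": 200})
first = tracker.current
after_failure = tracker.tunnel_down("PE-S2")
after_recovery = tracker.tunnel_no_longer_down("PE-S2")
print(first, after_failure, after_recovery)
```

The point is only that the "known down" set must be retained per
(C-S,C-G); without it PE-R has no trigger for returning to PE-S2.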

I also have a question about whether changing UMH can really be considered
"fast failover" under most circumstances.  If PE-R is already joined to
PE-S1's tunnel when PE-S2's tunnel fails, then PE-R can certainly switch
quickly from receiving (C-S,C-G) on one tunnel to receiving it on the other.
But suppose PE-R is not already joined to PE-S1's tunnel.  Now PE-R must
engage in control plane activity in order to keep getting the (C-S,C-G)
traffic. So where's the "fast restoration"?  Fast restoration schemes are
usually focused on keeping the data flowing without waiting for control
plane activity to complete.

This is particularly relevant when the "tunnel down" status is inferred by
PE-R from the fact, locally detected, that the last hop link of the tunnel
is down.  In this case, the tunnel will get automatically reconstituted
after the IGP distributes the link status change.  If PE-R changes UMH, it
may need to send a Join (whether in PIM or in BGP) to the new UMH, and the
new UMH itself may need to send a C-PIM Join out a VRF interface.  Then PE-R
needs to add itself to the new P-tunnel, and this may involve a fair amount
of signaling, both in the tunnel setup protocol (mLDP or RSVP-TE) and in BGP
(e.g., the new UMH may need to send a new S-PMSI A-D route, and PE-R may
then need to send a new Leaf A-D route).  For this to count as a "fast
restoration" scheme, all this has to be done in less time than it would take
for the IGP to react to the link outage.
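
Put as arithmetic, the test is whether the sum of those steps is less than
the IGP reaction time.  The Python sketch below is only an illustration:
the step names are my own labels, the steps are assumed to run strictly one
after another, and the millisecond figures are placeholders, not
measurements.

```python
def umh_switch_is_faster(step_times_ms, igp_reaction_ms):
    """True if the summed control-plane steps beat the IGP reaction time."""
    return sum(step_times_ms.values()) < igp_reaction_ms


# Placeholder figures only; they are not measurements of any network.
steps = {
    "join_to_new_umh": 50,
    "c_pim_join_on_vrf_interface": 50,
    "s_pmsi_ad_route": 100,
    "leaf_ad_route": 100,
    "p_tunnel_signaling": 200,
}
total_ms = sum(steps.values())

print(total_ms)
print(umh_switch_is_faster(steps, igp_reaction_ms=300))
print(umh_switch_is_faster(steps, igp_reaction_ms=1000))
```

Whether the inequality holds in practice depends entirely on the real
values, which is exactly what the draft would need to argue.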

If "hot root standby" is being used, then these concerns don't apply unless
the standby tunnel and the primary tunnel happen to have the same last link
(which is certainly possible).

The draft does say:

   if the PE can determine that there is no fast
   restoration mechanism (such as MPLS FRR [RFC4090]) in place for the
   P-tunnel, it can update the UMH immediately.  Else, it should wait
   before updating the UMH, to let the P-tunnel restoration mechanisms
   happen

But this doesn't really address the issue I raise above.  If the timer has to
be long enough to let the IGP converge, there's not much point in having it
at all.

With regard to the procedures for use of the "standby community":

- For the "hot root standby" scheme, I don't think the community is
  necessary, as it doesn't seem to play any role.

- For the cold and warm root standby schemes, it seems that the standby PE
  has to know who the primary PE for a given (C-S,C-G) is.  I don't really
  see how this will work, since each downstream PE could have a different
  primary PE.

Eric Rosen