thoughts on draft-bryant-shand-ipfrr-notvia-addresses-00.txt

I've been thinking about this approach and have a number of comments on it 
(sorry for the length) gathered from various discussions.

In general, this is a straightforward approach that seems quite promising, 
but it is still very preliminary.  I am concerned about a number of issues 
that are either not sufficiently addressed in the draft or where I feel the 
complexity of the approach in the draft is a problem.  If all such issues 
can be adequately resolved such that the IPFRR with Notvia Addresses can be 
relatively simple to configure, manage and understand while handling all 
the problem cases of interest, then this approach could be a likely 
candidate for an advanced method.

As we've been discussing, the key question is what is the correct trade-off 
between mechanism complexity and network coverage.  While I think that this 
approach has the possibility of being a reasonable trade-off, whether that 
is actually the case will depend on whether and how the various issues can 
be resolved.

At a high level, the Notvia addresses approach conceptually gives forwarding 
paths very similar to those of TE FRR.  The difference is that rather than the 
head-end doing the computation for each tunnel and signaling the ERO, the 
computation is distributed to all nodes in the network.
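
To make the distributed computation concrete, here is a minimal sketch, not 
taken from the draft.  It assumes a toy cost map and hypothetical names 
(spf_next_hop, topo, and a detour node A): each node prunes the protected 
node from its own copy of the topology and runs an ordinary SPF toward the 
router advertising the Notvia address.

```python
import heapq

def spf_next_hop(topo, src, dst, avoid_nodes=frozenset()):
    """Return (cost, first_hop) from src to dst while ignoring avoid_nodes.

    topo maps node -> {neighbor: cost}. Returns None if dst is unreachable.
    """
    if src in avoid_nodes or dst in avoid_nodes:
        return None
    heap = [(0, src, None)]  # (distance so far, node, first hop from src)
    done = set()
    while heap:
        dist, node, first = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            return dist, first
        for nbr, cost in topo[node].items():
            if nbr in avoid_nodes or nbr in done:
                continue
            heapq.heappush(heap, (dist + cost, nbr, nbr if first is None else first))
    return None

# Hypothetical topology: S-E-B is the primary path, S-A-B is the detour.
topo = {
    "S": {"E": 1, "A": 2},
    "E": {"S": 1, "B": 1},
    "A": {"S": 2, "B": 2},
    "B": {"E": 1, "A": 2},
}
normal = spf_next_hop(topo, "S", "B")                     # (2, "E")
repair = spf_next_hop(topo, "S", "B", avoid_nodes={"E"})  # (4, "A"), i.e. toward B_!E
```

Every node on the repair path would run the same pruned SPF, which is why no 
ERO needs to be signaled.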

First, I'm going to go through what I believe the benefits are.
1.      The idea is conceptually simple; it is easy to understand the path 
the alternate will take.  This is useful for network operations.
2.      No more than a single level of encapsulation is ever 
required.  Although it does suffer from requiring an explicit tunnel, at 
least the forwarding complexity overhead is constant and well understood.
3.      For node and link failure, if the topology isn't partitioned by the 
failure, then an alternate will be found.  There are definitely still issues 
in handling the broadcast link case, but I'll address those later.
4.      Although computationally expensive, the necessary computation does 
appear to be feasible by using incremental SPFs and early 
termination.  These bring the required computation for pt2pt-link and node 
failure into a reasonable range.  My assumption is that this time can be 
further improved with some work.
5.      There is no possibility of looping via the alternates in the event 
that a worse failure, which the alternate can't protect against, has 
occurred.  This is both because traffic already in a Notvia-addressed 
tunnel is not itself repaired and because the Notvia-addressed tunnel rejoins 
the SPT at a point that is always downstream of S.
6.      There are no issues with multi-area OSPF because traffic is sent 
through tunnels to Notvia addresses that are always intra-area for at most 
one area.  Of course, the multi-area OSPF applicability restrictions are 
not very restrictive.
7.      It is clear that SRLG failures and broadcast link failures can be 
handled.  The complexity required depends on the desired/required 
coverage.  I'll address this later in the concerns section.
8.      Because a tunnel is used, it is possible to use the same mechanism 
for multicast traffic, when we determine how to provide IPFRR for 
multicast.  There is also the advantage that the router advertising the 
Notvia address can know the SPT with respect to that Notvia address; this allows 
an RPF check for traffic considering the alternate.

Second is the list of downsides with the approach.  The main concern is 
that the mechanism becomes too complex such that the trade-off between its 
complexity and the full coverage is not desirable.
1.      This requires a large number of additional IP addresses in the 
IGP.  The same number of additional FECs is required to support LDP.
2.      Explicit tunnels are needed, which means that targeted LDP sessions 
are necessary for this to support LDP traffic.  This is a particular 
concern for multi-homed prefixes; I'll describe my concerns on this later.
3.      Substantial IGP changes are required to handle the additional 
Notvia addresses.
4.      A more complex algorithm is required to make the computation feasible.
5.      The management of the Notvia addresses & of the tunnels can create 
longer time periods where protection isn't available for a part of the 
network (the new link or node, etc.).
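
To give a feel for the address count in downside 1, here is a rough sketch 
using the two figures discussed under issue 1(e) below: about one Notvia 
address per uni-directional link as described in the draft, and K*L for full 
SRLG protection.  The function name and the 4-node ring are hypothetical, and 
the extras for broadcast links are ignored.

```python
def notvia_address_counts(topo):
    """Estimate Notvia address counts for a topology.

    topo maps node -> set of neighbors, with every adjacency listed in
    both directions, so each entry is one uni-directional link.
    """
    uni_links = sum(len(nbrs) for nbrs in topo.values())  # L
    avg_neighbors = uni_links / len(topo)                 # K
    return {
        "per_link": uni_links,                    # roughly what the draft needs
        "full_srlg": avg_neighbors * uni_links,   # K*L for full SRLG coverage
    }

# Hypothetical 4-node ring: A-B-C-D-A.
ring = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "A"},
}
counts = notvia_address_counts(ring)  # per_link = 8, full_srlg = 16.0
```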

Third, there are a number of issues that I feel need considerable 
discussion to try and resolve.  I will try to go through each in turn and 
explain what I think the various aspects of each are.  Each of these issues 
has the possibility to resolve in such a way that the Notvia Addresses 
approach becomes overly complex.
1.      Notvia Addresses:  The first issue is how the Notvia addresses are 
allocated, distributed and withdrawn.  An initial idea of Stewart & Mike is 
that these addresses are not global addresses (i.e. are 10.x.x.x or such) 
and are configured in blocks on each router so that the router can manage 
the bindings itself.
a.      The routing extensions to the IGP will have to associate each Notvia 
address with the network resource (node or link) that it should avoid.  This 
is probably straightforward.
b.      It is desirable to have some dampening on the withdrawal of Notvia 
addresses to minimize thrashing.
c.      If configured in blocks, it would be extremely desirable to have 
the same Notvia address mean the same thing through multiple reboots, 
etc.  It'd be good to have some means of consistent association.  This is 
for easy manageability.
d.      When a new link or neighbor comes up, there will be a longer period 
of time when an alternate isn't available because the Notvia address hasn't 
been advertised yet.  These periods without protection need to be clearly 
understood and minimized.
e.      There may be scalability concerns based on the number of Notvia 
addresses and LDP FECs required.  For instance, as described in the draft, 
it is basically the number of uni-directional links in the topology.  This 
is ignoring the extras for broadcast links.  Providing SRLG protection fully 
& certainly, whenever it is at all feasible, would require that each router 
advertise a Notvia address for every uni-directional link into every 
neighbor of that router.  This would result in K*L additional addresses, 
where K is the average number of neighbors & L is the number of 
uni-directional links in the topology.
2.      Insufficiently diverse topology:  It is possible that a network 
topology cannot provide an alternate that suffices for link, node and SRLG 
protection.  It isn't clear to me how to compute a "best-available" 
alternate using this approach.  For instance, if one can get link 
protection, but not node protection, how would that be determined, computed 
and assigned?  This becomes much more of a concern for SRLG protection & 
for topologies where failures have already occurred and the network has 
converged for those & needs protection in the event of an additional failure.
3.      Failure Diagnosis versus Pessimism:  As written, the draft 
discusses the idea of doing failure diagnosis using BFD.  As Stewart, Mike 
& I have discussed, this isn't possible for SRLG failures, although it is 
possible for broadcast links.
a.      I am concerned about adding the failure diagnosis.  This is yet 
another level of complexity for implementation.  It also has ramifications 
for the forwarding plane, because of the need to store multiple alternates 
to use & have multiple states to check to decide what to use.
b.      An example of a concern with the BFD diagnosis is that all 
interfaces on a node that has failed are not certain to fail exactly 
simultaneously or even within a sub-50ms bounded window.  It is entirely 
possible that BFD sessions are terminated on different line-cards that 
detect the router failure at slightly different times and therefore stop 
forwarding traffic at slightly different times.
c.      The other approach is to pessimistically eliminate all routers 
connected to the broadcast link as well as the broadcast link; this may not 
provide an alternate.  It also needs to be thought through what issues 
might exist if the topologies used for the SPF vary slightly for each 
router that is on the broadcast link, since each will, as described, not 
prune itself out when doing the computation; of course, there could be an 
approach where the same topology can be used everywhere.  It isn't clear to 
me what Notvia addresses would be needed to express "don't go through this 
pseudo-node or any nodes attached to it"; I don't think that it is simply 
the Notvia address for avoiding a particular node.
4.      Multi-homed Prefixes:  I am quite concerned about the mechanisms 
suggested in the draft.
a.      First, I really do not like the idea of having separate forwarding 
for "local" prefixes that come out of a tunnel.  What is a local 
prefix?  For instance, does this mean that an ABR has to forward traffic 
differently depending on which area traffic from the tunnel has come from?  I 
am concerned about how this would scale; maybe only 2 FIBs are needed (one 
for backbone & one for other), but it may be worse to handle AS external 
routes.  I know that Stewart, Mike, Joel, Albert and I had discussed/agreed 
to put this idea out of scope, at least for the moment.
b.      I am quite concerned about having tunnels to the advertisers of the 
prefixes.
i.      There needs to be a mechanism to determine whether the advertiser 
of a prefix will forward the packet in a loop-free fashion to avoid the 
failure point.  The separate forwarding for "local" prefixes avoided the 
need for this determination, but at more substantial cost.
ii.     To support LDP, every tunnel requires a targeted LDP session.  If 
multi-homed prefixes are common, then this becomes a full mesh for 
LDP.  That isn't acceptable.  Of course, multi-homed prefixes may be much 
more infrequent for LDP than for IP; for example, there is no reason to 
advertise a separate FEC for the subnet of a link.  However, multi-homed 
prefixes are a concern for LDP for at least the inter-area, AS External, 
and BGP routes.
iii.    If traffic is encapsulated to a node's regular address, because 
that traffic is destined to a prefix advertised by the node, how does the 
receiving node know to remove the encapsulation and forward the packet 
inside, all in the fast path?  Is this just a question of different 
handling based on the header type inside the outer encapsulation (for GRE)?
iv.     Perhaps these issues could be handled by determining a 
next-next-hop that avoids the failure to reach an appropriate 
advertiser.   Of course, this is a different set/type of computation.
5.      SRLGs and Broadcast Links:  There seem to be a number of possible 
ways to handle SRLGs and broadcast links, each of which provides a 
different trade-off in terms of coverage, computation, and extra Notvia 
addresses.  There are basically 4 approaches at this point.
a.      First, in order to compute a Notvia alternate that avoids a link, 
the primary neighbor, and all SRLGs that the link is part of, it is 
necessary to have a separate topology and associated SPF computation for 
each link that is a member of an SRLG or a broadcast link.  This also 
requires a substantially larger number of Notvia addresses and the 
corresponding mechanisms to determine how and when to allocate and 
de-allocate them.
b.      Second, one could use a topology that removed the primary neighbor 
and see whether SRLG protection can be obtained either along S's path or 
along any path of a neighbor of S that is also loop-free.
c.      Third, when a Notvia address indicates to avoid a node, one could 
remove not merely the node & the uni-directional links to and from that 
node, but also any other links that are in a common SRLG with any of the 
links to or from the removed node.  This is pessimistic, but allows some 
SRLG protection without increased computation or Notvia addresses.
d.      Fourth, one could simply track the SRLGs encountered along the 
Notvia path; this just reports whether the alternate provides SRLG 
protection without any effort to obtain it.
6.      Implementability:  Clearly, the draft describes the basic idea for 
Notvia addresses, but there are a fair number of implementation/protocol 
decisions that need to be made before this can become anything more than an 
interesting idea.
7.      There is a definite need to describe the convergence case 
better, that is, how the transition from using the alternate to the 
network being converged happens such that the alternate remains 
functional.
a.      For instance, if the node E fails, then the Notvia address E_!S 
will no longer be advertised.  If S was getting link protection (because 
that was all that was possible, for instance) by tunneling traffic to E_!S, 
it is important that this traffic be properly discarded when E's addresses 
go away.   This implies that there needs to be a default blackhole for 
Notvia addresses.
b.      Another example is when node E fails, the next-next-hop B must 
continue to advertise the Notvia address B_!E until the network converges 
so that S can continue to tunnel traffic to B_!E as the alternate.
c.      It is possible to get a micro-forwarding loop affecting a Notvia 
address as a result of a less severe failure than anticipated.  For 
instance, consider the following topology.
              [D]
               |
           1   |
      [E]-----[F]-\
       |       |   \ 10
     1 |R    1 |R   \
       |   5   |     \
      [S]-----[H]----[I]
                  2

      Link S->E and Link H->F are in SRLG R

When node E fails, if I converges before H, there will be a loop affecting 
the Notvia address being used to reach F without going through any of Link 
S->E, E or SRLG R.
d.      How do exceptions work?  Particularly in regards to an IP-in-IP 
encapsulation such as GRE, it doesn't seem like MTU exceeded cases can be 
handled cleanly, either by use of DF or by doing IP fragmentation and then 
reassembly at the end of the tunnel.  This seems like a problem for all 
ICMP packets; how could a source interpret the inner header for a TTL 
expired message, for instance?
e.      For IP-in-IP tunnels, another concern is flow diversity.  The IP 
source and destination addresses are used to determine a flow; this flow 
identification may then be used for a variety of purposes, including 
ECMP.  By putting all the traffic to a variety of destinations inside the 
same header, the ability to take advantage of flow diversity appears to 
have disappeared.  Could this be solved by putting the original 
source address into the encapsulating header?  Are there other approaches?
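
Returning to the micro-loop in 7c, one way to reproduce it is sketched below.  
It uses the link costs from the figure and leaves D out, since the D-F cost 
is not labeled and D is not needed to show the loop.  It rests on an 
assumption that is not stated in the text above: once E's failure has been 
flooded and link S->E is gone from the topology, a converged router no longer 
prunes link H-F when routing toward the Notvia address on F.  The helper 
names (build, next_hop) are hypothetical.

```python
import heapq

# Link costs from the figure in 7c (bidirectional, same cost each way).
LINKS = {
    ("S", "E"): 1, ("E", "F"): 1, ("F", "H"): 1,
    ("S", "H"): 5, ("H", "I"): 2, ("F", "I"): 10,
}
SRLG_R = {("S", "E"), ("F", "H")}  # link S->E and link H->F share SRLG R

def build(links, drop_nodes=(), drop_links=()):
    """Build node -> {neighbor: cost}, leaving out the given nodes and links."""
    topo = {}
    for (a, b), cost in links.items():
        if a in drop_nodes or b in drop_nodes or (a, b) in drop_links:
            continue
        topo.setdefault(a, {})[b] = cost
        topo.setdefault(b, {})[a] = cost
    return topo

def next_hop(topo, src, dst):
    """First hop on the shortest path from src to dst, or None if unreachable."""
    heap = [(0, src, None)]
    done = set()
    while heap:
        dist, node, first = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            return first
        for nbr, cost in topo.get(node, {}).items():
            if nbr not in done:
                heapq.heappush(heap, (dist + cost, nbr, first or nbr))
    return None

# Before the failure: the Notvia address on F avoids E and all of SRLG R.
before = build(LINKS, drop_nodes={"E"}, drop_links=SRLG_R)
# After E alone fails (ASSUMPTION): converged routers see E gone but H-F usable.
after = build(LINKS, drop_nodes={"E"})

h_stale = next_hop(before, "H", "F")  # H has not converged yet -> "I"
i_fresh = next_hop(after, "I", "F")   # I has already converged -> "H"
loop = (h_stale == "I" and i_fresh == "H")  # True: H and I point at each other
```

Under that assumption, H still forwards the tunneled traffic to I (cost 12 
via I-F) while I now prefers H (cost 3 via H-F), so packets bounce between 
them until H converges.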

Hopefully, this will spark some discussion on the issues.

Alia

_______________________________________________
Rtgwg mailing list
Rtgwg@ietf.org
https://www1.ietf.org/mailman/listinfo/rtgwg