Re: thoughts on draft-bryant-shand-ipfrr-notvia-addresses-00.txt

mike shand <mshand@cisco.com> Tue, 26 April 2005 14:35 UTC


At 15:07 25/03/2005 -0500, Alia Atlas wrote:

>Second is the list of downsides with the approach.  The main concern is 
>that the mechanism becomes too complex such that the trade-off between its 
>complexity and the full coverage is not desirable.
>1.      This requires a large number of additional IP addresses in the 
>IGP.  The same number of additional FECs is required to support LDP.

Yes, it does. In the simplest case of link and node protection, ignoring 
LANs, it requires 2 addresses per protected link (one per direction). It is 
expected that these would come out of a "private" address space, and hence 
wouldn't consume globally routable addresses. Indeed, for security reasons 
it is preferable that they are private addresses.

I don't think this number is "too many". The question is how this number 
increases when we add LANs and SRLGs.

>2.      Explicit tunnels are needed, which means that targeted LDP 
>sessions are necessary to have this support LDP traffic.

Yes. In the case of node protection we could also use Naiming's scheme of 
next-next hop LDP advertisement.

>  This is a particular concern for multi-homed prefixes; I'll describe my 
> concerns on this later.

Yes. This is a concern for LDP. I don't like the idea of targeted LDP 
sessions. Two possibilities come to mind:

a) each node with an attached MHP distributes an additional label for that 
prefix which has the semantics that when you pop that label you MUST 
forward the underlying IP packet "directly".

b) an alternative which doesn't require additional labels, but DOES require 
a new "well known" label with the above semantics.

Neither is very attractive, but both are perhaps more attractive than the 
directed LDP sessions.

>3.      Substantial IGP changes are required to handle the additional 
>Notvia addresses.

Substantial is perhaps a bit strong. We need to advertise the not-via 
address and its association. For IS-IS it's pretty straightforward. OSPF, by 
its very nature, may be a little more tricky.



>4.      A more complex algorithm is required to make the computation feasible.

Yes.

>5.      The management of the Notvia addresses & of the tunnels can create 
>longer time periods where protection isn't available for a part of the 
>network (the new link or node, etc.).

I don't think the tunnels add to the time at all. They are after all just 
FIB entries. Distributing the notvia addresses for a new node/link will 
occur at the same time as distributing the information about the link/node 
in the first place. I don't think it significantly increases the delay.

There is of course the time it takes to recompute notvia routes, but we 
think this will be well under a second.

These aspects certainly need thinking about, but they don't seem to pose 
insurmountable issues.


>Third, there are a number of issues that I feel need considerable 
>discussion to try and resolve.  I will try to go through each in turn and 
>explain what I think the various aspects of each are.  Each of these 
>issues has the possibility to resolve in such a way that the Notvia 
>Addresses approach becomes overly complex.

Yes. That is a temptation we need to resist!

>1.      Notvia Addresses:  The first issue is how the Notvia addresses are 
>allocated, distributed and withdrawn.  An initial idea of Stewart & Mike 
>is that these addresses are not global addresses (i.e. are 10.x.x.x or 
>such) and are configured in blocks on each router so that the router can 
>manage the bindings itself.

That seems the most reasonable thing to do.

>a.      The routing extensions to the IGP will have to associate a network 
>resource (node or link) that an address should be Notvia.  This is 
>probably straightforward.

Yes, as I said above.

>b.      It is desirable to have some dampening on the withdrawal of Notvia 
>addresses to minimize thrashing.

The allocation of notvia addresses to links certainly shouldn't change just 
because the object with which an address is associated goes away and the 
address is no longer "needed". The object should also get back the same 
notvia address when it comes back. But I don't think there are any 
particular issues associated with the addresses disappearing from and 
reappearing in the LSPs.

Do you have any specific issues in mind?

>c.      If configured in blocks, it would be extremely desirable to have 
>the same Notvia address mean the same thing through multiple reboots, 
>etc.  It'd be good to have some means of consistent association.  This is 
>for easy manageability.

Yes, definitely.
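
As a rough illustration of the "consistent association" point, here is a 
minimal sketch in which a router hands out notvia addresses from a 
configured block and reuses any binding it already knows about. Nothing 
here comes from the draft: the class name, the 10.255.0.0/29 block and the 
"E_!S" style keys are all assumptions, and how the bindings are saved 
across reboots is left out.

```python
import ipaddress


class NotViaAllocator:
    """Hands out notvia addresses from a configured block and remembers bindings."""

    def __init__(self, block, bindings=None):
        self.block = ipaddress.ip_network(block)
        # Maps a stable key (hypothetical format, e.g. "E_!S") to an address string.
        self.bindings = dict(bindings or {})

    def address_for(self, key):
        # Reuse an existing binding so the address survives flaps and reboots.
        if key in self.bindings:
            return self.bindings[key]
        used = set(self.bindings.values())
        for host in self.block.hosts():
            if str(host) not in used:
                self.bindings[key] = str(host)
                return self.bindings[key]
        raise RuntimeError("notvia block exhausted")


alloc = NotViaAllocator("10.255.0.0/29")
first = alloc.address_for("E_!S")
second = alloc.address_for("B_!E")

# Simulate a restart: a new allocator seeded with the saved bindings.
restored = NotViaAllocator("10.255.0.0/29", alloc.bindings)
print(first, second, restored.address_for("B_!E") == second)
```

The only design point being shown is that the binding is looked up by a 
stable key rather than by allocation order, so a link that goes away and 
comes back gets the same address.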


>d.      When a new link or neighbor comes up, there will be a longer 
>period of time when an alternate isn't available because the Notvia 
>address hasn't been advertised yet.  These periods without protection need 
>to be clearly understood and minimized.

Yes. I'm not convinced there is a particular problem here, but it does need 
thinking through carefully.

>e.      There may be scalability concerns based on the number of Notvia 
>addresses and LDP FECs required.  For instance, as described in the draft, 
>it is basically the number of uni-directional links in the topology.  This 
>is ignoring the extras for broadcast links.  To fully & certainly provide 
>SRLG protection if at all feasible, would require that each router 
>advertise a Notvia address for every uni-directional link into every 
>neighbor of that router.  This would result in K*L additional addresses, 
>where K is the average number of neighbors & L is the number of 
>uni-directional links in the topology.

Yes. This is a major concern, and we need to devise ways of solving SRLGs 
etc. which minimize the potential proliferation of addresses.
We need to get the right tradeoff here between optimal solutions and 
complexity.
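
To put rough numbers on the K*L estimate above, a small sketch that counts 
uni-directional links (L) and average neighbors (K) for a topology of 
point-to-point links only. The function name and the four-router ring are 
made up for illustration; broadcast links are ignored, as in the estimate 
itself.

```python
def notvia_address_counts(adjacency):
    """adjacency maps each router to the set of its neighbors (point-to-point only)."""
    # L: each entry in a neighbor set is one uni-directional link.
    unidirectional_links = sum(len(neighbors) for neighbors in adjacency.values())
    # K: average number of neighbors per router.
    average_neighbors = unidirectional_links / len(adjacency)
    return {
        "basic": unidirectional_links,  # one notvia address per uni-directional link
        "srlg_additional": average_neighbors * unidirectional_links,  # K * L
    }


# Hypothetical ring of four routers, each with two neighbors.
ring = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "A"},
}
print(notvia_address_counts(ring))
```

For the ring this gives 8 addresses in the basic case (2 per link) and 16 
additional addresses for the worst-case SRLG scheme.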

>2.      Insufficiently diverse topology:  It is possible that a network 
>topology cannot provide an alternate that suffices for link, node and SRLG 
>protection.  It isn't clear to me how to compute a "best-available" 
>alternate using this approach.  For instance, if one can get link 
>protection, but not node protection, how would that be determined, 
>computed and assigned?  This becomes much more of a concern for SRLG 
>protection & for topologies where failures have already occurred and the 
>network has converged for those & needs protection in the event of an 
>additional failure.

Clearly it is always possible to create a topology which contains single 
points of failure and is inherently irreparable. This is part of the 
tradeoff we need to address when thinking about SRLGs, since taking a 
simple but pessimistic approach to SRLG can result in this sort of failure. 
This seems to be a property of the problem rather than any particular solution.


>3.      Failure Diagnosis versus Pessimism:  As written, the draft 
>discusses the idea of doing failure diagnosis using BFD.  As Stewart, Mike 
>& I have discussed, this isn't possible for SRLG failures, although it is 
>possible for broadcast links.

Yes, and this relates to (2) above.

>a.      I am concerned about adding the failure diagnosis.  This is yet 
>another level of complexity for implementation.  It also has ramifications 
>for the forwarding plane, because of the need to store multiple alternates 
>to use & have multiple states to check to decide what to use.

Yes. It would be nice not to have to do it, but that is back to the 
tradeoff above.

>b.      An example of a concern with the BFD diagnosis is that all 
>interfaces on a node that has failed are not certain to fail exactly 
>simultaneously or even within a sub-50ms bounded window.  It is entirely 
>possible that BFD sessions are terminated on different line-cards, that 
>detect the router failure at slightly different times and stop forwarding 
>traffic, therefore, at slightly different times.

Yes. There is the possibility of misdiagnosis in this case if the second 
failure occurs too long after the first. I suppose this then looks like two 
separate failures. An unreliable diagnosis is probably worse than no 
diagnosis at all. We need to get some handle on how realistic or not this 
scenario is.


>c.      The other approach is to pessimistically eliminate all routers 
>connected to the broadcast link as well as the broadcast link; this may 
>not provide an alternate.

Yes. While simple, it runs into the problem of being a single (albeit 
large) point of failure. It's the same trade-off as above.

>   It also needs to be thought through what issues might exist if the 
> topologies used for the SPF vary slightly for each router that is on the 
> broadcast link, since each will, as described, not prune itself out when 
> doing the computation; of course, there could be an approach where the 
> same topology can be used everywhere.

I'm not really sure what you mean here.

>  It isn't clear to me what Notvia addresses would be needed to express 
> "don't go through this pseudo-node or any nodes attached to it"; I don't 
> think that it is simply the Notvia address for avoiding a particular node.

No, it would need a specific notvia address bound to the LAN interface.

>4.      Multi-homed Prefixes:  I am quite concerned about the mechanisms 
>suggested in the draft.
>a.      First, I really do not like the idea of having separate forwarding 
>for "local" prefixes that come out of a tunnel.  What is a local 
>prefix?  For instance, does this mean that an ABR has to forward traffic 
>differently depending on which area traffic from the tunnel has come 
>from?  I am concerned about how this would scale; maybe only 2 FIBs are 
>needed (one for backbone & one for other), but it may be worse to handle 
>AS external routes.  I know that Stewart, Mike, Joel, Albert and I had 
>discussed/agreed to put this idea out of scope, at least for the moment.

Clearly the problem needs solving, especially since prefixes which are 
multihomed are frequently the most important prefixes (which is WHY they 
are multihomed in the first place).

>b.      I am quite concerned about having tunnels to the advertisers of 
>the prefixes.
>i.      There needs to be a mechanism to determine whether the advertiser 
>of a prefix will forward the packet in a loop-free fashion to avoid the 
>failure point.  The separate forwarding for "local" prefixes avoided the 
>need for this determination, but at more substantial cost.

There seem to be two aspects to this.

a) we need the ability to get the packet to the "second-best" attachment 
point for the prefix without it being "sucked back" to the failure. This in 
general requires a tunnel, except for the cases where a neighbor of the 
node detecting the failure has an LFA to the second best attachment point. 
Clearly this could be used in preference to a tunnel where available, but 
at the expense of additional complexity. However this is really just an 
extension of the general principle that we should use "basic" (i.e. LFA) 
repair to cream off traffic which doesn't NEED to be tunnelled.

b) we need (in a very limited number of cases), the ability to force the 
packet to the locally attached prefix. This only occurs where the local 
cost is high compared to the cost back to the failed attachment point. But 
when we DO need it, the use of a tunnel is a convenient means of signalling 
this. I'm not sure how else to do it, other than using a label.

Of course ONE "solution" would be to REQUIRE the costs to be set sensibly :-)


>ii.     To support LDP, every tunnel requires a targeted LDP session.  If 
>multi-homed prefixes are common, then this becomes a full mesh for 
>LDP.  That isn't acceptable.

Agreed.

>  Of course, multi-homed prefixes may be much more infrequent for LDP than 
> for IP; for example, there is no reason to advertise a separate FEC for 
> the subnet of a link.  However, multi-homed prefixes are a concern for 
> LDP for at least the inter-area, AS External, and BGP routes.
>iii.    If traffic is encapsulated to a node's regular address, because 
>that traffic is destined to a prefix advertised by the node, how does the 
>receiving node know to remove the encapsulation and forward the packet 
>inside, all in the fast path?  Is this just a question of different 
>handling based on the header type inside the outer encapsulation (for GRE)?

Yes.

>iv.     Perhaps these issues could be handled by determining a 
>next-next-hop that avoids the failure to reach an appropriate 
>advertiser.   Of course, this is a different set/type of computation.

Could you explain that suggestion please?

>5.      SRLGs and Broadcast Links:  There seem to be a number of possible 
>ways to handle SRLGs and broadcast links, each of which provides a 
>different trade-off in terms of coverage, computation, and extra Notvia 
>addresses.

Yes.

>   There are basically 4 approaches at this point.
>a.      First, In order to compute a notvia alternate that avoids a link, 
>the primary neighbor, and all SRLGs that the link is part of, it is 
>necessary to have a separate topology and associated SPF computation for 
>each link that is a member of an SRLG or a broadcast link.   This requires 
>also a substantially larger number of Notvia addresses and the 
>corresponding mechanisms to determine how and when to allocate and 
>de-allocate them.

.. and could potentially result in a combinatorial explosion if we weren't 
very careful.

>b.      Second, one could use a topology that removed the primary neighbor 
>and see whether SRLG protection can be obtained either along S's path or 
>along any path of a neighbor of S that is also loop-free.

Could you explain that a bit more please?

>c.      Third, when a Notvia address indicates to avoid a node, one could 
>remove not merely the node & the uni-directional links to and from that 
>node, but also any other links that are in a common SRLG with any of the 
>links to or from the removed node.  This is pessimistic, but allows some 
>SRLG protection without increased computation or Notvia addresses.

Yes. This is nice and simple, but as you have pointed out above, could 
easily result in an inability to find a viable repair.
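
The pessimistic pruning in (c) can be sketched roughly as below. This is an 
illustration only, not the draft's algorithm. The link costs and SRLG 
membership are taken from the example topology quoted under 7(c) later in 
this message; the D-F cost is not given there and is assumed to be 1, links 
are treated as bidirectional for simplicity, and prune() and 
shortest_path() are hypothetical names.

```python
import heapq

# Links as (a, b, cost, srlgs), treated as bidirectional for simplicity.
LINKS = [
    ("S", "E", 1, {"R"}),
    ("E", "F", 1, set()),
    ("H", "F", 1, {"R"}),
    ("S", "H", 5, set()),
    ("H", "I", 2, set()),
    ("F", "I", 10, set()),
    ("D", "F", 1, set()),  # cost assumed, not given in the thread
]


def prune(links, avoid_node, pessimistic_srlg):
    """Drop the avoided node's links and, optionally, links sharing an SRLG with them."""
    touching = [l for l in links if avoid_node in (l[0], l[1])]
    shared = set().union(*(l[3] for l in touching)) if pessimistic_srlg else set()
    return [l for l in links if l not in touching and not (l[3] & shared)]


def shortest_path(links, src, dst):
    """Plain Dijkstra; returns (cost, path), or None when dst is unreachable."""
    graph = {}
    for a, b, cost, _ in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    return None


print(shortest_path(prune(LINKS, "E", False), "S", "F"))  # node E removed only
print(shortest_path(prune(LINKS, "E", True), "S", "F"))   # plus links sharing an SRLG
```

With only E removed, S reaches F via H at cost 6. With the pessimistic 
pruning, H-F is removed as well because it shares SRLG R with S-E, and the 
repair becomes S-H-I-F at cost 17. If F-I were also in R, shortest_path() 
would return None, which is the "no viable repair" case.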

>d.      Fourth, one could simply track the SRLGs encountered along the 
>Notvia path; this just reports whether the alternate provides SRLG 
>protection without any effort to obtain it.

Yes. Interesting. I wonder how useful this would be.
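
For what it is worth, the tracking in (d) is cheap to express. A minimal 
sketch, assuming the repair path is already known as a list of nodes and 
that SRLG membership is available per link; srlg_report() and the data 
layout are hypothetical.

```python
def srlg_report(path, link_srlgs, protected_link):
    """Return the SRLGs the repair path shares with the protected link.

    An empty result means the path happens to give SRLG protection.
    """
    protected = link_srlgs.get(frozenset(protected_link), set())
    crossed = set()
    for a, b in zip(path, path[1:]):
        crossed |= link_srlgs.get(frozenset((a, b)), set())
    return crossed & protected


# SRLG membership keyed by the unordered pair of link endpoints.
link_srlgs = {
    frozenset(("S", "E")): {"R"},
    frozenset(("H", "F")): {"R"},
}
print(srlg_report(["S", "H", "F"], link_srlgs, ("S", "E")))
print(srlg_report(["S", "H", "I", "F"], link_srlgs, ("S", "E")))
```

The first path crosses H-F and so reports R as shared; the second reports 
nothing shared. Note that this only reports the result and makes no attempt 
to find a better path.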

>6.      Implementability:  Clearly, the draft describes the basic idea for 
>Notvia addresses, but there are a fair number of implementation/protocol 
>decisions that need to be made before this can become anything more than 
>an interesting idea.

Sure. There are quite a few design decisions and tradeoffs as indicated 
above that need tying down.

>7.      There is a definite need to describe the convergence case 
>better.   This is how the transition from using the alternate to the 
>network being converged happens, such that the alternate remains functional.
>a.      For instance, if the node E fails, then the Notvia address E_!S 
>will no longer be advertised.  If S was getting link protection (because 
>that was all that was possible, for instance) by tunneling traffic to 
>E_!S, it is important that this traffic be properly discarded when E's 
>addresses go away.   This implies that there needs to be a default 
>blackhole for Notvia addresses.

I don't quite understand your concern here. If E goes away and S is sending 
to E_!S, then the neighbors of E will drop the packets because we don't 
repair a notvia address.

Or are you concerned that after convergence there will be nodes which 
don't even have a forwarding entry for E_!S? By this time I don't think 
that S (or anyone else) should still be using that address, but even if it 
were, the absence of a forwarding entry would (SHOULD) cause the packet to 
be dropped. Is this all you are saying?
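
The two drop cases above can be written down as a forwarding decision. 
This is a deliberately simplified sketch: the FIB is a plain dictionary, a 
repair is just a string, and forward() and its parameters are hypothetical 
names rather than anything from the draft.

```python
def forward(dest, fib, failed_next_hops, repairs, notvia_addresses):
    """Return the next hop for dest, or None to drop the packet."""
    next_hop = fib.get(dest)
    if next_hop is None:
        return None  # no forwarding entry (e.g. after convergence): drop
    if next_hop not in failed_next_hops:
        return next_hop  # normal forwarding
    if dest in notvia_addresses:
        return None  # a packet addressed to a notvia address is never repaired
    return repairs.get(dest)  # otherwise use the precomputed repair, if any


# A neighbor of E whose next hop for both destinations is the failed node E.
fib = {"E_!S": "E", "D": "E"}
repairs = {"D": "tunnel to F_!E"}
notvia = {"E_!S"}

print(forward("E_!S", fib, {"E"}, repairs, notvia))  # dropped, not repaired
print(forward("D", fib, {"E"}, repairs, notvia))     # repaired
print(forward("E_!S", {}, set(), {}, notvia))        # no entry, dropped
```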


>b.      Another example is when node E fails, the next-next-hop B must 
>continue to advertise the Notvia address B_!E until the network converges 
>so that S can continue to tunnel traffic to B_!E as the alternate.

Yes. Our view was that no changes would be made to notvia advertisement or 
more specifically notvia FIB entries until after convergence is over. Of 
course there is an issue as to how you tell when that has happened, but the 
timers associated with loop free convergence probably give a good indication.

>c.      It is possible to get a micro-forwarding loop affecting a Notvia 
>address as a result of a less severe failure than anticipated.  For 
>instance, consider the following topology.
>              [D]
>               |
>           1   |
>      [E]-----[F]-\
>       |       |   \ 10
>     1 |R    1 |R   \
>       |   5   |     \
>      [S]-----[H]----[I]
>                  2
>
>      Link S->E and Link H->F are in SRLG R
>
>When node E fails, if I converges before H, there will be a loop affecting 
>the Notvia address being used to reach F without going through any of Link 
>S->E, E or SRLG R.

We discussed this privately, and I still don't see how loops could arise 
even if the notvia FIB were recomputed before normal convergence is 
complete. But I think it is better to delay the notvia FIB changes anyway.

>d.      How do exceptions work?  Particularly in regards to an IP-in-IP 
>encapsulation such as GRE, it doesn't seem like MTU exceeded cases can be 
>handled cleanly, either by use of DF or by doing IP fragmentation and then 
>the reassembly at the end of the tunnel.  This seems like a problem for 
>all ICMP packets; how could a source understand the header inside for a 
>TTL expired, for instance?

I'll leave this for Stewart (tunnel) Bryant!

>e.      For IP-in-IP tunnels, another concern is flow diversity.  The IP 
>source and destination addresses are used to determine a flow; this flow 
>identification may then be used for a variety of purposes, including 
>ECMP.  By putting all the traffic to a variety of destinations inside the 
>same header, the ability to take advantage of flow diversity appears to 
>have disappeared.   This could possibly be solved by putting the original 
>source address into the encapsulating header?  Are there other approaches?

and this.
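
To make the flow diversity concern in (e) concrete, a small sketch of ECMP 
selection keyed on the outermost (source, destination) pair. The hash 
function, the addresses and both function names are assumptions for 
illustration; real forwarding planes use their own hash inputs. The 
copy_inner_source option corresponds to the suggestion in the question of 
putting the original source address into the encapsulating header.

```python
import hashlib


def ecmp_index(src, dst, n_paths):
    """Pick one of n_paths equal-cost next hops from the outermost (src, dst) pair."""
    digest = hashlib.sha256(f"{src}|{dst}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def outer_header(inner_src, repairing_router, notvia_dst, copy_inner_source):
    """Build the (src, dst) pair of the encapsulating header."""
    src = inner_src if copy_inner_source else repairing_router
    return src, notvia_dst


# Three hypothetical flows being repaired by S through a tunnel to F_!E.
flows = [
    ("10.1.1.1", "192.0.2.1"),
    ("10.1.1.2", "192.0.2.9"),
    ("10.1.1.3", "198.51.100.7"),
]
plain = {outer_header(s, "S", "F_!E", False) for s, _ in flows}
copied = {outer_header(s, "S", "F_!E", True) for s, _ in flows}

print(len(plain), len(copied))
print({ecmp_index(s, d, 4) for s, d in plain})
```

With the repairing router as the outer source, all three flows present the 
same header and therefore always pick the same path. Copying the inner 
source gives three distinct hash inputs again, although the destination 
diversity is still lost.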

>Hopefully, this will spark some discussion on the issues.

Let's hope so.

         Mike


>Alia
>
>
>
>_______________________________________________
>Rtgwg mailing list
>Rtgwg@ietf.org
>https://www1.ietf.org/mailman/listinfo/rtgwg