RE: WGLC for draft-rtgwg-mrt-frr-architecture

Rob,

Thanks for your feedback.  I added the following text to clarify how the alternates computed by MRT can be used together with alternates produced by other FRR technologies.  I also added text to clarify how MRT allows for the specification of different profiles which will generally produce different paths.

The diff of the new version can be found at:

https://www.ietf.org/rfcdiff?url2=draft-ietf-rtgwg-mrt-frr-architecture-08

---------------------

15.  Applying Policy to Select from Multiple Possible Alternates for FRR

   For a given topology, GADAG root, and profile, MRT will provide a
   node-protecting alternate path from each PLR to each destination for
   any single link or node failure, if such a path exists.  Therefore,
   an implementation may choose to only use the alternates determined by
   MRT to provide 100% FRR coverage.

   However, it may be desirable to allow an operator to use MRT
   alternates together with alternates provided by other FRR
   technologies.  A policy-based alternate selection process can allow
   an operator to select the best alternate from those provided by MRT
   and other FRR technologies.  As an example, it may be desirable to
   implement a policy where a node-protecting LFA (if it exists for a
   given failure mode and destination) is preferred over a given MRT
   alternate.  [I-D.ietf-rtgwg-lfa-manageability] discusses many of the
   potential criteria that one might take into account when evaluating
   different alternates for selection.

   Note that future documents may define MRT profiles in addition to the
   default profile defined here.  Different MRT profiles will generally
   produce red and blue paths with different properties.  An
   implementation may allow an operator to use different MRT profiles
   instead of or in addition to the default profile. 

----------------------

I hope this helps to clarify that MRT can be used either alone or together with other FRR technologies.
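
As a rough illustration of the kind of policy described in Section 15 above, here is a minimal Python sketch.  The Alternate record, the source labels, and the ranking order are assumptions made for this example only; neither the draft nor [I-D.ietf-rtgwg-lfa-manageability] defines them.  The policy shown prefers any node-protecting alternate over one that is not, and among equals prefers an LFA over an MRT alternate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical preference order among FRR technologies (lower value wins).
# This ordering is an example policy, not something the draft mandates.
SOURCE_PREFERENCE = {"LFA": 0, "MRT": 1}


@dataclass(frozen=True)
class Alternate:
    """One candidate repair path for a given failure mode and destination."""
    source: str            # FRR technology that produced it, e.g. "LFA" or "MRT"
    next_hop: str          # hypothetical identifier of the alternate next hop
    node_protecting: bool  # True if the alternate avoids the failed node


def select_alternate(candidates: Sequence[Alternate]) -> Optional[Alternate]:
    """Return the preferred alternate, or None if there are no candidates."""
    if not candidates:
        return None

    def rank(alt: Alternate):
        # Node-protecting alternates sort first (False < True), then the
        # technology preference breaks ties. Unknown sources rank last.
        unknown = len(SOURCE_PREFERENCE)
        return (not alt.node_protecting,
                SOURCE_PREFERENCE.get(alt.source, unknown))

    return min(candidates, key=rank)
```

With this policy, a node-protecting LFA is chosen over the MRT alternate when one exists, and the MRT alternate is chosen when the only LFA is merely link-protecting.  A real implementation would likely take further criteria into account, such as those discussed in the LFA manageability draft.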

I also want to comment on the fact that remote LFA produces multiple alternates to choose from.  With respect to determining whether an alternate provides node protection, the fact that remote LFA computes many possible alternate paths could be viewed as a drawback rather than an advantage.  For a given PLR, failure mode, and destination, many nodes will generally qualify as PQ nodes.  Determining whether the complete repair path, from PLR to PQ-node and from PQ-node to destination, is node-protecting requires additional computation.  The most efficient approach seems to be to run a forward SPF from the PQ-node being evaluated.  To avoid spending too much time running forward SPFs rooted at PQ-nodes, some implementations may find it useful to limit the number of PQ nodes evaluated for node-protection.
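
To make the cost concrete, here is a minimal Python sketch of that per-PQ-node check.  It is an illustration under stated assumptions, not a description of any particular implementation: the graph representation, the function names, and the max_evaluated cap are all hypothetical.  It checks only the PQ-node-to-destination segment, using the same distance inequality that RFC 5286 uses for node-protecting LFAs, and it assumes the PLR-to-PQ-node segment has already been verified to avoid the failed node.

```python
import heapq
from typing import Dict, Optional, Sequence

# Hypothetical topology format: graph[u][v] is the cost of link u -> v.
Graph = Dict[str, Dict[str, int]]
INF = float("inf")


def spf(graph: Graph, root: str) -> Dict[str, float]:
    """Forward SPF (Dijkstra) returning shortest distances from root."""
    dist: Dict[str, float] = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def first_node_protecting_pq(graph: Graph, pq_nodes: Sequence[str],
                             failed_node: str, dest: str,
                             max_evaluated: int) -> Optional[str]:
    """Return the first PQ node whose shortest paths to dest all avoid
    failed_node, evaluating at most max_evaluated candidates."""
    dist_from_failed = spf(graph, failed_node)
    for pq in pq_nodes[:max_evaluated]:
        dist_from_pq = spf(graph, pq)  # one forward SPF per PQ node evaluated
        via_failed = (dist_from_pq.get(failed_node, INF)
                      + dist_from_failed.get(dest, INF))
        # Strict inequality: no shortest path from pq to dest, including
        # equal-cost ones, may pass through the failed node.
        if dist_from_pq.get(dest, INF) < via_failed:
            return pq
    return None
```

Each candidate costs one additional SPF, and the loop may end without finding a node-protecting path, either because none of the evaluated PQ nodes qualifies or because the cap is reached first.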

By comparison, for roughly the computational cost of evaluating three PQ nodes for node-protection, MRT produces a path which is guaranteed to be node-protecting, if node protection is possible.  In cases where node-protection and maximum coverage are important, it seems reasonable to give operators an efficient means of generating a node-protecting path, as opposed to the trial-and-error approach of evaluating large numbers of PQ nodes, which may or may not ultimately yield a node-protecting path.

Thanks,
Chris

-----Original Message-----
From: rtgwg [mailto:rtgwg-bounces@ietf.org] On Behalf Of Rob Shakir
Sent: Monday, December 14, 2015 11:32 AM
To: Alvaro Retana <aretana@cisco.com>; rtgwg@ietf.org; draft-rtgwg-mrt-frr-architecture@tools.ietf.org; Stewart Bryant <stewart.bryant@gmail.com>
Subject: Re: WGLC for draft-rtgwg-mrt-frr-architecture

Hi Stewart, Authors,

On 3 December 2015 at 11:03:27, Stewart Bryant (stewart.bryant@gmail.com) wrote:
> Firstly my high order bit, and I suspect I will be in the rough on 
> this. I am not convinced that this is the best solution that we can 
> produce to this problem, and I am concerned about the operational 
> issues that result from the non-intuitive repair paths when compared 
> to other methods. I am also concerned about our ability to get a 
> solution as complicated as this right first time in all of the corner
> cases. Thus I think that rather than recommend this to the industry as
> a standard, we should issue it as informational and see how it works
> out at scale.

Whilst I’m not sure that I can comment on whether this is the ‘best’ solution - I have (previously) spent some time thinking about the problem of disjointness and the applicability of this technique to some of the operational problems that I have had to deal with.

I have a couple of operationally-focused considerations that I think the operator of any network that is thinking of deploying MRTs would need to be thinking about. Primarily, we must say that we are happy with having one set of disjoint topologies (the MRT red/blue), which is utilised during the failure case. This might mean that node A and node B, which are topologically very close to one another, might forward via a more indirect path because the ‘red’ or ‘blue’ single tree that was selected is less optimal than a local set of disjoint paths that could be found. This means that we need to be able to say that the traffic that we are carrying via this network is also tolerant of that sub-optimality between those two nodes. In a world of fewer networks, and more varied traffic being carried on them, I am not sure that we can.

The second concern that I have with this approach is around operational influence of path selection. I have not re-reviewed the architecture again in great detail before sending this e-mail, so I apologise in advance if there are nuances that I did not understand. We have seen from LFA that whilst there may be N ways for traffic to get from A to B during a repair, from an operational perspective, not all paths are created equal (and hence we have the LFA manageability work). For example, paths that conform to ‘normal’ traffic routing within the network are generally preferred for manageability and capacity planning reasons. I would agree that we should leave this draft as informational, until such time as we have demonstrated, in live networks, whether these operational challenges can be met by the MRT approach.

The reason for my reticence is that in the networks that I have operated that have tried to have single network-wide distribution trees, or single sets of repair paths, we have always ended up splitting things out (even when from a service perspective, they may have looked the same) to meet the myriad of business and operational concerns that we need to meet. To this end, I would be encouraging any operator that is thinking of using MRT for disjointness, or for repair, to consider what happens when they end up needing different treatment per-application, per-customer, or per-traffic class. At this point, one would want to compare the complexity of whether MRTs could be extended to these applications (that is to say, whether a set of (network-wide) “global” repair paths will be suitable), or whether they really would be better setting the foundation of having the ability to deploy local repair paths, and building their operational processes around that.

At the end of the day, in my opinion, it is the operational process around these technologies that is the most complex thing to change, and I’m concerned that if I recommended building this process around MRT, I would be establishing new processes around something that offered more ability to do locally- rather than globally-optimal repair paths in the future.

Just my $0.02. There’s some interesting work here, but I share some of Stewart’s concerns.

Kind regards,
r.
