Re: Rtgdir early review of draft-bashandy-rtgwg-segment-routing-ti-lfa-00

"Ahmed Bashandy (bashandy)" <bashandy@cisco.com> Thu, 15 June 2017 14:52 UTC

Message-ID: <59429F00.5060503@cisco.com>
Date: Thu, 15 Jun 2017 07:51:44 -0700
From: "Ahmed Bashandy (bashandy)" <bashandy@cisco.com>
To: Stewart Bryant <stewart@g3ysx.org.uk>, rtg-dir@ietf.org
CC: draft-bashandy-rtgwg-segment-routing-ti-lfa.all@ietf.org, ietf@ietf.org
Subject: Re: Rtgdir early review of draft-bashandy-rtgwg-segment-routing-ti-lfa-00
References: <149625574151.19908.2374594318145684422@ietfa.amsl.com>
In-Reply-To: <149625574151.19908.2374594318145684422@ietfa.amsl.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/d9M5XvUrk1ldq5OK00_Xoqdt-Ho>

Thanks for the detailed review.
See my replies inline, marked #Ahmed.

Ahmed

On 5/31/2017 11:35 AM, Stewart Bryant wrote:
> Reviewer: Stewart Bryant
> Review result: Has Issues
>
> These review comments were incorrectly posted against the uloop draft,
> apologies for any confusion.
>
> I have been asked to perform an early review of this document on
> behalf of the Routing Directorate.
>
> Summary:
>
> A document on this subject is something that the WG should publish,
> but I think that there are a number of issues that the WG needs to
> discuss and reach consensus on before deciding whether or not they
> should adopt this draft as a starting point for that work.
>
>
> Major Issues:
>
> Before I get into the substance I am surprised that there are no IPR
> disclosures. In an earlier and related work
> (draft-francois-segment-routing-ti-lfa-00) there were three IPR
> disclosures.
#Ahmed
OK. We will take care of the IPR disclosures
>
> The work has four basic components, the concept of resolving the
> problem of P and Q being non-adjacent, the use of SR to solve the
> non-adjacency, the use of the post convergence path following failure
> and the applicability of these techniques to an SR network. The first
> and second points seem of utility in non-SR networks, and so I am
> surprised that they are not called out as such, in the first case
> perhaps with consideration to strategically placed RSVP tunnels, or
> binding segments.
#Ahmed:
The draft is specific to SR, as is clear from the title. But I can add
a statement mentioning that other traffic steering protocols and/or
algorithms may be used but are outside the scope of this document.
>
> The issue of mapping the repair path to the post convergence path is
> something that has always concerned me in this concept. It is true
> that traffic that always passes through the PLR will experience the
> properties the authors describe, but not all traffic will pass through
> the PLR post convergence. The post failure path will be topology
> dependent, and may take a different path from the point of ingress.
#Ahmed
I agree. We are only protecting traffic that flows via the PLR.
>
> I am also concerned that the authors do not discuss the need for loop
> free convergence, since although traffic going through the repair path
> will be loop-free, traffic arriving at the PLR might not be. Consider
> for example a topology fragment that looks like a clock with a router
> at each minute. Traffic enters at 9 o'clock, leaves at 3 o'clock and
> goes via 12 o'clock and 12 o'clock fails.  The routers 9..12 will
> re-converge at different times and this may give rise to the
> micro-looping of traffic trying to get to the PLR. A summary of the
> problem and a pointer to the companion draft may be sufficient.
#Ahmed
This draft addresses only loop-free alternates for local failures, as
is clear from its name. Microloop avoidance is discussed in a separate
draft, as you may already know. I will add a reference to that other
draft as you suggested.
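
To make the clock example concrete, here is a minimal sketch of how a micro-loop can form upstream of the PLR while routers converge at different times. It is an illustration only, not taken from either draft. The assumptions are: a unit-cost ring of 60 routers (one per minute), traffic leaving at minute 15 (3 o'clock), the router at minute 0 (12 o'clock) failing, and traffic entering at minute 48 rather than minute 45 so that the pre-failure path has no equal-cost tie. The names fib_towards and trace are hypothetical.

```python
from collections import deque

def fib_towards(dest, n, failed=None):
    """Next hop of every router towards dest on a unit-cost ring of n routers."""
    # Breadth-first search from dest, skipping the failed router if any.
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        u = queue.popleft()
        for v in ((u + 1) % n, (u - 1) % n):
            if v != failed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    fib = {}
    for r in dist:
        if r == dest:
            continue
        # Clockwise neighbour is tried first, so ties resolve clockwise.
        for v in ((r + 1) % n, (r - 1) % n):
            if v in dist and dist[v] == dist[r] - 1:
                fib[r] = v
                break
    return fib

def trace(src, dest, old_fib, new_fib, converged, failed):
    """Forward hop by hop; converged routers use new_fib, the rest old_fib."""
    hops = [src]
    node = src
    while node != dest:
        fib = new_fib if node in converged else old_fib
        node = fib[node]
        if node == failed:
            return hops + [node], "dropped"
        if node in hops:
            return hops + [node], "loop"
        hops.append(node)
    return hops, "delivered"

N, DEST, FAILED, SRC = 60, 15, 0, 48
old_fib = fib_towards(DEST, N)           # before the failure
new_fib = fib_towards(DEST, N, FAILED)   # after convergence

# Router 50 has converged, router 49 has not: they point at each other.
loop_hops, loop_result = trace(SRC, DEST, old_fib, new_fib, {50}, FAILED)
# Every router has converged: traffic goes the other way round the ring.
ok_hops, ok_result = trace(SRC, DEST, old_fib, new_fib, set(range(N)), FAILED)
print(loop_result, loop_hops)
print(ok_result, len(ok_hops))
```

The sketch models plain hop-by-hop forwarding with no repair at the PLR. It only shows that traffic heading towards the PLR can loop between a converged and a non-converged router, which is the case the companion micro-loop draft covers.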
>
> Finally on the basic concept it would be good to state up front whether
> the proposal is constrained solely to SR networks, or whether the
> authors believe that the concept is of wider applicability. I see no
> reason why it would be constrained to only work on SR networks.
#Ahmed
Again the draft is specific to SR. There are other generic RFCs in the 
references. In this draft we are talking about SR-based techniques.
>
> There is no discussion of multiple failures, nor as far as I can see
> of failures that are worse than anticipated. This is an important
> point that needs to be established early. Some methods, (MRT)
> intrinsically address multiple failures, others (NV) intrinsically
> exclude them. Simple LFA needs a supervisor to quickly abandon all
> hope when they occur.
#Ahmed
The draft mentions in multiple places that the protection is done for
link, node, and SRLG failures. But I can add a statement saying that we
are protecting against a single link, single node, or single SRLG
failure, to avoid any ambiguity.
>
> In an SR network the paths used are not the shortest paths, they are a
> collection of shortest paths, so there needs to be some discussion on
> the interaction between the SR paths and repair paths to consider
> whether it is unconditionally safe against forwarding loops. It would
> presumably be so if the authors borrowed the concept of repair
> addresses rather than normal forwarding addresses from not-via, but I
> don't think they have done this.
#Ahmed
The traffic is steered over the shortest path(s). Traffic is steered
over the post convergence path(s) by stitching path segments, where each
path segment is either a shortest path on its own or a link. The
resultant composite path(s) are loop-free post-convergence shortest
path(s). There is no need to use any concept other than the usual
shortest paths and/or links.
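
As an illustration of the stitching described above, the following is a minimal sketch and not the algorithm specified in the draft. It computes the post-convergence path on the topology with the failed link removed, then greedily encodes it as node segments (shortest paths that cannot cross the failed link) and adjacency segments (single links). The assumptions are: an undirected topology with symmetric link costs, a single failed link, and a hypothetical five-router example. All function names are hypothetical.

```python
import heapq

def dijkstra(graph, src):
    """Shortest distances and predecessors from src."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev

def post_convergence_path(graph, plr, dest, failed):
    """Shortest path from plr to dest once the failed link is removed."""
    a, b = failed
    g = {u: dict(nbrs) for u, nbrs in graph.items()}
    g[a].pop(b, None)
    g[b].pop(a, None)
    dist, prev = dijkstra(g, plr)
    if dest not in dist:
        return None  # topology is not redundant enough
    path = [dest]
    while path[-1] != plr:
        path.append(prev[path[-1]])
    return path[::-1]

def avoids_link(graph, src, dst, failed):
    """True if no pre-failure shortest path src -> dst uses the failed link."""
    a, b = failed
    d_src, _ = dijkstra(graph, src)
    d_dst, _ = dijkstra(graph, dst)  # valid because link costs are symmetric
    w, best = graph[a][b], d_src[dst]
    return (d_src[a] + w + d_dst[b] > best and
            d_src[b] + w + d_dst[a] > best)

def segment_list(graph, path, failed):
    """Greedy encoding: farthest safe node segment, else one adjacency segment."""
    segments, i = [], 0
    while i < len(path) - 1:
        for j in range(len(path) - 1, i, -1):
            if avoids_link(graph, path[i], path[j], failed):
                segments.append(("node", path[j]))
                i = j
                break
        else:
            segments.append(("adj", path[i], path[i + 1]))
            i += 1
    return segments

# Hypothetical topology: S is the PLR, link S-F fails, D is the destination.
topology = {
    "S": {"F": 1, "A": 1},
    "F": {"S": 1, "D": 1},
    "A": {"S": 1, "B": 5},
    "B": {"A": 5, "D": 1},
    "D": {"F": 1, "B": 1},
}
path = post_convergence_path(topology, "S", "D", ("S", "F"))
segments = segment_list(topology, path, ("S", "F"))
print(path)
print(segments)
```

A node segment is emitted only when every pre-failure shortest path to that node avoids the failed link, so routers that have not yet converged still forward it along the post-convergence path. An adjacency segment is the fallback when no such node exists.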
>
> There should also be some discussion on the original path constraints
> that are applicable to the repair. Presumably the ingress node
> constrained the traffic to go through failed node F for a reason. If
> the repair is unconstrained that reason could be violated, but this is
> not discussed in the text.
#Ahmed
The draft covers only the shortest path as calculated by the IGP on the
PLR. Constrained shortest paths are beyond the scope of this document.
Although IMO this is clear, I will add a statement saying explicitly
that we are protecting standard IGP shortest paths, as calculated by
routing protocols, using SR methodology.
>
>
> In the Security section you say:
>
>     The behavior described in this document is internal functionality
>     to a router that result in the ability to guarantee an upper bound
>     on the time taken to restore traffic flow upon the failure of a
>     directly connected link or node. As such no additional security
>     risk is introduced by using the mechanisms proposed in this
>     document.
>
>
> SB> I am not sure that the above is correct. There may be a security
> SB> reason why a packet was steered along a path which breaks when you
> SB> use this technique.
#Ahmed

What we are doing here is explicitly steering the traffic along the post
convergence path without waiting for the network to converge. This
is the path that the user has selected to use when the primary path
fails. Hence we are not really violating any constraint as long as
SR-based traffic steering is used.
We agree that constrained shortest paths which use LFAs calculated by
the techniques proposed in this draft may not conform to the original
set of constraints. But as we mentioned above, protecting constrained
shortest paths is not a topic of this draft.

>
> In the conclusion you say:
>
>     The
>     mechanism is able to calculate the backup path irrespective of the
>     topology as long as the topology is sufficiently redundant.
>
>
> SB> That is certainly true in classic. I am not sure this is
> SB> universally true under SR which includes the use of non-shortest
> SB> path and binding segments.
#Ahmed
As mentioned above, this draft is specific to protecting standard
shortest paths using SR.
>
>
> Minor issues:
>
>     For each destination in the network, TI-LFA prepares a data-plane
>     switch-over to be activated upon detection of the failure of a
>     link used to reach the destination.
>
> SB> To make the scaling clearer to the reader, I think you need
> SB> to make it clear that for each protected link, you determine
> SB> the repair needed to reach every destination reachable over that
> SB> link. You sort of say that, but it's a bit hidden.
#Ahmed
I do not understand the difference between the text in the draft and the 
text that you are proposing.
>
>     We provide the TI-LFA approach that achieves guaranteed coverage
>     against link, node, and local SRLG failure, in any IGP network,
>     relying on the flexibility of SR.
>
> SB> Should that be any SINGLE link.... failure?
#Ahmed
Agreed. I can reword it to say that it will protect against any one of
the following: a single link failure, a single node failure, or a single
local SRLG failure, in any IGP network, using SR.
>
> In the text (and the text that follows)
>
>     To do so, S applies a "NEXT" operation on Adj(S-F) and then two
>     consecutive "PUSH" operations: first it pushes a node segment for F,
>     and then it pushes a protection list allowing to reach F while
>     bypassing S-F.
>
> You need to reference the SR operations.
#Ahmed
Agreed. I will add a reference there.
>
> Also you are considering Adj segments, and presumably they were there
> for a reason, but you do not discuss that.
#Ahmed
Adj segments are used for steering. It is a basic SR concept that is not
specific to TI-LFA. If you are asking whether the technique proposed in
this draft can be used to protect adj-SIDs, then the answer is yes. I
will add a description of that in the next version.
>
> In 5.3.1 and 5.3.2 you have a list of conditions, but do not make it
> clear whether any or all must be true.
>
> Nits
>
> 1. Introduction
>
>     Segment Routing aims at supporting services with tight SLA
>     guarantees [1]. This document provides a local repair mechanism
>     relying on SR-capable of restoring end-to-end connectivity in the
>     case of a sudden failure of a network component.
>
> SB> Grammar needs a little work in the last sentence.
#Ahmed
OK
>
> In Fig 1, I assume that the blobs are network fragments.
>
> In the conclusion you say:
>     This document proposes a mechanism that is able to pre-calculate a
>     backup path for every primary path so as to be able to protect
>     against the failure of a directly connected link or node.
> SB> you need to add SRLG
#Ahmed
Agreed
>
