RE: TI-LFA

"Stephane Litkowski (slitkows)" <slitkows@cisco.com> Fri, 21 January 2022 10:41 UTC

From: "Stephane Litkowski (slitkows)" <slitkows@cisco.com>
To: Stewart Bryant <stewart.bryant@gmail.com>
CC: "draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org" <draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org>, rtgwg-chairs <rtgwg-chairs@ietf.org>, "rtgwg@ietf.org" <rtgwg@ietf.org>
Subject: RE: TI-LFA
Date: Fri, 21 Jan 2022 10:40:40 +0000
Message-ID: <SJ0PR11MB51361D776AE46FA6D075CB88C25B9@SJ0PR11MB5136.namprd11.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/4C-TyULkjjllXm52Tzbo1KXSsTw>

Hi Stewart,

Thanks for the comments, and sorry for the late reply.
We just published a new revision of the doc (-08) to address the comments you had.

Please also find some replies inline. ([SLI])

Brgds,

Stephane


From: Stewart Bryant <stewart.bryant@gmail.com>
Sent: Saturday, 6 November 2021 18:34
To: draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org
Cc: Stewart Bryant <stewart.bryant@gmail.com>; rtgwg-chairs <rtgwg-chairs@ietf.org>; rtgwg@ietf.org
Subject: TI-LFA

As I noted earlier today, I took a second detailed look at this text.

I have concerns about the document and think that it could use a more considered, detailed and constructive dialogue between the authors and the WG or at least the members of the WG who understand IPFRR.

One particular concern that I have is that the document seems to start out as a document describing the repair of a regular IP network using SR then abruptly changes to a document on using SR to repair SR.
[SLI] There are additional considerations for SR traffic protection compared to IP; this is what the section addresses.

Also the document starts out setting out the general case for TI-LFA repairs, which includes the general case of SRLG, but does not describe SRLG repair and then talks about a special (undefined) type of SRLG repair in the results table.
[SLI] The goal of the document is not to provide detailed algorithms that are implementation dependent and don’t require any interop.
The results table is for local SRLGs; this is clearly mentioned in the text, so it is not undefined.


I think that the community would be better served by two documents - a document describing the use of SR to repair a regular network and then a document describing the applicability of those techniques to the special case of an SR network.
[SLI] I don’t see any reason for that. TI-LFA is not applicable outside an SR network.

My comments are below. (Marked SB>)

Best regards

- Stewart


On the 8th August Sasha Vainshtein raised the following point concerning this draft:

https://mailarchive.ietf.org/arch/msg/rtgwg/eHEvqzniwNpGFV7JNTQ1YCYy9jA/

However I have not seen an answer. I think the point needs to be either shown to be invalid, or appropriate text needs to be added to the draft.

========

        Topology Independent Fast Reroute using Segment Routing

               draft-ietf-rtgwg-segment-routing-ti-lfa-07



Abstract



It extends these concepts to provide guaranteed coverage in

   any IGP network.

SB> Strictly that should be in any two-connected network using a link-state IGP.

========



[SLI] Agree, this works with Link State only, will clarify. FIXED



A key aspect of TI-LFA is the FRR path selection

   approach establishing protection over the expected post-convergence

   paths from the point of local repair, dramatically reducing the

   operational need to control the tie-breaks among various FRR options.

SB> If you are going to say “dramatically reducing” then I think there needs to be text in the body of the document justifying this and quantifying “dramatically”, although I am not a fan of the emotive term “dramatic” in a formal engineering text.



[SLI] Fine, removed.





1.  Acronyms



   o  DLFA: Remote LFA with Directed forwarding.



   o  FRR: Fast Re-route.



   o  IGP: Interior Gateway Protocol.



   o  LFA: Loop-Free Alternate.



   o  LSDB: Link State DataBase.



   o  PLR: Point of Local Repair.



   o  RL: Repair list.



   o  RLFA: Remote LFA.



   o  SID: Segment Identifier.



   o  SLA: Service Level Agreement.



   o  SPF: Shortest Path First.



   o  SPT: Shortest Path Tree.



   o  SR: Segment Routing.



   o  SRGB: Segment Routing Global Block.



   o  SRLG: Shared Risk Link Group.



   o  TI-LFA: Topology Independent LFA.



2.  Introduction



   Segment Routing aims at supporting services with tight SLA guarantees

   [RFC8402].  By relying on SR this document provides a local repair

   mechanism for standard IGP shortest path capable of restoring end-to-

SB> I think that needs to be "link-state IGP". The method would not work in a RIP network.

   end connectivity in the case of a sudden directly connected failure

   of a network component.

SB> In a two-connected network of course.



[SLI] Agree, FIXED





Non-SR mechanisms for local repair are







Litkowski, et al.       Expires December 31, 2021               [Page 3]



Internet-Draft                  SR TI-LFA                      June 2021





   beyond the scope of this document.  Non-local failures are addressed

   in a separate document [I-D.bashandy-rtgwg-segment-routing-uloop].



   The term topology independent (TI) refers to the ability to provide a

   loop free backup path irrespective of the topologies used in the

   network.  This provides a major improvement compared to LFA [RFC5286]

   and remote LFA [RFC7490] which cannot provide a complete protection

   coverage in some topologies as described in [RFC6571].
SB> If you are going to evaluate against the other documents published by the IETF, you ought to also compare to RFC7812 and RFC6981 which could provide complete protection, and of course in MPLS the much deployed RSVP-TE tunnel protection method provided complete protection. Then of course there is draft-bryant-rtgwg-plfa-01 which can achieve the same result in non-SR networks including IPv4, and then there is the very first complete coverage IP protection scheme draft-bryant-ipfrr-tunnels-03 which was co-authored by one of the TI-LFA authors.
[SLI] The goal is not to compare with all the possible FRR options. This sentence simply introduces TI-LFA as a new step in the LFA story, which is why it is kept within the LFA/rLFA context.

===========



   For each destination in the network, TI-LFA pre-installs a backup

   forwarding entry for each protected destination ready to be activated

   upon detection of the failure of a link used to reach the

   destination.  TI-LFA provides protection in the event of any one of

   the following: single link failure, single node failure, or single

   SRLG failure.  In link failure mode, the destination is protected

   assuming the failure of the link.  In node protection mode, the

   destination is protected assuming that the neighbor connected to the

   primary link has failed.

SB> I looked for the text and I cannot see where you deal with node failure, given that the TI-LFA path that is node avoiding is different from the TI-LFA path that is simply link avoiding. The same applies to SRLG.



[SLI] The intent of the draft is not to highlight any specific detailed algorithm which is local to each implementation and doesn’t require interop.









SB> Also it is an advertised property of TI-LFA that the repair path is the same as the post convergence path and hence the result is loop-free. However if the failure is less severe than you are protecting against that property does not remain. This needs to be considered in the text.



[SLI] This is already documented in the intro:

“Readers should be aware that FRR protection is pre-computing a backup path to protect against a particular type of failure (link, node, SRLG).

         When using the post-convergence path as FRR backup path, the computed post-convergence path is the one considering the failure we are protecting against.

         This means that FRR is using an expected post-convergence path, and this expected post-convergence path may be actually different from the post-convergence path used if the failure that happened is different from the failure FRR was protecting against.

         As an example, if the operator has implemented a protection against a node failure, the expected post-convergence path used during FRR will be the one considering that the node has failed.

         However, even if a single link is failing or a set of links is failing (instead of the full node), the node-protecting post-convergence path will be used.

         The consequence is that the path used during FRR is not optimal with respect to the failure that has actually occurred.

“







In SRLG protecting mode, the destination is

   protected assuming that a configured set of links sharing fate with

   the primary link has failed (e.g. a linecard or a set of links

   sharing a common transmission pipe).



   Protection techniques outlined in this document are limited to

   protecting links, nodes, and SRLGs that are within a routing domain.

SB> I think that should be a single link state routing domain. Maybe actually stricter, a single link state area?

[SLI] Right, it should be a single link state area.





   Protecting domain exit routers and/or links attached to another

   routing domains are beyond the scope of this document



   Thanks to SR, TI-LFA does not require the establishment of TLDP

   sessions with remote nodes in order to take advantage of the

   applicability of remote LFAs (RLFA) [RFC7490][RFC7916] or remote LFAs

   with directed forwarding (DLFA)[RFC5714].  All the Segment

   Identifiers (SIDs) are available in the link state database (LSDB) of

   the IGP.  As a result, preferring LFAs over RLFAs or DLFAs, as well

   as minimizing the number of RLFA or DLFA repair nodes is not required

   anymore.



   Thanks to SR, there is no need to create state in the network in

   order to enforce an explicit FRR path.  This relieves the nodes

   themselves from having to maintain extra state, and it relieves the

   operator from having to deploy an extra protocol or extra protocol

   sessions just to enhance the protection coverage.



   [RFC7916] raised several operational considerations when using LFA or

   remote LFA.  [RFC7916] Section 3 presents a case where a high

   bandwidth link between two core routers is protected through a PE

   router connected with low bandwidth links.  In such a case,

   congestion may happen when the FRR backup path is activated.

   [RFC7916] introduces a local policy framework to let the operator












   manually tune the best alternate election based on its own

   requirements.

SB> There needs to be some follow on text saying how TI-LFA knows not to do this.

[SLI] TI-LFA enforces the expected post-convergence path. This is mentioned multiple times.





   From a network capacity planning point of view, it is often assumed

   that if a link L fails on a particular node X, the bandwidth consumed

   on L will be spread over some of the remaining links of X.  The

   remaining links to be used are determined by the IGP routing

   considering that the link L has failed (we assume that the traffic

   uses the post-convergence path starting from the node X).

SB> An important point which is skated over in TI-LFA is that this assumption is not always valid

   In

   Figure 1, we consider a network with all metrics equal to 1 except

   the metrics on links used by PE1, PE2 and PE3 which are 1000.  An

   easy network capacity planning method is to consider that if the link

   L (X-B) fails, the traffic actually flowing through L will be spread

   over the remaining links of X (X-H, X-D, X-A).  Considering the IGP

   metrics, only X-H and X-D can be used in reality to carry the

   traffic flowing through the link L.  As a consequence, the bandwidth

   of links X-H and X-D is sized according to this rule.  We should

   observe that this capacity planning policy works, however it is not

   fully accurate.



   In Figure 1, considering that the source of traffic is only from PE1

   and PE4, when the link L fails, depending on the convergence speed of

   the nodes, X may reroute its forwarding entries to the remote PEs

   onto X-H or X-D; however in a similar timeframe, PE1 will also

   reroute a subset of its traffic (the subset destined to PE2) out of

   its nominal path reducing the quantity of traffic received by X.

SB> I am concerned about the previous text. This is an IPFRR text and so X WILL re-route i.e. repair its traffic before PE1 even learns of the X-B failure.

[SLI] The text is not talking about FRR, but capacity planning considering the IGP convergence. So the convergence time of X and PE1 may be different.



   The

   capacity planning rule presented previously has the drawback of

   oversizing the network; however, it prevents any transient

   congestion (when for example X reroutes traffic before PE1 does).





              H --- I --- J

              |           | \

   PE4        |           |  PE3

      \       | (L)       | /

        A --- X --- B --- G

       /      |           | \

    PE1       |           |  PE2

       \      |           | /

        C --- D --- E --- F





                                 Figure 1
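The capacity planning argument around Figure 1 can be checked with a small SPF computation. The following is a minimal Python sketch, not anything taken from the draft: the edge list is one reading of the ASCII figure (H-X, X-D, J-G and G-F are taken as the vertical links), the PE4-A metric is assumed to be 1 because the text only assigns 1000 to links used by PE1, PE2 and PE3, and the function names are hypothetical.

```python
import heapq

def build_figure1():
    """Graph for Figure 1 as read from the ASCII art (an assumption)."""
    edges = [("H", "I"), ("I", "J"), ("J", "PE3"), ("J", "G"), ("G", "PE3"),
             ("G", "PE2"), ("F", "PE2"), ("F", "G"), ("PE4", "A"), ("A", "X"),
             ("X", "B"), ("B", "G"), ("A", "PE1"), ("PE1", "C"), ("C", "D"),
             ("D", "E"), ("E", "F"), ("H", "X"), ("X", "D")]
    graph = {}
    for a, b in edges:
        # Links used by PE1, PE2 and PE3 have metric 1000, all others 1.
        metric = 1000 if {a, b} & {"PE1", "PE2", "PE3"} else 1
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph

def without_link(graph, a, b):
    """Copy of the graph with the link a-b removed in both directions."""
    return {u: {v: w for v, w in nbrs.items() if {u, v} != {a, b}}
            for u, nbrs in graph.items()}

def spf_first_hops(graph, root):
    """Dijkstra that records, per destination, the set of ECMP first hops."""
    dist = {root: 0}
    first_hops = {root: set()}
    heap = [(0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            hops = {v} if u == root else first_hops[u]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                first_hops[v] = set(hops)
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                first_hops[v] |= hops  # equal-cost path: merge first hops
    return dist, first_hops

if __name__ == "__main__":
    graph = build_figure1()
    _, hops = spf_first_hops(without_link(graph, "X", "B"), "X")
    # Post-convergence first hops at X once link L (X-B) has failed.
    print(sorted(hops["PE3"]), sorted(hops["PE2"]))  # ['H'] ['D']
```

Under these assumptions, traffic from X to PE3 moves to X-H and traffic to PE2 moves to X-D, while X-A is never selected. That matches the statement that only X-H and X-D carry the traffic previously sent over L.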



   Based on this assumption, in order to facilitate the operation of

   FRR, and limit the implementation of local FRR policies, it looks

   interesting to steer the traffic onto the post-convergence path from

   the PLR point of view during the FRR phase.  In our example, when












   link L fails, X switches the traffic destined to PE3 and PE2 on the

   post-convergence paths.

SB> There is an important point here that we need to check. The implication in the above text is that you ECMP into the repair paths. We need to check that this is considered later.

   This is perfectly in line with the capacity

   planning rule that was presented before and also in line with the fact that

   X may converge before PE1 (or any other upstream router) and may

   spread the X-B traffic onto the post-convergence paths rooted at X.

SB> I am not sure why we consider convergence for capacity planning here. In any IPFRR design we have to consider that X will redistribute the traffic onto the repair path before any other node acts.



[SLI] Capacity planning is done based on IGP metrics and behavior, usually without taking FRR behavior into account. We can say that this is bad, but this is how a lot of (most?) networks have been doing it for 20 years. Your point is valid: when TI-LFA kicks in, of course X will switch traffic before any convergence occurs. I think the text matches your point: it first says that when the failure occurs, X switches traffic (FRR behavior) onto the post-convergence path. And this behavior matches the capacity planning rule that was based only on IGP convergence.





   It should be noted, that some networks may have a different capacity

   planning rule, leading to an allocation of less bandwidth on X-H and

   X-D links.  In such a case, using the post-convergence paths rooted

   at X during FRR may introduce some congestion on X-H and X-D links.

   However it is important to note, that a transient congestion may

   possibly happen, even without FRR activated, for instance when X

   converges before the upstream routers.  Operators are still free to

   use the policy framework defined in [RFC7916] if the usage of the

   post-convergence paths rooted at the PLR is not suitable.



   Readers should be aware that FRR protection is pre-computing a backup

   path to protect against a particular type of failure (link, node,

   SRLG).  When using the post-convergence path as FRR backup path, the

   computed post-convergence path is the one considering the failure we

   are protecting against.  This means that FRR is using an expected

   post-convergence path, and this expected post-convergence path may be

   actually different from the post-convergence path used if the failure

   that happened is different from the failure FRR was protecting

   against.  As an example, if the operator has implemented a protection

   against a node failure, the expected post-convergence path used

   during FRR will be the one considering that the node has failed.

   However, even if a single link is failing or a set of links is

   failing (instead of the full node), the node-protecting post-

   convergence path will be used.  The consequence is that the path used

   during FRR is not optimal with respect to the failure that has

   actually occurred.

SB> It is surely more than a matter of optimisation. The loop-free strategy is also compromised.





[SLI] No, if it’s less severe, the path will still be loop free; however, if it’s a more severe or a different failure, the backup path will not be helpful.







   Another consideration to take into account is: while using the

   expected post-convergence path for SR traffic using node segments

   only (for instance, PE to PE traffic using shortest path) has some

   advantages, these advantages reduce when SR policies

   ([I-D.ietf-spring-segment-routing-policy]) are involved.  A segment-

   list used in an SR policy is computed to obey a set of path

   constraints defined locally at the head-end or centrally in a

   controller.  TI-LFA cannot be aware of such path constraints and

   there is no reason to expect the TI-LFA backup path protecting one

   of the segments in that segment list to obey those constraints.  When SR

   policies are used and the operator wants to have a backup path which

   still follows the policy requirements, this backup path should be

   computed as part of the SR policy in the ingress node (or central

   controller) and the SR policy should not rely on local protection.

   Another option could be to use FlexAlgo ([I-D.ietf-lsr-flex-algo]) to












   express the set of constraints and use a single node segment

   associated with a FlexAlgo to reach the destination.  When using a

   node segment associated with a FlexAlgo, TI-LFA keeps providing an

   optimal backup by applying the appropriate set of constraints.  The

   relationship between TI-LFA and the SR-algorithm is detailed in

   Section 7.



========

   Thanks to SR and the combination of Adjacency segments and Node

   segments, the expression of the expected post-convergence path rooted

   at the PLR is facilitated and does not create any additional state on

   intermediate nodes.  The easiest way to express the expected post-

   convergence path in a loop-free manner is to encode it as a list of

   adjacency segments.  However, in an MPLS world, this may create a

   long stack of labels to be pushed that some hardware may not be able

   to push.

SB> Surely not just in the MPLS world?

[SLI] Right, the text is fixed to make it more general. FIXED





   One of the challenges of TI-LFA is to encode the expected

   post-convergence path by combining adjacency segments and node

   segments.  Each implementation will be free to have its own path

   compression optimization algorithm.  This document details the basic

   concepts that could be used to build the SR backup path as well as

   the associated dataplane procedures.
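To make the label-stack concern concrete, here is a minimal Python sketch of the naive encoding described above: one adjacency segment per hop of the expected post-convergence path. The five-node topology, the function names and the representation of an adjacency segment as a (from, to) pair are all illustrative assumptions, not the draft's encoding or any implementation's algorithm.

```python
import heapq

def undirected(edges):
    """Build a symmetric adjacency map from (a, b, metric) triples."""
    graph = {}
    for a, b, metric in edges:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph

def shortest_path(graph, src, dst, failed_link=None):
    """Dijkstra returning one shortest path, ignoring failed_link if given."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if failed_link is not None and {u, v} == set(failed_link):
                continue  # the protected link is assumed to have failed
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        raise ValueError("destination unreachable after the failure")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def adjacency_segments(path):
    """Naive repair list: one adjacency segment per hop of the path."""
    return list(zip(path, path[1:]))

if __name__ == "__main__":
    graph = undirected([("S", "F", 1), ("S", "A", 1), ("A", "B", 1),
                        ("F", "B", 1), ("B", "D", 1)])
    path = shortest_path(graph, "S", "F", failed_link=("S", "F"))
    print(adjacency_segments(path))
    # [('S', 'A'), ('A', 'B'), ('B', 'F')]
```

The repair list grows by one segment per hop, which is why an implementation would normally compress it by substituting node segments where the shortest path already follows the desired hops.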



                                    L     ____

                                 S----F--{____}----D

                                /\    |          /

                               |  |   | _______ /

                               |__}---Q{_______}



                        Figure 2: TI-LFA Protection



   We use Figure 2 to illustrate the TI-LFA approach.



   The Point of Local Repair (PLR), S, needs to find a node Q (a repair

   node) that is capable of safely forwarding the traffic to a

   destination D affected by the failure of the protected link L, a set

   of links including L (SRLG), or the node F itself.  The PLR also

   needs to find a way to reach Q without being affected by the

   convergence state of the nodes over the paths it wants to use to

   reach Q: the PLR needs a loop-free path to reach Q.



   Section 3 defines the main notations used in the document.  They are

   in line with [RFC5714].



   Section 4 suggests to compute the P-Space and Q-Space properties

   defined in Section 3, for the specific case of nodes lying over the

   post-convergence paths towards the protected destinations.


















   Using the properties defined in Section 4, Section 5 describes how to

   compute protection lists that encode a loop-free post-convergence

   path towards the destination.



   Section 6 defines the segment operations to be applied by the PLR to

   ensure consistency with the forwarding state of the repair node.



   By applying the algorithms specified in this document to actual

   service providers and large enterprise networks, we provide real life

   measurements for the number of SIDs used by repair paths.  Section 9

   summarizes these measurements.



2.1.  Conventions used in this document



   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",

   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and

   "OPTIONAL" in this document are to be interpreted as described in BCP

   14 [RFC2119] [RFC8174] when, and only when, they appear in all

   capitals, as shown here.



3.  Terminology



   We define the main notations used in this document as the following.



   We refer to "old" and "new" topologies as the LSDB state before and

   after the considered failure.



   SPT_old(R) is the Shortest Path Tree rooted at node R in the initial

   state of the network.



   SPT_new(R, X) is the Shortest Path Tree rooted at node R in the state

   of the network after the resource X has failed.



   PLR stands for "Point of Local Repair".  It is the router that

   applies fast traffic restoration after detecting failure in a

   directly attached link, set of links, and/or node.



   Similar to [RFC7490], we use the concept of P-Space and Q-Space for

   TI-LFA.



   The P-Space P(R,X) of a node R w.r.t. a resource X (e.g. a link S-F,

   a node F, or a SRLG) is the set of nodes that are reachable from R

   without passing through X.  It is the set of nodes that are not

   downstream of X in SPT_old(R).

SB> This does not look right

SB> Here is the RFC7490 definition of P-space

   P-space:

      The P-space of a router with respect to a protected link is the

      set of routers reachable from that specific router using the pre-

      convergence shortest paths without any of those paths (including

      equal-cost path splits) transiting that protected link.



[SLI] Fixed





SB> I think you need the whole definition but with s/link/resource/ because ECMP is something that has tripped up many FRR schemes.

SB> However if you have an ECMP to the destination from R then that node is in P-space so the second part of the definition seems wrong.

SB> Unless you have a good reason to change the definition beyond the generalisation above I think you should use the original without what looks like an incorrect clarification.



                                         [SLI] Fixed
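The ECMP point in the RFC7490 definition can be illustrated with a short Python sketch. It computes the P-space of a router with respect to a protected link by marking every node that has at least one pre-convergence shortest path (equal-cost splits included) transiting that link. The topology, the function names and the choice to include the root in the result are assumptions made for illustration only.

```python
import heapq

def undirected(edges):
    """Build a symmetric adjacency map from (a, b, metric) triples."""
    graph = {}
    for a, b, metric in edges:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph

def spf(graph, root):
    """Plain Dijkstra: shortest distance from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def p_space(graph, root, link):
    """Nodes for which no shortest path from root, ECMP included, uses link."""
    dist = spf(graph, root)
    tainted = {n: False for n in dist}
    # Visit nodes by increasing distance so a node's flag is final
    # before it is propagated (metrics are assumed strictly positive).
    for u in sorted(dist, key=dist.get):
        for v, w in graph[u].items():
            if dist[u] + w == dist[v]:  # u-v lies on a shortest path to v
                if tainted[u] or {u, v} == set(link):
                    tainted[v] = True
    return {n for n in dist if not tainted[n]}

if __name__ == "__main__":
    graph = undirected([("S", "F", 1), ("S", "A", 1), ("A", "B", 1),
                        ("F", "B", 1), ("B", "D", 1)])
    print(sorted(p_space(graph, "S", ("S", "F"))))  # ['A', 'S']
```

In this example B is reachable from S over two equal-cost paths, S-A-B and S-F-B. Because one of them transits the protected link S-F, B and everything behind it are excluded, which is the behavior the "including equal-cost path splits" wording requires.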



   The Extended P-Space P'(R,X) of a node R w.r.t. a resource X is the

   set of nodes that are reachable from R or a neighbor of R, without

   passing through X.



SB> Again I think you need to use the RFC7490 definition

     The extended P-space of the protecting router

      with respect to the protected link is the union of the P-spaces of

      the neighbors in that set of neighbors with respect to the

      protected link (see Section 5.2.1.2<https://datatracker.ietf.org/doc/html/rfc7490#section-5.2.1.2>).



                                         [SLI] Fixed



SB> With the substitution of protected resource for protected link. The important difference is that extended P space includes nodes that are not ordinarily reachable from R and need to be forced over the first hop in contravention of the prefailure SPT.
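As a sketch of the RFC7490 extended P-space definition quoted above, the following Python takes the union of the P-spaces of the protecting router's neighbors, skipping the neighbor reached over the protected link. The topology and all function names are hypothetical, and metrics are assumed strictly positive.

```python
import heapq

def undirected(edges):
    """Build a symmetric adjacency map from (a, b, metric) triples."""
    graph = {}
    for a, b, metric in edges:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph

def spf(graph, root):
    """Plain Dijkstra: shortest distance from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def p_space(graph, root, link):
    """Nodes for which no shortest path from root, ECMP included, uses link."""
    dist = spf(graph, root)
    tainted = {n: False for n in dist}
    for u in sorted(dist, key=dist.get):
        for v, w in graph[u].items():
            if dist[u] + w == dist[v] and (tainted[u] or {u, v} == set(link)):
                tainted[v] = True
    return {n for n in dist if not tainted[n]}

def extended_p_space(graph, root, link):
    """Union of the P-spaces of root's neighbors w.r.t. the protected link."""
    result = set()
    for nbr in graph[root]:
        if {root, nbr} == set(link):
            continue  # skip the neighbor reached over the protected link
        result |= p_space(graph, nbr, link)
    return result

if __name__ == "__main__":
    graph = undirected([("S", "F", 1), ("S", "A", 1), ("A", "B", 1),
                        ("F", "B", 1), ("B", "D", 1)])
    print(sorted(p_space(graph, "S", ("S", "F"))))           # ['A', 'S']
    print(sorted(extended_p_space(graph, "S", ("S", "F"))))  # ['A', 'B', 'D', 'S']
```

B and D are outside the P-space of S because of the equal-cost split through F, but they are in the P-space of the neighbor A. Reaching them therefore requires forcing the packet over the first hop S-A, contrary to what the pre-failure SPT at S would do for part of the traffic.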










   The Q-Space Q(D,X) of a destination node D w.r.t. a resource X is the

   set of nodes which do not use X to reach D in the initial state of

   the network.

SB> Again I would be happier with a generalisation of the much thought about RFC7490 definition

   Q-space:

      The Q-space of a router with respect to a protected link is the

      set of routers from which that specific router can be reached

      without any path (including equal-cost path splits) transiting

      that protected link.

[SLI] Fixed



SB> Again with the simple substitution of protected resource for protected link. Again it is important to worry about ECMP.
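The Q-space definition can be sketched the same way, using a reverse SPF rooted at the destination so that asymmetric metrics are handled. This is a minimal Python illustration under assumed names and an assumed topology; it is not the draft's algorithm.

```python
import heapq

def undirected(edges):
    """Build a symmetric adjacency map from (a, b, metric) triples."""
    graph = {}
    for a, b, metric in edges:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph

def reverse(graph):
    """Reverse every directed edge, keeping its metric."""
    rev = {u: {} for u in graph}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            rev[v][u] = w
    return rev

def spf(graph, root):
    """Plain Dijkstra: shortest distance from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def q_space(graph, dest, link):
    """Nodes whose shortest paths to dest, ECMP included, all avoid link."""
    rev = reverse(graph)
    dist = spf(rev, dest)  # distance from each node towards dest
    tainted = {n: False for n in dist}
    for u in sorted(dist, key=dist.get):
        for v, w in rev[u].items():
            # v reaches dest through u on one of its shortest paths
            if dist[u] + w == dist[v] and (tainted[u] or {u, v} == set(link)):
                tainted[v] = True
    return {n for n in dist if not tainted[n]}

if __name__ == "__main__":
    graph = undirected([("S", "F", 1), ("S", "A", 1), ("A", "B", 1),
                        ("F", "B", 1), ("B", "D", 1)])
    print(sorted(q_space(graph, "D", ("S", "F"))))  # ['A', 'B', 'D', 'F']
```

S is excluded even though it has a clean path S-A-B-D, because the equal-cost path S-F-B-D transits the protected link.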





   In other words, it is the set of nodes which have D in

   their P-Space w.r.t. S-F, F, or a set of links adjacent to S.

SB> That does not look quite right. I think that it is “the set of links”, but I am worried that the example is not as precise as the base definition.

Perhaps:

   In the example shown in Figure 1 it is the set of nodes which have D in

   their P-Space w.r.t. any of S’s links, F in the case of node protection, and the SRLG that includes S-F in the case of SRLG protection.



   A symmetric network is a network such that the IGP metric of each

   link is the same in both directions of the link.

SB> TI-LFA could work with asymmetric metrics. So we really need to think about how asymmetry is discussed in the text.



[SLI] Yes, it works w/ asymmetric metrics. The use of P and Q space is similar to RFC7490, and RFC7490 doesn’t talk specifically about asymmetric metrics.







4.  Intersecting P-Space and Q-Space with post-convergence paths



   One of the challenges of defining an SR path following the expected

   post-convergence path is to reduce the size of the segment list.  In

   order to reduce this segment list, an implementation MAY determine

   the P-Space/Extended P-Space and Q-Space properties (defined in

   [RFC7490]) of the nodes along the expected post-convergence path from

   the PLR to the protected destination and compute an SR-based explicit

   path from P to Q when they are not adjacent.  Such properties will be

   used in Section 5 to compute the TI-LFA repair list.



4.1.  P-Space property computation for a resource X



   A node N is in P(R, X) if it is not downstream of X in SPT_old(R).

SB> I think that needs to be “a downstream neighbour of X” a node that is further away can surely be downstream in the SPT_old(R).

[SLI] The sentence is correct: any node downstream of the resource X in SPT_old(R) cannot be part of the P-space, because the path from the PLR to this node will cross X.





SB> I looked for a definition of downstream in the previous work that would have avoided that ambiguity but did not see it,

   X

   can be a link, a node, or a set of links adjacent to the PLR.  A node

   N is in P'(R,X) if it is not downstream of X in SPT_old(N), for at

   least one neighbor N of R.

SB> I find the above hard to parse do you mean:

SB> if it is not downstream of X in SPT_old(N), for any neighbor N of R.

SB> However I think we would be better pointing the reader to RFC7490 and saying that they should calculate the P space as per RFC7490 * by additionally excluding nodes reachable by the protected resource X where X is other than the next hop link from R to D.



[SLI] There is a typo, as we use N twice for two different things. It should be:

“A node

   N is in P'(R,X) if it is not downstream of X in SPT_old(M), for at

   least one neighbor M of R.”
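
The corrected definition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the draft's algorithm: the names `spf`, `p_space` and `extended_p_space` are hypothetical, X is taken to be a single link, every link is assumed bidirectional with positive metrics, and the neighbor reached over the protected link is left out of the set of neighbors M, following the RFC7490 definition of the extended P-space.

```python
import heapq

def spf(graph, root):
    """Forward SPF: cost from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def p_space(graph, root, link):
    """Nodes not downstream of link on any shortest path (ECMP) from root."""
    dist = spf(graph, root)
    protected = {link, link[::-1]}
    downstream = set()
    for n in sorted(dist, key=dist.get):         # nearest first
        for u in graph[n]:                       # links assumed bidirectional
            is_parent = u in dist and dist[u] + graph[u][n] == dist[n]
            if is_parent and ((u, n) in protected or u in downstream):
                downstream.add(n)
    return set(dist) - downstream

def extended_p_space(graph, root, link):
    """Union of the P-spaces of root's neighbors M not reached over link."""
    far_end = link[1] if link[0] == root else link[0]
    result = set()
    for m in graph[root]:
        if m != far_end:
            result |= p_space(graph, m, link)
    return result

# Hypothetical five-node ring S-F-D-B-A-S with unit metrics.
graph = {
    "S": {"F": 1, "A": 1},
    "F": {"S": 1, "D": 1},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "D": 1},
    "D": {"F": 1, "B": 1},
}
p = p_space(graph, "S", ("S", "F"))
p_ext = extended_p_space(graph, "S", ("S", "F"))
```

In this ring, D is downstream of S-F in the SPT rooted at S but not in the SPT rooted at the neighbor A, so it appears only in the extended P-space.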







* Thus far we have covered link failure, so the rest of the sentence deals with the other types of resource.



SB> However I am wondering why we need to go through this step in the explanation.



SB> What you want to end up with is the Ti-LFA P space, call it the TP space. Which is the set of nodes N that are in the P-space of R wrt X where the path to N, including ECMP, is congruent with the path to N in the post failure SPT.



SB> Is it necessary to say any more at this stage of the text?





[SLI] Right, have simplified the text.





4.2.  Q-Space property computation for a link S-F, over post-convergence

      paths



   We want to determine which nodes on the post-convergence path from

   the PLR to the destination D are in the Q-Space of destination D

   w.r.t. link S-F.



   This can be found by intersecting the post-convergence path to D,

   assuming the failure of S-F, with Q(D, S-F).

SB> I am not sure what you mean by “intersecting”



[SLI] The post-convergence path to D (assuming failure) is composed of a set of nodes. Let’s call it S1.

Q(D, S-F) is another set of nodes, let’s call it S2.

The set of nodes that we want to consider here is a subset of S2 which is the intersection between S1 and S2.
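
The set operation described here is small enough to show directly. The sketch below is illustrative only; the function name and the sample node names are assumptions. It keeps the order of the post-convergence path so that the caller can pick the Q node closest to the PLR.

```python
def q_nodes_on_path(post_convergence_path, q_space):
    """Return the nodes of S1 (the path) that are also in S2 (the Q-space).

    The path order is preserved, so index 0 of the result is the Q node
    nearest to the PLR.
    """
    return [node for node in post_convergence_path if node in q_space]

# Hypothetical values, for illustration only.
s1 = ["S", "A", "B", "D"]          # post-convergence path to D, S-F failed
s2 = {"B", "D", "F"}               # Q(D, S-F)
on_path_q_nodes = q_nodes_on_path(s1, s2)
```

Here F is in the Q-space but not on the post-convergence path, so it is dropped, which matches the later remark that the Q-space may contain nodes in a completely different part of the network.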





SB> However as this is just link failure why do we not just say do what  RFC7490 does?

SB> Again as this is Ti-LFA we can usefully note that these paths will be congruent with the paths computed in the new SPT.





4.3.  Q-Space property computation for a set of links adjacent to S,

      over post-convergence paths



   We want to determine which nodes on the post-convergence path from

   the PLR to the destination D are in the Q-Space of destination D

   w.r.t. a set of links adjacent to S (S being the PLR).  That is, we

   aim to find the set of nodes on the post-convergence path that use

   none of the members of the protected set of links, to reach D.

SB> Isn’t that by definition true of any node that is in the Q space of R wrt X?



   This can be found by intersecting the post-convergence path to D,

   assuming the failure of the set of links, with the intersection among

   Q(D, S->X) for all S->X belonging to the set of links.

SB> Again, I do not understand the term intersecting (nor, I suspect, will many readers).

[SLI] Intersection is a clear word, IMO, when manipulating sets.







Litkowski, et al.       Expires December 31, 2021               [Page 9]



Internet-Draft                  SR TI-LFA                      June 2021





4.4.  Q-Space property computation for a node F, over post-convergence

      paths



   We want to determine which nodes on the post-convergence path from the PLR

   to the destination D are in the Q-Space of destination D w.r.t. node

   F.



   This can be found by intersecting the post-convergence path to D,

   assuming the failure of F, with Q(D, F).

SB> Isn’t the notation "the failure of X" in the earlier text? However, for the moment I am at a loss to understand why the RFC7490 Q space is not always congruent with post-convergence paths.



[SLI] Q-space may also have nodes that are not on the post-convergence path (because they are in a completely different part of the network).



SB> Is there something subtle to do with the fragmentation of the paths that discontiguous SRLG paths may produce in the general case?

SB> In any case I think there needs to be text in the discontiguous SRLG case because it could produce some very interesting corner cases unless dealt with correctly. Maybe you should actually exclude that case from Ti-LFA?



[SLI] What do you mean by discontiguous SRLG paths ?





4.5.  Scaling considerations when computing Q-Space



   [RFC7490] raises scaling concerns about computing a Q-Space per

   destination.  Similar concerns may affect TI-LFA computation if an

   implementation tries to compute a reverse SPT for every destination

   in the network to determine the Q-Space.  It will be up to each

   implementation to determine a good tradeoff between scaling and

   accuracy of the optimization.

SB> We have not introduced the reader to the term “reverse SPT” yet.

[SLI] Added ref to RFC7490, it has a definition.





5.  TI-LFA Repair path



   The TI-LFA repair path (RP) consists of an outgoing interface and a

   list of segments (repair list (RL)) to insert on the SR header.  The

   repair list encodes the explicit post-convergence path to the

   destination, which avoids the protected resource X and, at the same

   time, is guaranteed to be loop-free irrespective of the state of FIBs

   along the nodes belonging to the explicit path.

SB> Guarantee is absolute, and I don’t think you can make that assumption other than for the failure of a single protected link.

SB> If you are node or SRLG protecting and the failure is a simple link failure, the repair path and the post-failure path may not be congruent. Now you would be OK if you only used adjacency segments, but as soon as you use loose SR to reduce the size of the segment list I think you risk a micro-loop problem. Is there a mathematical proof that this would not be the case?



[SLI] Algorithms that are using loose paths are implementation dependent.





   Thus there is no

   need for any co-ordination or message exchange between the PLR and

   any other router in the network.



   The TI-LFA repair path is found by intersecting P(S,X) and Q(D,X)

   with the post-convergence path to D and computing the explicit SR-

   based path EP(P, Q) from P to Q when these nodes are not adjacent

   along the post convergence path.  The TI-LFA repair list is expressed

   generally as (Node_SID(P), EP(P, Q)).

SB> OK so now I see what you mean by intersecting, but I still think you should explain this to the general reader.

[SLI] Clarified that we consider the intersection of the set of nodes on the post-convergence path with the set of nodes in the Q-space.



   Most often, the TI-LFA repair list has a simpler form, as described

   in the following sections.  Section 9 provides statistics for the

   number of SIDs in the explicit path to protect against various

   failures.



5.1.  FRR path using a direct neighbor



   When a direct neighbor is in P(S,X) and Q(D,X) and on the

   post-convergence path, the outgoing interface is set to that neighbor and

   the repair segment list SHOULD be empty.

SB> Why do we need to talk about the segment list at all? Why not say use an LFA repair?



[SLI] Because the repair path consists of an outgoing segment list as part of the definition.

We could change it slightly, but having an empty list or no list is in the end the same.

We are clearly saying that the path is an LFA FRR repair, so there is no possible ambiguity.





   This is comparable to a post-convergence LFA FRR repair.







5.2.  FRR path using a PQ node



   When a remote node R is in P(S,X) and Q(D,X) and on the

   post-convergence path, the repair list MUST be made of a single node

   segment to R and the outgoing interface SHOULD be set to the outgoing

   interface used to reach R.



   This is comparable to a post-convergence RLFA repair tunnel.

SB> Given your earlier discussion about congestion management should you not talk about ECMP here rather than a single path?

5.3.  FRR path using a P node and Q node that are adjacent



   When a node P is in P(S,X) and a node Q is in Q(D,X) and both are on

   the post-convergence path and both are adjacent to each other, the

   repair list SHOULD be made of two segments: A node segment to P (to

   be processed first), followed by an adjacency segment from P to Q.



   This is comparable to a post-convergence DLFA repair tunnel.

SB> I do not think we have yet pointed the reader to a definition of a DLFA repair tunnel have we?

[SLI] This is not really defined in a dedicated RFC, but it is already used in a couple of drafts.



SB> The closest I can find with google is draft-bryant-ipfrr-tunnels-00.txt
<https://datatracker.ietf.org/doc/html/draft-bryant-ipfrr-tunnels-00.txt>





5.4.  Connecting distant P and Q nodes along post-convergence paths



   In some cases, there is no adjacent P and Q node along the post-

   convergence path.  However, the PLR can perform additional

   computations to compute a list of segments that represent a loop-free

   path from P to Q.  How these computations are done is out of scope of

   this document.

SB> That seems to me to be a big omission in one of the major selling points of TI-LFA that it can repair any failure.



SB> You should probably say that you do this with a full set of adjacency SIDs, but optimisations may be possible. Such optimisations are outside the scope of this text. However I think that leaves the reader short of the headline claims for this approach.



[SLI] the most basic implementation can simply use a list of Adj-SIDs. I’ll add this statement.
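
The four cases of Section 5 can be summarized in one short sketch. It is a simplification under stated assumptions, not the draft's procedure: the function name `repair_path`, the string form of the segments, and the choice of "first Q node on the path, last P node before it" are all hypothetical. For distant P and Q nodes it falls back to the plain list of Adj-SIDs mentioned just above.

```python
def repair_path(path, p_space, q_space):
    """Pick (outgoing neighbor, repair list) along the post-convergence path.

    path is the post-convergence path [S, ..., D] that avoids the resource X;
    p_space is P(S,X) and q_space is Q(D,X), both given as sets of nodes.
    """
    out_neighbor = path[1]
    # 5.1: the next hop on the path is itself a PQ node -> empty repair list.
    if out_neighbor in p_space and out_neighbor in q_space:
        return out_neighbor, []
    # 5.2: a remote PQ node on the path -> a single node segment.
    for node in path[2:]:
        if node in p_space and node in q_space:
            return out_neighbor, [f"node({node})"]
    # 5.3 / 5.4: no PQ node. Take the first Q node on the path and the last
    # P node before it, then join them with adjacency segments.
    q_idx = next((i for i in range(1, len(path)) if path[i] in q_space),
                 len(path) - 1)
    p_idx = max(i for i in range(q_idx) if i == 0 or path[i] in p_space)
    segments = [f"node({path[p_idx]})"] if p_idx > 0 else []
    segments += [f"adj({path[k]}-{path[k + 1]})" for k in range(p_idx, q_idx)]
    return out_neighbor, segments

# Hypothetical inputs: adjacent P and Q nodes (Section 5.3).
example = repair_path(["S", "A", "B", "D"], {"S", "A"}, {"B", "D"})
```

With adjacent P and Q nodes the result is a node segment followed by one adjacency segment; when they are further apart the adjacency list simply grows, which is the unoptimized behavior, and any shortening with loose segments is left to the implementation.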





6.  Building TI-LFA repair lists

SB> Until this point in the draft everything is applicable to the use of TI-LFA to repair regular IP/MPLS as well as SR.

SB> I think that this should be called out in the draft.

SB> Then I think there needs to be a discussion of what it means to do SR FRR. For example whether the goal is to simply get the packet to its destination or to get the packet to the next segment endpoint. There are both design and operational implications with the decision and these need to be called out so that the reader fully understand the issues.

SB> Arguably repairing to the segment endpoint is what should be done, on the basis that the segments are created for a reason that the FRR engine is not privy to. In this case the solution for the use of TI-LFA to repair non SR and SR is identical.

SB> There is another point which is moot but perhaps ought to be called out to reassure the reader and that is that the post convergence SR path may not be the same as the pre-convergence SR path. I do not think that this will cause micro loops because the packets travel in the underlay rather than the SR overlay, but it has implications for traffic engineering and order of packet arrival which may be important in some service overlays such as DetNet.





[SLI] Protecting a segment is not different than protecting an IP destination which explains why there is not much to say.  The destination (endpoint) of the segment is considered as the destination to be protected.







   The following sections describe how to build the repair lists using

   the terminology defined in [RFC8402].  The procedures described in

   Section 6.1 are equally applicable to both SR-MPLS and SRv6

   dataplane, while the dataplane-specific considerations are described

   in Section 6.2.



6.1.  Link protection



   In this section, we explain how a protecting router S processes the

   active segment of a packet upon the failure of its primary outgoing

   interface for the packet, S-F.



6.1.1.  The active segment is a node segment



   The active segment MUST be kept on the SR header unchanged and the

   repair list MUST be added.  The active segment becomes the first

   segment of the repair list.  The way the repair list is added depends

   on the dataplane used (see Section 6.2).

SB> The active segment becomes the first segment of the repair list. - I do not understand this. Surely it is the last segment on the repair list in order to eventually deliver the packet back to the SR path?



[SLI] We receive a packet with an active segment which is a node SID X. During FRR, we add the repair list, and we move the active segment pointer to the first segment of the repair list instead of node SID X.







6.1.2.  The active segment is an adjacency segment



   We define hereafter the FRR behavior applied by S for any packet

   received with an active adjacency segment S-F for which protection

   was enabled.  As protection has been enabled for the segment S-F and

   signalled in the IGP, any SR policy using this segment knows that it

   may be transiently rerouted out of S-F in case of S-F failure.

SB> You have introduced this control plane concept without reference to the control plane. I think this needs a reference.

[SLI] Added  ISIS and OSPF refs (informational)



   The simplest approach for link protection of an adjacency segment S-F

   is to create a repair list that will carry the traffic to F.  To do

   so, one or two "PUSH" operations are performed.

SB> I assume that this limit is because of the symmetric cost constraint. It would be useful to advise the reader.



[SLI] I think the text was not accurate enough: pushing the repair list may consist of pushing multiple segments. Agree that in symmetric cost networks, we need a maximum of 2. I’m changing the text slightly as follows:

“To do so, one or more “PUSH” operations are performed. If the repair list, while avoiding S-F, terminates on F, S only pushes segments present in the repair list. Otherwise, S pushes a node segment of F, followed by segments of the repair list. For details on the "NEXT" and "PUSH" operations, refer to <xref target="RFC8402" />.”





   If the repair list,

   while avoiding S-F, terminates on F, S only pushes the repair list.

   Otherwise, S pushes a node segment of F, followed by a push of the

   repair list.  For details on the "NEXT" and "PUSH" operations, refer

   to [RFC8402].



   This method which merges back the traffic at the remote end of the

   adjacency segment has the advantage of keeping as much as possible

   the traffic on the pre-failure path.

SB> It also constrains the packet to the SR segment set which may be important for other reasons.





    As stated in Section 2, when SR

   policies are involved and a strict compliance of the policy is

   required, an end-to-end protection should be preferred over a local

   repair mechanism.  However this method may not provide the expected

   post-convergence path to the final destination as the expected post-

   convergence path may not go through F.  Another method requires to

   look to the next segment in the segment list.



   We distinguish the case where this active segment is followed by

   another adjacency segment from the case where it is followed by a

   node segment.



6.1.2.1.  Protecting [Adjacency, Adjacency] segment lists



   If the next segment in the list is an Adjacency segment, then the

   packet has to be conveyed to F.



   To do so, S MUST apply a "NEXT" operation on Adj(S-F) and then one or

   two "PUSH" operations.  If the repair list, while avoiding S-F,

   terminates on F, S only pushes the repair list.  Otherwise, S pushes

   a node segment of F, followed by a push of the repair list.  For

   details on the "NEXT" and "PUSH" operations, refer to [RFC8402].

SB> Please can you explain why you need the two approaches. An adj to F was ok in the original packet why do we need the node?



[SLI] It depends on the implementation behavior; the easiest way is to ensure that the repair list (when computed) always terminates on F. I changed the text in a similar way as the previous section.





   Upon failure of S-F, a packet reaching S with a segment list matching

   [adj(S-F),adj(F-M),...] will thus leave S with a segment list

   matching [RL(F),node(F),adj(F-M)], where RL(F) is the repair path for

   destination F.



SB>[RL(F),node(F),adj(F-M),. . .] to match above

[SLI] FIXED















6.1.2.2.  Protecting [Adjacency, Node] segment lists



   If the next segment in the stack is a node segment, say for node T,

   the segment list on the packet matches [adj(S-F),node(T),...].



   In this case, S MUST apply a "NEXT" operation on the Adjacency

   segment related to S-F, followed by a "PUSH" of a repair list

   redirecting the traffic to a node Q, whose path to node segment T is

   not affected by the failure.



   Upon failure of S-F, packets reaching S with a segment list matching

   [adj(S-F), node(T), ...], would leave S with a segment list matching

   [RL(Q),node(T), …].

SB> So what about SRLG that was discussed earlier in the draft?

[SLI] SRLG protection is provided by the repair list, the type of protection provided is orthogonal here.
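
The two rewrites of Sections 6.1.2.1 and 6.1.2.2 can be put side by side in a short sketch. It is only an illustration: the function name `protect_adjacency`, its parameters, and the representation of segments as strings are assumptions, and the repair lists RL(F) and RL(Q) are taken as already computed inputs.

```python
def protect_adjacency(seg_list, f, rl_to_f, rl_ends_on_f, rl_to_q):
    """Rewrite a segment list whose active segment adj(S-F) has failed.

    seg_list[0] is the active segment; segments are plain strings here.
    rl_to_f is RL(F); rl_to_q is RL(Q) for the node segment that follows.
    """
    rest = seg_list[1:]                      # "NEXT" on adj(S-F)
    if rest and rest[0].startswith("node("):
        # 6.1.2.2: steer to a node Q whose path to T avoids the failure.
        return rl_to_q + rest
    # 6.1.2.1 (also used here when adj(S-F) is the last segment):
    # the packet has to be conveyed to F.
    if rl_ends_on_f:
        return rl_to_f + rest                # RL(F) already terminates on F
    return rl_to_f + [f"node({f})"] + rest   # push node(F), then RL(F)

# Hypothetical segment lists.
adj_adj = protect_adjacency(["adj(S-F)", "adj(F-M)", "node(Z)"],
                            "F", ["node(P)"], False, ["node(Q)"])
adj_node = protect_adjacency(["adj(S-F)", "node(T)"],
                             "F", ["node(P)"], False, ["node(Q)"])
```

The first call yields a list of the form [RL(F), node(F), adj(F-M), ...] and the second a list of the form [RL(Q), node(T), ...], matching the two patterns quoted above. The kind of protection (link, node, SRLG) only changes how the repair lists passed in are computed.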



SB> Is this the entirety of the SR cases?



6.2.  Dataplane specific considerations



6.2.1.  MPLS dataplane considerations



   MPLS dataplane for Segment Routing is described in [RFC8660].



   The following dataplane behaviors apply when creating a repair list

   using an MPLS dataplane:



   1.  If the active segment is a node segment that has been signaled

       with penultimate hop popping and the repair list ends with an

       adjacency segment terminating on the tail-end of the active

       segment, then the active segment MUST be popped before pushing

       the repair list.

SB> Do you have to PHP the repair list? Do you always PHP the repair list? I do not see this documented.

[SLI] PHP is a property of a SID, not of the repair list. Each SID used in a repair list may have a different PHP behavior (advertised by IGP)





   2.  If the active segment is a node segment but the other conditions

       in 1. are not met, the active segment MUST be popped then pushed

       again with a label value computed according to the SRGB of Q,

       where Q is the endpoint of the repair list.  Finally, the repair

       list MUST be pushed.

SB> A couple of figures would be useful
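
In place of a figure, the two MPLS rules can be sketched on a label stack. This is a simplified illustration, not the draft's specification: the function name, its parameters, and every label value are hypothetical, and each node is assumed to advertise a single contiguous SRGB so that a label is the SRGB base plus the SID index.

```python
def mpls_repair_stack(stack, repair_labels, php_adj_to_tail,
                      sid_index=None, srgb_base_q=None):
    """Return the label stack (top first) after FRR for an active node segment.

    php_adj_to_tail: True when rule 1 applies (active segment signalled with
    PHP and the repair list ends with an adjacency segment to its tail-end).
    """
    rest = stack[1:]                          # pop the active segment
    if php_adj_to_tail:
        return repair_labels + rest           # rule 1: push only the repair list
    # Rule 2: re-push the active segment with a label valid at Q.
    relabelled = srgb_base_q + sid_index
    return repair_labels + [relabelled] + rest

# Hypothetical numbers: the PLR uses SRGB base 16000, Q uses base 20000,
# and the active node SID has index 5 (label 16005 on the incoming packet).
incoming = [16005, 24001]
rule1 = mpls_repair_stack(incoming, [16007, 24010], php_adj_to_tail=True)
rule2 = mpls_repair_stack(incoming, [16007], php_adj_to_tail=False,
                          sid_index=5, srgb_base_q=20000)
```

Under rule 2 the label below the repair list is 20005 rather than 16005, because it will be looked up by Q and not by the PLR.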



6.2.2.  SRv6 dataplane considerations



   SRv6 dataplane and programming instructions are described

   respectively in [RFC8754] and [RFC8986].



   The TI-LFA path computation algorithm is the same as in the SR-MPLS

   dataplane.  Note however that the Adjacency SIDs are typically

   globally routed.  In such case, there is no need for a preceding

   Prefix SID and the resulting repair list is likely shorter.



   If the traffic is protected at a Transit Node, then an SRv6 SID list

   is added on the packet to apply the repair list.  The addition of the

   repair list follows the headend behaviors as specified in section 5

   of [RFC8986].



SB> This could usefully include some example using the RFC8986 notation.



   If the traffic is protected at an SR Segment Endpoint Node, first the

   Segment Endpoint packet processing is executed.  Then the packet is

   protected as if it were a transit packet.



7.  TI-LFA and SR algorithms



   SR allows an operator to bind an algorithm to a prefix SID (as

   defined in [RFC8402]).  The algorithm value dictates how the path to

   the prefix is computed.  The SR default algorithm is known as the

   "Shortest Path" algorithm.  The SR default algorithm allows an

   operator to override the IGP shortest path by using local policies.

   When TI-LFA uses Node-SIDs associated with the default algorithm,

   there is no guarantee that the path will be loop-free as a local

   policy may have overridden the expected IGP path.  As the local

   policies are defined by the operator, it becomes the responsibility

   of this operator to ensure that the deployed policies do not affect

   the TI-LFA deployment.  It should be noted that such situation can

   already happen today with existing mechanisms such as remote LFA.



   [I-D.ietf-lsr-flex-algo] defines a flexible algorithm (FlexAlgo)

   framework to be associated with Prefix SIDs.  FlexAlgo allows a user

   to associate a constrained path to a Prefix SID rather than using the

   regular IGP shortest path.  An implementation MAY support TI-LFA to

   protect Node-SIDs associated to a FlexAlgo.  In such a case, rather

   than computing the expected post-convergence path based on the

   regular SPF, an implementation SHOULD use the constrained SPF

   algorithm bound to the FlexAlgo (using the Flex Algo Definition)

   instead of the regular Dijkstra in all the SPF/rSPF computations that

   are occurring during the TI-LFA computation.  This includes the

   computation of the P-Space and Q-Space as well as the post-

   convergence path.  An implementation MUST only use Node-SIDs bound to

   the FlexAlgo and/or Adj-SIDs that are unprotected to build the repair

   list.

================

8.  Usage of Adjacency segments in the repair list



   The repair list of segments computed by TI-LFA may contain one or

   more adjacency segments.  An adjacency segment may be protected or

   not protected.



















           S --- R2 --- R3 --- R4 --- R5 --- D

                    \    |  \  /

                       R7 -- R8

                        |    |

                       R9 -- R10





                                 Figure 3



   In Figure 3, all the metrics are equal to 1 except

   R2-R7,R7-R8,R8-R4,R7-R9 which have a metric of 1000.  Considering R2

   as a PLR to protect against the failure of node R3 for the traffic

   S->D, the repair list computed by R2 will be [adj(R7-R8),adj(R8-R4)]

   and the outgoing interface will be to R7.  If R3 fails, R2 pushes the

   repair list onto the incoming packet to D.  During the FRR, if R7-R8

   fails and if TI-LFA has picked a protected adjacency segment for

   adj(R7-R8), R7 will push an additional repair list onto the packet

   following the procedures defined in Section 6.
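
The post-convergence path behind this repair list can be checked with a plain SPF run. The sketch below is an illustration under assumptions: the link set is read off the ASCII figure, which is ambiguous about the links attached to R3 (those are removed with the failed node anyway), and the helper name `shortest_path` is hypothetical.

```python
import heapq

HIGH = 1000
LINKS = [  # (node, node, metric); link set read off Figure 3 (assumed)
    ("S", "R2", 1), ("R2", "R3", 1), ("R3", "R4", 1), ("R4", "R5", 1),
    ("R5", "D", 1), ("R2", "R7", HIGH), ("R3", "R7", 1), ("R3", "R8", 1),
    ("R8", "R4", HIGH), ("R7", "R8", HIGH), ("R7", "R9", HIGH),
    ("R8", "R10", 1), ("R9", "R10", 1),
]

def shortest_path(links, src, dst, failed_node=None):
    """Dijkstra over bidirectional links, skipping links of failed_node.

    Assumes dst stays reachable from src once failed_node is removed.
    """
    graph = {}
    for a, b, w in links:
        if failed_node in (a, b):
            continue
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:                    # walk predecessors back to src
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

cost, path = shortest_path(LINKS, "R2", "D", failed_node="R3")
```

With R3 removed, the computed path leaves R2 towards R7 and crosses R7-R8 and R8-R4, which is consistent with the repair list [adj(R7-R8),adj(R8-R4)] and the outgoing interface to R7 given in the text.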



   To avoid the possibility of this double FRR activation, an

   implementation of TI-LFA MAY pick only non protected adjacency

   segments when building the repair list.



SB> I am worried about this text because it talks about one specific case and it is not clear how this works. I assume that there is some unreferenced IGP extension, but I am worried how well this scales in practice. In the preceding works on IPFRR there has been an assumption that protecting against a single failure is OK, but protecting against multiple failures simply does not scale. The normal practice is to abandon all hope (AAH), i.e. abort the repair if there are two or more failures.



SB> I think that there needs to be a much more comprehensive discussion of multiple failures if an approach beyond AAH is to be described.



[SLI] If I’m not mistaken, this text was added after one of your comments a couple of IETFs back. And I think the point was just to highlight that double FRR could happen with TI-LFA.

I fully agree with your point, FRR is intended to work for a single failure, not multiple cascaded ones. There is nothing new here. I’m changing the text to recall that statement.









9.  Measurements on Real Networks



   This section presents measurements performed on real service provider

   and large enterprise networks.

SB> Were these measurements? I thought this was a simulation of the algorithms on the LSP DB from these networks.

[SLI] Simulations, but in the end the results will be the same, as this is a purely topological computation. Modified the text.





The objective of the measurements is

   to assess the number of SIDs required in an explicit path when the

   mechanisms described in this document are used to protect against the

   failure scenarios within the scope of this document.  The number of

   segments described in this section are applicable to instantiating

   segment routing over the MPLS forwarding plane.



SB> Now there is an important note to make here. These tables were in the document back in the days when it was describing the use of SR to protect an IP network. The document has been changed to be a description of how to use SR to protect an SR network. I think that the networks are regular IP networks but if not that should be clarified. As a method of protecting a segment I *think* that this is likely to be pessimistic, but without running the tests on some SR networks that is an assumption.



[SLI] I don’t see the point here. As discussed, pure IP traffic vs SR traffic will be protected in the same way.





   The measurements below indicate that for link and local SRLG

   protection,

SB> What is a local SRLG?

[SLI] SRLGs on the same node (ports on the same linecard for instance). This is known vocabulary as it is used in RFC5286.

It’s also self understandable.





   a 1 SID repair path delivers more than 99% coverage.  For

   node protection a 2 SIDs repair path yields 99% coverage.



   Table 1 below lists the characteristics of the networks used in our

   measurements.  The number of links refers to the number of

   "bidirectional" links (not directed edges of the graph).  The

   measurements are carried out as follows:



   o  For each network, the algorithms described in this document are

      applied to protect all prefixes against link, node, and local SRLG

      failure



   o  For each prefix, the number of SIDs used by the repair path is

      recorded



   o  The percentage for each number of SIDs is listed in Tables 2A/B, 3A/B,

      and 4A/B



   The measurements listed in the tables indicate that for link and

   local SRLG protection, 1 SID repair paths are sufficient to protect

   more than 99% of the prefixes in almost all cases.  For node protection

   2 SIDs repair paths yield 99% coverage.



   +-------------+------------+------------+------------+------------+

   |   Network   |    Nodes   |  Links     |Node-to-Link| SRLG info? |

   |             |            |            |    Ratio   |            |

   +-------------+------------+------------+------------+------------+

   |    T1       |    408     |      665   |    1.63    |    Yes     |

   +-------------+------------+------------+------------+------------+

   |    T2       |    587     |     1083   |    1.84    |     No     |

   +-------------+------------+------------+------------+------------+

   |    T3       |    93      |      401   |    4.31    |    Yes     |

   +-------------+------------+------------+------------+------------+

   |    T4       |    247     |      393   |    1.59    |    Yes     |

   +-------------+------------+------------+------------+------------+

   |    T5       |    34      |      96    |    2.82    |    Yes     |

   +-------------+------------+------------+------------+------------+

   |    T6       |    50      |      78    |    1.56    |     No     |

   +-------------+------------+------------+------------+------------+

   |    T7       |    82      |      293   |    3.57    |     No     |

   +-------------+------------+------------+------------+------------+

   |    T8       |    35      |      41    |    1.17    |    Yes     |

   +-------------+------------+------------+------------+------------+

   |    T9       |    177     |     1371   |    7.74    |    Yes     |

   +-------------+------------+------------+------------+------------+

                       Table 1: Data Set Definition
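
Despite its header, the ratio column of Table 1 appears to be the number of links divided by the number of nodes. The check below assumes that reading; the tuple layout and the function name are hypothetical, and the figures are copied from Table 1.

```python
TABLE_1 = {  # network: (nodes, links, published ratio), copied from Table 1
    "T1": (408, 665, 1.63),
    "T2": (587, 1083, 1.84),
    "T3": (93, 401, 4.31),
    "T4": (247, 393, 1.59),
    "T5": (34, 96, 2.82),
    "T6": (50, 78, 1.56),
    "T7": (82, 293, 3.57),
    "T8": (35, 41, 1.17),
    "T9": (177, 1371, 7.74),
}

def links_per_node(nodes, links):
    """Average number of bidirectional links per node."""
    return links / nodes

# Largest gap between the computed ratio and the published one.
max_gap = max(abs(links_per_node(n, l) - r) for n, l, r in TABLE_1.values())
```

Every published ratio is within 0.01 of links divided by nodes, so the column reads most naturally as a links-per-node figure.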



   The rest of this section presents the measurements done on the actual

   topologies.  The convention that we use is as follows



   o  0 SIDs: the calculated repair path starts with a directly

      connected neighbor that is also a loop free alternate, in which

      case there is no need to explicitly route the traffic using

      additional SIDs.  This scenario is described in Section 5.1.



   o  1 SID: the repair node is a PQ node, in which case only 1 SID is

      needed to guarantee loop-freeness.  This scenario is covered in

      Section 5.2.



   o  2 or more SIDs: The repair path consists of 2 or more SIDs as

      described in Sections 4.3 and 4.4.  We do not cover the case for 2

      SIDs (Section 5.3) separately because there was no granularity in

      the result.  Also we treat the node-SID+adj-SID and node-SID +

      node-SID the same because they do not differ from the data plane

      point of view.



   Table 2A and 2B below summarize the measurements on the number of

   SIDs needed for link protection



   +-------------+------------+------------+------------+------------+

   |   Network   |    0 SIDs  |    1 SID   |   2 SIDs   |   3 SIDs   |

   +-------------+------------+------------+------------+------------+

   |    T1       |  74.3%     |   25.3%    |   0.5%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

   |    T2       |  81.1%     |   18.7%    |   0.2%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

   |    T3       |  95.9%     |    4.1%    |   0.1%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

   |    T4       |  62.5%     |   35.7%    |   1.8%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

   |    T5       |  85.7%     |   14.3%    |   0.0%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

   |    T6       |  81.2%     |   18.7%    |   0.0%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

   |    T7       |  98.9%     |   1.1%     |   0.0%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

   |    T8       |  94.1%     |   5.9%     |   0.0%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

   |    T9       |  98.9%     |   1.0%     |   0.0%     |   0.0%     |

   +-------------+------------+------------+------------+------------+

           Table 2A: Link protection (repair size distribution)



   +-------------+------------+------------+------------+------------+
   |   Network   |    0 SIDs  |    1 SID   |   2 SIDs   |   3 SIDs   |
   +-------------+------------+------------+------------+------------+
   |    T1       |  74.2%     |   99.5%    |    99.9%   |   100.0%   |
   +-------------+------------+------------+------------+------------+
   |    T2       |  81.1%     |   99.8%    |   100.0%   |   100.0%   |
   +-------------+------------+------------+------------+------------+
   |    T3       |  95.9%     |   99.9%    |   100.0%   |   100.0%   |
   +-------------+------------+------------+------------+------------+
   |    T4       |  62.5%     |   98.2%    |   100.0%   |   100.0%   |
   +-------------+------------+------------+------------+------------+
   |    T5       |  85.7%     |  100.0%    |   100.0%   |   100.0%   |
   +-------------+------------+------------+------------+------------+
   |    T6       |  81.2%     |   99.9%    |   100.0%   |   100.0%   |
   +-------------+------------+------------+------------+------------+
   |    T7       |  98.8%     |  100.0%    |   100.0%   |   100.0%   |
   +-------------+------------+------------+------------+------------+
   |    T8       |  94.1%     |  100.0%    |   100.0%   |   100.0%   |
   +-------------+------------+------------+------------+------------+
   |    T9       |  98.9%     |  100.0%    |   100.0%   |   100.0%   |
   +-------------+------------+------------+------------+------------+

       Table 2B: Link protection repair size cumulative distribution
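
   As a reading aid: each cumulative value in Table 2B is the running
   sum of the corresponding row of Table 2A.  The following is a minimal
   Python sketch of that arithmetic for the T4 row.  The function name
   "cumulative" is hypothetical, and rounding to one decimal place is an
   assumption about how the published figures were produced.

```python
from itertools import accumulate


def cumulative(distribution, ndigits=1):
    """Return the running sum of a repair-size distribution (percent)."""
    # Round each running total to the table's precision so that
    # floating-point noise does not leak into the result.
    return [round(total, ndigits) for total in accumulate(distribution)]


# T4 row of Table 2A: share of repair paths needing 0, 1, 2 and 3 SIDs.
t4_link = [62.5, 35.7, 1.8, 0.0]
print(cumulative(t4_link))  # [62.5, 98.2, 100.0, 100.0]
```

   Because the published percentages are already rounded, some rows
   (for example T1) reproduce only to within 0.1 of the cumulative
   table.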

   Tables 3A and 3B summarize the measurements of the number of SIDs
   needed for local SRLG protection.



   +-------------+------------+------------+------------+------------+
   |   Network   |    0 SIDs  |    1 SID   |   2 SIDs   |   3 SIDs   |
   +-------------+------------+------------+------------+------------+
   |    T1       |  74.2%     |   25.3%    |   0.5%     |   0.0%     |
   +-------------+------------+------------+------------+------------+
   |    T2       |                No SRLG Information                |
   +-------------+------------+------------+------------+------------+
   |    T3       |  93.6%     |    6.3%    |   0.0%     |   0.0%     |
   +-------------+------------+------------+------------+------------+
   |    T4       |  62.5%     |   35.6%    |   1.8%     |   0.0%     |
   +-------------+------------+------------+------------+------------+
   |    T5       |  83.1%     |   16.8%    |   0.0%     |   0.0%     |
   +-------------+------------+------------+------------+------------+
   |    T6       |                No SRLG Information                |
   +-------------+------------+------------+------------+------------+
   |    T7       |                No SRLG Information                |
   +-------------+------------+------------+------------+------------+
   |    T8       |  85.2%     |   14.8%    |   0.0%     |   0.0%     |
   +-------------+------------+------------+------------+------------+
   |    T9       |  98.9%     |    1.1%    |   0.0%     |   0.0%     |
   +-------------+------------+------------+------------+------------+

         Table 3A: Local SRLG protection repair size distribution



   +-------------+------------+------------+------------+------------+
   |   Network   |    0 SIDs  |    1 SID   |   2 SIDs   |   3 SIDs   |
   +-------------+------------+------------+------------+------------+
   |    T1       |  74.2%     |   99.5%    |  99.9%     | 100.0%     |
   +-------------+------------+------------+------------+------------+
   |    T2       |                No SRLG Information                |
   +-------------+------------+------------+------------+------------+
   |    T3       |  93.6%     |   99.9%    | 100.0%     | 100.0%     |
   +-------------+------------+------------+------------+------------+
   |    T4       |  62.5%     |   98.2%    | 100.0%     | 100.0%     |
   +-------------+------------+------------+------------+------------+
   |    T5       |  83.1%     |  100.0%    | 100.0%     | 100.0%     |
   +-------------+------------+------------+------------+------------+
   |    T6       |                No SRLG Information                |
   +-------------+------------+------------+------------+------------+
   |    T7       |                No SRLG Information                |
   +-------------+------------+------------+------------+------------+
   |    T8       |  85.2%     |   100.0%   | 100.0%     | 100.0%     |
   +-------------+------------+------------+------------+------------+
   |    T9       |  98.9%     |   100.0%   | 100.0%     | 100.0%     |
   +-------------+------------+------------+------------+------------+

    Table 3B: Local SRLG protection repair size cumulative distribution

   Tables 4A and 4B summarize the measurements of the number of SIDs
   needed for node protection.



   +---------+----------+----------+----------+----------+----------+
   | Network |  0 SIDs  |   1 SID  | 2 SIDs   |  3 SIDs  |  4 SIDs  |
   +---------+----------+----------+----------+----------+----------+
   |    T1   |  49.8%   | 47.9%    | 2.1%     |  0.1%    |  0.0%    |
   +---------+----------+----------+----------+----------+----------+
   |    T2   |  36.5%   | 59.6%    | 3.6%     |  0.2%    |  0.0%    |
   +---------+----------+----------+----------+----------+----------+
   |    T3   |  73.3%   | 25.6%    | 1.1%     |  0.0%    |  0.0%    |
   +---------+----------+----------+----------+----------+----------+
   |    T4   |  36.1%   | 57.3%    | 6.3%     |  0.2%    |  0.0%    |
   +---------+----------+----------+----------+----------+----------+
   |    T5   |  73.2%   | 26.8%    | 0%       |  0%      |  0%      |
   +---------+----------+----------+----------+----------+----------+
   |    T6   |  78.3%   | 21.3%    | 0.3%     |  0%      |  0%      |
   +---------+----------+----------+----------+----------+----------+
   |    T7   |  66.1%   | 32.8%    | 1.1%     |  0%      |  0%      |
   +---------+----------+----------+----------+----------+----------+
   |    T8   |  59.7%   | 40.2%    | 0%       |  0%      |  0%      |
   +---------+----------+----------+----------+----------+----------+
   |    T9   |  98.9%   | 1.0%     | 0%       |  0%      |  0%      |
   +---------+----------+----------+----------+----------+----------+

           Table 4A: Node protection (repair size distribution)



   +---------+----------+----------+----------+----------+----------+
   | Network |  0 SIDs  |   1 SID  | 2 SIDs   |  3 SIDs  |  4 SIDs  |
   +---------+----------+----------+----------+----------+----------+
   |    T1   |  49.7%   |  97.6%   |  99.8%   | 99.9%    |  100%    |
   +---------+----------+----------+----------+----------+----------+
   |    T2   |  36.5%   |  96.1%   |  99.7%   | 99.9%    |  100%    |
   +---------+----------+----------+----------+----------+----------+
   |    T3   |  73.3%   |  98.9%   |  99.9%   | 100.0%   |  100%    |
   +---------+----------+----------+----------+----------+----------+
   |    T4   |  36.1%   |  93.4%   |  99.8%   | 99.9%    |  100%    |
   +---------+----------+----------+----------+----------+----------+
   |    T5   |  73.2%   | 100.0%   | 100.0%   | 100.0%   |  100%    |
   +---------+----------+----------+----------+----------+----------+
   |    T6   |  78.4%   | 99.7%    | 100.0%   | 100.0%   |  100%    |
   +---------+----------+----------+----------+----------+----------+
   |    T7   |  66.1%   | 98.9%    | 100.0%   | 100.0%   |  100%    |
   +---------+----------+----------+----------+----------+----------+
   |    T8   |  59.7%   | 100.0%   | 100.0%   | 100.0%   |  100%    |
   +---------+----------+----------+----------+----------+----------+
   |    T9   |  98.9%   | 100.0%   | 100.0%   | 100.0%   |  100%    |
   +---------+----------+----------+----------+----------+----------+

      Table 4B: Node protection (repair size cumulative distribution)

SB> What are the stats for node protection in the presence of SRLGs?



[SLI] Not evaluated





10.  Security Considerations



   The techniques described in this document are internal
   functionalities to a router that result in the ability to guarantee
   an upper bound on the time taken to restore traffic flow upon the
   failure of a directly connected link or node.  As these techniques
   steer traffic to the post-convergence path as quickly as possible,
   this serves to minimize the disruption associated with a local
   failure, which can be seen as a modest security enhancement.  The
   protection mechanism does not protect external destinations, but
   rather provides quick restoration for destinations that are internal
   to a routing domain.



   Security considerations described in [RFC5286] and [RFC7490] apply to
   this document.  Similarly, as the solution described in the document
   is based on Segment Routing technology, the reader should be aware of
   the security considerations related to this technology ([RFC8402])
   and its dataplane instantiations ([RFC8660], [RFC8754] and
   [RFC8986]).  However, this document does not introduce any additional
   security concerns.



11.  IANA Considerations



   This document has no requirements for IANA.



12.  Contributors



   In addition to the authors listed on the front page, the following

   co-authors have also contributed to this document:



      Francois Clad, Cisco Systems



      Pablo Camarillo, Cisco Systems



13.  Acknowledgments



   We would like to thank Les Ginsberg, Stewart Bryant, Alexander
   Vainshtein, Chris Bowers, and Shraddha Hegde for their valuable
   comments.



14.  References



14.1.  Normative References



   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate

              Requirement Levels", BCP 14, RFC 2119,

              DOI 10.17487/RFC2119, March 1997,

              <https://www.rfc-editor.org/info/rfc2119>.



   [RFC7916]  Litkowski, S., Ed., Decraene, B., Filsfils, C., Raza, K.,

              Horneffer, M., and P. Sarkar, "Operational Management of

              Loop-Free Alternates", RFC 7916, DOI 10.17487/RFC7916,

              July 2016, <https://www.rfc-editor.org/info/rfc7916>.



   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC

              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,

              May 2017, <https://www.rfc-editor.org/info/rfc8174>.



   [RFC8402]  Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,

              Decraene, B., Litkowski, S., and R. Shakir, "Segment

              Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,

              July 2018, <https://www.rfc-editor.org/info/rfc8402>.



   [RFC8660]  Bashandy, A., Ed., Filsfils, C., Ed., Previdi, S.,

              Decraene, B., Litkowski, S., and R. Shakir, "Segment

              Routing with the MPLS Data Plane", RFC 8660,

              DOI 10.17487/RFC8660, December 2019,

              <https://www.rfc-editor.org/info/rfc8660>.



   [RFC8754]  Filsfils, C., Ed., Dukes, D., Ed., Previdi, S., Leddy, J.,

              Matsushima, S., and D. Voyer, "IPv6 Segment Routing Header

              (SRH)", RFC 8754, DOI 10.17487/RFC8754, March 2020,

              <https://www.rfc-editor.org/info/rfc8754>.



   [RFC8986]  Filsfils, C., Ed., Camarillo, P., Ed., Leddy, J., Voyer,

              D., Matsushima, S., and Z. Li, "Segment Routing over IPv6

              (SRv6) Network Programming", RFC 8986,

              DOI 10.17487/RFC8986, February 2021,

              <https://www.rfc-editor.org/info/rfc8986>.



14.2.  Informative References



   [I-D.bashandy-rtgwg-segment-routing-uloop]

              Bashandy, A., Filsfils, C., Litkowski, S., Decraene, B.,

              Francois, P., and P. Psenak, "Loop avoidance using Segment

              Routing", draft-bashandy-rtgwg-segment-routing-uloop-10

              (work in progress), December 2020.



   [I-D.ietf-lsr-flex-algo]

              Psenak, P., Hegde, S., Filsfils, C., Talaulikar, K., and

              A. Gulko, "IGP Flexible Algorithm", draft-ietf-lsr-flex-

              algo-15 (work in progress), April 2021.



   [I-D.ietf-spring-segment-routing-policy]

              Filsfils, C., Talaulikar, K., Voyer, D., Bogdanov, A., and

              P. Mattes, "Segment Routing Policy Architecture", draft-

              ietf-spring-segment-routing-policy-11 (work in progress),

              April 2021.



   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for

              IP Fast Reroute: Loop-Free Alternates", RFC 5286,

              DOI 10.17487/RFC5286, September 2008,

              <https://www.rfc-editor.org/info/rfc5286>.



   [RFC5714]  Shand, M. and S. Bryant, "IP Fast Reroute Framework",

              RFC 5714, DOI 10.17487/RFC5714, January 2010,

              <https://www.rfc-editor.org/info/rfc5714>.



   [RFC6571]  Filsfils, C., Ed., Francois, P., Ed., Shand, M., Decraene,

              B., Uttaro, J., Leymann, N., and M. Horneffer, "Loop-Free

              Alternate (LFA) Applicability in Service Provider (SP)

              Networks", RFC 6571, DOI 10.17487/RFC6571, June 2012,

              <https://www.rfc-editor.org/info/rfc6571>.



   [RFC7490]  Bryant, S., Filsfils, C., Previdi, S., Shand, M., and N.

              So, "Remote Loop-Free Alternate (LFA) Fast Reroute (FRR)",

              RFC 7490, DOI 10.17487/RFC7490, April 2015,

              <https://www.rfc-editor.org/info/rfc7490>.



Authors' Addresses



   Stephane Litkowski

   Cisco Systems

   France



   Email: slitkows@cisco.com





   Ahmed Bashandy

   Individual



   Email: abashandy.ietf@gmail.com





   Clarence Filsfils

   Cisco Systems

   Brussels

   Belgium



   Email: cfilsfil@cisco.com





   Pierre Francois

   INSA Lyon



   Email: pierre.francois@insa-lyon.fr





   Bruno Decraene

   Orange

   Issy-les-Moulineaux

   France



   Email: bruno.decraene@orange.com





   Daniel Voyer

   Bell Canada

   Canada



   Email: daniel.voyer@bell.ca


Litkowski, et al.       Expires December 31, 2021              [Page 23]
