RE: draft-ietf-rtgwg-segment-routing-ti-lfa : A simple pathological network fragment

"Voyer, Daniel" <daniel.voyer@bell.ca> Wed, 08 November 2023 15:00 UTC

From: "Voyer, Daniel" <daniel.voyer@bell.ca>
To: "bruno.decraene@orange.com" <bruno.decraene@orange.com>, Stewart Bryant <stewart.bryant@gmail.com>
CC: Ahmed Bashandy <abashandy.ietf@gmail.com>, Alexander Vainshtein <Alexander.Vainshtein@rbbn.com>, "rtgwg@ietf.org" <rtgwg@ietf.org>, "draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org" <draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org>, rtgwg-chairs <rtgwg-chairs@ietf.org>, Gyan Mishra <hayabusagsm@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/ZSnvLMd7j5BUDS3Ttz3t0MGdkz4>

Hi Bruno & Stewart,

I will start unpacking a few things as a warm-up for when we meet later this afternoon.


  1.  TI-LFA is an FRR solution which works. It provides loop-free protection from the PLR to the destination.
  2.  When IGP convergence starts, micro-loops may happen because IGP convergence is distributed. They may affect forwarding from the source/ingress to the PLR (and hence starve the PLR).
  3.  If one has promised 50 ms recovery to customers, one needs both an FRR solution and a micro-loop solution. (TI-LFA being an FRR solution, you still need a micro-loop solution.)


[DV] OK, at Bell we did extensive multi-vendor testing with different topologies, using test sets, before going into production with TI-LFA (and later uloop). TI-LFA includes local repair (and node protection), and one can achieve <50 ms convergence “before” adding uloop into the mix. The testing made sure there were NO ECMP paths in the topology, just to focus on TI-LFA alone (as well as multi-vendor behavior). Our testing included failure AND recovery.
To me, as was explained before, TI-LFA and micro-loop avoidance address different problems. And since these concepts were introduced some time ago already, I believe we need real data and facts to continue debating this. I can show some “raw” data later today if need be. Perhaps we could ask EANTC if they have a public report to show.

With that said, when TI-LFA is combined with micro-loop avoidance (and the overload bit is tuned according to the vendor’s platform) you can achieve near <10 ms convergence (without taking ECMP into account).

Thanks,
Dan

From: Bruno Decraene <bruno.decraene@orange.com>
Date: Wednesday, November 8, 2023 at 2:18 PM
To: Stewart Bryant <stewart.bryant@gmail.com>
Cc: Ahmed Bashandy <abashandy.ietf@gmail.com>, Alexander Vainshtein <Alexander.Vainshtein@rbbn.com>, "rtgwg@ietf.org" <rtgwg@ietf.org>, "draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org" <draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org>, rtgwg-chairs <rtgwg-chairs@ietf.org>, Gyan Mishra <hayabusagsm@gmail.com>
Subject: [EXT]RE: draft-ietf-rtgwg-segment-routing-ti-lfa : A simple pathological network fragment
Resent-From: <alias-bounces@ietf.org>
Resent-To: <pierre.francois@insa-lyon.fr>, <slitkows@cisco.com>, Clarence Filfils <cfilsfil@cisco.com>, Dan Voyer <daniel.voyer@bell.ca>, <abashandy.ietf@gmail.com>, Bruno Decraene <bruno.decraene@orange.com>
Resent-Date: Wednesday, November 8, 2023 at 2:18 PM

Hi Stewart,

Thanks for your email and your rephrased summary.

Strangely, I feel that we are in agreement. At least, unless I missed a point, I agree with your email below. I’d propose to rephrase it to check whether you agree with my rephrasing (a 3-way handshake seems safer). Please find my summary below:

  1.  TI-LFA is an FRR solution which works. It provides loop-free protection from the PLR to the destination.
  2.  When IGP convergence starts, micro-loops may happen because IGP convergence is distributed. They may affect forwarding from the source/ingress to the PLR (and hence starve the PLR).
  3.  If one has promised 50 ms recovery to customers, one needs both an FRR solution and a micro-loop solution. (TI-LFA being an FRR solution, you still need a micro-loop solution.)

Are we in sync on the above?
(On a side note, what we call “micro-loops” is really “a possibility for micro-loops”. They may not happen (by topology or chance), in which case the customer does see an improvement with FRR only.)

If not, please correct me.
If so,

  *   I agree. This is not new (IMO) and also applicable to RLFA, which did mention this (credit to you) in its section 10. https://datatracker.ietf.org/doc/html/rfc7490#section-10
  *   I had proposed that you add the same text to TI-LFA (for simplicity, since you, the WG and the IESG already agree on this), but after discussion with you and Sasha the current proposed text is the following:



  1.  TI-LFA is a local operation applied by the PLR when it detects failure of one of its local links. As such, it does not affect:
     *   Micro-loops that appear, or do not appear, as part of the distributed IGP convergence [RFC5715] on the paths to the destination that do not pass through TI-LFA paths
         i.   As explained in RFC 5714, such micro-loops may result in the traffic not reaching the PLR and therefore not following TI-LFA paths
         ii.  Segment Routing may be used for prevention of such micro-loops as described in the micro-loop avoidance draft
     *   Micro-loops that appear, or do not appear, when the failed link is repaired
  2.  TI-LFA paths are loop-free. What’s more, they follow the post-convergence paths and are therefore not subject to micro-loops due to differences in the IGP convergence times of the nodes through which they pass
  3.  TI-LFA paths are applied from the moment the PLR detects failure of a local link until IGP convergence at the PLR is completed. Therefore, early (relative to the other nodes) IGP convergence at the PLR and the consequent “early” release of TI-LFA paths may cause micro-loops, especially if these paths have been computed using the methods described in Section 6.2, 6.3 or 6.4 of the draft. One possible way to prevent such micro-loops is local convergence delay (RFC 8333).
  4.  TI-LFA procedures are complementary to the application of any micro-loop avoidance procedures in the case of link or node failure:
     *   Link or node failure requires some urgent action to restore the traffic that passed through the failed resource. TI-LFA paths are pre-computed and pre-installed and therefore suitable for urgent recovery
     *   The paths used in the micro-loop avoidance procedures typically cannot be pre-computed.


https://mailarchive.ietf.org/arch/msg/rtgwg/oY3gGIZMRCTRptTDxrpuSaBztGY/ (proposal)
https://mailarchive.ietf.org/arch/msg/rtgwg/oY3gGIZMRCTRptTDxrpuSaBztGY/ (next email with Sasha agreeing)

That being said, I’m not married to this text: it’s just that Sasha proposed text (thanks Sasha) and I agreed with it. It’s OK to change the text if you want to propose something else for some parts. (Personally, I feel that the text could be made more concise, but after so many difficulties communicating, I was happy to jump on a proposed text.)
I would just assume that any text you propose would be along the same lines.


Next, is this micro-loop aspect the only issue you wanted to raise or is there another point?

--Bruno



From: Stewart Bryant <stewart.bryant@gmail.com>
Sent: Wednesday, November 8, 2023 9:37 AM
To: Gyan Mishra <hayabusagsm@gmail.com>
Cc: Stewart Bryant <stewart.bryant@gmail.com>; Ahmed Bashandy <abashandy.ietf@gmail.com>; Alexander Vainshtein <Alexander.Vainshtein@rbbn.com>; rtgwg@ietf.org; draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org; rtgwg-chairs <rtgwg-chairs@ietf.org>
Subject: Re: draft-ietf-rtgwg-segment-routing-ti-lfa : A simple pathological network fragment



On 8 Nov 2023, at 05:18, Gyan Mishra <hayabusagsm@gmail.com> wrote:



In the RLFA (RFC 7490) style loop topology below, R1, R4 and R5 are in the extended P-space, the Q-space is R5, R6 and R3, and the RLFA PQ node on the post-convergence path calculated by the TI-LFA algorithm is R5.

Using Section 6.4 to build the post-convergence repair path with RFC 5715 near-side tunneling, the repair path is NodeSid(R5), AdjSid(R6). So a near-side tunnel is now built from R1 to R6.

Looping of packets back to R1 by R4 or R5 is not an issue, as the repair path is built from R1 to R6, tunneling over any nodes with unconverged FIBs.

Micro loop problem solved!



CE1 - R1 - R2 -/- R3 - CE2
      |           |
      R4 -- R5 -- R6
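
As a rough cross-check of the P-space and Q-space claims above, the following Python sketch applies the RFC 7490 definitions to this topology. It rests on assumptions that the thread does not state: every link has unit cost, R2 is the PLR protecting link R2-R3, and CE1/CE2 are treated as ordinary nodes. All function names are illustrative only.

```python
from collections import deque

# Topology from the diagram; unit link costs are an assumption.
LINKS = [("CE1", "R1"), ("R1", "R2"), ("R2", "R3"), ("R3", "CE2"),
         ("R1", "R4"), ("R4", "R5"), ("R5", "R6"), ("R6", "R3")]
PLR, FAR_END = "R2", "R3"          # assumed PLR and protected link R2-R3


def adjacency(links):
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hop_counts(adj, root):
    """BFS hop counts from root to every node (unit costs)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def crosses_link(dist, src, dst, link):
    """True if at least one shortest path src -> dst traverses link."""
    a, b = link
    best = dist[src][dst]
    return (dist[src][a] + 1 + dist[b][dst] == best or
            dist[src][b] + 1 + dist[a][dst] == best)


def p_space(dist, root, link):
    """Nodes root reaches without any shortest path using link."""
    return {x for x in dist if not crosses_link(dist, root, x, link)}


def q_space(dist, root, link):
    """Nodes that reach root without any shortest path using link."""
    return {x for x in dist if not crosses_link(dist, x, root, link)}


adj = adjacency(LINKS)
dist = {n: hop_counts(adj, n) for n in adj}
link = (PLR, FAR_END)

# Extended P-space: union of the P-spaces of the PLR's other neighbours.
extended_p = set().union(*(p_space(dist, n, link)
                           for n in adj[PLR] if n != FAR_END))
q = q_space(dist, FAR_END, link)
pq = extended_p & q

print("extended P-space:", sorted(extended_p))
print("Q-space:", sorted(q))
print("PQ nodes:", sorted(pq))
```

Under these assumptions the only PQ node is R5, with R1, R4 and R5 in the extended P-space and R5, R6 and R3 in the Q-space, which is consistent with the description above. Different link metrics could change the result.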

I think that it is important to note that if R1 reconverges first, it will send packets to R4 using normal forwarding. However, R4 is ECMP to CE2, with one of its paths via R1, so that traffic will micro-loop between R4 and R1.

At this point the repair is starved and no longer works.

Hence the point that I have been making and I think the point that Gyan originally made.

Without FRR the network converges in its own time and we accept micro loops and traffic discontinuity for an unknown time plus collateral damage to traffic that never used the failed link.

However, once we deploy FRR we make a contract with the user that after a short while, on the order of 50 ms, productive forwarding will continue uninterrupted. This is not the case in some topologies (see above), and thus uloop prevention is required.

The thread has become somewhat difficult to follow with time, so I am no longer sure what Bruno’s text is. It would be helpful if it were repeated. However, I think the draft has to say that, in order to warrant that FRR continues to provide traffic continuity until the network has reconverged, a uloop strategy is required.

I would note, as it is easily forgotten, that a uloop strategy is also required when R2-R3 goes back into service. This is because if R4 converges first, it will ECMP back to R1, which will send the packet back to R4.
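
Both orderings described above (R1 reconverging first after the failure, and R4 reconverging first after the repair) can be checked with a small Python sketch. It assumes unit link costs, which the thread does not state but which make R4 ECMP towards CE2 as described, and it models only plain hop-by-hop forwarding with a mix of old and new FIBs, not the TI-LFA repair itself. The helper names are illustrative.

```python
from collections import deque

# Topology from the diagram above; unit link costs are an assumption.
LINKS = [("CE1", "R1"), ("R1", "R2"), ("R2", "R3"), ("R3", "CE2"),
         ("R1", "R4"), ("R4", "R5"), ("R5", "R6"), ("R6", "R3")]
PROTECTED = ("R2", "R3")


def adjacency(links):
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def fib(adj, dst):
    """ECMP next-hop lists towards dst for every other node (BFS, unit costs)."""
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return {n: sorted(m for m in adj[n] if dist[m] + 1 == dist[n])
            for n in adj if n != dst}


def mixed(base, updated, converged):
    """FIB state where only the nodes in `converged` use the updated FIB."""
    return {**base, **{n: updated[n] for n in converged}}


def find_loop(next_hops, src, dst):
    """Return a forwarding cycle reachable from src under some ECMP choice, or None."""
    def walk(node, path):
        if node == dst:
            return None
        if node in path:
            return path[path.index(node):] + [node]
        for nh in next_hops[node]:
            loop = walk(nh, path + [node])
            if loop:
                return loop
        return None
    return walk(src, [])


link_up = fib(adjacency(LINKS), "CE2")
link_down = fib(adjacency([l for l in LINKS if l != PROTECTED]), "CE2")

# Failure: R1 has converged to the link-down view, everyone else has not.
loop_after_failure = find_loop(mixed(link_up, link_down, ["R1"]), "CE1", "CE2")
# Repair: R4 has converged to the link-up view, everyone else has not.
loop_after_repair = find_loop(mixed(link_down, link_up, ["R4"]), "CE1", "CE2")
# Fully converged after the failure: no loop expected.
loop_when_converged = find_loop(link_down, "CE1", "CE2")

print(loop_after_failure)   # ['R1', 'R4', 'R1']
print(loop_after_repair)    # ['R1', 'R4', 'R1']
print(loop_when_converged)  # None
```

In both transient states the flows that R4 hashes onto its R1 next hop bounce between R1 and R4, while the flows hashed towards R5 are delivered. With other metrics the ECMP at R4 may disappear, which is the "by topology or chance" caveat raised earlier in the thread.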

Now, we need to be clear that the micro-looping is not the fault of the TI-LFA design per se, but given that networks will deploy TI-LFA with certain traffic continuity expectations, we must clearly note to the reader that those expectations may not be met without addressing the uloop problem.

By way of reference to earlier work, RFC 5714 does point to RFC 5715, stating that a uloop technology is needed. In Section 10 of RFC 7490 the issue of loops is drawn to the attention of the network manager, although perhaps with hindsight the text should be stronger.

- Stewart








