Re: [EXTERNAL] draft-ietf-rtgwg-segment-routing-ti-lfa : A simple pathological network fragment

Yingzhen Qu <yingzhen.ietf@gmail.com> Fri, 03 November 2023 20:13 UTC

From: Yingzhen Qu <yingzhen.ietf@gmail.com>
Date: Fri, 03 Nov 2023 13:13:19 -0700
Message-ID: <CABY-gONazNhv5PVHb+dMDkOFSjbQMKhqh000jb-TyCjUq9a2-A@mail.gmail.com>
Subject: Re: [EXTERNAL] draft-ietf-rtgwg-segment-routing-ti-lfa : A simple pathological network fragment
To: Stewart Bryant <stewart.bryant@gmail.com>
Cc: Gyan Mishra <hayabusagsm@gmail.com>, Alexander Vainshtein <Alexander.Vainshtein@rbbn.com>, "rtgwg@ietf.org" <rtgwg@ietf.org>, rtgwg-chairs <rtgwg-chairs@ietf.org>, "draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org" <draft-ietf-rtgwg-segment-routing-ti-lfa@ietf.org>, ahmedbashandy@gmail.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/nFfBHMTLZhgHT8WmgGoc9hPPVU8>

Hi,

I looked at the side meeting schedule, and found the following slot:
 Wednesday (11/8) 14:00 - 15:30.

I hope to have the following key contributors available for the discussion:
Stewart Bryant
Gyan Mishra
Sasha Vainshtein
Stephane Litkowski
Bruno Decraene
Ahmed Bashandy

Of course anyone is welcome to join the discussion.

Please let me know your availability. If this doesn't work out, we'll
schedule an interim after IETF 118.

Thanks,
Yingzhen





On Fri, Nov 3, 2023 at 2:06 AM Stewart Bryant <stewart.bryant@gmail.com>
wrote:

>
>
> On 3 Nov 2023, at 02:43, Gyan Mishra <hayabusagsm@gmail.com> wrote:
>
> Gyan> TI-LFA is a critical draft for operator SR deployments, and I agree
> that getting it published as soon as possible is a good idea. All vendors
> that have implemented TI-LFA have also implemented uLoop avoidance. In
> practice, any operator deploying TI-LFA would always deploy uLoop avoidance
> at the same time, per vendor recommendation. The uLoop I-D is 7 years old
> and mature, since every vendor that has implemented TI-LFA has also
> implemented uLoop avoidance, so I think this could be a slam dunk: a quick
> adoption, followed by an expedited WGLC and publication. The other option
> is to combine the drafts, which may or may not be favorable to the WG.
>
> The basic uLoop concept is simple: build a list of adj-SIDs from the PLR
> to the RLFA PQ node merge point, with a timer set at time T1 post
> convergence, and remove the list when the T2 timer pops. Simple! In my
> mind, the TI-LFA solution is not complete without uLoop avoidance. The
> major issue that Stewart pointed out, which relates to multiple entry
> points or a chain of P-space nodes preceding the PLR, or multiple Q-space
> nodes preceding the RLFA PQ node merge point, is what I documented in my
> review. Any of those longer chains of nodes can experience uLoops because
> of cascaded delays in distributed convergence.
>
> TI-LFA implementations aim to solve the problem with the smallest possible
> number of SIDs, to avoid hardware MSD issues: a single node-SID plus maybe
> an adj-SID, and at most 4 SIDs. Using a node-SID yields ECMP along the
> chain of nodes that have not yet converged, which results in many possible
> micro-loops. That is the major issue that the hop-by-hop list of adj-SIDs
> along the post-convergence path solves in the uLoop draft.
>
> I don’t know of any other way to resolve the TI-LFA uLoop issue if TI-LFA
> is implemented by itself and node-SID ECMP is used. One option, though an
> unlikely one, is that where a chain of nodes exists, TI-LFA configured by
> itself without uLoop avoidance, while signaling for the MSD maximum
> threshold, could build an adj-SID list across the nodes not yet converged
> from the PLR to the PQ node merge point. Other than trying to fix TI-LFA so
> that it can work independently of the uLoop feature, the alternative is to
> do what we have been discussing in the thread: add text on micro-loops and
> on the interaction between the TI-LFA draft and the uLoop draft.
>
> Cheers,
>
> Gyan
>
>
> As I noted earlier in the thread, unless you need to ensure that the
> repair path is congruent with the post convergence path for TE reasons, you
> never need more than two labels for a link repair.
>
> If you use the procedures in RFC 7490 then at most you need two labels for
> link failure. One can be a normal MPLS label, the second is a label that
> gets the packet from P to Q. When we wrote RFC 7490 we did not have SR, so
> we were expecting to use T-LDP, which created additional state in the
> network. Now that SR-MPLS is deployed you can use an SR label to get from
> P to Q and thus avoid the need for T-LDP [1].
>
> I would point out that none of this actually requires standardisation,
> since the repair is a unitary action by the PLR and uses existing widely
> deployed MPLS technology, i.e. any path that gets to P then Q will work and
> any path can be chosen that meets the needs of the operator. The notion
> that forcing the repair path to the post convergence path from the PLR
> solves all the TE problems is questionable since, as was noted the very
> first time TiLFA was mooted, the operational traffic may no longer go via
> the PLR post convergence. It is also clear from these discussions that
> whilst TiLFA solves the problem of micro looping along the path from the
> PLR to Q space, that is not adequate in itself and thus not a useful path
> constraint.
>
> Simplifying the design to use existing RFC 7490 with an SR label to get
> from P to Q would not invalidate any TiLFA implementation but would make it
> clear that implementations could choose any path that best suited their
> needs.
>
> If we expect failure to be a rare event, then we could control the
> convergence with an unoptimised ordered FIB solution, an approach which is
> also a unitary action at the PLR. Of course the PLR might choose to
> calculate the optimum path cost values to speed up the process.
>
> If we need a more expeditious approach then we can achieve this with a
> method such as nearside tunnelling which also needs at most one ordinary
> MPLS label.
>
> Now let us go up a level. This is an emergency use safety system. Safety
> engineering teaches two things: firstly, that such systems are rarely
> executed and thus bugs may remain hidden for a long time before they
> manifest themselves, and secondly, that they normally need to be applied in
> circumstances where instrumentation is difficult. The design philosophy in
> such systems is normally that they are extremely simple and thus will
> obviously work under all circumstances, both those that are “expected” and
> those that are “reasonably unexpected”. This is why most safety systems are
> at first glance quite primitive.
>
> With TiLFA I think we have lost sight of the need for simplicity and thus
> have a higher risk of a repair failure than we would have in a simpler but
> adequately functional alternative approach.
>
> Best regards
>
> Stewart
>
> [1] Node failure is intrinsically more complex for all solutions and many
> more labels (or network state) may be needed. This was written up as the
> cartwheel problem in which a node has a black hole effect on the traffic
> and you need to skim the traffic around the rim of the cartwheel.
>
>
>
>
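
Illustrative sketch (not part of the original thread): the quoted claim that
an RFC 7490 style link repair needs at most two repair labels can be checked
on small topologies. The code below is a minimal sketch under stated
assumptions: symmetric link costs, a connected graph, and a simplified
P-space/Q-space test based on shortest-path distances (no extended P-space).
The function names, the toy ring topologies, and the symbolic ("node", X) and
("adj", X, Y) labels are all hypothetical and are not taken from any draft or
implementation.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra: shortest-path cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def repair_stack(graph, s, e):
    """Repair labels the PLR s could push to protect link s-e.

    Assumes symmetric costs and a connected graph. Returns a list of
    symbolic labels, or None if this simplified search finds no repair.
    """
    link_cost = graph[s][e]
    dist = {n: shortest_distances(graph, n) for n in graph}
    # P-space: nodes s reaches with no shortest path crossing link s-e.
    p_space = {n for n in graph if dist[s][n] < link_cost + dist[e][n]}
    # Q-space: nodes that reach e with no shortest path crossing link s-e.
    q_space = {n for n in graph if dist[n][e] < dist[n][s] + link_cost}

    def by_distance(n):
        return (dist[s][n], n)

    # Case 1: a PQ node exists, so one label to reach it is enough.
    pq_nodes = sorted((p_space & q_space) - {s}, key=by_distance)
    if pq_nodes:
        return [("node", pq_nodes[0])]

    # Case 2: P and Q are adjacent, so add one label for the P-to-Q hop.
    for p in sorted(p_space, key=by_distance):
        for q in sorted(graph[p]):
            if q in q_space and (p, q) != (s, e):
                hop = ("adj", p, q)
                return [hop] if p == s else [("node", p), hop]
    return None


def ring(names, cost=1):
    """Build a ring topology with equal, symmetric link costs."""
    graph = {n: {} for n in names}
    for a, b in zip(names, names[1:] + names[:1]):
        graph[a][b] = cost
        graph[b][a] = cost
    return graph


# Five-node ring: P-space and Q-space overlap at C, one repair label.
print(repair_stack(ring(["S", "B", "C", "D", "E"]), "S", "E"))
# prints [('node', 'C')]

# Six-node ring: no PQ node, but B (in P) is adjacent to C (in Q).
print(repair_stack(ring(["S", "A", "B", "C", "D", "E"]), "S", "E"))
# prints [('node', 'B'), ('adj', 'B', 'C')]
```

In both toy rings the repair needs no more than two labels on top of whatever
label already carries the packet to its destination, which is the behaviour
the quoted text attributes to RFC 7490 with an SR label for the P-to-Q hop.
The sketch says nothing about node protection or about micro-loops during
convergence.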