Re: [Idr] I-D Action: draft-ietf-idr-bgp-fwd-rr-02.txt

Igor Malyushkin <gmalyushkin@gmail.com> Mon, 18 March 2024 20:02 UTC

MIME-Version: 1.0
References: <171065415177.59997.7631576612994148063@ietfa.amsl.com> <CAOj+MMEsp_UfuiHdc4U_Bv5o7xsYYK_RryusUZ88u+SH9xifSA@mail.gmail.com> <SJ0PR05MB86322B34D635E7F221C04C0FA22D2@SJ0PR05MB8632.namprd05.prod.outlook.com> <CAEfhRrxDVi_Yw2wtTWiGzjw4pQ-8TF-48UCY5AUxMpKdrbexZw@mail.gmail.com> <CAOj+MMHNAz741WP9Pf2UCSOQRj6YepFh=Q4tzedmwBCm6e289g@mail.gmail.com> <CAEfhRryMW9nyWfnDQdi+R5g-nypg5ppwFy_Gdf71pRMFmZysHA@mail.gmail.com> <CAOj+MMF2-oqZ29hSBgaO+gzYXXyvCRgJ0m-zW2K7CWattCpgrQ@mail.gmail.com> <CAEfhRrw=acXDgVtzUEhqxZcOPYbJwT0Ha36k-ADgZaiy863erg@mail.gmail.com> <CAOj+MMGnup=Qg7+XiyXtCJe0a2csmdFVVLtJj_TmoA4yysbghw@mail.gmail.com> <CAEfhRrxFBay_pctO32G9S9kL8Ry_Y2Cx+rQEknN3D6qU9XG2+Q@mail.gmail.com> <CAOj+MMFDUtDzSVWRyH6U2arCUWsn-hNTA-+_RXxan_hPLS=AAQ@mail.gmail.com> <CAEfhRrwxXgTNR0hWd96WnAi_wOvUqCsQkN=5bvbqycgxqMCkeg@mail.gmail.com> <CAOj+MMEpvT4f3W238RPr2bULuwyqTn0jwg6xRf01dLYp6H_6hg@mail.gmail.com> <CAEfhRrz+6ZP6DMgUj-MOwkGQDwj_Z-q0NUAo6VR2-V9fjk7KAw@mail.gmail.com> <CAOj+MMFmY4LrUc=-Nb+rgOQywcTAQ3JZKA071hsHc9SEo=dM9g@mail.gmail.com> <CAEfhRrzb5GsFX_AQ4dY9+2azVbNRvNZDQda3g_rMRPmj_no1tw@mail.gmail.com> <CAOj+MMGYTYH3zJ-BcM32PYstBBCkV5Hv_t=EZWaxYi7B4sJnHQ@mail.gmail.com>
In-Reply-To: <CAOj+MMGYTYH3zJ-BcM32PYstBBCkV5Hv_t=EZWaxYi7B4sJnHQ@mail.gmail.com>
From: Igor Malyushkin <gmalyushkin@gmail.com>
Date: Tue, 19 Mar 2024 00:02:42 +0400
Message-ID: <CAEfhRry-yfiNN7RovrzMHvyHmMXuSkwfMNRTJ6QxA5uf+7Zf=w@mail.gmail.com>
To: Robert Raszuk <robert@raszuk.net>
Cc: Kaliraj Vairavakkalai <kaliraj=40juniper.net@dmarc.ietf.org>, "idr@ietf. org" <idr@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000d72dfe0613f4d77f"
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/vnYlovEIwHkxFq_CcL-RiNFD2yE>
Subject: Re: [Idr] I-D Action: draft-ietf-idr-bgp-fwd-rr-02.txt
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/idr/>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 18 Mar 2024 20:02:56 -0000

Hi Robert,

I was discussing a more general view of the protection topic. Imagine you
have a ladder topology (i.e., a square in a region; see RFC5439, Figure 5,
for an example) and each ABR has a path only to its directly connected ASBR.
Such topologies are not rare, especially in SP cores. There are probably
other examples, but I have none at hand at the moment.

Another point is that we aren't talking about some potential solution: many
network devices already ship with this. Whether to use it or not is a
decision for the network designer. This is all I wanted to state at the
very beginning.

Speaking of the topology from the draft, if ABR23 has a path via P29, that
path should be preferred as the protection path because of the metrics. I
agree with you: if we do not account for a double failure (which planning
teams rarely do, in my experience), we do not need a protection path
between ABRs.
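
To make the metric argument concrete, here is a minimal sketch of how an
ABR could rank candidate protection paths by IGP metric. ABR23, P29, ASBR21
and ASBR22 come from the thread; "peer-ABR", the Path structure, the
function name and every metric value are assumptions invented for
illustration, not taken from the draft or from any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str    # BGP next hop of the path
    via: str         # how the next hop is reached (illustrative only)
    igp_metric: int  # IGP metric to the next hop; values are invented


def pick_primary_and_backup(paths):
    """Return (primary, backup): the lowest-metric path and the next
    lowest-metric path that uses a different next hop (or None)."""
    ordered = sorted(paths, key=lambda p: p.igp_metric)
    if not ordered:
        return None, None
    primary = ordered[0]
    backup = next(
        (p for p in ordered[1:] if p.next_hop != primary.next_hop), None
    )
    return primary, backup


# Topology from the draft discussion: ABR23 also reaches ASBR22 via P29.
with_p29 = [
    Path("ASBR21", via="direct", igp_metric=10),
    Path("ASBR22", via="P29", igp_metric=30),
    Path("peer-ABR", via="inter-ABR link", igp_metric=100),
]

# Ladder topology: ABR23 only has a path to its directly connected ASBR.
ladder = [
    Path("ASBR21", via="direct", igp_metric=10),
    Path("peer-ABR", via="inter-ABR link", igp_metric=100),
]

primary_a, backup_a = pick_primary_and_backup(with_p29)
primary_b, backup_b = pick_primary_and_backup(ladder)
print(primary_a.next_hop, backup_a.next_hop)
print(primary_b.next_hop, backup_b.next_hop)
```

With these made-up metrics the first case never selects the inter-ABR link
for protection, while in the ladder case it is the only alternative left.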


On Mon, 18 Mar 2024 at 23:35, Robert Raszuk <robert@raszuk.net> wrote:

> Igor,
>
> To hopefully put a final note here, please observe that in a well-designed
> network each ABR will be receiving at least two paths with different next
> hops. In the discussed case those will be coming via RR27 from ASBR21 and
> ASBR22.
>
> So when you lose such a next hop from region-2 you:
>
> a) already have another path via a different ASBR
>
> b) it is highly likely that any of the region-2 ASBRs going down will go
> down from both ABRs at the same time.
>
> IMO creating a repair path between ABRs is unnecessary. Keep in mind that
> the NH metric to any remote ASBR (via any path in region-2) MUST be lower
> than the metric to the peer ABR's next hop. This means that any local
> protection will use paths from region-2, not such an ABR-to-ABR link.
>
> Thx a lot,
> R.
>
>
>
> On Mon, Mar 18, 2024 at 5:34 PM Igor Malyushkin <gmalyushkin@gmail.com>
> wrote:
>
>>
>>
>> On Mon, 18 Mar 2024 at 20:11, Robert Raszuk <robert@raszuk.net> wrote:
>>
>>> Hi Igor,
>>>
>>>
>>>>> This is crucial to my point. The local "repair" action is really not a
>>>>> repair. When you lose your sessions on primary ABR1 to region-2 (or you
>>>>> get withdraws, etc.) you need to re-run local (on said ABR1) best path as
>>>>> there is no other trigger to simply activate bulk switchover in data plane
>>>>> to suddenly go to ABR2. This is the key.
>>>>>
>>>> [IM] I see at least three such triggers.
>>>>
>>>
>>> Ahhh, see, the crux of the matter is that none of those listed below is
>>> applicable to support bulk data plane switchover to the backup ABR :(
>>>
>> [IM] Generally, all infrastructure routes have several possible next-hop
>> addresses. We monitor the availability of the next hops (and LSPs to them),
>> not prefixes. So, the "bulkiness" here is about our reaction to a next-hop
>> failure. This next hop is a part of many NHLFEs. But modern gear has lots
>> of optimizations with pointers and so on to make it almost instant.
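
The pointer optimization mentioned above can be sketched roughly as
follows: many prefixes reference one shared next-hop object, each prefix
keeps a pre-installed label per candidate next hop, and a failure flips a
single flag on the shared object. The class names, label values and
prefixes are hypothetical; this is an illustration of the idea, not any
vendor's actual forwarding structure.

```python
class NextHopGroup:
    """Shared object: a primary next hop plus a pre-installed backup."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.primary_up = True

    def active(self):
        # Every prefix pointing at this group sees the change at once.
        return self.primary if self.primary_up else self.backup


class Fib:
    """Toy forwarding table: prefix -> (shared group, label per next hop)."""

    def __init__(self):
        self.routes = {}

    def install(self, prefix, group, labels):
        self.routes[prefix] = (group, labels)

    def lookup(self, prefix):
        group, labels = self.routes[prefix]
        next_hop = group.active()
        # The backup label was installed in advance, so no per-prefix
        # best-path run is needed at the moment of failure.
        return next_hop, labels[next_hop]


group = NextHopGroup(primary="ASBR21", backup="ASBR22")
fib = Fib()
for i in range(3):
    # Each prefix carries a different (invented) label per next hop.
    fib.install(f"10.0.{i}.0/24", group,
                {"ASBR21": 1000 + i, "ASBR22": 2000 + i})

before = [fib.lookup(p) for p in sorted(fib.routes)]
group.primary_up = False  # single write on next-hop or LSP failure
after = [fib.lookup(p) for p in sorted(fib.routes)]
print(before)
print(after)
```

The control plane can still recompute best paths prefix by prefix
afterwards; the sketch only covers the interim data plane protection.
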
>>
>>>
>>>> An outgoing Region 2-facing interface failure.
>>>>
>>>
>>> When you lose an interface and the ABR is (and should be) connected via at
>>> least two interfaces to each region, your IBGP sessions are intact.
>>>
>> [IM] Yes, and it is always a good option to have the second interface as
>> a backup. But it is the same next-hop protection technique, just at another
>> (upper) layer of the hierarchy. Modern devices have several layers of
>> next hops and can react to different failures.
>>
>>>
>>> And even if this is a single interface and you invent a new harmful knob
>>> to invalidate IBGP sessions when interface X goes down (this is common
>>> practice on directly connected EBGP sessions only) you still have a zoo of
>>> prefixes received over those IBGP sessions and each may have been
>>> advertised with a different label.
>>>
>>
>>> So you still need to run full best path and pick a new (now backup) path
>>> on a per prefix basis and install one by one in RIB/FIB/LFIB.
>>>
>>
>>>
>>>> An IGP event in Region 2 with a remote ASBR failure.
>>>>
>>>
>>> Ok, you can invalidate a remote next hop. Sure. But that means, as you
>>> have observed already, that each route may be incoming with a different
>>> label, so you need to pick an alternative label prefix by prefix and
>>> install it in RIB/FIB/LFIB locally.
>>>
>> [IM] Actually, this is a parallel process: we can protect our labeled
>> traffic during the calculation of the new best paths.
>>
>>>
>>>
>>>
>>>> An RSVP event of underlying Region 2's LSP failure.
>>>>
>>>
>>> Wow ... now we see a soft state protocol thrown into this mix. And
>>> when an RSVP-TE LSP to a remote next hop in region-2 goes down you want to
>>> consider this a trigger to invalidate the given next hop? Ok, but we are
>>> back to #2 above. Still one by one ...
>>>
>> [IM] Sure! Most modern SP-ready devices react to an LSP failure.
>> Once I tested an IOS device and it didn't react to an LSP failure at all
>> because it had a route toward the NH. Horrible...
>>
>>>
>>>
>>>> All of them describe some group of prefixes and can influence the
>>>> switchover to their backups.
>>>>
>>>
>>> That would be true for vanilla IPv4/IPv6 SAFI 1. Not for SAFI 4 or SAFI
>>> 76.
>>>
>> [IM] I didn't get it, sorry.
>>
>>>
>>> Cheers,
>>> R.
>>>
>>>