Re: [Idr] I-D Action: draft-ietf-idr-bgp-fwd-rr-02.txt

Robert Raszuk <robert@raszuk.net> Mon, 18 March 2024 14:01 UTC

MIME-Version: 1.0
References: <171065415177.59997.7631576612994148063@ietfa.amsl.com> <CAOj+MMEsp_UfuiHdc4U_Bv5o7xsYYK_RryusUZ88u+SH9xifSA@mail.gmail.com> <SJ0PR05MB86322B34D635E7F221C04C0FA22D2@SJ0PR05MB8632.namprd05.prod.outlook.com> <CAEfhRrxDVi_Yw2wtTWiGzjw4pQ-8TF-48UCY5AUxMpKdrbexZw@mail.gmail.com> <CAOj+MMHNAz741WP9Pf2UCSOQRj6YepFh=Q4tzedmwBCm6e289g@mail.gmail.com> <CAEfhRryMW9nyWfnDQdi+R5g-nypg5ppwFy_Gdf71pRMFmZysHA@mail.gmail.com> <CAOj+MMF2-oqZ29hSBgaO+gzYXXyvCRgJ0m-zW2K7CWattCpgrQ@mail.gmail.com> <CAEfhRrw=acXDgVtzUEhqxZcOPYbJwT0Ha36k-ADgZaiy863erg@mail.gmail.com>
In-Reply-To: <CAEfhRrw=acXDgVtzUEhqxZcOPYbJwT0Ha36k-ADgZaiy863erg@mail.gmail.com>
From: Robert Raszuk <robert@raszuk.net>
Date: Mon, 18 Mar 2024 15:00:37 +0100
Message-ID: <CAOj+MMGnup=Qg7+XiyXtCJe0a2csmdFVVLtJj_TmoA4yysbghw@mail.gmail.com>
To: Igor Malyushkin <gmalyushkin@gmail.com>
Cc: Kaliraj Vairavakkalai <kaliraj=40juniper.net@dmarc.ietf.org>, "idr@ietf. org" <idr@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000e4019b0613efc8d3"
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/Lr22vavTkn52pGenfuMa5bUTZTY>
Subject: Re: [Idr] I-D Action: draft-ietf-idr-bgp-fwd-rr-02.txt

Igor,

I think you are forgetting that those ABRs set next hop self. When you do
that, there is no more label SWAP operation on the transport label. The only
choice you have is whether or not to do PHP on the penultimate hop.

Thinking more about this entire model there is one more serious concern.

Because the ABRs set next hop self, the metric of the link between those ABRs
MUST be higher than any cumulative next-hop metric in region-2. If it is not,
then even in steady state an ABR will select its peer ABR's paths as best,
since this is all IBGP here, and with IBGP it is not unusual to get down to
the next-hop metric tie-break in BGP best path selection.
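
To make the concern concrete, here is a minimal sketch of that tie-break, assuming
a heavily simplified decision process (LOCAL_PREF, AS_PATH length, then IGP metric
to the next hop). The names Path, best_path and abr1_choice, and the metric values
used below, are hypothetical and only serve as illustration; they are not taken from
the draft or from any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    learned_from: str          # IBGP speaker that advertised the path (hypothetical label)
    next_hop: str              # BGP next hop carried in the path
    local_pref: int = 100
    as_path_len: int = 0
    igp_metric_to_nh: int = 0  # cumulative IGP metric to reach next_hop


def best_path(paths):
    # Simplified ordering: higher LOCAL_PREF, then shorter AS_PATH,
    # then lower IGP metric to the next hop. Real implementations
    # evaluate more steps (ORIGIN, MED, router ID, ...).
    return min(paths, key=lambda p: (-p.local_pref, p.as_path_len, p.igp_metric_to_nh))


def abr1_choice(inter_abr_link_metric, region2_nh_metric):
    # Path learned from the original source; its next hop sits in region-2.
    original = Path("region-2 speaker", "region-2 next hop",
                    igp_metric_to_nh=region2_nh_metric)
    # Same route reflected by ABR2 with next hop self, so the next hop
    # is ABR2 and the metric is that of the ABR1-ABR2 link.
    via_peer = Path("ABR2", "ABR2", igp_metric_to_nh=inter_abr_link_metric)
    return best_path([original, via_peer]).learned_from


# ABR-ABR link cheaper than the region-2 next hop: ABR1 prefers ABR2's path.
print(abr1_choice(inter_abr_link_metric=10, region2_nh_metric=30))
# ABR-ABR link more expensive: ABR1 keeps the original path.
print(abr1_choice(inter_abr_link_metric=50, region2_nh_metric=30))
```

With all earlier attributes equal, the outcome depends only on how the inter-ABR
link metric compares with the cumulative metric to the region-2 next hop, which is
why the former has to be the larger of the two.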

The design complexity here, with different CLUSTER_IDs on ABRs acting as
RRs (mutual clients) and next hop self, is really high and
requires very careful planning.

Cheers,
R.





On Mon, Mar 18, 2024 at 1:02 PM Igor Malyushkin <gmalyushkin@gmail.com>
wrote:

> I see this differently. ABR1 has two sets of the same
> infrastructure paths. One set is from the original sources (outside its left
> area), and the other is from ABR2. Imagine that the first set is the best
> one; then ABR1 allocates a label for every prefix (and its label) from that
> set and distributes them as transport prefixes toward its right area (and
> to ABR2 too). Effectively, this makes ABR1 an LSR, because it performs a
> SWAP for any incoming label. With the per-prefix label allocation mode, it
> is possible to compile these SWAPs with more than one outgoing label.
> Since we also have the second set of the same paths from ABR2, we can use
> its labels as a backup. So there is egress PIC for such labels.
>
> Maybe I confused you because I talked about routes rather than labels. My
> bad.
>
> To the authors,
>
> AS2 is further divided into two regions. There are three tunnel domains in
> provider's network: The two regions in AS1 use RSVP intra-domain tunnel.
> AS2 also uses RSVP-TE intra-domain tunnels. MPLS forwarding is used within
> these domains and on inter-domain links. BGP LU (AFI/SAFI: 1/4) is the
> transport family providing reachability between PE loopbacks PE25 and
> PE11.
>
> I see a subtle mistake here. AS1 does not have two regions that could use
> RSVP LSPs; it should probably be AS2.
>
> On Mon, 18 Mar 2024 at 15:21, Robert Raszuk <robert@raszuk.net> wrote:
>
>> Hi Igor,
>>
>> On Mon, Mar 18, 2024 at 12:13 PM Igor Malyushkin <gmalyushkin@gmail.com>
>> wrote:
>>
>>> Well, maybe there is some gap in terminology. I always considered this
>>> behavior as a PIC, because we can switch between the next hops without any
>>> dependency on the number of prefixes above. An egress characteristic here
>>> is that it happens on a failed next-hop node (an ingress is not aware
>>> at the moment or is just starting to react). But we can find a better name
>>> for this to avoid confusion.
>>>
>>
>> I disagree.
>>
>> If you zoom into this specific scenario, the described situation is that,
>> say, ABR1 loses (all or some) IBGP sessions outside its left area. Over
>> those session(s) it may have received lots of infrastructure routes with lots
>> of next hops.
>>
>> So here it needs to run best path and install all routes one by one into
>> the RIB and FIB, now pointing towards the peer ABR2.
>>
>> There is no prefix independence here at all. There is no signalling in
>> either IGP or BGP that one next hop is lost and we need to use the other
>> one. That would be possible only on PEs, not on ABRs.
>>
>> So while it is some sort of local protection it is not PIC.
>>
>> Regards,
>> R.
>>
>>
>>
>>> Speaking of the propagation of withdraws: as I've previously
>>> mentioned, traffic may be sent slightly before (a few milliseconds) or right
>>> at the time of a failure. Without "protection" at egress, it will be lost if
>>> the ABRs do not exchange their routes (e.g., because of the same CLUSTER ID).
>>> Another point to consider is that fast propagation depends not only
>>> on the diameter of the BGP network (the number of BGP hops from the source of
>>> the event to all its potential receivers) but also on the situation at
>>> every such hop (e.g., CPU spikes). In other words, it is not constant.
>>>
>>> On Mon, 18 Mar 2024 at 14:54, Robert Raszuk <robert@raszuk.net> wrote:
>>>
>>>> > the egress PIC
>>>>
>>>> Except this is not real egress PIC.
>>>>
>>>> In egress PIC ASBRs or PEs receive EBGP paths and rarely act as RRs.
>>>>
>>>> Here we seem to have a case of option C and IBGP domain where ABRs are
>>>> usually redundantly connected and they learn routes over IBGP from each
>>>> site.
>>>>
>>>> I must admit that I have never seen a real practical analysis of whether, in
>>>> such cases, we should be doing PIC between ABRs acting as RRs, especially
>>>> for infrastructure routes.
>>>>
>>>> And btw, last time I measured, propagating withdraws via good RRs
>>>> took single-digit milliseconds at most.
>>>>
>>>> Cheers,
>>>> R.
>>>>
>>>>
>>>> On Mon, Mar 18, 2024 at 11:24 AM Igor Malyushkin <gmalyushkin@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> AFAIK, egress PIC is a widely deployed feature with labeled paths.
>>>>> One of its characteristics is that it preserves in-flight traffic that was
>>>>> sent right at the time of a failure event or slightly after it. Traffic is
>>>>> almost always faster than any control plane stuff. The significant problem
>>>>> with PIC in this case is a possible temporary loop if a destination node
>>>>> fails, but that is a separate topic.
>>>>>
>>>>> My 2 cents.
>>>>>
>>>>> On Mon, 18 Mar 2024 at 08:40, Kaliraj Vairavakkalai
>>>>> <kaliraj=40juniper.net@dmarc.ietf.org> wrote:
>>>>>
>>>>>> Hi Robert, please see inline. KV>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Kaliraj
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Robert Raszuk <robert@raszuk.net>
>>>>>> *Date: *Sunday, March 17, 2024 at 11:28 PM
>>>>>> *To: *Kaliraj Vairavakkalai <kaliraj@juniper.net>
>>>>>> *Cc: *idr@ietf. org <idr@ietf.org>
>>>>>> *Subject: *Fwd: I-D Action: draft-ietf-idr-bgp-fwd-rr-02.txt
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Kaliraj,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thx for posting the new version.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I have one observation or clarification to be made in respect to text
>>>>>> you added in section 4.1:
>>>>>>
>>>>>>
>>>>>>
>>>>>> > However this approach does not allow the ABR-ABR tunnels to be
>>>>>>
>>>>>> > used as backup path, in the event where an ABR looses all tunnels
>>>>>>
>>>>>> > to upstream ASBR.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So you are talking about the delta time it takes for an ABR which
>>>>>> loses all tunnels to upstream ASBRs to send BGP withdraws for those
>>>>>> learned infrastructure routes, correct?
>>>>>>
>>>>>>
>>>>>>
>>>>>> KV> Yes. Those withdrawals need to happen anyway, and reach both the
>>>>>> KV> ingress PEs and the adjoining/redundant ABR, so that they can do
>>>>>> KV> BGP PIC repair based on that event.
>>>>>>
>>>>>> KV> Here I am saying that such BGP PIC repair can happen only at the
>>>>>> KV> ingress PE (which may be multiple BGP hops away), and not at the
>>>>>> KV> adjoining ABR.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So we are talking 10s of milliseconds here from the moment all such
>>>>>> paths are invalidated (the detection and invalidation are needed in
>>>>>> any scenario).
>>>>>>
>>>>>>
>>>>>>
>>>>>> KV> The BGP update propagation can take longer, based on load on the
>>>>>> KV> BGP propagation path. But BGP PIC itself can’t always guarantee
>>>>>> KV> 10s of ms restoration. It only guarantees restoring the traffic
>>>>>> KV> without depending on service-prefix scale once the unreachability
>>>>>> KV> is detected (in this case: BGP withdrawal is received).
>>>>>>
>>>>>>
>>>>>>
>>>>>> As you have established, each ABR will set next hop self and advertise
>>>>>> routes to local PEs (directly or via yet one more pair of RRs (RR26 here)).
>>>>>> So each PE will already have backup paths; all that you are observing here
>>>>>> is the time before the PEs invalidate the paths advertised by the ASBR
>>>>>> which loses its upstream tunnels.
>>>>>>
>>>>>>
>>>>>>
>>>>>> KV> Agreed.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So if such failure models are really likely to happen (in spite of
>>>>>> redundant ABR connectivity in each area), I would rather focus on fast
>>>>>> removal of broken paths from the network with one next hop invalidation
>>>>>> (single BGP or IGP message, single RIB to FIB switchover on PEs), etc.
>>>>>>
>>>>>>
>>>>>>
>>>>>> KV> As explained above, that would also happen. But it may take
>>>>>> longer than if the repair happened at the ABR, which is closer to the
>>>>>> failure event.
>>>>>>
>>>>>> KV> Just a tradeoff to be aware of. Thx.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thx,
>>>>>> Robert
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ---------
>>>>>> From: <internet-drafts@ietf.org>
>>>>>> Date: Sun, Mar 17, 2024 at 6:42 AM
>>>>>> Subject: I-D Action: draft-ietf-idr-bgp-fwd-rr-02.txt
>>>>>> To: <i-d-announce@ietf.org>
>>>>>> Cc: <idr@ietf.org>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Internet-Draft draft-ietf-idr-bgp-fwd-rr-02.txt is now available. It
>>>>>> is a work item of the Inter-Domain Routing (IDR) WG of the IETF.
>>>>>>
>>>>>>    Title:   BGP Route Reflector with Next Hop Self
>>>>>>    Authors: Kaliraj Vairavakkalai
>>>>>>             Natrajan Venkataraman
>>>>>>    Name:    draft-ietf-idr-bgp-fwd-rr-02.txt
>>>>>>    Pages:   9
>>>>>>    Dates:   2024-03-16
>>>>>>
>>>>>> Abstract:
>>>>>>
>>>>>>    The procedures in BGP Route Reflection (RR) spec RFC4456 primarily
>>>>>>    deal with scenarios where the RR is reflecting BGP routes with next
>>>>>>    hop unchanged.  In some deployments like Inter-AS Option C
>>>>>>    (Section 10, RFC4364), the ABRs may perform RR functionality with
>>>>>>    nexthop set to self.  If adequate precautions are not taken, the
>>>>>>    RFC4456 procedures can result in traffic forwarding loop in such
>>>>>>    deployments.
>>>>>>
>>>>>>    This document illustrates one such looping scenario, and specifies
>>>>>>    approaches to minimize possibility of traffic forwarding loop in such
>>>>>>    deployments.  An example with Inter-AS Option C (Section 10, RFC4364)
>>>>>>    deployment is used, where RR with next hop self is used at redundant
>>>>>>    ABRs when they re-advertise BGP transport family routes between
>>>>>>    multiple IGP domains.
>>>>>>
>>>>>> The IETF datatracker status page for this Internet-Draft is:
>>>>>> https://datatracker.ietf.org/doc/draft-ietf-idr-bgp-fwd-rr/
>>>>>>
>>>>>> There is also an HTMLized version available at:
>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-idr-bgp-fwd-rr-02
>>>>>>
>>>>>> A diff from the previous version is available at:
>>>>>> https://author-tools.ietf.org/iddiff?url2=draft-ietf-idr-bgp-fwd-rr-02
>>>>>>
>>>>>> Internet-Drafts are also available by rsync at:
>>>>>> rsync.ietf.org::internet-drafts
>>>>>>
>>>>>>