Re: [Lsr] https://tools.ietf.org/html/draft-wang-lsr-prefix-unreachable-annoucement-05

Tony Przygienda <tonysietf@gmail.com> Tue, 09 March 2021 11:13 UTC

References: <22FDE3EA-B5D1-4E4D-B698-1D79173E8637@tony.li> <6E0281D2-7755-499A-B084-CA8472949683@chinatelecom.cn> <D6B0D95F-68AD-4A18-B98C-69835E8B149B@tony.li> <018801d71499$9890feb0$c9b2fc10$@tsinghua.org.cn> <CABNhwV2SpcDcm-s-WkWPpnVLpYB2nZGz2Yv0SfZah+-k=bGx4A@mail.gmail.com> <BFB3CE24-446A-4ADA-96ED-9CF876EA6A00@tony.li> <CAOj+MMGeR4bodbgpPqDCtLZD6XmX6fkjyxLWZAKa4LC2R1tBzg@mail.gmail.com> <ecf2e8b4-fdae-def6-1a29-ec1ae37f5811@cisco.com> <CAOj+MMFSEqVkM62TDAc6yn19Hup+v-9w=kiq_q6dVn39LcOkqQ@mail.gmail.com>
In-Reply-To: <CAOj+MMFSEqVkM62TDAc6yn19Hup+v-9w=kiq_q6dVn39LcOkqQ@mail.gmail.com>
From: Tony Przygienda <tonysietf@gmail.com>
Date: Tue, 9 Mar 2021 12:12:39 +0100
Message-ID: <CA+wi2hNVfbh+F6K=02GDKpmV_0MAMnyT8d_GhF3YE6ouksXHcA@mail.gmail.com>
To: Robert Raszuk <robert@raszuk.net>
Cc: Peter Psenak <ppsenak@cisco.com>, Gyan Mishra <hayabusagsm@gmail.com>, Aijun Wang <wangaijun@tsinghua.org.cn>, Aijun Wang <wangaj3@chinatelecom.cn>, Tony Li <tony.li@tony.li>, lsr <lsr@ietf.org>, "Acee Lindem (acee)" <acee@cisco.com>, draft-wang-lsr-prefix-unreachable-annoucement <draft-wang-lsr-prefix-unreachable-annoucement@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/urk_EIm0cE4TWxIB4_mDbAknR1o>
Subject: Re: [Lsr] https://tools.ietf.org/html/draft-wang-lsr-prefix-unreachable-annoucement-05

I know it's not fashionable (yet), but put multipoint BFD on BIER, run 2
subdomains, and you've got 10K. Add subdomains to taste, which will also let
you partition it across chips easily. Yep, it needs silicon that will sustain
reasonable rates, but you have a pretty darn good solution. IGP gives you
reachability for BIER already, you get minimal replication, and you can tune
your timers to your heart's delight.
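
To put rough numbers on the scaling argument, here is a minimal back-of-the-envelope sketch. It assumes 10K PEs, 2 route reflectors for the RR-based option quoted below, and one multipoint BFD head per PE; the function names and the RR count are illustrative assumptions, not anything taken from a draft or an implementation.

```python
# Rough session counts for PE liveness detection at scale.
# All names and the choice of 2 RRs are illustrative assumptions.

def full_mesh_sessions(num_pes: int) -> int:
    """Point-to-point BFD between every pair of PEs."""
    return num_pes * (num_pes - 1) // 2


def rr_sessions(num_pes: int, num_rrs: int) -> int:
    """One BFD session per RR-to-PE BGP session."""
    return num_pes * num_rrs


def multipoint_head_sessions(num_pes: int) -> int:
    """One multipoint BFD head per PE (assumption).

    Each tail still keeps state per head it listens to, but every
    head transmits a single stream that the network replicates.
    """
    return num_pes


num_pes = 10_000
print("full mesh:       ", full_mesh_sessions(num_pes))
print("RR to PE (2 RRs):", rr_sessions(num_pes, 2))
print("multipoint heads:", multipoint_head_sessions(num_pes))
```

The full-mesh figure grows quadratically with the number of PEs, while the other two grow linearly, which is the whole disagreement in one line.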

Sticking stuff in IGP (to second Tony #1) is very satisfying, especially
since some of that could work some of the time at no load on the IGP. All
those "let's add a million things to IGP" ideas only catch you when you
realize that in a serious outage your IGP is busy figuring out/flooding junk
rather than getting you basic connectivity. Yes, good implementation
technique and careful design of the protocol can help (e.g. take ISIS
extensions that give you a large LSP space where you know the priority of
the things that need to go out/come in/be computed; here I sympathize with
the new idea of a separate instance for "junk hauling", BTW ;-) but mashing
overlay into the underlay, your most time sensitive and delicate piece of
network control, to use it as an overlay signalling protocol does not have a
promising history. Compounding the whole thing by adding a route type as a
signalling means is a bit of injury on top of insult, or vice versa.

-- tony

On Tue, Mar 9, 2021 at 12:03 PM Robert Raszuk <robert@raszuk.net> wrote:

> Hey Peter,
>
> Well ok so let's forget about LDP - cool !
>
> So IGP sends the summary around and that is all that is needed.
>
> So the question is: why not propagate the information that a PE went down
> in service signalling - today mainly BGP?
>
> >   And forget BFD, does not scale with 10k PEs.
>
> You missed the point. No one is proposing full mesh of BFD sessions
> between all PEs. I hope so at least.
>
> PE is connected to RRs so you need as many BFD sessions as RR to PE BGP
> sessions. Once that session is brought down RR has all it needs to trigger
> a message (withdraw or implicit withdraw) to remove the broken service
> routes in a scalable way.
>
> Thx,
> R.
>
> PS. Yes, we still need to start supporting signalling of unreachability in
> BGP itself when BGP is used for underlay, but this is a bit different use
> case and outside of the scope of LSR.
>
>
> On Tue, Mar 9, 2021 at 11:55 AM Peter Psenak <ppsenak@cisco.com> wrote:
>
>> Robert,
>>
>> On 09/03/2021 11:47, Robert Raszuk wrote:
>> >  > You’re trying to fix a problem in the overlay by morphing the
>> > underlay.  How can that seem like a good idea?
>> >
>> > I think this really nails this discussion.
>> >
>> > We have discussed this before and while the concept of signalling
>> > unreachability does seem useful such signalling should be done where it
>> > belongs.
>> >
>> > Here clearly we are talking about faster connectivity restoration for
>> > overlay services so it naturally belongs in overlay.
>> >
>> > It could be a bit misleading, as today it is the underlay which
>> > propagates reachability of PEs and the overlay relies on it. And to
>> > scale, summarization is used, hence in the underlay failing remote PEs
>> > remain reachable. That however, in spite of many efforts, is really not
>> > the practical problem in lots of networks, as those networks still rely
>> > on exact match of IGP to LDP FEC when MPLS is used. So removal of /32
>> > can and does happen.
>>
>> think SRv6, forget /32 or /128 removal. Think summarization.
>>
>> I'm not necessarily advocating the solution proposed in this particular
>> draft, but the problem is valid. We need fast detection of the PE loss.
>>
>> And forget BFD, does not scale with 10k PEs.
>>
>> thanks,
>> Peter
>>
>>
>>
>> >
>> > At the same time BGP can pretty quickly (milliseconds) remove affected
>> > service routes (or rather paths), hence connectivity can be restored to
>> > redundantly connected endpoints in under a second. Such removal can be
>> > in the form of an atomic withdraw (or readvertisement), removal of
>> > recursive routes (next hop going down) or withdrawal of a few RD/64
>> > prefixes.
>> >
>> > I am not convinced and I have not seen any evidence that if we put this
>> > into IGP it will be any faster across areas or domains (case of
>> > redistribution over ASBRs to and from IGP to BGP). One thing for sure -
>> > it will be much more complex to troubleshoot.
>> >
>> > Thx,
>> > R.
>> >
>> > On Tue, Mar 9, 2021 at 5:39 AM Tony Li <tony.li@tony.li> wrote:
>> >
>> >
>> >     Hi Gyan,
>> >
>> >      >     Gyan> In previous threads BFD multi hop has been mentioned to
>> >     track IGP liveness, but that gets way overly complicated, especially
>> >     with large domains, and is not viable.
>> >
>> >
>> >     This is not tracking IGP liveness, this is to track BGP endpoint
>> >     liveness.
>> >
>> >     Here in 2021, we seem to have (finally) discovered that we can
>> >     automate our management plane. This ameliorates a great deal of
>> >     complexity.
>> >
>> >
>> >      >     Gyan> As we are trying to signal the IGP to trigger the
>> >     control plane convergence, the flooding machinery in the IGP already
>> >     exists, as well as the prefix originator sub TLV from the link or
>> >     node failure.  IGP seems to be the perfect mechanism for the control
>> >     plane signaling switchover.
>> >
>> >
>> >     You’re trying to fix a problem in the overlay by morphing the
>> >     underlay.  How can that seem like a good idea?
>> >
>> >
>> >      >       Gyan> As I mentioned, flooding the longer prefix defeats
>> >     the purpose of summarization.
>> >
>> >
>> >     PUA also defeats summarization.  If you really insist on faster
>> >     convergence and not building a sufficiently redundant topology, then
>> >     yes, your area will partition and you will have to pay the price of
>> >     additional state for your longer prefixes.
>> >
>> >
>> >      > In order to do what you are stating you have to remove the
>> >     summarization and go back to domain wide flooding
>> >
>> >
>> >     No, I’m suggesting you maintain the summary and ALSO advertise the
>> >     longer prefix that you feel is essential to reroute immediately.
>> >
>> >
>> >      > which completely defeats the goal of the draft which is to make
>> >     host route summarization viable for operators.  We know the prefix
>> >     that went down and that is why with the PUA negative advertisement
>> >     we can easily flood a null0 to block the control plane from
>> >     installing the route.
>> >
>> >
>> >     So you can also advertise the more specific from the connected ABR…
>> >
>> >
>> >      > We don’t have any prior knowledge of the alternate for the egress
>> >     PE BGP next hop attribute for the customer VPN overlay.  So the only
>> >     way to accomplish what you are asking is to not do any summarization
>> >     and flood all host routes.  Of course, as I stated, that defeats the
>> >     purpose of the draft.
>> >
>> >
>> >     Please read again.
>> >
>> >     Tony
>> >
>> >     _______________________________________________
>> >     Lsr mailing list
>> >     Lsr@ietf.org
>> >     https://www.ietf.org/mailman/listinfo/lsr
>> >
>>
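
Tony Li's suggestion quoted above (keep the summary and ALSO advertise the longer prefix from the still-connected ABR) relies on ordinary longest-prefix match. A minimal sketch of that behaviour follows, assuming a hypothetical 10.1.0.0/16 summary, a hypothetical PE loopback 10.1.0.7, and made-up ABR names; none of these values come from the thread or the draft.

```python
import ipaddress

# Longest-prefix-match lookup over a tiny routing table.
# Prefixes, addresses and next-hop names are hypothetical.

def longest_prefix_match(rib, destination):
    """Return the next hop of the most specific covering prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in rib.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The route with the greatest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Only the summary is advertised: every PE loopback resolves via ABR-1.
rib = {"10.1.0.0/16": "ABR-1"}
print(longest_prefix_match(rib, "10.1.0.7"))

# The area partitions; ABR-2 still reaches 10.1.0.7 and advertises the
# more specific route alongside the unchanged summary.
rib["10.1.0.7/32"] = "ABR-2"
print(longest_prefix_match(rib, "10.1.0.7"))  # now the /32 is preferred
print(longest_prefix_match(rib, "10.1.0.8"))  # other PEs still use the summary
```

The extra state is one longer prefix per affected PE, which is the price Tony Li refers to; the summary itself stays in place for everything else.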