Re: [Lsr] Thoughts about PUAs - are we not over-engineering?

Peter Psenak <ppsenak@cisco.com> Wed, 15 June 2022 09:02 UTC

Message-ID: <3aec3021-2f80-e227-56cc-ba477fb3a251@cisco.com>
Date: Wed, 15 Jun 2022 11:02:41 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.9.1
Content-Language: en-US
To: "Van De Velde, Gunter (Nokia - BE/Antwerp)" <gunter.van_de_velde@nokia.com>, lsr <lsr@ietf.org>
Cc: "draft-ppsenak-lsr-igp-ureach-prefix-announce@ietf.org" <draft-ppsenak-lsr-igp-ureach-prefix-announce@ietf.org>, "draft-wang-lsr-prefix-unreachable@ietf.org" <draft-wang-lsr-prefix-unreachable@ietf.org>
References: <AM0PR07MB63863359D147F9EC0FF67689E0AA9@AM0PR07MB6386.eurprd07.prod.outlook.com> <16e06718-542f-e266-05fd-a1822bc4fd49@cisco.com> <AM0PR07MB6386AD4F6970AA87A9151E6DE0AD9@AM0PR07MB6386.eurprd07.prod.outlook.com>
From: Peter Psenak <ppsenak@cisco.com>
In-Reply-To: <AM0PR07MB6386AD4F6970AA87A9151E6DE0AD9@AM0PR07MB6386.eurprd07.prod.outlook.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
X-Outbound-SMTP-Client: 10.147.24.42, [10.147.24.42]
X-Outbound-Node: aer-core-4.cisco.com
Archived-At: <https://mailarchive.ietf.org/arch/msg/lsr/1s9dC-z7I5Hq3HDG55qi7ixxm6w>
Subject: Re: [Lsr] Thoughts about PUAs - are we not over-engineering?
X-BeenThere: lsr@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Link State Routing Working Group <lsr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/lsr>, <mailto:lsr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/lsr/>
List-Post: <mailto:lsr@ietf.org>
List-Help: <mailto:lsr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/lsr>, <mailto:lsr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Jun 2022 09:02:52 -0000

Hi Gunter,

please see inline:


On 15/06/2022 10:38, Van De Velde, Gunter (Nokia - BE/Antwerp) wrote:
> Hi Peter, All,
> 
> From a BGP perspective (PE service nodes), event detection when a transport tunnel end-point suddenly becomes unreachable is an operational problem. I think we all agree.
> This problem exists in any multi-domain network, and is not limited to a multi-area/level IGP with summarization. Hence my doubts that simple encodings using the IGP as an API for unreachability signaling are an optimal solution.

we are solving the problem for inter-area and/or inter-domain IGP 
networks. There are plenty of them.

> 
> Churning the LSDB for these things doesn't seem right.  Would this mean that we hack the IGP implementation so we don't trigger SPFs on rx of these updates?

I would not call adding a UPA announcement for a very rare event 
churning the LSDB. I really do not see the problem there.

A UPA is a prefix advertisement with an unreachable metric. Because the
prefix was never advertised with a valid metric before (due to
summarization), not even a PRC is required.
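
A minimal sketch of the point above, assuming IS-IS wide metrics, where
RFC 5305 excludes any prefix with a metric above MAX_PATH_METRIC
(0xFE000000) from normal SPF. The function names and the "installed" set
are hypothetical and are not taken from either draft.

```python
from typing import Set

# RFC 5305: a prefix advertised with a metric larger than this value
# must not be considered during the normal SPF computation.
MAX_PATH_METRIC = 0xFE000000


def is_unreachable(metric: int) -> bool:
    """True if the advertised metric marks the prefix as unreachable."""
    return metric > MAX_PATH_METRIC


def needs_prc(prefix: str, metric: int, installed: Set[str]) -> bool:
    """Hypothetical check: does this advertisement change the route table?

    'installed' stands for the prefixes that currently have a route with
    a valid metric (an assumption made for this sketch).
    """
    if is_unreachable(metric):
        # Behind a summary the component prefix was never installed,
        # so there is nothing to withdraw or recompute.
        return prefix in installed
    # Simplification: any reachable advertisement triggers recomputation.
    return True
```

With only the summary 10.1.0.0/16 installed, an unreachable advertisement
for 10.1.1.1/32 does not require a recalculation in this model.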

> Another concern is how we hook into BGP sideways to update it. Typically a router just looks at RTM and tunnel-tables for reachability. Now it would have to check a separate bypass-list all the time.

that is a matter of implementation.

> What about the pseudo-state. On startup I would imagine we would have to originate this PUA until a certain point?


A UPA is advertised only when a component prefix of the summary that was
reachable in its source area/domain becomes unreachable. Nothing is sent
on startup.
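
The origination rule above can be sketched as a small state tracker on
the ABR/ASBR. This is an illustration under stated assumptions, not the
procedure from the draft: the class name and the update() interface are
hypothetical.

```python
class ComponentTracker:
    """Tracks component prefixes of a summary seen as reachable (sketch)."""

    def __init__(self) -> None:
        # Empty at startup: no prefix has been observed as reachable yet.
        self._reachable = set()

    def update(self, prefix: str, reachable: bool) -> bool:
        """Record the new state; return True if a UPA should be originated."""
        was_reachable = prefix in self._reachable
        if reachable:
            self._reachable.add(prefix)
            return False
        self._reachable.discard(prefix)
        # Only a reachable -> unreachable transition produces a UPA.
        return was_reachable
```

A prefix that is reported unreachable without ever having been reachable,
as at startup, produces no UPA in this model.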

> 
> Some consideration about installing the PUA route as a blackhole route: it does not seem to be an option, because resolution of BGP next-hops with blackhole /32 routes has to continue to mean “drop” matching traffic, given the widespread way this is used for DDoS protection. So there is a need for another “install” type for the “unreachable” IGP prefix, which does not exist yet.

again, UPA processing is a matter of the implementation and is out of
the scope of the draft. All you need to do is to trigger BGP PIC for
destinations that use the UPA prefix as their NH. It isn't that hard.
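
One way an implementation might do this is sketched below. It assumes
each BGP path already carries a precomputed backup next hop, as BGP PIC
requires; BgpPath and on_upa() are hypothetical names for this example
and do not describe any particular router's internals.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BgpPath:
    prefix: str                      # BGP destination (service prefix)
    active_nh: str                   # next hop currently in use
    backup_nh: Optional[str] = None  # precomputed backup next hop, if any


def on_upa(upa_prefix: str, paths: List[BgpPath]) -> List[str]:
    """Switch paths whose active next hop is covered by the UPA prefix.

    Returns the BGP prefixes that were moved to their backup next hop.
    Paths without a backup are left untouched in this sketch.
    """
    unreachable = ipaddress.ip_network(upa_prefix)
    switched = []
    for path in paths:
        covered = ipaddress.ip_address(path.active_nh) in unreachable
        if covered and path.backup_nh is not None:
            path.active_nh = path.backup_nh
            switched.append(path.prefix)
    return switched
```

The lookup is keyed on the next hop only, so the routing table entry for
the summary itself is not modified.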

> 
> Making IGP-based prefix-unreachability signaling successful PE-to-PE does not seem a trivial task, and involves more than simplistically dumping opaque link-state messages into the IGP and re-vectoring interior routing as an API. I'm a bit tormented regarding the potential evil caused to the IGP by signaling prefix unreachability. It may not be worth the effort, especially when realizing that the problem space is not limited to multi-area/level summarization but instead exists in any multi-domain network.

once you implement it, you realize that it was not that hard at all.

thanks,
Peter


> 
> Maybe IETF should consider looking at the bigger picture, at service level, and document a full service level solution framework instead of looking only at IGP in atomic fashion.
> 
> G/
> 
> -----Original Message-----
> From: Peter Psenak <ppsenak@cisco.com>
> Sent: Tuesday, June 14, 2022 5:46 PM
> To: Van De Velde, Gunter (Nokia - BE/Antwerp) <gunter.van_de_velde@nokia.com>; lsr <lsr@ietf.org>
> Cc: draft-ppsenak-lsr-igp-ureach-prefix-announce@ietf.org; draft-wang-lsr-prefix-unreachable-annoucement <draft-wang-lsr-prefix-unreachable-annoucement@ietf.org>
> Subject: Re: Thoughts about PUAs - are we not over-engineering?
> 
> Hi Gunter,
> 
> please see inline:
> 
> On 14/06/2022 10:59, Van De Velde, Gunter (Nokia - BE/Antwerp) wrote:
>> Hi All,
>>
>> When reading both proposals about PUA's:
>> * draft-ppsenak-lsr-igp-ureach-prefix-announce-00
>> * draft-wang-lsr-prefix-unreachable-annoucement-09
>>
>> The identified problem space seems a correct observation, and indeed summaries hide remote area network instabilities. It is one of the perceived benefits of using summaries. The place in the network where this hiding has the most impact upon convergence is at service nodes (PEs for L3/L2/transport) where, due to the summarization, it is difficult to detect that the transport tunnel end-point suddenly becomes unreachable. My concern, however, is whether it really is a problem that is worth the LSR WG solving.
> 
> the request to address the problem is coming from the field. The scale of the networks in the field is growing significantly and the summarization is being implemented to keep the prefix scale under control.
> 
> 
>>
>> To me the "draft draft-wang-lsr-prefix-unreachable-annoucement-09" is
>> not a preferred solution due to the expectation that all nodes in an
>> area must be upgraded to support the IGP capability. From this
>> operational perspective the draft
>> "draft-ppsenak-lsr-igp-ureach-prefix-announce-00" is more elegant, as
>> only the A(S)BR's and particular PEs must be upgraded to support
>> PUA's. I do have concerns about the number of PUA advertisements in
>> hierarchically summarized networks (/24 (site) -> /20 (region) -> /16
>> (core)). More specific, in the /16 backbone area, how many of these
>> PUAs will be floating around creating LSP LSDB update churns? How to
>> control the potentially exponential number of observed PUAs from
>> floating everywhere? (will this lead to OSPF type NSSA areas where
>> areas will be purged from these PUAs for scaling stability?)
> 
> A node going down is a rare event. The expected number of UPAs at any given time is very small. Implementations can limit the number of UPAs on the ABR/ASBR in case of a catastrophic event, in which case the UPAs would hardly help anyway.
> 
>>
>> Long story short, should we not take a step back and re-think this identified problem space? Is the proposed solution space not more evil than the problem space? We do summarization because it brings stability and reduces the number of link state updates within an area. And now with PUA we re-introduce additional link state updates (PUAs), and we blow up the LSDB with information opaque to SPF best-path calculation. In addition there is a suggestion of new state-machinery to track the IGP reachability of 'protected' prefixes, and there is maybe a desire to contain or filter updates across inter-area boundaries. And finally, how will we represent and track PUA in the RTM?
> 
> the problem space is valid, as confirmed by the field. As described
> above, the number of UPAs will be low, so there is no danger of
> defeating the purpose of the summarization.
> 
>>
>> What is wrong with simply not doing summaries and forgetting about these PUAs to pinch holes in the summary prefixes? This worked very well during the last two decades. Are we not over-engineering with PUAs?
> 
> it's the scale of the current networks, which is growing exponentially,
> which demands the use of the summarization.
> 
> 
> thanks,
> Peter
> 
>>
>> G/
>>
> 
>