Re: [Idr] WG Adoption call for draft-wang-idr-rd-orf-05.txt (2/4/2021 to 2/18/2021)

Aijun Wang <wangaijun@tsinghua.org.cn> Tue, 16 February 2021 00:36 UTC

From: Aijun Wang <wangaijun@tsinghua.org.cn>
Date: Tue, 16 Feb 2021 08:36:04 +0800
Cc: "Jakob Heitz (jheitz)" <jheitz=40cisco.com@dmarc.ietf.org>, Robert Raszuk <robert@raszuk.net>, idr@ietf.org, Susan Hares <shares@ndzh.com>
To: "UTTARO, JAMES" <ju1738@att.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/CCG-5PmObH0nheguHsGUFWupklk>

Hi, Jim:
It is clear that 1 million routes caused an issue that Jakob had to debug.
Isn’t it?

Aijun Wang
China Telecom

> On Feb 16, 2021, at 05:15, UTTARO, JAMES <ju1738@att.com> wrote:
> 
> 
> Here is another fact..
>  
> It is more like 10+ Million without an issue.
>  
> Thanks,
>               Jim Uttaro
>  
> From: Idr <idr-bounces@ietf.org> On Behalf Of Jakob Heitz (jheitz)
> Sent: Monday, February 15, 2021 4:05 PM
> To: Aijun Wang <wangaijun@tsinghua.org.cn>; Robert Raszuk <robert@raszuk.net>
> Cc: idr@ietf.org; Susan Hares <shares@ndzh.com>
> Subject: Re: [Idr] WG Adoption call for draft-wang-idr-rd-orf-05.txt (2/4/2021 to 2/18/2021)
>  
> I asked you for the facts Aijun.
> Show me an outage caused by a PE having to drop incoming routes.
> You have not shown me the facts.
>  
> Here's a fact:
> Some time ago, I was debugging an issue with a customer: a route had gone missing in a VRF.
> The feed from a route reflector included over a million routes.
> I asked them to do a clear bgp soft in to the route reflector.
> They were very hesitant, because it might overload the CPU on the PE.
> "Do it", I said. "Check the InQ and when it goes to zero, it's done".
> Finally, they did it.
> "Check the InQ" I asked them.
> "It's zero. When will it start?" they replied.
> "Check the route. Is it there now?" I asked.
> "Hey, there it is".
> It finished the million routes before they could check the InQ.
> Why so fast?
> Because the BGP receiver had very little inbound policy and did not have to process the incoming routes, except one,
> because they already existed in the BGP table.
> Where we see convergence issues is:
> - when BGP has to download incoming routes to FIB (FIB prioritizes forwarding, not programming of routes)
> - Heavy inbound route-policies (several MEGABYTES of configuration). It takes some time, but it works.
> - Recomputing and changing bestpath and propagating the new paths out to many RR clients.
>  
> Receiving and matching on an RD to drop a route is just so trivial in comparison.
>  
> In your case, the PE has ALREADY received the excess routes. Your rd-orf just tells the source not to send ANY MORE.
> You know what happens then? The source sends a million withdraws that take similar processing power on the receiver.
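
A toy message count can make the argument above concrete. This is only a sketch under the assumption stated in the message: every route advertised before the RD-ORF takes effect is later withdrawn, and a withdraw costs the receiver about as much as an announcement. The function name and parameters are hypothetical.

```python
def receiver_messages(flood_size: int, sent_before_orf: int, use_rd_orf: bool) -> int:
    """Count the route messages the receiving PE must process (toy model)."""
    if not use_rd_orf:
        # Every flooded route arrives once and is dropped by a local RD match.
        return flood_size
    # Routes sent before the ORF took effect arrive as announcements,
    # and the source then sends one withdraw for each of them.
    sent = min(sent_before_orf, flood_size)
    return sent + sent


# One million routes, all of them already received when the ORF is sent.
without_orf = receiver_messages(1_000_000, 1_000_000, use_rd_orf=False)
with_orf = receiver_messages(1_000_000, 1_000_000, use_rd_orf=True)
print(without_orf, with_orf)
```

In this model RD-ORF only reduces the receiver's work when it takes effect before about half of the flood has been sent; per-message costs are deliberately ignored.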
>  
> Regards,
> Jakob.
>  
> From: Aijun Wang <wangaijun@tsinghua.org.cn> 
> Sent: Saturday, February 13, 2021 4:46 AM
> To: Robert Raszuk <robert@raszuk.net>
> Cc: Gyan Mishra <hayabusagsm@gmail.com>; Jakob Heitz (jheitz) <jheitz@cisco.com>; Susan Hares <shares@ndzh.com>; idr@ietf.org
> Subject: Re: [Idr] WG Adoption call for draft-wang-idr-rd-orf-05.txt (2/4/2021 to 2/18/2021)
>  
> Hi, Robert:
> Let’s discuss based on the facts, not on exaggerated declarations.
> Detailed replies are inline below, marked [WAJ].
>  
> Aijun Wang
> China Telecom
>  
> 
> On Feb 13, 2021, at 19:42, Robert Raszuk <robert@raszuk.net> wrote:
> 
> 
> All,
>  
> >   The problem we are trying to solve is a scenario where you have an offending PE that is flooding routes and a weak PE that is overwhelmed by a flood of routes.
>  
> The problem is a valid problem. But the proposed solution to the problem is not. Moreover, solutions to this very problem have been widely known and used for years. 
>  
> The problem as a matter of fact has nothing to do with VPNs of any sort. 
> 
> [WAJ] We have said clearly that the RD-ORF mechanism applies to inter-AS Option B/C scenarios, or to the scenario within one AS where an RR is present. All of these situations have the following characteristics:
> 1. Several VPNs share one BGP session, where BGP max-prefix is too coarse for fine-grained control.
> 2. Junk VPN routes can’t be known in advance, so RTC and prefix-based ORF are not enough against the potential threats.
> 3. The network should react automatically and quickly to alleviate the threats, giving operators the chance to step in before other VPN services are disrupted.
> 
> So, how do you reach the conclusion that it has nothing to do with VPNs?
>  
>  
> If you are an ISP offering Internet transport, then unless you apply proper protection you would be badly exposed: your clients, peers, or upstreams could inject millions of routes and melt your network, and perhaps even the global Internet (if they used a registered block). 
>  
> And yes BGP ingress policy here is used to filter out junk before it enters any network. The same policy must be used in VPN cases too. 
> [WAJ] RD-ORF doesn’t preclude the use of other protection methods. I have made this point several times. Please remember this.
> 
>  
> Indeed years have passed and I think I have only seen in a very few cases that operators offering L3VPNs are doing prefix ingress filtering ... Max prefix on ingress is used much more often. Those are the right tools here to work on. I am not saying we should not invent more ... we should.
> 
> [WAJ] OK.
>  
> 
>  
> Ideas: 
>  
> * Customise secure BGP to work in VPN cases as example.   
>  
> * Augment RRs to be a bit more intelligent with a pinch of ML - if the number of routes with a given RD over time Tx is R, then the moment you receive 100*R you suspend those routes and raise a NOC alarm before spraying them everywhere
>  
> [WAJ] The RD-ORF mechanism is just one kind of such ML capability, not only on the RR but also on other PE devices. We just want the network to be more intelligent and able to cope with some additional situations automatically, rather than always relying on static configuration.
>  
> 
>  
> * Do not put VPN customer routes into your data plane ... just handle next hops. Redefine RFC4364 altogether and use IP transport for it. 
>  
> etc ... 
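
The second idea in the list above (suspend an RD's routes once their count reaches 100*R) can be sketched as a simple per-RD counter on the RR. This is an illustration only: the class, its method names, and the alarm list are hypothetical, and the baseline R is assumed to be supplied from an earlier measurement interval Tx.

```python
from collections import defaultdict


class RdSurgeMonitor:
    """Suspend routes of an RD once their count reaches surge_factor * baseline."""

    def __init__(self, surge_factor: int = 100):
        self.surge_factor = surge_factor
        self.baseline = {}               # RD -> R, routes seen in a normal interval
        self.current = defaultdict(int)  # RD -> routes seen in this interval
        self.suspended = set()           # RDs whose routes are held back
        self.alarms = []                 # RDs reported to the NOC, in order

    def set_baseline(self, rd: str, routes: int) -> None:
        self.baseline[rd] = routes

    def on_route(self, rd: str) -> bool:
        """Return True if the route may be reflected, False if it is suspended."""
        self.current[rd] += 1
        if rd in self.suspended:
            return False
        r = self.baseline.get(rd)
        if r and self.current[rd] >= self.surge_factor * r:
            self.suspended.add(rd)
            self.alarms.append(rd)  # raise the alarm before spraying further
            return False
        return True


monitor = RdSurgeMonitor()
monitor.set_baseline("RDX1", 2)
decisions = [monitor.on_route("RDX1") for _ in range(250)]
print(decisions.count(True), monitor.alarms)
```

An RD without a recorded baseline is never suspended in this sketch; a real design would also need a rule for new RDs and for releasing a suspension.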
>  
> Focusing on not allowing the meltdown is the proper solution space ... But here instead we do nothing to prevent the fire from starting and instead focus on tools to extinguish it.
> 
> [WAJ] As you mentioned, several methods exist to prevent the fire from starting. But there is still some chance that the fire is ignited. Don’t you agree? RD-ORF can act as the sprinkler system when the fire is beginning.
>  
> 
> Wrong approach. And specifically this tool (RD-ORF) does way too much damage when used. 
> 
> [WAJ] How did you reach this baseless conclusion? We explained your scenarios in a previous mail. Do you have other scenarios, or do you still have concerns about the previous one?
>  
> 
>  
> Thx,
> R.
>  
>  
>  
>  
>  
>  
> On Sat, Feb 13, 2021 at 4:16 AM Gyan Mishra <hayabusagsm@gmail.com> wrote:
>  
> All
>  
> Following Susan Hares's summary of where we are with the adoption call, let’s start with the problem this draft is trying to solve and gain consensus on it.  Once we gain consensus we can get back to the RD-ORF solution.  See w
>  
> a) the problem this draft is trying to solve relating to BGP routes,
> The problem we are trying to solve is a scenario where you have an offending PE that is flooding routes and a weak PE that is overwhelmed by a flood of routes.  This is not a normal situation; it is an outage situation where the weak PE is being overwhelmed by a flood of routes.  Do we all agree to the problem statement?
>  
> Why or why not?
>  
> b) the need for additional mechanisms to solve the problem,
> Do other methods exist that can solve the problem and if not do we need a new mechanism to solve this?
> RTC, Peer maximum prefix, VPN maximum prefix
> c) a clear description of the technology to solve the problem.
>  
> Do we all agree that in a normal situation we would never filter on RD, as that would partition the VPN, which is unwanted and is what Robert mentioned?  This is not a normal situation, however, but a unique situation where a weak PE is overwhelmed by a flood of routes.  How best can this be solved?
>  
>  
>  
> On Fri, Feb 12, 2021 at 10:32 AM Aijun Wang <wangaijun@tsinghua.org.cn> wrote:
> Hi, Robert:
> Yes, the behavior of the device should be deterministic. There may be several factors to consider for this local behavior; we should describe it more clearly in this section later.
> We have discussed the differences between RTC and RD-ORF a lot. As Haibo mentioned, they are not mutually exclusive and can be used together in some situations. But they are different and can’t replace each other.
> 
> Aijun Wang
> China Telecom
>  
> 
> On Feb 12, 2021, at 23:04, Robert Raszuk <robert@raszuk.net> wrote:
> 
> 
> Sorry Aijun,
>  
> What you say is just handwaving. There is no room for it in any spec.
>  
> When code is written, a PE must behave deterministically, and so must the RR or any other network element. 
>  
> Statements "decisions of PE2 to judge" are not acceptable in protocol design. 
>  
> Just imagine that each PE does what it feels like in a distributed network .... Same for BGP same for IGP etc .... 
>  
> And all of this is not needed if on ingress between PE1 and HQ1 you apply max prefix of 2 or even 100. It is also not needed if you enable  RTC to send RT:TO_HUB from PE2 to RR.
>  
> But I understand - no matter what we say or how much we spend time to explain why this idea is a bad idea you are still going to push this fwd. Oh well ...   If I were you I would spend this time to redefine L3VPN such that customer routes are never needed to be sent to SP core routers. 
>  
> Thx,
> R.
>  
>  
> On Fri, Feb 12, 2021 at 3:47 PM Aijun Wang <wangaijun@tsinghua.org.cn> wrote:
> Hi, Robert:
>  
> https://datatracker.ietf.org/doc/html/draft-wang-idr-rd-orf-05#section-5.1.1 describes such situations, which require an additional local decision by PE2 on whether to send the RD-ORF message.
> In your example, if only the HUB VRF exceeds its limit but the resources of PE2 are not exhausted, then PE2 will not send the RD-ORF message. It may just discard the excess 100000 /32 routes.
> If the resources of PE2 are nearly exhausted, it must send the RD-ORF message. Otherwise not only the Spoke VRF but also the other VPNs on this device become unusable.
> 
> Regarding the RR, the same principle applies: if the RR can cope with such flooding, it need not send RD-ORF to PE1. If the RR can’t cope, it must send the RD-ORF message; otherwise not only the VPNs that import RD X1 routes stop working, but also the other VPNs that don’t import RD X1 routes.
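
The local decision described above for PE2 (and, by the same principle, for the RR) can be sketched as a small function. This is a hedged illustration rather than the draft's procedure: the names, the per-VRF limit, and the 0.9 device-usage threshold standing in for "nearly exhausted" are all assumptions.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept routes"
    DISCARD_LOCALLY = "discard excess routes locally"
    SEND_RD_ORF = "send RD-ORF upstream"


def overload_action(vrf_routes: int, vrf_limit: int,
                    device_usage: float, device_threshold: float = 0.9) -> Action:
    """Pick the reaction of a PE or RR to a route flood (toy model)."""
    if device_usage >= device_threshold:
        # Device resources are nearly exhausted: other VPNs are at risk,
        # so the flood must be stopped at the source.
        return Action.SEND_RD_ORF
    if vrf_routes > vrf_limit:
        # Only one VRF is over its limit: drop the excess without signalling.
        return Action.DISCARD_LOCALLY
    return Action.ACCEPT


print(overload_action(vrf_routes=100_002, vrf_limit=1_000, device_usage=0.4))
print(overload_action(vrf_routes=100_002, vrf_limit=1_000, device_usage=0.95))
```

A fixed threshold like this makes the behavior deterministic for a given input, though which resource measurement feeds `device_usage` is left open here.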
>  
> The RD-ORF mechanism just keeps the impact as small as possible.
> 
> I hope the above explanation prompts you to review this draft afresh.
> 
> We would also like to invite you to join us in making the RD-ORF mechanism more robust so that it meets these critical challenges.
> 
> Aijun Wang
> China Telecom
>  
> 
> On Feb 12, 2021, at 19:30, Robert Raszuk <robert@raszuk.net> wrote:
> 
> 
> Aijun & Gyan,
>  
> Let me try one more (hopefully last) time to explain to both of you - and for that matter to anyone who supported this adoption. 
>  
> Let's consider very typical Hub and Spoke scenario as illustrated below: 
>  
> <image.png>
>  
>  
> HQ1 is advertising two routes:
>  
> - one default with RDX1 with RT TO_SPOKE 
> - one or more specifics with RDX1 to the other HUBs
>  
> Now imagine HQ1 bought a new BGP "Optimizer" and suddenly starts advertising 100000 /32 routes just to the other HUB with RT: TO_HUB. 
>  
> <image.png>
>  
>  
>  
> So PE2 detects that the VRF with RDX2 on it got overwhelmed during import with RT TO_HUB, and starts pushing RDX1 (the original RD) to the RR to stop getting those routes. 
>  
> Well, all great, except now you are throwing the baby out with the bathwater: all spokes attached to PE2 which just import the default route to HUB HQ1 can also no longer reach their hub site, as their default route will be removed. Therefore they will have nothing to import with RT:TO_SPOKE
>  
> Further if RR "independently" decided ... oh let's push this ORF to PE1 then all of the spokes attached to perhaps even much more powerful PE3 can also no longer reach their headquarters. 
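
The hub-and-spoke scenario above can be replayed with a few lines of code. This is a minimal sketch, assuming routes are reduced to a (prefix, RD, RT) triple and that an RD filter simply drops every route carrying the blocked RD; the type, the function names, and the 10.x.y.z/32 prefixes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    rd: str  # route distinguisher
    rt: str  # route target


def hq1_routes(n_specifics: int) -> List[VpnRoute]:
    """HQ1's advertisements: one default for spokes plus flooded /32s for hubs."""
    routes = [VpnRoute("0.0.0.0/0", "RDX1", "TO_SPOKE")]
    for i in range(n_specifics):
        prefix = f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}/32"
        routes.append(VpnRoute(prefix, "RDX1", "TO_HUB"))
    return routes


def apply_rd_filter(routes: List[VpnRoute], blocked_rd: str) -> List[VpnRoute]:
    """Drop every route whose RD matches, regardless of its RT."""
    return [r for r in routes if r.rd != blocked_rd]


def import_into_vrf(routes: List[VpnRoute], import_rt: str) -> List[VpnRoute]:
    """Keep only the routes a VRF with this import RT would accept."""
    return [r for r in routes if r.rt == import_rt]


flood = hq1_routes(100_000)
spoke_before = import_into_vrf(flood, "TO_SPOKE")
spoke_after = import_into_vrf(apply_rd_filter(flood, "RDX1"), "TO_SPOKE")
print(len(spoke_before), len(spoke_after))
```

Because the default route and the flooded /32s share RDX1, the spoke VRF loses its only route once the RD is filtered, which is the collateral damage the message describes.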
>  
> - - - 
>  
> Summary: 
>  
> The above clearly illustrates why the proposed solution to use RD for filtering is in fact harmful. 
>  
> See when you design new protocol extensions the difficulty is to not break any existing protocols and deployments.
>  
> Hope this puts this long thread to rest now. 
>  
>  
> Thx,
> Robert
>  
> --
> 
> Gyan Mishra
> Network Solutions Architect 
> M 301 502-1347
> 13101 Columbia Pike 
> Silver Spring, MD
>