Re: [Idr] Adoption and IPR call for draft-wang-idr-vpn-prefix-orf-03.txt (8/16 to 8/30)

Hi Igor,
Actually, the principle of the VPN Prefix ORF mechanism is: if there is available space in the VRF, it should allow routes to come in.
Regarding the scenario you mentioned:
1) The room left by the withdrawal of excessive routes from the first PE should and can be used by routes from the second PE that are under its quota. This is reasonable and acceptable.
2) The next hops of such routes are different, so they should be treated either as an update of the route or as two different routes.
3) The routes from the second PE are accepted while it is below its quota. It is then unlikely that they will be withdrawn again by the RR when the second PE exceeds its quota. Normally, the RR should withdraw the newly advertised excessive routes.
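
A minimal sketch of the admission principle described above, not the draft's actual procedure. It assumes a single VRF prefix limit, one quota value shared by all source PEs, and that a route is installed whenever the VRF has room. The `Vrf` class and its method names are hypothetical and exist only for illustration.

```python
class Vrf:
    """Toy VRF: installs a route whenever there is room (assumed rule)."""

    def __init__(self, prefix_limit, quota):
        self.prefix_limit = prefix_limit  # VRF prefix limit
        self.quota = quota                # per-source-PE quota (assumed equal for all PEs)
        self.routes = set()               # installed (source_pe, prefix) pairs

    def count(self, source_pe):
        """Number of installed routes learned from one source PE."""
        return sum(1 for src, _ in self.routes if src == source_pe)

    def offer(self, source_pe, prefix):
        """Install the route if the VRF has room; return True if installed."""
        if len(self.routes) >= self.prefix_limit:
            return False
        self.routes.add((source_pe, prefix))
        return True

    def withdraw(self, source_pe, prefix):
        """Remove a route, freeing room for others."""
        self.routes.discard((source_pe, prefix))

    def over_quota(self):
        """Source PEs whose installed route count exceeds the quota."""
        sources = {src for src, _ in self.routes}
        return sorted(src for src in sources if self.count(src) > self.quota)


if __name__ == "__main__":
    vrf = Vrf(prefix_limit=4, quota=2)
    for i in range(4):
        vrf.offer("PE1", f"10.0.{i}.0/24")      # PE1 fills the VRF
    print(vrf.offer("PE2", "10.0.0.0/24"))       # no room left for PE2
    vrf.withdraw("PE1", "10.0.2.0/24")           # RR withdraws PE1's excess
    vrf.withdraw("PE1", "10.0.3.0/24")
    print(vrf.offer("PE2", "10.0.0.0/24"))       # freed room is reused by PE2
```

In this toy model the room freed by withdrawing PE1's excess is taken by PE2's routes for the same destinations, which is the behaviour point 1) calls acceptable.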

Best Regards,
Wei
------------------ Original ------------------
From: "Igor Malyushkin" <gmalyushkin@gmail.com>;
Date: Tue, Aug 30, 2022 04:22 AM
To: "Gyan Mishra" <hayabusagsm@gmail.com>;
Cc: "idr" <idr@ietf.org>; "Sue Hares" <shares@ndzh.com>;
Subject: Re: [Idr] Adoption and IPR call for draft-wang-idr-vpn-prefix-orf-03.txt (8/16 to 8/30)

Hi Gyan,

Please see the inline.

On Mon, Aug 29, 2022 at 21:46, Gyan Mishra <hayabusagsm@gmail.com> wrote:

Hi Igor

On Mon, Aug 29, 2022 at 2:38 PM Igor Malyushkin <gmalyushkin@gmail.com> wrote:

Hi Gyan,

I'm not talking about a case where two CEs are connected to the same PE and advertise the same prefix. Instead, imagine a scenario where a CE or a group of CEs (say, a sub-ring) is connected to two PEs. These PEs use different RDs for the VRF where the CE (or the sub-ring) resides. For some reason, PE-CE session limits aren't configured and both PEs receive a lot of routes from the CE. One PE may receive these routes slightly later (imagine there is some propagation delay of routes through the sub-ring). Or one of the multihomed PEs is somewhat CPU-constrained. Or MRAI is misconfigured, and so on. As a result, one of the PEs sends VPN routes over its internal sessions toward an RR with some delay.

A destination PE starts receiving VPN routes from the first (faster) PE via its session to the RR; these routes exhaust the quota and the VRF prefix limit. The destination PE sends an ORF message to the RR and starts discarding the excessive routes that it has already received, but it is still receiving new routes from the RR (the RR hasn't yet received and processed the ORF message).

Gyan> Clarification. The destination PE sends the ORF message to the RR, the RR sends an updated Adj-RIB-Out based on the ORF entries, and the destination PE now receives the “filtered” routing update based on the ORF entries processed by the RR.

[IM] There is a gap in time between these steps, even though all of this happens within a very short period. For example, when the destination PE decides to send the ORF message, it is still receiving routes from the RR. These routes are already excessive and will be kept out of the VRF locally by the VRF prefix limit. Also, "the filtered routing update" is not a single entity. It is a lot of routes that can be packed into, and have to be sent as, some number of UPDATE messages. The destination PE has to process them too. It cannot be described as just "send, receive, now". It is a continuous process.
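
The timing argument above can be illustrated with a small discrete-time sketch. This is a simplification under stated assumptions, not a model of any real implementation: the RR sends one route per tick, the destination PE sends its ORF at the tick in which the VRF prefix limit is reached, and the ORF takes effect at the RR `orf_delay` ticks later. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(total_routes, vrf_limit, orf_delay):
    """Return (installed, dropped_locally) for one source PE's routes.

    Assumptions (illustrative only): one route arrives per tick, the ORF
    is sent in the tick where the VRF limit is reached, and the RR stops
    sending once more than `orf_delay` ticks have passed since then.
    """
    installed = 0
    dropped_locally = 0
    orf_sent_at = None

    for tick in range(total_routes):
        # The RR has received and processed the ORF: no more routes arrive.
        if orf_sent_at is not None and tick > orf_sent_at + orf_delay:
            break

        if installed < vrf_limit:
            installed += 1          # route fits into the VRF
        else:
            dropped_locally += 1    # excessive route, not installed

        # The PE reacts as soon as the VRF prefix limit is reached.
        if installed >= vrf_limit and orf_sent_at is None:
            orf_sent_at = tick

    return installed, dropped_locally


if __name__ == "__main__":
    for delay in (0, 5, 50):
        print(delay, simulate(total_routes=100, vrf_limit=10, orf_delay=delay))
```

In this model every route that arrives while the ORF is in flight is still received, parsed, and dropped by the destination PE, so the local work grows with the delay.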

The RR is doing the discarding or dropping/filtering towards the PE, and as a result it is the PE that recovers from the high CPU and memory exhaustion, with relief on the VRF RIB.

[IM] I haven't said anything about high CPU or memory usage on the destination PE. Crossing the VRF limit does not mean that there are problems with resources. One PE may be able to process excessive routes while another may not. This can change over time and depends on variable parameters.

I do not understand this part in quotes:
“but it is still receiving new routes from the RR (RR hasn't received and processed the ORF message).”

If the new routes are from the second multihomed PE, those routes would also be dropped, with the RT TLV set for each source PE flooding the routes.

[IM] I explained this above. First, at this moment in time, the routes from the second multihoming PE are not yet involved. Second, there is a time frame between the destination PE's reaction to the excessive routes and the RR's decision to stop sending UPDATEs due to the ORF message. During all of this time the RR can send routes to the destination PE if it has them to send. I don't actually see anything bad here. It's just how things work.

The destination PE is only discarding routes based on the updated Adj-RIB-Out from the RR.

[IM] Yes. By discarding, I mean not installing them into the VRF due to the VRF prefix limit. Maybe the term is not suitable, but I hope it's clear now.

At this time the RR also starts sending VPN routes from the second multihoming PE. It also eventually receives the ORF message and stops sending routes from the first PE.

Please read RFC 5291, which explains the ORF process. Basically, the ORF AFI/SAFI is first negotiated P2P; then Peer A (the PE) sends the ORF towards Peer B (the RR), and Peer B installs the ORF entries from Peer A and updates its Adj-RIB-Out towards Peer A, at which point the routes matching the received ORF entries have been dropped or excluded from the update to Peer A based on the 3-tuple {RT, RD, Source PE}. So the dropping is done by the side receiving the ORF, which in this case is the RR.

[IM] Thanks for the reference. Actually, here you express my concern that the RR will delete ALL routes and not only the excessive ones. Put another way, it needs something more than just a tuple of {RT, RD, SRC}. But anyway, your statement does not change anything. The RR will reevaluate the Adj-RIB-Out toward the destination PE and will drop or exclude (using your terminology) routes based on {RT, RD, SRC}, but please note that routes from the second multihoming PE have a different RD. Thus, the RR will continue to send them until it receives the second ORF message. After that, the RR will again reevaluate the Adj-RIB-Out and will drop or exclude routes, but now with {RT, RD-2, SRC-2}. The destination PE will again process some number of UPDATEs with withdrawals.

From the point of view of destinations (not routes), this process will be repeated two times for every destination. For example, route A from the first multihoming PE will be above the quota but below the VRF prefix limit. Thus, it will be installed into the VRF. Then the RR will withdraw this route as excessive. The destination PE will delete this route from the VRF and will install a route for the same destination from the second multihoming PE (if the destination PE has received this route, which can actually happen). This process will be repeated when the RR receives the second ORF message.
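
To make the two-round behaviour concrete, here is a rough sketch of an RR filtering its Adj-RIB-Out by the {RT, RD, SRC} tuple discussed above. It does not model the draft's actual ORF encoding or RFC 5291 message handling. The `VpnRoute` type, the `adj_rib_out` function, and the RT/RD values are hypothetical, and the sketch drops every route matching an entry, which is exactly the "ALL routes" concern raised above.

```python
from typing import NamedTuple


class VpnRoute(NamedTuple):
    rt: str          # route target
    rd: str          # route distinguisher
    source_pe: str   # originating PE
    prefix: str


def adj_rib_out(routes, orf_entries):
    """Routes the RR still advertises after applying the ORF entries.

    `orf_entries` is a set of (rt, rd, source_pe) tuples; any route
    matching an entry is excluded (assumed behaviour for this sketch).
    """
    return [r for r in routes if (r.rt, r.rd, r.source_pe) not in orf_entries]


if __name__ == "__main__":
    prefixes = [f"10.0.{i}.0/24" for i in range(3)]
    # Same destinations, same RT, but a different RD per multihoming PE.
    routes = (
        [VpnRoute("100:1", "1:1", "PE1", p) for p in prefixes]
        + [VpnRoute("100:1", "2:1", "PE2", p) for p in prefixes]
    )

    first_orf = {("100:1", "1:1", "PE1")}
    print(adj_rib_out(routes, first_orf))               # PE2's routes survive

    second_orf = first_orf | {("100:1", "2:1", "PE2")}
    print(adj_rib_out(routes, second_orf))              # nothing left
```

Because the first entry names only RD `1:1`, the same destinations keep arriving via PE2 until a second entry is installed, so each destination is processed twice.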

The RR starts sending withdrawals for the routes of the first PE and continues sending routes of the second PE. Let's imagine that the destination PE considers the routes of the second multihoming PE and always compares them with the quota (I'm still not sure about this; the draft is unclear here). Since the VRF prefix limit was passed a long time ago, the PE sends the second ORF message (although we could have stopped this whole nightmare with the first message if it were source-less). All this time the destination PE is dropping the same number of routes, but from the second multihoming PE. The RR receives the second ORF, stops sending updates, and starts sending withdrawals. Consider that some routes would be deleted from the VRF (I'm still not sure about this) when the destination PE sends the first ORF message. In this case, we also need to update the FIB: delete the routes from the first multihoming PE, then install routes for the same destinations from the second. After the second ORF message, we again delete these routes.

On Mon, Aug 29, 2022 at 20:09, Gyan Mishra <hayabusagsm@gmail.com> wrote:

Hi Igor

In the dual-homed CE scenario, the paths advertised from CE1 would have path ID 1 and the prefixes from CE2, homed to the same PE, would have a different path ID. If Add-Paths is enabled on all PEs for diverse pathing, the redundant path may also have a different path ID, as the withdrawal is done based on the path ID. So I don't see the withdrawal causing any kind of race condition.

Kind Regards

Gyan
On Mon, Aug 29, 2022 at 12:09 PM Igor Malyushkin <gmalyushkin@gmail.com> wrote:

Hi Aijun,

We can see the solution to the problem differently, but I think any solution must not create additional problems.

I'm not sure that, with its possible race conditions, this solution doesn't pose new problems with the processing of updates.

On Mon, Aug 29, 2022 at 17:20, Aijun Wang <wangaijun@tsinghua.org.cn> wrote:

Hi, Igor:

The quota value shouldn't be changed dynamically.

[IM] OK, that was bad wording. I meant counting received routes against a quota even if the VRF prefix limit has been reached.

In the scenario you mentioned (a CE dual-homed to two PEs), normally the routes from the first PE and the second PE will pass their quotas at the same time first.

[IM] What do you mean by "normally"? We expect that they will be received by a destination PE almost at the same time, but it is not guaranteed.

Then when the VRF limit is reached, both of them will be withdrawn via the VPN Prefixes ORF message at the same time.

[IM] This statement is based on a previous invalid assumption.

Then isn't it rare or even impossible for the scenario you mentioned to occur?

[IM] I don't think that CE multihoming is rare, and I also don't think that multihoming PEs will send updates at the same time and at the same pace (there are lots of reasons for that).

Aijun Wang

China Telecom

From: idr-bounces@ietf.org [mailto:idr-bounces@ietf.org] On Behalf Of Igor Malyushkin
Sent: August 29, 2022 21:27
To: Jeffrey Haas <jhaas@pfrc.org>
Cc: idr <idr@ietf.org>; Sue Hares <shares@ndzh.com>
Subject: Re: [Idr] Adoption and IPR call for draft-wang-idr-vpn-prefix-orf-03.txt (8/16 to 8/30)

Hi Jeff,

Thanks for the comments.

I'm concerned that the suggested solution covers only a subset of cases. For example, if a multihomed CE sends us lots of prefixes (that for some unknown reason we didn't drop at ingress), one multihoming PE can distribute them slightly faster than the other. In that case, routes from one multihoming PE will deplete both its quota and the VRF prefix limit. At the same time, routes from the second multihoming PE arrive. Let's imagine that the RR hasn't yet withdrawn all the excessive routes of the first multihoming PE; it is in the process of doing so. Here we need to drop locally (due to the good old prefix limit) roughly the same number of routes from the second leg, and also receive and process withdrawals from the RR for the first leg. I believe we will make things even worse in terms of resources. Not to mention that if we free some room for prefixes thanks to the ORF, we will be doomed to update the RIB/FIB twice in vain.

Maybe it's a good move to count a quota independently of the VRF limit (such a mechanism isn't described in the draft, so I'm not sure how it actually works). In the scenario above, even though we locally drop excessive routes from the second multihoming PE due to the VRF prefix limit, we can also reduce its quota at the same time and react much faster.

Please also see the inline.

On Mon, Aug 29, 2022 at 14:45, Jeffrey Haas <jhaas@pfrc.org> wrote:

Igor,

> On Aug 29, 2022, at 8:39 AM, Igor Malyushkin <gmalyushkin@gmail.com> wrote:
>
>
> In the first option, the RR will withdraw PE3's routes until the number of these routes drops to PE3's quota, right? In that case, the described problem can happen only in the second scenario, because there will be room for the routes of PE2. If the RR withdraws only the routes that overflowed the VRF prefix limit, the described problem will apply in any case.

One observation is that the local system, when examining its quotas, can use the fact that it knows whether a given RD is intended to be mitigated by the ORF or not.

Exactly how the system needs to behave in the implementation would partially depend on the reason for mitigation. For memory exhaustion, it may need to be more aggressive about discarding routes. For CPU overload, lesser mitigations may be sufficient.

[IM] Actually, exceeding a VRF prefix limit (which triggers the sending of an ORF message) does not mean that there are any problems with memory or CPU. It is just a threshold; a device can even locally drop all excessive routes without any starvation of its resources. This threshold (the VRF limit) is the only good and reliable trigger for us. We also can't know beforehand which problem will be relevant when routes overload the VRF: it may be a memory problem, a CPU one, or even both. So I can't see a way to configure "the aggressiveness mode" for the proposed solution either. Or I didn't get your point.

I think the critical implementation detail is that once this ORF is triggered, it should require operator intervention to clear to avoid thrashing routes.

[IM] Operator intervention should be triggered earlier, when the quota has been passed. But I agree that the number of excessive routes can be so large that it will run through the quota and the VRF limit almost simultaneously.

-- Jeff

 _______________________________________________
 Idr mailing list
 Idr@ietf.org
 https://www.ietf.org/mailman/listinfo/idr

-- 

Gyan Mishra

Network Solutions Architect

Email gyan.s.mishra@verizon.com

M 301 502-1347
