Re: [Idr] Adoption and IPR call for draft-wang-idr-vpn-prefix-orf-03.txt (8/16 to 8/30)

Hi Gyan,

I'm not talking about the case where two CEs are connected to the same PE and
advertise the same prefix. Instead, imagine a scenario where a CE or a
group of CEs (say, a sub-ring) is connected to two PEs. These PEs use
different RDs for the VRF where the CE (or the sub-ring) resides. For some
reason, PE-CE session limits aren't configured and both PEs receive a lot
of routes from the CE. One PE may receive these routes slightly later
(imagine there is some propagation delay of routes through the sub-ring).
Or one of the multihoming PEs is short of CPU resources. Or MRAI is
misconfigured, etc. Either way, one of the PEs sends its VPN routes via
internal sessions toward an RR with some delay.

A destination PE starts receiving VPN routes from the first (faster) PE
via its session to the RR. These routes exhaust the quota and then the VRF
prefix limit. The destination PE sends an ORF message to the RR and starts
discarding the excess routes it has already received, but it is still
receiving new routes from the RR (the RR hasn't yet received and processed
the ORF message). At this point the RR also starts sending VPN routes from
the second multihoming PE. The RR eventually receives the ORF message and
stops sending routes from the first PE. It starts sending withdrawals for
the routes of the first PE and continues sending routes of the second PE.
Let's imagine that the destination PE counts the routes of the second
multihoming PE and always compares them with the quota (I'm still not sure
about this, the draft is unclear here). Because the VRF prefix limit was
exceeded long ago, the PE sends a second ORF message (although we could
have stopped this whole nightmare with the first message if it were
source-less). All this time the destination PE is dropping about the same
number of routes, but now from the second multihoming PE. The RR receives
the second ORF, stops sending updates, and starts sending withdrawals.

Consider also that some routes may be deleted from the VRF (again, I'm not
sure about this) when the destination PE sends the first ORF message. In
this case we also need to update the FIB: delete the routes from the first
multihoming PE, then install routes for the same destinations from the
second. After the second ORF message, we delete these routes again.
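
To make the timeline above concrete, here is a minimal Python sketch of it.
Everything in it is an assumption made for illustration, not behavior taken
from the draft: the names (simulate, pe2_start, orf_delay, source_less), the
numbers, the rule that an ORF is sent once a source exceeds its quota while
the VRF is already full, and the modelling of a "source-less" ORF as one
that stops every source for the VRF. Withdrawals and FIB updates are not
modelled; only the ORF messages and the local drops are counted.

```python
from collections import Counter


def simulate(total=100, vrf_limit=50, quota=40, pe2_start=52,
             orf_delay=5, source_less=False):
    """Toy model: an RR relays one route per tick from each source PE.

    Returns (orf_log, dropped) as seen by the destination PE.
    """
    start = {"PE1": 0, "PE2": pe2_start}  # tick when the RR starts relaying each PE
    sent = Counter()       # routes relayed by the RR, per source PE
    received = Counter()   # routes seen by the destination PE, per source PE
    dropped = Counter()    # routes discarded locally because the VRF is full
    installed = 0          # routes currently held in the VRF
    blocked_from = {}      # source PE -> tick at which the RR honours the ORF
    orf_log = []           # (tick, source PE whose routes triggered the ORF)

    for tick in range(pe2_start + total + orf_delay + 1):
        for pe in ("PE1", "PE2"):
            if tick < start[pe] or sent[pe] >= total:
                continue
            if tick >= blocked_from.get(pe, float("inf")):
                continue  # the RR has processed the ORF for this source
            sent[pe] += 1
            received[pe] += 1
            if installed < vrf_limit:
                installed += 1
                continue
            dropped[pe] += 1  # VRF prefix limit already reached
            if received[pe] > quota and pe not in blocked_from:
                orf_log.append((tick, pe))
                # Assumption: a source-less ORF stops all sources at once.
                targets = ("PE1", "PE2") if source_less else (pe,)
                for target in targets:
                    blocked_from.setdefault(target, tick + orf_delay)
    return orf_log, dict(dropped)


print(simulate())
# ([(50, 'PE1'), (92, 'PE2')], {'PE1': 5, 'PE2': 45})
print(simulate(source_less=True))
# ([(50, 'PE1')], {'PE1': 5, 'PE2': 3})
```

With these made-up numbers, the per-source variant needs two ORF messages
and the destination PE drops 45 routes from the second PE before the second
ORF takes effect. In the source-less variant, the first ORF is enough and
only 3 routes from the second PE are dropped.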

On Mon, Aug 29, 2022 at 8:09 PM Gyan Mishra <hayabusagsm@gmail.com> wrote:

>
> Hi Igor
>
> In the dual-homed CE scenario, the paths advertised from CE1 would have
> path ID 1 and the prefixes from CE2, homed to the same PE, would have a
> different path ID. If add-paths is enabled on all PEs for diverse pathing,
> the redundant path may also have a different path ID. Since the withdrawal
> is done based on the path ID, I don’t see the withdrawal causing any kind
> of race conditions.
>
> Kind Regards
>
> Gyan
> On Mon, Aug 29, 2022 at 12:09 PM Igor Malyushkin <gmalyushkin@gmail.com>
> wrote:
>
>> Hi Aijun,
>>
>> We can see the solution to the problem differently, but I think any
>> solution must not create additional problems.
>>
>> I'm not sure that, with possible race conditions, this solution
>> doesn't pose new problems with the processing of updates.
>>
>>
>> On Mon, Aug 29, 2022 at 5:20 PM Aijun Wang <wangaijun@tsinghua.org.cn> wrote:
>>
>>> Hi, Igor:
>>>
>>>
>>>
>>> The quota value shouldn't be changed dynamically.
>>>
>> [IM] OK, that was bad wording. I meant counting received routes against
>> the quota even when the VRF prefix limit has already been reached.
>>
>>>
>>>
>>> In your mentioned scenario (CE is dual-homed to two PEs), normally the
>>> routes from the first PE and second PE will pass their quotas at the same
>>> time first.
>>>
>> [IM] What do you mean by "normally"? We *expect* that they will be
>> received by a destination PE almost at the same time, but it is not
>> guaranteed.
>>
>>> Then when the VRF limit is reached, both of them will be withdrawn via
>>> the VPN Prefixes ORF message at the same time.
>>>
>> [IM] This statement is based on a previous invalid assumption.
>>
>>>
>>>
>>> Then is it rare or impossible that your mentioned scenario will occur?
>>>
>> [IM] I don't think that multihoming of a CE is rare, and I don't think
>> that multihoming PEs will send updates at the same time and at the same
>> pace (there are lots of reasons for that).
>>
>>> Aijun Wang
>>>
>>> China Telecom
>>>
>>>
>>>
>>>
>>>
>>> *From:* idr-bounces@ietf.org [mailto:idr-bounces@ietf.org] *On Behalf Of* Igor
>>> Malyushkin
>>> *Sent:* August 29, 2022 21:27
>>> *To:* Jeffrey Haas <jhaas@pfrc.org>
>>> *Cc:* idr <idr@ietf.org>; Sue Hares <shares@ndzh.com>
>>> *Subject:* Re: [Idr] Adoption and IPR call for
>>> draft-wang-idr-vpn-prefix-orf-03.txt (8/16 to 8/30)
>>>
>>>
>>>
>>> Hi Jeff,
>>>
>>> Thanks for the comments.
>>>
>>>
>>>
>>> I'm concerned that the suggested solution covers only a subset of cases.
>>> For example, if a multihomed CE sends us lots of prefixes (that we, for
>>> some unknown reason, didn't drop at ingress), one multihoming PE can
>>> distribute them slightly faster than the other. In that case, routes from
>>> one multihoming PE will deplete both its quota and the VRF prefix limit.
>>> At the same time, routes from the second multihoming PE arrive. Let's
>>> imagine that the RR hasn't yet withdrawn all the excess routes of the
>>> first multihoming PE; it is still in the process. Here we need to drop
>>> locally (due to the good old prefix limit) roughly the same number of
>>> routes from the second leg and also receive and process withdrawals from
>>> the RR for the first leg. I believe we will make the resource situation
>>> even worse. Not to mention that if ORF frees some room for prefixes, we
>>> will be doomed to update the RIB/FIB twice in vain.
>>>
>>>
>>>
>>> Maybe it's a good move to count a quota independently of the VRF limit
>>> (such a mechanism isn't described in the draft, so I'm not sure how
>>> it actually works). In the scenario above, although we locally drop excess
>>> routes from the second multihoming PE due to the VRF prefix limit, we can
>>> also reduce its quota at the same time and react much faster.
>>>
>>>
>>>
>>> Please also see the inline.
>>>
>>>
>>>
>>> On Mon, Aug 29, 2022 at 2:45 PM Jeffrey Haas <jhaas@pfrc.org> wrote:
>>>
>>> Igor,
>>>
>>> > On Aug 29, 2022, at 8:39 AM, Igor Malyushkin <gmalyushkin@gmail.com>
>>> wrote:
>>> >
>>> >
>>> > In the first option, the RR will withdraw PE3's routes until the
>>> number of these routes drops to PE3's quota, right? In that case, the
>>> described problem can happen only in the second scenario, because there
>>> will be room for the routes of PE2. If the RR withdraws only the routes
>>> that overflowed the VRF prefix limit, the described problem will be
>>> relevant in any case.
>>>
>>> One observation is that the local systems, when examining their quotas,
>>> can use the fact that it knows that a given RD is intended to be mitigated
>>> by the ORF or not.
>>>
>>> Exactly how the system needs to behave in the implementation would
>>> partially depend on the reason for mitigation.  For memory exhaustion, it
>>> may need to be more aggressive about discarding routes.  For CPU overload,
>>> lesser mitigations may be sufficient.
>>>
>>> [IM] Actually, exceeding a VRF prefix limit (which triggers the sending of
>>> an ORF message) does not mean that there are any problems with memory
>>> or CPU. It is just a threshold; a device can even locally drop all
>>> excess routes without starving its resources. This threshold
>>> (the VRF limit) is the only good and reliable trigger for us. We also
>>> can't know beforehand which problem will arise when routes overload the
>>> device: it may be a memory problem, a CPU one, or even both. So I can't
>>> see a way to configure an "aggressiveness mode" for the proposed solution
>>> either. Or I didn't get your point.
>>>
>>>
>>> I think the critical implementation detail is that once this ORF is
>>> triggered, it should require operator intervention to clear to avoid
>>> thrashing routes.
>>>
>>> [IM] Operator intervention should be triggered earlier, when the quota
>>> has been exceeded. But I agree that the number of excess routes can be so
>>> large that it runs through the quota and the VRF limit almost simultaneously.
>>>
>>>
>>> -- Jeff
>>>
>>> _______________________________________________
>> Idr mailing list
>> Idr@ietf.org
>> https://www.ietf.org/mailman/listinfo/idr
>>
> --
>
> Gyan Mishra
> Network Solutions Architect
> Email: gyan.s.mishra@verizon.com
> M: 301 502-1347
>