Re: [Idr] rd-orf problem clarification at remote-PE level

Aijun Wang <wangaj3@chinatelecom.cn> Tue, 16 February 2021 02:17 UTC

From: Aijun Wang <wangaj3@chinatelecom.cn>
Date: Tue, 16 Feb 2021 10:16:57 +0800
Message-Id: <B967A505-B620-46E1-88FE-FBF317020954@chinatelecom.cn>
References: <20210215180548.GC16389@pfrc.org>
Cc: idr@ietf.org, draft-wang-idr-rd-orf@ietf.org
In-Reply-To: <20210215180548.GC16389@pfrc.org>
To: Jeffrey Haas <jhaas@pfrc.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/5c2scmlkJPvoySyizAwR0cYHI0g>
Subject: Re: [Idr] rd-orf problem clarification at remote-PE level

Hi, Jeff:
Thanks for the clarification. Let me try to respond; co-authors are welcome to supplement my answers.

Aijun Wang
China Telecom

> On Feb 16, 2021, at 01:45, Jeffrey Haas <jhaas@pfrc.org> wrote:
> 
> Authors,
> 
> The IDR chairs were discussing how best to move forward the discussion about
> Working Group adoption for rd-orf.  As part of that discussion, we thought
> it would be helpful to have the discussion split to cover specific points.
> The goal is primarily to clarify the functional requirements.  As part of
> this discussion, there will be some technical observations on the proposal.
> 
> This e-mail is structured with the perceived issue, and a set of questions
> to be commented on as Issues at the end.
> 
> This e-mail focuses on the rd-orf problem statement from the remote-PE's
> perspective.
> 
> ---
> 
> A VPN network distributes VPN routes between PEs.  This may be done across a
> variety of BGP topologies, including directly attached routers using iBGP and
> eBGP, and topologies leveraging route reflectors and confederations.
> 
> In the absence of other mechanisms, routes are often distributed wholly
> across the VPN topology.  Filtering mechanisms such as BGP policy may be
> used to restrict where routes flow across such VPN networks.
> 
> RT-Constrain, RFC 4684, may be used to build subscription graphs of PEs
> interested in specific route targets.
[WAJ] Yes. But RFC 4684 can only express explicitly “what the receiver wants”, not “what the receiver doesn’t want”.
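To make the contrast concrete: RT-Constrain lets a PE state which route targets it wants, so a route carrying RT-A is sent regardless of its RD. The sketch below contrasts that positive check with a negative per-RD check of the kind the RD-ORF proposal asks for. It is a conceptual model of the filtering decision only, not an encoding of RFC 4684 NLRI or of the draft's ORF entries; all names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    rd: str                   # route distinguisher label, e.g. "RD-1"
    prefix: str               # IPv4 prefix carried inside the VPN-IPv4 NLRI
    route_targets: frozenset  # RTs attached as extended communities


def rtc_permits(route: VpnRoute, subscribed_rts: set) -> bool:
    """RT-Constrain-style check: advertise if the route carries any RT the
    peer subscribed to. Only positive membership can be expressed."""
    return bool(route.route_targets & subscribed_rts)


def rd_filter_permits(route: VpnRoute, denied_rds: set) -> bool:
    """Negative check: advertise unless the route's RD is explicitly denied."""
    return route.rd not in denied_rds


WANTED = VpnRoute("RD-2", "10.1.0.0/16", frozenset({"RT-A"}))
LEAKED = VpnRoute("RD-1", "192.0.2.0/24", frozenset({"RT-A"}))

subscribed = {"RT-A"}
for r in (WANTED, LEAKED):
    # Both routes pass RT-Constrain; only the RD check separates them.
    print(r.rd, rtc_permits(r, subscribed),
          rtc_permits(r, subscribed) and rd_filter_permits(r, {"RD-1"}))
```

Both routes pass the RT check because they share RT-A; only the negative RD check removes the leaked RD-1 routes.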

> 
> ---
> 
> Problem statement:
> 
> Presume that a set of VPN routes originated to a VPN network are identified
> as undesirable by one or more remote PEs.
> 
> Presume those routes are associated with a specific VPN route distinguisher.
> 
> ---
> 
> Working example:
> 
> For some VPN identified by a particular route target RT-A, a VPN site
> identified by VPN routes with a route distinguisher RD-1 accidentally leaks
> the entire Internet routing table into the VPN.  (Currently over 3/4 million
> routes.)
> 
> A remote PE that is a member of the VPN identified by RT-A receives these
> RD-1 routes along with routes from other, better-behaved VPN sites that it
> wishes to receive.

[WAJ] Yes, this is the scenario that the draft aims to address.

> 
> ---
> 
> Potential local mitigations:
> 
> 1. The receiving PE may deal with the offending routes by leveraging BGP
>   policy to filter the routes that have RD-1 in the prefix.  This may be
>   done using manually configured policy, or mechanisms like ORF.

[WAJ] Because we can’t know in advance which RD will leak an overwhelming number of VPN routes, configuring such a policy in advance is impossible.

> 
>   However, this doesn't relieve the impact on the VPN network itself.

[WAJ] We think the impact on the VPN network should be relieved in a graduated way: first at the receiving PE, then at the upstream RR, and then at the source PE. The RD-ORF message is not flooded directly through the network. Each of these devices can act as a valve that controls the advertisement of the overwhelming VPN routes. This flexibility covers more scenarios. For example, if only some PEs lack the capacity to process the excessive VPN routes, the other PEs need not be affected; instead, the RR can restrain the VPN routes that those PEs don’t want.
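Mitigation 1 filters on the RD embedded in each VPN route. A VPN-IPv4 prefix is an 8-byte route distinguisher followed by an IPv4 prefix (RFC 4364, section 4.2), so a receiving PE's inbound policy can decode the RD and drop denied ones. The following is a minimal sketch of that local filter; the deny set, ASN and addresses are illustrative values, and, as noted in the reply above, such a filter only helps once the offending RD is known.

```python
import ipaddress
import struct


def rd_to_str(rd: bytes) -> str:
    """Render an 8-byte route distinguisher in text form (RFC 4364, 4.2)."""
    if len(rd) != 8:
        raise ValueError("a route distinguisher is exactly 8 bytes")
    (rd_type,) = struct.unpack("!H", rd[:2])
    if rd_type == 0:  # 2-byte ASN administrator, 4-byte assigned number
        asn, num = struct.unpack("!HI", rd[2:])
        return f"{asn}:{num}"
    if rd_type == 1:  # IPv4 address administrator, 2-byte assigned number
        (num,) = struct.unpack("!H", rd[6:])
        return f"{ipaddress.IPv4Address(rd[2:6])}:{num}"
    if rd_type == 2:  # 4-byte ASN administrator, 2-byte assigned number
        asn, num = struct.unpack("!IH", rd[2:])
        return f"{asn}:{num}"
    raise ValueError(f"unsupported RD type {rd_type}")


def filter_by_rd(nlris, denied_rds):
    """Keep (rd_bytes, prefix) pairs whose RD is not in the local deny set."""
    return [(rd, p) for rd, p in nlris if rd_to_str(rd) not in denied_rds]


# Illustrative values: private ASN 65000 and a documentation address.
RD_1 = struct.pack("!HHI", 0, 65000, 1)  # type 0, "65000:1"
RD_2 = struct.pack("!H4sH", 1, ipaddress.IPv4Address("192.0.2.1").packed, 2)

received = [(RD_1, "198.51.100.0/24"), (RD_2, "10.2.0.0/16")]
print([(rd_to_str(rd), p) for rd, p in filter_by_rd(received, {"65000:1"})])
# [('192.0.2.1:2', '10.2.0.0/16')]
```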

> 
> 2. Other route attributes may be available for such filtering, depending on
>   the BGP VPN deployment.  An example of this may be the Route Origin
>   Extended Community (RFC 4360).  This filtering may be via manually
>   configured policy.
> 
>   ORF work originally included Extended Community filtering prior to the
>   use case being supplanted by RT-Constrain.  
> 
>   https://tools.ietf.org/html/draft-ietf-idr-route-filter-13, section 3.2.
> 
>   A dead IETF draft covers this behavior.
>   https://tools.ietf.org/html/draft-chen-bgp-ext-community-orf-02

[WAJ] If we knew these attributes in advance, we could. But because such events occur only occasionally and unpredictably, we can only expect to react quickly once one happens.

> 
>   Note, however, that those ORFs were not properly structured as "deny
>   some, permit the rest" and would require Sequence number behavior, or
>   changes in the behavior of ORF default procedures.
> 
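Jeff's point about "deny some, permit the rest" can be modeled with ordered filter entries. The sketch assumes first-match evaluation in ascending sequence order and a deny-by-default fallback, which is how the "permit some, deny the rest" principle in this thread reads; it is a generic model, not the wire format of any ORF type. It shows the two options Jeff names: a trailing catch-all permit entry (sequence-number behavior) or a changed default procedure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FilterEntry:
    sequence: int                    # evaluation order, lowest first
    permit: bool                     # True = PERMIT, False = DENY
    matches: Callable[[str], bool]   # predicate over the route's RD (illustrative)


def evaluate(entries, rd, default_permit=False):
    """First match in ascending sequence order decides. If nothing matches,
    fall back to default_permit (False models 'permit some, deny the rest')."""
    for entry in sorted(entries, key=lambda e: e.sequence):
        if entry.matches(rd):
            return entry.permit
    return default_permit


deny_rd1 = [FilterEntry(10, False, lambda rd: rd == "RD-1")]
# A lone deny entry under deny-by-default blocks everything:
print(evaluate(deny_rd1, "RD-1"), evaluate(deny_rd1, "RD-2"))  # False False
# Option 1: append a catch-all PERMIT at a higher sequence number.
deny_then_permit = deny_rd1 + [FilterEntry(100, True, lambda rd: True)]
print(evaluate(deny_then_permit, "RD-1"), evaluate(deny_then_permit, "RD-2"))  # False True
# Option 2: change the default procedure to permit on no match.
print(evaluate(deny_rd1, "RD-2", default_permit=True))  # True
```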
[WAJ] The basic ORF principle is “permit some, deny the rest”, as described in https://tools.ietf.org/html/rfc5291#section-6: 
“If for a given AFI/SAFI the intersection between these two sets is non-empty, the speaker SHOULD NOT advertise to the peer any routes with that AFI/SAFI prior to receiving from the peer any ROUTE-REFRESH message carrying that AFI/SAFI, where the message could be either without any ORF entries, or with one or more ORF entry and the When-to-refresh field set to IMMEDIATE. If, on the other hand, for a given AFI/SAFI the intersection between these two sets is empty, the speaker MUST follow normal BGP procedures.”

The RD-ORF mechanism follows this implicit behavior.
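The RFC 5291 rule quoted above decides when a speaker may start advertising routes for an AFI/SAFI. A minimal sketch follows, reading the "two sets" as (a) the ORF types this speaker advertised it is willing to receive and (b) the ORF types the peer advertised it is willing to send; that reading, the placeholder key "rd-orf" (not an assigned ORF type code) and the boolean standing in for the qualifying ROUTE-REFRESH are assumptions. The 1/2/3 values follow the Send/Receive field of the RFC 5291 capability.

```python
from enum import Enum


class SendReceive(Enum):
    # Send/Receive field values of the ORF capability (RFC 5291)
    RECEIVE = 1
    SEND = 2
    BOTH = 3


def _types(cap, wanted):
    """ORF types in {orf_type: SendReceive} whose direction is in wanted."""
    return {t for t, d in cap.items() if d in wanted}


def may_advertise(local_cap, peer_cap, refresh_received):
    """Whether routes of one AFI/SAFI may be advertised to the peer yet.

    local_cap: ORF types this speaker advertised, with direction.
    peer_cap: ORF types the peer advertised, with direction.
    refresh_received: True once a qualifying ROUTE-REFRESH has arrived
    (no ORF entries, or When-to-refresh set to IMMEDIATE)."""
    local_rx = _types(local_cap, {SendReceive.RECEIVE, SendReceive.BOTH})
    peer_tx = _types(peer_cap, {SendReceive.SEND, SendReceive.BOTH})
    if local_rx & peer_tx:   # non-empty intersection: wait for the refresh
        return refresh_received
    return True              # empty intersection: normal BGP procedures


local = {"rd-orf": SendReceive.BOTH}
print(may_advertise(local, {"rd-orf": SendReceive.SEND}, False))     # False
print(may_advertise(local, {"rd-orf": SendReceive.SEND}, True))      # True
print(may_advertise(local, {"rd-orf": SendReceive.RECEIVE}, False))  # True
```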

> ---
> 
> Transitive mitigations:
> 
> ORFs apply to individual BGP sessions.  
> 
> Local routers may decide to apply a per-session mitigation mechanism.  That
> per-session behavior may in turn be applied recursively across a BGP
> network.

[WAJ] The ORF message is carried in the ROUTE-REFRESH message, so it is per-session behavior.

>  However, this has operational
> challenges:
> 
> - The originating entity for such a mitigation is not carried in the ORF
>  semantics.  It becomes problematic to trace the origin of such a
>  mitigation.

[WAJ] The ORF message is not automatically propagated across BGP sessions, so its sender is easy to trace: it is the other end of the BGP session.

> 
> - The desired semantic is "do not send me this VPN route with this RD".
>  Negative state is acceptable in networks, but has an operational tendency
>  to become stale.  If used, it should have a short lifetime.

[WAJ] Yes. Once the RD-ORF message has been triggered and operators have intervened and recovered the network, the filter is then removed manually, as described in https://datatracker.ietf.org/doc/html/draft-wang-idr-rd-orf-05#section-5.2

> 
>  (This is because the absence of the negative state for filtering can turn
>  into a state explosion if the negative state is lost for some reason.)
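The trade-off in this exchange (Jeff's short-lived negative state versus the draft's manual removal) can be sketched as a soft-state deny table that supports both: entries age out after a lifetime unless re-installed, and can also be removed explicitly. The class name, the 300-second lifetime and the injectable clock are illustrative assumptions.

```python
import time


class NegativeState:
    """Deny-by-RD entries that age out unless refreshed (soft state)."""

    def __init__(self, lifetime_s, clock=time.monotonic):
        self.lifetime_s = lifetime_s
        self.clock = clock
        self._expiry = {}  # rd -> absolute expiry time

    def deny(self, rd):
        # Installing the same RD again refreshes its lifetime.
        self._expiry[rd] = self.clock() + self.lifetime_s

    def remove(self, rd):
        # Explicit (e.g. operator-driven) removal before expiry.
        self._expiry.pop(rd, None)

    def is_denied(self, rd):
        expiry = self._expiry.get(rd)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry[rd]  # stale entry: routes flow again
            return False
        return True


now = [0.0]  # fake clock for the example
state = NegativeState(lifetime_s=300, clock=lambda: now[0])
state.deny("RD-1")
print(state.is_denied("RD-1"))  # True
now[0] = 301.0
print(state.is_denied("RD-1"))  # False: the entry aged out
```

Expiry is exactly the risk Jeff's parenthetical describes: if the leak is still ongoing when an entry ages out, the full route set returns at once, so whatever installs the entry would need to keep refreshing it while the condition persists.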
> 
> - ORFs are potentially problematic for implementations because they provide
>  per-session filtering semantics, which tend to weaken peer-group
>  strategies for sharing expensive queueing work.
> 
[WAJ] Yes, unless all the peers have sent the same RD-ORF message. I think this is acceptable, since peer groups are seldom used in VPN service deployments, aren’t they? And if it does happen, could the affected PE simply be removed from the peer group?
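The peer-group concern comes down to how a speaker groups peers that can share one formatted UPDATE. A rough sketch, assuming peers are grouped only by their outbound RD deny set (real implementations also key on policy, capabilities and more), shows that a single RD-ORF splits a peer out, while identical RD-ORFs from every peer keep one group. The PE names are hypothetical.

```python
from collections import defaultdict


def update_groups(peer_filters):
    """Group peers whose outbound RD deny sets are identical; each group can
    share one formatted UPDATE."""
    groups = defaultdict(list)
    for peer, denied_rds in peer_filters.items():
        groups[frozenset(denied_rds)].append(peer)
    return {key: sorted(peers) for key, peers in groups.items()}


# Hypothetical PE names; PE3 has sent a deny for RD-1.
peers = {"PE1": set(), "PE2": set(), "PE3": {"RD-1"}}
print(len(update_groups(peers)))  # 2: PE3 is split out
peers = {name: {"RD-1"} for name in peers}
print(len(update_groups(peers)))  # 1: all peers share the same filter
```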

> ---
> 
> Issues:
> 
> 1. Is there a need to signal a remote PE that its routes are undesirable?
>   If so, what does the distribution graph look like?
> 
>   (Note that this begins to resemble flood and prune mechanisms.)
[WAJ] The RD-ORF message may or may not reach the source PE; each intermediate node along the path can act as a valve that controls the excess VPN routes on behalf of its downstream BGP peers.
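The "valve" behavior can be sketched at a route reflector that keeps a deny set per downstream peer. Per-session filtering follows from the discussion; the rule for when the RR would itself ask its upstream peer to stop sending an RD (only once every downstream peer has denied it) is a hypothetical policy chosen for illustration, not something stated in the draft or this thread.

```python
def routes_for_peer(routes, denied_rds):
    """Per-session valve: drop (rd, prefix) routes whose RD this peer denied."""
    return [(rd, prefix) for rd, prefix in routes if rd not in denied_rds]


def deny_upstream(rd, downstream_denies):
    """Hypothetical policy: ask the upstream peer to stop sending this RD only
    when every downstream peer has denied it, so none still needs the routes."""
    return bool(downstream_denies) and all(
        rd in denied for denied in downstream_denies.values())


routes = [("RD-1", "192.0.2.0/24"), ("RD-2", "10.1.0.0/16")]
downstream = {"PE1": {"RD-1"}, "PE2": set()}  # hypothetical PE names
print(routes_for_peer(routes, downstream["PE1"]))  # [('RD-2', '10.1.0.0/16')]
print(deny_upstream("RD-1", downstream))           # False: PE2 still wants RD-1
downstream["PE2"].add("RD-1")
print(deny_upstream("RD-1", downstream))           # True
```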
> 
> 2. Local mitigations are possible to reduce the impact on a receiving PE.  A
>   VPN route distribution network needs to be able to deal with resource
>   overload on a short term basis.  If it cannot, it has other fragility
>   issues that can't be mitigated by a triggered mechanism.

[WAJ] The RD-ORF mechanism doesn’t preclude deploying other protection mechanisms. It is useful because existing mechanisms can’t mitigate an occasional flood of VPN routes within a VPN the PE is interested in.
> 
>   Is there any need for a signaling mechanism to attempt to mitigate this
>   or should this be considered solely a matter for operator attention?
>   (NOC phone, for example.)

[WAJ] The network should be able to react automatically rather than relying solely on static configuration. When RD-ORF is triggered, the operator is notified. Such a mechanism therefore gives operators time to intervene before the churn causes disruption.
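The automatic-reaction-plus-notification flow can be sketched as a per-RD monitor on the receiving PE. The draft's exact trigger conditions are not restated in this thread, so the sketch assumes a simple threshold on the number of distinct prefixes learned per RD and a hypothetical notification callback; where the comment says the caller would install or send a deny, that step is left out.

```python
from collections import defaultdict


class RdRouteMonitor:
    """Tracks distinct prefixes per RD learned from one peer and fires once
    per RD when the count exceeds a threshold (hypothetical trigger)."""

    def __init__(self, threshold, notify):
        self.threshold = threshold
        self.notify = notify             # e.g. raise an alarm to the NOC
        self.prefixes = defaultdict(set)
        self.triggered = set()

    def on_update(self, rd, prefix):
        self.prefixes[rd].add(prefix)    # re-announcements are not double counted
        count = len(self.prefixes[rd])
        if count > self.threshold and rd not in self.triggered:
            self.triggered.add(rd)
            self.notify(rd, count)
            return True                  # caller would install/send a deny for rd
        return False

    def on_withdraw(self, rd, prefix):
        self.prefixes[rd].discard(prefix)


alerts = []
monitor = RdRouteMonitor(threshold=3, notify=lambda rd, n: alerts.append((rd, n)))
for i in range(5):
    monitor.on_update("RD-1", f"198.51.100.{i}/32")
monitor.on_update("RD-2", "10.1.0.0/16")
print(alerts)  # [('RD-1', 4)]
```

Firing only once per RD keeps the operator notification from repeating while the flood continues; clearing `triggered` would be part of the manual recovery step discussed earlier in the thread.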
> 
> -- Jeff
>