[Idr] rd-orf problem clarification at remote-PE level

Jeffrey Haas <jhaas@pfrc.org> Mon, 15 February 2021 17:45 UTC

Date: Mon, 15 Feb 2021 13:05:49 -0500
From: Jeffrey Haas <jhaas@pfrc.org>
To: idr@ietf.org, draft-wang-idr-rd-orf@ietf.org
Message-ID: <20210215180548.GC16389@pfrc.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/AJ4ruGxU1KR7_6D8oEEp2poceTI>
Subject: [Idr] rd-orf problem clarification at remote-PE level

Authors,

The IDR chairs were discussing how best to move forward the discussion about
Working Group adoption for rd-orf.  As part of that discussion, we thought
it would be helpful to have the discussion split to cover specific points.
The goal is primarily to clarify the functional requirements.  As part of
this discussion, there will be some technical observations on the proposal.

This e-mail lays out the perceived issue first, followed by a set of
questions for comment, listed as Issues at the end.

This e-mail focuses on the rd-orf problem statement from the remote-PE's
perspective.

---

A VPN network distributes VPN routes between PEs.  This may be done across a
variety of BGP topologies, including directly attached routers using iBGP and
eBGP, or by leveraging route reflectors and confederations.

In the absence of other mechanisms, routes are often distributed wholly
across the VPN topology.  Filtering mechanisms such as BGP policy may be
used to restrict where routes flow across such VPN networks.

RT-Constrain, RFC 4684, may be used to build subscription graphs of PEs
interested in specific route targets.
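
As a rough illustration of what such a subscription graph decides, the
following Python sketch selects the peers that should receive a route based
on the route targets they have expressed interest in.  It is a toy model
only, not the RFC 4684 encoding or procedures; the PE names, the route
target values and the interested_peers() helper are all hypothetical.

```python
# PE name -> set of route targets that PE has advertised interest in.
# All names and values here are made up for illustration.
subscriptions = {
    "PE1": {"target:65000:100"},
    "PE2": {"target:65000:100", "target:65000:200"},
    "PE3": {"target:65000:200"},
}


def interested_peers(route_targets, subscriptions):
    """Return the peers subscribed to at least one of the route's RTs."""
    wanted = set(route_targets)
    return sorted(pe for pe, rts in subscriptions.items() if rts & wanted)


print(interested_peers({"target:65000:100"}, subscriptions))
```

Note that the selection is made on route targets only; the route
distinguisher plays no part in it.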

---

Problem statement:

Presume that a set of VPN routes originated to a VPN network are identified
as undesirable by one or more remote PEs.

Presume those routes are associated with a specific VPN route distinguisher.

---

Working example:

For some VPN identified by a particular route target RT-A, a VPN site
identified by VPN routes with a route distinguisher RD-1 accidentally leaks
the entire Internet routing table into the VPN.  (Currently over 3/4 million
routes.)

A remote PE that is a member of the VPN identified by RT-A receives these
routes with RD-1, along with routes from other, better-behaved VPN sites
that it does wish to receive.

---

Potential local mitigations:

1. The receiving PE may deal with the offending routes by leveraging BGP
   policy to filter the routes that have RD-1 in the prefix.  This may be
   done using manually configured policy, or mechanisms like ORF.

   However, this doesn't relieve the impact on the VPN network itself.

2. Other route attributes may be available for such filtering, depending on
   the BGP VPN deployment.  An example of this may be the Route Origin
   Extended Community (RFC 4360).  This filtering may be via manually
   configured policy.

   ORF work originally included Extended Community filtering prior to the
   use case being supplanted by RT-Constrain.  

   https://tools.ietf.org/html/draft-ietf-idr-route-filter-13, section 3.2.

   A dead IETF draft covers this behavior.
   https://tools.ietf.org/html/draft-chen-bgp-ext-community-orf-02

   Note, however, that those ORFs were not properly structured as "deny
   some, permit the rest" and would require Sequence number behavior, or
   changes in the behavior of ORF default procedures.
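
The "deny some, permit the rest" semantic for an RD-based local filter can
be sketched in Python as follows.  This is an illustration under stated
assumptions, not an ORF encoding or any implementation's policy language;
the VpnRoute type, the filter_routes() helper and the sample RD, RT and
prefix values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    rd: str                   # route distinguisher (hypothetical value)
    prefix: str               # IP prefix carried inside the VPN route
    route_targets: frozenset  # route targets attached to the route


def filter_routes(routes, denied_rds):
    """Drop routes whose RD is in denied_rds; permit everything else."""
    return [r for r in routes if r.rd not in denied_rds]


RT_A = "target:65000:100"  # stands in for RT-A
RD_1 = "65000:1"           # stands in for the misbehaving site, RD-1
RD_2 = "65000:2"           # a better-behaved site in the same VPN

received = [
    VpnRoute(RD_1, "192.0.2.0/24", frozenset({RT_A})),
    VpnRoute(RD_1, "198.51.100.0/24", frozenset({RT_A})),
    VpnRoute(RD_2, "10.1.0.0/16", frozenset({RT_A})),
]

accepted = filter_routes(received, denied_rds={RD_1})
print([(r.rd, r.prefix) for r in accepted])
```

Such a filter acts on the receiving PE only: the RD-1 routes still
cross the rest of the VPN network before they are dropped.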

---

Transitive mitigations:

ORFs apply to individual BGP sessions.  

A local router may choose to apply a per-session mitigation mechanism, and
that per-session behavior may be applied recursively, session by session,
across a BGP network.  However, this has operational challenges:

- The originating entity for such a mitigation is not carried in the ORF
  semantics.  It becomes problematic to trace the origin of such a
  mitigation.

- The desired semantic is "do not send me this VPN route with this RD".
  Negative state is acceptable in networks, but has an operational tendency
  to become stale.  If used, it should have a short lifetime.

  (This is because the absence of the negative state for filtering can turn
  into a state explosion if the negative state is lost for some reason.)

- ORFs are potentially problematic for implementations because they provide
  per-session filtering semantics, which tend to weaken peer-group
  strategies for sharing expensive queueing work.
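
The first two points can be made concrete with a small Python sketch of
short-lived negative state.  This is only an assumption about what such
state might look like, not something ORF provides today: the DenyTable
class, the "origin" field (which ORF semantics do not carry) and the
300-second lifetime are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DenyEntry:
    rd: str            # route distinguisher to suppress
    origin: str        # router that asked for the suppression (hypothetical)
    expires_at: float  # time in seconds after which the entry is ignored


class DenyTable:
    """Negative state ("do not send me this RD") with a bounded lifetime."""

    def __init__(self):
        self._entries = {}

    def add(self, rd, origin, now, lifetime):
        self._entries[rd] = DenyEntry(rd, origin, now + lifetime)

    def is_denied(self, rd, now):
        entry = self._entries.get(rd)
        if entry is None:
            return False
        if now >= entry.expires_at:
            # Stale negative state is discarded rather than trusted.
            del self._entries[rd]
            return False
        return True

    def origin_of(self, rd):
        entry = self._entries.get(rd)
        return entry.origin if entry else None


table = DenyTable()
table.add("65000:1", origin="PE3", now=0.0, lifetime=300.0)
print(table.is_denied("65000:1", now=10.0))   # still within its lifetime
print(table.origin_of("65000:1"))             # who asked for the filter
print(table.is_denied("65000:1", now=301.0))  # lifetime has passed
```

Once an entry expires, the filtered routes flow again unless the requester
refreshes it, which is the state-explosion risk noted above.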

---

Issues:

1. Is there a need to signal a remote PE that its routes are undesirable?
   If so, what does the distribution graph look like?

   (Note that this begins to resemble flood and prune mechanisms.)

2. Local mitigations are possible to reduce the impact on a receiving PE.  A
   VPN route distribution network needs to be able to deal with resource
   overload on a short term basis.  If it cannot, it has other fragility
   issues that can't be mitigated by a triggered mechanism.

   Is there any need for a signaling mechanism to attempt to mitigate this
   or should this be considered solely a matter for operator attention?
   (NOC phone, for example.)

-- Jeff