[Idr] draft-ietf-idr-rs-bfd state distribution
Job Snijders <job@instituut.net> Tue, 30 May 2017 11:35 UTC
Date: Tue, 30 May 2017 13:34:59 +0200
From: Job Snijders <job@instituut.net>
To: idr@ietf.org
Message-ID: <20170530113459.pauvpic623ibecj4@hanna.meerval.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/2LgFxEqjmWSVuAC5PM103pR7buU>
Subject: [Idr] draft-ietf-idr-rs-bfd state distribution
Hi all,

Perhaps this has been covered in prior discussion; if so, my apologies. Some individuals in the working group made every effort to drown out discussion of meaningful deployment scenarios.

In reviewing draft-ietf-idr-rs-bfd-02 it occurred to me that there might be an optimization to be made. At the last IETF I questioned whether the Route Server itself really needs to be aware of reachability between Route Server participants, and I still think that is not a necessity.

OLD:
    This document proposes the use of BFD between the two peering routers to detect a data plane failure, and then uses a newly defined BGP SAFI to signal the state of the data link to the route server(s).

NEW:
    This document proposes the use of BFD between the two peering routers to detect a data plane failure. The Route Server facilitates setup and teardown of such BFD sessions through a newly defined BGP SAFI in which it announces a next-hop's capability and willingness to set up a direct BFD session.

OLD:
    To remedy this, two basic problems need to be solved:

    1. Client routers must have a means of verifying connectivity amongst themselves, and
    2. Client routers must have a means of communicating the knowledge of the failure back to the route server.

NEW:
    To remedy this, one basic problem needs to be solved:

    1. Client routers must have a means of verifying connectivity amongst themselves.

etc.

Operation:
----------

Scenario: ISP_A and ISP_B are connected to a common layer-2 fabric, IXP_C, and both have a BGP session with Route Server RS_R.

ISP_A and ISP_B both support draft-ietf-idr-rs-bfd-XX, and in the OPEN message to RS_R they announce support for idr-rs-bfd through a BGP capability. Since RS_R received this capability from both ISP_A and ISP_B, it _can_ announce ISP_A's next-hop to ISP_B in a newly defined SAFI, and _can_ announce ISP_B's next-hop to ISP_A.
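To make the capability gate concrete, here is a minimal sketch of which next-hops RS_R may announce to whom in the new SAFI. It is only an illustration of the scenario above, not text from the draft: the function name, the dictionary layout, the `rs_bfd` flag, the third client ISP_C and the 192.0.2.x addresses are all assumptions made for the example.

```python
from itertools import permutations


def safi_announcements(clients):
    """Return the set of (recipient, announced_next_hop) pairs.

    `clients` maps a client name to a dict with two hypothetical keys:
      "next_hop": the client's address on the IXP fabric
      "rs_bfd":   True if the client sent the idr-rs-bfd capability in OPEN
    A next-hop is only announced between two clients that BOTH sent the
    capability; a client without it neither appears in nor receives the SAFI.
    """
    capable = {name: c for name, c in clients.items() if c["rs_bfd"]}
    return {
        (recipient, capable[origin]["next_hop"])
        for origin, recipient in permutations(capable, 2)
    }


clients = {
    "ISP_A": {"next_hop": "192.0.2.1", "rs_bfd": True},
    "ISP_B": {"next_hop": "192.0.2.2", "rs_bfd": True},
    "ISP_C": {"next_hop": "192.0.2.3", "rs_bfd": False},  # no capability sent
}
for recipient, next_hop in sorted(safi_announcements(clients)):
    print(f"announce {next_hop} to {recipient}")
```

Because the gate requires the capability on both sides, every announcement in this sketch has a mirror image in the opposite direction, which is what a bidirectional BFD session needs.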
Note: if ISP_A did not announce the capability, ISP_A's next-hop will not be announced to ISP_B, and ISP_A will of course not receive messages in the context of the newly defined SAFI.

If RS_R announces a path to ISP_B for which the next-hop is ISP_A, it must also announce ISP_A's next-hop in the newly defined SAFI to indicate that ISP_A expressed a willingness and capability to set up a BFD session. However, since a Route Server allows for facilitation of unidirectional traffic flows, and BFD is a bidirectional construct, RS_R must also announce ISP_B's next-hop in the newly defined SAFI to ISP_A.

The above 'pairwise' announcement style might violate RFC 4271 section 9.1:

    "The function that calculates the degree of preference for a given route SHALL NOT use any of the following as its inputs: the existence of other routes, the non-existence of other routes, or the path attributes of other routes."

But since this is a Route Server, it is perhaps permissible to add yet another crime to the Route Server's rap sheet.

Implementers might want to add a degree of dampening for the newly defined SAFI to mitigate all too fast setup and teardown of BFD sessions.

Rationale:
----------

Should there be a fault of sorts on the IXP where for some reason ISP_A and ISP_B can no longer reach each other, but they can still reach RS_R, I'd argue this is a matter solely between ISP_A and ISP_B. They both were facilitated by RS_R to set up BFD to each other, they have a BFD session, and the BFD session goes down because of the incident. As a consequence they'll consider each other's next-hop to be inadmissible for route selection and proceed to treat routes with each other's next-hop as withdrawn. Life is good.

Should ISP_A and ISP_B have a feedback loop to RS_R to inform the RS that they can no longer reach each other, what do we expect RS_R to do? Calculate new best-paths for each client? Why is this useful?
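The participant-only handling argued for in the rationale above can be sketched as follows. This is a toy model under stated assumptions, not an implementation of any BGP or BFD stack: the `Participant` class, its method names, and the example prefix and next-hops are hypothetical. The point it illustrates is that a BFD state change only alters local admissibility and nothing is reported back to RS_R.

```python
from collections import defaultdict


class Participant:
    """Hypothetical model of one Route Server client."""

    def __init__(self):
        self.paths = defaultdict(list)  # prefix -> next-hops learned via the RS
        self.bfd_down = set()           # next-hops whose BFD session is down

    def learn(self, prefix, next_hop):
        """Record a path received from the Route Server."""
        if next_hop not in self.paths[prefix]:
            self.paths[prefix].append(next_hop)

    def bfd_event(self, next_hop, up):
        """Local reaction to a BFD state change; no message goes to the RS."""
        if up:
            self.bfd_down.discard(next_hop)
        else:
            self.bfd_down.add(next_hop)

    def admissible(self, prefix):
        """Next-hops still eligible for route selection for this prefix."""
        return [nh for nh in self.paths.get(prefix, []) if nh not in self.bfd_down]


isp_b = Participant()
isp_b.learn("203.0.113.0/24", "192.0.2.1")    # path with ISP_A as next-hop
isp_b.learn("203.0.113.0/24", "192.0.2.3")    # alternative path, e.g. via ADD-PATH
isp_b.bfd_event("192.0.2.1", up=False)        # fabric fault between ISP_A and ISP_B
print(isp_b.admissible("203.0.113.0/24"))
```

In this sketch the paths themselves stay in the table, so when the BFD session comes back up the next-hop becomes admissible again without any new announcement from the Route Server.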
By the time any flavor of draft-ietf-idr-rs-bfd-XX is implemented, we might anticipate more use of ADD-PATH. But even without ADD-PATH, there is no real harm in using a few routes from the RS_R when the IXP has gone split-brain; the way any Route Server is used on the Internet, they only carry partial routing tables anyway.

My main concern is that in real life, when RS_R receives from hundreds of clients on one side of the IXP that they cannot reach the other side of the IXP, and the hundreds on the other side of the IXP inform RS_R that they cannot reach the side I first mentioned, this will create a stampede for both the Route Server and the Route Server Participants.

    1) All Participants observe that 100s of BFD sessions go down.
    2) All Participants immediately proceed to deprecate those paths in their own RIBs, sending out withdrawals to downstream BGP speakers.
    3) The Route Server receives from n*100s of clients that they can't reach various next-hops. These notifications may arrive in a staggered fashion.
    3a) The Route Server may observe BGP sessions going down with a subset of the participants, since IXP faults like these are rarely clean-cut.
    4) The Route Server has to run a per-client best-path-selection process within each RIB.
    4a) Participants see churn in the announcements received in the newly defined SAFI, and will proceed with teardown / setup of BFD sessions.
    5) The Route Server has to announce the newly selected best paths.

At the larger IXPs the Route Servers are already at the upper bound of their scaling capabilities, and many routing engines used by IXP participants are somewhat underscaled. Knowing that, with the current proposal I see a lot of stirring, perhaps too much, in the convergence soup.
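A back-of-the-envelope sketch of the stampede in steps 1) through 5) above. It assumes a worst case that the text does not state explicitly: the fabric splits cleanly into two sides, and every participant on one side had a BFD session with every participant on the other. The function name and the side sizes are illustrative only.

```python
def split_brain_load(side_a, side_b):
    """Rough event counts when an IXP partitions into two sides.

    Assumption (worst case): a full mesh of BFD sessions across the split.
    """
    sessions_down = side_a * side_b
    return {
        # step 1: every cross-partition BFD session goes down
        "bfd_sessions_down": sessions_down,
        # step 3: each end of each failed session reports one unreachable
        # next-hop to the Route Server under the feedback model
        "next_hop_reports_to_rs": 2 * sessions_down,
        # step 4: the Route Server reruns best-path selection per client
        "per_client_rib_recomputations": side_a + side_b,
    }


for name, count in split_brain_load(300, 200).items():
    print(f"{name}: {count}")
```

The per-client recomputation count is a lower bound: with staggered notifications (step 3) and SAFI churn (step 4a), the Route Server could rerun selection for the same client several times during one incident.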
In making the Route Server aware of link failures _between_ Route Server Participants, the totality of all stakeholders depends not only on local convergence; convergence at the Route Server now plays a significant role as well, which in turn can impact the participants' convergence.

Another issue is that under the current proposal, should a Route Server participant oscillate within the RS-Reachable SAFI, this oscillation can place a significant additional burden on the Route Server, since the per-client Loc-RIBs need to be recomputed. This risk does not exist in a mode of operation where less state is made known to the Route Server.

Another note: instead of "The RS-Reachable Control Extended Community", shouldn't "Enhanced Route Refresh" (RFC 7313) be used?

It appears to me that the only argument for storing client-to-client data-link reachability state at the Route Server is to mitigate Path Hiding (RFC 7947 section 2.3.1), but it appears to me that it comes at too high a computational cost. Should path hiding really need to be mitigated for the duration of a catastrophic failure at the IXP, ADD-PATH can be used.

Kind regards,

Job