[Idr] draft-ietf-idr-rs-bfd state distribution

Job Snijders <job@instituut.net> Tue, 30 May 2017 11:35 UTC

Date: Tue, 30 May 2017 13:34:59 +0200
From: Job Snijders <job@instituut.net>
To: idr@ietf.org
Message-ID: <20170530113459.pauvpic623ibecj4@hanna.meerval.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/2LgFxEqjmWSVuAC5PM103pR7buU>

Hi all,

Perhaps this has been covered in prior discussion; if so, my apologies.
Some individuals in the working group made every effort to drown out
discussion of meaningful deployment scenarios.

In reviewing draft-ietf-idr-rs-bfd-02 it occurred to me that there might
be an optimisation to be made. At the last IETF I questioned whether
the Route Server itself really needs to be aware of reachability between
Route Server participants, and I still think that is not a necessity.

OLD:
    This document proposes the use of BFD between the two peering
    routers to detect a data plane failure, and then uses a newly
    defined BGP SAFI to signal the state of the data link to the route
    server(s).
NEW:
    This document proposes the use of BFD between the two peering
    routers to detect a data plane failure. The Route Server facilitates
    setup and teardown of such BFD sessions through a newly defined BGP
    SAFI in which it announces a next-hop's capability and willingness
    to set up a direct BFD session.

OLD:
    To remedy this, two basic problems need to be solved:
    1.  Client routers must have a means of verifying connectivity
        amongst themselves, and
    2.  Client routers must have a means of communicating the knowledge
        of the failure back to the route server.
NEW:
    To remedy this, one basic problem needs to be solved:
    1.  Client routers must have a means of verifying connectivity
        amongst themselves.

etc.

Operation:
----------

Scenario: ISP_A and ISP_B are connected to common layer-2 fabric IXP_C,
and both have a BGP session with Route Server RS_R.

ISP_A and ISP_B both support draft-ietf-idr-rs-bfd-XX, and in the
OPEN message to RS_R they announce support for idr-rs-bfd through a BGP
capability. Since RS_R received this capability from both ISP_A and
ISP_B, it _can_ announce in a newly defined SAFI ISP_A's next-hop to
ISP_B, and _can_ announce ISP_B's next-hop to ISP_A.

Note: if ISP_A did not announce the capability, ISP_A's next-hop will not
be announced to ISP_B, and ISP_A will of course not receive messages in
the context of the newly defined SAFI.

If RS_R announces a path to ISP_B for which the next-hop is ISP_A, it
must also announce ISP_A's next-hop in the newly defined SAFI to
indicate that ISP_A expressed a willingness and capability to set up a
BFD session. However, since a Route Server allows for facilitation of
unidirectional traffic flows, and BFD is a bidirectional construct, RS_R
must also announce ISP_B's next-hop in the newly defined SAFI to ISP_A. 
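
To make the pairwise rule above concrete, here is a minimal sketch in
Python. It is not taken from the draft: the function name, the input
shapes and the client names are all made up for illustration. It only
shows that a next-hop is announced in the new SAFI when both sides sent
the capability, and that the announcement always goes in both
directions.

```python
from collections import defaultdict

def bfd_safi_announcements(capable, announced_paths):
    """Return {receiver: set of next-hop owners announced in the new SAFI}.

    capable:          set of clients that sent the (hypothetical) capability
    announced_paths:  iterable of (receiver, next_hop_owner) pairs, meaning
                      the Route Server announces to `receiver` a path whose
                      next-hop belongs to `next_hop_owner`
    """
    out = defaultdict(set)
    for receiver, owner in announced_paths:
        if receiver == owner:
            continue  # a client never needs BFD towards itself
        if receiver in capable and owner in capable:
            out[receiver].add(owner)   # receiver learns the owner's next-hop
            out[owner].add(receiver)   # pairwise: BFD is bidirectional
    return dict(out)

result = bfd_safi_announcements(
    capable={"ISP_A", "ISP_B"},
    announced_paths=[("ISP_B", "ISP_A"), ("ISP_C", "ISP_A")],
)
```

In this example ISP_C did not send the capability, so nothing is
announced to it in the new SAFI and its next-hop is not announced to
anyone either.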

The above 'pairwise' announcement style might violate RFC 4271 section
9.1: "The function that calculates the degree of preference for a given
route SHALL NOT use any of the following as its inputs: the existence of
other routes, the non-existence of other routes, or the path attributes
of other routes." But since this is a Route Server, it is perhaps
permissible to add yet another crime to the Route Server's rap sheet.

Implementers might want to add a degree of dampening for the newly
defined SAFI to mitigate all too fast setup and teardown of BFD sessions.
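
One possible shape for such dampening is a simple hold-down per
next-hop, sketched below in Python. The class name, the 30 second
default and the use of plain numbers as timestamps are assumptions of
mine, not something the draft specifies.

```python
class NextHopDamper:
    """Hold-down: accept at most one SAFI change per next-hop per interval."""

    def __init__(self, hold_seconds=30.0):
        self.hold = hold_seconds
        self.last_accepted = {}  # next-hop -> time of last accepted change

    def allow(self, next_hop, now):
        """Return True if a change for next_hop at time `now` may be acted on."""
        last = self.last_accepted.get(next_hop)
        if last is not None and now - last < self.hold:
            return False  # too soon after the previous change: suppress
        self.last_accepted[next_hop] = now
        return True

damper = NextHopDamper(hold_seconds=30.0)
decisions = [damper.allow("192.0.2.1", t) for t in (0.0, 10.0, 40.0)]
```

A suppressed change is simply dropped in this sketch; a real
implementation would have to reconcile with the latest SAFI state once
the hold-down expires.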

Rationale:
----------

Should there be a fault of sorts on the IXP where for some reason ISP_A
and ISP_B can no longer reach each other, but they can reach RS_R, I'd
argue this is a matter solely between ISP_A and ISP_B.

They both were facilitated by RS_R to set up BFD to each other, they
have a BFD session, the BFD session goes down because of the incident,
as a consequence they'll consider each other's next-hop to be
inadmissible for route selection and proceed to treat routes with each
other's next-hop as withdrawn. Life is good.
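
The client-side behaviour described above can be sketched as a filter
over received routes. This is only an illustration under my own
assumptions: the function name, the (prefix, next-hop) tuples and the
"up"/"down" strings are hypothetical, and next-hops without any BFD
session are treated as usable.

```python
def admissible_routes(routes, bfd_state):
    """Drop routes whose next-hop has a BFD session in state "down".

    routes:    list of (prefix, next_hop) tuples learned via the Route Server
    bfd_state: dict next_hop -> "up" or "down"; next-hops without a BFD
               session are absent and stay admissible
    """
    return [(prefix, nh) for prefix, nh in routes if bfd_state.get(nh) != "down"]

routes = [
    ("198.51.100.0/24", "192.0.2.1"),
    ("203.0.113.0/24", "192.0.2.2"),
    ("203.0.113.0/24", "192.0.2.3"),
]
remaining = admissible_routes(routes, {"192.0.2.1": "down", "192.0.2.2": "up"})
```

Note that the Route Server is not involved in this decision at all.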

Should ISP_A and ISP_B have a feedback loop to RS_R to inform the RS
that they no longer can reach each other, what do we expect RS_R to do?
Calculate new best-paths for each client? Why is this useful? By the
time any flavor of draft-ietf-idr-rs-bfd-XX is implemented, we might
anticipate more use of ADD-PATH. But even without ADD-PATH, there is no
real harm in using a few routes from RS_R when the IXP has gone
split-brain: the way Route Servers are used on the Internet, they only
carry partial routing tables anyway.

My main concern is that in real life, when RS_R receives from hundreds
of clients on one side of the IXP that they cannot reach the other side
of the IXP, and the hundreds on the other side of the IXP inform RS_R
that they cannot reach the side I first mentioned, this will create a
stampede for both Route Server and Route Server Participants.

    1)  All Participants observe that 100s of BFD sessions go down
    2)  All Participants immediately proceed to deprecate those paths in
        their own RIBs, sending out withdraws to downstream BGP
        speakers.
    3)  The Route Server receives from n*100s of clients that they can't
        reach various next-hops. These notifications may arrive in a
        staggered fashion.
    3a) The Route Server may observe BGP sessions going down with a
        subset of the participants, since IXP faults like these rarely
        are clean-cut. 
    4)  The Route Server has to run a per-client best-path-selection
        process within each RIB
    4a) Participants see churn in the announcements received in the
        newly defined SAFI, and will proceed with teardown / setup of
        BFD sessions.
    5)  The Route Server has to announce the newly selected best paths

Knowing that at the larger IXPs the Route Servers are already at the
upper bound of their scaling capabilities, and that many routing engines
used by IXP participants are somewhat underscaled, with the current
proposal I see a lot of stirring, perhaps too much, in the convergence
soup.

In making the Route Server aware of link-failures _between_ Route Server
Participants, the totality of all stakeholders depends not only on local
convergence, but now convergence at the Route Server plays a significant
role, which in turn can impact the participant's convergence.

Another issue is that under the current proposal, should a Route Server
participant oscillate within the RS-Reachable SAFI, this oscillation can
place a significant additional burden on the Route Server since the
per-client Loc-RIBs need to be recomputed. This risk does not exist in
a mode of operation where less state is made known to the Route Server.

Another note, instead of "The RS-Reachable Control Extended Community",
shouldn't "Enhanced Route Refresh" (RFC 7313) be used?

It appears to me that the only argument for storing client-to-client
data-link reachability state at the Route Server, is to mitigate Path
Hiding (RFC 7947 section 2.3.1), but it appears to me that it comes at
too high a computational cost. Should the path hiding really need to be
mitigated for the duration of a catastrophic failure at the IXP,
ADD-PATH can be used. 

Kind regards,

Job