Re: Service Redundancy using BFD

Sami Boutros <> Tue, 28 November 2017 22:39 UTC

From: Sami Boutros <>
To: Greg Mirsky <>
CC: Ashesh Mishra <>, Ankur Dubey <>, "" <>, Reshad Rahman <>
Subject: Re: Service Redundancy using BFD
Date: Tue, 28 Nov 2017 22:39:04 +0000

Hi Greg,

A can detect failures of the link to C using any mechanism, not only BFD.

The picture below is only for illustration. A and B themselves can be providing services (L4 to L7), which could include firewall, NAT, load balancer, etc.


From: Greg Mirsky <<>>
Date: Tuesday, November 28, 2017 at 2:20 PM
To: Sami Boutros <<>>
Cc: Ashesh Mishra <<>>, Ankur Dubey <<>>, "<>" <<>>, Reshad Rahman <<>>
Subject: Re: Service Redundancy using BFD

Hi Sami,
Would C have BFD sessions to A and B respectively, or would it use an anycast address? The more I look at the use case, the more I think of VRRP ;)


On Tue, Nov 28, 2017 at 2:15 PM, Sami Boutros <<>> wrote:

Hi Ashesh,

The topology is more like the following:

A <--\
|     \
BFD    C
|     /
B <--/

A and B are nodes providing L2 and L3 services for C, with active/standby (A/S) redundancy.

A can be active and B standby; if A goes down, then B starts providing the services.


From: Ashesh Mishra <<>>
Date: Tuesday, November 28, 2017 at 1:45 PM
To: Sami Boutros <<>>, Ankur Dubey <<>>, "<>" <<>>
Cc: Reshad Rahman <<>>
Subject: Re: Service Redundancy using BFD

Okay. That makes sense now.

So in a scenario where you have a primary overlay service between A and B, and a backup overlay service between C and D, the BFD sessions in question will be between A and C, and between B and D (so that the backup can send the diag code to the primary)?

A <------- primary service -------> B
|                                   |
BFD                                BFD
|                                   |
C <------- backup service --------> D


From: Sami Boutros <<>>
Date: Tuesday, November 28, 2017 at 4:21 PM
To: Ashesh Mishra <<>>, Ankur Dubey <<>>, "<>" <<>>
Cc: Reshad Rahman <<>>
Subject: Re: Service Redundancy using BFD

Hi Ashesh,

A service is an overlay service running on a routing node. This could be an L2 or L3 VPN service running on a set of links connected to two or more nodes, where one node is active for the service at a given point in time and another node is standby.

BFD runs on the underlay links between the two nodes, active and standby. Once BFD goes down, the standby assumes that the active went down and activates the services that it shares with the active. When the old active comes back up, the standby signals over the BFD session, via this diag code, that it has activated the non-preemptive services and has not failed, so the old active node doesn't activate those non-preemptive services.
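
To make the sequence above concrete, here is a minimal Python sketch of the behaviour as I read it from this paragraph. It is an illustration only, not the draft's procedure: the names (Node, Service, DIAG_NON_PREEMPTIVE_ACTIVE) are hypothetical, the diag code is a symbolic placeholder because its value is not given in this thread, and the handling of preemptive services (the old active simply takes them back) is my assumption.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical placeholder: the thread does not give the diag code's value.
DIAG_NON_PREEMPTIVE_ACTIVE = "non-preemptive-services-active"


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class Service:
    name: str
    preemptive: bool  # False: the node that took over keeps the service


@dataclass
class Node:
    name: str
    role: Role
    services: list                              # services shared with the peer
    running: set = field(default_factory=set)   # names of services active here

    def on_bfd_down(self):
        """Standby assumes the active failed and activates the shared services."""
        if self.role is Role.STANDBY:
            self.running = {s.name for s in self.services}

    def diag_on_session_up(self):
        """Diag to send to the peer when the BFD session comes back up."""
        holds_non_preemptive = any(
            not s.preemptive and s.name in self.running for s in self.services
        )
        return DIAG_NON_PREEMPTIVE_ACTIVE if holds_non_preemptive else None

    def on_bfd_up(self, peer_diag):
        """Old active returning: skip non-preemptive services the peer holds."""
        if self.role is not Role.ACTIVE:
            return
        for s in self.services:
            # Assumption: preemptive services revert to the old active.
            if s.preemptive or peer_diag != DIAG_NON_PREEMPTIVE_ACTIVE:
                self.running.add(s.name)


# Usage: A fails, B takes over, A returns and sees B's diag code.
shared = [Service("l2vpn", preemptive=False), Service("l3vpn", preemptive=True)]
a = Node("A", Role.ACTIVE, shared, {"l2vpn", "l3vpn"})
b = Node("B", Role.STANDBY, shared)
a.running.clear()                     # A goes down
b.on_bfd_down()                       # B activates the shared services
a.on_bfd_up(b.diag_on_session_up())   # A comes back and honours the diag code
```

In this sketch A ends up running only the preemptive service, while the non-preemptive one stays on B. How B releases the preemptive service is not modelled here.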


From: Ashesh Mishra <<>>
Date: Tuesday, November 28, 2017 at 1:14 PM
To: Sami Boutros <<>>, Ankur Dubey <<>>, "<>" <<>>
Cc: Reshad Rahman <<>>
Subject: Re: Service Redundancy using BFD

Thanks for the response, Sami. I think our disconnect lies in the definition of a service. From a BFD perspective, I expect the service to be established across two nodes, at the very least, so that BFD can monitor its liveness. Can you elaborate on:

- What, in the context of this draft, is a service?
- How does BFD signal for a service whose liveness it is not monitoring?


From: Sami Boutros <<>>
Date: Tuesday, November 28, 2017 at 1:23 PM
To: Ashesh Mishra <<>>, Ankur Dubey <<>>, "<>" <<>>
Cc: Reshad Rahman <<>>
Subject: Re: Service Redundancy using BFD

Hi Ashesh,

Thanks for your comments.

On your first comment: the draft applies to both single-hop BFD (what you call interface BFD) and multi-hop BFD. And yes, "per service" could also mean per interface in the single-hop case; we can clarify that in the draft.

On your second comment, I am not sure I understand. The service will be active on only one node. If the service is associated with the whole node, then the BFD session monitors the node's liveness. When the service is associated with an interface, the BFD session monitors the interface connectivity as well. So a primary service can't be active at both node endpoints hosting the BFD session.