Re: Service Redundancy using BFD

Ankur Dubey <adubey@vmware.com> Wed, 29 November 2017 02:30 UTC

From: Ankur Dubey <adubey@vmware.com>
To: Ashesh Mishra <mishra.ashesh@outlook.com>, Sami Boutros <sboutros@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
CC: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD
Thread-Topic: Service Redundancy using BFD
Thread-Index: AQHTZ/M82vlye4FAgk+FmQYa86ul9aMpXmmAgAA3kICAAGH9AP//z8oAgAA42wD//9WuAIAART2A///YzwAAAPTXgAAHPY8A///n9YA=
Date: Wed, 29 Nov 2017 02:29:54 +0000
Message-ID: <0DE01E39-5ADE-4F02-A469-21BC6AD45248@vmware.com>
References: <3A4A67EC-042C-4F8A-80AB-E7A5F638DE15@vmware.com> <76804F35-63BB-46A0-A74C-9E41B2C213B4@outlook.com> <6FB7BA5C-8ECC-4330-89D0-8FD7306217F5@vmware.com> <00F17C92-E43D-4BFB-81B1-534DD221E66F@outlook.com> <42407007-C6BA-4CAF-8BE8-F6C552B92A38@vmware.com> <874DFFD3-1DE2-43A1-B726-B128E5746DBE@outlook.com> <828E73CC-E8C2-48C8-93CD-3CB580174536@vmware.com> <FCCDE12A-C55F-4044-9A06-486BFD66B41B@outlook.com> <A78BAA9C-8968-4B81-AB77-97D73CD37A14@vmware.com> <288B0E50-2D0A-4288-84B7-A12CF6DA7BEB@vmware.com> <7C9430DC-B97A-4F42-9774-CD4F74B47A44@outlook.com>
In-Reply-To: <7C9430DC-B97A-4F42-9774-CD4F74B47A44@outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/NJYyXhaEY7VfMn2Q8JH58gzLVd4>

Ashesh,

The bitmap represents all the non-revertive services running between a pair of nodes providing redundancy. One bit per non-revertive service.

The bitmap needs to be used only if a per-service failover has to be supported (section 2.2). When there is at least one non-revertive service for which a node is not active AND it is active for at least 1 non-revertive service, this node will set bits identifying the active services in the bitmap and send it in the payload of the BFD packet.
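As a rough illustration of the one-bit-per-service idea, the following Python sketch packs a set of active services into a bitmap and unpacks it again. The function names, the big-endian byte order, and the byte-aligned length are assumptions made for this example only; the thread does not specify a wire format or where the bitmap sits in the BFD payload.

```python
def encode_active_bitmap(active_positions, num_services):
    """Pack active-service bit positions into a byte string.

    Assumption: bit position N (as assigned by the controller) maps to
    bit N of a big-endian integer padded to a whole number of bytes.
    """
    bitmap = 0
    for pos in active_positions:
        if not 0 <= pos < num_services:
            raise ValueError(f"bit position {pos} out of range")
        bitmap |= 1 << pos
    length = (num_services + 7) // 8  # round up to whole bytes
    return bitmap.to_bytes(length, "big")


def decode_active_bitmap(payload):
    """Return the set of bit positions that are set in the payload."""
    value = int.from_bytes(payload, "big")
    return {pos for pos in range(len(payload) * 8) if (value >> pos) & 1}
```

For example, a node that is active for the services at positions 0 and 3 out of 8 would send the single byte 0x09 under these assumptions.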

Thanks,
--Ankur

From: Ashesh Mishra <mishra.ashesh@outlook.com>
Date: Tuesday, November 28, 2017 at 4:55 PM
To: Ankur Dubey <adubey@vmware.com>, Sami Boutros <sboutros@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Ankur,

The bitmap that you mentioned in your previous email to demultiplex the services needs more clarity. Where is that bitmap added in the BFD frame? How does a bitmap represent a subset of the services? I feel the use case rests on an underlying assumption, not stated in the proposal, that simplifies the service structure.

Ashesh

From: Ankur Dubey <adubey@vmware.com>
Date: Tuesday, November 28, 2017 at 7:28 PM
To: Ashesh Mishra <mishra.ashesh@outlook.com>, Sami Boutros <sboutros@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Ashesh,

In case you meant that C and D are network nodes (not services) peering with A-B which are providing redundancy for any L2/L3/L4-L7 services, I’d like to clarify the following:

The mechanism to indicate to C&D which node (A or B) should attract traffic for a given service is not described in this draft. Like Sami mentioned in another email, there can be many ways to do that depending on the deployment scenario.

The solution described in this draft helps to establish understanding between the network nodes providing redundancy (A and B) regarding which node should be Active for a given service.

Thanks,
--Ankur

From: Ankur Dubey <adubey@vmware.com>
Date: Tuesday, November 28, 2017 at 4:01 PM
To: Ashesh Mishra <mishra.ashesh@outlook.com>, Sami Boutros <sboutros@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Hi Ashesh,

Yes, multiple services can be running between A & B. The indication of Active is needed in the BFD packet only when the backup node is acting as the active for a given non-revertive service.

If all non-revertive services (let's say C and D in your example) are Active on a backup node (let's say B), the new diag code is sufficient to indicate the Active status for those services.

If some non-revertive services are active on A, while others on B, the bitmap indicating active services is needed in the payload.
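The three cases above can be sketched as a small decision function for the backup node. This is only an illustration of the rules as stated in this email; the enum, the function name, and the use of Python sets are hypothetical and not taken from the draft.

```python
from enum import Enum


class Signal(Enum):
    NONE = "no Active indication needed"
    DIAG_ONLY = "new diag code is sufficient"
    BITMAP = "bitmap of active services needed in payload"


def required_signal(active_on_backup, non_revertive):
    """Pick the indication a backup node sends, per the rules in this email.

    active_on_backup: services currently active on the backup node.
    non_revertive:    all non-revertive services shared by the node pair.
    """
    non_revertive = set(non_revertive)
    active = set(active_on_backup) & non_revertive
    if not active:
        return Signal.NONE       # backup is not active for any such service
    if active == non_revertive:
        return Signal.DIAG_ONLY  # backup is active for all of them
    return Signal.BITMAP         # active services are split across A and B
```

With services C and D both active on B, `required_signal({"C", "D"}, {"C", "D"})` yields `Signal.DIAG_ONLY`; if only C is active on B, it yields `Signal.BITMAP`.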

Thanks,
--Ankur


From: Ashesh Mishra <mishra.ashesh@outlook.com>
Date: Tuesday, November 28, 2017 at 3:21 PM
To: Sami Boutros <sboutros@vmware.com>, Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Sami,

Thanks for the clarification. In a typical scenario, this will look like:

A <---------------\
  \                \
   C                D
  /                /
B <---------------/

where there are two sets of services. The BFD session between A and B in this case will be overloaded with the states for the two sets of sessions. It’s not clear from the proposal if this scenario is addressed (and how).

Ashesh

From: Sami Boutros <sboutros@vmware.com>
Date: Tuesday, November 28, 2017 at 5:13 PM
To: Ashesh Mishra <mishra.ashesh@outlook.com>, Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Hi Ashesh,

The topology is more like the following:

A <--\
|     \
BFD    C
|     /
B <--/

A and B are nodes providing L2 and L3 services for C, with A/S redundancy.

A can be active and B standby. If A goes down, B starts providing the services.

Thanks,

Sami

From: Ashesh Mishra <mishra.ashesh@outlook.com>
Date: Tuesday, November 28, 2017 at 1:45 PM
To: Sami Boutros <sboutros@vmware.com>, Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Okay. That makes sense now.

So in a scenario where you have a primary overlay service between A and B, and a backup overlay service between C and D, the BFD sessions in question will be between A and C, and B and D (so that the backup can send diag code to primary)?

A <------- primary service -------> B
|                                   |
BFD                                BFD
|                                   |
C <------- backup service --------> D

--
Ashesh


From: Sami Boutros <sboutros@vmware.com>
Date: Tuesday, November 28, 2017 at 4:21 PM
To: Ashesh Mishra <mishra.ashesh@outlook.com>, Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Hi Ashesh,

A service is an overlay service running on a routing node. This could be an L2 or L3 VPN service running on a set of links connected to two or more nodes, where one node is active for the service at a given point in time and one node is standby.

BFD runs on the underlay links between the two nodes, active and standby. Once BFD goes down, the standby assumes that the active went down and activates the services that it shares with the active. When the old active comes back up, the standby uses this diag code on the BFD session to signal that it did not fail and that it has activated the non-preemptive services, so the old active node does not activate those services.
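A minimal sketch of what the old active node might do when it comes back up, assuming it has already parsed the peer's BFD packet into a boolean that says whether the new diag code was set. The function and parameter names are hypothetical. The sketch also assumes that preemptive services revert to the old active node as usual, which this email does not state explicitly.

```python
def services_to_activate_on_recovery(shared_services, non_preemptive,
                                     peer_signals_survived):
    """Return the services the recovering (old active) node may activate.

    shared_services:       services shared between the two nodes.
    non_preemptive:        subset that must stay where it is currently active.
    peer_signals_survived: True if the peer's BFD packet carries the new
                           diag code (peer did not fail and took over).
    """
    shared = set(shared_services)
    if peer_signals_survived:
        # Peer has activated the non-preemptive services; leave them alone.
        return shared - set(non_preemptive)
    # Assumption: without the diag code, nothing blocks normal activation.
    return shared
```

For instance, with shared services `{"vpn1", "vpn2"}` of which `"vpn2"` is non-preemptive, a recovering node that sees the diag code would activate only `"vpn1"`.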

Thanks,

Sami

From: Ashesh Mishra <mishra.ashesh@outlook.com>
Date: Tuesday, November 28, 2017 at 1:14 PM
To: Sami Boutros <sboutros@vmware.com>, Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Thanks for the response, Sami. I think our disconnect lies in the definition of a service. From a BFD perspective, I expect the service to be established across two nodes, at the very least, so that BFD can monitor its liveness. Can you elaborate on:

- What is a service, in the context of this draft?
- How does BFD signal for a service whose liveness it is not monitoring?

Thanks,
Ashesh

From: Sami Boutros <sboutros@vmware.com>
Date: Tuesday, November 28, 2017 at 1:23 PM
To: Ashesh Mishra <mishra.ashesh@outlook.com>, Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>
Subject: Re: Service Redundancy using BFD

Hi Ashesh,

Thanks for your comments.

For your first comment: the draft applies to both single-hop BFD (what you call interface BFD) and multi-hop BFD. And yes, per-service could also mean per-interface in the single-hop case; we can clarify that in the draft.

For your second comment, I am not sure I understand. The service will be active on only one node. If the service is associated with the whole node, the BFD session monitors node liveness; when the service is associated with an interface, the BFD session monitors the interface connectivity as well. So a primary service can’t be active on both node endpoints hosting the BFD session.

Thanks,

Sami

From: Ashesh Mishra <mishra.ashesh@outlook.com>
Date: Tuesday, November 28, 2017 at 4:04 AM
To: Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>, Sami Boutros <sboutros@vmware.com>
Subject: Re: Service Redundancy using BFD

Hi Ankur,

This is a good proposal to pursue within the BFD-wg.

A couple of comments:

- BFD can only signal this diag code for the interface that it is monitoring (the IP next hop, MPLS LSP, etc.). You mention per-service (which I assume means per-service-per-interface) failover in the draft, but it may be worthwhile defining behavior on a per-service-type-per-interface basis as well.

- There still needs to be a method for the primary and backup pairs (two BFD end-points on primary service and two on backup service) to communicate with each other (primary-to-primary and backup-to-backup) whether the service is active or standby. This is useful in the scenario when the primary cannot communicate with backup nodes (it is a failure condition after all).

Again, at 10k ft, I like the idea of signaling active/standby using BFD.

Cheers,
Ashesh

From: Rtg-bfd <rtg-bfd-bounces@ietf.org> on behalf of Ankur Dubey <adubey@vmware.com>
Date: Monday, November 27, 2017 at 9:47 PM
To: "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
Cc: Reshad Rahman <rrahman@cisco.com>, Sami Boutros <sboutros@vmware.com>
Subject: Service Redundancy using BFD

Hi all,

Please review and provide comments for the following draft:

https://datatracker.ietf.org/doc/draft-adubey-bfd-service-redundancy/

Summary of draft:

This draft proposes a new BFD diag code with which a node running a BFD session with another node can inform that node, after the BFD session times out, that it did not go down and lived through the failure.

Such notification is useful for a set of nodes providing Active/Standby redundancy. When these nodes are running multiple L2/L3/L4-L7 services in non-revertive mode of redundancy, the standby node taking over as active for non-revertive services after BFD times out needs to indicate in the BFD packet that it outlived the failed old active node. The new diag code will be used for this purpose. When this diag code is set in the BFD packets, it will provide an indication to the failed old active node that it MUST NOT activate the non-revertive services when it comes up.

For providing a per service level failover, a node activating certain non-revertive services needs to indicate that it is Active ONLY for those non-revertive services. This can be done by using a unique bitmap where each bit position is uniquely identifying a service. This unique bitmap is configured on all nodes by a network controller. When there is at least one non-revertive service for which a node is not active AND it is active for at least 1 non-revertive service, this node will set bits identifying the active services in the bitmap and send it in the payload of the BFD packet.


Thanks,
--Ankur