Re: Service Redundancy using BFD

Greg Mirsky <gregimirsky@gmail.com> Tue, 28 November 2017 23:20 UTC

In-Reply-To: <9C021E7D-5F52-4C3B-8083-BB4FE2AB48D5@outlook.com>
References: <3A4A67EC-042C-4F8A-80AB-E7A5F638DE15@vmware.com> <76804F35-63BB-46A0-A74C-9E41B2C213B4@outlook.com> <6FB7BA5C-8ECC-4330-89D0-8FD7306217F5@vmware.com> <00F17C92-E43D-4BFB-81B1-534DD221E66F@outlook.com> <CA+RyBmXgLBdE7JTEs2pQHs59t+vVNagLxsKR7riBJc5JceX9Uw@mail.gmail.com> <9C021E7D-5F52-4C3B-8083-BB4FE2AB48D5@outlook.com>
From: Greg Mirsky <gregimirsky@gmail.com>
Date: Tue, 28 Nov 2017 15:20:29 -0800
Message-ID: <CA+RyBmVcs=jrnrEZORLUTnJFmK72akG4VutS8Z7WCBkDVknO5Q@mail.gmail.com>
Subject: Re: Service Redundancy using BFD
To: Ashesh Mishra <mishra.ashesh@outlook.com>
Cc: Sami Boutros <sboutros@vmware.com>, Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>, Reshad Rahman <rrahman@cisco.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/IcyyvXKReq_tQ8CV-RHKX5QBPDE>

Hi Ashesh,
I agree that there are new scenarios and use cases for applying a BFD-like
mechanism. Is it then time for BFD v2.0?

Regards,
Greg

On Tue, Nov 28, 2017 at 3:17 PM, Ashesh Mishra <mishra.ashesh@outlook.com>
wrote:

> Hi Greg,
>
>
>
> I’m just trying to understand the use of BFD in this proposal.
>
>
>
> I agree with you that 5880 was clear in its scope at the time, but that
> should not inform the entire scope of BFD in the future.
>
>
>
> Ashesh
>
>
>
> *From: *Greg Mirsky <gregimirsky@gmail.com>
> *Date: *Tuesday, November 28, 2017 at 5:06 PM
> *To: *Ashesh Mishra <mishra.ashesh@outlook.com>
> *Cc: *Sami Boutros <sboutros@vmware.com>, Ankur Dubey <adubey@vmware.com>,
> "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>, Reshad Rahman <rrahman@cisco.com>
>
> *Subject: *Re: Service Redundancy using BFD
>
>
>
> Hi Ashesh,
>
> I believe that the abstract of RFC 5880 is very clear about the goal of
> BFD:
>
>    This document describes a protocol intended to detect faults in the
>    bidirectional path between two forwarding engines, including
>    interfaces, data link(s), and to the extent possible the forwarding
>    engines themselves, with potentially very low latency.  It operates
>    independently of media, data protocols, and routing protocols.
>
>
>
> Applications, e.g. routing protocols, residing on the BFD node may use
> notifications of BFD state changes to trigger their own processes. An
> implementation may use BFD state changes to draw conclusions about the
> state of its remote peer but, I strongly believe, BFD is not intended to
> verify anything but path continuity between two nodes and, to some extent,
> the proper functioning of the forwarding engines at the BFD nodes.
>
>
>
> Regards,
>
> Greg
>
>
>
> On Tue, Nov 28, 2017 at 1:14 PM, Ashesh Mishra <mishra.ashesh@outlook.com>
> wrote:
>
> Thanks for the response, Sami. I think our disconnect lies in the
> definition of a service. From a BFD perspective, I expect the service to be
> established across at least two nodes, so that BFD can monitor its
> liveness. Can you elaborate on the following?
>
> - What is a service, in the context of this draft?
> - How does BFD signal for a service whose liveness it is not monitoring?
>
>
>
> Thanks,
>
> Ashesh
>
>
>
> *From: *Sami Boutros <sboutros@vmware.com>
> *Date: *Tuesday, November 28, 2017 at 1:23 PM
> *To: *Ashesh Mishra <mishra.ashesh@outlook.com>, Ankur Dubey <
> adubey@vmware.com>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
> *Cc: *Reshad Rahman <rrahman@cisco.com>
>
>
> *Subject: *Re: Service Redundancy using BFD
>
>
>
> Hi Ashesh,
>
>
>
> Thanks for your comments.
>
>
>
> On your first comment: the draft applies both to single-hop BFD (what you
> call interface BFD) and to multi-hop BFD. And yes, per-service could also
> mean per-interface in the single-hop case; we can clarify that in the
> draft.
>
>
>
> On your second comment, I am not sure I understand. The service will be
> active on only one node. If the service is associated with the whole node,
> the BFD session monitors node liveness; when the service is associated
> with an interface, the BFD session monitors the interface connectivity as
> well. So a primary service can’t be active at both node endpoints hosting
> the BFD session.
>
>
>
> Thanks,
>
>
>
> Sami
>
> *From: *Ashesh Mishra <mishra.ashesh@outlook.com>
> *Date: *Tuesday, November 28, 2017 at 4:04 AM
> *To: *Ankur Dubey <adubey@vmware.com>, "rtg-bfd@ietf.org" <
> rtg-bfd@ietf.org>
> *Cc: *Reshad Rahman <rrahman@cisco.com>, Sami Boutros <sboutros@vmware.com
> >
> *Subject: *Re: Service Redundancy using BFD
>
>
>
> Hi Ankur,
>
>
>
> This is a good proposal to pursue within the BFD-wg.
>
>
>
> A couple of comments:
>
> - BFD can only signal this diag code for the interface that it is
>   monitoring (the IP next hop, MPLS LSP, etc.). You mention per-service
>   (which I assume means per-service-per-interface) failover in the draft,
>   but it may be worthwhile to define behavior on a
>   per-*service-type*-per-interface basis as well.
>
> - There still needs to be a method for the primary and backup pairs (two
>   BFD end-points on the primary service and two on the backup service) to
>   tell each other (primary-to-primary and backup-to-backup) whether the
>   service is active or standby. This is useful when the primary cannot
>   communicate with the backup nodes (it is a failure condition after all).
>
>
>
> Again, at 10k ft, I like the idea of signaling active/standby using BFD.
>
>
>
> Cheers,
>
> Ashesh
>
>
>
> *From: *Rtg-bfd <rtg-bfd-bounces@ietf.org> on behalf of Ankur Dubey <
> adubey@vmware.com>
> *Date: *Monday, November 27, 2017 at 9:47 PM
> *To: *"rtg-bfd@ietf.org" <rtg-bfd@ietf.org>
> *Cc: *Reshad Rahman <rrahman@cisco.com>, Sami Boutros <sboutros@vmware.com
> >
> *Subject: *Service Redundancy using BFD
>
>
>
> Hi all,
>
>
>
> Please review and provide comments for the following draft:
>
>
>
> https://datatracker.ietf.org/doc/draft-adubey-bfd-service-redundancy/
>
>
>
>
> *Summary of draft:*
>
>
>
> This draft proposes a new BFD diag code with which a node running a BFD
> session with another node can inform that node, after the BFD session
> times out, that it did not go down and lived through the failure.
>
>
>
> Such notification is useful for a set of nodes providing Active/Standby
> redundancy. When these nodes run multiple L2/L3/L4-L7 services in
> non-revertive redundancy mode, the standby node that takes over as active
> for the non-revertive services after BFD times out needs to indicate in the
> BFD packet that it outlived the failed old active node. The new diag code
> is used for this purpose. When this diag code is set in the BFD packets, it
> indicates to the failed old active node that it MUST NOT activate the
> non-revertive services when it comes up.
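
A minimal sketch of the restart rule summarized above, written as it might look on the failed old active node. The diag code value, the `Service` type and the function name are all hypothetical; the summary does not give a code point, and the handling of revertive services (they return to the restarting node) is an assumption rather than something the summary states.

```python
from dataclasses import dataclass

# Hypothetical value: the summary does not state the code point.
# RFC 5880 defines diag values 0-8 and reserves 9-31, so 9 is only
# a placeholder here.
DIAG_OUTLIVED_PEER = 9


@dataclass
class Service:
    name: str
    revertive: bool  # True if the service reverts to its original node


def services_to_activate(services, received_diag):
    """Return the names of services a restarting node may activate.

    If the peer's BFD packets carry the new diag code, the peer outlived
    this node, so non-revertive services must stay inactive here.
    Revertive services are assumed (not stated in the summary) to be
    activated regardless.
    """
    peer_outlived_us = received_diag == DIAG_OUTLIVED_PEER
    return [s.name for s in services if s.revertive or not peer_outlived_us]


services = [Service("firewall", revertive=False), Service("nat", revertive=True)]
print(services_to_activate(services, DIAG_OUTLIVED_PEER))
print(services_to_activate(services, 0))
```

The point of the sketch is only that the decision is driven by the diag field of the received packet, not by any local state on the restarting node.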
>
>
>
> To provide per-service failover, a node activating certain non-revertive
> services needs to indicate that it is Active ONLY for those services. This
> can be done with a bitmap in which each bit position uniquely identifies a
> service. This bitmap is configured on all nodes by a network controller.
> When a node is active for at least one non-revertive service AND not
> active for at least one other, it sets the bits identifying its active
> services in the bitmap and sends it in the payload of the BFD packet.
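
The bitmap idea above can be sketched as follows. The service names, bit positions and function names are hypothetical stand-ins for whatever the network controller would configure. Returning `None` when the node is active for all or none of the services is also an assumption, since the summary does not say what is sent in those cases.

```python
# Hypothetical controller-assigned mapping, identical on every node:
# service name -> bit position.
SERVICE_BITS = {"firewall": 0, "nat": 1, "load-balancer": 2}


def bitmap_to_send(active_services):
    """Return the bitmap to carry in the BFD payload, or None.

    A bitmap is produced only when the node is active for at least one
    non-revertive service and not active for at least one other.
    """
    active = set(active_services) & set(SERVICE_BITS)
    if not active or active == set(SERVICE_BITS):
        return None
    bitmap = 0
    for name in active:
        bitmap |= 1 << SERVICE_BITS[name]  # set the bit for this service
    return bitmap


def decode_bitmap(bitmap):
    """Return the sorted names of services whose bits are set."""
    return sorted(n for n, bit in SERVICE_BITS.items() if bitmap & (1 << bit))


sent = bitmap_to_send({"firewall", "load-balancer"})
print(bin(sent), decode_bitmap(sent))
```

Because both ends share the same controller-assigned mapping, the receiver can recover the set of active services from the integer alone.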
>
>
>
>
>
> Thanks,
>
> --Ankur
>
>
>