Re: BFD stability follow-up from IETF-91

Jeffrey Haas <jhaas@pfrc.org> Fri, 28 November 2014 19:55 UTC

Date: Fri, 28 Nov 2014 14:55:37 -0500
From: Jeffrey Haas <jhaas@pfrc.org>
To: Gregory Mirsky <gregory.mirsky@ericsson.com>
Subject: Re: BFD stability follow-up from IETF-91
Message-ID: <20141128195536.GG1274@pfrc>
References: <007701d00af9$28719050$7954b0f0$@chinamobile.com> <D09E5FAC.27C51%mmudigon@cisco.com> <007e01d00b07$9c02cc10$d4086430$@chinamobile.com> <7347100B5761DC41A166AC17F22DF1121B8998E7@eusaamb103.ericsson.se>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <7347100B5761DC41A166AC17F22DF1121B8998E7@eusaamb103.ericsson.se>
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: http://mailarchive.ietf.org/arch/msg/rtg-bfd/OEP_79Sc80OpvTi-1efIW7NUuCo
Cc: "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>

[Speaking as an individual contributor...]

On Fri, Nov 28, 2014 at 07:36:30PM +0000, Gregory Mirsky wrote:
> this is very interesting scenario. I think that if BFD experiences ~30% packet loss, then highly likely so are affected other applications. Then it is not just BFD issue but condition that should be detected  by performance measurement method, whether active or passive packet loss measurement.
> I'm convinced that overloading BFD with performance measurement provisions is counter-productive and is inappropriate.

My opinion is about halfway to yours, Greg.

I agree that we wish to be very cautious about overloading BFD with
components that are in other OAM mechanisms.  Among my reasons for such
caution is that IEEE has expressed interest in not having us step on their
technologies, and this would create paperwork for the chairs. :-)

But where I think we diverge slightly comes from experience in helping the
working group and vendors wend their way through implementing BFD for LAG.
During that discussion, it was very clear that, depending on the vendor, the
architecture and sometimes the specific chipset, "BFD" lived in very
different pieces of the underlying architecture.

What this means is that trying to do very tight timing things will run into
practical issues in having to figure out what the perspective of the timings
is.  Is it some underlying L2? L3? Something between?  At what point do you
realize you are measuring contradictory things?

But similarly, when trying to measure and account for loss, having some data
is useful simply because it helps you determine that the component that is
*responsible for BFD* may be experiencing loss.  Depending on your
architecture, this may be the underlying layer-1, layer-2 or something else.
In such cases, the lower-layer OAM is the better troubleshooting tool.  But in cases
where your lower-layer OAM doesn't indicate the loss, you still need to
understand that there is BFD-level loss.

I encourage participants in this discussion to remember this detail: We are
trying to help measure BFD loss.  Trying to read too much detail into what
that means outside of BFD may lead you to erroneous conclusions depending on
a given implementation.

Thus, consider what is best for BFD.

-- Jeff