RE: BFD stability follow-up from IETF-91

Marc Binderberger <marc@sniff.de> Mon, 01 December 2014 10:25 UTC

Date: Mon, 01 Dec 2014 02:28:35 -0800
From: Marc Binderberger <marc@sniff.de>
To: Gregory Mirsky <gregory.mirsky@ericsson.com>
Subject: RE: BFD stability follow-up from IETF-91
Archived-At: http://mailarchive.ietf.org/arch/msg/rtg-bfd/zE1UczitAb6BmxJaFIH4vV5uF1k
Cc: "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>

Hello Gregory,

the test of "do we _really_ need this" is always a good idea :-)

>> I'm convinced that overloading BFD with performance measurement provisions 
>> is counter-productive and is inappropriate.

BFD is successful because it is so simple, and because we focused on doing 
this one thing right. I agree with your comment. That does not rule out 
thinking about debug improvements, though.

And as I claimed in several emails ;-), I think that at least some aspects of 
the discussion can be covered by "implementation", e.g. local time stamps and 
some common set of information collected in implementations. I.e. an informal 
draft could cover this.

To identify a packet, though, so that we can correlate Tx and Rx debug data, I 
don't see a good alternative to a sequence number yet. It has the 
additional benefit of telling us about packet loss (but for me the main 
reason is identification). But what is the alternative to introducing an 
extra field: reusing existing fields instead? (*)
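
As a rough illustration of the correlation idea, here is a minimal sketch. 
It assumes each side keeps a debug log mapping a per-packet sequence number 
to a local timestamp; the log format and the function name `correlate` are 
hypothetical and not taken from any BFD implementation. The two clocks are 
not assumed to be synchronized, so the timestamps are only paired, not 
subtracted.

```python
from typing import Dict, List, Tuple


def correlate(
    tx_log: Dict[int, float], rx_log: Dict[int, float]
) -> Tuple[Dict[int, Tuple[float, float]], List[int]]:
    """Pair Tx and Rx debug records by sequence number (hypothetical log format).

    tx_log: sequence number -> sender's local Tx timestamp
    rx_log: sequence number -> receiver's local Rx timestamp
    Returns (matched, lost): matched maps each sequence number seen on both
    sides to its (tx_time, rx_time) pair; lost lists the sequence numbers
    that were sent but never logged by the receiver.
    """
    matched = {seq: (tx_log[seq], rx_log[seq]) for seq in tx_log if seq in rx_log}
    lost = sorted(seq for seq in tx_log if seq not in rx_log)
    return matched, lost


# Example: packet 2 was sent but never received.
tx = {1: 0.0, 2: 0.1, 3: 0.2}
rx = {1: 5.0, 3: 5.2}
matched, lost = correlate(tx, rx)
```

Identification is the primary use here (the `matched` pairs); the `lost` 
list falls out as the side benefit mentioned above.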


Nevertheless, I prefer a "70%" debug solution with small BFD changes to a 
more complex "100%" solution.

My $0.02


Regards, Marc

P.S.: well, you know a v2 header would offer more options ;-) but looking at 
the v1 packet:

Local discriminator: some RFCs don't allow this to change (e.g. RFC5884). 
Otherwise it would be a nice identifier not only of a session but even of a 
single packet.

Required Min Echo RX Interval: my favourite as this field is effectively 
wasted for async BFD. The top bits translate into very large echo intervals 
(read: not very useful) and could be used as flags instead. The least 
significant bits translate into usec values, much faster than any 
implementation (yet?) and could also be used as flags instead. 

E.g. values of 0x80000000 - 0xFFFFFFFF could translate into an echo interval 
of zero and a 31 bit sequence number. But I don't see how to guarantee there 
won't be any interop problems with older implementations :-)
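
To make the encoding concrete, here is a minimal sketch of that last idea 
only: a 32-bit Required Min Echo RX Interval value with the top bit set is 
read as "echo interval zero plus a 31-bit sequence number". The constant 
and function names are hypothetical, this is not part of RFC 5880, and it 
says nothing about how an older implementation would react to such values.

```python
import struct
from typing import Optional, Tuple

SEQ_FLAG = 0x80000000  # top bit set: field carries a sequence number (assumed convention)
SEQ_MASK = 0x7FFFFFFF  # low 31 bits hold the sequence number


def encode_echo_field(seq: int) -> int:
    """Pack a sequence number into the 32-bit field; it wraps at 31 bits."""
    return SEQ_FLAG | (seq & SEQ_MASK)


def decode_echo_field(value: int) -> Tuple[int, Optional[int]]:
    """Return (echo_interval_usec, sequence_number or None).

    Values in 0x80000000 - 0xFFFFFFFF are treated as echo interval zero plus
    a sequence number; smaller values keep their normal meaning as an
    interval in microseconds.
    """
    if value & SEQ_FLAG:
        return 0, value & SEQ_MASK
    return value, None


def field_to_wire(value: int) -> bytes:
    """Serialize the field as a 32-bit unsigned integer in network byte order."""
    return struct.pack("!I", value)


# Example: sequence number 5 on the wire, then decoded again.
wire = field_to_wire(encode_echo_field(5))
interval, seq = decode_echo_field(struct.unpack("!I", wire)[0])
```

A receiver that does not know the convention would simply see a very large 
echo interval, which is exactly where the interop question arises.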







On Sat, 29 Nov 2014 03:17:54 +0000, Gregory Mirsky wrote:
> Hi Jeff,
> I absolutely agree with continuing discussion and sharing experiences from 
> real-life deployments and interoperability of BFD in IP and MPLS networks, 
> single-, multi-hop and over LAG constituents. From that we're learning and 
> that helps us make our implementations more robust, reliable, and 
> interoperable. And certainly, where it benefits the community, informational 
> RFCs, like BFD intervals or RFC 7325 MPLS Forwarding Compliance and 
> Performance Requirements, can document the cases and our recommendations. 
> But I'm somewhat concerned with anything that targets standard status, even 
> as optional functionality, even though it is not proven that the problem is 
> in the standard, not in an implementation.
> 
> 	Regards,
> 		Greg
> 
> -----Original Message-----
> From: Jeffrey Haas [mailto:jhaas@pfrc.org] 
> Sent: Friday, November 28, 2014 11:56 AM
> To: Gregory Mirsky
> Cc: Fan, Peng; 'MALLIK MUDIGONDA (mmudigon)'; rtg-bfd@ietf.org
> Subject: Re: BFD stability follow-up from IETF-91
> 
> [Speaking as an individual contributor...]
> 
> On Fri, Nov 28, 2014 at 07:36:30PM +0000, Gregory Mirsky wrote:
>> this is a very interesting scenario. I think that if BFD experiences ~30% 
>> packet loss, then other applications are highly likely affected as well. 
>> Then it is not just a BFD issue but a condition that should be detected by 
>> a performance measurement method, whether active or passive packet loss 
>> measurement.
>> I'm convinced that overloading BFD with performance measurement provisions 
>> is counter-productive and is inappropriate.
> 
> My opinion is about halfway between your opinion, Greg.
> 
> I agree that we wish to be very cautious about overloading BFD with 
> components that are in other OAM mechanisms.  Part of my desire for such 
> caution is that IEEE has expressed interest in not having us step on their 
> technologies and this would create paperwork for the chairs. :-)
> 
> But where I think we diverge slightly comes from experience in helping the 
> working group and vendors wend their way through implementing BFD for LAG.
> During that discussion, it was very clear that, depending on the vendor, the 
> architecture and sometimes specific chipsets, "BFD" lived in very 
> different pieces of underlying architecture.
> 
> What this means is that trying to do very tight timing things will run into 
> practical issues in having to figure out what the perspective of the 
> timings are.  Is it some underlying L2? L3? Something between?  At what 
> point do you realize you are measuring contradictory things?
> 
> But similarly, when trying to measure and account for loss, having some 
> data is useful simply because it helps you determine that the component 
> that is *responsible for BFD* may be experiencing loss.  Depending on your 
> architecture, this may be the underlying layer-1, layer-2 or something else.
> In such cases, the lower-layer OAM is better to troubleshoot.  But in cases 
> where your lower-layer OAM doesn't indicate the loss, you still need to 
> understand that there is BFD-level loss.
> 
> I encourage participants in this discussion to remember this detail: We are 
> trying to help measure BFD loss.  Trying to read too much detail into what 
> that means outside of BFD may lead you to erroneous conclusions depending 
> on a given implementation.
> 
> Thus, consider what is best for BFD.
> 
> -- Jeff
>