Re: BFD stability follow-up from IETF-91

Manav Bhatia <manavbhatia@gmail.com> Fri, 05 December 2014 02:51 UTC

Hi Sam,

> Ideally, even if there is some bit of congestion, I would like the BFD
> packet to get through.
>
> I understand the queuing problems but I am not clear how the ID is going
> to solve the problem, if there is only congestion and not
>

It would help.

Assume the last sequence number you saw before the flap was s1. You timed
out because you did not see s2 and s3 before the timeout. Now further assume
that you later learn you did receive s2 and s3, but that they arrived after
the BFD expiry interval. Then you know that there wasn't a drop and that the
packets arrived late because of some queuing issue. How you determine
whether this delay was introduced on the TX or the RX side is open to
discussion.

Without a sequence number you have no idea whether the packet was dropped
or whether it arrived or was processed late.
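
To make the distinction concrete, here is a minimal sketch in Python. It is
purely illustrative and not part of any BFD specification: the Arrival
record, the classify_after_timeout function, and the idea of a receiver-side
log of (sequence number, receive time) pairs are all assumptions made for
the example.

```python
from dataclasses import dataclass


@dataclass
class Arrival:
    seq: int        # sequence number carried in the BFD packet (assumed field)
    rx_time: float  # local receive time, in seconds


def classify_after_timeout(expected_seqs, expired_at, arrivals):
    """Label each expected sequence number as on_time, late, or lost."""
    rx_time_by_seq = {a.seq: a.rx_time for a in arrivals}
    verdict = {}
    for seq in expected_seqs:
        rx_time = rx_time_by_seq.get(seq)
        if rx_time is None:
            verdict[seq] = "lost"      # never seen in the observation window
        elif rx_time > expired_at:
            verdict[seq] = "late"      # arrived, but after the session expired
        else:
            verdict[seq] = "on_time"
    return verdict


# s1 was the last packet seen; the session expired at t=1.30 waiting for s2, s3.
arrivals = [Arrival(seq=2, rx_time=1.35), Arrival(seq=3, rx_time=1.40)]
print(classify_after_timeout([2, 3], expired_at=1.30, arrivals=arrivals))
```

Note that "lost" here only means "not seen within the window that was
logged"; a packet that shows up even later would need a longer window.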

If by some out-of-band mechanism you can figure out that the TX was done on
time and the delay was at the RX end, then it's an implementation issue on
the RX side. If the TX was delayed, then it's an implementation issue on the
TX side.

This helps isolate the node that needs to fix the issue; otherwise we're
only shooting in the dark.
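
One possible way to do that attribution, sketched in Python under stated
assumptions: each side logs a local timestamp per sequence number, the two
logs are compared out of band, and only the gaps between consecutive packets
are compared, so the two clocks need not be synchronized. The function name,
the slack parameter, and the log format are hypothetical.

```python
def attribute_delay(tx_times, rx_times, interval, slack=0.25):
    """Blame 'tx' or 'rx' for each consecutive packet pair with a long gap.

    tx_times / rx_times map sequence number -> local timestamp in seconds,
    each taken from that side's own clock. interval is the negotiated
    transmit interval; slack is the tolerated fractional overshoot.
    """
    limit = interval * (1 + slack)
    seqs = sorted(set(tx_times) & set(rx_times))
    result = {}
    for prev, cur in zip(seqs, seqs[1:]):
        if cur != prev + 1:
            continue  # a packet is missing in between; gaps are not comparable
        tx_gap = tx_times[cur] - tx_times[prev]
        rx_gap = rx_times[cur] - rx_times[prev]
        if tx_gap > limit:
            result[cur] = "tx"   # sender was already late handing it off
        elif rx_gap > limit:
            result[cur] = "rx"   # sent on time, seen late by the receiver
        else:
            result[cur] = "ok"
    return result


tx_log = {1: 0.0, 2: 0.1, 3: 0.2}   # sender clock
rx_log = {1: 5.0, 2: 5.1, 3: 5.6}   # receiver clock, unrelated epoch
print(attribute_delay(tx_log, rx_log, interval=0.1))
```

Where each side takes its timestamp (before or after queuing, line card or
control plane) changes what "tx" and "rx" mean in the output, so the labels
are only as precise as the timestamping points.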

> packet drop. In that case just the sequence # doesn't help and timestamping
> is needed. Even if timestamping is to be done, realistically it cannot
> happen the same way across multi vendor i.e. where the timestamping
> should/could be done. For ex: Before it is queued or after OR
> LC/RP/SP/Process etc.
>

This isn't a new problem. Each vendor timestamps 1588 packets differently.
However, the aggregate solution works across multiple vendors.

While it may not solve all the issues because of the vagaries of how each
vendor does timestamping, it would certainly help debug a large number of
BFD flaps.


> Secondly, if the congestion happens, the CIR/PIR should apply to data
> packets too. In that case BFD flap is at least a good indicator of the
> problem, isn't it?
>

There is usually a separate CIR/PIR for different classes of CPU-bound
packets. So, based on some parameters, you might impose a different CIR/PIR
for BFD than for, say, ssh and radius packets. If BFD flaps, you know you
missed a few sequence numbers, and you see drops in that queue, then all of
this could be correlated and you could fix the CPU queue parameters for BFD.
I understand that this is a very implementation-specific issue, but that's
what I had said earlier -- such a mechanism can help in isolating
implementation-specific issues as well.
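
A rough sketch of that correlation in Python, assuming the box exposes a
cumulative drop counter for the CPU queue that carries BFD and that the
counter is sampled periodically. The sample format, the function names, and
the window parameter are all hypothetical.

```python
from bisect import bisect_right


def drops_between(samples, start, end):
    """Drops counted in (start, end], from sorted (time, cumulative) samples."""
    times = [t for t, _ in samples]

    def value_at(t):
        i = bisect_right(times, t)
        # Assume the counter was 0 before the first sample.
        return samples[i - 1][1] if i else 0

    return value_at(end) - value_at(start)


def correlate(flaps, samples, window):
    """For each (flap_time, missing_count), check for matching queue drops."""
    report = []
    for flap_time, missing in flaps:
        dropped = drops_between(samples, flap_time - window, flap_time)
        explained = missing > 0 and dropped >= missing
        report.append((flap_time, missing, dropped, explained))
    return report


queue_samples = [(0, 0), (10, 0), (20, 3), (30, 3)]   # (seconds, total drops)
flaps = [(20.5, 3), (30.5, 2)]                        # (seconds, seqs missed)
for row in correlate(flaps, queue_samples, window=1.0):
    print(row)
```

A flap whose missing sequence numbers line up with drops in the BFD queue
points at the CIR/PIR settings for that queue; a flap with no matching drops
points elsewhere.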


> Lastly, these improvements are a change from the existing BFD model/protocol.
> I do not see why it shouldn't be part of BFD v2 OR lead to v2 :D
>

Sure, that would make Marc very happy ! :-)

I am not sure if we have enough momentum right now that can propel us
towards BFD v2.

OTOH, if the WG believes that this is an opportune time for us to start
looking at BFDv2, then I would be more than willing to participate!

Cheers, Manav

>
> -sam
>
>
> Cheers, Manav
>
> - I see concerns regarding timestamps and sequence numbers expressed in
>> emails. In that case, the proposed model is still not going to identify the
>> problem completely. Am I reading it right?
>>
>> -sam
>> On Dec 4, 2014, at 7:47 AM, Gregory Mirsky wrote:
>>
>> > Hi Jeff,
>> > I can reference RFC 5357 here. The Appendix describes what is called
>> TWAMP-Light mode with Stateless Reflector. About a year and a half ago an
>> Erratum was accepted that describes Stateful Reflector, which supports
>> measurement of one-way latency/jitter and packet loss metrics.
>> >
>> >       Regards,
>> >               Greg
>> >
>> > -----Original Message-----
>> > From: Rtg-bfd [mailto:rtg-bfd-bounces@ietf.org] On Behalf Of Jeffrey
>> Haas
>> > Sent: Thursday, December 04, 2014 7:17 AM
>> > To: Nobo Akiya (nobo)
>> > Cc: rtg-bfd@ietf.org
>> > Subject: Re: BFD stability follow-up from IETF-91
>> >
>> > On Thu, Dec 04, 2014 at 03:14:50PM +0000, Nobo Akiya (nobo) wrote:
>> >> If what you say is the only requirement not met, one approach may be
>> to pursue a non-standard-track document describing some suggested
>> implementation techniques to locally store TX/RX timestamp.
>> >>
>> >> Given that echo approach will be less accurate and given that we seem
>> to be having difficulty converging, I thought I'll throw out another idea.
>> >
>> > I think my biggest concern is that the echo approach has bidirectional
>> packet loss possibilities.  Async at least lets the receiver know about
>> unidirectional packet loss.
>> >
>> > Of course, if your goal is to notify the sender that their packets are
>> being lost, you need a backchannel anyway.  I just don't know if we want
>> that back channel to be bfd.
>> >
>> > - Jeff
>> >
>>
>>
>
>