Re: Echo Function and Asymmetry - Timer negotiation

Reshad Rahman <rrahman@cisco.com> Mon, 15 August 2005 21:21 UTC

Message-ID: <43010724.3050303@cisco.com>
Date: Mon, 15 Aug 2005 17:20:36 -0400
From: Reshad Rahman <rrahman@cisco.com>
To: richard.spencer@bt.com
Cc: rtg-bfd@ietf.org, dkatz@juniper.net
Subject: Re: Echo Function and Asymmetry - Timer negotiation

Hi Richard,

richard.spencer@bt.com wrote:

>Hi Reshad,
>
>Regarding your comment about echo mode decreasing the chances of false positives due to spikes in CPU usage, I believe this benefit is implementation dependent.
>
I agree that this is implementation specific. But I do believe one of 
the goals behind the echo mode was to decrease the chances of 
false positives. This is just my understanding; there's no mention of 
this in the specs, so I could be wrong.

>The argument I have heard put forward is that if a remote BFD peer's CPU becomes overloaded when using asynchronous mode, BFD control packets may not be sent/processed in a timely manner causing the BFD session to be declared down even though there hasn't been a loss in connectivity. Using echo mode is perceived to solve this issue due to the fact that echo packets are switched using the forwarding plane h/w rather than the CPU.
>
>Firstly, some BFD systems (e.g. simple low cost access devices with minimal IP functionality) may process/forward all IP packets using the CPU, in which case, if there are CPU spikes BFD echo packets will be affected in the same way that BFD control packets are (assuming they are given the same priority).
>  
>
Yes, this can happen on some systems, although typically packet 
forwarding does get higher priority.

>Secondly, even if an implementation switches received echo packets in h/w, it may still use its CPU to generate/process its own echo packets. In which case, if a system's CPU resources become overloaded, although it will continue to switch back echo packets that it receives from its peer, it may fail to process its own echo packets in time and the session will be declared down anyway. 
>
If the system's CPU is overloaded, then most likely the system isn't 
sending echo packets either. But again, this is implementation dependent.

>Of course, this also depends on what constitutes a failure using echo mode, as this is implementation dependent (it isn't defined in any of the drafts).
>  
>
I agree.

>Therefore, to ensure an echo mode BFD implementation is not affected by CPU usage spikes, it must not generate, process, or switch BFD packets using the CPU, i.e. all BFD echo packet generation/processing must be done in hardware. 
>  
>
Even if the echo packets are generated and processed by s/w, I think 
it's possible to have an implementation which wouldn't detect echo 
failure (of course, as you pointed out, this depends on the definition of 
echo failure) if the CPU gets busy (assuming echo packets are being 
switched back by the remote end).

Regards,
Reshad.

>To determine how much echo mode actually reduces the chance of false positives, it would be interesting to hear from vendors whether full h/w-based BFD echo packet processing is the exception or the norm.
>
>Regards,
>Richard
>
>-----Original Message-----
>From: Reshad Rahman [mailto:rrahman@cisco.com]
>Sent: 12 August 2005 21:03
>To: Spencer,R,Richard,XDE73 R
>Cc: dkatz@juniper.net; rtg-bfd@ietf.org; tanyas@cisco.com
>Subject: Re: Echo Function and Asymmetry - Timer negotiation
>
>
>Hi Richard,
>
>Please see inline.
>
>richard.spencer@bt.com wrote:
>
>>Hi Reshad,
>>
>>In the example below, as you point out the TX burden will be on A. On the other hand though, B will be receiving control/echo packets at a fast rate, whilst A will be receiving control packets at a sedate rate and echo packets at a fast rate. Therefore, from a TX and RX perspective, things are a bit more balanced.
>
>B is receiving echo packets but it is forwarding/looping-back the echo packets. I should have been clearer: I wasn't referring to raw tx/rx but to the host-stack tx/rx. And the burden on the host-stack is bigger on A.
>
>>As I say though, I don't understand why anyone would want to use echo mode, except for on demand fault diagnosis. One might argue that echo mode tests the forwarding plane of the remote system, but just because the remote system is looping BFD echo packets back correctly doesn't mean that it is routing/switching all packets correctly, e.g. the routing/switching table could be partially corrupted.
>
>I agree that echo doesn't detect partial failures. But echo mode does detect failures which occur as a result of the whole fwding engine being hosed.
>
>I think echo has the benefit of decreasing the chances of false-positives due to spikes in CPU usage.
>
>Regards,
>Reshad.
>
>>Regards,
>>Richard
>
> -----Original Message-----
>From: Reshad Rahman [mailto:rrahman@cisco.com]
>Sent: 12 August 2005 14:34
>To: Spencer,R,Richard,XDE73 R
>Cc: dkatz@juniper.net; rtg-bfd@ietf.org; Tanya Shastri
>Subject: Re: Echo Function and Asymmetry - Timer negotiation
>
>
>Richard,
>
>Thanks for the response, makes sense. So in the example below where only A is using echo mode, this means that A will be sending control and echo packets at a fast rate and B will be sending control packets at the sedate rate? If that's the case all the tx burden will be on A, asymmetric echo doesn't seem fair...
>
>Regards,
>Reshad.
>
>richard.spencer@bt.com wrote:
>
>Reshad,
>
>  
>It's not clear to me what's the benefit of doing this if the 
>asymmetric echo is being run at fast rate. Failure in any 
>direction will be detected by the asymmetric echo and the 
>other end (which isn't running echo) will be notified on 
>echo failure. So it's not clear to me what's the benefit of 
>the guy not running echo to be receiving control 
>packets at a faster rate. It would seem that with asymmetric 
>echo we can still run control packets at a sedate rate in both 
>directions. Or am I missing something?
>    
>
>The problem is that you are assuming the system not running echo mode will always be notified of a failure. Let's say we have a BFD session between two systems A and B, echo mode is active on system A, but system B is just using asynchronous mode. If there is a unidirectional failure in the direction B->A, system A will detect the failure and will send a BFD control packet to B indicating that there is a failure. In this scenario, both system A and system B will be aware of the failure.
>
>However, if there is a unidirectional failure in the direction A->B, system A will detect the failure and will send a BFD control packet to B indicating that there is a failure. This is where the problem lies: system B will not receive the BFD control packet because the failure is in the direction A->B, and therefore B will not detect the failure until its asynchronous timer has timed out. Similarly, if there is a bi-directional failure, system A will detect the failure and will send a BFD control packet to B indicating that there is a failure, but again system B will not receive the failure notification.
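
The three failure cases described above can be sketched as a small model. This is only an illustration under stated assumptions: the names `Timers` and `detection` are hypothetical, the echo failure criterion (a fixed number of consecutive lost echoes) is assumed because the drafts do not define one, and propagation delay and timer phase are ignored.

```python
from dataclasses import dataclass


@dataclass
class Timers:
    echo_interval_ms: int  # A's echo transmit interval (the fast rate)
    echo_mult: int         # ASSUMPTION: lost echoes A tolerates before declaring failure
    ctrl_interval_ms: int  # interval at which A sends control packets to B
    ctrl_mult: int         # detect multiplier B applies to A's control packets


def detection(failure: str, t: Timers) -> dict:
    """Approximate time (ms) and cause by which each system learns of a failure.

    `failure` is 'A->B', 'B->A' or 'both'. Echo runs on A only; B is
    asynchronous only.
    """
    # A's echoes cross the link in both directions, so a failure in either
    # direction breaks the loop and A notices after echo_mult lost echoes.
    a_ms = t.echo_interval_ms * t.echo_mult
    # B has only A's control packets to go on.
    b_timeout_ms = t.ctrl_interval_ms * t.ctrl_mult

    if failure == "B->A":
        # A->B still works, so A's failure notification reaches B.
        return {"A": (a_ms, "echo loss"), "B": (a_ms, "notified by A")}
    if failure in ("A->B", "both"):
        # A's notification is lost; B must wait for its own timer to expire.
        return {"A": (a_ms, "echo loss"), "B": (b_timeout_ms, "control timeout")}
    raise ValueError(f"unknown failure direction: {failure}")
```

With a 50 ms echo interval and control packets from A every 1000 ms (multiplier 3 in both cases), this model has B learning of a B->A failure after roughly 150 ms, but of an A->B or bidirectional failure only after roughly 3000 ms. Lowering `ctrl_interval_ms` to 50 brings B back to 150 ms, which is the reason for A sending control packets at the fast rate.
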
>
>I personally don't like echo mode as an always on fault detection mechanism for a number of reasons:
>
>1. Echo mode does not provide any indication of the direction/location of the failure, it could be in the direction A->B, B->A, or it could be a forwarding plane failure in the remote system.
>2. To support symmetric failure detection times, echo mode requires twice as many packets to be transmitted/received as asynchronous mode does if active on both systems, and 50% more if active on just one system.
>3. The draft does not define exactly how a failure is detected in echo mode, therefore vendors may use different methods/settings for fault detection using echo mode. In a multi-vendor environment, this may require translation between different methods/settings in order to ensure symmetry in detecting failures (if they are actually user configurable), which adds management/operational complexity.
>4. Failure detection times using echo mode are more susceptible to variations due to the packets being looped back, i.e. [downstream propagation delay + far end switching delay + upstream propagation delay] vs. [one way propagation delay].
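
The packet counts in point 2 can be reproduced with a short calculation. This is one reading of those figures, not something the message spells out: it assumes that what is counted is packets crossing the link per second, that each echo crosses the link twice (out and back), and that sedate-rate control packets are negligible. The function name `link_packets_per_second` is hypothetical.

```python
def link_packets_per_second(fast_pps: float, echo_on: str) -> float:
    """Packets crossing the link per second for symmetric detection times.

    `fast_pps` is the fast packet rate needed for the target detection time.
    `echo_on` is 'none' (asynchronous mode only), 'one' (echo on A only)
    or 'both'. Sedate-rate control packets are ignored (ASSUMPTION).
    """
    if echo_on == "none":
        # Fast control packets in each direction.
        return 2 * fast_pps
    if echo_on == "one":
        # A's echoes go out and come back (2x), and A also sends fast
        # control packets so that B can detect failures just as quickly.
        return 2 * fast_pps + fast_pps
    if echo_on == "both":
        # Each side's echoes go out and come back.
        return 4 * fast_pps
    raise ValueError(f"unknown echo_on value: {echo_on}")
```

At 20 packets per second this gives 40 for asynchronous mode, 60 with echo on one system (50% more), and 80 with echo on both systems (twice as many).
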
>
>I would compare BFD asynchronous mode to the use of continuity check (CC) cells and echo mode to the use of loopback cells in ATM. In general, loopback tests are useful as on demand tests for diagnosing faults (e.g. for locating a fault), but are not suitable as always on fault detection mechanisms.
>
>Regards,
>Richard
>  
>  
>