RE: Echo Function and Asymmetry - Timer negotiation

richard.spencer@bt.com Mon, 15 August 2005 20:14 UTC

From: richard.spencer@bt.com
Date: Mon, 15 Aug 2005 21:14:19 +0100
Message-ID: <B5E87B043D4C514389141E2661D255EC074C4FD7@i2km41-ukdy.domain1.systemhost.net>
To: rrahman@cisco.com
Cc: rtg-bfd@ietf.org, dkatz@juniper.net
Subject: RE: Echo Function and Asymmetry - Timer negotiation

Hi Reshad,

Regarding your comment about echo mode decreasing the chances of false positives due to spikes in CPU usage, I believe this benefit is implementation dependent. The argument I have heard put forward is that if a remote BFD peer's CPU becomes overloaded when using asynchronous mode, BFD control packets may not be sent/processed in a timely manner, causing the BFD session to be declared down even though there hasn't been a loss in connectivity. Using echo mode is perceived to solve this issue because echo packets are switched using the forwarding plane h/w rather than the CPU.

Firstly, some BFD systems (e.g. simple low-cost access devices with minimal IP functionality) may process/forward all IP packets using the CPU, in which case, if there are CPU spikes, BFD echo packets will be affected in the same way that BFD control packets are (assuming they are given the same priority).

Secondly, even if an implementation switches received echo packets in h/w, it may still use its CPU to generate/process its own echo packets. In that case, if a system's CPU resources become overloaded, it will continue to switch back echo packets that it receives from its peer, but it may fail to process its own echo packets in time and the session will be declared down anyway. Of course, this also depends on what constitutes a failure in echo mode, which is implementation dependent (it isn't defined in any of the drafts).

Therefore, to ensure an echo mode BFD implementation is not affected by CPU usage spikes, it must not generate, process, or switch BFD packets using the CPU, i.e. all BFD echo packet generation/processing must be done in hardware. 
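
As a rough illustration of the three cases above (a sketch only; the EchoPath type and its field names are made up here and are not taken from any draft or implementation), the condition can be written as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EchoPath:
    """Where each echo step runs: True = forwarding hardware, False = CPU."""
    generate_in_hw: bool  # local system builds and sends its own echo packets
    loopback_in_hw: bool  # remote system switches the echo packets back
    process_in_hw: bool   # local system receives and checks returned echoes


def survives_cpu_spike(path: EchoPath) -> bool:
    """The echo session is shielded from CPU spikes only if no step uses the CPU."""
    return path.generate_in_hw and path.loopback_in_hw and path.process_in_hw


# Hypothetical implementations, for illustration only.
all_cpu = EchoPath(generate_in_hw=False, loopback_in_hw=False, process_in_hw=False)
hw_loopback_only = EchoPath(generate_in_hw=False, loopback_in_hw=True, process_in_hw=False)
full_hw = EchoPath(generate_in_hw=True, loopback_in_hw=True, process_in_hw=True)

for name, path in [("all CPU", all_cpu),
                   ("h/w loopback only", hw_loopback_only),
                   ("full h/w", full_hw)]:
    print(f"{name}: survives CPU spike = {survives_cpu_spike(path)}")
```

Only the full h/w case comes out as unaffected; the other two can still produce a false positive under CPU load.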

In order to determine how significant the reduction in false positives from using echo mode really is, it would be interesting to hear from vendors whether full h/w based BFD echo packet processing is the exception or the norm.

Regards,
Richard

-----Original Message-----
From: Reshad Rahman [mailto:rrahman@cisco.com]
Sent: 12 August 2005 21:03
To: Spencer,R,Richard,XDE73 R
Cc: dkatz@juniper.net; rtg-bfd@ietf.org; tanyas@cisco.com
Subject: Re: Echo Function and Asymmetry - Timer negotiation


Hi Richard,

Please see inline.

richard.spencer@bt.com wrote: 
Hi Reshad,

In the example below, as you point out the TX burden will be on A. On the other hand though, B will be receiving control/echo packets at a fast rate, whilst A will be receiving control packets at a sedate rate and echo packets at a fast rate. Therefore, from a TX and RX perspective, things are a bit more balanced.
B is receiving echo packets but it is forwarding/looping-back the echo packets. I should have been clearer: I wasn't referring to raw tx/rx but to the host-stack tx/rx. And the burden on the host-stack is bigger on A.


As I say though, I don't understand why anyone would want to use echo mode, except for on-demand fault diagnosis. One might argue that echo mode tests the forwarding plane of the remote system, but just because the remote system is looping BFD echo packets back correctly doesn't mean that it is routing/switching all packets correctly, e.g. the routing/switching table could be partially corrupted.


I agree that echo doesn't detect partial failures. But echo mode does detect failures which occur as a result of the whole fwding engine being hosed.

I think echo has the benefit of decreasing the chances of false-positives due to spikes in CPU usage.

Regards,
Reshad.

Regards,
Richard

-----Original Message-----
From: Reshad Rahman [mailto:rrahman@cisco.com]
Sent: 12 August 2005 14:34
To: Spencer,R,Richard,XDE73 R
Cc: dkatz@juniper.net; rtg-bfd@ietf.org; Tanya Shastri
Subject: Re: Echo Function and Asymmetry - Timer negotiation


Richard,

Thanks for the response, makes sense. So in the example below where only A is using echo mode, this means that A will be sending control and echo packets at a fast rate and B will be sending control packets at the sedate rate? If that's the case all the tx burden will be on A, asymmetric echo doesn't seem fair...

Regards,
Reshad.

richard.spencer@bt.com wrote:

Reshad,

  
It's not clear to me what's the benefit of doing this if the 
asymmetric echo is being run at fast rate. Failure in any 
direction will be detected by the asymmetric echo and the 
other end (which isn't running echo) will be notified on 
echo failure. So it's not clear to me what's the benefit of 
the guy not running echo to be receiving control
packets at a faster rate. It would seem that with asymmetric 
echo we can still run control packets at a sedate rate in both 
directions. Or am I missing something?
    

The problem is that you are assuming the system not running echo mode will always be notified of a failure. Let's say we have a BFD session between two systems A and B, echo mode is active on system A, but system B is just using asynchronous mode. If there is a unidirectional failure in the direction B->A, system A will detect the failure and will send a BFD control packet to B indicating that there is a failure. In this scenario, both system A and system B will be aware of the failure.

However, if there is a unidirectional failure in the direction A->B, system A will detect the failure and will send a BFD control packet to B indicating that there is a failure. This is where the problem lies: system B will not receive the BFD control packet because the failure is in the direction A->B, and therefore B will not detect the failure until its asynchronous timer has timed out. Similarly, if there is a bi-directional failure, system A will detect the failure and will send a BFD control packet to B indicating that there is a failure, but again system B will not receive the failure notification.
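
The three failure cases above can be sketched as a small truth table. This is only a simplified model under the assumptions in this example (A runs echo at a fast rate, B runs asynchronous mode at a sedate rate); the function name and return format are invented for illustration:

```python
def who_learns_quickly(a_to_b_ok: bool, b_to_a_ok: bool) -> dict:
    """Return which system learns of a failure within the fast (echo) interval.

    Assumed setup: A runs echo mode, B runs asynchronous mode only.
    """
    # A's echo packets must cross A->B and then B->A to get back to A.
    echo_returns = a_to_b_ok and b_to_a_ok
    a_detects = not echo_returns
    # A's failure notification is a control packet, so it needs A->B to work.
    b_notified = a_detects and a_to_b_ok
    return {"A": a_detects, "B": b_notified}


cases = {
    "B->A failure": (True, False),
    "A->B failure": (False, True),
    "bi-directional failure": (False, False),
}
for name, (a_to_b_ok, b_to_a_ok) in cases.items():
    print(name, who_learns_quickly(a_to_b_ok, b_to_a_ok))
```

In the A->B and bi-directional cases B is not notified, so in this model B only finds out when its own (slow) asynchronous detection time expires.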

I personally don't like echo mode as an always-on fault detection mechanism for a number of reasons:

1. Echo mode does not provide any indication of the direction/location of the failure: it could be in the direction A->B, B->A, or it could be a forwarding plane failure in the remote system.
2. To support symmetric failure detection times, echo mode requires twice as many packets to be transmitted/received as asynchronous mode does if active on both systems, and 50% more if active on just one system.
3. The draft does not define exactly how a failure is detected in echo mode; therefore vendors may use different methods/settings for fault detection using echo mode. In a multi-vendor environment, this may require translation between different methods/settings in order to ensure symmetry in detecting failures (if they are actually user configurable), which adds management/operational complexity.
4. Failure detection times using echo mode are more susceptible to variations due to the packets being looped back, i.e. [downstream propagation delay + far end switching delay + upstream propagation delay] vs. [one way propagation delay].
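
To make the arithmetic behind point 2 explicit, here is a back-of-the-envelope count. It is a sketch under my own simplifying assumptions, not something taken from the draft: slow-rate control packets are ignored, a looped-back echo packet is counted twice (once when its originator sends it, once when the peer sends it back), and a system that does not run echo needs fast control packets from its peer to keep detection times symmetric.

```python
def fast_packets_per_interval(echo_on_a: bool, echo_on_b: bool) -> int:
    """Count fast-rate packet transmissions on the link per detection interval."""
    total = 0
    for runs_echo, peer_runs_echo in ((echo_on_a, echo_on_b), (echo_on_b, echo_on_a)):
        if runs_echo:
            total += 2  # echo sent out, then looped back by the peer
        if not peer_runs_echo:
            total += 1  # peer relies on fast control packets from this system
    return total


async_only = fast_packets_per_interval(False, False)
echo_both = fast_packets_per_interval(True, True)
echo_one = fast_packets_per_interval(True, False)

print("asynchronous only:", async_only)
print("echo on both systems:", echo_both, "ratio", echo_both / async_only)
print("echo on one system:", echo_one, "ratio", echo_one / async_only)
```

Under these assumptions the counts are 2, 4 and 3, i.e. twice as many packets with echo on both systems and 50% more with echo on just one.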

I would compare BFD asynchronous mode to the use of continuity check (CC) cells and echo mode to the use of loopback cells in ATM. In general, loopback tests are useful as on-demand tests for diagnosing faults (e.g. for locating a fault), but are not suitable as always-on fault detection mechanisms.

Regards,
Richard