Exiting loss of connectivity state
richard.spencer@bt.com Tue, 01 March 2005 14:30 UTC
From: richard.spencer@bt.com
Date: Tue, 01 Mar 2005 14:28:12 -0000
To: rtg-bfd@ietf.org
According to the current base draft, "The Detect Mult value is (roughly speaking, due to jitter) the number of packets that have to be missed in a row to declare the session to be down." So, if the TX Interval is 1 sec and the Detect Mult is 3 (in both directions), and a unidirectional failure occurs in the direction A->B, then:

- B will declare the session down (roughly speaking, due to jitter) after non-receipt of BFD packets over a 3 second period.
- A will declare the session down after the time it takes B to detect the failure (about 3 secs in this example) and then send a BFD packet to A with I Hear You 0 and Local Diag 1.

This approach is consistent with other OAM mechanisms, i.e. by configuring the TX Interval to be 1 sec and the Detect Mult to be 3, the loss of connectivity state is entered after non-receipt of packets over a 3 sec period.

However, how long does it take for a BFD session to be declared up again after connectivity has been restored, i.e. what are the exit criteria for the loss of connectivity state? Other OAM mechanisms use exit criteria that are consistent with the entry criteria, i.e. the loss of connectivity state is exited after receiving the expected number of OAM packets over a 3 second period.

In BFD's case, the time taken and the number of packets TX/RX required to transition from the Down/Failing/Init state to the Up state depend on the session states at the time connectivity is restored, and on how quickly an implementation sends BFD packets (i.e. whether or not between-schedule BFD packets are sent, and for which state changes). I have worked through a few scenarios and, depending on the variables, it could take less than a second or several seconds for a session to come back up again (or a session might never come back up under specific conditions, based on discussions on a previous thread, "Re: Between schedule transmissions").

MPLS BFD has been proposed as a connection verification tool for MPLS LSPs.
It seems to me that the current BFD draft allows an operator to determine when connectivity has been lost (i.e. following detection timeout) but does not allow operators to determine with any certainty if/when connectivity has been restored. This information is important in checking that SLAs are being met, i.e. how long connectivity has been lost for. It is also important for protection switching, as the backup path may be expensive to use or may not have sufficient resources for all traffic (i.e. lower priority traffic is dropped), and therefore we need to switch back over to the primary as soon as possible after connectivity has been restored.

It appears to me that operators will need to carry out extensive work analysing different failure scenarios, both to determine the effect of sessions being declared back up at different times and to ensure that the probability of BFD sessions not coming back up at all is as close to zero as possible. Testing will be particularly important where BFD sessions are used across multiple levels of hierarchy (e.g. a BFD session could be declared up at a client layer before a BFD session at the server layer), and also in multivendor environments where implementations may or may not send between-schedule transmissions for different state transitions.

Does anyone have any comments/solutions regarding this problem?

Thanks,
Richard
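As a rough illustration of the entry-side arithmetic in the A->B example above, the following minimal Python sketch computes when each end would declare the session down. It is only a sketch under stated assumptions: the names `BfdTimers` and `down_times` are hypothetical, time is measured from the last packet B received, and jitter and propagation delay are ignored. B's notification to A is bounded between "sent immediately" and "sent at B's next scheduled transmission", since implementations may differ on between-schedule transmissions.

```python
from dataclasses import dataclass


@dataclass
class BfdTimers:
    """Timers for one direction of a session (hypothetical helper type)."""
    tx_interval_s: float  # transmit interval in seconds
    detect_mult: int      # consecutive missed packets before declaring down

    def detection_time_s(self) -> float:
        # Detection time is roughly TX interval * Detect Mult (jitter ignored).
        return self.tx_interval_s * self.detect_mult


def down_times(a_to_b: BfdTimers, b_to_a: BfdTimers):
    """Return (b_down, a_down_earliest, a_down_latest) in seconds.

    Times are measured from the last packet B received before a
    unidirectional A->B failure. Assumption: B's diagnostic packet reaches
    A either immediately or up to one B->A TX interval later.
    """
    b_down = a_to_b.detection_time_s()
    a_down_earliest = b_down                        # between-schedule send
    a_down_latest = b_down + b_to_a.tx_interval_s   # next scheduled send
    return b_down, a_down_earliest, a_down_latest


timers = BfdTimers(tx_interval_s=1.0, detect_mult=3)
b_down, a_earliest, a_latest = down_times(timers, timers)
print(b_down, a_earliest, a_latest)  # prints: 3.0 3.0 4.0
```

The sketch covers entry into the loss of connectivity state only. No comparable closed-form bound is given for the exit side, because that depends on the session states at the moment connectivity is restored and on implementation behaviour.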