Re: [IPsec] criteria for failure detection

Yoav Nir <ynir@checkpoint.com> Mon, 22 March 2010 20:04 UTC

From: Yoav Nir <ynir@checkpoint.com>
To: "Hu, Jun (Jun)" <jun.hu@alcatel-lucent.com>
Date: Mon, 22 Mar 2010 22:04:41 +0200
Message-ID: <7C523A18-610F-4B3D-8BAB-0511C157F9A3@checkpoint.com>
References: <098C866D3766674B95CCF197E1C2489B0B82631361@USNAVSXCHMBSC1.ndc.alcatel-lucent.com>
In-Reply-To: <098C866D3766674B95CCF197E1C2489B0B82631361@USNAVSXCHMBSC1.ndc.alcatel-lucent.com>
Cc: "ipsec@ietf.org" <ipsec@ietf.org>
Subject: Re: [IPsec] criteria for failure detection

Failure detection by either QCD or SIR takes one to two round trips. Whether that is sub-second or not depends on the round-trip time.

In the case of a non-synchronized hot-standby gateway, those two round trips begin the moment the standby gateway becomes active, so the call probably doesn't get dropped.

In the case of a single gateway, these two round trips start when the gateway is back online:
- With a manual tunnel deletion, this can be immediate.
- With a restart of IPsec services, this can take 10-15 seconds (for a particular implementation).
- With a reboot of the device, this takes 0.5-2 minutes.
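To make the arithmetic concrete: for a single gateway, the time until new calls can be set up is roughly the gateway's downtime plus the one to two round trips that QCD or SIR need. The sketch below uses the downtime figures listed above (0.5-2 minutes expressed as 30-120 seconds). The RTT values and the function names `detection_window` and `is_subsecond` are illustrative assumptions, not part of either proposal.

```python
# Rough estimate of when a peer detects a single-gateway failure:
# time until the gateway is back online, plus the one to two round
# trips that QCD or SIR need. Downtime figures come from the list
# above; the RTT values used below are assumptions for illustration.

OUTAGE_SECONDS = {
    "manual tunnel deletion": (0.0, 0.0),
    "restart of IPsec services": (10.0, 15.0),
    "device reboot": (30.0, 120.0),
}


def detection_window(outage, rtt, round_trips=(1, 2)):
    """Return (best, worst) seconds until the failure is detected."""
    lo, hi = outage
    return lo + round_trips[0] * rtt, hi + round_trips[1] * rtt


def is_subsecond(rtt, round_trips=2):
    """Whether detection alone (ignoring gateway downtime) fits in one second."""
    return round_trips * rtt < 1.0


if __name__ == "__main__":
    rtt = 0.150  # assumed 150 ms round-trip time
    for scenario, outage in OUTAGE_SECONDS.items():
        best, worst = detection_window(outage, rtt)
        print(f"{scenario}: {best:.2f}-{worst:.2f} s")
    print("sub-second detection at 150 ms RTT:", is_subsecond(rtt))
    print("sub-second detection at 600 ms RTT:", is_subsecond(0.6))
```

The point the numbers make is the one in the message: once the gateway's own downtime is in the picture, the two round trips are negligible, and only the hot-standby case keeps the total near the RTT.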

What the work item is about is shortening the time that the actual detection takes, from the several minutes specified in RFC 4306 to something more manageable, like two round trips.

Without an HA implementation, this is not about keeping the current calls, but about being able to establish new calls as soon as possible.

Yoav

On Mar 22, 2010, at 12:13 PM, Hu, Jun (Jun) wrote:

> Hi,
> First of all, I am not sure whether this fits into the existing "supported scenarios" criteria or is a new one. The failure detection time is critical to some services running over an IPsec tunnel: services such as VoIP can only tolerate a sub-second (or 1-2 seconds at most) transport failure; otherwise the call will be dropped. However, it seems to me that the currently proposed solutions all depend on receiving "INVALID_SPI" from the failed node AFTER it reboots, which usually takes much longer than 1-2 seconds. This will result in the interruption of those services.
> 
> Of course, a good HA implementation may solve this issue; however, a fast failure detection mechanism can also help the host switch to a backup tunnel (or another route) as soon as possible, before the service is interrupted.
> 
> ---------------
> Hu Jun
