[IPsec] HA/LS terminology

Rodney Van Meter <rdv@sfc.wide.ad.jp> Tue, 23 March 2010 17:58 UTC

Message-Id: <7EF09073-9D20-4077-A8DD-59B84B1732D0@sfc.wide.ad.jp>
From: Rodney Van Meter <rdv@sfc.wide.ad.jp>
To: ipsec@ietf.org
Date: Wed, 24 Mar 2010 02:46:26 +0900

I am *NOT* an expert on fault tolerance, but I have studied it a
little (long ago, if not so far away), and I worked on Network
Alchemy's fault tolerant implementation of an IPsec gateway (a decade
ago, and a little farther away).  So, some suggestions on the
terminology for the HA&LS draft.

Terminology:

"High Availability" refers to a gateway or cluster whose expected
downtime, measured over e.g. a year, is low.  High availability may be
achieved using fault tolerant techniques such as hardware/software
clustering.  It may also be achieved by e.g. using extremely robust
components and having a very low reboot time.
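
To make "expected downtime on an annual basis" concrete, here is a minimal sketch using the standard steady-state availability formula MTBF / (MTBF + MTTR). The function names and the example figures are assumptions chosen for illustration, not measurements of any real gateway.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year for the given MTBF and MTTR."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical gateway that fails about once a year and needs 4 hours to repair.
slow_repair = annual_downtime_hours(mtbf_hours=8760, mttr_hours=4.0)

# Same hypothetical failure rate, but recovery is a 1-minute reboot.
fast_reboot = annual_downtime_hours(mtbf_hours=8760, mttr_hours=1 / 60)

print(f"4-hour repair:   {slow_repair:.3f} hours/year")
print(f"1-minute reboot: {fast_reboot:.3f} hours/year")
```

Both robust components (larger MTBF) and a very low reboot time (smaller MTTR) reduce the result, which is why either can contribute to high availability without any clustering.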

"Fault Tolerant" refers to a gateway or cluster that will maintain
service availability even when a specified set of fault conditions
occurs, such as the loss of one or more cluster members to a hardware
or software fault.

Clusters whose purpose is improving availability may operate using a
"hot standby" model, in which one or more gateways are active and one
or more gateways are held in reserve and activated when the failure of
an active member is detected.  Clusters whose purpose is improving
scalability (of performance, number of active connections, etc.) use
a "load sharing" model, in which more than one member is active.
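
The difference between the two models can be sketched as a small data structure. This is an illustration only: the `Cluster` class, the member names, and the modulo placement of peers are all hypothetical and do not describe any particular implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    active: set                                  # members currently serving peers
    standby: set = field(default_factory=set)    # members held in reserve

    def fail(self, member: str) -> None:
        """Remove a failed active member; promote a standby if one exists."""
        self.active.discard(member)
        if self.standby:
            promoted = sorted(self.standby)[0]
            self.standby.remove(promoted)
            self.active.add(promoted)

    def owner(self, peer_id: int) -> str:
        """Pick the active member serving a peer (simple modulo placement)."""
        if not self.active:
            raise RuntimeError("no active members left")
        members = sorted(self.active)
        return members[peer_id % len(members)]


# Hot standby: one active member, one held in reserve.
hot = Cluster(active={"gw1"}, standby={"gw2"})
hot.fail("gw1")          # gw2 is activated in place of gw1

# Load sharing: every member is active; survivors absorb the load.
shared = Cluster(active={"gw1", "gw2", "gw3"})
shared.fail("gw2")       # peers are now spread over gw1 and gw3
```

Nothing prevents a cluster from combining the two, with several active members plus one or more in reserve.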

IPsec gateways must be prepared for their peers to lose state, e.g. as
a result of a reboot, resulting in each peer attempting to reconnect.
The latency of that reconnection, and the computational load of
reconnecting a large number of peers, mean that a fast-rebooting
gateway alone is not sufficient to provide high availability service,
driving the need for fault tolerance in an IPsec gateway
implementation.
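
A back-of-the-envelope sketch shows why the reboot time is only part of the outage. The model is deliberately crude (a fixed handshake rate, all peers reconnecting at once), and the peer count, handshake rate, and reboot time are made-up numbers, not figures from any product.

```python
def worst_case_outage_seconds(peers: int,
                              handshakes_per_second: float,
                              reboot_seconds: float) -> float:
    """Time until the last peer is reconnected after a stateless reboot.

    Assumes the gateway works through the reconnecting peers at a
    constant rate once it is back up.
    """
    return reboot_seconds + peers / handshakes_per_second


# Hypothetical: 50,000 peers, 500 key exchanges per second, 30-second reboot.
outage = worst_case_outage_seconds(peers=50_000,
                                   handshakes_per_second=500,
                                   reboot_seconds=30)
print(f"last peer back after about {outage:.0f} seconds")
```

Under these assumptions the reconnection phase dominates the reboot itself, and it grows with the number of peers no matter how fast the reboot becomes.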

If a fault is never visible to peers, the cluster is said to be
"completely transparent".  If some peers must reconnect, or a change
of IP address is visible, the cluster is said to be "partially
transparent".  It is possible to create an implementation with lazy
synchronization or an otherwise incompletely redundant state,
resulting in e.g. a few percent of peers (or a few percent probability
of any given peer) being aware of the fault.
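
The effect of lazy synchronization can be sketched as follows. This assumes, purely for illustration, that each peer's state carries a last-update timestamp and that the surviving member only knows state as of the last completed sync; the function names and timestamps are hypothetical. Note that the sketch classifies a single fault, whereas "completely transparent" as defined above requires that no fault ever be visible.

```python
def peers_needing_reconnect(last_update: dict, last_sync_time: float) -> set:
    """Peers whose state changed after the last sync are unknown to the survivor."""
    return {peer for peer, t in last_update.items() if t > last_sync_time}


def classify_fault(total_peers: int, affected_peers: int) -> str:
    """Label one fault by whether any peer noticed it."""
    if total_peers <= 0:
        raise ValueError("total_peers must be positive")
    if affected_peers == 0:
        return "transparent"
    fraction = affected_peers / total_peers
    return f"partially transparent ({fraction:.0%} of peers affected)"


# Hypothetical timestamps (seconds) of the last state change per peer.
last_update = {"peer-a": 10.0, "peer-b": 42.0, "peer-c": 58.5}

# The failed member last synchronized at t=50, then failed at t=60.
affected = peers_needing_reconnect(last_update, last_sync_time=50.0)
print(affected)
print(classify_fault(len(last_update), len(affected)))
```

Shortening the sync interval shrinks the set of affected peers, at the cost of more synchronization traffic between members.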

		--Rod
