[IPsec] HA/LS terminology

Rodney Van Meter <rdv@sfc.wide.ad.jp> Tue, 23 March 2010 17:58 UTC

Message-Id: <7EF09073-9D20-4077-A8DD-59B84B1732D0@sfc.wide.ad.jp>
From: Rodney Van Meter <rdv@sfc.wide.ad.jp>
To: ipsec@ietf.org
Date: Wed, 24 Mar 2010 02:46:26 +0900
Subject: [IPsec] HA/LS terminology

I am *NOT* an expert on fault tolerance, but I have studied it a
little (long ago, if not so far away), and I worked on Network
Alchemy's fault tolerant implementation of an IPsec gateway (a decade
ago, and a little farther away).  So, some suggestions on the
terminology for the HA&LS draft.

Terminology:

"High Availability" refers to a gateway or cluster whose expected
downtime is low, measured over e.g. a year.  High availability may be
achieved using fault-tolerant techniques such as hardware/software
clustering.  It may also be achieved by e.g. using extremely robust
components and having a very low reboot time.
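
To make "expected downtime on an annual basis" concrete, here is a
rough sketch using the usual steady-state approximation
availability = MTBF / (MTBF + MTTR).  The function names and the
sample numbers are my own illustration, not anything from the draft.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and
    mean time to repair (both hypothetical inputs, in hours)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# A gateway that fails about once per 999 hours and takes an hour to
# recover is available 99.9% of the time, i.e. down several hours a year.
a = availability(999.0, 1.0)
print(a, annual_downtime_hours(a))
```

Both routes in the definition show up here: robust components raise
MTBF, and a very low reboot time lowers MTTR.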

"Fault Tolerant" refers to a gateway or cluster that will maintain
service availability even when a specified set of fault conditions
occurs, such as the loss of one or more cluster members to a hardware
or software fault.

Clusters whose purpose is improving availability may operate using a
"hot standby" model, in which one or more gateways are active and one
or more gateways are held in reserve and activated when the failure of
an active member is detected.  Clusters whose purpose is improving
scalability (of performance, number of active connections, etc.) use
a "load sharing" model, in which more than one member is active.
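
A toy sketch of the difference between the two models, assuming a
cluster is nothing more than a list of active members and a list of
standby members.  The Cluster class and the gateway names are
hypothetical, and real failure detection is left out entirely.

```python
class Cluster:
    """Toy cluster: the first active_count members start active,
    the rest are held in reserve as standbys."""

    def __init__(self, members, active_count):
        self.active = list(members[:active_count])
        self.standby = list(members[active_count:])

    def fail(self, member):
        """Handle the detected failure of an active member.

        Raises ValueError if the member is not currently active.
        A standby, if any, is activated in its place.
        """
        self.active.remove(member)
        if self.standby:
            self.active.append(self.standby.pop(0))
        return list(self.active)


# Hot standby: one active gateway, one held in reserve.
hot_standby = Cluster(["gw1", "gw2"], active_count=1)
print(hot_standby.fail("gw1"))   # gw2 takes over

# Load sharing: every member is active; survivors carry the load.
load_sharing = Cluster(["gw1", "gw2", "gw3"], active_count=3)
print(load_sharing.fail("gw2"))  # gw1 and gw3 remain active
```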

IPsec gateways must be prepared for their peers to lose state, e.g. as
a result of a reboot, resulting in each peer attempting to reconnect.
The latency of that reconnection, and the computational load of
reconnecting a large number of peers, mean that a fast-rebooting
gateway alone is not sufficient to provide high-availability service,
which drives the need for fault tolerance in an IPsec gateway
implementation.
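
A back-of-the-envelope sketch of why reboot time alone does not tell
the whole story.  It assumes every peer must redo its key exchange
after the reboot and that the gateway completes them at a fixed rate;
the function name and all the numbers are invented for illustration.

```python
def recovery_seconds(reboot_seconds, peers, handshakes_per_second):
    """Time until the last peer is reconnected, assuming all peers
    reconnect after the reboot and the gateway is the bottleneck."""
    return reboot_seconds + peers / handshakes_per_second


# Hypothetical numbers: a 30 second reboot, 10,000 peers, and 200
# completed key exchanges per second.
print(recovery_seconds(30.0, 10000, 200.0))
```

With these made-up figures the reconnection phase takes longer than
the reboot itself, and it grows with the number of peers.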

If a fault is never visible to peers, the cluster is said to be
"completely transparent".  If some peers must reconnect, or a change
of IP address is visible, the cluster is said to be "partially
transparent".  It is possible to create an implementation with lazy
synchronization or an otherwise incompletely redundant state,
resulting in e.g. a few percent of peers (or a few percent probability
of any given peer) being aware of the fault.
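
A small sketch of the lazy synchronization case, assuming each peer's
state carries a version number and the standby holds whatever version
was last copied to it.  The data layout and function names are
hypothetical.  Note that this classifies a single failover event,
whereas "completely transparent" as defined above means a fault is
never visible.

```python
def peers_needing_reconnect(active_versions, synced_versions):
    """Peers whose latest state on the failed member had not yet
    been copied to the surviving member."""
    return {peer for peer, version in active_versions.items()
            if synced_versions.get(peer) != version}


def transparency(affected_peers, ip_address_changed=False):
    """Classify one failover using the terms defined above."""
    if not affected_peers and not ip_address_changed:
        return "completely transparent"
    return "partially transparent"


# Peer "b" was updated after the last sync, and peer "c" was never
# synced at all, so both notice the fault.
active = {"a": 3, "b": 5, "c": 1}
synced = {"a": 3, "b": 4}
affected = peers_needing_reconnect(active, synced)
print(sorted(affected), transparency(affected))
```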

		--Rod