Re: [IPsec] I-D Action:draft-ietf-ipsecme-ipsecha-protocol-00.txt

Pekka Riikonen <priikone@iki.fi> Mon, 06 September 2010 17:46 UTC

Date: Mon, 06 Sep 2010 19:48:02 +0200
From: Pekka Riikonen <priikone@iki.fi>
To: Raj Singh <rsjenwar@gmail.com>
In-Reply-To: <AANLkTim=S_TVvQoy-Oh4WuiwLX22qQjNA9WBST4FZhYQ@mail.gmail.com>
Message-ID: <Pine.NEB.4.64.1009061921590.11774@otaku.Xtrmntr.org>
References: <20100906031510.557ED3A6863@core3.amsl.com> <Pine.NEB.4.64.1009060634140.12204@otaku.Xtrmntr.org> <AANLkTim=S_TVvQoy-Oh4WuiwLX22qQjNA9WBST4FZhYQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Cc: ipsec@ietf.org
Subject: Re: [IPsec] I-D Action:draft-ietf-ipsecme-ipsecha-protocol-00.txt
Precedence: list

On Mon, 6 Sep 2010, Raj Singh wrote:

: > The IKEv2 message id sync is definitely mandatory, but the IPSEC SA seqno
: > sync IMHO isn't.  Although, none of this would be an issue if IKEv2 would
: > allow initiator to move the window forward freely (that would be real
: > "fix").
: >
: > The IPsec SA replay counter sync is required for these reasons:
: 1.  In a cluster environment, the IPsec SA replay counter gets updated
:     from active to standby periodically.  Suppose we sync the IPsec
:     replay counter every 1,000 IPsec packets and the last sync happened
:     at 30,000, so the next sync will happen at 31,000.  Now, say failover
:     happens at 30,500.  The standby member becomes active and starts
:     using the IPsec replay counter from 30,000.  This will be considered
:     a replay attack and the SA has to be destroyed.
: 
I don't think the picture is this bleak.  What needs to be done is to 
increase the sequence number when the failover happens by a large enough 
value to avoid the replay errors.  This can be done by calculating the 
expected pps that can be sent by the active node during the sequence 
number sync period in the cluster.  If you sync once a second, then 
calculate the maximum pps you think (or know) is possible for that period, 
and increase the sequence number by that amount.  In your example it would 
go as follows:

 active seq: 30500
 standby seq: 30000
 --failover--
 standby seq: 30000 + 1000

The window at the remote peer will move to 31000; no replay errors.

In this example we assume that a maximum of 1000 packets can be sent 
during the sync period.

We've had this working for years without noticeable issues.  We sync 
more often than once a second because we support pretty high throughput 
(several Gbit/s), but we also increase the sequence number by quite a 
large amount to support that throughput during failovers as well.
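
A minimal sketch of the arithmetic above, in Python.  The names
(failover_seq, max_pps, sync_interval_s, margin) are made up for
illustration and do not come from the draft or from any implementation.
The sketch also ignores sequence number wrap and ESN.

```python
import math

def failover_seq(last_synced_seq, max_pps, sync_interval_s, margin=1.0):
    """Outbound sequence number the standby starts from after failover.

    Hypothetical helper: jumps past every number the dead active node
    could have used since the last sync.
    """
    # Upper bound on packets sent during one sync period, optionally
    # padded by a safety margin (margin=1.0 means no padding).
    jump = math.ceil(max_pps * sync_interval_s * margin)
    return last_synced_seq + jump

# The example from this mail: last sync at 30000, at most 1000 packets
# per sync period.
print(failover_seq(30000, max_pps=1000, sync_interval_s=1.0))  # 31000
```

A margin above 1.0 covers an underestimated pps at the cost of using up
sequence number space a little faster.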

:     This applies to both outbound and inbound replay counters.
:
The issue is only with outbound sequence numbers, because as we know, the 
window can always move forward with incoming packets.  So even if the 
incoming window is lagging behind, it will move forward when the remote 
sends new packets to us.
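
To illustrate why only the outbound side matters, here is a simplified
sliding-window replay check in Python.  It is a sketch under assumptions:
the class and method names are hypothetical, the window size of 64 is
just a common choice, a window restored from synced state is
conservatively treated as "everything in the window already seen", and
the integrity check that a real ESP/AH receiver does before moving the
window is left out.

```python
class ReplayWindow:
    """Simplified anti-replay window (hypothetical, for illustration)."""

    def __init__(self, size=64, top=0):
        self.size = size
        self.top = top                    # highest sequence number accepted
        self.mask = (1 << size) - 1
        self.bitmap = self.mask           # bit i set => (top - i) seen

    def accept(self, seq):
        if seq > self.top:
            # Newer than anything seen: slide the window forward.
            shift = seq - self.top
            self.bitmap = ((self.bitmap << shift) | 1) & self.mask
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False                  # too old, left of the window
        if (self.bitmap >> offset) & 1:
            return False                  # already seen: replay
        self.bitmap |= 1 << offset
        return True

# Inbound on the new active node: window lags at 30000, peer is at 30500.
lagging = ReplayWindow(top=30000)
print(lagging.accept(30500))   # True, the window just moves forward

# Outbound: the peer's window is at 30500.
peer = ReplayWindow(top=30500)
print(peer.accept(30001))      # False, stale number is dropped
print(peer.accept(31001))      # True, number after the jump is fine
```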

The draft itself says that the sequence numbers are increased in failover:

   o  Active member dies and Stand-by member takes over.  Stand-by
      Member increments its values of Outbound SA Counters for each
      IPsec SA and sends them to the peer.

It just doesn't define how much to increment.  If it increments them by a 
large enough number there is no need to send them to the peer.  The window 
at the peer will move automatically to the new values when it receives 
ESP/AH packets with the new sequence numbers.

: - Simultaneous failover at both ends.  If failover happens at the same
: > time in both ends, implementations must be able to handle situation where
: > they receive SYNC_SA_COUNTER_INFO request before they receive response to
: > their own request (they may receive the response only after
: > retransmission).
: >
: I don't think we have any solution for this, or that any solution is
: possible (?).  This basically means that both sides have lost the SA,
: so we have to establish the SA from scratch.  This draft assumes that
: both sides have the SA and their message windows have mismatched.
:
If both sides are clusters then naturally both ends have HA, and can 
handle failover gracefully.  Both ends retain the old SAs in the standby 
node, and just need to update the message id window.  The concern I had 
here is that an implementation must be able to handle a request from the 
remote end while its own request is still outstanding.

	Pekka
