Re: [IPsec] IPsec Digest, Vol 123, Issue 21

Les Leposo <leposo@gmail.com> Tue, 19 August 2014 14:31 UTC

From: Les Leposo <leposo@gmail.com>
In-Reply-To: <21491.10614.714777.145464@fireball.kivinen.iki.fi>
Date: Tue, 19 Aug 2014 17:31:28 +0300
Message-Id: <1252045C-2D84-4764-AFCC-1450A2921A8C@gmail.com>
References: <mailman.4236.1406823571.13632.ipsec@ietf.org> <A0463391-0BB4-408F-874B-A6B91ED6D102@gmail.com> <21490.4420.127387.489490@fireball.kivinen.iki.fi> <9852A873-1EEA-45EC-AEA3-BA654249A662@gmail.com> <21491.10614.714777.145464@fireball.kivinen.iki.fi>
To: Tero Kivinen <kivinen@iki.fi>
Archived-At: http://mailarchive.ietf.org/arch/msg/ipsec/NAyrZnjdrY7--Qpp1uvetX2jZcU
Cc: ipsec@ietf.org
Subject: Re: [IPsec] IPsec Digest, Vol 123, Issue 21
Precedence: list

On Aug 19, 2014, at 1:39 PM, Tero Kivinen <kivinen@iki.fi> wrote:

> Les Leposo writes:
>> have you overlooked the issue of nat mappings?
> 
> Nope.
> 
>> ipsec nat keepalives are very useful for keeping nat mappings alive,
>> and in a world full of all sorts of nat devices (some behaving
>> reliably and others not), one would have to use low keepalive
>> interval... like 10-60s. 
> 
> IPsec NAT-T keepalives are completely different thing than DPD.
> 
> IPsec NAT-T keepalives are packets sent by the device behind the NAT
> as specified in the RFC3948 section 2.3. The responder SHOULD ignore
> the received NAT-keepalive packet, and MUST NOT be used to detect
> whether a connection is live (RFC 3948 section 4). Only device behind
> the NAT sends them and other end does not respond to them, or send its
> own keepalives (unless it is also behind NAT).
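
For reference, the NAT-keepalive in RFC 3948 section 2.3 is nothing more than a UDP packet whose payload is a single 0xFF octet, sent on the same port pair as the UDP-encapsulated ESP. A minimal Python sketch of both ends; the function names are hypothetical and not taken from any implementation:

```python
import socket

# RFC 3948 section 2.3: the keepalive payload is one octet with value 0xFF.
NAT_KEEPALIVE_PAYLOAD = b"\xff"


def is_nat_keepalive(datagram: bytes) -> bool:
    """Receiver side: recognise the keepalive so it can be silently ignored.

    It must not be treated as proof that the peer is alive.
    """
    return datagram == NAT_KEEPALIVE_PAYLOAD


def send_nat_keepalive(sock: socket.socket, peer: tuple) -> int:
    """Sender side (the host behind the NAT). No reply is expected."""
    return sock.sendto(NAT_KEEPALIVE_PAYLOAD, peer)
```

The sender would call send_nat_keepalive() from a timer on the socket it already uses for UDP-encapsulated traffic; the interval is a local tuning choice.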

> 
> The dead peer detection (DPD) or liveness check is a procedure
> specified in the RFC5996 section 2.4 where it says that:
> 
> 	... If there has only been outgoing traffic on all of
>   the SAs associated with an IKE SA, it is essential to confirm
>   liveness of the other endpoint to avoid black holes.  If no
>   cryptographically protected messages have been received on an IKE SA
>   or any of its Child SAs recently, the system needs to perform a
>   liveness check in order to prevent sending messages to a dead peer.
>   (This is sometimes called "dead peer detection" or "DPD", although it
>   is really detecting live peers, not dead ones.)  Receipt of a fresh
>   cryptographically protected message on an IKE SA or any of its Child
>   SAs ensures liveness of the IKE SA and all of its Child SAs.
> 
> This is done by sending empty INFORMATIONAL message to the other end,
> and if there is response to it then other end is up and running. You
> are supposed to do this only when you suspect something is wrong, i.e.
> the traffic changed to be one way (i.e. no packets coming back), or
> you get ICMP or unauthenticated notify payload or similar.
> 
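
The "only when you suspect something is wrong" rule above could be sketched roughly as follows. The SaTraffic record, the 30 s silence window and the hint flag (standing in for an ICMP error or an unauthenticated notify) are all assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class SaTraffic:
    # Time of the last cryptographically protected packet received on the
    # IKE SA or any of its Child SAs.
    last_protected_rx: float
    # Time of the last packet sent on any of those SAs.
    last_tx: float


def needs_liveness_check(t: SaTraffic, now: float,
                         rx_silence: float = 30.0,
                         hint: bool = False) -> bool:
    """Return True when an empty INFORMATIONAL probe looks justified."""
    heard_recently = now - t.last_protected_rx < rx_silence
    sent_recently = now - t.last_tx < rx_silence
    if heard_recently:
        # Fresh protected traffic already proves the peer is alive.
        return False
    # Probe if traffic has gone one-way, or if something external
    # (ICMP, unauthenticated notify) suggests a problem.
    return sent_recently or hint
```

Note that an SA that is idle in both directions does not trigger a probe on its own, which is the part that keeps the chattiness down.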

>> Now, today's client devices need to be energy efficient - so the
>> device sleeps/hibernates to save battery. Sleeping past the nat
>> keepalives is bound to happen (either by design or error). At some
>> point the device will wake from sleep and need to test reachability
>> using dpd.
> 
> Yes, if the device sleeps a long time, it should check whether the IKE
> SA is still up by using DPD. This has nothing to do with the NAT
> keepalives. 

Paul's concerns centred on chattiness and its client-side energy cost.
Hence I offered some suggestions for reducing that energy cost, while probing down the path of DPD and rekeys to get to the root of the high chattiness.

Paul's iOS issue seems pathological, linked to the IKE daemon crashing or being signalled. But unless there is a gold-standard implementation that can maintain tunnels all week on a single charge, we can't just stop at "poor implementation = low battery".

But let me explain why I ventured into keepalives and DPD.
Keepalives help maintain the network path. In their absence, the network path (from the server's point of view) will likely change, particularly in today's world full of crappy NATs and IP oversubscription, made worse by multi-connection web pages and apps.

If a mobile device sleeps a long time, the likelihood that it wakes to a network path change is significant. Aside from pure blackouts, black holes and source IP changes, source port changes will cause issues for some servers/configurations.

Hence DPD is increasingly used to verify both path and peer, both by design (e.g. upon waking just do DPD, or upon waking the daemon kicks off DPD if it sees outgoing ESP traffic and no incoming ESP traffic) and by configuration. As of iOS 4 & 5, I'm sure the iPhone IKE daemon took the former approach, and many admins do the latter.
Same goes for wake, where the daemon has to wait for the interface to return (and verify the underlying network) before trying DPD; iOS 4 & 5 did that.

Keepalives and DPD were merely a small illustration of the larger issue, which is that in today's real-world network conditions, maintaining the tunnel is a challenge.
Protocol chattiness increases as the client daemon adapts (by itself or through the developer) to the lowest common denominator (the crappy NAT, the roadside motel wifi, the oversubscribed cell data network, the city apartment 'high interference' wifi, or the buggy/misconfigured server).
Hence I kept probing down the path of DPD and rekeys, because Paul's concerns centred on chattiness and its energy cost.

Even with a good implementation of the latest IKEv2 spec, the lowest common denominator will demand more energy. And so spec tweaks and creative (supplementary) drafts/standards are needed (aside from additional server-side hardware investment):

1) Session Resumption ... IKEv2 MUST (not SHOULD), as a tool for handling both post-wake and crash recovery.
2) MOBIKE & correct handling of source port changes ... MUST (... the latter should have been a MUST for IKEv1).
3) Additional IKEv2 drafts/standards to create 'very low cal' clients and improve handling of path MTU (... another effect of the lowest common denominator). These 'very low cal' clients will also be chatty, but consume less because more of the IKE operations are offloaded to the high-capacity server.
4) Strategies for dealing with network/sleep/wake events, crash recovery and energy conservation ... perhaps as drafts or design guidelines to improve the standard of all IKEv2 implementations.

> 
>> And in some cases (if the sleep was more than a certain threshold),
>> rather than wait for dpd to failover, the choice is to go for rekey.
> 
> Rekey is not an option, as that would require the IKE SA to still be
> up. I think you mean starting the IKE SA from the beginning. If the
> device has been sleeping for long time, and it suspects the IKE SA is
> gone, it might try shorter period for the DPD, i.e. send few retries
> for the empty INFORMATIONAL message and if no response is received,
> then fall back to start the IKE SA from the beginning.

With your choice of algorithm, the lowest common denominators will always experience delays in connecting, because the DPD retries max out more often than not. Just pointing that out.
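
To put a number on that delay, here is a rough Python sketch of the "few short retries, then start over" approach. The callbacks send_probe and start_new_ike_sa are hypothetical placeholders, and the timeout, attempt count and backoff values are assumptions, not anything the spec mandates:

```python
def worst_case_dpd_delay(first_timeout: float, attempts: int,
                         backoff: float = 2.0) -> float:
    """Seconds spent waiting when every probe times out."""
    return sum(first_timeout * backoff ** i for i in range(attempts))


def recover_after_wake(send_probe, start_new_ike_sa,
                       first_timeout: float = 1.0, attempts: int = 3,
                       backoff: float = 2.0) -> str:
    """Probe the existing IKE SA a few times, then fall back to a new one.

    send_probe(timeout) is assumed to send an empty INFORMATIONAL request
    and return True if a response arrives within `timeout` seconds.
    """
    timeout = first_timeout
    for _ in range(attempts):
        if send_probe(timeout):
            return "ike-sa-alive"
        timeout *= backoff
    # All probes timed out: treat the IKE SA as gone and start from scratch.
    start_new_ike_sa()
    return "ike-sa-restarted"
```

With a 1 s first timeout, three attempts and doubling, a client on a bad path waits 1 + 2 + 4 = 7 s before the new IKE SA exchange even begins, which is the delay I mean.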

> 
> The device of course needs to first wait that the network is up again
> before doing this test, as when you wake up from the sleep, the device
> most likely needs to find the wifi network again, do DHCP, perhaps
> even do hotel login page etc before it can actually send any packets
> to network, and doing DPD during that time, would certainly fail, even
> when the other end would still be there. 

Correct, iOS 4 & 5 used to do this sort of thing. I'm not sure about other smartphones.
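
As a rough illustration of the wake-up ordering discussed here (wait for the network, then either probe or start over after a long sleep), a small Python sketch. The function name, the returned labels and the one-hour threshold are assumptions for illustration only:

```python
def action_on_wake(network_ready: bool, slept_seconds: float,
                   restart_after: float = 3600.0) -> str:
    """Decide what the IKE daemon should do right after the device wakes.

    network_ready is assumed to mean: link is up, an address has been
    obtained, and any captive-portal login has completed.
    """
    if not network_ready:
        # Probing now would fail even if the peer is still there.
        return "wait-for-network"
    if slept_seconds >= restart_after:
        # Slept so long that the IKE SA is probably gone; skip the probes.
        return "new-ike-sa"
    return "dpd"
```

The threshold is where the energy trade-off sits: set it too high and the client burns time on probes that will fail, set it too low and it pays for a full exchange it did not need.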

> 
> It is important to remember what are the fundamental restrictions set
> by the protocol, and which issues are just caused by bad
> implementations. Quite a lot of the problems we are seeing are caused
> by bad implementations... 

Not solely. The network then isn't the same as the network now. Back then the lowest common denominator was a laptop connected through a cell modem or campus wifi network; path MTU wasn't bad and certificate payloads weren't big.

Now it is a power-constrained smartphone connected to a high-loss, oversubscribed network. Some of the issues that weren't important/visible then are now significant (e.g. source address/port changes are very common; large certificate payloads are commonplace and their interaction with a small or changing path MTU is an obstacle; detecting local IP address changes alone isn't enough ... you might get the same address back but your ports/gw/dns changed).
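
To illustrate the last point, a client could compare a snapshot of its network state taken before sleep with one taken after wake, rather than looking at the local address alone. This is only a sketch: PathSnapshot and its fields are hypothetical, and the NAT-mapped port is only knowable if the peer reports it somehow:

```python
from typing import List, NamedTuple, Optional, Tuple


class PathSnapshot(NamedTuple):
    local_addr: str
    gateway: str
    dns: Tuple[str, ...]
    # Source port as seen by the peer after NAT; None when unknown.
    nat_port: Optional[int]


def changed_fields(before: PathSnapshot, after: PathSnapshot) -> List[str]:
    """Names of the fields that differ between the two snapshots."""
    return [f for f in before._fields
            if getattr(before, f) != getattr(after, f)]
```

For example, a device that gets the same DHCP address back on a different network would still report "gateway" and "dns" as changed, which is the case an address-only check misses.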

> -- 
> kivinen@iki.fi
