Re: [IPsec] FYI: A Novel Denial-of-Service Attack Against IKEv2 - HAL-Inria

Tristan Ninet <tristan.ninet@inria.fr> Sat, 21 September 2019 17:22 UTC

Date: Sat, 21 Sep 2019 19:22:19 +0200
From: Tristan Ninet <tristan.ninet@inria.fr>
To: Paul Wouters <paul@nohats.ca>
Cc: "ipsec@ietf.org WG" <ipsec@ietf.org>, Olivier Zendra <olivier.zendra@inria.fr>, romaric maillard <romaric.maillard@thalesgroup.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipsec/RP4OB8LMhmouTmnvsZjd6Y6ruxo>

Dear Mr. Wouters, 

Thank you for your interest in our work. 

> I've read through the paper, and I believe is very much misrepresents what it 
> deems is a DoS attack against the IKEv2 protocol. 
> 
> The DoS attack described seems to think it can change the IP address and cause 
> Initiator to be authenticated by a different peer than intended (ignoring all of 
> IDi / IDr payloads it is relaying). Then the different peer is happy, but the 
> last IKE_AUTH reply to the initiator would signify a failure. Then when the 
> initiator sends an Informational message with a Delete payload and 
> AUTHENTICATION_FAILED notify, the attacker drops the message. Now the different 
> peer has "lost resources" since its IKE SA (and possibly IPsec SA) is up. A 
> proper implementation would send a Liveness probe if its IPsec SA counters 
> remain zero. It would also put an idle limit on an childless SA that resulted 
> from a TS_UNAVAILABLE (as opposed to a by design childless IKE SA) 

Let us denote Initiator by A, Responder by B, Victim by C, IKE_SA_INIT request 
by m1, IKE_SA_INIT response by m2, IKE_AUTH request by m3, and IKE_AUTH response 
by m4. 

I understand you are saying that authentication of A to C would fail because of 
IDi and IDr payloads. 

However, we put as a requirement to the attack that C trusts A, i.e. C has some 
configuration entry with the ID of A. In this case, authentication will succeed. 

Regarding the IDr payload, this payload is optional in m3. In fact libreswan 
does not send this payload in m3 by default. 

Furthermore, even if IDr is sent in m3, as explained in Section V-C-a)
("Vulnerability of strongSwan") in our paper, the RFC does not require the
responder to abort the IKE SA setup if the IDr of m3 does not match C's
identity. The RFC says: "If
the IDr proposed by the initiator is not acceptable to the responder, the 
responder might use some other IDr to finish the exchange". This actually 
suggests that completing the IKE SA setup even if IDr is wrong is OK. So 
depending on the implementation, even if IDr is sent in m3, an IKE SA could be 
set up, and the attack could succeed. 

Therefore, with our requirements, authentication of A to C succeeds, and at
least one IKE SA per deviation is set up in C.

You then say that a proper implementation would send a Liveness probe if its 
IPsec SA sequence numbers remain zero. 

The RFC does say that Liveness checks are needed. In this regard, strongswan and 
libreswan do not follow the RFC since in both implementations, Dead Peer 
Detection (DPD) is disabled by default. 

However, DPD does not deter the attack. A classic flooding DoS attack can only 
set up half-open SAs. An IKEv2 implementation should remove half-open SAs after 
some short time. In strongswan by default half-open SAs are removed after 30s. 
Therefore a high rate of m1 messages is needed to achieve memory exhaustion
using classic flooding against IKEv2.

However, the Deviation Attack sets up full IKE SAs (if not Child SAs as well) in
C. We measured that a full connection (one IKE SA + one Child SA) takes 23kB in
strongswan, whereas a half-open SA takes 1kB. This divides by 23 the minimum
rate of m1 messages that must be deviated in order to exhaust the memory of C.

In strongswan, when DPD is enabled, by default, connections with a dead peer 
(such as in the Deviation Attack) are removed after dpdtimeout + total 
retransmission timeout = 30s + 165s = 195s. It does not make sense to go much 
lower, and some implementations might want to set this timeout higher so that 
bandwidth is not overwhelmed. Because these unwanted connections stay longer in
C's memory, the minimum rate of m1 messages that must be deviated in order to
exhaust the memory of C is divided again, by about 6 (195s versus 30s).

In total the Deviation Attack thus divides the required throughput by about 140
(23 x 6). As a consequence, the DA is much harder to detect using intrusion
detection systems than classic DoS attacks, in particular when the DPD timeout
is high.

In strongswan, when DPD is disabled, connections with a dead peer are removed at
the time of rekeying, i.e. after 3h by default. A DoS with a throughput about
8000 times lower than classic DoS techniques is then possible.
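
The factors above can be reproduced with a short calculation. This is only a
sketch, assuming a simple steady-state model in which the memory held in C is
proportional to (arrival rate) x (SA lifetime) x (SA size); the constant names
are illustrative and the input values are the ones quoted in this message.

```python
# Figures quoted in this message (strongswan defaults and our measurements).
HALF_OPEN_KB = 1              # size of a half-open SA
FULL_SA_KB = 23               # size of one IKE SA + one Child SA
HALF_OPEN_LIFETIME_S = 30     # half-open SAs are removed after 30s
DPD_LIFETIME_S = 30 + 165     # dpdtimeout + total retransmission timeout
REKEY_LIFETIME_S = 3 * 3600   # removal at rekeying when DPD is disabled


def rate_reduction(sa_kb: float, lifetime_s: float) -> float:
    """Factor by which the minimum m1 rate drops compared to classic flooding.

    Assumed model: memory held = rate * lifetime * size, so for a fixed amount
    of memory to exhaust, the required rate scales as 1 / (lifetime * size).
    """
    size_ratio = sa_kb / HALF_OPEN_KB
    lifetime_ratio = lifetime_s / HALF_OPEN_LIFETIME_S
    return size_ratio * lifetime_ratio


size_only = rate_reduction(FULL_SA_KB, HALF_OPEN_LIFETIME_S)
with_dpd = rate_reduction(FULL_SA_KB, DPD_LIFETIME_S)
without_dpd = rate_reduction(FULL_SA_KB, REKEY_LIFETIME_S)

print(f"size only:   {size_only:.1f}")
print(f"with DPD:    {with_dpd:.1f}")
print(f"without DPD: {without_dpd:.1f}")
```

Without rounding, the lifetime ratio with DPD is 6.5 and the combined factor is
about 150; rounding the lifetime ratio down to 6 gives the figure of roughly 140
used above. Without DPD the same model gives about 8300, i.e. the order of
magnitude of the 8000 quoted above.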

On the other hand, the requirements for the attack are quite strong. Firstly, 
the attacker needs to have some control over the connection between A and B. 
Secondly, all Initiator parties must authenticate themselves using signature
mode and must be trusted by Victim. Thirdly, the attacker needs to find enough
m1 messages to deviate. In addition, each m1 message must come from a different
IKEv2 peer. Otherwise new connections will simply replace existing connections
with the same peer. In fact I did not see this behavior in the RFC, but
strongswan behaves this way by default.

The contribution of the paper is to point out that when the above requirements 
are satisfied, then an attacker may perform a DoS attack with a significantly 
lower throughput than expected from classic flooding techniques. 

Note in addition that the vulnerability described in our paper is very much not 
specific to IKEv2. The vulnerability may appear in any stateful protocol that 
performs authentication and does not satisfy the weak agreement property. 

I admit however that we do not sufficiently detail the difference between the DA 
and classic DoS attacks in the paper. Also the version of the paper you read is 
quite old. It is a version that we submitted to a conference some time ago. We 
are currently publishing a more recent version at another conference. 

You also say that a proper implementation would put an idle limit on a
childless SA that resulted from a TS_UNAVAILABLE (as opposed to a by design
childless IKE SA). I think you mean TS_UNACCEPTABLE. I cannot find this behavior
anywhere in the RFC. But even if it is true, the point I made above still holds.

> 
> It makes more wrong assumptions like "(TS negotiation will fail in most cases)" 
> which I guess they think would fail because the different peer's have different 
> IPsec SA configurations, but really if they are that different, they will also 
> have different IDi/IDr payloads because a peer's configuration with many other 
> peers for specific subnets would be configured with local/remote IDs as to not 
> tie these to hardcoded IP addresses. Without explicit ID, the ID used is 
> normally the ID_IPvx, and if that is used, using an IP address X with ID_IPv4 Y 
> will also cause an IKE failure of the victim peer because for IP address X it 
> would then expect ID_IPv4 X. 

You say that our assumption that TS negotiation will fail in most cases is 
wrong. Even if this assumption is wrong it only makes the attack stronger, as an 
even heavier connection is installed in C. 

You say that authentication using ID_IPvx will fail because A would be using 
different IP addresses in its IDi and in the source address of the m3 packet. 
However, in the attack the deviation only changes the destination address, not 
the source address. Thus no such failure would occur. 

> 
> It then "proves" this by using the strongswan/libreswan option uniqueids=no, 
> which is an non-standard override used to allow multiple IDs to establish more 
> than one connection. Obviously, such connections MUST use liveness/dpd to kick 
> out idle connections, because you cannot detect a reconnect from behind NAT from 
> a different user behind the same NAT, and you would accumulate a lot of 
> connections from restarting clients that you wouldn't be able to cleanup 
> otherwise. 

In our demonstration we decided, for practical reasons, not to create the N 
Initiator machines of the generic scenario, but instead to only create 
`N_{demo}` Initiator VMs, with each of them sending `N/N_{demo}` m1 messages to 
Responder machines. We are using fully virtualized machines (Virtualbox) in our 
demo, so it would not have been possible to create that many VMs. 

However, by default in strongSwan, a party requires an IKE ID to be unique among 
the IKE SAs it manages. That is, when that party receives an m3 message with an 
IDi payload that is equal to the peer ID of an existing IKE SA in its SAD, it 
will delete the old IKE SA and replace it with the new one. Because of this in 
our experiment, the Deviation Attack would add only `N_{demo}` unintended 
connections in Victim. To stay faithful to the generic scenario, we tell Victim 
not to delete the old IKE SA in this situation, and instead, to set up the new 
one alongside. This is done using the "uniqueids=never" option. 

However, as I said above, the attack does assume that each m1 message comes 
from a different IKEv2 peer. It is obvious that if the attack works with `N_{demo}`
Initiators and the "uniqueids=never" option set in Victim, then it works in the 
same setup but with N Initiators and without the "uniqueids=never" option in 
Victim. 
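
The equivalence claimed above can be illustrated with a toy model. This is a
minimal sketch and not strongSwan code: `SaTable`, `on_ike_auth` and `run` are
hypothetical names, and the table only models the "replace an existing IKE SA
with the same peer ID" rule described above. The values N = 300 and
`N_{demo}` = 3 are example values.

```python
from collections import defaultdict


class SaTable:
    """Toy model of Victim's IKE SA table, keyed by the peer's IDi."""

    def __init__(self, keep_duplicates: bool):
        # keep_duplicates=True models "uniqueids=never";
        # keep_duplicates=False models the default replace-on-same-ID behavior.
        self.keep_duplicates = keep_duplicates
        self._sas = defaultdict(list)
        self._next_sa = 0

    def on_ike_auth(self, peer_id: str) -> None:
        """Install an IKE SA for an authenticated m3 carrying this IDi."""
        if not self.keep_duplicates:
            self._sas[peer_id].clear()  # old SA with the same ID is deleted
        self._sas[peer_id].append(self._next_sa)
        self._next_sa += 1

    def count(self) -> int:
        return sum(len(sas) for sas in self._sas.values())


def run(initiators: int, messages_each: int, keep_duplicates: bool) -> int:
    """Deviate messages_each exchanges from each Initiator to Victim."""
    table = SaTable(keep_duplicates)
    for i in range(initiators):
        for _ in range(messages_each):
            table.on_ike_auth(f"initiator-{i}")
    return table.count()


N, N_DEMO = 300, 3  # example values only
demo_default = run(N_DEMO, N // N_DEMO, keep_duplicates=False)
demo_never = run(N_DEMO, N // N_DEMO, keep_duplicates=True)
generic = run(N, 1, keep_duplicates=False)
print(demo_default, demo_never, generic)
```

In this model the demo setup with duplicates kept and the generic scenario with
N distinct Initiators leave the same number of SAs in Victim, whereas the demo
setup with the default behavior leaves only `N_{demo}` of them.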

The above explanation is in appendix E of the version of the paper you read. It 
is also in the README of the demo. 

> 
> Assuming all peers are on dynamic IPs, so no ACL's kick in between the peers 
> (which would really only happen on a mesh cloud encryption, where an attacker 
> would be extremely unlikely able to selectively NAT or DROP packets), the DoS 
> attack would still fail. If peer A tries to talk to peer B and ends up talking 
> to peer C, then the attack would fail if peer A sends the IDr payload. If it 
> does not send an IDr payload, then it likely expects the ID_IPv4 later which 
> wouldn't match when it got redirected to a different peer, using a different 
> ID_IPv4. But even we assume all of this is true, and the attacker can block the 
> packets from peer A's delete request to peer C, then yes peer C uses one 
> connection resource. If peer A attempts a new connection because its connection 
> failed, the attacker can try this again, but peer C will replace the existing 
> IKE/IPsec SA in the normal case. So doing the attack twice from the same peer 
> (who is still trying to connect) will still only lead to one extra unused 
> connection of peer C. So you would need many peers to get any real amount of 
> memory wasted on peer C. But if peer C is expecting to be part of a mesh with 
> _many_ peers, it will have the resources to setup connections with a large 
> number of peers. Deflecting a few peers through other peers isn't going to 
> present a different scale level. 

You say that authentication will fail if A sends IDr. We already discussed this 
point above. Note that sending an IDr payload in the IKE_AUTH request might
indeed be a potential countermeasure to the deviation attack, assuming that the
implementation actually rejects incoming IKE_AUTH messages with improper IDr 
when acting as a responder (behavior that one might expect but once again is not 
mandated by the RFC). 
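
The two responder behaviors discussed here can be contrasted in a small sketch.
It assumes a hypothetical function `responder_accepts_idr` that models only the
IDr check (authentication of the initiator is handled elsewhere); the `strict`
flag is an illustrative policy switch, not an option of any real
implementation.

```python
from typing import Optional


def responder_accepts_idr(requested_idr: Optional[str], own_id: str,
                          strict: bool) -> bool:
    """Decide whether the IDr of an incoming IKE_AUTH request is acceptable.

    requested_idr is None when the initiator omitted the optional IDr payload.
    """
    if requested_idr is None:
        return True            # nothing to check
    if requested_idr == own_id:
        return True            # the initiator really intended this responder
    # Mismatch: a strict responder aborts; a lenient one continues with its
    # own IDr, which the RFC text quoted above allows.
    return not strict


# A intended to reach B, but m3 was deviated to C.
strict_with_idr = responder_accepts_idr("B", own_id="C", strict=True)
lenient_with_idr = responder_accepts_idr("B", own_id="C", strict=False)
strict_without_idr = responder_accepts_idr(None, own_id="C", strict=True)
print(strict_with_idr, lenient_with_idr, strict_without_idr)
```

Only the combination "IDr sent by A" plus "strict check in C" stops the
deviation in this model; if either is missing, the SA is installed in C.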

You say that A likely expects the ID_IPv4 later which wouldn't match when it got 
redirected to a different peer, using a different ID_IPv4. Well of course, and 
that will make the authentication of C to A fail, but the harm is already done 
since an SA was set up in C. 

You say that the attack has strong requirements. We already discussed this point 
above. It is true that the deviation attack is only successful in very specific 
conditions but I think that our paper is rather explicit on this point. 

> 
> In the case where repeated attempts would not replace the existing connection 
> (which they emulate using the strongswan/libreswan uniqueids=no option on peer 
> C), peer C would indeed end up using another one connection of memory. Peer A 
> would do some kind of back-off on the failures, so maybe you get a one 
> connection per minute rate. You would still need a lot of peers to DoS peer C, 
> which again means that peer C is already expected to talk to a lot of peers. At 
> best you could double the load by sending each peer configured through another 
> peer. 

As said above, we do not consider the case where repeated attempts would not 
replace the existing connection. 

> 
> And all this assumes peer C does not remove idle IKE/IPsec SA's, and does not 
> use liveness/DPD. Which in a mesh peer to peer enterprise network encryption you 
> would use. In a remote access VPN scenario, there is no "redirect to a different 
> peer" as all peers only connect to the one security gateway (or a set of 
> gateways with identical credentials) so 'redirecting' a peer is not possible. 

As said above, the Deviation Attack works even if C uses DPD, although less
efficiently.

> 
> It goes on to say the attack is not possible with PSKs, which I don't 
> understand. They also then mumbled about asymmetric authentication, which I 
> don't understand, but regardless is basically only employed with EAPTLS and 
> Remote Access VPNs, so it does not apply to this attack. 

When we said that the attack is not possible with PSKs, we assumed that the PSK
between A and B is different from the PSK between A and C. In this case,
authentication of A to C will fail and the attack does not work because no SA is
installed in C.
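
Why distinct PSKs stop the deviation can be sketched as follows. The AUTH
formula is the one the RFC gives for shared-key authentication, prf(prf(Shared
Secret, "Key Pad for IKEv2"), <InitiatorSignedOctets>). Everything else is an
assumption made for illustration: HMAC-SHA256 stands in for the negotiated prf,
and the secrets and signed octets are placeholder byte strings.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in for the negotiated prf (assumed here: HMAC-SHA256)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def psk_auth(shared_secret: bytes, signed_octets: bytes) -> bytes:
    """AUTH payload value for shared-key authentication."""
    return prf(prf(shared_secret, b"Key Pad for IKEv2"), signed_octets)


# Placeholder values, not real IKEv2 data.
psk_ab = b"secret shared by A and B"
psk_ac = b"secret shared by A and C"
signed_octets = b"placeholder for InitiatorSignedOctets"

# A believes it is talking to B, so it computes AUTH with the A-B secret.
auth_from_a = psk_auth(psk_ab, signed_octets)

# C verifies with the A-C secret, B would verify with the A-B secret.
c_accepts = hmac.compare_digest(auth_from_a, psk_auth(psk_ac, signed_octets))
b_accepts = hmac.compare_digest(auth_from_a, psk_auth(psk_ab, signed_octets))
print(c_accepts, b_accepts)
```

With signatures, by contrast, A's AUTH payload does not depend on a secret that
is specific to the intended responder, which is why the attack requires
signature mode on the Initiator side.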

We were not talking about asymmetric encryption. Maybe it was bad phrasing. We
were just considering the somewhat "asymmetric" (i.e. not the same on both
sides) situation where A authenticates itself using a signature and C
authenticates itself using a PSK (in fact we switched A and C in the paper, this
is a typo). What we said is that in this situation the attack still works, which
is true.

As a side note, the above situation is mentioned in the RFC: "There is no 
requirement that the initiator and responder sign with the same cryptographic 
algorithms. [...] In particular, the initiator may be using a shared key while 
the responder may have a public signature key and certificate". I am not sure 
whether this is a common situation. 

> 
> When using PSK with a Group-ID, this "attack" is in fact the actual normal 
> deployment that implementations already handle. A groupID plus PSK, at least on 
> libreswan, implies uniqueids=no for free because otherwise each client would 
> kick of the next client. Now imagine a client on flaky wifi that keeps dropping 
> and reconnecting. It would actually set up duplicate connections faster than 
> this entire peer redirection attack. Those configurations better better use 
> liveness/dpd already to ensure an often reconnecting client is not generating 
> hundreds of new IKE/IPsec SA's without cleaning up the old ones. (initial 
> contact cannot be used when using groupID with PSK as all different clients 
> identify the same). 

As I said, we only considered the case where the PSK between A and B is
different from the PSK between A and C.

> 
> They link to proof of concept code at gitlab.com/deviation/demo but I cannot 
> access this - it is a private repository. But from the description in the paper, 
> they only show 3 VMs and one of these connection tricks - and don't show any of 
> this at scale using containers to see how the target would respond when this 
> happens at a scale of which this becomes an actual attack. 

As I said above, the version of the paper you read is quite old. The gitlab repo
that this version refers to was an anonymized repo. Since we later no longer
needed anonymisation, we switched this repo to private and moved its content to
a new public repo at gitlab.inria.fr/tninet/demo. Feel free to check
it out.

We refer to the above repo in the version of the paper that will soon be 
published. I agree however that because the version you read is public, we 
should have kept gitlab.com/deviation/demo public. To this end, we recently set 
the latter back to public and mirrored it to gitlab.inria.fr/tninet/demo. 

We actually use `N_{demo}` + 3 VMs (`N_{demo}` Initiators + Victim + Intruder +
Probe), which by default in our demo amounts to 6 VMs since we use
`N_{demo}` = 3 by default. I already explained above why we did not use all N
Initiators and why it does not affect our proof.

> 
> In conclusion, I wish the authors of the paper would have contacted the IPsec 
> community at IETF or the opensource vendors of strongswan or libreswan before 
> publication. It also shows the limitations of formal proofs in the absence of 
> understanding operational deployments and implementation details. And that one 
> should never describe one's own inventions as "novel". Leave that praise to 
> others. 

We should indeed have contacted the IPsec community at IETF and the vendors of 
strongswan and libreswan. It would have been interesting to discuss the attack 
with you prior to publication. We hope that the answers and explanations we 
provided above remove (at least some of) your concerns with this paper. 

Best Regards, 
Tristan Ninet