Re: [IPsec] I-D Action: draft-ietf-ipsecme-ddos-protection-05.txt

"Valery Smyslov" <svanru@gmail.com> Mon, 28 March 2016 14:06 UTC

Message-ID: <22A743E8E50E402EBD698E354719C11C@buildpc>
From: Valery Smyslov <svanru@gmail.com>
To: Paul Wouters <paul@nohats.ca>
References: <20160321201328.12185.28466.idtracker@ietfa.amsl.com> <alpine.LFD.2.20.1603212023560.16028@bofh.nohats.ca> <3A97CB619AEA41FE9EE901AF4FAF45CA@buildpc> <alpine.LFD.2.20.1603271303400.15492@bofh.nohats.ca>
Date: Mon, 28 Mar 2016 17:05:55 +0300
Archived-At: <http://mailarchive.ietf.org/arch/msg/ipsec/rDeN0s3aUh5DQodaDhGB-2KAM0U>
Cc: ipsec@ietf.org

Hi Paul,

>> Do I get you right that you want to remove the following text?
>>
>>                                                    IPv6 networks are
>>    currently a
>>    rarity, so we can only speculate on what their wide deployment will
>>    be like, but the current thinking is that ISP customers will be
>>    assigned whole subnets, so we don't expect the kind of NAT deployment
>>    that is common in IPv4.
> 
> Yes. Since it is speculative and will quickly go stale in the RFC (we hope :)

OK.

>> It is based on both. You must maintain some statistics information
>> (number of half-open IKE_SA_INIT, number of failed IKE_AUTH)
>> and make a decision whether to use defensive measures
>> by analyzing this statistics.
> 
> I agree on the half-open SA's, I'm not convinced about measuring failed
> IKE_AUTH. Previous failures cost no current resources, so I'm not sure
> if it should be used as a metric. The metric should be based on
> current resources, which seem already reflected in the number of
> half-open SA's you're carrying and the amount of resources those are
> using.

You are right that once an attacker has sent a bogus IKE_AUTH request
and the responder has spent CPU resources computing the D-H shared secret,
only to find that it cannot decrypt the message, nothing can be
done: the responder has already spent those resources.

However, let's look at the wider picture. The attacker's goal is to
exhaust the responder's CPU power (not its memory, as with half-open SAs),
so the attacker would initiate zillions of fake IKE SAs and send a bogus
IKE_AUTH request on each of them. Only then will the attack succeed.
But since the responder's CPU most likely doesn't have zillions of cores,
these SAs will be processed in sequence,
and that will take some time. If during that time the responder
measures the percentage of bogus IKE_AUTH requests,
it can detect that an attack is most likely in progress and
turn on IKE_AUTH puzzles. So a few million of those
zillions of bogus IKE_AUTH requests would pass before
the defense is activated, but the rest will be thwarted.
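
A minimal sketch of such a measurement, assuming a simple sliding time
window. The class name, the window length, the minimum sample count and
the failure threshold are all hypothetical values chosen for illustration;
the draft does not prescribe any of them.

```python
from collections import deque


class AuthFailureMonitor:
    """Tracks recent IKE_AUTH outcomes and flags a likely attack.

    All parameters are illustrative assumptions, not values from the draft.
    """

    def __init__(self, window_seconds=10.0, min_samples=100, threshold=0.5):
        self.window_seconds = window_seconds
        self.min_samples = min_samples
        self.threshold = threshold
        self._events = deque()  # (timestamp, failed) pairs, oldest first
        self._failures = 0      # number of failed entries currently in the window

    def record(self, now, failed):
        """Record one IKE_AUTH request; failed=True if it did not decrypt."""
        self._events.append((now, failed))
        if failed:
            self._failures += 1
        self._expire(now)

    def _expire(self, now):
        """Drop events that are older than the window."""
        while self._events and now - self._events[0][0] > self.window_seconds:
            _, failed = self._events.popleft()
            if failed:
                self._failures -= 1

    def puzzles_needed(self, now):
        """True if the share of bogus requests in the window is suspicious."""
        self._expire(now)
        total = len(self._events)
        if total < self.min_samples:
            return False  # too few requests to draw a conclusion
        return self._failures / total >= self.threshold
```

The responder would call record() for every IKE_AUTH request it processes
and consult puzzles_needed() before accepting new ones. The min_samples
guard keeps a handful of failed requests on an idle server from triggering
the defense.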

>> In other words, you must distinguish attack from just a high load.
> 
> "just a high load" would not have zillions of half-open SA's.

Sure. And would not have zillions of bogus IKE_AUTH requests.
That's why it is not sufficient to measure only load.
If the responder cannot tolerate the load caused solely by legitimate users
(no half-open SAs, no bogus IKE_AUTH requests, just too many initiators),
then you'd better replace it with a more powerful one
rather than punish the users.

>> See above. Once you make a decision that an attack is in progress
>> (e.g. by monitoring the number of failed IKE_AUTH within
>> last N seconds), you'll turn on IKE_AUTH puzzles or take some other measures.
> 
> you seem to think you can save the poor legitimate client in a sea of
> botnet. 

By all means. Otherwise we would fail completely.

I agree that during an attack the legitimate clients would suffer,
however the server should do its best to allow them to work.

> I am not convinced of that. Excluding that, the only thing the
> server needs to do when under attack is ensure it does not die. 

No. It should continue to serve legitimate clients (at least try to do it).

> So a limit of the half-open SA's make sense _for the server itself_. It's not
> doing that to help legitimate clients - those have already lost. They
> are drowned out (with or without a sea of puzzles)

The liveness of the server is not a goal per se.
If it were the goal, then there would be a simple and perfect DDoS defense -
just unplug the network cable. The server would stay alive forever,
but it wouldn't provide the service, so the _Denial_of_Service_
attack would have succeeded. We don't want that kind of defense.

>>>   The cookie mechanism limits the amount of allocated
>>>   state to the size of the bot-net, multiplied by the number of half-
>>>   open SAs allowed per peer address, multiplied by the amount of state
>>>   allocated for each half-open SA.  With typical values this can easily
>>>   reach hundreds of megabytes.
>>>
>>>  It would be clearer to mention explicitly that the cookie mechanism
>>>  prevents spoofed packets from taking up state, thereby limiting [....]
>>
>> Could you please be more explicit what text you are not happy with?
> 
> I don't think it is obvious enough in the text that cookies prevent
> attacks based on source IP spoofing, and that this attack is based on
> a network of compromised machines that talk IKE without needing
> spoofing. I would also just say "attacker" instead of "bot-net" to keep
> it generic.

OK.
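
For reference, the bound in the quoted draft text is a product of three
factors. A rough worked example follows; all three input values are
assumptions picked only for illustration and do not come from the draft.

```python
def half_open_state_bound(attacker_hosts, half_open_per_peer, bytes_per_half_open):
    """Upper bound on the state an attacker can pin once cookies are enforced."""
    return attacker_hosts * half_open_per_peer * bytes_per_half_open


# Hypothetical inputs: 10,000 attacking hosts, 5 half-open SAs allowed
# per peer address, 4 KiB of state per half-open SA.
bound = half_open_state_bound(10_000, 5, 4096)
print(bound)                # 204800000 bytes
print(bound / 2**20)        # 195.3125 MiB
```

With these assumed numbers the responder can still be made to hold roughly
two hundred megabytes of state, which matches the "hundreds of megabytes"
order of magnitude in the draft text.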

>> With the latter approach all the information regarding the SA
>> is stored in the ticket itself. The server stores nothing in this case - it 
>> just decrypts the presented ticket and resumes the IKE SA.
>> In this case the server doesn't know whether the ticket
>> has been used before unless it maintains a cache of recently
>> used tickets.
> 
> Ahh okay. Thanks for the information. That makes sense now. But it does
> open up another attack. Attackers can flood the responder with bogus
> resumption tickets, using up the responder's CPU. But I see that is covered
> itself in RFC 5723 section 9.3, but that is currently not referenced in
> your current document. Possibly add that for completeness sake?

OK.
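
The cache of recently used tickets mentioned in the quoted text could look
roughly like the sketch below. It assumes that tickets are opaque byte
strings and that an entry can be forgotten once the ticket lifetime has
passed; the class and method names are hypothetical and are not taken from
RFC 5723.

```python
import hashlib


class RecentTicketCache:
    """Remembers digests of presented tickets to detect reuse (illustrative)."""

    def __init__(self, lifetime_seconds):
        self.lifetime_seconds = lifetime_seconds
        self._seen = {}  # ticket digest -> time it was first presented

    def first_use(self, ticket, now):
        """Return True the first time a ticket is presented, False on reuse."""
        # Forget entries older than the assumed ticket lifetime.
        expired = [d for d, t in self._seen.items()
                   if now - t > self.lifetime_seconds]
        for digest in expired:
            del self._seen[digest]

        digest = hashlib.sha256(ticket).digest()
        if digest in self._seen:
            return False
        self._seen[digest] = now
        return True
```

Storing a fixed-size digest rather than the ticket itself keeps the
per-entry cost constant, so the server still holds far less state than it
would with the ticket-by-reference approach.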

>>>  advise here is warranted - it has nothing to do with ddos.
>>
>> I think this advise is closely related to DoS protection. You yourself 
>> described the attack two lines above.
> 
> It is an (obvious) attack but not a DDOS attack. eg:
> 
> client    IKE_INIT Request          --->
>                                     <--- IKE_INIT Response  server
> attacker  IKE_AUTH Request (bogus)  --->  [fails]
> client    IKE_AUTH Request          --->
> 
> I think any implementor should really already handle this case in
> general. Any failures of unauthenticated packets must be dropped
> and the timeout timer continued to wait for the legitimate response.
> That's a core part of the IKEv2 spec, so I don't think that needs
> to be repeated in this document.

The text is not exactly about this. 

Once the responder has sent the IKE_SA_INIT response, it is able to
calculate SKEYSEED and the SK_* keys. However, it is a good
idea not to do so immediately, but to wait for the IKE_AUTH request to arrive.
The reason is that if the IKE_AUTH request never comes (attack,
network problem, etc.), the responder will not have spent
quite a lot of CPU resources calculating the D-H shared secret.

However, in this case careless implementations could
discard the just-computed SK_* keys if the IKE_AUTH
request fails to decrypt. This is wrong, because these
keys depend only on the information from IKE_SA_INIT,
and there is no reason to discard them once they are computed.
The text just emphasizes this idea.
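
A minimal sketch of this behaviour. The key derivation and the decryption
below are toy stand-ins (a hash and a 4-byte tag check), not real IKEv2
cryptography, and all names are hypothetical. The only point is that the
expensive step runs lazily, runs once, and its result survives a failed
IKE_AUTH.

```python
import hashlib


class HalfOpenSa:
    """Responder-side state kept after the IKE_SA_INIT response was sent."""

    def __init__(self, init_exchange_data):
        self.init_exchange_data = init_exchange_data
        self._keys = None       # SK_* stand-in, computed on first IKE_AUTH
        self.derivations = 0    # counts how often the expensive step ran

    def keys(self):
        """Derive the keys on first use and cache them afterwards."""
        if self._keys is None:
            self.derivations += 1
            # Stand-in for the D-H computation plus SKEYSEED/SK_* derivation;
            # it depends only on data from the IKE_SA_INIT exchange.
            self._keys = hashlib.sha256(self.init_exchange_data).digest()
        return self._keys

    def handle_ike_auth(self, message, try_decrypt):
        """Return the plaintext, or None if the message must be dropped."""
        plaintext = try_decrypt(self.keys(), message)
        if plaintext is None:
            # Bogus message: drop it, but keep the SA and the cached keys
            # so a later genuine IKE_AUTH costs no second derivation.
            return None
        return plaintext


def toy_decrypt(keys, message):
    """Toy check: accept the message only if it starts with keys[:4]."""
    tag, payload = message[:4], message[4:]
    return payload if tag == keys[:4] else None
```

If a bogus IKE_AUTH arrives first and the genuine one second, the counter
derivations stays at 1, which is the property the text asks for.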

> Just to clarify, I do think you need to rewrite the text to not blindly
> trust the ASN.1 length encoding. Or just remove that advise and stick
> to the other items to discuss right after.

OK.

>> A liveness check is about 50 bytes. Even if it is performed
>> every second, it results in 2 packets/sec and 100 bytes/sec of traffic per
>> client. Is it a lot?
> 
> See other discussions. We sadly have a strong demand by operational
> people to have really short liveness timers. While as implementor, we
> have refused < 1s, people often do use 1s timers as a way to do High
> Availability. So I think the advise of limiting the number of allowed
> responses for an IKE SA in general is dangerous. There are many
> unexpected use cases.

No, there is no advice to limit the number of responses.
The advice is to delay the response when there are too
many requests, in order to limit the rate of requests.
If your implementation relies on an immediate reply and
no packet loss, then don't follow this advice.

However, I think that if implementations cannot tolerate
a delay of 2-3 seconds in answering requests, then they cannot operate reliably.
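
One way to picture the advice is a function that maps the recent request
count of an IKE SA to a response delay. This is only a sketch: the
allowance, the step and the 3-second cap are assumptions for illustration,
not values from the draft.

```python
def response_delay(recent_requests, allowed_per_window, step=0.5, max_delay=3.0):
    """Seconds to hold back a response, given requests seen in the last window.

    Requests within the allowance are answered immediately. Each request
    above it adds `step` seconds, capped at `max_delay`.
    """
    excess = recent_requests - allowed_per_window
    if excess <= 0:
        return 0.0
    return min(max_delay, excess * step)


# A peer sending one liveness check per second stays under an allowance
# of 5 requests per window and is never delayed.
print(response_delay(1, 5))    # 0.0
print(response_delay(7, 5))    # 1.0
print(response_delay(100, 5))  # 3.0
```

Every request is still answered, only later, so a peer that must wait for
a response before sending its next request is slowed down without any
exchange being refused.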

>> Because with NULL auth the peer is not authenticated and we'd rather limit
>> his/her ability to mount a DoS attack
>> by initiating N exchanges in parallel, which would increase
>> our peak load. If the peer is authenticated, then launching
>> N exchanges simultaneously is not an attack in general. And if the
>> authenticated peer mounts such a DoS attack, then
>> he/she could be tracked down and either out-of-band
>> measures are taken or the peer's credentials are revoked.
> 
> So you are saying basically that this text should have appeared in the
> AUTH NULL RFC, but didn't. 

The more general text was included there (Section 3.2), and you were the author :-)

> Perhaps then a separate section for AUTH NULL
> clients could be put in this document, and then also let it update that
> RFC?

I don't think the update is needed. RFC 7619 already references
this document (as a draft) and warns that NULL auth
clients are unauthenticated and thus can mount various attacks.

>> Is it really needed? RFC 7296 doesn't deal with NULL auth,
>> and RFC 7619 does reference this draft in Security Considerations.
>> What others think of it?
> 
> I'm curious too what others think this is: a "recommendation" or a "change".

I think it is a recommendation.

>> It doesn't matter what exchange type is. The intention is to artificially 
>> limit the number of exchanges the malicious peer can initiate per second.
> 
> See earlier discussion about 1s liveness probes.... It is very common.

As I said above, you can ignore this recommendation
if your implementation relies so heavily on a quick response.

>> I don't think an artificial delay is a violation of RFC 7296.
>> Each IKE request will be answered. RFC 7296 doesn't require that
>> it is answered immediately (or as soon as responder can prepare the 
>> response).
> 
> Yes, you are right. Strike this remark :)

OK.

>>>  While the document mentions Fragmentation with respect to puzzles, it
>>>  does not mention ddos attacks based on malicious fragmentation packets.
>>>  It could be that the base RFC is clear enough, but perhaps this document
>>>  should give some advise too?
>>
>> I think RFC 7383 lists possible DoS attacks in Security Considerations 
>> section.
>> Do you think it's not enough?
> 
> It is, but you are inconsistent with what you pull it or reference from
> other RFC's? See above discussion.

OK.

> btw. thanks for this discussion. It has raised some interesting
> implementation details, some of which I now want to implement :)

Thank you :-)

> Paul

Regards,
Valery.
