Re: [IPsec] Review of draft-ietf-ipsecme-ddos-protection-06

"Valery Smyslov" <svanru@gmail.com> Fri, 03 June 2016 13:24 UTC

Message-ID: <E61D75BBDD0F4A159352B3258BBAA7DE@buildpc>
From: Valery Smyslov <svanru@gmail.com>
To: Paul Wouters <paul@nohats.ca>
References: <alpine.LRH.2.20.1605311635540.16809@bofh.nohats.ca> <4200F5373D5542C985F3D4C51609213C@buildpc> <alpine.LRH.2.20.1606022148040.23132@bofh.nohats.ca>
Date: Fri, 03 Jun 2016 16:23:48 +0300
MIME-Version: 1.0
Content-Type: text/plain; format="flowed"; charset="iso-8859-1"; reply-type="response"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/ipsec/7CI-uJ7r_NIcLU4xePAMJX8KYRs>
Cc: ipsec@ietf.org, Yoav Nir <ynir.ietf@gmail.com>
Subject: Re: [IPsec] Review of draft-ietf-ipsecme-ddos-protection-06
Precedence: list

Hi Paul,

>>>     An obvious defense, which is described in Section 4.2, is limiting
>>>     the number of half-open SAs opened by a single peer.  However, since
>>>     all that is required is a single packet, an attacker can use multiple
>>>     spoofed source IP addresses.
>>>
>>>  I am not sure why this is mentioned here in this way, because the attack
>>>  of spoofed source IP is already handled effectively with DOS cookies. I
>>>  think it is better to state "bot-nets are large enough that they have
>>>  enough unique IP addresses" and avoid talking about spoofing in this
>>>  section altogether.
>>
>> Here are some general observations of IKEv2 vulnerabilities,
>> regardless of the existing and proposed defense mechanisms, which are 
>> described in subsequent sections.
> 
> But it is incomplete and out of place. The section is about The
> Vulnerability. It talks about vulnerabilities, then this one solution to
> one thing, then goes into detail about the work that makes it
> vulnerable. That is why I suggest to just remove the paragraph.

Ok, I see your point.

>>>     Stage #3 includes public key operations, typically more than one.
>>>
>>>  It seems this sentence needs to say something that these operations are
>>>  very expensive, similar to describing the "effort" in the previous
>>>  sentences of stage #1 and stage #2.
>>
>> OK. How about:
>>
>>    Stage #3 may include public key operations if certificates are involved.
>>    These operations are often more computationally expensive than those
>>    performed at stage #2.
> 
> Looks good.
> 
>>>     It seems that the first thing cannot be dealt with at the IKE level.
>>>     It's probably better left to Intrusion Prevention System (IPS)
>>>     technology.
>>>
>>>  I would rewrite this more authoritatively, and not use the word "seems"
>>
>> OK. How about:
>>
>>    If an attacker is so powerful that it is able to overwhelm
>>    the Responder's CPU that deals with generating cookies,
>>    then the attack cannot be dealt with at the IKE level and
>>    must be handled by means of the Intrusion Prevention System (IPS)
>>    technology.
> 
> Looks good.
> 
>>>     Depending on the Responder implementation, this can be repeated with
>>>     the same half-open SA.
>>>
>>>  I don't think this "depends on the implementation". Since any on-path
>>>  attacker can spoof rubbish, a Responder MUST ignore the failed packet
>>>  and remain ready to accept the real one for a certain amount of time.
>>
>> "Depending on the Responder implementation" means here that if along with 
>> discarding the failed packet the Responder also discards the computed SK_* 
>> keys, then it will need to re-calculate them again
>> when the next IKE_AUTH packet is received, so the attack can be
>> repeated. The SK_* keys don't depend on IKE_AUTH messages,
>> so in general there is no need to discard them even if the received
>> IKE_AUTH packet failed to decrypt properly, and the draft advises to keep 
>> them in this case. However, implementations may have good reasons to do this 
>> (e.g. to free hardware resources if crypto is performed in HW).
> 
> Oh, I didn't realise you talked about re-using DH components. Ok, in that
> case it makes sense but you might want to say it only applies to those
> who re-use DH calculations between different IKE peers. Our software
> never does that (and I think FIPS also puts additional constraints on
> this)

No, it is not about re-using the DH private key with different peers.
I probably explained it poorly. Let me try again.

Once the IKE_SA_INIT is complete the responder has all the data needed
to calculate SKEYSEED and the SK_* keys. However, these are CPU-consuming
operations, so the responder may want to postpone them until the keys are
really needed, i.e. until it receives the IKE_AUTH request from the initiator.
This behaviour lets the responder avoid wasting resources in case
IKE_SA_INIT was from an attacker and the IKE_AUTH request never comes.

Once the IKE_AUTH request arrives the responder performs DH and calculates
SKEYSEED and the SK_* keys, which allow it to decrypt and verify this request.
If it fails to decrypt the IKE_AUTH request, the responder has two options:
keep the just-calculated SK_* keys until the next (hopefully proper) IKE_AUTH
request is received, or discard them (e.g. to save crypto resources) and
recalculate them once the next IKE_AUTH request is received (note
that recalculating will result in EXACTLY the same keys, since they don't
depend on any data from IKE_AUTH). The draft recommends keeping the
keys until a proper IKE_AUTH request is received (or until the exchange
times out). This advice may look obvious, but I think it is still worth mentioning.
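
To illustrate the difference between the two options, here is a minimal
sketch of a responder that derives the keys lazily and keeps them after a
failed IKE_AUTH. Everything in it is hypothetical: the HalfOpenSA class, its
fields, and the use of HMAC-SHA256 as a stand-in for the real DH, SKEYSEED
and SK_* computation are assumptions made for illustration only, not the
derivation from RFC 7296 and not text from the draft.

```python
import hashlib
import hmac


class HalfOpenSA:
    """Hypothetical responder-side state kept after IKE_SA_INIT."""

    def __init__(self, dh_input: bytes, nonces: bytes):
        self._dh_input = dh_input   # data known after IKE_SA_INIT
        self._nonces = nonces       # data known after IKE_SA_INIT
        self._sk_keys = None        # not computed until IKE_AUTH arrives
        self.derivations = 0        # counts the expensive operations

    def _derive_keys(self) -> bytes:
        # Stand-in for DH + SKEYSEED + SK_* (assumption, not the real KDF).
        # Note that no IKE_AUTH data is used, so the result never changes.
        self.derivations += 1
        return hmac.new(self._nonces, self._dh_input, hashlib.sha256).digest()

    def handle_ike_auth(self, payload: bytes, tag: bytes) -> bool:
        """Return True if the IKE_AUTH message verifies."""
        if self._sk_keys is None:
            self._sk_keys = self._derive_keys()
        expected = hmac.new(self._sk_keys, payload, hashlib.sha256).digest()
        # On failure the keys are deliberately kept, so a forged IKE_AUTH
        # cannot force the expensive derivation to be repeated.
        return hmac.compare_digest(expected, tag)
```

With this shape, a bogus IKE_AUTH followed by a proper one costs a single
derivation. An implementation that clears the cached keys on failure would
pay for the derivation again on every forged packet.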

I recall we've already discussed this while reviewing the -05 version...

>> Please, see above.
>>
>> Do you think more explanations are needed here?
> 
> No I guess it is fine.

Are you sure after the above explanation?

>>>     Retransmission policies in practice wait at least one or two seconds
>>>     before retransmitting for the first time.
>>>
>>>  I'm not sure if this is still true. Libreswan starts at 0.5s and doubles,
>>>  and I know that iOS was faster too.
>>
>> Well, there are different implementations and each has its own
>> retransmission policy. The Responder should take into account
>> the slowest sensible retransmission policy, which seems to be the one 
>> described in the draft.
>>
>> Will the following text make you happy?
>>
>>    Many retransmission policies in practice wait one or two seconds
>>    before retransmitting for the first time.
> 
> It would be nicer to rewrite it without mentioning any absolute times.
> That way the text will also remain more relevant in the future if/when
> these timings change.

I don't think that is a good idea. The draft should give implementers some
estimated timings. "One or two seconds" is a "worst case" here. If implementers
take this figure into account when selecting the short timeout,
they'll always be on the safe side, because implementations that retransmit
more aggressively will always fit within this time period.

So I'd rather keep the text as above.
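
A small sketch of the arithmetic behind this argument. The function names
are hypothetical, and the doubling backoff is an assumption borrowed from
the Libreswan behaviour Paul described (start at 0.5s and double). Applying
the same doubling to a policy that starts at 2 seconds is also an assumption.

```python
def retransmit_times(initial_delay: float, factor: float, count: int) -> list[float]:
    """Seconds after the first send at which each retransmission goes out."""
    times = []
    elapsed = 0.0
    delay = initial_delay
    for _ in range(count):
        elapsed += delay
        times.append(elapsed)
        delay *= factor
    return times


def retransmissions_within(timeout: float, initial_delay: float,
                           factor: float = 2.0, count: int = 10) -> int:
    """How many retransmissions arrive before the half-open SA times out."""
    schedule = retransmit_times(initial_delay, factor, count)
    return sum(1 for t in schedule if t <= timeout)
```

Under these assumptions, a timeout that leaves room for one retransmission
from a peer that waits 2 seconds leaves room for several from a peer that
starts at 0.5 seconds, which is why sizing the timeout for the slow case is
the safe choice.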

>>>     When not under attack, the half-open SA timeout SHOULD be set high
>>>     enough that the Initiator will have enough time to send multiple
>>>     retransmissions, minimizing the chance of transient network
>>>     congestion causing IKE failure.
>>>
>>>  I agree, but I'd like to note that this and the text just above mentioning
>>>  "several minutes" is kind of archaic. We found a limit of 30 seconds on
>>
>> That's what RFC 7296 recommends (Section 2.4).
> 
> Okay, fair enough. I guess you mention shortening it while under attack,
> so it's all okay.
> 
>>>  other implementations so common as a timeout, that we see no more value in
>>>  keeping an IKE exchange around for more than 30 seconds. (we do re-start
>>>  and try a new exchange from scratch for longer, in some configurations we
>>>  try that forever)
>>>
>>>     For IPv6, ISPs assign between a /48 and a /64, so it makes sense to use
>>>     a 64-bit prefix as the basis for rate limiting in IPv6.
>>>
>>>  Why does that make sense over using /48 ? Wouldn't you rather rate limit
>>>  some innocent neighbours over not actually defending against the attack?
>>>  If puzzles work as advertised, real clients on that /48 should still be
>>>  able to connect.
>>
>> Well, I'm not an IPv6 expert. Probably Michael Richardson (who suggested this 
>> change) or somebody else will comment on this.
> 
> This does not so much relate to IPv6 but to whether you rather
> overestimate or underestimate the attacker's IP space. If you
> underestimate, you will take longer to punish the attacking IPs. If you
> overestimate you will needlessly slow down legitimate clients.
> 
> I don't know which of the two is better, hence my objection to "it makes
> sense" because I don't see that.

What's your suggestion for this text? Just remove "it makes sense" or
completely rewrite the paragraph? If the latter, please provide the text.
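
To make the /48 versus /64 trade-off concrete, here is a minimal sketch
of how a rate limiter might map a source address to a bucket. The
rate_limit_bucket function and its default prefix length are hypothetical
and are not taken from the draft.

```python
import ipaddress


def rate_limit_bucket(addr: str, v6_prefix_len: int = 64) -> str:
    """Map a source address to the key used for rate limiting.

    IPv4 addresses are used as-is. IPv6 addresses are truncated to the
    given prefix length (an assumption, configurable by the caller).
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return str(ip)
    # strict=False lets us pass a host address and get its enclosing network.
    net = ipaddress.ip_network(f"{ip}/{v6_prefix_len}", strict=False)
    return str(net)
```

Two hosts in different /64s of the same /48 fall into separate buckets with
the default and into the same bucket with v6_prefix_len=48. The shorter
prefix catches an attacker who rotates through a whole /48 sooner, at the
price of also slowing down any legitimate clients that share it.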

>>>     Regardless of the type of rate-limiting used, there is a huge
>>>     advantage in blocking the DoS attack using rate-limiting for
>>>     legitimate clients that are away from the attacking nodes.  In such
>>>     cases, adverse impacts caused by the attack or by the measures used
>>>     to counteract the attack can be avoided.
>>>
>>>  I don't understand this paragraph at all. I guess "rate-limiting for
>>>  legitimate clients" just confuses me. I think it might attempt to be
>>>  saying "not blocking ranges with no attackers helps real clients", but
>>>  it is very unclear.
>>
>> Yoav?
>>
>>>     to calculate the PRF
>>>
>>>  One does not "calculate" a PRF. One uses a PRF to calculate something.
>>
>> OK.
> 
> You didn't provide text but I assume you changed it somehow.

s/PRF/"output of PRF" or s/PRF/"the result of PRF"   Is it OK?

>>>  The section that starts with "Upon receiving this challenge," seems to
>>>  be discussing the pros and cons of this method before it has explained
>>>  the method. The reader is forced to skip this or forward to section 7
>>>  and getting back to this part. I suggest to re-order some text to avoid
>>>  this, or to give a better short summary of the puzzle nature just before
>>>  this paragraph.
>>
>> It describes the puzzles mechanism in general, while Sections 7 & 8
>> describe the particular instantiation of puzzles in IKEv2.
>> I'd rather keep some background about puzzles here,
>> so that all possible defenses are described in one place.
> 
> Then I think it still requires a one-line introduction to puzzles.

I'm a bit confused. I thought that the whole of Section 4.4
is a high-level description of puzzles. Where do you want to insert
the one-line introduction?

>>>     When the Responder is under attack, it MAY choose to prefer
>>>     previously authenticated peers who present a Session Resumption
>>>     ticket (see [RFC5723] for details).
>>>
>>>  Why is this only a MAY? Why is it not a SHOULD or MUST?
>>
>> A good question. I think the idea was not to force the Responder
>> to serve only resumed clients and to let it prioritize
>> clients according to its own policy. In my opinion MUST is too strong, but 
>> SHOULD is probably OK.
> 
> In the famous words of Steve Kent, if you say SHOULD instead of MUST,
> explain when the Responder should not.

When it has good reasons :-)

Seriously, consider the situation when the responder finds itself
under attack and switches to responding only to IKE_SESSION_RESUME
requests. In this case it will shut out legitimate clients that have no
usable resumption ticket (e.g. because the ticket expired).

I think there is no reason to put MUST here, since in any case
it is local policy that dictates the responder's behaviour,
and there are no interoperability issues whether it is MAY,
SHOULD or MUST; it is just a matter of the responder's local policy.
So SHOULD is just good advice.

>>>     The Responder MAY require such
>>>     Initiators to pass a return routability check by including the COOKIE
>>>     notification in the IKE_SESSION_RESUME response message, as allowed
>>>     by Section 4.3.2. of [RFC5723].
>>>
>>>  Perhaps this should say the responder SHOULD require COOKIEs for resumed
>>>  sessions if it also requires COOKIEs for IKE_INIT requests. That is, it
>>>  should not give preference to resumed sessions as those could be equally
>>>  forged as IKE_INIT requests.
>>
>> A good point. I tend to agree. Yoav?
>>
>>>     With a typical setup and typical Child SA lifetimes, there
>>>     are typically no more than a few such exchanges, often less.
>>>
>>>  (ignoring the language) I do not believe this is true. This goes back to
>>>  the discussion on how often people deploy liveness probes. Implementors
>>>  seem to think 30s, while endusers want and do configure things like 1s.
>>>  I don't think the text about how many IKE exchanges are typical
>>>  is needed because the text below talks about specific abuse anyway,
>>>  and not in terms of just number of exchanges.
>>
>> Are you suggesting to remove it?
> 
> Yes. You can just talk about something like "If an abusive amount of
> (otherwise) valid IKE messages are received, ....." and let the
> implementor decide how many IKE messages counts as abusive?

OK, I see your point.

> That also
> avoids what to do when rekeys happen because that would likely reset
> the counter because it is a new state?

Well, I think the proper approach is to measure the rate of such
exchanges (per SA, of course). So, just reset the counter every
second and measure how many exchanges happened within
that second. If the number looks abusive, take measures.
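
Something like the following sketch is what I have in mind. The
ExchangeRateMonitor class and the max_per_second threshold are hypothetical.
What number counts as abusive is left to the implementer, and one such
counter would be kept per SA.

```python
class ExchangeRateMonitor:
    """Hypothetical per-SA counter of exchanges in one-second windows."""

    def __init__(self, max_per_second: int):
        self.max_per_second = max_per_second   # threshold is local policy
        self._window = None                    # the second being counted
        self._count = 0

    def record(self, now: float) -> bool:
        """Record one exchange at time `now` (in seconds).

        Returns True if the number of exchanges seen in the current
        one-second window exceeds the configured threshold.
        """
        window = int(now)
        if window != self._window:
            # A new second has started: reset the counter.
            self._window = window
            self._count = 0
        self._count += 1
        return self._count > self.max_per_second
```

The caller supplies the timestamp, for example from a monotonic clock, and
decides what "take measures" means when record() returns True.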

>>>        If the peer creates too many Child SA with the same or overlapping
>>>        Traffic Selectors, implementations can respond with the
>>>        NO_ADDITIONAL_SAS notification.
>>>
>>>  I think this requires normative language, eg: implementations MUST respond
>>>  with a NO_ADDITIONAL_SAS notification. The same for the next bullet item
>>>  where it says "implementations can introduce an artificial delay", which
>>>  should be like: "MAY introduce an artificial delay" (or even SHOULD, or
>>>  rewrite "too many" to "many" and use MAY)
>>
>> I'd use MAY and keep "too many". "Too many" means here that a peer is at
>> least misbehaving, while just "many" doesn't imply this
>> (in my reading).
> 
> You cannot say "too many" and "MAY". If it is too many, it is abusive.
> So you MUST take action. On the other hand if you say "many", then you
> leave it open to interpretation whether it is abuse or not, and you can
> use "MAY".

I see. Language differences :-) Ok, let's remove "too".

>>>  Section 5 switches from talking about "the Responder" to "the
>>>  implementation".
>>>  I think it should be "the Responder" throughout the document.
>>
>> OK.
>>
>>>      the retransmitted messages should be silently discarded.
>>>
>>>  That should be normative too, MUST be discarded.
>>
>> Agree.
> 
> Paul

Thank you,
Valery.
