[IPsec] Review of draft-ietf-ipsecme-ddos-protection-06

Paul Wouters <paul@nohats.ca> Tue, 31 May 2016 20:44 UTC

This is a partial review of draft-ietf-ipsecme-ddos-protection-06
up to Section 6. I hope to complete the rest in the next few days.

I think this document needs another revision before continuing.
(and I would prefer it to be split in two)

Issues / Questions:

    An obvious defense, which is described in Section 4.2, is limiting
    the number of half-open SAs opened by a single peer.  However, since
    all that is required is a single packet, an attacker can use multiple
    spoofed source IP addresses.

I am not sure why this is mentioned here in this way, because the attack
using spoofed source IP addresses is already handled effectively by DoS
cookies. I think it is better to state "botnets are large enough that they
have enough unique IP addresses" and avoid talking about spoofing in this
section altogether.


    Stage #3 includes public key operations, typically more than one.

This sentence should say that these operations are very expensive,
similar to how the previous sentences describe the "effort" of stage #1
and stage #2.

    It seems that the first thing cannot be dealt with at the IKE level.
    It's probably better left to Intrusion Prevention System (IPS)
    technology.

I would rewrite this more authoritatively, without using the word "seems".

    Depending on the Responder implementation, this can be repeated with
    the same half-open SA.

I don't think this "depends on the implementation". Since any on-path
attacker can spoof rubbish, a Responder MUST ignore the failed packet
and remain ready to accept the real one for a certain amount of time. This
also applies to this later section in the document:

    If the received IKE_AUTH message failed to decrypt correctly (or
    failed to pass ICV check), then the Responder SHOULD still keep the
    computed SK_* keys, so that if it happened to be an attack, then the
    malicious Initiator cannot get advantage of repeating the attack
    multiple times on a single IKE SA.
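
To illustrate the behavior I am asking for, here is a rough Python sketch.
All names are hypothetical: HMAC-SHA-256 stands in for the real ICV check,
and a random key stands in for the Diffie-Hellman based SK_* derivation.
The Responder derives the keys once, silently drops an IKE_AUTH that fails
the ICV check, and keeps the half-open SA waiting for the genuine message.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional


def derive_sk_keys() -> bytes:
    # Hypothetical stand-in for the expensive DH + PRF+ key derivation.
    return os.urandom(32)


@dataclass
class HalfOpenSA:
    sk_keys: Optional[bytes] = None  # cached after the first derivation
    derivations: int = 0             # how often the expensive step ran
    established: bool = False

    def keys(self) -> bytes:
        # Derive once, then reuse: a forged IKE_AUTH cannot force a re-derivation.
        if self.sk_keys is None:
            self.sk_keys = derive_sk_keys()
            self.derivations += 1
        return self.sk_keys

    def handle_ike_auth(self, payload: bytes, icv: bytes) -> bool:
        expected = hmac.new(self.keys(), payload, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, icv):
            # Drop the bogus packet, but keep the half-open SA and its keys
            # so the genuine IKE_AUTH can still be accepted before the timeout.
            return False
        self.established = True
        return True
```

However many junk IKE_AUTH messages arrive, `derivations` stays at 1 and a
later genuine message still completes the exchange.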




    Retransmission policies in practice wait at least one or two seconds
    before retransmitting for the first time.

I'm not sure if this is still true. Libreswan starts at 0.5s and doubles,
and I know that iOS was faster too.

    When not under attack, the half-open SA timeout SHOULD be set high
    enough that the Initiator will have enough time to send multiple
    retransmissions, minimizing the chance of transient network
    congestion causing IKE failure.

I agree, but I'd like to note that this and the text just above mentioning
"several minutes" are somewhat archaic. We found a 30 second timeout so
common in other implementations that we see no value in keeping an IKE
exchange around for more than 30 seconds. (We do restart and try a new
exchange from scratch for longer; in some configurations we try that
forever.)

    For IPv6, ISPs assign between a /48 and a /64, so it makes sense to use
    a 64-bit prefix as the basis for rate limiting in IPv6.

Why does that make more sense than using a /48? Wouldn't you rather rate
limit some innocent neighbours than fail to defend against the attack?
If puzzles work as advertised, real clients on that /48 should still be
able to connect.
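
To make the /48 versus /64 question concrete, a minimal sketch of the bucket
key a Responder might use for per-prefix rate limiting, using Python's
standard ipaddress module (the function name and defaults are my own, not
from the draft). Two addresses from different /64s inside the same /48 land
in separate buckets with a /64 key but share one with a /48 key; an attacker
holding a /48 has 2^16 = 65536 /64s to spread its requests over.

```python
import ipaddress


def rate_limit_key(addr: str, v6_prefix: int = 48, v4_prefix: int = 32) -> str:
    """Bucket key for per-source rate limiting: the address truncated to a prefix."""
    ip = ipaddress.ip_address(addr)
    prefix = v6_prefix if ip.version == 6 else v4_prefix
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


a, b = "2001:db8:1:1::10", "2001:db8:1:2::20"  # same /48, different /64s
print(rate_limit_key(a, 64), rate_limit_key(b, 64))  # two separate buckets
print(rate_limit_key(a), rate_limit_key(b))          # one shared /48 bucket
```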

    Regardless of the type of rate-limiting used, there is a huge
    advantage in blocking the DoS attack using rate-limiting for
    legitimate clients that are away from the attacking nodes.  In such
    cases, adverse impacts caused by the attack or by the measures used
    to counteract the attack can be avoided.

I don't understand this paragraph at all. I guess "rate-limiting for
legitimate clients" is what confuses me. I think it might be trying to
say "not blocking ranges with no attackers helps real clients", but
it is very unclear.

    to calculate the PRF

One does not "calculate" a PRF. One uses a PRF to calculate something.

The section that starts with "Upon receiving this challenge," seems to
be discussing the pros and cons of this method before it has explained
the method. The reader is forced to either skip this or jump forward to
Section 7 and then come back to this part. I suggest re-ordering some
text to avoid this, or giving a short summary of how puzzles work just
before this paragraph.
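
As an example of the kind of short summary I mean, a rough sketch of a
hash-based puzzle of the sort Section 7 describes (as I read it): the
Initiator searches for a PRF key such that PRF(key, cookie) ends in enough
zero bits, and the Responder checks a solution with a single PRF call. This
is a simplification; HMAC-SHA-256 as the PRF and 8-byte counter keys are
assumptions, and the exact construction in the draft may differ.

```python
import hashlib
import hmac
import os


def prf(key: bytes, data: bytes) -> bytes:
    # HMAC-SHA-256 stands in for the negotiated IKE PRF (an assumption).
    return hmac.new(key, data, hashlib.sha256).digest()


def trailing_zero_bits(digest: bytes) -> int:
    value = int.from_bytes(digest, "big")
    if value == 0:
        return len(digest) * 8
    return (value & -value).bit_length() - 1  # index of the lowest set bit


def solve(cookie: bytes, zero_bits: int) -> bytes:
    """Initiator: try keys until PRF(key, cookie) has enough trailing zero bits.
    Expected cost is about 2**zero_bits PRF calls."""
    counter = 0
    while True:
        key = counter.to_bytes(8, "big")
        if trailing_zero_bits(prf(key, cookie)) >= zero_bits:
            return key
        counter += 1


def verify(cookie: bytes, key: bytes, zero_bits: int) -> bool:
    """Responder: a single PRF call checks the solution."""
    return trailing_zero_bits(prf(key, cookie)) >= zero_bits


cookie = os.urandom(16)   # the Responder's challenge
key = solve(cookie, 12)   # roughly 4096 PRF calls on average
assert verify(cookie, key, 12)
```

The asymmetry is the whole point: solving costs the Initiator many PRF
calls, while verifying costs the Responder one.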

    When the Responder is under attack, it MAY choose to prefer
    previously authenticated peers who present a Session Resumption
    ticket (see [RFC5723] for details).

Why is this only a MAY? Why is it not a SHOULD or MUST?

    The Responder MAY require such
    Initiators to pass a return routability check by including the COOKIE
    notification in the IKE_SESSION_RESUME response message, as allowed
    by Section 4.3.2. of [RFC5723].

Perhaps this should say the Responder SHOULD require COOKIEs for resumed
sessions if it also requires COOKIEs for IKE_SA_INIT requests. That is, it
should not give preference to resumed sessions, as those can be forged just
as easily as IKE_SA_INIT requests.

    With a typical setup and typical Child SA lifetimes, there
    are typically no more than a few such exchanges, often less.

(ignoring the language) I do not believe this is true. This goes back to
the discussion on how often people deploy liveness probes. Implementors
seem to think 30s, while end users want, and do configure, things like 1s.
I don't think the text about how many IKE exchanges are typical is needed,
because the text below talks about specific abuse anyway, and not just in
terms of the number of exchanges.

       If the peer creates too many Child SA with the same or overlapping
       Traffic Selectors, implementations can respond with the
       NO_ADDITIONAL_SAS notification.

I think this requires normative language, e.g.: implementations MUST respond
with a NO_ADDITIONAL_SAS notification. The same goes for the next bullet item,
where it says "implementations can introduce an artificial delay", which
should be something like: "MAY introduce an artificial delay" (or even SHOULD,
or rewrite "too many" to "many" and use MAY).


Section 5 switches from talking about "the Responder" to "the implementation".
I think it should be "the Responder" throughout the document.

     the retransmitted messages should be silently discarded.

That should be normative too, MUST be discarded.

NITS:

always bounded -> always bound

"effectively defend" -> defend
(if it was "effective", we wouldn't need puzzles :)

thwart -> prevent or handle or counter?
(thwart is just an odd/uncommon word for non-native English speakers)

The following sentence kind of runs on:

    Generating the IKE_SA_INIT request is cheap, and sending multiple
    such requests can either cause the Responder to allocate too much
    resources and fail, or else if resource allocation is somehow
    throttled, legitimate Initiators would also be prevented from setting
    up IKE SAs.

How about:

    Generating the IKE_SA_INIT request is cheap. Sending large amounts of
    IKE_SA_INIT requests can cause a Responder to use up all its resources.
    If the Responder tries to defend against this by throttling new requests,
    this will also prevent legitimate Initiators from setting up IKE SAs.

Next,

    Yes, there's a stage 4 where the Responder actually creates Child
    SAs, but when talking about (D)DoS, we never get to this stage.

This is rather strange language for an RFC, how about:

    The fourth stage where the Responder creates the Child SA
    is not reached by attackers who cannot pass the authentication
    step.


so it's -> so it is

attempt to either exhaust -> attempt either to exhaust

This should be easy because -> this is easy because

even without changes to the protocol -> without changes to the protocol

Puzzles, introduced in Section 4.4, do the same thing only more of it ->
Puzzles, introduced in Section 4.4, accomplish this goal and more.

They don't have to be so hard -> Puzzles do not have to be so hard

can't -> cannot

it's -> it is

they increase the cost of a half-open SAs for the attacker so that it can
create only a few. ->
puzzles increase the cost of creating half-open SAs so the attacker is
limited in the amount they can create.

Reducing the amount of time an abandoned half-open SA is kept attacks
the issue from the other side. It reduces the value the attacker
gets from managing to create a half-open SA.  ->
Reducing the lifetime of an abandoned half-open SA also reduces the
impact of such attacks.

(I don't much like using commas in numbers, as they mean different things
  in different parts of the world, e.g. 60,000 and 1,000 in this document)

Reduce the retention time to 3 seconds, and the attacker needs to
create 20,000 half-open SAs per second. ->
If the retention time is reduced to 3 seconds, the attacker would need to
create 20,000 half-open SAs per second to get the same result.
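
The arithmetic behind this example: the rate an attacker needs to keep the
Responder full is its half-open SA capacity divided by the retention time.
A quick check, assuming a capacity of 60000 half-open SAs (consistent with
the 20000 per second figure for 3 seconds; under that assumption the 1000
per second figure corresponds to a 60 second retention):

```python
def required_rate(capacity: int, retention_s: float) -> float:
    """Half-open SAs per second an attacker must create to keep a Responder
    that can hold `capacity` half-open SAs completely full."""
    return capacity / retention_s


print(required_rate(60_000, 60))  # 1000.0
print(required_rate(60_000, 3))   # 20000.0
```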

making it more likely to thwart an exhaustion attack against Responder
memory ->
making it more likely that the attacks run out of memory before the Responder.

The attacker has two ways to do better -> The attacker has two alternative
attacks to do better

It seems that the first thing -> It seems that the first alternative

On the other hand, sending an IKE_AUTH request is surprisingly cheap. ->
On the other hand, the second alternative of sending an IKE_AUTH request
is very cheap.

It requires a proper IKE header with the correct IKE SPIs, and it
requires a single Encrypted payload.  The content of the payload
might as well be junk.  ->
It requires generating a proper IKE header with correct IKE SPIs and a
single Encrypted payload. The content of the Encrypted payload is
irrelevant and therefore cheap to generate.

does not check -> fails the integrity check.

Puzzles can make attacks of such sort -> Puzzles make attacks of such sort

Puzzles have their place as part of #4 -> Puzzles are used as a solution
for strategy #4.

Defense Measures while IKE SA is being created ->
Defense Measures while the IKE SA is being created

any IKE_SA_INIT request will require solving a puzzle. ->
any IKE_SA_INIT request will be required to solve a puzzle.

The downside -> The disadvantage
(the other case does use advantage/disadvantage properly, so this is the odd
  one out)

can still effectively DoS the Responder -> can still effectively DDoS the Responder.
(there are some more DoS -> DDoS changes that you could make)

to mitigate DoS attack -> to mitigate DoS attacks

the cookie mechanism from -> the cookie mechanism of

    It is loosely based on the proof-of-work technique used
    in Bitcoins [bitcoins].

I think referring to bitcoins is a bit of a stretch and only distracts.

    This sets an upper bound, determined by the
    attacker's CPU, to the number of negotiations it can initiate in a
    unit of time. ->
    Puzzles set an upper bound, determined by the
    attacker's CPU, to the number of negotiations the attacker can initiate in a
    unit of time.
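
The upper bound in that sentence is just the attacker's PRF throughput
divided by the expected work per solution, about 2^N PRF calls for N
required zero bits. A sketch with a hypothetical attacker speed of 10
million PRF evaluations per second (the figure is illustrative, not
measured):

```python
def max_initiations_per_second(prf_calls_per_second: float, zero_bits: int) -> float:
    """Upper bound on IKE_SA_INIT requests an attacker can get past the puzzle
    per second: each solution costs about 2**zero_bits PRF calls on average."""
    return prf_calls_per_second / 2 ** zero_bits


# Hypothetical attacker CPU: 10 million PRF evaluations per second.
for bits in (8, 16, 20):
    print(bits, round(max_initiations_per_second(10_000_000, bits), 2))
```

Each extra required zero bit halves the attacker's rate, which is why the
Responder can tune difficulty to the attack it observes.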

for it to make any difference in mitigating DDoS attacks. -> [remove]

and this fact allows -> and this allows

a malicious peer -> an attacker   (we used attacker all the way up to here
in this document, why change it now?)


Preventing Attacks using "Hash and URL" Certificate Encoding ->
Preventing "Hash and URL" Certificate Encoding attacks

In IKEv2 each side may use "Hash and URL" Certificate Encoding ->
In IKEv2 each side may use the "Hash and URL" Certificate Encoding

a DoS attack on responder -> a DoS attack on the responder

  before continue downloading. -> before continuing to download the file.

See Section 5 of [RFC7383] for details. -> See Section 5 of [RFC7383] for
details on how to mitigate these attacks.

Defense Measures after IKE SA is created -> Defense Measures after an IKE SA is created

Once IKE SA is created -> Once an IKE SA is created

there is usually not much traffic over it -> there is usually only a limited
number of IKE messages exchanged.

In most cases this traffic consists of exchanges aimed to create
additional Child SAs, rekey, or delete them and check the liveness of
the peer. ->
This IKE traffic consists of exchanges aimed to create additional Child SAs,
IKE rekeys, IKE deletions and IKE liveness tests.

Such behavior may be caused by buggy implementation, misconfiguration or be
intentional.  The latter becomes more of a real threat if the peer uses NULL
Authentication, described in [RFC7619]. In this case the peer remains
anonymous, allowing it to escape any responsibility for its actions.  ->
Such behavior can be caused by broken implementations, misconfiguration or
as an intended attack. Extra care should be taken in the case of NULL
Authentication [RFC7619] where one essentially allows IKE SAs with untrusted
third parties that could be malicious.

See Section 3 of [RFC7619] for details -> See Section 3 of [RFC7619] for details on how to mitigate attacks when using NULL Authentication.

The following recommendations for defense against possible DoS attacks after
IKE SA is established are mostly intended for implementations that allow
unauthenticated IKE sessions; however, they may also be useful in other
cases. ->
The following recommendations apply especially for NULL Authenticated IKE
sessions, but also apply to authenticated IKE sessions, with the difference
that in the latter case, the identified peer can be locked out.

then the peer could initiate multiple simultaneous -> peers are able to
initiate multiple simultaneous

that could increase host resource consumption -> that increases host resource consumption

Since currently there is no way -> Since there is no way

decrease window size once it was increased -> decrease the window size
once it has been increased

For that reason, it is NOT RECOMMENDED to ever increase the IKEv2 window size
above its default value of one if the peer uses NULL Authentication.->
It is NOT RECOMMENDED to allow an IKEv2 window size greater than one when
NULL Authentication has been used.

If the peer initiates requests to rekey IKE SA or Child SA too
often, implementations can respond to some of these requests with
the TEMPORARY_FAILURE notification, indicating that the request
should be retried after some period of time. ->
If a peer initiates an abusive amount of CREATE_CHILD exchanges, the
Responder SHOULD reply with TEMPORARY_FAILURE notifications indicating
the peer must slow down their requests.

If the peer initiates too many exchanges of any kind, implementations can
introduce an artificial delay before responding to each request message.->
If a peer initiates many exchanges of any kind, the Responder MAY
introduce an artificial delay before responding to the request.

"the implementation need" -> the Responder needs

making it possible to process requests from the others -> and frees up
resources on the Responder that can be used for answering legitimate clients.

Note, that if the Responder receives retransmissions -> If the Responder
receives retransmissions

the retransmitted messages should be silently discarded. -> the retransmitted
messages MUST be discarded.

The delay should not be too long to avoid causing the IKE SA to be deleted on the other end due to timeout. ->
The delay must be short enough to avoid legitimate peers deleting the IKE
SA due to a timeout.
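
To show how the suggested rules fit together, a rough per-peer sketch: too
many CREATE_CHILD_SA requests in a window get TEMPORARY_FAILURE, too many
exchanges of any kind get an artificial delay, and retransmissions of the
request being delayed are discarded. The thresholds, names and window are
hypothetical (the draft gives no values), and the sketch ignores normal IKE
retransmission handling and message ID windows.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple

# Hypothetical thresholds for illustration; the draft does not specify values.
WINDOW_S = 10.0        # sliding window over which exchanges are counted
MAX_CREATE_CHILD = 5   # CREATE_CHILD_SA requests tolerated per window
MAX_EXCHANGES = 20     # exchanges of any kind tolerated per window


@dataclass
class PeerState:
    recent: Deque[Tuple[float, str]] = field(default_factory=deque)
    delayed_msg_id: Optional[int] = None  # request whose response is held back

    def decide(self, exchange: str, msg_id: int, now: float) -> str:
        # A retransmission of the request we are deliberately delaying is dropped.
        if msg_id == self.delayed_msg_id:
            return "discard"
        # Forget exchanges that fell out of the sliding window.
        while self.recent and now - self.recent[0][0] > WINDOW_S:
            self.recent.popleft()
        self.recent.append((now, exchange))
        children = sum(1 for _, kind in self.recent if kind == "CREATE_CHILD_SA")
        if exchange == "CREATE_CHILD_SA" and children > MAX_CREATE_CHILD:
            return "TEMPORARY_FAILURE"  # ask the peer to retry later
        if len(self.recent) > MAX_EXCHANGES:
            self.delayed_msg_id = msg_id
            return "delay"  # answer later, well before the peer times out
        return "respond"

    def delayed_response_sent(self) -> None:
        self.delayed_msg_id = None
```

The delay itself must stay short enough that a legitimate peer does not
delete the IKE SA, as noted above.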

[ to be continued ]