Re: [IPsec] Review of draft-ietf-ipsecme-ddos-protection-06

Paul Wouters <paul@nohats.ca> Thu, 30 June 2016 08:58 UTC

Date: Thu, 30 Jun 2016 04:58:07 -0400
From: Paul Wouters <paul@nohats.ca>
To: Valery Smyslov <svanru@gmail.com>
In-Reply-To: <CE2060023EDE4BDD838CBC70763875B4@buildpc>
Message-ID: <alpine.LRH.2.20.1606300444130.4545@bofh.nohats.ca>
References: <alpine.LRH.2.20.1605311635540.16809@bofh.nohats.ca> <4200F5373D5542C985F3D4C51609213C@buildpc> <alpine.LRH.2.20.1606022148040.23132@bofh.nohats.ca> <E61D75BBDD0F4A159352B3258BBAA7DE@buildpc> <alpine.LRH.2.20.1606031155230.11420@bofh.nohats.ca> <A6682BC2468947F1A1669A9B9D558BF5@buildpc> <alpine.LRH.2.20.1606222214230.27151@bofh.nohats.ca> <CE2060023EDE4BDD838CBC70763875B4@buildpc>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipsec/U7DpBKdGznw2VwT6UDdDNuuVgus>
Cc: ipsec@ietf.org, Yoav Nir <ynir.ietf@gmail.com>

On Tue, 28 Jun 2016, Valery Smyslov wrote:

This is part two of my review. I do think the document needs some work
moving text to better locations, and I have some questions I would like
to see resolved. I wrote down some nits but stopped doing that in the
end because I think chunks of text should be moved. My biggest issue
is that section 6 and section 7 are not clearly separated, and I see
various chunks of text that I think are in the wrong section (or are
section 6 text repeated in section 7).

I think this document should update 7296 due to adding non-encrypted
payloads to IKE_AUTH - even though the core IKEv2 RFC does not say that
is not allowed. Someone implementing 7296 should be aware of it to allow
it in their implementation.

I will redo a nits/grammar check on the next iteration of the document.

Note, that it is not possible with clients using NULL Authentication,
since their identity cannot be verified.

It feels like this sentence should be followed by some specific advice for
this category of clients?

Section 6 describes in item 1, "A general DDoS attack", some numbers that I
find dangerous to follow. It describes a scenario where 20 tunnels per second
is expected, and an increase to 100 tunnels per second is considered
a DDoS attack. But that does not take into account network failures
that would cause a large chunk of clients to reconnect at once. While
the draft says this "can be interpreted as an attack", implementors
might just put these hardcoded numbers from the document into their
code. I'd rather describe it a bit differently so we don't give them
these absolute numbers.

For example:

Typical measures might be 5 concurrent half-open SAs, 1 decrypt
failure, or 10 EAP failures within a minute.

Why is this "typical"? And again, these are temptingly easy numbers
for implementors to hardcode in their implementation. And I
think they are speculation at best.

puzzle difficulty should be set to such a level (number of zero-bits)
that all legitimate clients can handle it without degraded user
experience.

This is of course the big issue of this draft. Is this possible at all?
Note the _lack_ of specific numbers here :)

I don't understand this paragraph:

it is best to begin by requiring stateless cookies from all
Initiators. This will force the attacker to use real source
addresses, and help avoid the need to impose a greater burden in the
form of cookies on the general population of Initiators.

Perhaps the "form of cookies" was meant to say "puzzles" ?

And this one confuses me too:

When cookies are activated for all requests and the attacker is still
managing to consume too many resources, the Responder MAY increase
the difficulty of puzzles imposed on IKE_SA_INIT

But up to now, we haven't been given advice to enable puzzles, and now we
are recommended to increase the difficulty of puzzles?

If the load on the Responder is still too great, and there are many
nodes causing multiple half-open SAs or IKE_AUTH failures, the
Responder MAY impose hard limits on those nodes.

Unlike elsewhere, it does not describe what failure to send back, if
anything, to the initiator. Sending back a NOTIFY might actually not
be desirable, as it would tell the attacker their attack has reached
a good enough volume to lock out real clients. Some advice on how
to handle this scenario is needed.

And confusingly, only NOW are puzzles suggested as a next "last step",
even though before this it already told us to increase puzzle strength.

Why is there not an option to add puzzles to the CREATE_CHILD_SA to
punish just those clients requesting too many of those? (I'm not sure
if it is a good idea, I'm asking because I'm not sure it is a bad idea)

Section 7 has:

According to the plan, described in Section 6, [.....]

We are not Cylons :)

Seriously though, section 6 describes _when_ to activate puzzles, and section 7
should describe how to activate puzzles without any context of when to enable them or not.
Currently, section 7 repeats some of the "plan" of section 6, which should not be
needed and makes the implementation section longer/harder to read. Some of the text
in section 7, like the new "processing some fraction of requests" should be in section 6,
not section 7.

7.1.1 states: "then it MUST include two notifications in its response message"
So earlier text said "may" also use cookies, and this text assumes that puzzles
can only happen with cookies. That is contradictory. I would say remove the requirement
in section 7 and change the text in section 6 to make it obvious that cookies should be
the first line of defense and should still be used when handing out puzzles on top of
cookies.

If you mean to talk about the interaction or combination of puzzles and cookies, perhaps
a separate section on that would be most clear.

7.1.1.1 introduces a term ZBC which I have no idea what it means yet. It then talks
about difficulty level 0 which I don't know what that is. Does it translate to the number
of zeros in the solution? If so I would expect level 1 to be the lowest? Maybe this
discussion should go into the section 7 introduction: what is the general idea of the
puzzle, what are difficulty levels, etc.

The
Responder MAY set different difficulty levels to different requests
depending on the IP address the request has come from.

I would think that MAY should be stronger, a SHOULD ? If you can detect a few problem
causing IPs or IP ranges, you made good points saying to only punish those with puzzles.

The Responder parses received SA payload and
finds mutually supported set of transforms of type PRF. It selects
most preferred transform from this set and includes it into the
PUZZLE notification.

I find the use of transform a bit confusing. I would say PRF. (and "most preferred" -> preferred)

If there are no mutually
supported PRFs, then negotiation will fail anyway and there is no
reason to return a puzzle.

I first thought of AES_GCM not needing a PRF, then realised I confused IKE and IPsec SA.
Perhaps change "negotiation will fail" to "IKE SA negotiation will fail".

7.1.1.3. Generating Cookie

If Responder supports puzzles then cookie should be computed in such
a manner, that the Responder is able to learn some important
information from the sole cookie,

We are in the middle of puzzles and not cookies. Why suddenly cookies?
Again, I think the document can use a better introduction in section 7
that explains the interaction between the two, laying out the principles.
Only later on do I read responder puzzle state is encoded in the cookie.

The point of encoding puzzle information in the cookie is presumably so
that this state does not need to be remembered by the responder. So how
does it know "The number of consecutive puzzles given to the Initiator."?
Is this a counter in the <AdditionalInfo> ?

I would like to put a note here as well about the maximum cookie size of 64
bytes that implementors might not realise, to avoid naive implementations
building a very big "string cookie" with these bullet points of information.

This would give the attacker the AdditionalInfo information, which it might
use as feedback on how successful the attack is. Why not use an encryption
of <AdditionalInfo> using the <secret>? Would that be too expensive for
the Responder? I'd think not, as it is not more expensive than decrypting
IKE_AUTH.

Or stick with the second approach listed, where the responder keeps this
state locally, which probably is better anyway because it needs to know
the scale of the entire attack that cannot be learned from individual
negotiation states.
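
To make the size and visibility concerns concrete, here is a minimal sketch
of a stateless cookie that carries a puzzle counter. It assumes a layout of
version | AdditionalInfo | MAC, with HMAC-SHA256 standing in for the
Hash(... | <secret>) construction of RFC 7296. The function names, the
two-byte AdditionalInfo encoding and the field order are illustrative
assumptions, not taken from the draft.

```python
import hashlib
import hmac
import struct
from typing import Optional, Tuple

MAX_COOKIE_LEN = 64  # RFC 7296: COOKIE notification data is 1..64 octets
MAC_LEN = 32         # HMAC-SHA256 output size


def make_cookie(secret: bytes, version: int, ni: bytes, ip_i: bytes,
                spi_i: bytes, puzzles_given: int, zbc: int) -> bytes:
    """Build version | AdditionalInfo | MAC (hypothetical layout)."""
    # AdditionalInfo is kept binary and tiny: two one-byte counters.
    info = struct.pack("!BB", puzzles_given, zbc)
    mac = hmac.new(secret, ni + ip_i + spi_i + info, hashlib.sha256).digest()
    cookie = bytes([version]) + info + mac
    if len(cookie) > MAX_COOKIE_LEN:
        raise ValueError("cookie exceeds the 64-octet limit")
    return cookie


def check_cookie(secret: bytes, cookie: bytes, ni: bytes, ip_i: bytes,
                 spi_i: bytes) -> Optional[Tuple[int, int]]:
    """Return (puzzles_given, zbc) if the cookie is authentic, else None."""
    if len(cookie) != 1 + 2 + MAC_LEN:
        return None
    info, mac = cookie[1:3], cookie[3:]
    expected = hmac.new(secret, ni + ip_i + spi_i + info,
                        hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return None
    return struct.unpack("!BB", info)
```

In this layout the cookie is 35 octets, well under the limit, but the two
counters travel in the clear, so whoever receives the cookie can read them.
That is the feedback channel an encrypted <AdditionalInfo> would close.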

7.1.2 states that if the initiator does not want to solve a puzzle of difficulty X,
it will pretend not to support the NOTIFY. This means the responder cannot
tell whether the initiator rejected the difficulty or just does not
support puzzles. It would be useful for the responder to know how many initiators
support puzzles, so I would recommend a different NOTIFY for the "puzzle too
difficult" error path (Maybe a return notify of PUZZLED :)

If the received message contains a PUZZLE notification and doesn't
contain a COOKIE notification, then this message is malformed because
it requests to solve the puzzle, but doesn't provide enough
information to do it.

Again, conflicting with earlier text saying cookies are not mandated for puzzles,
which now it seems they are.

It seems 7.1.4 paragraph 2 and 3 are better moved to the introduction of that
section.

If a PS payload is found in the message, then the Responder MUST
verify the puzzle solution that it contains.

Doesn't that open up the responder to a DDoS attack? Initiators will just
submit fake puzzle solutions to drive up the responder CPU.

Also, if the responder is no longer under attack, why can't it just ignore
the puzzle solution and continue with regular IKE?

if the Responder didn't indicate any
particular difficulty level (by setting ZBC to zero) and the
Initiator was free to select any difficulty level it can afford,

Woah, these options were not discussed before at all. So that's what level 0 means!
I would really move this text to the start of section 7 in the introduction of how
puzzles work in general.
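
As a rough illustration of what such an introduction could explain, the
sketch below shows one reading of the ZBC = 0 case: the Initiator spends a
fixed budget and returns the best key it found, and the Responder measures
the difficulty that was actually reached. It assumes the puzzle is "find a
key K such that PRF(K, S) ends in many zero bits", with HMAC-SHA256 standing
in for the negotiated PRF. All names and the attempt-count budget are
hypothetical.

```python
import hashlib
import hmac
from typing import Tuple


def trailing_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the end of the PRF output."""
    value = int.from_bytes(digest, "big")
    if value == 0:
        return len(digest) * 8
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


def best_effort_solve(s: bytes, max_attempts: int) -> Tuple[bytes, int]:
    """Initiator side for ZBC == 0: try a fixed budget, keep the best key."""
    best_key, best_bits = b"", -1
    for i in range(max_attempts):
        key = i.to_bytes(8, "big")
        bits = trailing_zero_bits(hmac.new(key, s, hashlib.sha256).digest())
        if bits > best_bits:
            best_key, best_bits = key, bits
    return best_key, best_bits


def achieved_difficulty(s: bytes, key: bytes) -> int:
    """Responder side: one PRF call shows how far the Initiator got."""
    return trailing_zero_bits(hmac.new(key, s, hashlib.sha256).digest())
```

The Initiator has to pick max_attempts without knowing what the Responder
will accept.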

o Demand more work from Initiator by giving it a new puzzle.

This seems a waste of a round trip. Why can the responder demand a variable puzzle
without telling the initiator what would make it happy, only to have the initiator
misguess and cause another roundtrip, or, to avoid a potential roundtrip, waste too
many resources and cause visible delays to end users by overestimating puzzle
difficulty? I think this is not a good feature of the protocol.

The more puzzles the Initiator solves the higher its chances are to be served

That seems bad. Each puzzle is a delaying round-trip!

includes a puzzle
solution in the first IKE_AUTH request message outside the Encrypted
payload

Note this is very exceptional and should probably be written out in our syntax, e.g.:

HDR, SK {IDi, [CERT,] [CERTREQ,]
[IDr,] AUTH, SAi2,
TSi, TSr} [PUZZLE] -->

While I checked RFC 7296 it does not state some exchanges only have encrypted payloads,
and so technically this document does not update the core document, but I think many
implementations might not expect unencrypted payloads in IKE_AUTH and CREATE_CHILD_SA
so perhaps that is important enough to mark this document as "updating 7296".

in the IKE_AUTH exchange S is a concatenation of Nr and SPIr.

Why Nr and SPIr? I think because of re-using DH values on the responder? If so, it would
be good to explain that.

Should it state somewhere that IKE_AUTH puzzles are not allowed unless the client
confirmed support for puzzles in IKE_SA_INIT? Because otherwise the responder does not
actually know whether the initiator supports puzzles at all. And since it is stated those
IKE_AUTH responses without puzzle should be silently dropped, that becomes very important.

I think the IKE fragmentation paragraph deserves to be in its own sub-section. It's pretty
important to get it right and not start attempts to decrypt potentially bogus fragments.

Regarding the puzzle format, this is now 11 octets. We've aligned things in the past, so
should we do that here and add another 1 octet of reserved bits? The puzzle notification
also misses an IANA line: The payload type for the Puzzle payload is <TBA by IANA>.

Section 9 states: A good rule of thumb is for taking about 1 second to solve the puzzle.

Why is this a good rule of thumb? Again this comes down to me having no idea whether or
not puzzles are a good idea to begin with. I'm very skeptical of this claim. A botnet
will be able to waste 1s of 99% CPU on this attack per node.

Initiators should set a maximum difficulty level beyond which they
won't try to solve the puzzle and log or display a failure message to
the administrator or user.

And again, note that this information will never be available to the Responder, so it
can never figure out what difficulty levels are causing a sharp drop in legitimate clients
(other than seeing an attack that is more successful, but it won't know that this is
caused by the puzzle difficulty level).

that the Responder's load remain close to the maximum it can tolerate.

Which ignores the pain on initiators. It should probably be pointed out somewhere
that confirming a puzzle solution is a lot cheaper than solving the puzzle.
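
A minimal sketch of that asymmetry, assuming the puzzle is "find a key K
such that PRF(K, S) has at least ZBC trailing zero bits" and using
HMAC-SHA256 as a stand-in for the negotiated PRF. The function names and the
sequential key search are illustrative assumptions, not the draft's
procedure.

```python
import hashlib
import hmac
from itertools import count
from typing import Tuple


def prf(key: bytes, data: bytes) -> bytes:
    # Stand-in PRF; a real exchange would use the PRF named in the PUZZLE.
    return hmac.new(key, data, hashlib.sha256).digest()


def trailing_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the end of the PRF output."""
    value = int.from_bytes(digest, "big")
    if value == 0:
        return len(digest) * 8
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


def solve(s: bytes, zbc: int) -> Tuple[bytes, int]:
    """Initiator: brute-force keys; expected work is about 2**zbc PRF calls."""
    for attempts in count(1):
        key = attempts.to_bytes(8, "big")
        if trailing_zero_bits(prf(key, s)) >= zbc:
            return key, attempts


def verify(s: bytes, key: bytes, zbc: int) -> bool:
    """Responder: a single PRF call, regardless of zbc."""
    return trailing_zero_bits(prf(key, s)) >= zbc
```

Each extra bit of ZBC roughly doubles the Initiator's expected work, while
the Responder's check stays at one PRF call. A bogus solution therefore
costs the Responder one PRF call to reject.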

nits

Note this draft also mentions IKEv2 with the version instead of
referring to just IKE, which makes more sense if we end up
with IKEv3.

buggy implementation -> broken implementation

escape any responsibility for its actions ->
escape any responsibility for its bad behaviour

Since currently there is no way -> Since there is currently no way

In IKEv2 client can request various configuration attributes ->
In IKE, a client can request various configuration attributes

Most often those attributes -> Most often these attributes

for defeating the DDoS attack -> for surviving DDoS attacks.

Implementations may be deployed in different environments ->
Implementations are deployed in different environments

As an example -> For example

, searching for two things: -> for two scenarios:

Supposing the the tunnels -> Supposing that the tunnels

If they are mitigated well enough -> If these are mitigated successfully

This is a good
thing as it prevents Initiators that are not close to the attackers
from being affected.

I think this sentence adds nothing and can be removed.

This will force the attacker to use real source addresses, ->
This will mitigate attacks based on IP address spoofing

I would probably shorten the introduction of section 7 to something like:

Puzzles can be added in the IKE_SA_INIT and IKE_AUTH exchanges.

and leave out the text describing the document flow, like "Both sections are divided into ..."

The message may optionally contain a COOKIE notification

If implementations base themselves on this draft, the message is basically
guaranteed to have a cookie, so saying "may optionally" seems a bit weak?
"most likely" or "usually" seems better.

I mean, this part of the document should assume previous parts of the document
are implemented.

To support this feature -> To support crypto agility

also MAY ignore -> MAY ignore

ready to deal with them, -> ready to solve them
