Re: [IPsec] Review of draft-ietf-ipsecme-ddos-protection-06

Paul Wouters <paul@nohats.ca> Fri, 03 June 2016 16:24 UTC

Date: Fri, 03 Jun 2016 12:14:10 -0400
From: Paul Wouters <paul@nohats.ca>
To: Valery Smyslov <svanru@gmail.com>
In-Reply-To: <E61D75BBDD0F4A159352B3258BBAA7DE@buildpc>
Message-ID: <alpine.LRH.2.20.1606031155230.11420@bofh.nohats.ca>
References: <alpine.LRH.2.20.1605311635540.16809@bofh.nohats.ca> <4200F5373D5542C985F3D4C51609213C@buildpc> <alpine.LRH.2.20.1606022148040.23132@bofh.nohats.ca> <E61D75BBDD0F4A159352B3258BBAA7DE@buildpc>
User-Agent: Alpine 2.20 (LRH 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"; format="flowed"
Archived-At: <http://mailarchive.ietf.org/arch/msg/ipsec/Gw2T05SGWIRcsg2o7QCdBPVjyA8>
Cc: ipsec@ietf.org, Yoav Nir <ynir.ietf@gmail.com>
Subject: Re: [IPsec] Review of draft-ietf-ipsecme-ddos-protection-06

On Fri, 3 Jun 2016, Valery Smyslov wrote:

[ cut everything we agreed ]

>> > >      Depending on the Responder implementation, this can be repeated
>> > >      with the same half-open SA.
>> > > 
>> > >   I don't think this "depends on the implementation". Since any on-path
>> > >   attacker can spoof rubbish, a Responder MUST ignore the failed packet
>> > >   and remain ready to accept the real one for a certain amount of time.
>> > 
>> >  "Depending on the Responder implementation" means here that if along 
>> >  with discarding the failed packet the Responder also discards the 
>> >  computed SK_* keys, then it will need to re-calculate them again
>> >  when the next IKE_AUTH packet is received, so the attack can be
>> >  repeated. The SK_* keys don't depend on IKE_AUTH messages,
>> >  so in general there is no need to discard them even if the received
>> >  IKE_AUTH packet failed to decrypt properly, and the draft advises to 
>> >  keep them in this case. However, implementations may have good reasons 
>> >  to do this (e.g. to free hardware resources if crypto is performed in 
>> >  HW).
>>
>>  Oh, I didn't realise you talked about re-using DH components. Ok, in that
>>  case it makes sense but you might want to say it only applies to those
>>  who re-use DH calculations between different IKE peers. Our software
>>  never does that (and I think FIPS also puts additional constraints on
>>  this)
>
> No, it is not about re-using the DH private key with different peers. I
> probably explained it poorly. Let me try again.
>
> Once the IKE_SA_INIT is complete the responder has all the data needed
> to calculate SKEYSEED and the SK_* keys. However, this is a CPU-consuming
> operation, so the responder may want to postpone it until the keys are
> really needed, i.e. until it receives the IKE_AUTH request from the
> initiator.
> This behaviour allows the responder not to waste resources in case the
> IKE_SA_INIT was from an attacker and the IKE_AUTH request never comes.
> Once the IKE_AUTH request arrives the responder performs DH and calculates
> SKEYSEED and the SK_* keys, which allow it to decrypt and verify this
> request. In case it fails to decrypt the IKE_AUTH request, the responder
> has two possibilities: keep the just-calculated SK_* keys until the next
> (hopefully proper) IKE_AUTH request is received, or discard them (e.g. to
> save crypto resources) and recalculate them once the next IKE_AUTH request
> is received (note that re-calculating will result in EXACTLY the same keys,
> since they don't depend on any data from IKE_AUTH). The draft recommends
> keeping the keys until the proper IKE_AUTH request is received (or until
> the exchange times out). This advice may look obvious, but I think it is
> still worth mentioning.
>
> I recall we've already discussed this while reviewing the -05 version...

Ohh okay. I vaguely remember. I guess reading this explanation, I would
say it should not be necessary to mention it in this document, but if you
want to do it anyway, how about:

OLD:

Depending on the Responder implementation, this can be repeated with
the same half-open SA.

NEW:

If a Responder does not hold on to the calculated SKEYSEED and SK_*
keys (which it should in case a valid IKE_AUTH comes in later) this
attack might be repeated on the same half-open SA.
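
To make the difference concrete, here is a minimal sketch of a Responder that
derives the keys lazily on the first IKE_AUTH and then keeps them. The class
and function names are hypothetical, and the HMAC calls are only stand-ins for
the real DH, SKEYSEED and SK_* computations, not the IKEv2 key derivation.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class HalfOpenSA:
    """Hypothetical state kept after IKE_SA_INIT completes."""
    shared_secret: bytes              # stand-in for the DH result
    nonces: bytes                     # stand-in for Ni | Nr
    sk_keys: Optional[bytes] = None   # cached SK_* material
    derivations: int = 0              # how often the costly step ran

    def keys(self) -> bytes:
        # Derive on first use only; later calls reuse the cached value.
        if self.sk_keys is None:
            self.derivations += 1
            # Placeholder for the real SKEYSEED / SK_* derivation.
            self.sk_keys = hmac.new(
                self.nonces, self.shared_secret, hashlib.sha256
            ).digest()
        return self.sk_keys


def handle_ike_auth(sa: HalfOpenSA, payload: bytes, tag: bytes) -> bool:
    """Return True if the packet verifies under the SA's keys."""
    expected = hmac.new(sa.keys(), payload, hashlib.sha256).digest()
    # On failure the packet is dropped, but sa.sk_keys stays cached,
    # so a forged IKE_AUTH cannot trigger a second derivation.
    return hmac.compare_digest(expected, tag)
```

With this shape, any number of forged IKE_AUTH packets on the same half-open
SA costs one derivation in total; an implementation that clears `sk_keys` on
failure pays that cost again for every forged packet.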

>> >  Please, see above.
>> > 
>> >  Do you think more explanations are needed here?
>>
>>  No I guess it is fine.
>
> Are you sure after the above explanation?

No, see above :)

>> >  Will the following text make you happy?
>> > 
>> >     Many retransmission policies in practice wait one or two seconds
>> >     before retransmitting for the first time.
>>
>>  It would be nicer to rewrite it without mentioning any absolute times.
>>  That way the text will also remain more relevant in the future if/when
>>  these timings change.
>
> I don't think it is a good idea. The draft should give implementers some
> estimated timings. "One or two seconds" is a "worst case" here. If
> implementers take this data into consideration when selecting the short
> timeout, they'll always be on the safe side, because if some
> implementations retransmit more aggressively, they'll still fit within
> this time period.
>
> So I'd rather keep the text as above.

I'm okay with that.

>> > >      For IPv6, ISPs assign between a /48 and a /64, so it makes sense
>> > >      to use a 64-bit prefix as the basis for rate limiting in IPv6.
>> > > 
>> > >   Why does that make sense over using /48 ? Wouldn't you rather rate
>> > >   limit some innocent neighbours over not actually defending against
>> > >   the attack?
>> > >   If puzzles work as advertised, real clients on that /48 should still
>> > >   be able to connect.
>> > 
>> >  Well, I'm not an IPv6 expert. Probably Michael Richardson (who suggested 
>> >  this change) or somebody else will comment on this.
>>
>>  This does not so much relate to IPv6 but to whether you would rather
>>  overestimate or underestimate the attacker's IP space. If you
>>  underestimate, you will take longer to punish the attacking IPs. If you
>>  overestimate you will needlessly slow down legitimate clients.
>>
>>  I don't know which of the two is better, hence my objection to "it makes
>>  sense" because I don't see that.
>
> What's your suggestion for this text? Just remove "it makes sense" or
> completely rewrite the para? If the latter, please provide the text.

Something like:

For IPv6, ISPs assign between a /48 and a /64, so it does not make sense
for rate-limiting to work on single IPv6 addresses. Instead, rate limits
should be based on either the /48 or the /64 of the misbehaving IPv6
address observed.
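
As a rough illustration of what "based on the /48 or /64" means for an
implementation, the sketch below maps a source address to the key a rate
limiter would count against. The function name and the default prefix length
are assumptions for the example; the choice between /48 and /64 remains the
policy question discussed above.

```python
import ipaddress


def rate_limit_key(addr: str, v6_prefix_len: int = 64) -> str:
    """Return the bucket a source address is counted against.

    IPv4 sources are counted individually; IPv6 sources are grouped
    by the enclosing prefix of the given length (e.g. 48 or 64).
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return str(ip)
    # strict=False lets us pass a host address and get its network.
    net = ipaddress.ip_network(f"{ip}/{v6_prefix_len}", strict=False)
    return str(net)
```

Two attacking addresses inside the same delegated prefix then share one
counter, so cycling through addresses within that prefix no longer evades
the limit.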

>> > >      Regardless of the type of rate-limiting used, there is a huge
>> > >      advantage in blocking the DoS attack using rate-limiting for
>> > >      legitimate clients that are away from the attacking nodes.  In
>> > >      such cases, adverse impacts caused by the attack or by the
>> > >      measures used to counteract the attack can be avoided.
>> > > 
>> > >   I don't understand this paragraph at all. I guess "rate-limiting for
>> > >   legitimate clients" just confuses me. I think it might attempt to be
>> > >   saying "not blocking ranges with no attackers helps real clients",
>> > >   but it is very unclear.
>> > 
>> >  Yoav?

So this is still pending.

>> > >      to calculate the PRF
>> > > 
>> > >   One does not "calculate" a PRF. One uses a PRF to calculate 
>> > >   something.
>> > 
>> >  OK.
>>
>>  You didn't provide text but I assume you changed it somehow.
>
> s/PRF/"output of PRF" or s/PRF/"the result of PRF"   Is it OK?

Sure.

>> > >   The section that starts with "Upon receiving this challenge," seems
>> > >   to be discussing the pros and cons of this method before it has
>> > >   explained the method. The reader is forced to skip this or jump
>> > >   forward to section 7 and then come back to this part. I suggest
>> > >   re-ordering some text to avoid this, or giving a better short summary
>> > >   of the puzzle nature just before this paragraph.
>> > 
>> >  It describes the puzzles mechanism in general, while Sections 7 & 8
>> >  describe the particular instantiation of puzzles in IKEv2.
>> >  I'd rather keep some background about puzzles here,
>> >  so that all possible defenses are described in one place.
>>
>>  Then I think it still requires a one-line introduction to puzzles.
>
> I'm a bit confused. I've been thinking that the whole Section 4.4 is a 
> high-level description of the puzzles. Where do you want to insert
> the one-line introduction?

Re-reading the section I agree with you. I guess in reviewing the
document I lost the flow of information. No change needed.

>> > >      When the Responder is under attack, it MAY choose to prefer
>> > >      previously authenticated peers who present a Session Resumption
>> > >      ticket (see [RFC5723] for details).
>> > > 
>> > >   Why is this only a MAY? Why is it not a SHOULD or MUST?
>> > 
>> >  A good question. I think the idea was not to force the Responder
>> >  to serve only resumed clients and to let it prioritize
>> >  clients according to its own policy. In my opinion MUST is too strong,
>> >  but SHOULD is probably OK.
>>
>>  In the famous words of Steve Kent, if you say SHOULD instead of MUST,
>>  explain when the Responder should not.
>
> When it has good reasons :-)
>
> Seriously, consider the situation when the responder finds itself
> under attack and switches to only responding to IKE_SA_RESUME
> requests. In this case it will lock out legitimate clients without
> resumption tickets (e.g. because the ticket expired).
> I think there is no reason to put MUST here, since in any case
> it is local policy that dictates the responder's behaviour,
> and there are no interoperability issues whether it is MAY, SHOULD or
> MUST; it is just a matter of the responder's local policy.
> So SHOULD is just good advice.

Actually, what you are describing is something else:

When the Responder is under attack, it MUST NOT prefer previously
authenticated peers who present a Session Resumption ticket [RFC5723]
as that could cause a complete lock-out of legitimate clients that
have no session to resume.

Although that is probably better rewritten a bit:

When the Responder is under attack, it SHOULD prefer previously
authenticated peers who present a Session Resumption ticket [RFC5723]
unless the attack itself consists of sending bogus resumption requests,
in which case it SHOULD treat resumption and new session requests
equally to avoid locking out a class of legitimate clients.
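
Read as a decision rule, that second wording could look like the sketch
below. The function name, the flag saying that the attack consists of bogus
resumption requests, and the numeric priorities are all hypothetical; neither
the draft nor RFC 5723 defines them, and how a Responder detects such a flood
is left open here.

```python
def request_priority(has_ticket: bool, bogus_resumption_flood: bool) -> int:
    """Return a queue priority for an incoming request; lower is served first.

    Peers presenting a resumption ticket are preferred, unless the
    attack itself consists of bogus resumption requests, in which case
    resumption and new-session requests are treated equally.
    """
    if has_ticket and not bogus_resumption_flood:
        return 0
    return 1


# Example: order a pending queue while under a plain IKE_SA_INIT flood.
pending = [("new-1", False), ("resume-1", True), ("new-2", False)]
ordered = sorted(pending, key=lambda r: request_priority(r[1], False))
```

Because `sorted` is stable, requests with equal priority keep their arrival
order, so new-session requests are delayed rather than locked out.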

>>  That also
>>  avoids what to do when rekeys happen because that would likely reset
>>  the counter because it is a new state?
>
> Well, I think the proper approach is to measure the rate of such
> exchanges (per SA of course). So, just reset the counter every second and
> measure how many exchanges happened within the second. If the number
> looks abusive, take measures.

From our implementation point of view "per SA" is difficult, because we
delete failed SA states, and then lose the count of those. So in that
sense, using global counters makes more sense. For us, a rekey means
that the old state is replaced with a new state.

So perhaps it is useful to elaborate a little more?
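
For illustration, a global counter of the "reset every second" kind could be
as small as the sketch below. The class name, the threshold, and the
injectable clock are assumptions made for the example, not something the
draft or our implementation specifies. Because the counter lives outside any
SA state, deleting a failed SA or replacing a state on rekey does not reset
it.

```python
import time
from typing import Callable


class GlobalRateCounter:
    """Counts exchanges per one-second window, independent of any SA."""

    def __init__(self, threshold: int,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold   # exchanges per second considered normal
        self.clock = clock           # injectable so the sketch is testable
        self.window = int(clock())   # the second currently being counted
        self.count = 0

    def record(self) -> bool:
        """Count one exchange; return True if this second looks abusive."""
        now = int(self.clock())
        if now != self.window:
            # A new second has started: reset the counter.
            self.window = now
            self.count = 0
        self.count += 1
        return self.count > self.threshold
```

A per-SA variant would need the counter to survive state deletion and rekey,
which is exactly the part that would benefit from more text in the draft.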

Paul