[tcpm] mitigating TCP ACK loop ("ACK storm") DoS attacks

Neal Cardwell <ncardwell@google.com> Tue, 10 February 2015 02:11 UTC

Date: Mon, 09 Feb 2015 21:11:46 -0500
Message-ID: <CADVnQynQ07-=gzUGbBivua17guztXG7hF4u3gk9m1D+sYyB_Fw@mail.gmail.com>
From: Neal Cardwell <ncardwell@google.com>
To: "tcpm@ietf.org" <tcpm@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <http://mailarchive.ietf.org/arch/msg/tcpm/ThEU4LYfUY_9IBxn0CfMYUQRo-Q>
Cc: Eric Dumazet <edumazet@google.com>
Subject: [tcpm] mitigating TCP ACK loop ("ACK storm") DoS attacks

TCP DoS scenarios involving ACK loops (aka "ACK storms" or "packet
wars") have come up previously on the TCPM list. For example, Anil
Agarwal brought them up in the Nov 2013 TCPM thread "TCP mismatched
sequence numbers issue".

I wanted to mention that our TCP team at Google has recently submitted
a patch series for Linux that mitigates such attacks by rate-limiting
the dupacks that are sent in response to out-of-window incoming
packets. The code has been in use at Google and was recently merged
into the official Linux "net-next" tree, which means that it should
land in the next official Linux release.

The patch series summary can be browsed at the following URL:

  http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=f06535c599354816cfbc653ce8965804c7385c61

Below I'm including the cover letter summarizing the patch series, for
convenience/reference.

We are interested to hear any feedback folks may have.

Thanks!

neal

==============

tcp: mitigate TCP ACK loops due to out-of-window validation dupacks

This patch series mitigates "ack loop" DoS scenarios by rate-limiting
outgoing duplicate ACKs sent in response to incoming "out of window"
segments.

Background
-----------

There are several cases in which the TCP RFCs specify that a TCP
endpoint should send a pure duplicate ACK in response to a pure
duplicate ACK that appears to be invalid due to being "out of window":

(1) RFC 793 (section 3.9, page 69) specifies that endpoints should
    send a duplicate ACK in response to an ACK when the incoming
    sequence number is invalid due to being outside the receive
    window: "If an incoming segment is not acceptable, an
    acknowledgment should be sent in reply".

(2) RFC 793 (section 3.9, page 72) says: "If the ACK acknowledges
    something not yet sent (SEG.ACK > SND.NXT) then send an ACK".

(3) RFC 1323 (section 4.2.1, page 18) specifies that endpoints should
    send a duplicate ACK in response to an ACK when the PAWS check for
    the incoming timestamp value fails: "If .... SEG.TSval < TS.Recent
    and if TS.Recent is valid ... Send an acknowledgement in reply".
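
As a rough illustration of the three cases above, here is a small Python
sketch of the checks for a zero-length segment (a pure ACK). It is not the
Linux code: the names before() and invalid_reason() and the argument names
are made up for this example, and the ordering of the checks (PAWS first,
then sequence, then ACK) is my reading of the RFC text.

```python
MOD = 1 << 32  # TCP sequence numbers and timestamps are 32-bit values


def before(a, b):
    """True if 32-bit value a precedes b under wraparound arithmetic."""
    return ((a - b) % MOD) >= (1 << 31)


def invalid_reason(seg_seq, seg_ack, seg_tsval,
                   rcv_nxt, rcv_wnd, snd_nxt, ts_recent):
    """Classify a zero-length segment; None means it is acceptable.

    Hypothetical helper: any non-None result is a case in which the
    RFCs call for an ACK to be sent in reply.
    """
    # Case (3), PAWS: SEG.TSval < TS.Recent while TS.Recent is valid.
    if (seg_tsval is not None and ts_recent is not None
            and before(seg_tsval, ts_recent)):
        return "timestamp"

    # Case (1): RFC 793 acceptability test for a zero-length segment.
    if rcv_wnd == 0:
        seq_ok = seg_seq == rcv_nxt
    else:
        seq_ok = (not before(seg_seq, rcv_nxt)
                  and before(seg_seq, (rcv_nxt + rcv_wnd) % MOD))
    if not seq_ok:
        return "sequence"

    # Case (2): the ACK covers something not yet sent (SEG.ACK > SND.NXT).
    if before(snd_nxt, seg_ack):
        return "ack"

    return None
```

The modular comparison matters because a window can straddle the 32-bit
wrap point; a plain integer "<" would misclassify such segments.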

The problem
------------

Normally, this is not a problem. However, a buggy middlebox or
malicious man-in-the-middle can inject a few packets into the
conversation that advance each endpoint's notion of the current window
(sequence, ACK, or timestamp), without either side noticing. In this
case, from then on each side can think the other is sending invalid
segments. Thus an infinite feedback loop of duplicate ACKs can ensue,
as each endpoint receives a duplicate ACK, decides that it is invalid
(due to sequence number, ACK number, or timestamp), and then sends a
dupack in reply, which the other side decides is invalid, responding
with a dupack... ad infinitum. This ping-pong feedback loop can happen
at a very high rate.
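
To make the feedback loop concrete, here is a toy Python model of two
endpoints whose state has been desynchronized by injected data. Everything
in it is an assumption for illustration (the Endpoint class, the numbers,
the 1000-byte window), and it ignores sequence wraparound and timestamps to
keep the loop itself visible.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    snd_nxt: int          # next sequence number this side will send
    rcv_nxt: int          # next sequence number this side expects
    rcv_wnd: int = 1000   # hypothetical receive window size

    def make_ack(self):
        """Build a pure ACK as a (seq, ack) pair."""
        return (self.snd_nxt, self.rcv_nxt)

    def is_invalid(self, seg):
        """Apply checks (1) and (2); wraparound is ignored in this toy."""
        seq, ack = seg
        out_of_window = not (self.rcv_nxt <= seq < self.rcv_nxt + self.rcv_wnd)
        acks_unsent_data = ack > self.snd_nxt
        return out_of_window or acks_unsent_data


def count_dupacks(a, b, max_rounds):
    """Send one ACK from a to b, then count the dupacks it provokes."""
    seg = a.make_ack()
    receiver, other = b, a
    dupacks = 0
    for _ in range(max_rounds):
        if not receiver.is_invalid(seg):
            break                      # segment accepted, no reply needed
        seg = receiver.make_ack()      # reply with a duplicate ACK
        dupacks += 1
        receiver, other = other, receiver
    return dupacks


# In sync: each side's rcv_nxt matches the other side's snd_nxt.
in_sync = count_dupacks(Endpoint(100, 500), Endpoint(500, 100), 1000)

# After injection: each rcv_nxt has advanced 50 bytes past what the
# real peer has actually sent, and neither side knows it.
desynced = count_dupacks(Endpoint(100, 550), Endpoint(500, 150), 1000)
```

In the synchronized case the ACK is simply accepted; in the desynchronized
case the exchange only stops because the model caps the number of rounds.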

This phenomenon can and does happen in practice. It has been seen in
datacenter and Internet contexts at Google, and has been documented by
Anil Agarwal in the Nov 2013 tcpm thread "TCP mismatched sequence
numbers issue", and Avery Fay in the Feb 2015 Linux netdev thread
"Invalid timestamp? causing tight ack loop (hundreds of thousands of
packets / sec)".

This patch series
------------------

This patch series mitigates such ack loops by rate-limiting outgoing
duplicate ACKs sent in response to incoming TCP packets that are for
an existing connection but that are invalid due to any of the reasons
mentioned above: sequence number (1), ACK field (2), or timestamp
value (3). The rate limit for such duplicate ACKs is specified by a
new sysctl, tcp_invalid_ratelimit, which sets the minimum interval
between such outbound duplicate ACKs, in milliseconds. The default is
500 (500ms), and 0 disables the mechanism.
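
The semantics described above can be sketched as a minimum-interval limiter.
This is a Python illustration, not the kernel code: the class name, the
allow() method, and the choice to keep one timestamp per connection are all
assumptions made for the example.

```python
class InvalidAckRateLimiter:
    """Minimum-interval limiter for dupacks sent in reply to invalid segments.

    ratelimit_ms mirrors the meaning of tcp_invalid_ratelimit as described
    in the text: minimum spacing in milliseconds, with 0 disabling the limit.
    """

    def __init__(self, ratelimit_ms=500):
        self.ratelimit_ms = ratelimit_ms
        self.last_sent_ms = None  # time of the last dupack we allowed

    def allow(self, now_ms):
        """Return True if a dupack may be sent at time now_ms."""
        if self.ratelimit_ms == 0:
            return True  # mechanism disabled: always reply
        if (self.last_sent_ms is not None
                and now_ms - self.last_sent_ms < self.ratelimit_ms):
            return False  # too soon after the previous dupack: suppress
        self.last_sent_ms = now_ms
        return True


limiter = InvalidAckRateLimiter(ratelimit_ms=500)
decisions = [limiter.allow(t) for t in (0, 1, 499, 500, 700, 1000)]
# decisions == [True, False, False, True, False, True]
```

Note that a suppressed dupack does not push the timer forward in this
sketch; only a dupack that is actually sent starts a new interval.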

We rate-limit these duplicate ACK responses rather than blocking them
entirely or resetting the connection, because legitimate connections
can rely on dupacks in response to some out-of-window segments. For
example, zero window probes are typically sent with a sequence number
that is below the current window, and ZWPs thus expect to elicit a
dupack in response.
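
A quick way to see why rate-limiting is preferable to blocking is to compare
which out-of-window arrivals get answered under a 500ms limit. The Python
sketch below uses made-up arrival schedules (a backed-off probe schedule and
a one-packet-per-millisecond storm); neither is taken from measurements.

```python
def answered(arrival_times_ms, ratelimit_ms=500):
    """Return the arrival times of out-of-window segments that get a dupack."""
    replies = []
    last = None
    for t in arrival_times_ms:
        if ratelimit_ms == 0 or last is None or t - last >= ratelimit_ms:
            replies.append(t)
            last = t
    return replies


# Hypothetical zero window probe schedule with exponential backoff.
zwp_times = [0, 1000, 3000, 7000]

# Hypothetical ACK storm: one invalid ACK per millisecond for 10 seconds.
storm_times = list(range(10_000))

zwp_replies = answered(zwp_times)      # every probe is answered
storm_replies = answered(storm_times)  # 20 replies instead of 10,000
```

Under these assumptions the probes are spaced wider than the limit and are
all answered, while the storm is cut to two dupacks per second.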

Testing: this approach has been in use at Google for a while.