Re: New I-D: Security Considerations Regarding Compression Dictionaries

"W. Felix Handte" <w@felixhandte.com> Wed, 30 October 2019 02:19 UTC

On 10/29/19 7:54 PM, Watson Ladd wrote:
> I'm not sure I appreciate the distinction of "dictionary-based"
> compression vs. other compression algorithms you draw in the draft.
> The BREACH attack didn't look at changes to the Huffman table, which
> was dominated by good old ETAOIN SHRDLU. Instead it changed the length
> of matches back into the datastream, and thus the length of the
> observed output. There isn't a separate dictionary to match substrings
> in, in DEFLATE.

Yeah, "dictionary" is an extremely overloaded term in compression, to
say nothing of computer science generally. This has produced a great
deal of confusion. But I haven't come up with a better term for the concept.

What I mean by "dictionary" in this context is mostly a user-supplied
buffer that the compressor makes LZ77-style matches into. (Though, as
the document notes, various algorithms expect various kinds of data in
the dictionaries they accept.) This makes it very much a potential
vector for exactly that kind of attack. And DEFLATE, at least as
implemented by zlib, does support dictionaries of this form [0].
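
To make that concrete, here is a minimal sketch using Python's zlib
bindings, which expose this kind of preset dictionary through the zdict
argument. The payload and both dictionaries are hypothetical values made
up for illustration. The only point is that the output length depends on
whether the dictionary contains the data being compressed.

```python
import zlib

# Hypothetical values, chosen only for illustration.
payload = b"Cookie: session=hunter2\r\n"
dict_hit = b"GET / HTTP/1.1\r\nCookie: session=hunter2\r\n"  # contains the payload
dict_miss = b"\x00" * len(dict_hit)                          # shares nothing with it


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress data with a preset dictionary the compressor can match into."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress; the same dictionary must be supplied on this side."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


len_hit = len(compress_with_dict(payload, dict_hit))
len_miss = len(compress_with_dict(payload, dict_miss))

# The observer only sees lengths, and the lengths differ.
print(len_hit, len_miss)
```

Someone who can influence the dictionary contents and observe the
compressed length learns something about the payload, which is the same
shape of leak as matches back into the datastream.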

> A perfect compression algorithm reveals the Kolmogorov complexity of
> the input. This is enough (if you can compute Kolmogorov complexity)
> to reveal the differences between "hunter2 h" and "hunter2 z", and
> then "hunter2 hu" and "hunter2 ha", etc.

Right, compression as it exists today has real outstanding security
issues. My goal with the document is to assess whether the use of
dictionaries introduces additional problems on top of the existing ones.
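
As a rough sketch of the existing problem, with plain zlib standing in
for the "perfect" compressor in the quoted example: the secret, the "&"
separator, and the guesses below are all hypothetical, and the sketch
only compares a fully correct guess against a fully wrong one rather
than recovering the secret byte by byte.

```python
import zlib

# Hypothetical secret that ends up in the same compressed stream as
# attacker-controlled input.
SECRET = b"password=hunter2"


def observed_length(attacker_input: bytes) -> int:
    """Length an on-path observer would see for secret plus attacker input."""
    return len(zlib.compress(SECRET + b"&" + attacker_input, 9))


right = observed_length(b"password=hunter2")  # matches the whole secret
wrong = observed_length(b"password=zqxjkvw")  # matches only the "password=" prefix

# A longer match back into the stream means a shorter output.
print(right, wrong)
```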

[0] https://github.com/madler/zlib/blob/master/zlib.h#L611