Re: New Version Notification for draft-vkrasnov-h2-compression-dictionaries-01.txt

Vlad Krasnov <vlad@cloudflare.com> Wed, 02 November 2016 21:24 UTC

From: Vlad Krasnov <vlad@cloudflare.com>
Message-Id: <7DE838F0-916A-4F92-9631-2C0C1073AFF6@cloudflare.com>
Date: Wed, 2 Nov 2016 14:19:39 -0700
In-Reply-To: <CAPapA7TNeMTaPz7SN_0bmHx7G9a8V9c6eD=4LcRQj+G2XPQrGQ@mail.gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
To: Jyrki Alakuijala <jyrki@google.com>
References: <147793576451.32369.14134057573457350871.idtracker@ietfa.amsl.com> <3669167D-26AC-4B78-8175-99B0028B6891@cloudflare.com> <CAPapA7TNeMTaPz7SN_0bmHx7G9a8V9c6eD=4LcRQj+G2XPQrGQ@mail.gmail.com>
Subject: Re: New Version Notification for draft-vkrasnov-h2-compression-dictionaries-01.txt
Archived-At: <http://www.w3.org/mid/7DE838F0-916A-4F92-9631-2C0C1073AFF6@cloudflare.com>

> Brotli has two separate ways of using a static dictionary. The first way is the traditional way that zlib supports. The current custom dictionary interface in brotli supports this method.

I tested Brotli with the traditional approach for both static and dynamic dictionaries, and the gains are much higher than those of deflate. I guess that is thanks in part to the much larger window.

For my current study on static dictionaries, I generated dictionaries from the Alexa top 500, based on content-encoding, using a tool I wrote (https://github.com/vkrasnov/dictator). It is optimized for deflate, and there is still room for improvement there.

A big plus is that the same dictionary is beneficial for both deflate and Brotli, or even LZMA for that matter.
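To make the traditional (zlib-style) dictionary approach concrete, here is a minimal sketch using Python's standard zlib module. The dictionary and page bytes are made-up placeholders, not output from dictator, and the helper names are only for illustration.

```python
import zlib

def deflate_with_dictionary(payload: bytes, dictionary: bytes, level: int = 6) -> bytes:
    """Compress payload with a preset (zlib-style) dictionary."""
    comp = zlib.compressobj(level=level, zdict=dictionary)
    return comp.compress(payload) + comp.flush()

def inflate_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress a stream produced with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical static dictionary and page; real dictionaries would be far larger.
SAMPLE_DICTIONARY = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b"<title></title></head><body></body></html>"
)
SAMPLE_PAGE = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b"<title>Hi</title></head><body><p>Hello</p></body></html>"
)

plain = zlib.compress(SAMPLE_PAGE, 6)                              # no dictionary
primed = deflate_with_dictionary(SAMPLE_PAGE, SAMPLE_DICTIONARY)   # with dictionary
restored = inflate_with_dictionary(primed, SAMPLE_DICTIONARY)
```

Both sides must hold exactly the same dictionary bytes, which is why a way to identify or version dictionaries matters.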

> The second way is denser. It allows for every pair of (length, distance) to point to a unique dictionary sequence. Because of this, dictionary sequences that point to length N strings would save log2(N) bits in distance specification in comparison to traditional dictionaries.

The second way is more expensive in terms of performance, but I suppose if you can generate static dictionaries only once, you only need to consider the cost of dictionary lookup.
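As a rough illustration of the log2(N) figure quoted above, the sketch below compares the cost of addressing a dictionary entry under a deliberately simplified model: every addressable position is equally likely, and entropy coding is ignored. The function names and the example sizes are my own assumptions, not anything taken from the Brotli format.

```python
import math

def distance_bits_traditional(num_words: int, word_len: int) -> float:
    """Bits to address any byte offset in a concatenated dictionary
    of num_words strings, each word_len bytes long (uniform-cost model)."""
    return math.log2(num_words * word_len)

def distance_bits_indexed(num_words: int) -> float:
    """Bits to address a whole word when each (length, distance) pair
    maps to exactly one dictionary sequence (uniform-cost model)."""
    return math.log2(num_words)

# Assumed example: 1024 words of length 8.
num_words, word_len = 1024, 8
saving = distance_bits_traditional(num_words, word_len) - distance_bits_indexed(num_words)
# Under this model the saving equals log2(word_len).
```

The real saving depends on how distances are actually coded, so treat this only as intuition for where the log2(N) term comes from.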

The larger problem is that we will then have to support different dictionaries for different algorithms. We can do it if, as Martin suggested, we have a versioning system for the dictionaries.

Another question then: do we support simultaneous use of dynamic and static dictionaries, which would only work with Brotli?

> The second way allows for about 3 % increase in compression density in comparison to the first way, or alternatively one can reach to same compression density by using smaller dictionaries (possibly about half the size).

My main goal here is to allow for efficient recompression on the fly, even for static or previously compressed content. Using dynamic dictionaries lets you compress or recompress very well at a lower compression setting, and really fast.

For example, if you already have a stream compressed with gzip, is it worth recompressing it to Brotli? When you use dynamic dictionaries (with the simple static dictionaries), the answer is definitely yes.
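The recompression idea can be sketched as follows. Since Python's standard library has no Brotli binding, this uses deflate with a preset dictionary as a stand-in, and it treats a previously sent response as the dynamic dictionary. The page contents and function names are assumptions for illustration only.

```python
import gzip
import zlib

def recompress_gzip_with_dictionary(gzip_blob: bytes, dictionary: bytes, level: int = 1) -> bytes:
    """Decode a gzip stream and re-encode it as deflate primed with a
    dictionary, using a low (fast) compression level."""
    raw = gzip.decompress(gzip_blob)
    comp = zlib.compressobj(level=level, zdict=dictionary)
    return comp.compress(raw) + comp.flush()

def decode_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Receiver side: decode using the same dictionary bytes."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical pages from the same site: the earlier response serves as
# the dynamic dictionary for the later one.
HEADER = b'<!DOCTYPE html><html><head><link rel="stylesheet" href="/site.css"></head><body>'
FOOTER = b"<footer>Example footer text shared by every page</footer></body></html>"
previous_page = HEADER + b"<p>page one</p>" + FOOTER
next_page = HEADER + b"<p>page two</p>" + FOOTER

stored_gzip = gzip.compress(next_page, compresslevel=9)    # already-compressed asset
recompressed = recompress_gzip_with_dictionary(stored_gzip, previous_page)
decoded = decode_with_dictionary(recompressed, previous_page)
```

Even at level 1, most of the second page resolves to matches against the dictionary, which is the effect described above.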