Re: New Version Notification for draft-vkrasnov-h2-compression-dictionaries-01.txt

Jyrki Alakuijala <jyrki@google.com> Wed, 02 November 2016 20:03 UTC

Return-Path: <ietf-http-wg-request+bounce-httpbisa-archive-bis2juki=lists.ie@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C5CF61294DB for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Wed, 2 Nov 2016 13:03:25 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -7.997
X-Spam-Level:
X-Spam-Status: No, score=-7.997 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-5, RCVD_IN_SORBS_SPAM=0.5, RP_MATCHES_RCVD=-1.497, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=google.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5zPlrontKpFj for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Wed, 2 Nov 2016 13:03:23 -0700 (PDT)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id D2953129489 for <httpbisa-archive-bis2Juki@lists.ietf.org>; Wed, 2 Nov 2016 13:03:23 -0700 (PDT)
Received: from lists by frink.w3.org with local (Exim 4.80) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1c21hF-0001SD-LS for ietf-http-wg-dist@listhub.w3.org; Wed, 02 Nov 2016 19:59:41 +0000
Resent-Date: Wed, 02 Nov 2016 19:59:41 +0000
Resent-Message-Id: <E1c21hF-0001SD-LS@frink.w3.org>
Received: from mimas.w3.org ([128.30.52.79]) by frink.w3.org with esmtps (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.80) (envelope-from <jyrki@google.com>) id 1c21hA-0001QL-30 for ietf-http-wg@listhub.w3.org; Wed, 02 Nov 2016 19:59:36 +0000
Received: from mail-yw0-f171.google.com ([209.85.161.171]) by mimas.w3.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.84_2) (envelope-from <jyrki@google.com>) id 1c21h4-00030o-6B for ietf-http-wg@w3.org; Wed, 02 Nov 2016 19:59:30 +0000
Received: by mail-yw0-f171.google.com with SMTP id t125so22473914ywc.1 for <ietf-http-wg@w3.org>; Wed, 02 Nov 2016 12:59:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=OFShvbGMQx5TX8rk9GOmigXJTQBoxnF0L+qZqPG/s14=; b=LQKIfg4LsYZS1JjRD4N78es2gjaHeSKeDG8KKxW9dFL0dhs6bvOwjtMAUVT/dfI4VF Tl7fooeNlIIRQGrA9FHGsGtFomsnBvdr3DN6qCW7fAavVB+mTohaxB9HLM/eOZkTezvj mbONFLTXEhwDjxmq8CQ4Cg/0ySRM+dQuuyQPVMm+e7ozbw+FXvW/8/eFjbP8w9PlBJui zZKsbLgLGv/EEm9BEbJX+ttpSzZS38d0EHFDDvffMRIGwjSIolLmnQEAKCdKu/UyEq08 wRuaF+VsAqaHZXIh22WMdjWaDF2pdPVl+FNMvNueWY2HLKSdEGy0iLPHiVoHiSb1cK/5 D8yw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=OFShvbGMQx5TX8rk9GOmigXJTQBoxnF0L+qZqPG/s14=; b=I+ej2grp450NB2ffY+xP2CKsm1+aaj5GjPOGQSqRQgG1aLXsbVg0DjQG+F+nny6/5Y FF3l8ofDY9R//XjX0GjDslok3+8zh1v8AurB0829e/t/nOU4z2XJEpYUK4aGuBT9pNX7 5ZCFscyoLxoDdctfI6KhJyO9mTHY9BGRxiZuYslG0X5UFiqcWFf6ue9LMsu5b2LX/TNh wuqGzwZ9NQON4zmfRhf1E4wORdhrFAiGQreqQy4galmH/uvvLK23fZG+Fgl8J0Mu+pH9 BV9E4Xg/grF9D1jtSGkfbqrkrFfa9Mzh4hbWvP3tfQDEIvP5z4rijlxsWTZCYAcDZUmJ rDbA==
X-Gm-Message-State: ABUngvfkE9OV3IqafQ8TBcVwKdJxSn0LsJOzeYK2snwS7jsACAIv3/oAWijZlud5ro/oY5ghTnW297FmV7E5rHcf
X-Received: by 10.36.211.207 with SMTP id n198mr3943089itg.28.1478116744028; Wed, 02 Nov 2016 12:59:04 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.64.225.115 with HTTP; Wed, 2 Nov 2016 12:59:03 -0700 (PDT)
In-Reply-To: <3669167D-26AC-4B78-8175-99B0028B6891@cloudflare.com>
References: <147793576451.32369.14134057573457350871.idtracker@ietfa.amsl.com> <3669167D-26AC-4B78-8175-99B0028B6891@cloudflare.com>
From: Jyrki Alakuijala <jyrki@google.com>
Date: Wed, 2 Nov 2016 20:59:03 +0100
Message-ID: <CAPapA7TNeMTaPz7SN_0bmHx7G9a8V9c6eD=4LcRQj+G2XPQrGQ@mail.gmail.com>
To: Vlad Krasnov <vlad@cloudflare.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Content-Type: multipart/alternative; boundary=001a11470b1a7773ce054056df0d
Received-SPF: pass client-ip=209.85.161.171; envelope-from=jyrki@google.com; helo=mail-yw0-f171.google.com
X-W3C-Hub-Spam-Status: No, score=-6.6
X-W3C-Hub-Spam-Report: AWL=0.240, BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, RP_MATCHES_RCVD=-2.294, SPF_PASS=-0.001, W3C_AA=-1, W3C_IRA=-1, W3C_WL=-1
X-W3C-Scan-Sig: mimas.w3.org 1c21h4-00030o-6B 981377954395023dc572791beab0b3a1
X-Original-To: ietf-http-wg@w3.org
Subject: Re: New Version Notification for draft-vkrasnov-h2-compression-dictionaries-01.txt
Archived-At: <http://www.w3.org/mid/CAPapA7TNeMTaPz7SN_0bmHx7G9a8V9c6eD=4LcRQj+G2XPQrGQ@mail.gmail.com>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/32819
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>

Brotli has two separate ways of using a static dictionary. The first way is
the traditional way that zlib supports. The current custom dictionary
interface in brotli supports this method.

The second way is denser. It allows for every pair of (length, distance) to
point to a unique dictionary sequence. Because of this, dictionary
sequences that point to length N strings would save log2(N) bits in
distance specification in comparison to traditional dictionaries.

Further, second way of coding dictionaries allows for simple
transformations of the dictionary, i.e., primitive grammar to be defined.

We didn't add a user-programmable interface to define these yet in
libbrotli, but the second way for static dictionaries is already used for
the internal static dictionary. Our experience with it is that it is
significantly more powerful in comparison than a simple sequence of bytes.

Ideally, when used with brotli, a dictionary sharing mechanism would allow
for both of these to be used, and to allow for primitive grammar support,
too. The primitive grammar support allows for a small dictionary of
prefixes and suffixes, for example the current prefix-suffix-dictionary is
208 bytes -- the \0s here are separators:

kPrefixSuffix :=
    "\0 \0, \0 of the \0 of \0s \0.\0 and \0 in \0\"\0 to \0\">\0\n\0.
\0]\0"
    " for \0 a \0 that \0\'\0 with \0 from \0 by \0(\0. The \0 on \0 as \0"
    " is \0ing \0\n\t\0:\0ed \0=\"\0 at \0ly \0,\0=\'\0.com/\0. This \0"
    " not \0er \0al \0ful \0ive \0less \0est \0ize \0\xc2\xa0\0ous "

In addition to this, there is a combination table that combines a prefix +
middle-out-transform + suffix (choosing from 21 pre-programmed 'middle-out'
transforms that operate on the dictionary entries).

These "grammar tables" are less than 1 kB in size, but give about 1.5 % in
compression density, much higher win than elsewhere in the static
dictionary. It is possible that such grammar tables give more gain for
languages that are heavy on their prefix use (articles, prepositions, etc.).

The second way allows for about 3 % increase in compression density in
comparison to the first way, or alternatively one can reach to same
compression density by using smaller dictionaries (possibly about half the
size).