Re: delta encoding and state management

Nico Williams <nico@cryptonector.com> Thu, 17 January 2013 21:20 UTC

In-Reply-To: <CAP+FsNcmLH6fWQoptBoP3a1x-zSpbP8piCFz1fg5KuF+6R3jjg@mail.gmail.com>
References: <CABP7Rbf-_Of0Gnn7uaeuPiiZ6n+MxbpJjbggmD3qjykWX3gaXQ@mail.gmail.com> <CAK3OfOgvK=GEhCr3jghgFu-1FnZLv5j4bmpYoEpsj59kekL5kg@mail.gmail.com> <CAP+FsNcmLH6fWQoptBoP3a1x-zSpbP8piCFz1fg5KuF+6R3jjg@mail.gmail.com>
Date: Thu, 17 Jan 2013 15:19:08 -0600
Message-ID: <CAK3OfOj3ZgOZnzcQCifhb9f2One7vBUNGv7yhidkZqRzaeZYvQ@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: Roberto Peon <grmocg@gmail.com>
Cc: James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>

On Thu, Jan 17, 2013 at 2:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
> Can you support that with data? The data I've seen thus far strongly
> disproves that assertion...

No need: it follows from basic information theory that a generic
compression algorithm cannot do better than a minimal encoding of the
same data.  The key is that the encoding has to be truly minimal,
otherwise all bets are off.

Of course, the encoding may not always be truly minimal.  A varint
encoding of seconds since epoch is not minimal if the epoch is far in
the past, but resetting the epoch has its costs, and gzip and friends
won't know anything about that.  Over time, a datetime encoding like
seconds-since-epoch becomes less than minimal.
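
To make the epoch point concrete, here is a minimal sketch in Python.
It assumes an unsigned varint with 7 payload bits per byte; the helper
name varint_len and the reset epoch of 2013-01-01 are hypothetical
choices for illustration, not anything proposed in this thread.

```python
from datetime import datetime, timezone


def varint_len(n: int) -> int:
    """Bytes needed for a non-negative integer at 7 payload bits per byte."""
    if n < 0:
        raise ValueError("unsigned varint only")
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


# The instant this message was sent, in UTC.
stamp = datetime(2013, 1, 17, 21, 19, 8, tzinfo=timezone.utc)
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
recent_epoch = datetime(2013, 1, 1, tzinfo=timezone.utc)  # hypothetical reset epoch

since_unix = int((stamp - unix_epoch).total_seconds())
since_recent = int((stamp - recent_epoch).total_seconds())

print(varint_len(since_unix), varint_len(since_recent))  # prints "5 3"
```

The same instant costs 5 bytes against the 1970 epoch and 3 bytes
against the nearby one, and that gap is what grows as the epoch ages.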

We probably cannot get truly minimal encodings of everything we care
about, but as we saw in the datetime encoding/compression sub-thread,
in some cases we can do much better than generic compression.  Generic
compression has to look for repeating patterns, but some patterns are
hard to detect, and dictionaries add overhead, which is why we stand a
very good chance of doing at least a satisfactory job with minimal
encodings.  And if we can avoid long-term per-connection compression
state, then that's a big architectural win.  We don't have to compress
better this way than the other, just well enough.
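
As a rough illustration of the overhead point (a sketch, not a
measurement of real HTTP traffic), the snippet below compares one-shot
zlib on a textual date with a varint of the same instant.  The sample
date string and the encode_varint helper are assumptions made for
illustration.  A compressor that keeps state across many headers would
behave differently from this stateless case.

```python
import zlib


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 payload bits per byte, low bits first."""
    if n < 0:
        raise ValueError("unsigned varint only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


text_date = b"Thu, 17 Jan 2013 21:19:08 GMT"  # textual form of the instant
seconds = 1358457548                          # same instant, seconds since 1970

compact = encode_varint(seconds)
deflated_text = zlib.compress(text_date, 9)
deflated_compact = zlib.compress(compact, 9)

print("text bytes:       ", len(text_date))
print("zlib(text) bytes: ", len(deflated_text))
print("varint bytes:     ", len(compact))
print("zlib(varint) bytes:", len(deflated_compact))
```

With no shared state, zlib has nothing to match against and its framing
alone exceeds the varint, and running it over the already compact form
only adds bytes.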

Nico
--