Re: delta encoding and state management

Mark Nottingham <mnot@mnot.net> Thu, 17 January 2013 21:34 UTC

From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <CAK3OfOj3ZgOZnzcQCifhb9f2One7vBUNGv7yhidkZqRzaeZYvQ@mail.gmail.com>
Date: Fri, 18 Jan 2013 08:33:21 +1100
Cc: Roberto Peon <grmocg@gmail.com>, James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-Id: <7A7CEC55-50F6-4090-AD07-03179D5A6C6A@mnot.net>
References: <CABP7Rbf-_Of0Gnn7uaeuPiiZ6n+MxbpJjbggmD3qjykWX3gaXQ@mail.gmail.com> <CAK3OfOgvK=GEhCr3jghgFu-1FnZLv5j4bmpYoEpsj59kekL5kg@mail.gmail.com> <CAP+FsNcmLH6fWQoptBoP3a1x-zSpbP8piCFz1fg5KuF+6R3jjg@mail.gmail.com> <CAK3OfOj3ZgOZnzcQCifhb9f2One7vBUNGv7yhidkZqRzaeZYvQ@mail.gmail.com>
To: Nico Williams <nico@cryptonector.com>
Subject: Re: delta encoding and state management
Archived-At: <http://www.w3.org/mid/7A7CEC55-50F6-4090-AD07-03179D5A6C6A@mnot.net>

On 18/01/2013, at 8:19 AM, Nico Williams <nico@cryptonector.com> wrote:

> On Thu, Jan 17, 2013 at 2:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
>> Can you support that with data? The data I've seen thus far strongly
>> disproves that assertion...
> 
> No need: it follows from basic information theory that a generic
> compression algorithm cannot do better than a minimal encoding of the
> same data.  The key is that the encoding has to be truly minimal,
> otherwise all bets are off.
> 
> Of course, the encoding may not really be always minimal.  A varint
> encoding of seconds since epoch is not minimal if the epoch is far in
> the past, but resetting the epoch has its costs, and gzip and friends
> won't know anything about that. In time, a datetime encoding like
> seconds-since-epoch will become less than minimal.
> 
> We probably cannot get truly minimal encodings of everything we care
> about, but as we saw with the datetime encoding/compression
> sub-thread, in some cases we can do much better than generic encoding.
> The thing about generic compression is that it has to look for
> repeating patterns, but some patterns are hard to detect, and
> dictionaries add overhead, which is why we stand a very good chance of
> doing at least a satisfactory job with minimal encodings.  And if we
> can avoid long-term per-connection compression state then that's a big
> architectural win.  It doesn't have to be the case that we compress
> better this way than the other, just well enough.
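
To make the quoted varint point concrete, here is a rough sketch only. It assumes a LEB128-style varint (7 payload bits per byte), which is one common choice and not something this thread has settled on; RECENT_EPOCH is a hypothetical reset epoch picked purely for illustration.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a LEB128-style varint (assumed format)."""
    if n < 0:
        raise ValueError("varint encoding here handles non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

UNIX_EPOCH = 0
RECENT_EPOCH = 1356998400   # 2013-01-01T00:00:00Z, hypothetical reset epoch
timestamp = 1358458401      # 2013-01-17T21:33:21Z

since_unix = encode_varint(timestamp - UNIX_EPOCH)
since_recent = encode_varint(timestamp - RECENT_EPOCH)

# The same instant costs 5 bytes against the 1970 epoch and 3 bytes
# against the recent one.
print(len(since_unix), len(since_recent))
```

The saving comes entirely from knowing what the field means, which is exactly the knowledge a generic compressor lacks.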


The problem is that the biggest part of our target data is often opaque blobs -- in the form of cookies, ETags, request URIs and referers. A "minimal encoding" of a cookie, as currently defined, isn't going to save us much at all.
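
As a rough illustration of the opaque-blob problem, the sketch below uses zlib as a stand-in for generic compression (it is not a claim about any proposed HTTP/2 scheme) and a made-up cookie whose value is a hash, standing in for a random session token. Compressed on its own, the cookie does not shrink; with a compression context kept across two requests, the repeat is cheap.

```python
import base64
import hashlib
import zlib

# Hypothetical opaque cookie: a deterministic stand-in for a random token.
token = base64.b64encode(hashlib.sha256(b"example-session").digest())
cookie = b"sid=" + token

# Stateless: the header value is compressed on its own, with no shared context.
stateless_len = len(zlib.compress(cookie))

# Stateful: one compression context is kept across two requests.
ctx = zlib.compressobj()
first_len = len(ctx.compress(cookie) + ctx.flush(zlib.Z_SYNC_FLUSH))
second_len = len(ctx.compress(cookie) + ctx.flush(zlib.Z_SYNC_FLUSH))

# Raw size, stateless size, then first and second sends with shared state.
print(len(cookie), stateless_len, first_len, second_len)
```

Only the second send benefits, and only because the sender and receiver both remember the first one; no field-level encoding of the cookie itself gets there.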

Cheers,


--
Mark Nottingham   http://www.mnot.net/