Re: delta encoding and state management

Roberto Peon <grmocg@gmail.com> Thu, 17 January 2013 21:36 UTC

Return-Path: <ietf-http-wg-request@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E56DB21F88BC for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Thu, 17 Jan 2013 13:36:39 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.38
X-Spam-Level:
X-Spam-Status: No, score=-10.38 tagged_above=-999 required=5 tests=[AWL=0.218, BAYES_00=-2.599, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YQerfEKEgqdo for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Thu, 17 Jan 2013 13:36:24 -0800 (PST)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) by ietfa.amsl.com (Postfix) with ESMTP id 9A9B321F88A9 for <httpbisa-archive-bis2Juki@lists.ietf.org>; Thu, 17 Jan 2013 13:36:24 -0800 (PST)
Received: from lists by frink.w3.org with local (Exim 4.72) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1Tvx7s-0002fF-5Q for ietf-http-wg-dist@listhub.w3.org; Thu, 17 Jan 2013 21:35:56 +0000
Resent-Date: Thu, 17 Jan 2013 21:35:56 +0000
Resent-Message-Id: <E1Tvx7s-0002fF-5Q@frink.w3.org>
Received: from maggie.w3.org ([128.30.52.39]) by frink.w3.org with esmtp (Exim 4.72) (envelope-from <grmocg@gmail.com>) id 1Tvx7o-0002eZ-ON for ietf-http-wg@listhub.w3.org; Thu, 17 Jan 2013 21:35:52 +0000
Received: from mail-la0-f52.google.com ([209.85.215.52]) by maggie.w3.org with esmtps (TLS1.0:RSA_ARCFOUR_SHA1:16) (Exim 4.72) (envelope-from <grmocg@gmail.com>) id 1Tvx7n-0004h6-Po for ietf-http-wg@w3.org; Thu, 17 Jan 2013 21:35:52 +0000
Received: by mail-la0-f52.google.com with SMTP id ee20so1059310lab.25 for <ietf-http-wg@w3.org>; Thu, 17 Jan 2013 13:35:25 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=dMArUcMMoYFrlpIB9jmYN8wUG5ir1hH4/tLs0M6mVFs=; b=E4mDSBeoVDXrkBzGupeLk0+AWtFvswgCXidfislC3yv+0VL3vsrT9PV7GEbrjaTxmh qQuBQR+q7ItZ7FtGXGuTsUYStu411R8+7Kim1cuNCQaGCe6dPm5I3Bf2RIc+wZ3Odo9n G15T4NOG2Ic70tQeXjGGe4/yc5ssAX9lajOziywDkJz4w8TaQo0QiLpUx+/mhZU45/EF B7tt+CGpx1ADPtme2K5b3DB5FvtDSkmBAQt4dbO1OBOpdmmyzuUVTbu87s669wiU7pSg tp8sK2AhnVW3SyvP4qPTIsSo4yWUae+cKI/w+mfIw2TFZOjnjlWyD2O9TQbE+nTjUPzJ /8YQ==
MIME-Version: 1.0
X-Received: by 10.152.144.130 with SMTP id sm2mr6254634lab.49.1358458525131; Thu, 17 Jan 2013 13:35:25 -0800 (PST)
Received: by 10.112.81.5 with HTTP; Thu, 17 Jan 2013 13:35:24 -0800 (PST)
In-Reply-To: <CAK3OfOj3ZgOZnzcQCifhb9f2One7vBUNGv7yhidkZqRzaeZYvQ@mail.gmail.com>
References: <CABP7Rbf-_Of0Gnn7uaeuPiiZ6n+MxbpJjbggmD3qjykWX3gaXQ@mail.gmail.com> <CAK3OfOgvK=GEhCr3jghgFu-1FnZLv5j4bmpYoEpsj59kekL5kg@mail.gmail.com> <CAP+FsNcmLH6fWQoptBoP3a1x-zSpbP8piCFz1fg5KuF+6R3jjg@mail.gmail.com> <CAK3OfOj3ZgOZnzcQCifhb9f2One7vBUNGv7yhidkZqRzaeZYvQ@mail.gmail.com>
Date: Thu, 17 Jan 2013 13:35:24 -0800
Message-ID: <CAP+FsNfswUN-CK6heRGqEnSJatHGo3q2mZZLTrPnjapCZz2sTg@mail.gmail.com>
From: Roberto Peon <grmocg@gmail.com>
To: Nico Williams <nico@cryptonector.com>
Cc: James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Content-Type: multipart/alternative; boundary="e89a8f2348a9d5235b04d382c5e0"
Received-SPF: pass client-ip=209.85.215.52; envelope-from=grmocg@gmail.com; helo=mail-la0-f52.google.com
X-W3C-Hub-Spam-Status: No, score=-3.4
X-W3C-Hub-Spam-Report: AWL=-2.598, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001
X-W3C-Scan-Sig: maggie.w3.org 1Tvx7n-0004h6-Po 523fe0abbfe086f330e1ef8a0345d21a
X-Original-To: ietf-http-wg@w3.org
Subject: Re: delta encoding and state management
Archived-At: <http://www.w3.org/mid/CAP+FsNfswUN-CK6heRGqEnSJatHGo3q2mZZLTrPnjapCZz2sTg@mail.gmail.com>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/15975
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>

Here are where my doubts stem from:

"Custom" keys and values (cookies, user-agent strings, etc) are added to
headers and repeated basically all the time. If we don't know the input a
priori, we cannot construct a perfect or minimal encoding.

Lets assume we do know about all of the possible header keys and values at
the time we create the spec.
Even then, we achieve something better than stateful compression only when
the sum-total of allowed permutations is less than the amount of bits you
need to toggle the visibility on a piece of state (which is very small
indeed).
-=R


On Thu, Jan 17, 2013 at 1:19 PM, Nico Williams <nico@cryptonector.com>wrote:

> On Thu, Jan 17, 2013 at 2:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
> > Can you support that with data? The data I've seen thusfar strongly
> > disproves that assertion...
>
> No need: it follows from basic information theory that a generic
> compression algorithm cannot do better than a minimal encoding of the
> same data.  The key is that the encoding has to be truly minimal,
> otherwise all bets are off.
>
> Of course, the encoding may not really be always minimal.  A varint
> encoding of seconds since epoch is not minimal if the epoch is far in
> the past, but resetting the epoch has its costs, and gzip and friends
> won't know anything about that, -- in time a datetime encoding like
> seconds-since-epoch will become less than minimal.
>
> We probably cannot get truly minimal encodings of everything we care
> about, but as we saw with the datetime encoding/compression
> sub-thread, in some cases we can do much better than generic encoding.
>  The thing about generic compression is that it has to look for
> repeating patterns, but some patterns are hard to detect, and
> dictionaries add overhead, which is why we stand a very good chance of
> doing at least a satisfactory job with minimal encodings.  And if we
> can avoid long-term per-connection compression state then that's a big
> architectural win.  It doesn't have to be the case that we compress
> better this way than the other, just well enough.
>
> Nico
> --
>