Re: delta encoding and state management

James M Snell <jasnell@gmail.com> Thu, 17 January 2013 21:46 UTC

From: James M Snell <jasnell@gmail.com>
Date: Thu, 17 Jan 2013 13:44:42 -0800
Message-ID: <CABP7RbfDZcRH-0_AaN9iYjPN-v6QjU6_Xdy5o1BHYnDFWHtuAg@mail.gmail.com>
To: Roberto Peon <grmocg@gmail.com>
Cc: Nico Williams <nico@cryptonector.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Subject: Re: delta encoding and state management
Archived-At: <http://www.w3.org/mid/CABP7RbfDZcRH-0_AaN9iYjPN-v6QjU6_Xdy5o1BHYnDFWHtuAg@mail.gmail.com>

On Thu, Jan 17, 2013 at 1:35 PM, Roberto Peon <grmocg@gmail.com> wrote:

> Here is where my doubts stem from:
>
> "Custom" keys and values (cookies, user-agent strings, etc) are added to
> headers and repeated basically all the time. If we don't know the input a
> priori, we cannot construct a perfect or minimal encoding.
>
> Let's assume we do know about all of the possible header keys and values at
> the time we create the spec.
> Even then, we achieve something better than stateful compression only when
> the sum-total of allowed permutations is less than the number of bits you
> need to toggle the visibility on a piece of state (which is very small
> indeed).
>

We certainly cannot come up with optimized binary encodings for everything,
but we can get a good way down the road by optimizing the parts we do know
about. We've already seen, for instance, that date headers can be optimized
significantly, and separating cookies into individual crumbs means we do not
have to resend the entire cookie whenever just one small part changes. I'm
certain there are other optimizations we can make without feeling like we
have to find encodings for everything.
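
To make the cookie-crumb point concrete, here is a minimal Python sketch. It
assumes crumbs are the individual name=value pairs of a Cookie header and that
both ends remember the crumbs from the previous request; the function names
split_crumbs and crumb_delta are hypothetical and not taken from any draft.

```python
def split_crumbs(cookie_header):
    """Split a Cookie header value into individual name=value crumbs."""
    crumbs = {}
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue
        # partition() keeps any further '=' characters inside the value.
        name, _, value = part.partition("=")
        crumbs[name] = value
    return crumbs


def crumb_delta(previous, current):
    """Return (changed_or_added, removed) relative to the previous request."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    removed = [k for k in previous if k not in current]
    return changed, removed


# Hypothetical cookie values, for illustration only.
first = split_crumbs("sid=abc123; theme=dark; cart=3")
second = split_crumbs("sid=abc123; theme=dark; cart=4")
changed, removed = crumb_delta(first, second)
print(changed, removed)  # {'cart': '4'} []
```

Only the crumb that changed needs to go on the wire for the second request;
the unchanged crumbs are implied by the shared state.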

- James


> -=R
>
>
> On Thu, Jan 17, 2013 at 1:19 PM, Nico Williams <nico@cryptonector.com> wrote:
>
>> On Thu, Jan 17, 2013 at 2:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
>> > Can you support that with data? The data I've seen thus far strongly
>> > disproves that assertion...
>>
>> No need: it follows from basic information theory that a generic
>> compression algorithm cannot do better than a minimal encoding of the
>> same data.  The key is that the encoding has to be truly minimal,
>> otherwise all bets are off.
>>
>> Of course, the encoding may not always be truly minimal.  A varint
>> encoding of seconds since epoch is not minimal if the epoch is far in
>> the past, but resetting the epoch has its costs, and gzip and friends
>> won't know anything about that; in time a datetime encoding like
>> seconds-since-epoch will become less than minimal.
>>
>> We probably cannot get truly minimal encodings of everything we care
>> about, but as we saw with the datetime encoding/compression
>> sub-thread, in some cases we can do much better than generic encoding.
>> The thing about generic compression is that it has to look for
>> repeating patterns, but some patterns are hard to detect, and
>> dictionaries add overhead, which is why we stand a very good chance of
>> doing at least a satisfactory job with minimal encodings.  And if we
>> can avoid long-term per-connection compression state then that's a big
>> architectural win.  It doesn't have to be the case that we compress
>> better this way than the other, just well enough.
>>
>> Nico
>> --
>>
>
>
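
The quoted remark about varint-encoded timestamps can be illustrated with a
short Python sketch. It assumes an unsigned LEB128-style varint (7 payload bits
per byte), which the thread does not specify, and a hypothetical "reset" epoch
of 2013-01-01; encode_varint and encode_date are illustrative names only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def encode_varint(n):
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def encode_date(header_value, epoch):
    """Encode an HTTP date string as varint seconds since the given epoch."""
    moment = parsedate_to_datetime(header_value)
    return encode_varint(int((moment - epoch).total_seconds()))


text = "Thu, 17 Jan 2013 21:46:41 GMT"
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
recent_epoch = datetime(2013, 1, 1, tzinfo=timezone.utc)  # hypothetical reset epoch

print(len(text))                             # 29 bytes as text
print(len(encode_date(text, unix_epoch)))    # 5 bytes
print(len(encode_date(text, recent_epoch)))  # 3 bytes
```

The gap between the two binary sizes is the cost of a distant epoch: the
encoding is compact, but not minimal, unless the epoch is kept close to the
dates being sent.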