Header Compression - streaming proposal

Gábor Molnár <gabor.molnar@sch.bme.hu> Fri, 05 July 2013 02:45 UTC

Return-Path: <ietf-http-wg-request@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 46EDD11E8111 for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Thu, 4 Jul 2013 19:45:26 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -9.616
X-Spam-Level:
X-Spam-Status: No, score=-9.616 tagged_above=-999 required=5 tests=[AWL=0.060, BAYES_00=-2.599, FM_FORGED_GMAIL=0.622, HTML_MESSAGE=0.001, MIME_8BIT_HEADER=0.3, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kihmRiunCPSk for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Thu, 4 Jul 2013 19:45:19 -0700 (PDT)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) by ietfa.amsl.com (Postfix) with ESMTP id 4D55711E8109 for <httpbisa-archive-bis2Juki@lists.ietf.org>; Thu, 4 Jul 2013 19:45:18 -0700 (PDT)
Received: from lists by frink.w3.org with local (Exim 4.72) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1Uuvzs-0006vh-MX for ietf-http-wg-dist@listhub.w3.org; Fri, 05 Jul 2013 02:43:44 +0000
Resent-Date: Fri, 05 Jul 2013 02:43:44 +0000
Resent-Message-Id: <E1Uuvzs-0006vh-MX@frink.w3.org>
Received: from lisa.w3.org ([128.30.52.41]) by frink.w3.org with esmtp (Exim 4.72) (envelope-from <gabor.molnar@sch.bme.hu>) id 1Uuvzk-0006tR-1e for ietf-http-wg@listhub.w3.org; Fri, 05 Jul 2013 02:43:36 +0000
Received: from balu.sch.bme.hu ([152.66.208.40]) by lisa.w3.org with esmtp (Exim 4.72) (envelope-from <gabor.molnar@sch.bme.hu>) id 1Uuvzi-0006Wd-P0 for ietf-http-wg@w3.org; Fri, 05 Jul 2013 02:43:36 +0000
Received: from mail-ie0-f174.google.com (mail-ie0-f174.google.com [209.85.223.174]) by balu.sch.bme.hu (Sun Java System Messaging Server 6.2-7.05 (built Sep 5 2006)) with ESMTPSA id <0MPF00I52ZJTKV00@balu.sch.bme.hu> for ietf-http-wg@w3.org; Fri, 05 Jul 2013 04:43:07 +0200 (CEST)
Received: by mail-ie0-f174.google.com with SMTP id 9so4342188iec.33 for <ietf-http-wg@w3.org>; Thu, 04 Jul 2013 19:43:05 -0700 (PDT)
Received: by 10.64.24.210 with HTTP; Thu, 04 Jul 2013 19:42:44 -0700 (PDT)
X-Received: by 10.50.16.8 with SMTP id b8mr27383233igd.1.1372992185147; Thu, 04 Jul 2013 19:43:05 -0700 (PDT)
Date: Fri, 05 Jul 2013 04:42:44 +0200
From: Gábor Molnár <gabor.molnar@sch.bme.hu>
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-id: <CA+KJw_5xfvnCYM7QmtLQebPDO-fJbZz6D47mjHEWui3=fiHUoQ@mail.gmail.com>
MIME-version: 1.0
Content-type: multipart/alternative; boundary="Boundary_(ID_wo/juqqt9W3uGDZkjrF0Vw)"
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=Nll1HTtgj3G66Gtxgbwz51SpoSEAWKQIki2CP7a8LXs=; b=oGb4wIMYMC6lqH3MB+cJyY0v7kNwkF+id96R0BvS0oyrrRSA/MmaItne80rNyQ7lkX 40nU0ylPmprcivzo76Fb0Oe0l1n2+ummbbtb7jXEW4y4g9uUsrJd9SeNzd74qrgFL/Tr sgugxfZl+vAgN0PucUOQ06vIsBSgcLCdvHOLhg3OE4cxncDfiZmIsoYH0+nO/ip7D+ha HV0oOHb3wsjxGzx2KG2N4LesTeJtHyCmNduvfvaFrsoHwGAbG79eOIIT2IQOW37h+TRn 7QyoUPpHm8aHrIUv/inXh11VW79unV9KL5sq19r9XR5XEmFBGw2sBjcrCbfVS6k5YY3L Ekhw==
Received-SPF: pass client-ip=152.66.208.40; envelope-from=gabor.molnar@sch.bme.hu; helo=balu.sch.bme.hu
X-W3C-Hub-Spam-Status: No, score=-4.1
X-W3C-Hub-Spam-Report: AWL=-1.597, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, RP_MATCHES_RCVD=-0.227, SPF_PASS=-0.001
X-W3C-Scan-Sig: lisa.w3.org 1Uuvzi-0006Wd-P0 3a43210fe137dca1a1834ee9eeafdbba
X-Original-To: ietf-http-wg@w3.org
Subject: Header Compression - streaming proposal
Archived-At: <http://www.w3.org/mid/CA+KJw_5xfvnCYM7QmtLQebPDO-fJbZz6D47mjHEWui3=fiHUoQ@mail.gmail.com>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/18618
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>

This a proposal for a seemingly minor change, that could make it possible
to implement
a streaming encoder/decoder for the compression spec, and make the decoding
process
simpler. It would also eliminate certain corner cases, like the shadowing
problem.

There's a lot of talk recently on enforcing the memory usage limits of the
compression
spec. There's one component however, that we don't take into account when
computing
the memory usage of compression implementations: it's the Working Set. The
problem
is that it can grow without bounds, since as far as I know, HTTP does not
impose limits
on the size of the header set. I tried to come up with a decoder
implementation
architecture for the compression spec that would not have to store the
whole set in the
memory.

Such a decoder would instead stream the output of the decoding process,
header by
header. This seems to be a legitimate approach, since most of the
memory-conscious
parsers I know are implemented as streaming parsers (streaming json, xml,
http, ... parsers). Gzip, the base of the previously used header
compression mechanism
is a streaming compressor/decompressor as well, of course.

It turns out that it is not possible to implement the current spec as a
streaming parser.
The only reason is this: *if an entry gets inserted into the working set,
it is not guaranteed
that it will remain there until the end of the decompression process, since
it could be
deleted any time*. Because of this, it is not possible to emit any headers
until the end
of the process.

I propose a simple change, that could, however, guarantee this: *in header
blocks, Indexed
Representations should come first*. This would guarantee that after the
Indexed
Representations are over, there will be no deletion from the Working Set.
This is the only
thing that would have to be changed. Existing decoding process can be
applied as if nothing
would change.

But it is now possible to implement a streaming, and - as a side effect -
much simpler
decoder like this:

0. There's only one component: the Header Table. An entry in the Header
Table is a
    name-value pair with an index (just like before), and a 'reference'
flag that is not set by
    default.
1. First phase of decoding: dealing with indexed representations. Indexed
representations
    simply flip the 'reference' flag on the entry they reference.
2. Second phase of decoding: before starting the processing of literal
representations, emit
    every name-value pair that is flagged in the Header Table.
3. Third phase of decoding: for every literal representations:
  1. emit the name-value pair
  2. insert it in the table if needed (incremental or substitution indexing
with table size
      enforcement)
4. When a new header block arrives, jump to 1.

It is maybe not obvious at first, but this process is equivalent the the
current decoding process,
if indexed representations come first. Please point out corner cases if you
find any.

I think that the 'Indexed Representations come first' pattern is something
that comes naturally
when implementing an encoder. Even examples in the spec can remain
unchanged, since they
follow this pattern already.

Regards,
  Gábor