Re: Header Compression - streaming proposal

Gábor Molnár <gabor.molnar@sch.bme.hu> Mon, 08 July 2013 08:57 UTC

Date: Mon, 08 Jul 2013 10:54:41 +0200
From: Gábor Molnár <gabor.molnar@sch.bme.hu>
In-reply-to: <CABkgnnVqWjWrGWuP+eZniGJe+WWL7Ekt+88wJ8xO9tkHqzhNfA@mail.gmail.com>
To: Martin Thomson <martin.thomson@gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-id: <CA+KJw_6W_J=cLP3AsVQK3GV-PuvM2MsDW7BVUeH=7=s_rk4gGw@mail.gmail.com>
Subject: Re: Header Compression - streaming proposal
Archived-At: <http://www.w3.org/mid/CA+KJw_6W_J=cLP3AsVQK3GV-PuvM2MsDW7BVUeH=7=s_rk4gGw@mail.gmail.com>

2013/7/5 Martin Thomson <martin.thomson@gmail.com>

> In terms of simplicity, doing toggles first, then literals (w/ or w/o
> changes to the table), makes a lot of sense to me.
>
> That shifts some complexity to the encoder.  An encoder will have to
> run two passes over its headers.
>

I've been thinking about this for a while: is it worth moving complexity
from the decoder to the encoder when a resource-constrained client has to
implement both (for request/response)? I think the answer is yes, because
 1. it *remains possible* to implement encoding with minimal resource
     usage: just use literal representations without indexing for all
     headers.
 2. it *becomes possible* to implement decoding with minimal resource
     usage (streaming decoder).
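
To make point 1 concrete, here is a minimal sketch in Python. It works on an
abstract representation, not on the wire format: the LiteralWithoutIndexing
type and the encode_block function are names made up for this illustration
and do not come from the spec. It assumes the encoder never indexes anything,
so no reference flag is ever set and there is no state to carry between
header blocks.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class LiteralWithoutIndexing(NamedTuple):
    """Abstract stand-in for a 'literal without indexing' representation.

    Hypothetical type: the real wire encoding is defined by the spec.
    """
    name: str
    value: str


def encode_block(
    headers: Iterable[Tuple[str, str]],
) -> Iterator[LiteralWithoutIndexing]:
    """Encode one header block in a single pass, keeping no state.

    Every header becomes a literal that does not touch the header table,
    so the encoder needs neither a table nor a second pass.
    """
    for name, value in headers:
        yield LiteralWithoutIndexing(name, value)


if __name__ == "__main__":
    for rep in encode_block([(":path", "/index.html"), ("accept", "*/*")]):
        print(rep)
```

The price is the compression ratio, since nothing is ever reused from the
table.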


> It also makes routing decisions a little harder for intermediation,
> since routing information (usually :path, but the other :-headers need
> to be checked too) are no longer at the head of the line if we assume
> that :path changes request-by-request.
>

I don't think it becomes harder. If an output buffer is used and the headers
are reordered at the end, we get back to the situation we have now.
With routing logic that can handle streaming, the routing decision can be
made somewhat sooner than the end of the decompression process. I agree that
implementing such streaming-capable routing logic might be more difficult and
might not be worth it (especially given that proxies are usually not
memory-constrained).
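
As a rough sketch of what streaming-capable routing could look like, assuming
the decoder hands over headers one by one as an iterator: the route_early
function and the ROUTING_HEADERS tuple are hypothetical names, and which
headers actually matter for routing depends on the intermediary.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical set of headers the routing decision depends on.
ROUTING_HEADERS = (":host", ":path")


def route_early(
    headers: Iterable[Tuple[str, str]],
    needed: Iterable[str] = ROUTING_HEADERS,
) -> Optional[Dict[str, str]]:
    """Stop consuming the header stream once all routing headers are seen.

    Returns the collected routing headers, or None if the stream ended
    before every needed header showed up.
    """
    wanted = frozenset(needed)
    found: Dict[str, str] = {}
    for name, value in headers:
        if name in wanted and name not in found:
            found[name] = value
            if len(found) == len(wanted):
                return found  # decision possible before the block ends
    return None


if __name__ == "__main__":
    stream = iter([
        ("accept", "*/*"),
        (":host", "example.org"),
        (":path", "/a"),
        ("cookie", "x"),
    ])
    print(route_early(stream))
    print(list(stream))  # headers the router never had to look at
```

How much earlier the decision comes depends on where the routing headers end
up in the emitted order.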


> I'm just pointing out the trade-off.  Those costs do seem manageable.
>

Thanks for the feedback!


>  On 5 July 2013 01:51, Gábor Molnár <gabor.molnar@sch.bme.hu> wrote:
> > An important detail was left out:
> >   3.3. step: if the entry was inserted, set the reference flag to true on it.
> >
> >
> > 2013/7/5 Gábor Molnár <gabor.molnar@sch.bme.hu>
> >>
> >> This is a proposal for a seemingly minor change that could make it
> >> possible to implement a streaming encoder/decoder for the compression
> >> spec, and make the decoding process simpler. It would also eliminate
> >> certain corner cases, like the shadowing problem.
> >>
> >> There has been a lot of talk recently about enforcing the memory usage
> >> limits of the compression spec. There's one component, however, that we
> >> don't take into account when computing the memory usage of compression
> >> implementations: the Working Set. The problem is that it can grow without
> >> bounds, since, as far as I know, HTTP does not impose limits on the size
> >> of the header set. I tried to come up with a decoder implementation
> >> architecture for the compression spec that would not have to store the
> >> whole set in memory.
> >>
> >> Such a decoder would instead stream the output of the decoding process,
> >> header by header. This seems to be a legitimate approach, since most of
> >> the memory-conscious parsers I know are implemented as streaming parsers
> >> (streaming JSON, XML, HTTP, ... parsers). Gzip, the basis of the
> >> previously used header compression mechanism, is a streaming
> >> compressor/decompressor as well, of course.
> >>
> >> It turns out that it is not possible to implement the current spec as a
> >> streaming parser. The only reason is this: if an entry gets inserted into
> >> the working set, it is not guaranteed that it will remain there until the
> >> end of the decompression process, since it could be deleted at any time.
> >> Because of this, it is not possible to emit any headers until the end of
> >> the process.
> >>
> >> I propose a simple change that would provide this guarantee: in header
> >> blocks, Indexed Representations should come first. This would guarantee
> >> that after the Indexed Representations are over, there will be no
> >> deletion from the Working Set. This is the only thing that would have to
> >> be changed. The existing decoding process can be applied as if nothing
> >> had changed.
> >>
> >> But it is now possible to implement a streaming and, as a side effect,
> >> much simpler decoder like this:
> >>
> >> 0. There's only one component: the Header Table. An entry in the Header
> >>     Table is a name-value pair with an index (just like before), and a
> >>     'reference' flag that is not set by default.
> >> 1. First phase of decoding: dealing with indexed representations.
> >>     Indexed representations simply flip the 'reference' flag on the
> >>     entry they reference.
> >> 2. Second phase of decoding: before starting the processing of literal
> >>     representations, emit every name-value pair that is flagged in the
> >>     Header Table.
> >> 3. Third phase of decoding: for every literal representation:
> >>   1. emit the name-value pair
> >>   2. insert it in the table if needed (incremental or substitution
> >>       indexing with table size enforcement)
> >> 4. When a new header block arrives, jump to 1.
> >>
> >> It is maybe not obvious at first, but this process is equivalent to the
> >> current decoding process if indexed representations come first. Please
> >> point out corner cases if you find any.
> >>
> >> I think that the 'Indexed Representations come first' pattern is
> >> something that comes naturally when implementing an encoder. Even
> >> examples in the spec can remain unchanged, since they follow this
> >> pattern already.
> >>
> >> Regards,
> >>   Gábor
> >
> >
>
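
For reference, steps 0 to 4 quoted above, together with the 3.3 addendum, can
be sketched in Python roughly as follows. This is an illustration under
stated assumptions, not an implementation of the draft: representations are
modelled as plain objects instead of being parsed from the wire, the names
Entry, Indexed, Literal and StreamingDecoder are made up, and table size
enforcement is simplified to "evict the oldest entries until the table fits",
with a guessed 32-byte overhead per entry.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple, Union


@dataclass
class Entry:
    name: str
    value: str
    referenced: bool = False  # step 0: the flag is not set by default


@dataclass
class Indexed:
    index: int


@dataclass
class Literal:
    name: str
    value: str
    indexing: str = "none"       # "none", "incremental" or "substitution"
    index: Optional[int] = None  # slot to replace for "substitution"


class StreamingDecoder:
    def __init__(self, max_size: int = 4096) -> None:
        self.table: List[Entry] = []  # the only component (step 0)
        self.max_size = max_size

    def _size(self) -> int:
        # Assumption: size is name + value length plus a fixed overhead.
        return sum(len(e.name) + len(e.value) + 32 for e in self.table)

    def _insert(self, lit: Literal) -> None:
        if lit.indexing == "none":
            return
        entry = Entry(lit.name, lit.value, referenced=True)  # step 3.3
        if lit.indexing == "substitution":
            self.table[lit.index] = entry
        else:
            self.table.append(entry)
        # Simplified size enforcement: drop the oldest entries first.
        while self.table and self._size() > self.max_size:
            self.table.pop(0)

    def decode_block(
        self, reps: Iterable[Union[Indexed, Literal]]
    ) -> Iterator[Tuple[str, str]]:
        it = iter(reps)
        pending: Optional[Union[Indexed, Literal]] = None
        # Phase 1: indexed representations only flip the reference flag.
        for rep in it:
            if isinstance(rep, Literal):
                pending = rep
                break
            entry = self.table[rep.index]
            entry.referenced = not entry.referenced
        # Phase 2: emit every flagged entry before touching any literal.
        for entry in self.table:
            if entry.referenced:
                yield entry.name, entry.value
        # Phase 3: emit each literal right away, then index it if asked to.
        while pending is not None:
            if not isinstance(pending, Literal):
                raise ValueError("indexed representation after a literal")
            yield pending.name, pending.value
            self._insert(pending)
            pending = next(it, None)


if __name__ == "__main__":
    dec = StreamingDecoder()
    first = [Literal(":path", "/a", "incremental"),
             Literal("accept", "*/*", "incremental")]
    print(list(dec.decode_block(first)))
    # Second block: un-reference the old :path, then send the new one.
    second = [Indexed(0), Literal(":path", "/b", "incremental")]
    print(list(dec.decode_block(second)))
```

Since decode_block is a generator, each header is handed to the caller as soon
as it is known, and the only memory kept between header blocks is the Header
Table itself.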