Header Compression - streaming proposal
Gábor Molnár <gabor.molnar@sch.bme.hu> Fri, 05 July 2013 02:45 UTC
Return-Path: <ietf-http-wg-request@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 46EDD11E8111 for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Thu, 4 Jul 2013 19:45:26 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -9.616
X-Spam-Level:
X-Spam-Status: No, score=-9.616 tagged_above=-999 required=5 tests=[AWL=0.060, BAYES_00=-2.599, FM_FORGED_GMAIL=0.622, HTML_MESSAGE=0.001, MIME_8BIT_HEADER=0.3, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kihmRiunCPSk for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Thu, 4 Jul 2013 19:45:19 -0700 (PDT)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) by ietfa.amsl.com (Postfix) with ESMTP id 4D55711E8109 for <httpbisa-archive-bis2Juki@lists.ietf.org>; Thu, 4 Jul 2013 19:45:18 -0700 (PDT)
Received: from lists by frink.w3.org with local (Exim 4.72) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1Uuvzs-0006vh-MX for ietf-http-wg-dist@listhub.w3.org; Fri, 05 Jul 2013 02:43:44 +0000
Resent-Date: Fri, 05 Jul 2013 02:43:44 +0000
Resent-Message-Id: <E1Uuvzs-0006vh-MX@frink.w3.org>
Received: from lisa.w3.org ([128.30.52.41]) by frink.w3.org with esmtp (Exim 4.72) (envelope-from <gabor.molnar@sch.bme.hu>) id 1Uuvzk-0006tR-1e for ietf-http-wg@listhub.w3.org; Fri, 05 Jul 2013 02:43:36 +0000
Received: from balu.sch.bme.hu ([152.66.208.40]) by lisa.w3.org with esmtp (Exim 4.72) (envelope-from <gabor.molnar@sch.bme.hu>) id 1Uuvzi-0006Wd-P0 for ietf-http-wg@w3.org; Fri, 05 Jul 2013 02:43:36 +0000
Received: from mail-ie0-f174.google.com (mail-ie0-f174.google.com [209.85.223.174]) by balu.sch.bme.hu (Sun Java System Messaging Server 6.2-7.05 (built Sep 5 2006)) with ESMTPSA id <0MPF00I52ZJTKV00@balu.sch.bme.hu> for ietf-http-wg@w3.org; Fri, 05 Jul 2013 04:43:07 +0200 (CEST)
Received: by mail-ie0-f174.google.com with SMTP id 9so4342188iec.33 for <ietf-http-wg@w3.org>; Thu, 04 Jul 2013 19:43:05 -0700 (PDT)
Received: by 10.64.24.210 with HTTP; Thu, 04 Jul 2013 19:42:44 -0700 (PDT)
X-Received: by 10.50.16.8 with SMTP id b8mr27383233igd.1.1372992185147; Thu, 04 Jul 2013 19:43:05 -0700 (PDT)
Date: Fri, 05 Jul 2013 04:42:44 +0200
From: Gábor Molnár <gabor.molnar@sch.bme.hu>
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-id: <CA+KJw_5xfvnCYM7QmtLQebPDO-fJbZz6D47mjHEWui3=fiHUoQ@mail.gmail.com>
MIME-version: 1.0
Content-type: multipart/alternative; boundary="Boundary_(ID_wo/juqqt9W3uGDZkjrF0Vw)"
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=Nll1HTtgj3G66Gtxgbwz51SpoSEAWKQIki2CP7a8LXs=; b=oGb4wIMYMC6lqH3MB+cJyY0v7kNwkF+id96R0BvS0oyrrRSA/MmaItne80rNyQ7lkX 40nU0ylPmprcivzo76Fb0Oe0l1n2+ummbbtb7jXEW4y4g9uUsrJd9SeNzd74qrgFL/Tr sgugxfZl+vAgN0PucUOQ06vIsBSgcLCdvHOLhg3OE4cxncDfiZmIsoYH0+nO/ip7D+ha HV0oOHb3wsjxGzx2KG2N4LesTeJtHyCmNduvfvaFrsoHwGAbG79eOIIT2IQOW37h+TRn 7QyoUPpHm8aHrIUv/inXh11VW79unV9KL5sq19r9XR5XEmFBGw2sBjcrCbfVS6k5YY3L Ekhw==
Received-SPF: pass client-ip=152.66.208.40; envelope-from=gabor.molnar@sch.bme.hu; helo=balu.sch.bme.hu
X-W3C-Hub-Spam-Status: No, score=-4.1
X-W3C-Hub-Spam-Report: AWL=-1.597, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, RP_MATCHES_RCVD=-0.227, SPF_PASS=-0.001
X-W3C-Scan-Sig: lisa.w3.org 1Uuvzi-0006Wd-P0 3a43210fe137dca1a1834ee9eeafdbba
X-Original-To: ietf-http-wg@w3.org
Subject: Header Compression - streaming proposal
Archived-At: <http://www.w3.org/mid/CA+KJw_5xfvnCYM7QmtLQebPDO-fJbZz6D47mjHEWui3=fiHUoQ@mail.gmail.com>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/18618
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>
This a proposal for a seemingly minor change, that could make it possible to implement a streaming encoder/decoder for the compression spec, and make the decoding process simpler. It would also eliminate certain corner cases, like the shadowing problem. There's a lot of talk recently on enforcing the memory usage limits of the compression spec. There's one component however, that we don't take into account when computing the memory usage of compression implementations: it's the Working Set. The problem is that it can grow without bounds, since as far as I know, HTTP does not impose limits on the size of the header set. I tried to come up with a decoder implementation architecture for the compression spec that would not have to store the whole set in the memory. Such a decoder would instead stream the output of the decoding process, header by header. This seems to be a legitimate approach, since most of the memory-conscious parsers I know are implemented as streaming parsers (streaming json, xml, http, ... parsers). Gzip, the base of the previously used header compression mechanism is a streaming compressor/decompressor as well, of course. It turns out that it is not possible to implement the current spec as a streaming parser. The only reason is this: *if an entry gets inserted into the working set, it is not guaranteed that it will remain there until the end of the decompression process, since it could be deleted any time*. Because of this, it is not possible to emit any headers until the end of the process. I propose a simple change, that could, however, guarantee this: *in header blocks, Indexed Representations should come first*. This would guarantee that after the Indexed Representations are over, there will be no deletion from the Working Set. This is the only thing that would have to be changed. Existing decoding process can be applied as if nothing would change. But it is now possible to implement a streaming, and - as a side effect - much simpler decoder like this: 0. There's only one component: the Header Table. An entry in the Header Table is a name-value pair with an index (just like before), and a 'reference' flag that is not set by default. 1. First phase of decoding: dealing with indexed representations. Indexed representations simply flip the 'reference' flag on the entry they reference. 2. Second phase of decoding: before starting the processing of literal representations, emit every name-value pair that is flagged in the Header Table. 3. Third phase of decoding: for every literal representations: 1. emit the name-value pair 2. insert it in the table if needed (incremental or substitution indexing with table size enforcement) 4. When a new header block arrives, jump to 1. It is maybe not obvious at first, but this process is equivalent the the current decoding process, if indexed representations come first. Please point out corner cases if you find any. I think that the 'Indexed Representations come first' pattern is something that comes naturally when implementing an encoder. Even examples in the spec can remain unchanged, since they follow this pattern already. Regards, Gábor
- Header Compression - streaming proposal Gábor Molnár
- Re: Header Compression - streaming proposal Gábor Molnár
- Re: Header Compression - streaming proposal Martin Thomson
- Re: Header Compression - streaming proposal Gábor Molnár
- Re: Header Compression - streaming proposal Gábor Molnár
- Re: Header Compression - streaming proposal Martin Thomson
- RE: Header Compression - streaming proposal RUELLAN Herve
- Re: Header Compression - streaming proposal Roberto Peon