Re: Significantly reducing headers footprint

Roberto Peon <grmocg@gmail.com> Sun, 10 June 2012 23:41 UTC

In-Reply-To: <20120610231757.GA4983@1wt.eu>
References: <20120610231757.GA4983@1wt.eu>
Date: Sun, 10 Jun 2012 16:39:37 -0700
Message-ID: <CAP+FsNeqeihCZbbDG3mO55M9wEVEuFixGfrdPQ8jkSgSmjzN3A@mail.gmail.com>
From: Roberto Peon <grmocg@gmail.com>
To: Willy Tarreau <w@1wt.eu>
Cc: ietf-http-wg@w3.org

On Sun, Jun 10, 2012 at 4:17 PM, Willy Tarreau <w@1wt.eu> wrote:

> Hi,
>
> I recently managed to collect requests from some enterprise proxies to
> experiment with binary encoding as described in our draft [1].
>
> After some experimentation and discussions with some people, I managed to
> get significant gains [2] which could still be improved.
>
> What's currently performed is the following :
>  - message framing
>  - binary encoding of the HTTP version (2 bits)
>  - binary encoding of the method (4 bits)
>  - move Host header to the URI
>  - encoding of the URI relative to the previous one
>  - binary encoding of each header field names (1 byte)
>  - encoding of each header relative to the previous one.
>  - binary encoding of the If-Modified-Since date
>
> The code achieving this is available at [2]. It's an ugly PoC but it's
> a useful experimentation tool for me, feel free to use it to experiment
> with your own implementations if you like.
>
> I'm already observing request compression ratios of 90-92% on various
> requests, including on a site with a huge page with large cookies and
> URIs ; 132 kB of requests were reduced to 10kB. In fact while the draft
> suggests use of multiple header contexts (connection, common and message),
> now I'm feeling like we don't need to store 3 contexts anymore, one single
> is enough if requests remain relative to previous one.
>

For my deployment, I'm fairly certain this would not be all that common.
Two contexts ('connection' and 'common') may be enough, but I think you had
it right the first time.
The more clients you have and are aggregating through to elsewhere, the more
advantageous that scheme becomes.
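
For reference, the "relative to the previous one" encoding in the quoted list
can be pictured as a shared-prefix delta. The following is only a rough Python
sketch: the function names and the (prefix length, suffix) representation are
assumptions made here for illustration, not the format used in the draft or in
the PoC.

```python
from typing import Tuple


def delta_encode(prev: str, cur: str) -> Tuple[int, str]:
    """Encode cur as (length of the prefix shared with prev, remaining suffix)."""
    n = 0
    limit = min(len(prev), len(cur))
    while n < limit and prev[n] == cur[n]:
        n += 1
    return n, cur[n:]


def delta_decode(prev: str, delta: Tuple[int, str]) -> str:
    """Rebuild the current value from the previous value and a delta."""
    shared, suffix = delta
    return prev[:shared] + suffix


# Two of the Referer values quoted in this thread.
prev = ("http://www.example.com/sites/news/files/css/"
        "css_RWicSr_h9UxCJrAbE57UbNf_oNYhtaF5YghFXJemVNQ.css")
cur = ("http://www.example.com/sites/news/files/css/"
       "css_lKoFARDAyB20ibb5wNG8nMhflDNNW_Nb9DsNprYt8mk.css")

delta = delta_encode(prev, cur)
assert delta_decode(prev, delta) == cur

# Shared prefix length, bytes still to send, and the full length for comparison.
print(delta[0], len(delta[1]), len(cur))
```

With a single context, every value is encoded against whatever came last on
the connection; with several contexts, the sender would also have to say which
stored value the delta applies to.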


>
> But I think that by typing a bit more the protocol, we could improve even
> further and at the same time improve interoperability. Among the things
> I am observing which still take some space in the page load of an online
> newspaper (127 objects, data were anonymized) :
>
>  - User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
>    => Well, this one is only sent once over the connection, but we could
>       reduce this further by using a registry of known vendors/products
>       and incite vendors to emit just a few bytes (vendor/product/version).
>
>  - Accept: text/css,*/*;q=0.1
>    => this one changes depending on what object the browser requests, so it
>       is less efficiently compressed :
>
>        1 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>        4 Accept: text/css,*/*;q=0.1
>        8 Accept: */*
>        1 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>        2 Accept: */*
>        9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>        2 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>       90 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>        1 Accept: */*
>        9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>
>    => With better request reordering, we could have this :
>
>       11 Accept: */*
>      109 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>        4 Accept: text/css,*/*;q=0.1
>        3 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>

Achieving this seems difficult. How would we get a reordering to occur in a
reasonable manner?
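
To put a number on what reordering would buy, the quoted Accept runs can be
replayed in a few lines of Python. This is only a sketch: it counts runs of
identical consecutive values before and after grouping, and it assumes the
sender has every request in hand before emitting any of them, which is exactly
the part that looks hard in practice. The constant and function names are
made up for the example.

```python
from itertools import groupby

HTML = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
CSS = "text/css,*/*;q=0.1"
ANY = "*/*"
IMG = "image/png,image/*;q=0.8,*/*;q=0.5"

# (count, Accept value) runs in the order they appear in the quoted capture.
observed_runs = [(1, HTML), (4, CSS), (8, ANY), (1, IMG), (2, ANY),
                 (9, IMG), (2, HTML), (90, IMG), (1, ANY), (9, IMG)]

# Expand to one Accept value per request, in arrival order.
requests = [value for count, value in observed_runs for _ in range(count)]


def count_runs(values):
    """Number of maximal runs of identical consecutive values."""
    return sum(1 for _ in groupby(values))


# Sorting is just one way to make identical values adjacent.
reordered = sorted(requests)

# Total requests, runs as captured, runs after grouping.
print(len(requests), count_runs(requests), count_runs(reordered))
```

Each run boundary is a point where the Accept value has to be re-sent against
a different predecessor, so fewer runs means fewer bytes spent on this header.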


>
>    I'm already wondering if we have *that* many content-types and if we need
>    to use long words such as "application" everywhere.
>

We were quite wordy in the past :)


>
>  - Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
>    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>    Accept-Encoding: gzip,deflate
>
>    => Same comment as above concerning the number of possible values. However
>       these ones were all sent identical so the gain is more for the remote
>       parser than for the upstream link.
>
>  - Referer: http://www.example.com/
>    => referrers do compress quite well relative to each other. Still there
>       are many blogs and newspapers on the net today with very large URLs,
>       and their URLs cause very large referrers to be sent along with each
>       object composing the page. At least a better ordering of the requests
>       saves a few more hundred bytes for the whole page. In the end I only
>       got 4 different values :
>       http://www.example.com/
>       http://www.example.com/sites/news/files/css/css_RWicSr_h9UxCJrAbE57UbNf_oNYhtaF5YghFXJemVNQ.css
>       http://www.example.com/sites/news/files/css/css_lKoFARDAyB20ibb5wNG8nMhflDNNW_Nb9DsNprYt8mk.css
>       http://www.example.com/sites/news/files/css/css_qSyFGRLc-tslOV1oF9GCzEe1eGDn4PP7vOM1HGymNYU.css
>
>    Among the improvements I'm thinking about, we could decide to use relative
>    URIs when the site is the same. I don't know either if it's of any use on
>    the server side to know that the request was emitted for a specific CSS.
>
>  - If-Modified-Since: Fri, 27 Apr 2012 14:41:31 GMT
>    => I have encoded this one on 32 and 64 bits and immediately saved 3.1 and
>       2.6 kB respectively. Well, storing 4 more bytes per request might be
>       wasted considering that we probably don't need a nanosecond resolution
>       for 585 years. But 40-48 bits might be fine.
>
>  - Cache-Control: max-age=0
>    => I suspect the user hit the Refresh button, this was present in about
>       half the requests. Anyway, this raises the question of the length it
>       requires for something which is just a boolean here ("ignore cache").
>       Probably that a client has very few Cache-Control header values to
>       send, and that reducing this to a smaller set would be beneficial.
>
>  - If-None-Match: "3013140661"
>    => I guess there is nothing we can do on this one, except suggest that
>       implementors use more bits and fewer bytes to emit their etags.
>
>  - Cookie: xtvrn=$OaiJty$; xtan327981=c; xtant327981=c; has_js=c;
> __utma=KBjWnx24Q.7qFKqmB7v.i0JDH91L_R.0kU2W1uL49.JM4KtFLV0b.C;
> __utmc=Rae9ZgQHz;
> __utmz=NRSZOcCWV.d5MlK5RJsi.-.f.N8J73w=S1SLuT_j0m.O8|VsIxwE=(jHw58obb)|r9SgsT=WQfZe8jr|pFSZGH=/@/qwDyMw3I;
> __gads=td=ASP_D5ml4Ebevrej:R=pvxltafqZK:x=E4FUn3YiNldW3rhxzX6YlCptZp8zF-b5qc;
> _chartbeat2=oQvb8k_G9tduhauf.LqOukjnlaaE7K.uDBaR79E1WT4t.Kr9L_lIrOtruE8;
> __qca=LC9oiRpFSWShYlxUtD37GJ2k8AL; __utmb=vG8UMEjrz.Qf.At.pXD61lUeHZ;
> pm8196_1=c; pm8194_1=c
>
>    => amazingly, this one compresses extremely well with the above scheme,
>       because additions are performed at the end so consecutive cookies keep
>       a lot in common, and changes are not too frequent. However, given the
>       omnipresent usage of cookies, I was wondering why we should not create
>       a new entity of its own for the cookies instead of abusing the Cookie
>       header. It would make it a lot easier for both ends to find what they
>       need. For instance, a load balancer just needs to find a server name
>       in the thing above. What a waste of on-wire bits and of CPU cycles !
>

You're suggesting breaking the above into smaller, addressable bits?
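
Going back to the If-Modified-Since item in the quoted list: the fixed-width
date encoding is easy to try out. The sketch below is an illustration in
Python, assuming whole seconds since the Unix epoch in network byte order; the
message does not say which epoch, unit or byte order the PoC uses, so those
choices are assumptions.

```python
import struct
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

text = "Fri, 27 Apr 2012 14:41:31 GMT"

# Parse the HTTP-date and keep whole seconds since the Unix epoch (assumed unit).
seconds = int(parsedate_to_datetime(text).timestamp())

packed32 = struct.pack("!I", seconds)   # 4 bytes, big-endian
packed40 = seconds.to_bytes(5, "big")   # 5 bytes, the low end of "40-48 bits"
packed64 = struct.pack("!Q", seconds)   # 8 bytes, big-endian

# Round trip: the receiver can regenerate the textual form if it needs it.
(decoded,) = struct.unpack("!I", packed32)
restored = format_datetime(datetime.fromtimestamp(decoded, tz=timezone.utc),
                           usegmt=True)
assert restored == text

# Textual length versus the three binary widths.
print(len(text), len(packed32), len(packed40), len(packed64))
```

The textual value is 29 bytes, so 32-bit and 64-bit encodings save 25 and 21
bytes per occurrence. The quoted 3.1 kB and 2.6 kB are consistent with roughly
124 requests carrying the header, although that count is inferred here and is
not stated in the message.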


>
> BTW, binary encoding would probably also help addressing a request I often
> hear in banking environments : the need to sign/encrypt/compress only certain
> headers or cookies. Right now when people do this, they have to base64-encode
> the result, which is another transformation at both ends and inflates the
> data. If we make provisions in the protocol for announcing encrypted or
> compressed headers using 2-3 bits, it might become more usable. I'm not
> convinced it provides any benefit between a browser and an origin server
> though. So maybe it will remain application-specific and the transport
> just has to make it easier to emit 8-bit data in header field values.
>
>

> Has anyone any opinion on the subject above ? Or ideas about other things
> that terribly clobber the upstream pipe and that should be fixed in 2.0 ?
>

I like binary framing because it is significantly easier to get right and
works well when we're considering things other than just plain HTTP.
Token-based parsing is quite annoying in comparison: it requires
significant implementation complexity to minimize memory use. With length-based
framing, the implementation complexity is arguably decreased for everyone,
and certainly in cases where you wish to be efficient with buffers.
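
As a rough illustration of the length-based point, here is a minimal Python
sketch of length-prefixed header fields. The 2-byte big-endian length prefix
and the function names are assumptions chosen for the example; nothing here is
taken from the draft or from any deployed framing.

```python
import struct


def encode_fields(fields):
    """Serialize (name, value) byte pairs, each item prefixed by a 2-byte length."""
    out = bytearray()
    for name, value in fields:
        for item in (name, value):
            out += struct.pack("!H", len(item)) + item
    return bytes(out)


def decode_fields(buf):
    """Walk the buffer by lengths; no scanning for delimiters is needed."""
    view = memoryview(buf)
    pos = 0
    items = []
    while pos < len(view):
        if pos + 2 > len(view):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("!H", view, pos)
        pos += 2
        if pos + length > len(view):
            raise ValueError("truncated field")
        items.append(bytes(view[pos:pos + length]))
        pos += length
    if len(items) % 2:
        raise ValueError("name without value")
    return list(zip(items[0::2], items[1::2]))


fields = [(b"accept-encoding", b"gzip,deflate"),
          (b"cache-control", b"max-age=0")]
wire = encode_fields(fields)
assert decode_fields(wire) == fields
```

The decoder knows how many bytes each item needs before it reads them, so it
can size or slice buffers exactly. A token-based parser has to scan for
delimiters and cope with a delimiter that has not arrived yet.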

-=R


> I hope I'll soon find some time to update our draft to reflect recent updates
> and findings.
>
> Regards,
> Willy
>
> --
> [1] http://tools.ietf.org/id/draft-tarreau-httpbis-network-friendly-00.txt
> [2] http://1wt.eu/http2/
>
>
>