Re: Compression analysis of perfect atom-based compressor

Mark Nottingham <mnot@mnot.net> Sat, 06 April 2013 07:21 UTC


On 06/04/2013, at 3:55 AM, Martin Nilsson <nilsson@opera.com> wrote:

> On Fri, 05 Apr 2013 01:55:53 +0200, Roberto Peon <grmocg@gmail.com> wrote:
> 
>> 
>>   - We need a better survey of headers from everywhere :)
> 
> I just captured 251'644 requests on a mobile site and counted the number of occurrences of every header, and it is quite the zoo of browser specific, device specific and network specific information added to these requests. Significant amount of request size is spent on wap profile headers like this, which obviously would benefit greatly from compression.
> 
> "X-WAP-Profile-Diff: 1; <?xml version=\"1.0\"?><rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!-- browser vendor site: Default description of properties --><rdf:Description><prf:CcppAccept><rdf:Bag><rdf:li>application/vnd.wap.wmlscriptc</rdf:li><rdf:li>text/vnd.wap.wml</rdf:li><rdf:li>application/vnd.wap.xhtml+xml</rdf:li><rdf:li>application/xhtml+xml</rdf:li><rdf:li>text/xml</rdf:li><rdf:li>text/html</rdf:li><rdf:li>text/css</rdf:li><rdf:li>multipart/mixed</rdf:li><rdf:li>*/*</rdf:li></rdf:Bag></prf:CcppAccept></rdf:Description></rdf:RDF>,2; <?xml version=\"1.0\"?><rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!-- browser vendor site: Default description of properties --><rdf:Description><prf:CcppAccept-Charset><rdf:Bag><rdf:li>*</rdf:li></rdf:Bag></prf:CcppAccept-Charset></rdf:Description></rdf:RDF>,3; <?xml version=\"1.0\"?><rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!-- browser vendor site: Default description of properties --><rdf:Description><prf:CcppAccept-Language><rdf:Seq><rdf:li>en</rdf:li></rdf:Seq></prf:CcppAccept-Language></rdf:Description></rdf:RDF>"

Wow. Hardly know where to start with that one...


> The problem is how to create an exportable version of this kind of information. As you can see in the list, there is a ton of private information. The toplist of headers, cut off at a 100 count:


...

I think it depends on what we want to do with the data. E.g., if it's suitable for feeding into the compression-test suite, you could do so and report the results back to us, or you could come to some agreement with the implementers about sharing the data under terms your lawyers are comfortable with (blech, but still...).

If the goal is just to get characteristics of the traces to inform our discussions, you could simply keep posting your observations, and take requests for other summaries of the data.
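
As a rough illustration of the kind of summary that could be posted without exposing any private header values, here is a minimal sketch. The function name, the input shape (one list of (name, value) pairs per request) and the `min_count` parameter are all assumptions made for the example, not anything taken from the actual capture tooling; the default of 100 simply mirrors the cut-off mentioned above.

```python
from collections import Counter


def header_name_counts(requests, min_count=100):
    """Count header-name occurrences across requests, discarding values.

    `requests` is assumed to be an iterable where each item is a list of
    (name, value) pairs for one captured request (hypothetical shape).
    """
    counts = Counter()
    for headers in requests:
        for name, _value in headers:
            # Header field names are case-insensitive, so normalise them.
            # The value is deliberately ignored so nothing private leaks.
            counts[name.strip().lower()] += 1
    # Keep only names seen at least `min_count` times, most common first.
    return [(name, n) for name, n in counts.most_common() if n >= min_count]
```

Only names and counts come out the other end, so a list like this could be shared even when the raw traces can't be.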

Cheers,

--
Mark Nottingham   http://www.mnot.net/