Re: Compression analysis of perfect atom-based compressor

"Martin Nilsson" <nilsson@opera.com> Fri, 05 April 2013 16:56 UTC

On Fri, 05 Apr 2013 01:55:53 +0200, Roberto Peon <grmocg@gmail.com> wrote:

>
>    - We need a better survey of headers from everywhere :)

I just captured 251,644 requests on a mobile site and counted the number
of occurrences of every header. It is quite the zoo of browser-specific,
device-specific and network-specific information added to these
requests. A significant share of the request size is spent on WAP profile
headers like this one, which would obviously benefit greatly from compression:

"X-WAP-Profile-Diff: 1; <?xml version=\"1.0\"?><rdf:RDF  
xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"  
xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!--  
browser vendor site: Default description of properties  
--><rdf:Description><prf:CcppAccept><rdf:Bag><rdf:li>application/vnd.wap.wmlscriptc</rdf:li><rdf:li>text/vnd.wap.wml</rdf:li><rdf:li>application/vnd.wap.xhtml+xml</rdf:li><rdf:li>application/xhtml+xml</rdf:li><rdf:li>text/xml</rdf:li><rdf:li>text/html</rdf:li><rdf:li>text/css</rdf:li><rdf:li>multipart/mixed</rdf:li><rdf:li>*/*</rdf:li></rdf:Bag></prf:CcppAccept></rdf:Description></rdf:RDF>,2;  
<?xml version=\"1.0\"?><rdf:RDF  
xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"  
xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!--  
browser vendor site: Default description of properties  
--><rdf:Description><prf:CcppAccept-Charset><rdf:Bag><rdf:li>*</rdf:li></rdf:Bag></prf:CcppAccept-Charset></rdf:Description></rdf:RDF>,3;  
<?xml version=\"1.0\"?><rdf:RDF  
xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"  
xmlns:prf=\"http://www.wapforum.org/UAPROF/ccppschema-19991014#\"><!--  
browser vendor site: Default description of properties  
--><rdf:Description><prf:CcppAccept-Language><rdf:Seq><rdf:li>en</rdf:li></rdf:Seq></prf:CcppAccept-Language></rdf:Description></rdf:RDF>"
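To illustrate how much redundancy such a value carries, here is a minimal Python sketch that rebuilds the X-WAP-Profile-Diff value above (backslash escapes and mail line-wrapping removed) and compresses it with zlib. zlib's DEFLATE is only a convenient generic stand-in, not the atom-based compressor this thread is about, and the names PROLOG, BODIES and deflated_size are illustrative. Comparing one joint compression against three separate ones shows how much of the saving comes from the boilerplate shared by the three parts.

```python
import zlib

# Boilerplate that opens and closes each of the three numbered parts of the
# X-WAP-Profile-Diff value (quotes unescaped, email line wrapping removed).
PROLOG = ('<?xml version="1.0"?><rdf:RDF '
          'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:prf="http://www.wapforum.org/UAPROF/ccppschema-19991014#"><!-- '
          'browser vendor site: Default description of properties '
          '--><rdf:Description>')
EPILOG = '</rdf:Description></rdf:RDF>'

ACCEPT_TYPES = ["application/vnd.wap.wmlscriptc", "text/vnd.wap.wml",
                "application/vnd.wap.xhtml+xml", "application/xhtml+xml",
                "text/xml", "text/html", "text/css", "multipart/mixed", "*/*"]

# The part-specific content of each RDF document.
BODIES = [
    "<prf:CcppAccept><rdf:Bag>"
    + "".join(f"<rdf:li>{t}</rdf:li>" for t in ACCEPT_TYPES)
    + "</rdf:Bag></prf:CcppAccept>",
    "<prf:CcppAccept-Charset><rdf:Bag><rdf:li>*</rdf:li></rdf:Bag></prf:CcppAccept-Charset>",
    "<prf:CcppAccept-Language><rdf:Seq><rdf:li>en</rdf:li></rdf:Seq></prf:CcppAccept-Language>",
]

PARTS = [PROLOG + body + EPILOG for body in BODIES]
# Reassemble the header value in the form "1; <doc>,2; <doc>,3; <doc>".
VALUE = ",".join(f"{i}; {doc}" for i, doc in enumerate(PARTS, start=1))


def deflated_size(text):
    """Size in bytes of text after zlib (DEFLATE) compression at level 9."""
    return len(zlib.compress(text.encode("utf-8"), 9))


raw_size = len(VALUE.encode("utf-8"))
joint = deflated_size(VALUE)                             # all three parts in one stream
separate = sum(deflated_size(doc) for doc in PARTS)      # each part compressed on its own

print(f"raw value:                       {raw_size} bytes")
print(f"deflated as one value:           {joint} bytes")
print(f"three parts deflated separately: {separate} bytes")
```

Exact byte counts depend on the zlib build, so none are quoted here; the interesting figure is the gap between the joint and separate sizes, which comes from the repeated RDF prolog and epilog.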

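The problem is how to create an exportable version of this kind of
information. As you can see in the list below, there is a ton of private
information (judging by header names such as x-msisdn, x-nokia-imsi,
x-imei and x-forwarded-for). Here is the top list of headers, cut off at
a count of 100: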

user-agent           253841
host                 251654
accept               172367
connection           169545
x-wap-profile        168415
accept-encoding      160263
accept-language      147588
accept-charset       142473
x-forwarded-for      76146
cookie2              72529
referer              58741
cache-control        51080
via                  46627
cookie               44593
content-length       36777
content-type         33884
drm-version          32665
x-nokia-musicshop-version 23321
x-nokia-musicshop-bearer 23318
if-modified-since    13569
x-wap-profile-diff   10942
wap-connection       10271
x-network-info       9805
pragma               9576
x-msisdn             8332
x-up-calling-line-id 8027
x-wap-proxy-cookie   7501
x-nokia-device-type  7359
x-mobile-gateway     7359
x-country-code       7339
x-nokia-remotesocket 6474
x-nokia-localsocket  6371
x-up_devcap-screendepth 6247
x-nokia-gateway-id   5081
x-nokia-msisdn       5034
rim_cod_selection    5006
msisdn               4634
proxy-connection     4568
x-nokia-bearer       4455
profile              4117
x-bluecoat-via       3981
range                3702
clientip             3437
x-piper-id           3436
keep-alive           2988
http_x_msisdn        2679
rat                  2665
x-ucbrowser-ua       2355
x-ucbrowser-device   2118
x-ucbrowser-device-ua 2118
proxy-authorization  1967
nbg-imp-mobile       1956
nbg-imp-omitlog      1956
x-nsn-proxytype      1956
nbg-imp-msisdn       1956
nbg-imp-userpref     1956
nbg-imp-proxy        1956
nbg-imp-user-agent   1954
if-range             1925
x-up-forwarded-for   1821
pnp                  1805
x-forwarded-port     1804
x-forwarded-proto    1804
x-nokiasession       1603
x-up-3ggp-imeisv     1591
x-up-sgsn-ip         1591
client-ip            1473
x-ebo-ua             1357
x-roaming            1336
imsi                 1286
x-nokia-upgradeid    1141
y-msisdn             1122
x-source-id          1089
x-nokia-ipaddress    1065
x-nokia-chargingid   1044
x-nokia-imsi         1031
x-nokia-maxdownlinkbitrate 1004
x-nokia-maxuplinkbitrate 993
apn                  981
x-rat-type           897
x-nsn-maxuplinkbitrate 892
x-nsn-bearer         892
x-nsn-gateway-id     892
x-up-bear-type       892
x-nsn-maxdownlinkbitrate 892
x-wap-client-ip      864
x-nokia-roamingind   788
x-nokia-prepaidind   787
x-nokia-rattype      787
x-imei               786
x-msp-clid           780
x-msp-ag             774
x-rim-transcode-content 746
ua-cpu               739
te                   681
x-ucbrowser-phone-ua 600
x-ucbrowser-phone    600
ip                   546
x-cnection           539
x-xxy-connection     506
bearer-indication    505
x-up-subno           465
surrogate-capability 416
x-wap-network-client-ip 396
x-brazil-forwarded-for 396
x-wap-clientid       383
x-nokia-connection_mode 362
max-forwards         344
x-fh-ip              338
x-fh-msisdn          338
x-fh-apn             338
x-wap-fh-subscriber-info 338
x-fh-event-id        338
x-fh-sgsn-ip         337
x-fh-virtual-gateway 337
x-fh-port            337
x-up-devcap-screendepth 328
x-up-devcap-screenpixels 328
user-ip              325
x-huawei-imsi        312
x-mms-prepaid-flag   308
x-mms-sgsnip         308
x-mms-sgsnmccmnc     308
q-ua                 299
ua-pixels            295
ua-voice             295
ua-color             295
ua-os                289
x-ip-address         287
called-station-id    279
schar_h              274
context_id           274
sgsn-ip-address      273
bearer-type          273
charging-characteristics 273
nas-ip-address       273
ip-address           273
accounting-session-id 273
x-nokia-roamingid    257
x-nokia-prepaidid    257
x-wap-network-client-msisdn 257
x-nokia-ggsnipaddress 243
x-rim-accept-encoding 235
x-rim-img-setting    235
x-rim-image-threshold 235
x-rim-gw-properties  235
x-wap-msisdn         226
x-up-bearer-type     225
imsisdn              221
x-up-subscriber-cos  219
x-nokia-sgsnipaddress 213
x-wap-personalization 205
x-up-user-location-info 193
x-up-ch-rat-type     193
x-up-nai             193
x-network-client-ip  192
http_client_ip       174
encoding-version     166
x-method             136
from                 132
xxxxxxxxxx           112
x-up-devcap-cc       111
ipaddr               111
dnt                  104
x-nokia.wia.accept.original 103
x-nbg_transactionid  102
x-wap-client-sdu-size 102
x-wap-gateway        102
x-wap-session-id     102
device-stock-ua      101
x-online-host        101
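A tally like the one above can be produced along the lines of the minimal Python sketch below, which assumes each captured request is available as a raw HTTP/1.x string with CRLF line endings and counts every occurrence of a header name, case-insensitively. It also sketches one possible answer to the export problem; this is an illustration, not something proposed in this thread: names below the same cutoff of 100 are dropped (rare headers may identify a device or network), and values are replaced by keyed HMAC-SHA256 tokens that keep equality and length but not content. All function names, and the per-capture secret key, are hypothetical.

```python
import hashlib
import hmac
import secrets
from collections import Counter


def parse_headers(raw):
    """(name, value) pairs from one raw HTTP/1.x request (CRLF line endings assumed)."""
    head = raw.split("\r\n\r\n", 1)[0]
    pairs = []
    for line in head.split("\r\n")[1:]:                  # skip the request line
        if not line or line[0] in " \t" or ":" not in line:
            continue                                      # folded continuations and junk are ignored
        name, value = line.split(":", 1)
        pairs.append((name.strip().lower(), value.strip()))  # header names are case-insensitive
    return pairs


def count_header_names(raw_requests):
    """Count every occurrence of every header name across the capture."""
    return Counter(name for raw in raw_requests for name, _ in parse_headers(raw))


def top_list(counts, min_count=100):
    """Names seen at least min_count times, most frequent first."""
    return [(n, c) for n, c in counts.most_common() if c >= min_count]


def pseudonymize(value, key):
    """Keyed token: equal values give equal tokens, and the length is kept.

    Without the secret key, low-entropy values such as phone numbers cannot
    be recovered by hashing candidate values.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{len(value)}:{digest[:16]}"


def export_request(raw, counts, key, min_count=100):
    """Exportable header list for one request: rare names dropped, values tokenized."""
    return [(name, pseudonymize(value, key))
            for name, value in parse_headers(raw)
            if counts.get(name, 0) >= min_count]


if __name__ == "__main__":
    capture = [
        "GET / HTTP/1.1\r\nHost: example.com\r\nX-MSISDN: 4712345678\r\n\r\n",
        "GET /a HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n",
    ]
    counts = count_header_names(capture)
    key = secrets.token_bytes(32)  # hypothetical per-capture secret; never exported
    for name, count in top_list(counts, min_count=1):
        print(f"{name:<20} {count}")
    print(export_request(capture[0], counts, key, min_count=2))
```

Keeping equality and length preserves the repetition across requests that a header compressor exploits, but tokenizing whole values destroys substructure (for example, shared prefixes of User-Agent strings), so such an export would understate what a compressor working on the real values could achieve.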

/Martin Nilsson
