[hybi] permessage-deflate performance tuning statistics

Peter Thorson <webmaster@zaphoyd.com> Tue, 15 October 2013 18:34 UTC

From: Peter Thorson <webmaster@zaphoyd.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Message-Id: <FD138330-7D7E-4450-B4F5-64551F92F26D@zaphoyd.com>
Date: Tue, 15 Oct 2013 13:34:13 -0500
To: "hybi@ietf.org" <hybi@ietf.org>
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
Subject: [hybi] permessage-deflate performance tuning statistics
Precedence: list

Hi all,

I've been doing a bit of research and testing on the compression performance and memory usage of permessage-deflate on WebSocket-like workloads. I plan to write about this with more final numbers once the spec is official, but some of the intermediate results and tools may be of interest to this group during the standardization process, so I'm sharing some of those notes here.

Some highlights:
- Permessage-deflate offers significant bandwidth savings.
- How the extension performs depends greatly on the type of data that is being compressed and the compression settings given to deflate.
- Settings that work well for HTTP and the zlib/permessage-deflate defaults are inefficient for some common WebSocket workflows.
- The two parameters presently in the draft specification both provide significant and meaningful options for tuning compression performance for those workflows. Implementations (especially browsers) are greatly encouraged to support all options.

Details & Methods:

My goal is to explore a number of the settings offered by deflate and determine what effect they have on compression performance, as well as CPU/memory usage. To this end I have written a tool (https://github.com/zaphoyd/ws-pmce-stats) that produces a report of compression-related statistics when fed a transcript of messages.

The first workflow I have explored in detail is a WebSocket service that uses a JSON-based protocol to deliver short streaming updates. Some of my sample data is present in the datasets folder of the above git repository. Examples include a mock chat service with data seeded from a publicly logged MediaWiki IRC channel and a mock stock ticker service with data seeded from historical stock quote data.

I explored the effects of the context takeover and window bits settings from the permessage-deflate draft spec as well as a few zlib settings that can be unilaterally specified without any extension negotiation. Some, but not all of these, are presently exposed in higher level languages that use zlib as their underlying compression library. I looked at two of these settings in particular, the "Compression Level" and the "Memory Level". The former affects speed vs compression ratio, the latter memory usage vs compression ratio.

Preliminary results for the JSON short message service workflow:

Context Takeover
================
Allowing context takeover drastically improves compression ratios. With other settings at their defaults, no_context_takeover achieves a compression ratio of 0.84, versus 0.30 with takeover. This is a significant gain. Note: this gain comes at a fairly high cost. Enabling context takeover requires a separate context to be maintained for every connection, rather than a fixed number for all connections.
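The difference between the two modes can be sketched with Python's zlib module. This is only an illustration, not the ws-pmce-stats tool: the helper names and the mock ticker messages are made up for the example, and the ratios it prints will not match the numbers above.

```python
import json
import zlib

TAIL = b"\x00\x00\xff\xff"  # sync-flush trailer that permessage-deflate strips


def new_compressor():
    # Raw deflate (negative wbits), default level, 15 window bits, memLevel 8.
    return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15, 8)


def compress_message(compressor, payload):
    """Compress one message and drop the 4-byte sync-flush trailer."""
    data = compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)
    return data[:-4] if data.endswith(TAIL) else data


def compressed_size(messages, context_takeover):
    """Total compressed bytes, reusing one context or starting fresh per message."""
    shared = new_compressor()
    total = 0
    for msg in messages:
        compressor = shared if context_takeover else new_compressor()
        total += len(compress_message(compressor, msg))
    return total


# Hypothetical stock-ticker style updates (not from the ws-pmce-stats datasets).
messages = [
    json.dumps({"type": "quote", "symbol": "ACME", "price": 100 + i % 7, "seq": i}).encode()
    for i in range(500)
]
raw = sum(len(m) for m in messages)
with_takeover = compressed_size(messages, True) / raw
without_takeover = compressed_size(messages, False) / raw
print(f"takeover: {with_takeover:.3f}  no takeover: {without_takeover:.3f}")
```

With takeover, each short message can reference the field names and values of earlier messages, which is where most of the gain for this kind of traffic comes from.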

Window Bits
===========
Window bits has a sizable effect on ratios, spread fairly evenly across its range. It has a significant effect on memory usage though. With all other settings at defaults:
window bits = compression ratio / buffer size per connection
08 = 0.510 / 1+128=129KiB
09 = 0.510 / 2+128=130KiB
10 = 0.435 / 4+128=132KiB
11 = 0.384 / 8+128=136KiB
12 = 0.353 / 16+128=144KiB
13 = 0.330 / 32+128=160KiB
14 = 0.315 / 64+128=192KiB
15 = 0.304 / 128+128=256KiB
Reducing window bits from the default (15) to 11 costs about 8 percentage points of compression ratio (0.384 vs 0.304) but saves nearly 50% of per-connection memory (136KiB vs 256KiB). Reducing window bits to very small values (8-9) also increases compression runtime by 40-50%. 10 is less slow, and 11+ all appear to be about the same speed.
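A sweep like the one in the table can be sketched in Python as follows. The function name and the synthetic data are assumptions for illustration. The data is built so that repeats sit about 8000 bytes back, which exaggerates the window effect compared with real JSON traffic. The sweep starts at 9 because Python's zlib module documents 9 to 15 as the valid range for raw deflate streams.

```python
import random
import zlib


def ratio_for_window_bits(messages, window_bits):
    """Compression ratio (compressed/raw) with context takeover at one window size."""
    # Negative wbits selects raw deflate; memLevel 8 is the zlib default.
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -window_bits, 8
    )
    raw = compressed = 0
    for msg in messages:
        out = compressor.compress(msg) + compressor.flush(zlib.Z_SYNC_FLUSH)
        raw += len(msg)
        compressed += len(out) - 4  # drop the 00 00 ff ff trailer
    return compressed / raw


# Synthetic data: 40 distinct 200-byte messages cycled 10 times, so each repeat
# is roughly 8000 bytes behind the current position.
rng = random.Random(0)
distinct = [bytes(rng.choice(b"0123456789abcdef") for _ in range(200)) for _ in range(40)]
messages = distinct * 10

ratios = {bits: ratio_for_window_bits(messages, bits) for bits in range(9, 16)}
for bits, ratio in ratios.items():
    print(f"window bits {bits:2d}: ratio {ratio:.3f}")
```

On this data the small windows cannot reach the previous copy of a message and fall back to entropy coding alone, while the largest windows can match whole messages.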

Compression Level
=================
Compression level does not have a material impact on performance or ratios for this workflow.

Memory Level
============
Memory level does not have a significant impact on compression ratios. A value of 9 produces the ratio 0.304 and a value of 1 produces the ratio 0.307. It does affect memory usage and compression speed however:
mem_level value = runtime / memory usage
1 = 13.05ms / 128+1=129KiB
2 = 10.41ms / 128+2=130KiB
3 = 10.15ms / 128+4=132KiB
4 = 8.18ms / 128+8=136KiB
5 = 7.63ms / 128+16=144KiB
6 = 7.69ms / 128+32=160KiB
7 = 7.92ms / 128+64=192KiB
8 = 7.69ms / 128+128=256KiB
9 = 7.84ms / 128+256=384KiB
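The memory columns in both tables follow the estimate given in zlib's zconf.h: (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes per deflate stream, plus a few kilobytes for small objects. A small Python sketch of that arithmetic (the function name is made up for this example):

```python
def deflate_memory_kib(window_bits=15, mem_level=8):
    """Approximate deflate memory per compressor in KiB, using the zconf.h estimate."""
    window_bytes = 1 << (window_bits + 2)           # sliding window and related tables
    hash_and_buffer_bytes = 1 << (mem_level + 9)    # hash table and pending buffers
    return (window_bytes + hash_and_buffer_bytes) / 1024


# Vary window bits with the default memory level, then the reverse.
for bits in range(8, 16):
    print(f"window bits {bits:2d}, mem level 8: {deflate_memory_kib(bits, 8):.0f} KiB")
for level in range(1, 10):
    print(f"window bits 15, mem level {level}: {deflate_memory_kib(15, level):.0f} KiB")
```

Because the two terms are independent, lowering only one of them leaves the other 128KiB in place, which is why neither table drops below 129KiB.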

All of the stats above show the effects of changing one parameter in isolation. Additional gains, especially with respect to memory usage per connection, can be had by combining parameters. Many of the speed, compression, and memory effects of the parameters depend on each other. Two nice balances of all factors for the JSON short message service data set (vs defaults) are something like:

context-takeover=on
window bits=11
memory level=4
This provides memory usage of 16KiB/connection vs 256KiB, has no runtime speed penalty, and achieves a 0.384 vs 0.304 compression ratio.

context-takeover=on
window bits=11
memory level=1
This provides memory usage of 9KiB/connection (8+1) vs 256KiB, runs ~15% slower, and achieves a similar 0.385 vs 0.304 compression ratio.
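For anyone who wants to try a reduced-memory combination, here is a minimal round-trip sketch in Python using window bits 11 and memory level 4 with context takeover. The send/receive helpers and the mock chat messages are assumptions for illustration, not part of any WebSocket library. The receiving side needs a window at least as large as the sender's, which is what the window bits parameters of the extension negotiate.

```python
import json
import zlib

TAIL = b"\x00\x00\xff\xff"  # removed by the sender, restored by the receiver

WINDOW_BITS = 11
MEM_LEVEL = 4

# One compressor and one decompressor per connection (context takeover on).
compressor = zlib.compressobj(
    zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -WINDOW_BITS, MEM_LEVEL
)
decompressor = zlib.decompressobj(-WINDOW_BITS)


def send(payload):
    """Compress one message payload and strip the sync-flush trailer."""
    data = compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)
    return data[:-4]


def receive(wire):
    """Restore the trailer and inflate one message payload."""
    return decompressor.decompress(wire + TAIL)


# Hypothetical chat-style messages (not from the ws-pmce-stats datasets).
messages = [
    json.dumps({"type": "chat", "user": f"user{i % 5}", "text": f"hello number {i}"}).encode()
    for i in range(100)
]
wire = [send(m) for m in messages]
received = [receive(w) for w in wire]

ratio = sum(len(w) for w in wire) / sum(len(m) for m in messages)
print(f"round trip ok: {received == messages}, ratio: {ratio:.3f}")
```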

The ws-pmce-stats tool can help you plug in values and get a sense for which combinations of settings are optimal for your traffic mix. In general I have found that the shorter your messages, the less you benefit from high window bit and memory level values. If you routinely send WebSocket messages with payloads in the high hundreds of KB or in the MBs, you will benefit from higher values for memory level and window bits. If you have extremely limited memory, no context takeover allows fixed memory usage across all connections. Its price is heavy for small messages and JSON protocols, but less problematic for large ones. I've found that 11 window bits and compression memory level 4 are still quite effective even up to message payloads of ~200KB.

I'd love to hear any feedback anyone has about the methods or results. I am particularly interested in collecting more sample WebSocket workflows. I haven't run any numbers for binary connections yet. I'd love to hear details about other workflows that might have different properties than the ones studied here so far, especially if you have sample transcripts.

I'd also be interested in any feedback on the ws-pmce-stats program. Is something like this useful to anyone else? It meets my needs right now, but I have a few ideas of how to expand it (binary message support, other compression algorithms, machine readable output) if it sounds useful to others.
