Re: [hybi] permessage-deflate performance tuning statistics

Tobias Oberstein <tobias.oberstein@tavendo.de> Tue, 15 October 2013 19:01 UTC

To: Peter Thorson <webmaster@zaphoyd.com>, "hybi@ietf.org" <hybi@ietf.org>

Hi Peter,

this is fantastic empirical evidence - very much appreciated!

As a first concrete action from this, I'll take the following issue to the Python
community:

"Expose the 'Memory Level' knob on the Python zlib wrapper"

Cheers,
Tobias
 
> -----Original Message-----
> From: hybi-bounces@ietf.org [mailto:hybi-bounces@ietf.org] On Behalf Of
> Peter Thorson
> Sent: Tuesday, 15 October 2013 20:34
> To: hybi@ietf.org
> Subject: [hybi] permessage-deflate performance tuning statistics
> 
> Hi all,
> 
> I've been doing a bit of research and testing on the compression
> performance and memory usage of permessage-deflate on WebSocket-like
> workloads. I plan to write this up with more complete numbers once the
> spec is official, but some of the intermediate results and tools may be
> of interest to this group during the standardization process, so I'm
> sharing some of those notes here.
> 
> Some highlights:
> - Permessage-deflate offers significant bandwidth savings.
> - How the extension performs depends greatly on the type of data that is
> being compressed and the compression settings given to deflate.
> - Settings that work well for HTTP and the zlib/permessage-deflate defaults
> are inefficient for some common WebSocket workflows.
> - The two parameters presently in the draft specification both provide
> significant and meaningful options for tuning compression performance for
> those workflows. Implementations (especially browsers) are greatly
> encouraged to support all options.
> 
> Details & Methods:
> 
> My goal is to explore a number of the settings offered by deflate and
> determine what effect they have on compression performance, as well as on
> CPU and memory usage. To this end I have written a tool
> (https://github.com/zaphoyd/ws-pmce-stats) that produces a report of
> compression-related statistics when fed a transcript of messages.
> 
> The first workflow I have explored in detail is a WebSocket service that
> uses a JSON-based protocol to deliver short streaming updates. Some of my
> sample data is present in the datasets folder of the above git repository.
> Examples include a mock chat service with data seeded from a publicly
> logged MediaWiki IRC channel and a mock stock ticker service with data
> seeded from historical stock quote data.
> 
> I explored the effects of the context takeover and window bits settings
> from the permessage-deflate draft spec, as well as a few zlib settings
> that can be specified unilaterally without any extension negotiation.
> Some, but not all, of these are presently exposed in higher-level
> languages that use zlib as their underlying compression library. I looked
> at two of these settings in particular: the "Compression Level" and the
> "Memory Level". The former trades speed against compression ratio, the
> latter memory usage against compression ratio.
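> 
> For concreteness, a sketch of that split in Python (values arbitrary, not
> output from my tool); compressobj hands these straight to zlib's
> deflateInit2:
> 
>     import zlib
> 
>     # negotiated via the extension handshake:
>     #   wbits <-> client/server_max_window_bits (negative = raw deflate)
>     # unilateral, no negotiation needed:
>     #   level    - speed vs. compression ratio (0-9)
>     #   memLevel - memory vs. ratio and speed (1-9)
>     c = zlib.compressobj(level=6, method=zlib.DEFLATED,
>                          wbits=-15, memLevel=8)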
> 
> Preliminary results for the JSON short message service workflow:
> 
> Context Takeover
> ================
> Allowing context takeover drastically improves compression ratios. With
> other settings at their defaults, no_context_takeover achieves a
> compression ratio of 0.84; with takeover, 0.30. This is a significant
> gain. Note: it comes at a fairly high cost. Enabling context takeover
> requires a separate compression context to be maintained for every
> connection, rather than a fixed number of contexts shared across all
> connections. A sketch of the difference follows below.
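> 
> In zlib terms the two modes look roughly like this (a Python sketch; per
> the draft, the trailing 0x00 0x00 0xff 0xff of each flushed message is
> stripped before framing):
> 
>     import zlib
> 
>     # context takeover: one long-lived compressor per connection, so the
>     # sliding window carries repeated JSON keys across messages
>     conn = zlib.compressobj(wbits=-15)
> 
>     def compress_takeover(msg):
>         data = conn.compress(msg) + conn.flush(zlib.Z_SYNC_FLUSH)
>         return data[:-4]  # drop the empty-block tail
> 
>     def compress_no_takeover(msg):
>         # fresh compressor per message: fixed memory, no shared history
>         c = zlib.compressobj(wbits=-15)
>         data = c.compress(msg) + c.flush(zlib.Z_SYNC_FLUSH)
>         return data[:-4]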
> 
> Window Bits
> ===========
> Window bits has a sizable but well-distributed effect on ratios, and a
> significant effect on memory usage. With all other settings at defaults:
> window bits = compression ratio / buffer size per connection
> 08 = 0.510 / 1+128=129KiB
> 09 = 0.510 / 2+128=130KiB
> 10 = 0.435 / 4+128=132KiB
> 11 = 0.384 / 8+128=136KiB
> 12 = 0.353 / 16+128=144KiB
> 13 = 0.330 / 32+128=160KiB
> 14 = 0.315 / 64+128=192KiB
> 15 = 0.304 / 128+128=256KiB
> Reducing window bits from the default (15) to 11 worsens the compression
> ratio by about 8 percentage points (0.384 vs. 0.304) but cuts
> per-connection memory usage nearly in half. Reducing window bits to very
> small values (8-9) also increases compression runtime by 40-50%; 10 is
> somewhat slower, and 11 and above all appear to be about the same speed.
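> 
> (The buffer sizes above follow zlib's documented estimate of deflate
> memory usage, (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes; a quick
> check in Python:)
> 
>     def deflate_kib(wbits, mem_level=8):
>         # per-connection compressor memory, per zlib's zconf.h estimate
>         return ((1 << (wbits + 2)) + (1 << (mem_level + 9))) // 1024
> 
>     for wbits in range(8, 16):
>         print(wbits, deflate_kib(wbits), "KiB")
>     # 8 -> 129, 11 -> 136, 15 -> 256, matching the table above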
> 
> Compression Level
> =================
> Compression level does not have a material impact on performance or ratios
> for this workflow.
> 
> Memory Level
> ============
> Memory level does not have a significant impact on compression ratios: a
> value of 9 produces a ratio of 0.304 and a value of 1 produces 0.307. It
> does, however, affect memory usage and compression speed:
> mem_level value = runtime / memory usage
> 1 = 13.05ms / 128+1=129KiB
> 2 = 10.41ms / 128+2=130KiB
> 3 = 10.15ms / 128+4=132KiB
> 4 = 8.18ms / 128+8=136KiB
> 5 = 7.63ms / 128+16=144KiB
> 6 = 7.69ms / 128+32=160KiB
> 7 = 7.92ms / 128+64=192KiB
> 8 = 7.69ms / 128+128=256KiB
> 9 = 7.84ms / 128+256=384KiB
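> 
> To reproduce the runtime side of this sweep on your own transcript, a
> rough sketch (messages is assumed to be a list of bytes payloads from
> your own service):
> 
>     import time, zlib
> 
>     def sweep_mem_levels(messages):
>         for mem_level in range(1, 10):
>             c = zlib.compressobj(wbits=-15, memLevel=mem_level)
>             start = time.perf_counter()
>             out = sum(len(c.compress(m) + c.flush(zlib.Z_SYNC_FLUSH))
>                       for m in messages)
>             ms = (time.perf_counter() - start) * 1000.0
>             print("memLevel=%d: %.2f ms, %d bytes" % (mem_level, ms, out))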
> 
> All of the stats above show the effect of changing one parameter in
> isolation. Additional gains, especially in memory usage per connection,
> can be had by combining parameters, since many of the speed, compression,
> and memory effects of the parameters depend on each other. Two nice
> balances of all factors for the JSON short message service data set (vs.
> defaults) are:
> 
> context-takeover=on
> window bits=11
> memory level=4
> This provides memory usage of 16KiB/connection vs 256KiB, has no runtime
> speed penalty, and achieves a 0.384 vs 0.304 compression ratio.
> 
> context-takeover=on
> window bits=11
> memory level=1
> This provides memory usage of 5KiB/connection vs 256KiB, runs ~15% slower,
> and achieves a similar 0.385 vs 0.304 compression ratio.
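> 
> As zlib calls, the two tuned configurations above would look roughly
> like:
> 
>     import zlib
> 
>     # ~16 KiB/connection, no speed penalty, ratio ~0.384 on this data set
>     balanced = zlib.compressobj(wbits=-11, memLevel=4)
> 
>     # ~5 KiB/connection, ~15% slower, ratio ~0.385
>     tight = zlib.compressobj(wbits=-11, memLevel=1)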
> 
> The ws-pmce-stats tool can help you plug in values and get a sense of
> which combinations of settings are optimal for your traffic mix. In
> general I have found that the shorter your messages, the less you benefit
> from high window bits and memory level values. If you routinely send
> WebSocket messages with payloads in the high hundreds of KB or in the
> MBs, you will benefit from higher values for memory level and window
> bits. If you have extremely limited memory, no context takeover allows
> fixed memory usage across all connections. Its price is heavy for small
> messages and JSON protocols, but less problematic for large ones. I've
> found that window bits of 11 and memory level 4 are still quite effective
> even up to message payloads of ~200KB.
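> 
> On the wire, only window bits and context takeover are negotiated;
> compression level and memory level stay local to each endpoint. A client
> asking for the 11-bit configuration above would offer something along
> the lines of:
> 
>     Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=11; server_max_window_bits=11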
> 
> I'd love to hear any feedback anyone has about the methods or results. I am
> particularly interested in collecting more sample WebSocket workflows. I
> haven't run any numbers for binary connections yet. I'd love to hear details
> about other workflows that might have different properties than the ones
> studied here so far, especially if you have sample transcripts.
> 
> I'd also be interested in any feedback on the ws-pmce-stats program. Is
> something like this useful to anyone else? It meets my needs right now, but I
> have a few ideas of how to expand it (binary message support, other
> compression algorithms, machine readable output) if it sounds useful to
> others.
> _______________________________________________
> hybi mailing list
> hybi@ietf.org
> https://www.ietf.org/mailman/listinfo/hybi