Re: [hybi] permessage-deflate performance tuning statistics

Joakim Erdfelt <joakim@intalio.com> Tue, 15 October 2013 18:49 UTC

Date: Tue, 15 Oct 2013 11:48:52 -0700
From: Joakim Erdfelt <joakim@intalio.com>
To: Peter Thorson <webmaster@zaphoyd.com>
Cc: "hybi@ietf.org" <hybi@ietf.org>
Subject: Re: [hybi] permessage-deflate performance tuning statistics

Nice. I'd been meaning to do this myself, but just haven't had the free time
to devote to it.
I imagine the results will serve the community well.

Does your experimental data test whether fragmentation has any impact? (Lots
of smaller fragments making up a larger message, as commonly seen in
streaming behaviors.)

Are you testing just a single type of data on the connection?
Are you testing multiple types of data during the same connection, such as
a theoretical protocol that mixes markup, metadata, and pre-compressed
media?



--
Joakim Erdfelt <joakim@intalio.com>
webtide.com <http://www.webtide.com/> - intalio.com/jetty
Expert advice, services and support from the Jetty & CometD experts
eclipse.org/jetty - cometd.org


On Tue, Oct 15, 2013 at 11:34 AM, Peter Thorson <webmaster@zaphoyd.com> wrote:

> Hi all,
>
> I've been doing a bit of research and testing on the compression
> performance and memory usage of permessage-deflate on WebSocket-like
> workloads. I plan to write this up with final numbers once the spec is
> official, but some of the intermediate results and tools may be of
> interest to this group during the standardization process, so I'm sharing
> some of those notes here.
>
> Some highlights:
> - Permessage-deflate offers significant bandwidth savings.
> - How the extension performs depends greatly on the type of data that is
> being compressed and the compression settings given to deflate.
> - Settings that work well for HTTP, as well as the zlib/permessage-deflate
> defaults, are inefficient for some common WebSocket workflows.
> - The two parameters presently in the draft specification both provide
> significant and meaningful options for tuning compression performance for
> those workflows. Implementations (especially browsers) are greatly
> encouraged to support all options.
>
> Details & Methods:
>
> My goal is to explore a number of the settings offered by deflate and
> determine what effect they have on compression performance, as well as
> CPU/memory usage. To this end I have written a tool (
> https://github.com/zaphoyd/ws-pmce-stats) that will produce a report of
> compression-related statistics when fed a transcript of messages.
>
> The first workflow I have explored in detail is a WebSocket service that
> uses a JSON-based protocol to deliver short streaming updates. Some of my
> sample data is present in the datasets folder of the above git repository.
> Examples include a mock chat service with data seeded from a publicly
> logged MediaWiki IRC channel, and a mock stock ticker service seeded with
> historical stock quote data.
>
> I explored the effects of the context takeover and window bits settings
> from the permessage-deflate draft spec as well as a few zlib settings that
> can be unilaterally specified without any extension negotiation. Some, but
> not all, of these are presently exposed in higher-level languages that use
> zlib as their underlying compression library. I looked at two of these
> settings in particular, the "Compression Level" and the "Memory Level": the
> former trades speed against compression ratio, the latter memory usage
> against compression ratio.
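>
> For illustration only, here is roughly how those settings map onto
> Python's zlib.compressobj (a sketch of the mapping, not the code the tool
> actually runs; permessage-deflate uses raw deflate streams, hence the
> negative wbits):
>
>     import zlib
>
>     comp = zlib.compressobj(
>         level=zlib.Z_DEFAULT_COMPRESSION,  # "Compression Level": speed vs. ratio
>         method=zlib.DEFLATED,
>         wbits=-15,                         # window bits; negative = raw deflate
>         memLevel=8)                        # "Memory Level": memory vs. ratio
>     # Context takeover is not a zlib parameter: it is whether this one
>     # compressor object is kept alive across messages (takeover) or a
>     # fresh one is created per message (no_context_takeover).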
>
> Preliminary results for the JSON short message service workflow:
>
> Context Takeover
> ================
> Allowing context takeover drastically improves compression ratios. With
> the other settings at defaults, no_context_takeover achieves a compression
> ratio of 0.84, while takeover achieves 0.30 (lower is better). This is a
> significant gain. Note: it comes at a fairly high cost, because context
> takeover requires a separate compression context to be maintained for
> every connection, rather than a fixed number shared by all connections.
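>
> As a rough Python sketch of that difference (assuming "transcript" is a
> list of message payloads as bytes; on the wire permessage-deflate also
> drops the trailing 00 00 ff ff that Z_SYNC_FLUSH appends):
>
>     import zlib
>
>     # With context takeover: one compressor per connection, so later
>     # messages can back-reference strings from earlier ones.
>     comp = zlib.compressobj(wbits=-15)
>     with_takeover = [comp.compress(m) + comp.flush(zlib.Z_SYNC_FLUSH)
>                      for m in transcript]
>
>     # Without context takeover: a fresh compressor for every message, so
>     # state can be pooled, but each message starts with an empty window.
>     def compress_isolated(m):
>         c = zlib.compressobj(wbits=-15)
>         return c.compress(m) + c.flush(zlib.Z_SYNC_FLUSH)
>     without_takeover = [compress_isolated(m) for m in transcript]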
>
> Window Bits
> ===========
> Window bits has a sizable but well-distributed effect on ratios, and a
> significant effect on memory usage. With all other settings at defaults:
> window bits = compression ratio / buffer size per connection
> 08 = 0.510 / 1+128=129KiB
> 09 = 0.510 / 2+128=130KiB
> 10 = 0.435 / 4+128=132KiB
> 11 = 0.384 / 8+128=136KiB
> 12 = 0.353 / 16+128=144KiB
> 13 = 0.330 / 32+128=160KiB
> 14 = 0.315 / 64+128=192KiB
> 15 = 0.304 / 128+128=256KiB
> Reducing window bits from the default (15) to 11 costs about 8 percentage
> points of compression ratio but yields nearly a 50% savings in
> per-connection memory usage. Reducing window bits to very small values
> (8-9) also increases compression runtime by 40-50%; 10 is somewhat slower,
> and 11+ all appear to be about the same speed.
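>
> (For reference, the window size is what the draft's max_window_bits
> parameters negotiate; an offer along these lines would request the 11-bit
> window used above:
>
>     Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=11; server_max_window_bits=11
>
> The identical 8 and 9 rows are likely an artifact of zlib itself, which
> does not support an 8-bit window for raw deflate and quietly uses 9
> instead.)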
>
> Compression Level
> =================
> Compression level does not have a material impact on performance or ratios
> for this workflow.
>
> Memory Level
> ============
> Memory level does not have a significant impact on compression ratios. A
> value of 9 produces the ratio 0.304 and a value of 1 produces the ratio
> 0.307. It does affect memory usage and compression speed, however:
> mem_level value = runtime / memory usage
> 1 = 13.05ms / 128+1=129KiB
> 2 = 10.41ms / 128+2=130KiB
> 3 = 10.15ms / 128+4=132KiB
> 4 = 8.18ms / 128+8=136KiB
> 5 = 7.63ms / 128+16=144KiB
> 6 = 7.69ms / 128+32=160KiB
> 7 = 7.92ms / 128+64=192KiB
> 8 = 7.69ms / 128+128=256KiB
> 9 = 7.84ms / 128+256=384KiB
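>
> The memory figures in both of these tables follow zlib's documented
> deflate footprint, roughly (1 << (windowBits + 2)) + (1 << (memLevel + 9))
> bytes plus a few KiB of fixed overhead, e.g.:
>
>     def deflate_buffer_kib(window_bits, mem_level):
>         # zlib deflate buffers: sliding window + hash/pending buffers
>         return ((1 << (window_bits + 2)) + (1 << (mem_level + 9))) // 1024
>
>     deflate_buffer_kib(11, 8)  # 8 + 128 = 136 (window-bits table above)
>     deflate_buffer_kib(15, 4)  # 128 + 8 = 136 (memory-level table above)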
>
> All of the results above show the effect of changing one parameter in
> isolation. Additional gains, especially in per-connection memory usage,
> can be had by combining parameters; many of the speed, compression, and
> memory effects depend on one another. Two nice balances of all factors for
> the JSON short message service data set (vs. the defaults) are:
>
> context-takeover=on
> window bits=11
> memory level=4
> This provides memory usage of 16KiB/connection vs 256KiB, has no runtime
> speed penalty, and achieves a 0.384 vs 0.304 compression ratio.
>
> context-takeover=on
> window bits=11
> memory level=1
> This provides memory usage of 5KiB/connection vs 256KiB, runs ~15% slower,
> and achieves a similar 0.385 vs 0.304 compression ratio.
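>
> In the same Python terms as the sketch above, these two balances would be
> configured along the lines of:
>
>     balanced = zlib.compressobj(wbits=-11, memLevel=4)  # keep per connection
>     tiny     = zlib.compressobj(wbits=-11, memLevel=1)  # keep per connection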
>
> The ws-pmce-stats tool can help you plug in values and get a sense for
> which combinations of settings are optimal for your traffic mix. In
> general, I have found that the shorter your messages, the less you benefit
> from high window bits and memory level values. If you routinely send
> WebSocket messages with payloads in the high hundreds of KB or MBs, you
> will benefit from higher values for memory level and window bits. If you
> have extremely limited memory, no context takeover allows fixed memory
> usage across all connections. Its price is heavy for small messages and
> JSON protocols, but less problematic for large ones. I've found that 11
> window bits and memory level 4 is still quite effective even up to message
> payloads of ~200KB.
>
> I'd love to hear any feedback anyone has about the methods or results. I
> am particularly interested in collecting more sample WebSocket workflows. I
> haven't run any numbers for binary connections yet. I'd love to hear
> details about other workflows that might have different properties than the
> ones studied here so far, especially if you have sample transcripts.
>
> I'd also be interested in any feedback on the ws-pmce-stats program. Is
> something like this useful to anyone else? It meets my needs right now, but
> I have a few ideas for how to expand it (binary message support, other
> compression algorithms, machine-readable output) if it sounds useful to
> others.