Re: [hybi] permessage-deflate performance tuning statistics

Tobias Oberstein <tobias.oberstein@tavendo.de> Tue, 15 October 2013 19:01 UTC

To: Peter Thorson <webmaster@zaphoyd.com>, "hybi@ietf.org" <hybi@ietf.org>

Hi Peter,

this is fantastic empirical evidence - very much appreciated!

As a first concrete action from this, I'll take the following issue to the Python
community:

"Expose the 'Memory Level' knob on the Python zlib wrapper"

Cheers,
Tobias
 
> -----Original Message-----
> From: hybi-bounces@ietf.org [mailto:hybi-bounces@ietf.org] On Behalf Of
> Peter Thorson
> Sent: Tuesday, 15 October 2013 20:34
> To: hybi@ietf.org
> Subject: [hybi] permessage-deflate performance tuning statistics
> 
> Hi all,
> 
> I've been doing a bit of research and testing on the compression
> performance and memory usage of permessage-deflate on WebSocket-like
> workloads. I plan to write this up with more complete numbers once the
> spec is official, but some of the intermediate results and tools may be
> of interest to this group during the standardization process, so I'm
> sharing some of those notes here.
> 
> Some highlights:
> - Permessage-deflate offers significant bandwidth savings.
> - How the extension performs depends greatly on the type of data that is
> being compressed and the compression settings given to deflate.
> - Settings that work well for HTTP and the zlib/permessage-deflate defaults
> are inefficient for some common WebSocket workflows.
> - The two parameters presently in the draft specification both provide
> significant and meaningful options for tuning compression performance for
> those workflows. Implementations (especially browsers) are greatly
> encouraged to support all options.
> 
> Details & Methods:
> 
> My goal is to explore a number of the settings offered by deflate and
> determine what effect they have on compression performance, as well as on
> CPU and memory usage. To this end I have written a tool
> (https://github.com/zaphoyd/ws-pmce-stats) that produces a report of
> compression-related statistics when fed a transcript of messages.
> 
> The first workflow I have explored in detail is a WebSocket service that
> uses a JSON-based protocol to deliver short streaming updates. Some of my
> sample data is present in the datasets folder of the above git repository.
> Examples include a mock chat service with data seeded from a publicly
> logged MediaWiki IRC channel and a mock stock ticker service with data
> seeded from historical stock quote data.
> 
> I explored the effects of the context takeover and window bits settings
> from the permessage-deflate draft spec, as well as a few zlib settings
> that can be specified unilaterally without any extension negotiation.
> Some, but not all, of these are presently exposed in higher-level
> languages that use zlib as their underlying compression library. I looked
> at two of these settings in particular: the "Compression Level" and the
> "Memory Level". The former trades speed against compression ratio, the
> latter memory usage against compression ratio.
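> 
> For concreteness, a sketch of that split in Python (values arbitrary, not
> output from my tool); compressobj hands these straight to zlib's
> deflateInit2:
> 
>     import zlib
> 
>     # negotiated via the extension handshake:
>     #   wbits <-> client/server_max_window_bits (negative = raw deflate)
>     # unilateral, no negotiation needed:
>     #   level    - speed vs. compression ratio (0-9)
>     #   memLevel - memory vs. ratio and speed (1-9)
>     c = zlib.compressobj(level=6, method=zlib.DEFLATED,
>                          wbits=-15, memLevel=8)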
> 
> Preliminary results for the JSON short message service workflow:
> 
> Context Takeover
> ================
> Allowing context takeover drastically improves compression ratios. With
> other settings at their defaults, no_context_takeover achieves a
> compression ratio of 0.84; with takeover, 0.30. This is a significant
> gain. Note: it comes at a fairly high cost. Enabling context takeover
> requires a separate compression context to be maintained for every
> connection, rather than a fixed number of contexts shared across all
> connections. A sketch of the difference follows below.
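> 
> In zlib terms the two modes look roughly like this (a Python sketch; per
> the draft, the trailing 0x00 0x00 0xff 0xff of each flushed message is
> stripped before framing):
> 
>     import zlib
> 
>     # context takeover: one long-lived compressor per connection, so the
>     # sliding window carries repeated JSON keys across messages
>     conn = zlib.compressobj(wbits=-15)
> 
>     def compress_takeover(msg):
>         data = conn.compress(msg) + conn.flush(zlib.Z_SYNC_FLUSH)
>         return data[:-4]  # drop the empty-block tail
> 
>     def compress_no_takeover(msg):
>         # fresh compressor per message: fixed memory, no shared history
>         c = zlib.compressobj(wbits=-15)
>         data = c.compress(msg) + c.flush(zlib.Z_SYNC_FLUSH)
>         return data[:-4]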
> 
> Window Bits
> ===========
> Window bits has a sizable but well-distributed effect on ratios, and a
> significant effect on memory usage. With all other settings at defaults:
> window bits = compression ratio / buffer size per connection
> 08 = 0.510 / 1+128=129KiB
> 09 = 0.510 / 2+128=130KiB
> 10 = 0.435 / 4+128=132KiB
> 11 = 0.384 / 8+128=136KiB
> 12 = 0.353 / 16+128=144KiB
> 13 = 0.330 / 32+128=160KiB
> 14 = 0.315 / 64+128=192KiB
> 15 = 0.304 / 128+128=256KiB
> Reducing window bits from the default (15) to 11 worsens the compression
> ratio by about 8 percentage points (0.384 vs. 0.304) but cuts
> per-connection memory usage nearly in half. Reducing window bits to very
> small values (8-9) also increases compression runtime by 40-50%; 10 is
> somewhat slower, and 11 and above all appear to be about the same speed.
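> 
> (The buffer sizes above follow zlib's documented estimate of deflate
> memory usage, (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes; a quick
> check in Python:)
> 
>     def deflate_kib(wbits, mem_level=8):
>         # per-connection compressor memory, per zlib's zconf.h estimate
>         return ((1 << (wbits + 2)) + (1 << (mem_level + 9))) // 1024
> 
>     for wbits in range(8, 16):
>         print(wbits, deflate_kib(wbits), "KiB")
>     # 8 -> 129, 11 -> 136, 15 -> 256, matching the table above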
> 
> Compression Level
> =================
> Compression level does not have a material impact on performance or ratios
> for this workflow.
> 
> Memory Level
> ============
> Memory level does not have a significant impact on compression ratios: a
> value of 9 produces a ratio of 0.304 and a value of 1 produces 0.307. It
> does, however, affect memory usage and compression speed:
> mem_level value = runtime / memory usage
> 1 = 13.05ms / 128+1=129KiB
> 2 = 10.41ms / 128+2=130KiB
> 3 = 10.15ms / 128+4=132KiB
> 4 = 8.18ms / 128+8=136KiB
> 5 = 7.63ms / 128+16=144KiB
> 6 = 7.69ms / 128+32=160KiB
> 7 = 7.92ms / 128+64=192KiB
> 8 = 7.69ms / 128+128=256KiB
> 9 = 7.84ms / 128+256=384KiB
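> 
> To reproduce the runtime side of this sweep on your own transcript, a
> rough sketch (messages is assumed to be a list of bytes payloads from
> your own service):
> 
>     import time, zlib
> 
>     def sweep_mem_levels(messages):
>         for mem_level in range(1, 10):
>             c = zlib.compressobj(wbits=-15, memLevel=mem_level)
>             start = time.perf_counter()
>             out = sum(len(c.compress(m) + c.flush(zlib.Z_SYNC_FLUSH))
>                       for m in messages)
>             ms = (time.perf_counter() - start) * 1000.0
>             print("memLevel=%d: %.2f ms, %d bytes" % (mem_level, ms, out))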
> 
> All of the stats above show the effect of changing one parameter in
> isolation. Additional gains, especially in memory usage per connection,
> can be had by combining parameters, since many of the speed, compression,
> and memory effects of the parameters depend on each other. Two nice
> balances of all factors for the JSON short message service data set (vs.
> defaults) are:
> 
> context-takeover=on
> window bits=11
> memory level=4
> This provides memory usage of 16KiB/connection vs 256KiB, has no runtime
> speed penalty, and achieves a 0.384 vs 0.304 compression ratio.
> 
> context-takeover=on
> window bits=11
> memory level=1
> This provides memory usage of 5KiB/connection vs 256KiB, runs ~15% slower,
> and achieves a similar 0.385 vs 0.304 compression ratio.
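> 
> As zlib calls, the two tuned configurations above would look roughly
> like:
> 
>     import zlib
> 
>     # ~16 KiB/connection, no speed penalty, ratio ~0.384 on this data set
>     balanced = zlib.compressobj(wbits=-11, memLevel=4)
> 
>     # ~5 KiB/connection, ~15% slower, ratio ~0.385
>     tight = zlib.compressobj(wbits=-11, memLevel=1)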
> 
> The ws-pmce-stats tool can help you plug in values and get a sense of
> which combinations of settings are optimal for your traffic mix. In
> general I have found that the shorter your messages, the less you benefit
> from high window bits and memory level values. If you routinely send
> WebSocket messages with payloads in the high hundreds of KB or in the
> MBs, you will benefit from higher values for memory level and window
> bits. If you have extremely limited memory, no context takeover allows
> fixed memory usage across all connections. Its price is heavy for small
> messages and JSON protocols, but less problematic for large ones. I've
> found that window bits of 11 and memory level 4 are still quite effective
> even up to message payloads of ~200KB.
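> 
> On the wire, only window bits and context takeover are negotiated;
> compression level and memory level stay local to each endpoint. A client
> asking for the 11-bit configuration above would offer something along
> the lines of:
> 
>     Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=11; server_max_window_bits=11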
> 
> I'd love to hear any feedback anyone has about the methods or results. I am
> particularly interested in collecting more sample WebSocket workflows. I
> haven't run any numbers for binary connections yet. I'd love to hear details
> about other workflows that might have different properties than the ones
> studied here so far, especially if you have sample transcripts.
> 
> I'd also be interested in any feedback on the ws-pmce-stats program. Is
> something like this useful to anyone else? It meets my needs right now, but I
> have a few ideas of how to expand it (binary message support, other
> compression algorithms, machine readable output) if it sounds useful to
> others.
> _______________________________________________
> hybi mailing list
> hybi@ietf.org
> https://www.ietf.org/mailman/listinfo/hybi