Re: [hybi] preliminary WebSockets compression experiments

Mike Belshe <mike@belshe.com> Mon, 26 April 2010 16:29 UTC

In-Reply-To: <4BD59DA0.6060506@webtide.com>
References: <q2z3f94964f1004231247zc7b60dc3l5fbb4748d129c3c@mail.gmail.com> <z2o2a10ed241004231448l7a63e329p98e04fbe1a750539@mail.gmail.com> <4BD59DA0.6060506@webtide.com>
Date: Mon, 26 Apr 2010 09:28:11 -0700
Message-ID: <k2m2a10ed241004260928q2448b8f6nae7c13db81a97579@mail.gmail.com>
From: Mike Belshe <mike@belshe.com>
To: Greg Wilkins <gregw@webtide.com>
Cc: hybi@ietf.org
Subject: Re: [hybi] preliminary WebSockets compression experiments

On Mon, Apr 26, 2010 at 7:05 AM, Greg Wilkins <gregw@webtide.com> wrote:

> Mike,
>
> some slightly off topic thinking out loud on my idea that websockets
> should be able to eventually replace the framing layer of SPDY....
>
> The problem that SPDY poses for efficient compression implemented in the
> framing layer is that its frames may contain both headers and
> a content body.
>

The most efficient compression will always come from the entity that knows
the most about what it is compressing.  Compressing headers separately from
other content is more efficient than mixing headers and data in one
compression stream.  (Imagine a gzip stream with a 32KB window where you
first send 500B of headers and then 50KB of data through it - by the time
your next chunk of headers comes through, all state from the prior headers
has been lost.  And the data you sent through might have been compressed
already anyway!)
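To make the window-eviction effect concrete, here is a small Python sketch (the header bytes and sizes are made up for illustration, roughly matching the 32KB-window example above):

```python
import os
import zlib

# Illustrative request headers; the payload stands in for ~50 KB of
# already-compressed data (random bytes behave like a JPG).
headers = (b"GET /index.html HTTP/1.1\r\n"
           b"Host: example.com\r\n"
           b"User-Agent: Mozilla/5.0\r\n"
           b"Accept-Encoding: gzip, deflate\r\n\r\n")
payload = os.urandom(50 * 1024)

def emit(comp, data):
    # Compress one chunk and sync-flush it so we can measure the number
    # of compressed bytes this chunk produced.
    return len(comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH))

# Mixed stream: 50 KB of data evicts the first headers from the 32 KB
# window, so the second set of headers cannot back-reference the first.
mixed = zlib.compressobj()
emit(mixed, headers)
emit(mixed, payload)
mixed_cost = emit(mixed, headers)

# Dedicated header stream: the second set of headers is almost entirely
# back-references into the first, so it compresses to a handful of bytes.
dedicated = zlib.compressobj()
emit(dedicated, headers)
dedicated_cost = emit(dedicated, headers)

print(mixed_cost, dedicated_cost)  # dedicated_cost is much smaller
```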

I welcome you to test out this theory - I've done so and I know the answer
:-)   As you can see, header-specific compression is a feature.

Also - remember the note about zlib memory footprint.  Using an 11-bit
window size works just fine for headers because they are generally small,
and this keeps overall zlib memory pretty low (25K or so?).  But if you want
a general compressor, you'll need the full 15-bit compression window for
decent efficiency, and then you're looking at 256KB of RAM for each zlib
compression state instance.  I know, you could use a different compressor -
but most compressors will have a similar property.
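As a rough sketch of that tradeoff: the footprint formula below is the approximation given in zlib's own documentation, and the wbits/memLevel pairings are my assumptions for "header-sized" vs. "general" compressors:

```python
import zlib

def deflate_footprint(wbits, memlevel):
    # Approximation from zlib's documentation: deflate needs roughly
    # (1 << (windowBits + 2)) + (1 << (memLevel + 9)) bytes of state.
    return (1 << (wbits + 2)) + (1 << (memlevel + 9))

# A small-window compressor for headers vs. a full-window one for bulk data.
header_comp  = zlib.compressobj(wbits=11, memLevel=4)  # 2 KB window
generic_comp = zlib.compressobj(wbits=15, memLevel=8)  # 32 KB window

print(deflate_footprint(11, 4))  # 16384  (~16 KB of compressor state)
print(deflate_footprint(15, 8))  # 262144 (256 KB, the figure quoted above)
```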

The point is that a content-specific compressor will almost always
outperform a generic compressor in efficiency, speed, and footprint.



>
> The headers are very compressible, but SPDY will probably have
> already compressed them. The content body may be very compressible
> (eg HTML) or not (eg JPG).
>
> So if we were to run SPDY over websocket, then it would be unlikely
> that the compression mechanism in websocket would be used, as SPDY
> will have already compressed what is compressible.   It strikes me
> as somewhat duplicated/wasted effort to have compression in both
> layers.
>

Roberto tried to convey why optional compression is sometimes a bad thing.
 Let me use a real-world example.

We've wondered here at Google why so many requests come from browsers
claiming not to support compression.  After a lot of investigation (I was
not involved with this work), we ran into client requests which contained
headers like:
     Accept-Xncoding: gzip, deflate

What is that?

Well, it turns out that some anti-virus vendors didn't want to support
compression in payloads.  Compressed payloads sometimes couldn't be
inspected, and the vendors were too lazy to implement the basic, well-known
compressors.  So these anti-virus programs watched for outbound HTTP
requests and intentionally mangled the browser's outbound "Accept-Encoding"
header.  This prevents the server from knowing that the browser supports
compression, so the anti-virus software can look through the result without
having to do any decompression work.  But what happened here?  The poor
user just had the single worst thing happen to his browser - no compression
at all!  And all because of a lazy man-in-the-middle.
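A toy sketch of the negotiation that this mangling defeats (the function and dict-based request are purely illustrative, not any real server's API; the header names are from the example above):

```python
def should_gzip(request_headers):
    # The server compresses the response only if the client advertises
    # gzip support in Accept-Encoding.
    accept = request_headers.get("Accept-Encoding", "")
    return "gzip" in accept.lower()

# Honest browser: the server sees the header and compresses.
print(should_gzip({"Accept-Encoding": "gzip, deflate"}))   # True

# Anti-virus mangled the header name in transit: the browser still
# supports gzip, but the server can no longer tell, and sends
# everything uncompressed.
print(should_gzip({"Accept-Xncoding": "gzip, deflate"}))   # False
```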

This is an example where too much flexibility over an insecure pipe bites
you.  The best answer is to have the protocol help avoid it: either encrypt
everything to prevent tampering, or mandate compression.  If compression
really is a key feature of the protocol, then allowing some implementations
to turn it off is a bad idea.  It will come back to haunt you.

Summary:
  * Compression is always best handled by the party that knows what is being
compressed.  (Compress at the highest layer possible.)
  * Generic stream compression of unknown content is okay, but not great
(because the content may well already be compressed).
  * Don't allow endpoints to skip supporting compression.  If they're
allowed to skip it, some poor users end up missing a critical feature.
  * Don't allow intermediaries to disable your compression.  This can only
be done by removing configuration flags/bits or by encrypting everything.



>
> Ideally websockets should be able to provide framing compression
> sufficient so that SPDY (or any new application layer) would not
> be required to have its own compression mechanisms.
>

I don't buy it for the reasons above.  But if you can prove me wrong, then
great!



>
> For SPDY, the problem of using framing compression is that the
> decision to compress the headers and/or the body is separate.
>

Why is this a problem?


> So perhaps the frame batch concept in BWTP might be useful
> after all - so that an HTTP message could be a batch of
> two frames, one (normally compressed) containing the headers
> and another (optionally compressed) containing the body.
>

Okay.  I didn't have a problem with the batch frame - it's just a different
syntax.

Mike




>
> Note also that compression efficiency means that the current
> one-header-field-per-frame approach in the latest BWTP draft
> is a non-starter.
>
> cheers