Re: [hybi] Payload only compression extension, again

Greg Wilkins <gregw@intalio.com> Mon, 02 May 2011 21:59 UTC

On 3 May 2011 00:51, John Tamplin <jat@google.com> wrote:
> On Mon, May 2, 2011 at 5:27 AM, Greg Wilkins <gregw@intalio.com> wrote:
>>
>> In jetty trunk I have an implementation of an extension I've called
>> x-deflate-frame for now.
>>
>> It uses 1 reserve bit to indicate if the payload is compressed.
>> If the payload is compressed, then it is sent as:
>>
>>    uncompressed-length  deflated-data
>>
>> The uncompressed-length is encoded in the same format as the websocket
>> frame length.  The uncompressed length is sent so that a buffer may be
>> allocated before starting to inflate the data.
>
>  Is sufficient value gained by this to justify the cost?

The cost is trivial.

The benefit is being able to inflate into an allocated buffer of the
correct size.  Without knowing the size, you will either
over-allocate or need a growing buffer, which involves data copies.
As the inflated size may be many times larger than the deflated
size, I think that it is worth investing 1, 3 or 9 bytes to avoid
another arbitrary limit or extra copying.
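
To make the layout concrete, here is a minimal sketch of the
"uncompressed-length deflated-data" payload using the standard
java.util.zip classes.  It is not the jetty code: the class and method
names are made up for illustration, and it assumes the length prefix
uses the 1, 3 or 9 byte layout of the websocket frame length (7 bits,
then 126 plus 16 bits, then 127 plus 64 bits).

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of jetty or of any draft.
class LengthPrefixedDeflate {

    // Write len in 1, 3 or 9 bytes (assumed websocket-style length layout).
    static void putLength(ByteBuffer out, long len) {
        if (len < 126) {
            out.put((byte) len);
        } else if (len <= 0xFFFF) {
            out.put((byte) 126).putShort((short) len);
        } else {
            out.put((byte) 127).putLong(len);
        }
    }

    static long getLength(ByteBuffer in) {
        int first = in.get() & 0x7F;
        if (first < 126) return first;
        if (first == 126) return in.getShort() & 0xFFFF;
        return in.getLong();
    }

    // Sender side: prefix + deflated data.
    static byte[] encode(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        // Generous scratch space for the sketch; deflate never expands this much.
        byte[] tmp = new byte[raw.length + raw.length / 2 + 64];
        int n = deflater.deflate(tmp);
        boolean done = deflater.finished();
        deflater.end();
        if (!done) throw new IllegalStateException("scratch buffer too small");

        ByteBuffer out = ByteBuffer.allocate(9 + n);
        putLength(out, raw.length);
        out.put(tmp, 0, n);
        out.flip();
        byte[] payload = new byte[out.remaining()];
        out.get(payload);
        return payload;
    }

    // Receiver side: one allocation of exactly the right size, no regrowth.
    static byte[] decode(byte[] payload) {
        ByteBuffer in = ByteBuffer.wrap(payload);
        int size = (int) getLength(in); // sketch: assumes the size fits an int
        byte[] result = new byte[size];
        Inflater inflater = new Inflater();
        inflater.setInput(payload, in.position(), in.remaining());
        int off = 0;
        try {
            while (off < size) {
                int n = inflater.inflate(result, off, size - off);
                if (n == 0) break; // no progress: input ran out or stream ended early
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate data", e);
        } finally {
            inflater.end();
        }
        if (off != size) throw new IllegalArgumentException("length prefix mismatch");
        return result;
    }
}
```

The receiver never has to guess: it reads the prefix, allocates once,
and inflates straight into the final buffer.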

I started prototyping without this length, and adding it turned out
to be a good simplification of the code.  Sure there are
libraries that will hide the variable/unknown length better than the
Java libraries, but they are just smoothing over significant costs in
allocation and data copying.

>> The extension supports a minLength parameter (default 64) and will not
>> attempt to compress payloads that are smaller than that length.
>
> What is the value of making this configurable on a per-connection basis?
>  Will this be exposed to the JS API?  If not, how will the browser come up
> with a value here?

Probably no great need to have this configurable.  I was mostly
testing extension parameters with this (but also modelling gzip
filters, which do appear to make this configurable).  If the client
does not send the parameter, the server still responds with the
default value, so the client knows it.


>> When compressing the data, if the combined uncompressed-length and
>> deflated-data is longer than the original data, then the original data
>> frame is sent instead.
>
> Maintaining compression state across frames (which I think is required for
> this to be useful) and doing this requires either being able to save/restore
> the compression state or saying that even the case of the compressed data
> growing in size and being sent uncompressed, it still affects the
> compression state.  The former imposes some constraints on the
> implementation, and the latter imposes a performance penalty on the receiver
> (and the sender when it decides for other reasons, such as knowing the type
> of data being sent, not to compress the data).

The latter also has a cost on the server, as it prevents simple buffer
management.  If you can deflate into a buffer the same size as the raw
data, then you avoid reallocations and data copies, which are the
last things you want to do if you have discovered that the deflation
is increasing the size.
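
As a rough illustration of that buffer management (again a sketch with
made-up names, not the jetty implementation): deflate into a buffer no
larger than the raw data, leaving room for the length prefix, and fall
back to sending the original frame if the deflater has not finished by
the time that space is used up.  The Deflater is reset per frame here,
as in my prototype, and the prefix size assumes the 1, 3 or 9 byte
length layout.

```java
import java.util.zip.Deflater;

// Hypothetical helper, not part of jetty or of any draft.
class DeflateOrRaw {

    /**
     * Deflates raw into buf (which must be at least raw.length long).
     * Returns the number of deflated bytes, or -1 if the frame should be
     * sent uncompressed: either it is shorter than minLength, or prefix
     * plus deflated data would be longer than the original data.
     */
    static int deflateInto(byte[] raw, byte[] buf, int minLength) {
        if (raw.length < minLength) return -1;

        // Bytes needed for the uncompressed-length prefix (assumed layout).
        int prefix = raw.length < 126 ? 1 : raw.length <= 0xFFFF ? 3 : 9;
        int budget = raw.length - prefix;
        if (budget <= 0) return -1;

        Deflater deflater = new Deflater(); // fresh state for every frame
        try {
            deflater.setInput(raw);
            deflater.finish();
            int n = deflater.deflate(buf, 0, budget);
            // Not finished means the output did not fit in the budget,
            // so compression is not paying for itself on this frame.
            return deflater.finished() ? n : -1;
        } finally {
            deflater.end();
        }
    }
}
```

No reallocation or copy happens on either path: the caller either
writes the prefix followed by buf[0..n), or writes the raw data it
already holds.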


>> Currently I'm just using the standard java Inflater/Deflater  gzip
>> classes with a reset between frames.  Not exactly sure what that means
>> with regards to the algorithm details that Takeshi has specified, but
>> I think they are close.
>
> See the compression experiments I posted last May - if you don't maintain
> compression state between frames, then you lose most of the benefit of
> compression.  Being able to send uncompressed frames cheaply avoids the cost
> of this, but doesn't fix not getting the benefit of compression.
> Unfortunately, Java's default deflate implementation doesn't allow this, as
> you need to flush out the compressed bits at the end of a frame (though you
> can play tricks with changing the compression level to 0 and back, but that
> seems non-portable).  I believe you can do it with jzlib though.

Yes, the Java standard library does appear lacking in this regard.  As
I said, I'm just using it for prototyping and don't think that its
deficiencies should greatly influence any effort to standardise such
an extension, especially if better libraries are available (argh! they
have license clashes and Eclipse IP issues... still not a problem for
this group).
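
For what it's worth, here is a sketch of what keeping the compression
state across frames could look like.  It assumes a JDK that provides
Deflater.deflate(byte[], int, int, int) and Deflater.SYNC_FLUSH (added
in Java 7), which is not what my prototype uses.  The class and method
names are hypothetical, and the receiver is assumed to know the
uncompressed length of each frame, as with the length prefix described
earlier.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: one deflate stream shared by all frames of a connection.
class SharedContextFrames {
    private final Deflater deflater = new Deflater();
    private final Inflater inflater = new Inflater();

    // Compress one frame, flushing so the receiver can decode it immediately,
    // but without resetting the dictionary built from earlier frames.
    byte[] deflateFrame(byte[] raw) {
        deflater.setInput(raw);
        byte[] buf = new byte[raw.length + 64];
        int n = 0;
        while (true) {
            n += deflater.deflate(buf, n, buf.length - n, Deflater.SYNC_FLUSH);
            if (n < buf.length) break;                // room left: flush is complete
            buf = Arrays.copyOf(buf, buf.length * 2); // buffer full: grow and continue
        }
        return Arrays.copyOf(buf, n);
    }

    // Decompress one frame of known uncompressed size.
    byte[] inflateFrame(byte[] data, int size) {
        inflater.setInput(data);
        byte[] out = new byte[size];
        byte[] spill = new byte[1];
        int off = 0;
        try {
            // Drain all input so the flush marker is not left behind
            // when the next frame replaces the input buffer.
            while (!inflater.needsInput()) {
                int n;
                if (off < size) {
                    n = inflater.inflate(out, off, size - off);
                    off += n;
                } else {
                    n = inflater.inflate(spill);
                    if (n != 0) throw new IllegalArgumentException("more data than declared");
                }
                if (n == 0 && !inflater.needsInput()) {
                    throw new IllegalArgumentException("unexpected end of stream");
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate data", e);
        }
        if (off != size) throw new IllegalArgumentException("truncated frame");
        return out;
    }
}
```

With repetitive traffic, the second and later frames can refer back to
data seen in earlier ones, which is where most of the benefit comes
from.  The price is that every frame, compressed or not, has to be
accounted for in the shared state, which is the constraint John
describes above.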

cheers