RE: [lemonade] compression

"Stephane H. Maes" <stephane.maes@oracle.com> Fri, 17 March 2006 02:01 UTC

From: "Stephane H. Maes" <stephane.maes@oracle.com>
To: Enhancements to Internet email to support diverse service enivronments <lemonade@ietf.org>
Subject: RE: [lemonade] compression
Date: Thu, 16 Mar 2006 18:02:06 -0800
Message-ID: <20060316180206588.00000004288@smaes-lap3>
In-Reply-To: <4419F1C5.9020700@att.com>

The compression comparison seems a bit outdated, as it appears to be based on the old LZIP spec. Following the agreements from Beijing, we aren't even compressing commands anymore, so the TLS comparison is apples to oranges.

We are only compressing body parts. As such, TLS might win at compressing command syntax, but the one case that wasn't tested is TLS compressing an already compressed part...
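
To illustrate the untested case, here is a minimal sketch using Python's zlib (deflate) as a stand-in for a stream-level compressor. The body text is a hypothetical, generated sample, not data from the measured session, and the inner zlib pass only stands in for whatever body-part compression is in use.

```python
import random
import zlib

# Hypothetical body part: pseudo-English text from a small vocabulary.
rng = random.Random(0)
words = ["lemonade", "compression", "IMAP", "server", "client",
         "message", "attachment", "the", "and", "of"]
body = " ".join(rng.choice(words) for _ in range(2000)).encode("ascii")

# First pass: the body part is compressed on its own.
once = zlib.compress(body)

# Second pass: a stream-level compressor sees the already compressed bytes.
twice = zlib.compress(once)

print("original:", len(body))
print("compressed once:", len(once))
print("compressed twice:", len(twice))
```

On text like this, the second pass should gain little or nothing, since deflate output has very little redundancy left to exploit.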

Also, attachment compression is essential, and it is not clear whether the test looked at that. IETF mailing lists do not really provide attachment use cases... :)

So I don't think it is clear cut, or that the numbers and argument below yet establish the superiority of one method over another. At the least, more evaluation is needed. Note that I have no problem switching methods if the other turns out to be more efficient.
 
Thanks

Stephane

_____
Stephane H. Maes, PhD,
Director of Architecture, Mobile, Voice, and Communications Platforms, Oracle.
Ph: +1-203-300-7786 (mobile/SMS); Fax / Office UM: +1-650-607-6296
e-mail: stephane.maes@oracle.com
IM: shmaes (AIM/Y!/skype) or stephane_maes@hotmail.com (MSN Messenger) or stephane.maes@gmail.com (Google)
 
-----Original Message-----
From: Tony Hansen [mailto:tony@att.com] 
Sent: Thursday, March 16, 2006 3:16 PM
To: Enhancements to Internet email to support diverse service enivronments
Subject: Re: [lemonade] compression

Given the choice between draft-ietf-lemonade-compress-00.txt and draft-gulbrandsen-imap-deflate-02.txt, I think that the latter, draft-gulbrandsen-imap-deflate-02.txt, is a *better* choice to go forward with as a solution for compression of the IMAP stream.

As I've said before, I think that Arnt's method is much simpler. And as his measurements indicate, it also provides superior performance.

I'd like to nominate draft-gulbrandsen-imap-deflate-02.txt to *replace* draft-ietf-lemonade-compress-00.txt as the group's compression mechanism of choice.

	Tony Hansen
	tony@att.com

Arnt Gulbrandsen wrote:
> Finally following up the short thread last autumn with some numbers, 
> here's a short comparison of RFC 3749, draft-gulbrandsen-imap-deflate 
> and draft-lemonade-lzip.
>
> RFC 3749 and imap-deflate are very similar in terms of compression. In 
> general, RFC 3749 will win over imap-deflate by about 20 bytes, 
> independently of the length of the IMAP connection. Two minutes or two 
> hours, doesn't matter. The difference is because 3749 negotiates 
> compression at a cost of just two bytes (one in the client hello, one 
> in the server).
>
> For the IMAP conversation appended, here are the results comparing 
> 3749/deflate with lzip. These numbers are approximate: I took some 
> working code and hacked it to produce numbers instead of work. I 
> accidentally mangled the line endings, so all the numbers are off by 
> some small factor (1%?).
>
> The thread I used is just a few messages from lemonade. I just picked 
> a random thread, and read a few messages using a fairly popular IMAP 
> client, and played with the TCP connection until all passwords etc. 
> were gone.
>
> There are four commands, each with 1-4 untagged and one tagged response.
> I have numbered them a1 to a4.
>
> Command a1 is compressed by all three to 42 bytes, much worse than its 
> original 22. Its response is compressed to 70 bytes, from 123. The 
> reason they do the same is that all three compressors use the same 
> algorithm and do exactly the same work: Starting compression (which 
> has overhead of about 20 bytes IIRC), learning that the compressed 
> text contains ASCII, etc.
>
> Command a2 is compressed by lzip from 202 to 170 bytes, and by 
> 3749/deflate to 138. The response is compressed by lzip from 3498 to
> 1016 bytes, and by 3749/deflate to 899. lzip does worse than the two 
> others because it pays the startup cost again, while the two others 
> already have trained their compressors to know that there's lots of 
> printable ASCII, '* FETCH' occurs often, etc.
>
> Command a3 is compressed by lzip from 20 to 40 bytes (the same as the 
> first command, really) and by 3749/deflate to 9 bytes. (Lzip pays the 
> startup cost for the seventh time. The compressors in 3749/deflate 
> have learnt from commands a1 and a2 approximately how an IMAP command 
> looks, so they compress even a short command efficiently.) The 
> response tells the same picture: 3749/deflate compresses the response 
> from 2772 to 1112 bytes, lzip manages 1284. The difference here is 
> mostly that 3749/deflate already knows that almost all bytes sent are 
> ASCII words, while for lzip, the compressor has to learn it. There's 
> also the usual 20-byte startup fee.
>
> Command a4 is 40 bytes again with lzip. With 3749/deflate it's 22. The 
> response has an original size of 1078 characters. 3749/deflate 
> compresses that to 280 characters. This result is so good because the 
> response is pure text, and it's in the same thread as the a3 response.
> lzip doesn't do so well: 559. Its compressor again has to learn 
> everything from scratch, and it can't exploit the fact that threads 
> frequently contain the same words and even sentences.
>
> Summing up: All three lose a bit on their first command, about 20 bytes.
> 3749/deflate win on the second and subsequent commands. Lzip wins on 
> the big command, loses on the two small commands. All three win on the 
> responses, but 3749/deflate win much more, sometimes achieving results 
> half the size of lzip's.
>
> I'll be happy to repeat this survey with a bigger corpus, if it's 
> desirable. I don't see the point, myself. The numbers are clear enough 
> already.
>
> Finally, I don't know whether the upstream comparison matters. I 
> included the numbers because I thought they might matter for 
> battery-powered devices where transmitting bytes to the server is 
> expensive. Not my field. Downstream surely matters, and that's where 
> 3749/gulbrandsen win by a large margin.
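
The effect described in the quoted comparison (a compressor that restarts for every item pays the startup cost each time, while a long-lived one reuses what it has learnt) can be sketched with Python's zlib. This is only an illustration under assumptions: the commands are hypothetical, not the ones from the measured session, and a fresh zlib compressor per item is just a rough model of the per-item behaviour attributed to lzip above.

```python
import zlib

# Hypothetical IMAP client commands (not those from the measured session).
commands = [
    b"a1 SELECT INBOX\r\n",
    b"a2 FETCH 1:4 (FLAGS BODY.PEEK[HEADER.FIELDS (FROM SUBJECT DATE)])\r\n",
    b"a3 FETCH 1 BODY.PEEK[TEXT]\r\n",
    b"a4 FETCH 2 BODY.PEEK[TEXT]\r\n",
]

def fresh_compressor_sizes(items):
    """Compress each item with a brand-new compressor (startup cost each time)."""
    return [len(zlib.compress(item)) for item in items]

def shared_stream_sizes(items):
    """Compress all items through one long-lived compressor, flushing after each."""
    comp = zlib.compressobj()
    sizes = []
    for item in items:
        # Z_SYNC_FLUSH pushes out everything buffered so far, so the peer can
        # decode this item at once, while the history is kept for later items.
        sizes.append(len(comp.compress(item) + comp.flush(zlib.Z_SYNC_FLUSH)))
    return sizes

fresh = fresh_compressor_sizes(commands)
shared = shared_stream_sizes(commands)
for cmd, f, s in zip(commands, fresh, shared):
    print(f"{len(cmd):4d} bytes -> fresh {f:4d}, shared stream {s:4d}")
```

With a shared stream, later commands that resemble earlier ones shrink to a handful of bytes, whereas the fresh compressor tends to make short commands larger than the original.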


_______________________________________________
lemonade mailing list
lemonade@ietf.org
https://www1.ietf.org/mailman/listinfo/lemonade