Re: Design Issue: GZIP flag on DATA Frames

Frédéric Kayser <f.kayser@free.fr> Tue, 21 May 2013 23:18 UTC

Hello,
HTML5 strongly favors UTF-8, and there are better ways than Deflate to compress Unicode text, bzip2 to start with. Deflate knows nothing about UTF-8 because it is byte oriented, and its search window is limited to 32 kilobytes. In UTF-8 a single Devanagari character (used for Hindi and other languages of India) takes 3 bytes, which seriously reduces how much text can actually serve as a reference for string matching. The same goes for other scripts such as Cyrillic (Russian, Ukrainian, Bulgarian…), Greek, Hebrew and Arabic…: they can no longer rely on single-byte charsets, and UTF-8 means 2 bytes per character for them.
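
To put rough numbers on the window argument, here is a small Python sketch that measures the UTF-8 width of one sample character per script (the characters are arbitrary picks) and divides Deflate's 32,768-byte window by it. It ignores spaces and ASCII punctuation, which stay single-byte, so real text fares somewhat better than these figures.

```python
# How many characters of history fit in Deflate's back-reference window
# when every character of a script takes the same number of UTF-8 bytes.
# The sample characters are arbitrary picks, one per script.
WINDOW_BYTES = 32 * 1024  # Deflate's maximum match distance (32,768 bytes)

SAMPLES = {
    "Latin": "a",
    "Cyrillic": "д",
    "Greek": "λ",
    "Hebrew": "ש",
    "Arabic": "ع",
    "Devanagari": "क",
}


def utf8_width(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def chars_in_window(ch: str, window: int = WINDOW_BYTES) -> int:
    """How many characters of this width fit in the window."""
    return window // utf8_width(ch)


for script, ch in SAMPLES.items():
    print(f"{script:<11} {utf8_width(ch)} byte(s)/char, "
          f"{chars_in_window(ch):>6} chars in window")
```

Pure Latin text gets 32,768 characters of history; Cyrillic, Greek, Hebrew or Arabic get 16,384; Devanagari only 10,922.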

For web performance, a compression scheme that could recognize base64 encoding (Data URI, RFC 2397) and undo/redo it, so that "binary blobs" inside text files are handled as binary, would be much appreciated.
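
A minimal sketch of the base64 idea in Python, assuming a made-up container format (the segment layout and the names `encode` and `decode` are illustrative, not part of any proposal). The encoder finds payloads following `;base64,`, keeps only those whose canonical re-encoding matches byte for byte, stores them as raw bytes, and Deflates the result; the decoder inflates and re-encodes.

```python
import base64
import re
import struct
import zlib

# Hypothetical container format (not an existing standard): the document is
# split into TEXT and BLOB segments, where a BLOB is the decoded payload of
# any run following ";base64," (as in an RFC 2397 data: URI).
PAYLOAD = re.compile(rb"(?<=;base64,)[A-Za-z0-9+/]+={0,2}")
TEXT, BLOB = 0, 1
HEADER = struct.Struct(">BI")  # 1-byte tag + 4-byte big-endian length


def _segment(tag: int, chunk: bytes) -> bytes:
    return HEADER.pack(tag, len(chunk)) + chunk


def encode(doc: bytes) -> bytes:
    """Decode canonical base64 payloads to raw bytes, then Deflate everything."""
    out = bytearray()
    pos = 0
    for m in PAYLOAD.finditer(doc):
        b64 = m.group(0)
        try:
            raw = base64.b64decode(b64, validate=True)
        except ValueError:  # binascii.Error, e.g. incorrect padding
            continue  # leave it inside the surrounding text segment
        if base64.b64encode(raw) != b64:
            continue  # non-canonical encoding would not round-trip exactly
        out += _segment(TEXT, doc[pos:m.start()])
        out += _segment(BLOB, raw)
        pos = m.end()
    out += _segment(TEXT, doc[pos:])
    return zlib.compress(bytes(out), 9)


def decode(packed: bytes) -> bytes:
    """Inflate, then re-encode BLOB segments to restore the original text."""
    data = zlib.decompress(packed)
    parts = []
    pos = 0
    while pos < len(data):
        tag, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        chunk = data[pos:pos + length]
        pos += length
        parts.append(base64.b64encode(chunk) if tag == BLOB else chunk)
    return b"".join(parts)


if __name__ == "__main__":
    blob = base64.b64encode(bytes(range(256)))  # stand-in for image bytes
    page = b'<img src="data:image/png;base64,' + blob + b'">'
    assert decode(encode(page)) == page
```

The canonical check is what keeps the transform lossless: a payload with line breaks or stray padding bits is simply left as text. Whether the raw form actually compresses smaller depends on the blob, so no figures are claimed here.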

Deflate lacks flexibility: it has neither a very fast mode à la LZ4, which would still give decent compression at a much lower CPU cost (no entropy coding), nor something heavier at the other end (LZMA-like).
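
One way to see the speed/ratio spread is a quick comparison with the compressors in Python's standard library. LZ4 is not among them (it needs a third-party package), so in this sketch Deflate's fastest level stands in for the low-CPU end; the codec list and the placeholder input are assumptions for illustration.

```python
import bz2
import lzma
import time
import zlib

# Compressors available in the Python standard library, from cheapest to
# heaviest. LZ4 is not in the stdlib, so Deflate's fastest level stands in
# for the low-CPU end here (it still does Huffman coding, unlike LZ4).
CODECS = [
    ("deflate level 1", lambda d: zlib.compress(d, 1), zlib.decompress),
    ("deflate level 9", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("bzip2 level 9", lambda d: bz2.compress(d, 9), bz2.decompress),
    ("xz (LZMA2) preset 6", lambda d: lzma.compress(d, preset=6), lzma.decompress),
]


def compare(data: bytes) -> list:
    """Return (codec name, compressed size in bytes, compression time in s)."""
    results = []
    for name, compress, decompress in CODECS:
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        assert decompress(packed) == data  # every codec must round-trip
        results.append((name, len(packed), elapsed))
    return results


if __name__ == "__main__":
    # Placeholder input: replace with a real HTML page for meaningful numbers.
    text = ("Привет, мир! नमस्ते दुनिया! " * 2000).encode("utf-8")
    for name, size, secs in compare(text):
        print(f"{name:<20} {size:>8} bytes {secs * 1000:>8.2f} ms")
```

The figures depend heavily on the input and the machine, so none are quoted here; the sketch only shows how such a comparison can be run.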

Deflate was a nice compression scheme in the 90s, but the World (Wide Web) has changed since then. Look at how archivers handle text files nowadays: they switch to PPMd, bzip2… because Deflate is outdated.

Compressing the headers is a good idea, but new compression schemes for the payload should not be overlooked.

Regards
Frédéric Kayser

On 21 May 2013, at 19:17, Poul-Henning Kamp wrote:

> In message <519BAB26.2010501@zinks.de>, Roland Zink writes:
> 
>> This seems to make the introduction of new compression schemes more complex.
> 
> And how plausible is it that any new compression scheme will ever
> make that worthwhile?
> 
> It's not nil, but it makes a convincing impression of nil.
> 
> -- 
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe    
> Never attribute to malice what can adequately be explained by incompetence.