Re: Design Issue: GZIP flag on DATA Frames

Frédéric Kayser <> Tue, 21 May 2013 23:18 UTC

HTML5 strongly favours UTF-8, and there are better ways than Deflate to compress Unicode text, bzip2 to start with. Deflate has no clue about UTF-8 since it is byte oriented, and its search window is limited to 32 kilobytes. In UTF-8 a single Devanagari character (used for Hindi and other languages of India) takes 3 bytes, which seriously reduces the amount of text that can actually serve as a reference for string matching. The same goes for other scripts like Cyrillic (Russian, Ukrainian, Bulgarian…), Greek, Hebrew and Arabic: they can no longer rely on single-byte charsets, and UTF-8 means 2 bytes per character for those.
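To put rough numbers on the window argument, here is a minimal Python sketch. The sample characters and the helper names (`utf8_width`, `chars_in_window`) are illustrative choices, not taken from any benchmark; it only counts how many characters of history fit in a 32 KiB window when every character has the same UTF-8 width.

```python
# Deflate's sliding window is 32 KiB, counted in bytes, not characters.
DEFLATE_WINDOW_BYTES = 32 * 1024

# One representative letter per script (arbitrary, illustrative picks).
SAMPLES = {
    "Latin": "a",
    "Cyrillic": "я",
    "Greek": "λ",
    "Hebrew": "ש",
    "Arabic": "ع",
    "Devanagari": "क",
}


def utf8_width(ch: str) -> int:
    """Number of bytes the character occupies once encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def chars_in_window(ch: str) -> int:
    """Characters of history the window holds if all text had this width."""
    return DEFLATE_WINDOW_BYTES // utf8_width(ch)


for script, ch in SAMPLES.items():
    print(f"{script:<11} {utf8_width(ch)} byte(s)/char, "
          f"{chars_in_window(ch)} chars of history")
```

Real pages mix markup (ASCII) with text, so the actual loss is smaller than the pure-script figures suggest.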

For web performance, a compression scheme that could recognize and reverse/redo base64 encoding (data URIs, RFC 2397) to handle "binary blobs" inside text files would be much appreciated.
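A rough sketch of what such a reverse/redo step could look like as a preprocessing pass, in Python. Everything here is hypothetical: the function names, the NUL-delimited placeholder (which assumes the input text contains no NUL bytes) and the simplified pattern (which ignores extra data URI parameters such as `;charset=`) are assumptions for illustration, not part of any existing scheme.

```python
import base64
import re

# Simplified data URI matcher: media type, ";base64,", then the payload.
DATA_URI = re.compile(
    rb"data:([\w.+-]+/[\w.+-]+);base64,([A-Za-z0-9+/]+={0,2})")
# Hypothetical placeholder: NUL, blob index, NUL.
PLACEHOLDER = re.compile(rb"\x00(\d+)\x00")


def unpack_data_uris(text: bytes):
    """Replace base64 payloads with placeholders; return (template, blobs)."""
    blobs = []

    def swap(match):
        payload = match.group(2)
        try:
            raw = base64.b64decode(payload)
        except ValueError:  # bad padding or length: leave the text untouched
            return match.group(0)
        # Only reverse when re-encoding reproduces the original bytes exactly,
        # otherwise the round trip would not be lossless.
        if base64.b64encode(raw) != payload:
            return match.group(0)
        blobs.append(raw)
        index = str(len(blobs) - 1).encode("ascii")
        return b"data:" + match.group(1) + b";base64,\x00" + index + b"\x00"

    return DATA_URI.sub(swap, text), blobs


def repack_data_uris(template: bytes, blobs) -> bytes:
    """Redo the base64 encoding, restoring the original text byte for byte."""
    return PLACEHOLDER.sub(
        lambda m: base64.b64encode(blobs[int(m.group(1))]), template)
```

The template and the raw blobs could then be compressed separately, so the text model never sees the 4-for-3 expanded base64 alphabet.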

Deflate lacks flexibility: it has no super fast mode à la LZ4 that would still provide decent compression at much lower CPU cost (no entropy coding), nor anything heavier at the other end (LZMA-like).
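For anyone who wants to see the trade-off on their own content, here is a small comparison sketch using only the Python standard library. Note the assumption: LZ4 is not in the standard library, so Deflate level 1 stands in as the "fast" end, which is not the same thing. The sample text is a repetitive placeholder; substitute a real page to get meaningful numbers.

```python
import bz2
import lzma
import time
import zlib

# (label, compress, decompress) for each codec under comparison.
CODECS = [
    ("deflate level 1", lambda d: zlib.compress(d, 1), zlib.decompress),
    ("deflate level 9", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("bzip2", bz2.compress, bz2.decompress),
    ("LZMA (xz)", lzma.compress, lzma.decompress),
]


def measure(name, compress, decompress, data: bytes):
    """Compress once, time it, and verify that the round trip is lossless."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == data
    return name, len(packed), elapsed


def compare(data: bytes):
    """Return a list of (name, compressed size in bytes, seconds)."""
    return [measure(name, c, d, data) for name, c, d in CODECS]


if __name__ == "__main__":
    # Placeholder input: highly repetitive, so ratios are unrealistically good.
    sample = ("नमस्ते दुनिया, यह एक परीक्षण है। " * 4000).encode("utf-8")
    print(f"input: {len(sample)} bytes")
    for name, size, seconds in compare(sample):
        print(f"{name:<16} {size:>8} bytes  {seconds * 1000:8.2f} ms")
```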

Deflate was a nice compression scheme in the 90s, but the World (Wide Web) has changed since then. Look at how archivers handle text files nowadays: they switch to PPMd, bzip2… because Deflate is outdated.

Compressing the headers is a good idea, but new compression schemes for the payload should not be overlooked.

Frédéric Kayser

On 21 May 2013 at 19:17, Poul-Henning Kamp wrote:

> In message <>, Roland Zink writes:
>> This seem to make the introduction of new compression schemes more complex.
> And what is the plausibility that any new compression schemes will ever
> make that worth-while ?
> It's not nill, but it makes a convincing impression of nill.
> -- 
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk@FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe    
> Never attribute to malice what can adequately be explained by incompetence.