Re: Delta Compression and UTF-8 Header Values

James Cloos <> Tue, 12 February 2013 23:53 UTC

From: James Cloos <>
To: Roberto Peon <>
Cc: James M Snell <>, Mark Nottingham <>, "ietf-http-wg\" <>
In-Reply-To: <> (Roberto Peon's message of "Mon, 11 Feb 2013 14:53:01 -0800")
User-Agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.3.50 (gnu/linux)
Copyright: Copyright 2013 James Cloos
OpenPGP: ED7DAEA6; url=
OpenPGP-Fingerprint: E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Date: Tue, 12 Feb 2013 18:41:09 -0500
Subject: Re: Delta Compression and UTF-8 Header Values

>>>>> "RP" == Roberto Peon <> writes:

RP> The header names are almost completely handled with the pre-seeded
RP> dictionary, so they really don't affect the character frequency
RP> count and/or thus the huffman encoding.

RP> Arithmetic coding gets better compression ratios, at the expense of
RP> gobs of CPU and complexity. I don't think that is a good tradeoff :/

It is sometimes hard to guess whether Huffman coding is chosen out of
inertia, angst over arithmetic-coding patents, or sound technical reasons.
It is good to know that in this case it is the latter.

I may not have expressed my primary point clearly enough, though:

Although I doubt that any header text today is both common enough to
warrant inclusion in a static table and not seven-bit clean, my point was
that even if such text appears over time, the fact that it is not
seven-bit clean should not prevent its inclusion in future, extended
versions of the static table.  As such, specifying header text as UTF-8
and using a static Huffman table need not be at odds with each other.
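To make that concrete, here is a rough sketch in Python.  It is not the
table or algorithm from any draft, and every name in it (build_static_table,
encode, decode, the training sample) is hypothetical.  The assumption is
simply that the static table is built over all 256 octet values rather than
over 7-bit ASCII, with add-one smoothing so that bytes absent from the
training sample (such as UTF-8 lead and continuation bytes) still receive a
code.  UTF-8 header values then round-trip with no special casing; rare
bytes merely get longer codes.  Bit strings are used instead of packed
bytes purely for readability.

```python
import heapq
from collections import Counter


def build_static_table(training: bytes) -> dict[int, str]:
    """Build a Huffman code covering every octet 0..255 (hypothetical helper)."""
    counts = Counter(training)
    # Heap entries are (weight, unique tiebreaker, tree).  A tree is either a
    # byte value (leaf) or a (left, right) pair.  Add-one smoothing gives
    # every octet, including UTF-8 bytes >= 0x80, a nonzero weight.
    heap = [(counts[b] + 1, b, b) for b in range(256)]
    heapq.heapify(heap)
    next_id = 256  # tiebreakers for internal nodes, never equal to a leaf's
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    codes: dict[int, str] = {}

    def walk(tree, prefix: str) -> None:
        if isinstance(tree, int):
            codes[tree] = prefix  # leaf: record the bit string for this byte
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")

    walk(heap[0][2], "")
    return codes


def encode(text: str, codes: dict[int, str]) -> str:
    # The coder only ever sees octets, so UTF-8 needs no special handling.
    return "".join(codes[b] for b in text.encode("utf-8"))


def decode(bits: str, codes: dict[int, str]) -> str:
    lookup = {code: byte for byte, code in codes.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:  # Huffman codes are prefix-free
            out.append(lookup[current])
            current = ""
    return out.decode("utf-8")


# Hypothetical training sample of mostly seven-bit header text.
training = b"gzip, deflate\r\ntext/html; charset=utf-8\r\nkeep-alive\r\n"
codes = build_static_table(training)
value = "naïve café"
assert decode(encode(value, codes), codes) == value
```

A static table trained only on today's ASCII-heavy headers still encodes
UTF-8 correctly; retraining it later on a corpus with common non-ASCII text
would shorten those codes without any change to the format.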

James Cloos <>         OpenPGP: 1024D/ED7DAEA6