Re: Delta Compression and UTF-8 Header Values

James Cloos <cloos@jhcloos.com> Tue, 12 February 2013 23:53 UTC

From: James Cloos <cloos@jhcloos.com>
To: Roberto Peon <grmocg@gmail.com>
Cc: James M Snell <jasnell@gmail.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
In-Reply-To: <CAP+FsNf2x-K0OFQVLOKsc+ZM+BUDJGygcnUH=buQm4yA2Su2cw@mail.gmail.com> (Roberto Peon's message of "Mon, 11 Feb 2013 14:53:01 -0800")
References: <CABP7RbfRLXPpL4=wip=FvqD3DM7BM8PXi7uRswHAusXUmPO_xw@mail.gmail.com> <CE65E38D-A482-4EA9-BAF4-F6498F643A78@mnot.net> <CABP7RbcRrjV7EhwoGbkWbYJEXeWOwH4gQuaCG7N0siQqeMtcag@mail.gmail.com> <m3sj52a61n.fsf@carbon.jhcloos.org> <CAP+FsNf2x-K0OFQVLOKsc+ZM+BUDJGygcnUH=buQm4yA2Su2cw@mail.gmail.com>
Copyright: Copyright 2013 James Cloos
OpenPGP: ED7DAEA6; url=http://jhcloos.com/public_key/0xED7DAEA6.asc
OpenPGP-Fingerprint: E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Date: Tue, 12 Feb 2013 18:41:09 -0500
Message-ID: <m38v6t6qxd.fsf@carbon.jhcloos.org>
Lines: 26
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: Delta Compression and UTF-8 Header Values
Archived-At: <http://www.w3.org/mid/m38v6t6qxd.fsf@carbon.jhcloos.org>

>>>>> "RP" == Roberto Peon <grmocg@gmail.com> writes:

RP> The header names are almost completely handled with the pre-seeded
RP> dictionary, so they really don't affect the character frequency
RP> count and/or thus the huffman encoding.

RP> Arithmetic coding gets better compression ratios, at the expense of
RP> gobs of CPU and complexity. I don't think that is a good tradeoff :/

It is sometimes hard to guess whether Huffman is chosen due to inertia,
angst over arithmetic-coding patents, or good technical reasons.  It is
good to know that in this case it is the last of these.

I may not have expressed my primary point clearly enough, though:

Although I doubt that right now there is any text in the headers which
is both common enough to warrant inclusion in a static table and not
seven-bit clean, my point was that even if such text shows up over time,
the fact that it is not seven-bit clean should not prevent its inclusion
in future, extended versions of the static table.  As such, specifying
that text is defined to be UTF-8 and using a static Huffman table should
not rule each other out.
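
To illustrate why the two need not conflict, here is a minimal sketch,
assuming the Huffman code is defined over octets rather than over
seven-bit characters.  The function names and the sample text are
hypothetical, chosen only for illustration; this is not the table or
the algorithm from any draft.  Every octet value gets a code, so UTF-8
multi-byte sequences remain encodable and merely cost more bits while
they are rare.

```python
import heapq
from collections import Counter


def build_byte_huffman(sample: bytes) -> dict[int, str]:
    """Build a Huffman code over all 256 octet values (illustrative only).

    Each octet is given a count of at least 1, so octets >= 0x80, as used
    in UTF-8 multi-byte sequences, always receive a code.
    """
    counts = Counter(sample)
    # Heap entries: (weight, unique tie-breaker, {octet: code so far}).
    heap = [(counts.get(b, 0) + 1, b, {b: ""}) for b in range(256)]
    heapq.heapify(heap)
    tie = 256
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1 and merge them.
        merged = {b: "0" + code for b, code in left.items()}
        merged.update({b: "1" + code for b, code in right.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text: str, table: dict[int, str]) -> str:
    """Encode text as a bit string by coding its UTF-8 octets."""
    return "".join(table[b] for b in text.encode("utf-8"))


def decode(bits: str, table: dict[int, str]) -> str:
    """Decode a bit string back to text; valid because the code is prefix-free."""
    inverse = {code: b for b, code in table.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out.decode("utf-8")


# Hypothetical training sample standing in for typical header values.
table = build_byte_huffman(b"gzip, deflate text/html; charset=utf-8" * 10)
value = 'attachment; filename="na\u00efve r\u00e9sum\u00e9.txt"'
round_trip = decode(encode(value, table), table)
```

A later, extended table would only change the weights: if some
non-ASCII text became common, its octets would simply be assigned
shorter codes.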

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6