Re: Delta Compression and UTF-8 Header Values

James Cloos <cloos@jhcloos.com> Mon, 11 February 2013 21:43 UTC

From: James Cloos <cloos@jhcloos.com>
To: James M Snell <jasnell@gmail.com>
Cc: Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
In-Reply-To: <CABP7RbcRrjV7EhwoGbkWbYJEXeWOwH4gQuaCG7N0siQqeMtcag@mail.gmail.com> (James M. Snell's message of "Fri, 8 Feb 2013 17:10:02 -0800")
References: <CABP7RbfRLXPpL4=wip=FvqD3DM7BM8PXi7uRswHAusXUmPO_xw@mail.gmail.com> <CE65E38D-A482-4EA9-BAF4-F6498F643A78@mnot.net> <CABP7RbcRrjV7EhwoGbkWbYJEXeWOwH4gQuaCG7N0siQqeMtcag@mail.gmail.com>
User-Agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.3.50 (gnu/linux)
Face: iVBORw0KGgoAAAANSUhEUgAAABAAAAAQAgMAAABinRfyAAAACVBMVEX///8ZGXBQKKnCrDQ3 AAAAJElEQVQImWNgQAAXzwQg4SKASgAlXIEEiwsSIYBEcLaAtMEAADJnB+kKcKioAAAAAElFTkSu QmCC
Copyright: Copyright 2013 James Cloos
OpenPGP: ED7DAEA6; url=http://jhcloos.com/public_key/0xED7DAEA6.asc
OpenPGP-Fingerprint: E9E9 F828 61A4 6EA9 0F2B 63E7 997A 9F17 ED7D AEA6
Date: Mon, 11 Feb 2013 16:34:03 -0500
Message-ID: <m3sj52a61n.fsf@carbon.jhcloos.org>
Lines: 21
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: Delta Compression and UTF-8 Header Values
Archived-At: <http://www.w3.org/mid/m3sj52a61n.fsf@carbon.jhcloos.org>

>>>>> "JMS" == James M Snell <jasnell@gmail.com> writes:

JMS> we'll be able to huffman code anything that is flagged
JMS> as ASCII, and won't be able to touch the rest.

Would that really be an issue?  The static Huffman table can only really
cover the common strings, yes?  Which mostly means the header names and
not the header values?  So even if the headers were limited to ASCII,
the table wouldn't help much for most of the values?

(As an aside, would arithmetic coding be of any better value than
Huffman here?)
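
One rough way to get a feel for that aside: a Huffman code spends a whole
number of bits on every symbol, while an arithmetic coder can approach the
entropy of the symbol distribution.  The sketch below compares the two
figures for a sample of header bytes.  It is only an illustration; the
function names are made up here, and it uses the order-0 byte statistics
of the sample itself rather than any static table from a draft.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def bits_per_symbol(sample: bytes):
    """Return (Huffman average, Shannon entropy) in bits per byte."""
    freqs = Counter(sample)
    total = len(sample)
    lengths = huffman_code_lengths(freqs)
    huffman = sum(freqs[s] * lengths[s] for s in freqs) / total
    entropy = -sum((n / total) * math.log2(n / total) for n in freqs.values())
    return huffman, entropy


if __name__ == "__main__":
    sample = b"accept-encoding: gzip, deflate\r\naccept-language: en-US\r\n"
    huffman, entropy = bits_per_symbol(sample)
    print(f"Huffman: {huffman:.3f} bits/byte, entropy: {entropy:.3f} bits/byte")
```

The gap between the two numbers is the most an ideal arithmetic coder
could gain under the same model; it is always less than one bit per
symbol and is largest when a few symbols dominate.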

Using one bit for each string to specify UTF-8 text blob vs. binary blob,
and using the former for everything known to be text, seems the best
overall choice.  And if any non-ASCII UTF-8 sequences become common
enough, they can be added to future revisions of the static table just
as easily as 7-bit strings can be.
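
A minimal sketch of what that one-bit flag could look like on the wire.
The layout here is purely an assumption for illustration (a 16-bit
prefix whose top bit is the flag and whose low 15 bits are the byte
length); it is not taken from any draft, and the names are hypothetical.

```python
TEXT, BINARY = 0, 1  # hypothetical values for the one-bit type flag
MAX_LEN = 0x7FFF     # 15 bits of length in this sketch


def encode_string(value):
    """Encode str as a UTF-8 text blob and bytes as a binary blob."""
    if isinstance(value, str):
        flag, payload = TEXT, value.encode("utf-8")
    else:
        flag, payload = BINARY, bytes(value)
    if len(payload) > MAX_LEN:
        raise ValueError("payload too long for this sketch")
    # Top bit of the 16-bit prefix is the flag; the rest is the length.
    prefix = (flag << 15) | len(payload)
    return prefix.to_bytes(2, "big") + payload


def decode_string(data: bytes):
    """Inverse of encode_string: returns str for text, bytes for binary."""
    if len(data) < 2:
        raise ValueError("missing prefix")
    prefix = int.from_bytes(data[:2], "big")
    flag, length = prefix >> 15, prefix & MAX_LEN
    payload = data[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    # Text blobs must be valid UTF-8; binary blobs pass through untouched.
    return payload.decode("utf-8") if flag == TEXT else payload


if __name__ == "__main__":
    for value in ("text/html", "caf\u00e9", b"\xff\x00\x80"):
        wire = encode_string(value)
        assert decode_string(wire) == value
        print(wire.hex())
```

A compressor that sees the text flag knows the payload is valid UTF-8
and can try table lookups or entropy coding on it; anything flagged
binary is simply carried as-is.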

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6