Re: Delta Compression and UTF-8 Header Values
Willy Tarreau <w@1wt.eu> Sat, 09 February 2013 15:00 UTC
Return-Path: <ietf-http-wg-request@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id B2AC821F8887 for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Sat, 9 Feb 2013 07:00:37 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.31
X-Spam-Level:
X-Spam-Status: No, score=-10.31 tagged_above=-999 required=5 tests=[AWL=0.137, BAYES_00=-2.599, RCVD_IN_DNSWL_HI=-8, SARE_SUB_ENC_UTF8=0.152]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5LnVqd4WyM0I for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Sat, 9 Feb 2013 07:00:37 -0800 (PST)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) by ietfa.amsl.com (Postfix) with ESMTP id 4026E21F884A for <httpbisa-archive-bis2Juki@lists.ietf.org>; Sat, 9 Feb 2013 07:00:37 -0800 (PST)
Received: from lists by frink.w3.org with local (Exim 4.72) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1U4Btl-00085x-0H for ietf-http-wg-dist@listhub.w3.org; Sat, 09 Feb 2013 14:59:25 +0000
Resent-Date: Sat, 09 Feb 2013 14:59:25 +0000
Resent-Message-Id: <E1U4Btl-00085x-0H@frink.w3.org>
Received: from lisa.w3.org ([128.30.52.41]) by frink.w3.org with esmtp (Exim 4.72) (envelope-from <w@1wt.eu>) id 1U4Bte-00085H-ED for ietf-http-wg@listhub.w3.org; Sat, 09 Feb 2013 14:59:18 +0000
Received: from 1wt.eu ([62.212.114.60]) by lisa.w3.org with esmtp (Exim 4.72) (envelope-from <w@1wt.eu>) id 1U4Btd-0007BQ-7G for ietf-http-wg@w3.org; Sat, 09 Feb 2013 14:59:18 +0000
Received: (from willy@localhost) by mail.home.local (8.14.4/8.14.4/Submit) id r19EwYM1009341; Sat, 9 Feb 2013 15:58:34 +0100
Date: Sat, 09 Feb 2013 15:58:34 +0100
From: Willy Tarreau <w@1wt.eu>
To: Martin Nilsson <nilsson@opera.com>
Cc: ietf-http-wg@w3.org
Message-ID: <20130209145834.GB8712@1wt.eu>
References: <CABP7RbfRLXPpL4=wip=FvqD3DM7BM8PXi7uRswHAusXUmPO_xw@mail.gmail.com> <CE65E38D-A482-4EA9-BAF4-F6498F643A78@mnot.net> <511642E9.9010607@it.aoyama.ac.jp> <20130209133341.GA8712@1wt.eu> <op.wr8se6rpiw9drz@uranium.westinmy-starwoodgp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <op.wr8se6rpiw9drz@uranium.westinmy-starwoodgp.com>
User-Agent: Mutt/1.4.2.3i
Received-SPF: pass client-ip=62.212.114.60; envelope-from=w@1wt.eu; helo=1wt.eu
X-W3C-Hub-Spam-Status: No, score=-4.0
X-W3C-Hub-Spam-Report: AWL=-2.080, BAYES_00=-1.9, RP_MATCHES_RCVD=-0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001
X-W3C-Scan-Sig: lisa.w3.org 1U4Btd-0007BQ-7G 4b24bdb170f6077b569a62bbf165176f
X-Original-To: ietf-http-wg@w3.org
Subject: Re: Delta Compression and UTF-8 Header Values
Archived-At: <http://www.w3.org/mid/20130209145834.GB8712@1wt.eu>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/16486
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>
On Sat, Feb 09, 2013 at 03:12:32PM +0100, Martin Nilsson wrote: > On Sat, 09 Feb 2013 14:33:41 +0100, Willy Tarreau <w@1wt.eu> wrote: > > >Also, processing it is > >particularly inefficient as you have to parse each and every byte to find > >a length, making string comparisons quite slow. > > You don't need to know the length in characters to compare strings. Just > comparing byte on byte works fine. This is exactly what you want to avoid when comparing with lots of strings. It's generally more efficient to first compare lengths, then byte per byte only if lengths match. This is equally true when checking for some regex patterns such as "/cache/dir/../..../" where "." denotes a character. And last but not least, the Boyer-Moore search is much less efficient with UTF-8 encoding than what it is with non-encoded data. I'm really all for just transporting raw data as much as possible, that only the two ends need to understand and agree upon when it comes to the encoding. However, if some data come from commonly UTF-8 encoded sources, then I'd rather keep them as-is than having to re-encode them. Willy
- Re: Delta Compression and UTF-8 Header Values Mark Nottingham
- Re: Delta Compression and UTF-8 Header Values James M Snell
- Re: Delta Compression and UTF-8 Header Values Adrien W. de Croy
- Delta Compression and UTF-8 Header Values James M Snell
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values James M Snell
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Roberto Peon
- Re: Delta Compression and UTF-8 Header Values James M Snell
- Re: Delta Compression and UTF-8 Header Values Bjoern Hoehrmann
- Re: Delta Compression and UTF-8 Header Values Martin J. Dürst
- Re: Delta Compression and UTF-8 Header Values Martin J. Dürst
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Willy Tarreau
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Martin Nilsson
- Re: Delta Compression and UTF-8 Header Values Martin Nilsson
- Re: Delta Compression and UTF-8 Header Values Albert Lunde
- Re: Delta Compression and UTF-8 Header Values Willy Tarreau
- Re: Delta Compression and UTF-8 Header Values Willy Tarreau
- Re: Delta Compression and UTF-8 Header Values Nico Williams
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Adrien W. de Croy
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Martin J. Dürst
- Re: Delta Compression and UTF-8 Header Values Martin J. Dürst
- Re: Delta Compression and UTF-8 Header Values Martin J. Dürst
- Re: Delta Compression and UTF-8 Header Values Roberto Peon
- Re: Delta Compression and UTF-8 Header Values Frédéric Kayser
- Re: Delta Compression and UTF-8 Header Values James M Snell
- Re: Delta Compression and UTF-8 Header Values Frédéric Kayser
- Re: Delta Compression and UTF-8 Header Values Roberto Peon
- Re: Delta Compression and UTF-8 Header Values Willy Tarreau
- Re: Delta Compression and UTF-8 Header Values James M Snell
- Re: Delta Compression and UTF-8 Header Values Frédéric Kayser
- Re: Delta Compression and UTF-8 Header Values Roberto Peon
- Re: Delta Compression and UTF-8 Header Values Nico Williams
- Re: Delta Compression and UTF-8 Header Values Roberto Peon
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Julian Reschke
- Re: Delta Compression and UTF-8 Header Values Julian Reschke
- Re: Delta Compression and UTF-8 Header Values Julian Reschke
- Re: Delta Compression and UTF-8 Header Values Willy Tarreau
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Willy Tarreau
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Mark Nottingham
- Re: Delta Compression and UTF-8 Header Values Roberto Peon
- Re: Delta Compression and UTF-8 Header Values Zhong Yu
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Zhong Yu
- Re: Delta Compression and UTF-8 Header Values Zhong Yu
- Re: Delta Compression and UTF-8 Header Values Zhong Yu
- Re: Delta Compression and UTF-8 Header Values Nico Williams
- Re: Delta Compression and UTF-8 Header Values Nico Williams
- Re: Delta Compression and UTF-8 Header Values Poul-Henning Kamp
- Re: Delta Compression and UTF-8 Header Values Nico Williams
- Re: Delta Compression and UTF-8 Header Values Nico Williams
- Re: Delta Compression and UTF-8 Header Values Phillip Hallam-Baker
- Re: Delta Compression and UTF-8 Header Values James Cloos
- Re: Delta Compression and UTF-8 Header Values Roberto Peon
- Re: Delta Compression and UTF-8 Header Values James Cloos
- Re: Delta Compression and UTF-8 Header Values Roberto Peon