Re: Delta Compression and UTF-8 Header Values

Willy Tarreau <w@1wt.eu> Sat, 09 February 2013 15:00 UTC

Resent-Date: Sat, 09 Feb 2013 14:59:25 +0000
Resent-Message-Id: <E1U4Btl-00085x-0H@frink.w3.org>
Date: Sat, 09 Feb 2013 15:58:34 +0100
From: Willy Tarreau <w@1wt.eu>
To: Martin Nilsson <nilsson@opera.com>
Cc: ietf-http-wg@w3.org
Message-ID: <20130209145834.GB8712@1wt.eu>
References: <CABP7RbfRLXPpL4=wip=FvqD3DM7BM8PXi7uRswHAusXUmPO_xw@mail.gmail.com> <CE65E38D-A482-4EA9-BAF4-F6498F643A78@mnot.net> <511642E9.9010607@it.aoyama.ac.jp> <20130209133341.GA8712@1wt.eu> <op.wr8se6rpiw9drz@uranium.westinmy-starwoodgp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <op.wr8se6rpiw9drz@uranium.westinmy-starwoodgp.com>
User-Agent: Mutt/1.4.2.3i
Received-SPF: pass client-ip=62.212.114.60; envelope-from=w@1wt.eu; helo=1wt.eu
Subject: Re: Delta Compression and UTF-8 Header Values
Archived-At: <http://www.w3.org/mid/20130209145834.GB8712@1wt.eu>
Resent-From: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list

On Sat, Feb 09, 2013 at 03:12:32PM +0100, Martin Nilsson wrote:
> On Sat, 09 Feb 2013 14:33:41 +0100, Willy Tarreau <w@1wt.eu> wrote:
> 
> >Also, processing it is
> >particularly inefficient as you have to parse each and every byte to find
> >a length, making string comparisons quite slow.
> 
> You don't need to know the length in characters to compare strings. Just  
> comparing byte on byte works fine.

This is exactly what you want to avoid when comparing against lots of strings.
It's generally more efficient to compare lengths first, then compare byte by
byte only if the lengths match. The same is true when checking some regex
patterns such as "/cache/dir/../..../", where "." denotes one character: with
a one-byte-per-character encoding the expected length is known in advance,
with UTF-8 it is not. And last but not least, Boyer-Moore search is much less
efficient on UTF-8 encoded data than on non-encoded data.
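
A minimal Python sketch of the first two points, assuming header values are
held as raw bytes. The function names and sample values are hypothetical and
only for illustration; this is not code from any HTTP implementation. It shows
a length check rejecting candidates before any byte is compared. It also shows
that a pattern where "." stands for one character maps to a single byte length
in a one-byte-per-character encoding, but not in UTF-8. There, every byte has
to be scanned to count characters.

```python
from typing import List

# Hypothetical helpers, assuming header values are raw bytes.


def find_exact(candidate: bytes, known: List[bytes]) -> int:
    """Return the index of candidate in known, or -1.

    Lengths are compared first; the byte-by-byte comparison only runs
    for entries whose length already matches."""
    n = len(candidate)
    for i, k in enumerate(known):
        if len(k) != n:
            continue  # rejected without looking at any content byte
        if k == candidate:  # byte-by-byte comparison, same-length entries only
            return i
    return -1


def utf8_char_count(data: bytes) -> int:
    """Count code points by scanning every byte. Continuation bytes
    (0b10xxxxxx) do not start a new character. Assumes valid UTF-8."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def matches_dot_pattern(data: bytes, pattern: str, utf8: bool) -> bool:
    """Match a pattern where '.' stands for exactly one character."""
    if not utf8:
        # One byte per character: the byte length must equal the pattern
        # length, so most mismatches are rejected by a single comparison.
        if len(data) != len(pattern):
            return False
        chars = data.decode("latin-1")
    else:
        # UTF-8: a character takes 1 to 4 bytes, so the byte length only
        # bounds the character count; a full scan is needed to get it.
        if not len(pattern) <= len(data) <= 4 * len(pattern):
            return False
        if utf8_char_count(data) != len(pattern):
            return False
        chars = data.decode("utf-8")
    return all(p == "." or p == c for p, c in zip(pattern, chars))


if __name__ == "__main__":
    pat = "/cache/dir/../..../"
    print(find_exact(b"gzip", [b"deflate", b"gzip", b"br"]))
    print(matches_dot_pattern(b"/cache/dir/ab/cdef/", pat, utf8=False))
    value = "/cache/dir/\u00e91/cdef/".encode("utf-8")
    print(matches_dot_pattern(value, pat, utf8=True))
    print(matches_dot_pattern(value, pat, utf8=False))
```

In the sample value, "é" takes two bytes in UTF-8. The value therefore has
19 characters but 20 bytes. The cheap byte-length check alone rejects it
unless the value is decoded first.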

I'm really all for transporting raw data as much as possible, data that only
the two ends need to understand and agree upon when it comes to the encoding.
However, if some data come from sources that are commonly UTF-8 encoded, I'd
rather keep them as-is than re-encode them.

Willy
