Re: Delta Compression and UTF-8 Header Values
Zhong Yu <zhong.j.yu@gmail.com> Sun, 10 February 2013 23:27 UTC
In-Reply-To: <CACuKZqEhzqY8ksBVSdYPsrVbNNxwg-yWp=JorWANJ0UjqyQ2dw@mail.gmail.com>
References: <CABP7RbfRLXPpL4=wip=FvqD3DM7BM8PXi7uRswHAusXUmPO_xw@mail.gmail.com> <CE65E38D-A482-4EA9-BAF4-F6498F643A78@mnot.net> <511642E9.9010607@it.aoyama.ac.jp> <20130209133341.GA8712@1wt.eu> <511729F6.6000201@it.aoyama.ac.jp> <20130210072642.GN8712@1wt.eu> <CACuKZqEhzqY8ksBVSdYPsrVbNNxwg-yWp=JorWANJ0UjqyQ2dw@mail.gmail.com>
Date: Sun, 10 Feb 2013 17:25:36 -0600
Message-ID: <CACuKZqEsSaPLvFtTpLDSD8y3d2X2wdtQAqFciNESxLNk7ipTHw@mail.gmail.com>
From: Zhong Yu <zhong.j.yu@gmail.com>
To: Willy Tarreau <w@1wt.eu>
Cc: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Mark Nottingham <mnot@mnot.net>, James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Subject: Re: Delta Compression and UTF-8 Header Values
Archived-At: <http://www.w3.org/mid/CACuKZqEsSaPLvFtTpLDSD8y3d2X2wdtQAqFciNESxLNk7ipTHw@mail.gmail.com>
On Sun, Feb 10, 2013 at 4:58 PM, Zhong Yu <zhong.j.yu@gmail.com> wrote:
> On Sun, Feb 10, 2013 at 1:26 AM, Willy Tarreau <w@1wt.eu> wrote:
>> Hello Martin,
>>
>> On Sun, Feb 10, 2013 at 02:02:46PM +0900, "Martin J. Dürst" wrote:
>>> >The encoding can
>>> >become inefficient to transport for other charsets by inflating data by up
>>> >to 50%
>>>
>>> Well, that's actually an urban myth. The 50% is for CJK
>>> (Chinese/Japanese/Korean).
>>
>> With the fast development of China, it is perfectly imaginable that
>> in 10 years, a significant portion of the web traffic is made with
>> Chinese URLs, so we must not ignore that.
>
> The problem with Chinese characters in URLs is %-encoding:
>
>     %##%##%##
>
> 9 bytes for a single Chinese character, where ideally 2 bytes should suffice.
>
> However, this is a URI issue, not an HTTP issue. Is HTTP going to
> unilaterally "upgrade" the URI format? That is possible, but it seems a
> big step, and it'll only decrease interop for some coming years.

... and I did not know about IRI... Is HTTP2 going to adopt IRI?

>
> From my perspective, URLs are not a priority to optimize; they are
> usually not that big; servers can unilaterally use a more efficient
> encoding method for special chars. Maybe we should refrain from
> trying to change URI syntax.
>
> Zhong Yu
>
>>
>>> For the languages/scripts of India, South
>>> East Asia, and a few more places, it can be 200%. (For texts purely in
>>> an alphabet in the Supplemental planes such as Old Italic, Shavian,
>>> Osmanya,..., it can be 300%, but I guess we can ignore these.) But these
>>> numbers only apply to cases that don't contain any ASCII at all.
>>
>> I don't see how this is possible since you have 6 bits of data per byte
>> plus a few bits on the first byte, and you need 3 bytes to transport 16
>> bits, which is 50% for me :-)
>>
>>> >and may make compression less efficient.
>>>
>>> That depends very much on the method of compression that's used.
>>
>> I agree, but adding unused bits or entropy in general will make compression
>> algorithms less efficient.
>>
>>> >I'm not saying I'm totally against UTF-8 in HTTP/2 (even though I hate
>>> >using it), I'm saying that it's not *THE* solution to every problem.
>>> >It's just *A* solution to *A* problem : "how to extend character sets
>>> >in existing documents without having to re-encode them all". I don't
>>> >think this specific problem is related to the scope of the HTTP/2 work,
>>> >so at first glance, I'd say that UTF-8 doesn't seem to solve a known
>>> >problem here.
>>>
>>> The fact that I mentioned Websockets may have led to a
>>> misunderstanding. I'm not proposing to use UTF-8 only in bodies, just in
>>> headers (I wouldn't object, though). My understanding was that James was
>>> talking about headers, and I was doing so, too.
>>
>> I was talking about header values too. As a developer of intermediaries,
>> I'm not interested in the body at all. I'm seeing people do ugly things
>> all the time, like regex-matching hosts with ".*\.example\.com" without
>> being aware how slow it is to do that on each and every Host header field.
>> Typically doing that with a UTF-8 aware library is even slower.
>>
>> That's why I'm having some concerns.
>>
>> Ideally, everything we transport should be in its original form. If hosts
>> come from DNS, they should appear encoded as they were returned by the DNS
>> server (even with the ugly IDN format). If paths are supposed to be UTF-8,
>> let them be sent in their raw original UTF-8 form without changing the
>> format. But then we don't want to mix Host and path, and we want to put as
>> a first rule that only the shortest forms are allowed. If most header fields
>> are pure ASCII (eg: encodings), declare them as such. If some header fields
>> are enums, use enums and not text. Etc...
>>
>> Regards,
>> Willy
>>
>>
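The byte counts argued over in this message (9 bytes for a %-encoded Chinese character, 50% inflation for CJK, 200% for some other scripts) can be checked with a small sketch. It is only an illustration under stated assumptions: the thread names no particular legacy encodings, so GBK and TIS-620 are picked here as stand-ins for a 2-byte CJK encoding and a 1-byte Thai encoding, and the helper names `sizes` and `inflation` are hypothetical.

```python
from urllib.parse import quote


def sizes(text, legacy):
    """Return byte counts for `text` in three representations.

    `legacy` is an assumed non-Unicode encoding chosen for comparison;
    it is not something the thread itself specifies.
    """
    return {
        "legacy": len(text.encode(legacy)),    # bytes in the legacy charset
        "utf8": len(text.encode("utf-8")),     # bytes in raw UTF-8
        "percent": len(quote(text, safe="")),  # bytes once %-encoded (UTF-8 based)
    }


def inflation(s):
    """Percentage growth of UTF-8 over the legacy encoding."""
    return 100 * (s["utf8"] - s["legacy"]) / s["legacy"]


han = sizes("中", "gbk")       # one Chinese character, 2 bytes in GBK
thai = sizes("ก", "tis_620")   # one Thai character, 1 byte in TIS-620

print(han, inflation(han))
print(thai, inflation(thai))
```

Under these assumptions the Chinese character takes 2 bytes in GBK, 3 in UTF-8 and 9 when %-encoded, which is the "%##%##%##" case above. The Thai character goes from 1 byte to 3, which is where a 200% figure comes from, while the 50% figure corresponds to going from 2 bytes to 3.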