Re: Delta Compression and UTF-8 Header Values

"Poul-Henning Kamp" <phk@phk.freebsd.dk> Sun, 10 February 2013 09:40 UTC

To: Frédéric Kayser <f.kayser@free.fr>
cc: ietf-http-wg@w3.org
In-reply-to: <A4C04DB9-2524-49EC-8774-AF2EBF3EA350@free.fr>
From: Poul-Henning Kamp <phk@phk.freebsd.dk>
References: <CABP7RbfRLXPpL4=wip=FvqD3DM7BM8PXi7uRswHAusXUmPO_xw@mail.gmail.com> <CE65E38D-A482-4EA9-BAF4-F6498F643A78@mnot.net> <511642E9.9010607@it.aoyama.ac.jp> <20130209133341.GA8712@1wt.eu> <op.wr8se6rpiw9drz@uranium.westinmy-starwoodgp.com> <A4C04DB9-2524-49EC-8774-AF2EBF3EA350@free.fr>
Date: Sun, 10 Feb 2013 09:38:03 +0000
Message-ID: <79640.1360489083@critter.freebsd.dk>
Subject: Re: Delta Compression and UTF-8 Header Values
Archived-At: <http://www.w3.org/mid/79640.1360489083@critter.freebsd.dk>

--------
In message <A4C04DB9-2524-49EC-8774-AF2EBF3EA350@free.fr>, Frédéric Kayser
writes:

>Comparing Unicode strings without prior normalisation can lead to
>surprising results: "Frédéric" and "Frédéric" [...]

Indeed they do :)  (Yes, I noticed your second email)
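A minimal sketch of Frédéric's point, assuming the two spellings are the precomposed (NFC) and decomposed (NFD) forms of the name. As UTF-8 they are different byte strings, so any byte-wise comparison (Python's bytes equality, or memcmp() in C) reports them as different; only an endpoint that chooses to normalise treats them as the same name. The variable names are illustrative only.

```python
import unicodedata

# "Frédéric" with precomposed é (U+00E9), i.e. NFC form.
nfc = unicodedata.normalize("NFC", "Fr\u00e9d\u00e9ric")
# The same name with e + U+0301 COMBINING ACUTE ACCENT, i.e. NFD form.
nfd = unicodedata.normalize("NFD", nfc)

nfc_bytes = nfc.encode("utf-8")  # each é is 2 bytes: 0xC3 0xA9
nfd_bytes = nfd.encode("utf-8")  # each é is 3 bytes: 0x65 0xCC 0x81

print(len(nfc_bytes), len(nfd_bytes))  # 10 12
print(nfc_bytes == nfd_bytes)          # False: a byte compare sees two values
# Only a party that decodes and normalises sees the same name.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```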

But where exactly in the HTTP/2 protocol do you want to introduce
diacritics?

	Cäché-Cöntrøl: Not-in-Europe

Anyone?
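For reference, a sketch of why the example above cannot occur on the wire, assuming the HTTP/1.1 token grammar for field names (RFC 2616 "token", restated as "tchar" in RFC 7230): only ASCII letters, digits and a handful of punctuation characters are allowed. TCHAR and is_valid_field_name are hypothetical names chosen for this illustration.

```python
import string

# tchar from the HTTP/1.1 token grammar: ASCII letters, digits and
# the punctuation characters below. Nothing outside US-ASCII.
TCHAR = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")

def is_valid_field_name(name: str) -> bool:
    """Return True if name is a non-empty HTTP/1.1 token."""
    return len(name) > 0 and all(ch in TCHAR for ch in name)

print(is_valid_field_name("Cache-Control"))  # True
print(is_valid_field_name("Cäché-Cöntrøl"))  # False: ä, é, ö, ø are not tchar
```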


Willy writes:

> With the fast development of China, it is perfectly imaginable that
> in 10 years, a significant portion of the web traffic is made with
> Chinese URLs, so we must not ignore that.

I would just ignore it.

The only two places that care about the character set of the URL
are the ultimate client and the ultimate server; to everybody else
it is just a sequence of opaque bits, which they must treat as an
indivisible unit.

Could somebody please give me an example of exactly where in an HTTP/2
protocol session an HTTP protocol entity (as opposed to the
ultimate consumer or producer of the information) cannot simply
use memcmp(), but needs to know the character-set encoding?
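To make the memcmp() argument concrete, here is a minimal sketch of how a hypothetical delta-compressing intermediary could decide which header values changed since the previous message using nothing but byte equality (the Python analogue of memcmp()). The function header_delta and the sample header sets are made up for illustration and are not taken from any HTTP/2 draft; note that the UTF-8 URL bytes are never decoded.

```python
def header_delta(previous: dict, current: dict):
    """Compare two {name: value} maps of bytes; return (changed, removed).

    Values are treated as opaque byte strings: no decoding, no
    normalisation, just byte-for-byte equality.
    """
    changed = {name: value for name, value in current.items()
               if previous.get(name) != value}
    removed = [name for name in previous if name not in current]
    return changed, removed

# Made-up example: the Referer carries raw UTF-8 bytes the proxy never decodes.
url = "http://example.com/北京".encode("utf-8")
prev = {b"referer": url, b"cache-control": b"max-age=60", b"x-old": b"1"}
cur = {b"referer": url, b"cache-control": b"no-cache", b"accept": b"*/*"}

changed, removed = header_delta(prev, cur)
print(changed)  # cache-control and accept changed; referer is identical bytes
print(removed)  # [b'x-old']
```

Whether those bytes spell a Chinese, French or ASCII URL makes no difference to this decision; only the endpoints need to interpret them.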

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.