Re: Delta Compression and UTF-8 Header Values
Nico Williams <nico@cryptonector.com> Mon, 11 February 2013 00:11 UTC
In-Reply-To: <CACuKZqHMQdktfOU3PJC=X-G8R=BQ40bhFJw=ZTfeSpem9L=GEw@mail.gmail.com>
References: <CABP7RbfRLXPpL4=wip=FvqD3DM7BM8PXi7uRswHAusXUmPO_xw@mail.gmail.com> <6372.1360352116@critter.freebsd.dk> <51164503.2030709@it.aoyama.ac.jp> <58832.1360414202@critter.freebsd.dk> <511726A5.5030302@it.aoyama.ac.jp> <79576.1360488507@critter.freebsd.dk> <51176C95.1040308@gmx.de> <79780.1360491855@critter.freebsd.dk> <CACuKZqHMQdktfOU3PJC=X-G8R=BQ40bhFJw=ZTfeSpem9L=GEw@mail.gmail.com>
Date: Sun, 10 Feb 2013 18:09:14 -0600
Message-ID: <CAK3OfOi+cXMLGsMCpD1cRBxzz46wVYYj8nz021fhqhM7fTDMWA@mail.gmail.com>
From: Nico Williams <nico@cryptonector.com>
To: Zhong Yu <zhong.j.yu@gmail.com>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Julian Reschke <julian.reschke@gmx.de>, "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>, James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Archived-At: <http://www.w3.org/mid/CAK3OfOi+cXMLGsMCpD1cRBxzz46wVYYj8nz021fhqhM7fTDMWA@mail.gmail.com>

On Sun, Feb 10, 2013 at 4:49 PM, Zhong Yu <zhong.j.yu@gmail.com> wrote:
> On Sun, Feb 10, 2013 at 4:24 AM, Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
>>>1) Filenames in Content-Disposition
>>
>> These only have meaning to the ultimate destinations, and if their
>> filesystems don't support UTF-8, they'll have to do $something anyway.

The filesystems pretty much do support either UTF-8 or "just-use-8".
In general "just-use-8" only really interops if everyone uses the same
codeset, and the only codeset we have that can be used universally
is... Unicode.

>> Nobody in the HTTP/2 protocol-chain can do anything but treat this
>> as an opaque bytestring.
>
> But how does the 2 ends agree on which encoding to use? It might be
> easier if HTTP just dictate UTF-8.

Not might be. Will be.

We've done this in many other protocols. In general we must either
tag text with codeset metadata or declare that Unicode (UTF-8,
generally) SHALL be used in the middle (pushing codeset conversions
to the edge). No character set other than Unicode is suitable for use
"in the middle", and tagging strings with codeset metadata is
particularly difficult.

It might be useful to go over what we've done in filesystems and
remote/distributed filesystem protocols.

Very briefly, in ZFS we implemented fast normalization-insensitive
string comparison and hashing functionality; the filesystem has an
option to reject any non-UTF-8 byte sequences, but otherwise never
normalizes on CREATE (compare to HFS+). Meanwhile NFSv4 calls for
using only UTF-8 on the wire.

This works. It works *really* well. The code is even open source.
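
The three ideas above (reject non-UTF-8 input, compare names without regard to normalization form while storing them unchanged, and convert codesets only at the edge) can be illustrated with a minimal Python sketch. This is not the ZFS or NFSv4 code, and it makes no attempt at the fast comparison mentioned above: it simply normalizes both sides to NFC at comparison time. The function names `is_valid_utf8`, `names_equal`, and `to_wire` are hypothetical, chosen only for this illustration.

```python
import unicodedata


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def names_equal(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 names, ignoring normalization-form differences.

    The stored bytes are never rewritten; normalization (NFC is an
    assumption of this sketch) is applied only for the comparison.
    """
    left = unicodedata.normalize("NFC", a.decode("utf-8"))
    right = unicodedata.normalize("NFC", b.decode("utf-8"))
    return left == right


def to_wire(local: bytes, local_codeset: str) -> bytes:
    """Edge conversion: bytes in a local codeset become UTF-8 for the wire."""
    return local.decode(local_codeset).encode("utf-8")
```

For example, a Latin-1 filename containing the byte 0xFC fails `is_valid_utf8` as it stands, passes after `to_wire(name, "latin-1")`, and then compares equal under `names_equal` to the same name written with a combining diaeresis.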

Filesystems are a great example of an application where tagging
strings with codeset metadata doesn't work: we'd need to push process
setlocale information into the kernel, and tag strings all the way
from the system call boundary, through the VFS, to the filesystem
driver, with consequent impact on stable interfaces up and down the
stack, and massive code modification requirements.

Filesystems are not the only example of this, but because filesystems
cross so many layers in our stacks (user-land APIs, kernel-land APIs,
on-the-wire protocols, on-disk formats) they are perhaps the best
example.

UTF-8 in the middle.

Nico
--