Re: support for non-ASCII in strings, was: signatures vs sf-date
Willy Tarreau <w@1wt.eu> Sat, 03 December 2022 09:53 UTC
Date: Sat, 03 Dec 2022 10:52:48 +0100
From: Willy Tarreau <w@1wt.eu>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: ietf-http-wg@w3.org
Message-ID: <20221203095248.GB7078@1wt.eu>
Subject: Re: support for non-ASCII in strings, was: signatures vs sf-date
Archived-At: <https://www.w3.org/mid/20221203095248.GB7078@1wt.eu>
Hi Julian,

On Sat, Dec 03, 2022 at 08:47:10AM +0100, Julian Reschke wrote:
> > There are some cases where non-ASCII strings are needed in header fields; mostly, when you're presenting something to a human from the fields. Those cases are not as common. However, there's a catch to adding them: if full unicode strings were available in the protocol, many designers will understandably use them because it's been drilled into all our heads that unicode is what you use for strings.
> >
> > Hence, footgun.
>
> I would appreciate if you would explain why there is a problem we need
> to prevent, and what exactly that problem is. Do you have an example?

The main problem I'm personally having with this is that lots of text-based processing (regex etc) that is designed to apply to a subset of the input set will first have to pass through some non-bijective transformation (typically iconv), and that's where problems start to happen. It's the usual stuff, such as accented letters which lose their accents and turn into the plain letter, sometimes only after being turned to upper case, and so on. This makes it possible to get some invalid contents to match certain rules on certain components. I am particularly worried about letting this enter the protocol.

If I'm setting up a rule saying that /static always routes to the static server, it means that /stàtic will not go there. But what if down the chain this gets turned to /STATIC then back to /static, to finally match an existing directory on the default server?

You will of course tell me that this is a bad example as I'm putting it on the URL, but the problem is exactly the same with other headers. Causing such trouble to Link, Content-Type (for content analysis evasion), the path or domain in Set-Cookie etc is really problematic. On the request path we could imagine such things landing as far as in logs or databases, with some diacritics being accidentally turned into language symbols or delimiters.
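To make the /static case concrete, here is a rough sketch in Python. The two functions are purely hypothetical stand-ins, not any real product's behaviour: front_route() plays the front-end rule that matches the path as-is, and downstream_path() plays a later component that upper-cases, drops accents and lower-cases again. The accent folding is done with NFKD decomposition here, which is only one possible lossy transformation; an iconv transliteration could behave differently.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Lossy, non-bijective fold: decompose, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def front_route(path: str) -> str:
    """Hypothetical front-end rule: exact prefix match on /static."""
    return "static-server" if path.startswith("/static") else "default-server"


def downstream_path(path: str) -> str:
    """Hypothetical later hop: upper-case, fold accents, lower-case again."""
    return strip_accents(path.upper()).lower()


request = "/st\u00e0tic/app.js"      # "/stàtic/app.js"
print(front_route(request))          # default-server: the rule did not match
print(downstream_path(request))      # /static/app.js: what the default server ends up seeing
```

The front-end rule and the downstream component disagree about which resource was requested, and that disagreement is the whole problem.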
I actually find it very nice that anything that is not computer-safe has to be percent-encoded. It clearly sets a limit between the two worlds: the one that must match bytes, and the one that interprets characters, including homoglyphs, emojis, RTL vs LTR etc. The world has had several decades to adapt to this, and web development frameworks now make it seamless for developers to deal with it. People set up blogs, shopping carts and discussion boards with a few lines of code without ever having to wonder how data are encoded over the wire.

Computers don't need to know what characters *look like* but how they are encoded. Humans mostly don't need to know how they are encoded but are only interested in what they look like. The current situation serves both worlds perfectly fine, and a move in either direction would break this important balance in my opinion.

We could of course imagine passing some info indicating how contents are supposed to be interpreted when that's not obvious from the header field name, but if applications use non-standard fields, they're expected either to know how they are supposed to exploit their contents, or to ignore the header. It has always been like this, and it has been fine. After all, nothing prevents one from passing percent-encoded sounds, images, or even shell code in headers if they want. Right now it's reliably transported to its target.

Just my two cents,
Willy