Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis

Mark Nottingham <mnot@mnot.net> Thu, 25 May 2023 22:38 UTC

From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <C687C218-7793-4B74-BB51-B7C34059F9C4@gbiv.com>
Date: Fri, 26 May 2023 08:38:09 +1000
Cc: Tommy Pauly <tpauly@apple.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <F84B0780-7710-4F74-9830-ECBD4A926C3D@mnot.net>
References: <FC5270AF-509C-4331-AE8F-1F2D51BBC5F2@apple.com> <C687C218-7793-4B74-BB51-B7C34059F9C4@gbiv.com>
To: Roy Fielding <fielding@gbiv.com>
Archived-At: <https://www.w3.org/mid/F84B0780-7710-4F74-9830-ECBD4A926C3D@mnot.net>

Hi Roy,

> On 26 May 2023, at 3:21 am, Roy T. Fielding <fielding@gbiv.com> wrote:
> 
> I think (b) is unnecessary given that HTTP is 8-bit clean for UTF-8
> and we are specifically talking about new fields for which there
> are no deployed parsers. Yes, I know what it says in RFC 9110.

Yes, the parsers may be new, but in some contexts, they may not have access to the raw bytes of the field value. Many HTTP libraries and abstractions (e.g., CGI) assume an encoding and expose strings; some of those may apply the advice that HTTP has documented for many years and assume ISO-8859-1.

Yes, in many cases you can use UTF-8 on the wire successfully. However, making that assumption is a local convention; we can't assume that it holds for the entire Internet, because we don't know all of the various implementations that have been deployed and how they behave. All we know is a) how the implementations we've seen behave, and b) what we've written down before.
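To make the concern concrete, here is a minimal sketch in Python. It assumes a hypothetical receiver whose HTTP library decodes field values as ISO-8859-1 before handing them to the application; the sample string and variable names are illustrative only, and the percent-encoding shown is generic rather than the exact syntax in the PR.

```python
from urllib.parse import quote, unquote

# Hypothetical display string containing non-ASCII characters ("f", u-umlaut, u-umlaut).
display = "f\u00fc\u00fc"

# Case 1: raw UTF-8 on the wire, read by a library that assumes ISO-8859-1.
wire_raw = display.encode("utf-8")
seen_raw = wire_raw.decode("iso-8859-1")  # the string the application is handed

# Case 2: percent-encoded UTF-8. The wire form is pure ASCII, so the
# library's ISO-8859-1 assumption does no damage before the field parser runs.
wire_pct = quote(display, safe="").encode("ascii")
seen_pct = unquote(wire_pct.decode("iso-8859-1"), encoding="utf-8", errors="strict")

print(repr(seen_raw))  # mojibake: each UTF-8 byte became its own character
print(repr(seen_pct))  # the original string, recovered intact
```

An application that knows about the mismatch can sometimes undo the damage in the first case by re-encoding and decoding again, but that depends on knowing what the intermediate layer did, which is exactly the local convention at issue.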

In the past we've made decisions like this and chosen to be conservative. We could certainly break that habit now, but we'd need (at the least) to have a big warning that this type might not be interoperable with deployed systems. Personally, I don't think that's worth it, given how rarely we expect this particular type to be used, and the relatively low overhead of encoding.


> The PR doesn't clearly express any of these points. It says the
> strings contain Unicode (a character set) but they obviously don't;
> they contain sequences of unvalidated pct-encoded octets.
> This allows arbitrary octets to be encoded for something that
> is supposed to be a display string.
[...]
> If this is truly for a display string, the feature must be
> specific about the encoding and allowed characters.
> My suggestion would be to limit the string to non-CNTRL
> ASCII and non-control valid UTF-8. We don't want to allow
> anything that would twist the feature to some other ends.
> 
> Assuming we do this with pct-encoding, we should not allow
> arbitrary octets to be encoded. We should disallow encodings
> that are unnecessary (normal printable ASCII aside from % and "),
> control characters, or octets not valid for UTF-8. That can
> be specified by prose and reference to the IETF specs, or
> we could specify the allowed ranges with a regular expression.
> Either one is better than allowing arbitrary octets to be encoded.

I think that's reasonable and we can discuss improvements after adopting the PR.
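As a starting point for that discussion, the restrictions Roy describes could be sketched roughly as follows. This is an illustration in Python, not text from the PR: the function name `decode_display_string` is hypothetical, and treating space as printable ASCII and Unicode category `Cc` as the definition of "control character" are assumptions.

```python
import re
import unicodedata

_PCT = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_display_string(encoded: bytes) -> str:
    """Hypothetical helper: strictly decode a pct-encoded display string."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        b = encoded[i]
        if b == 0x25:  # '%' introduces an escape
            m = _PCT.match(encoded, i)
            if m is None:
                raise ValueError("malformed percent escape")
            octet = int(m.group(1), 16)
            # Printable ASCII other than '%' and '"' never needs encoding.
            if 0x20 <= octet <= 0x7E and octet not in (0x25, 0x22):
                raise ValueError("unnecessary percent-encoding")
            out.append(octet)
            i += 3
        elif 0x20 <= b <= 0x7E and b != 0x22:
            out.append(b)  # literal printable ASCII
            i += 1
        else:
            raise ValueError("unencoded byte outside printable ASCII")
    # Strict decoding rejects truncated sequences, overlong forms and
    # surrogates; UnicodeDecodeError is a subclass of ValueError.
    text = out.decode("utf-8", errors="strict")
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        raise ValueError("control character in display string")
    return text
```

With these rules, `b"f%c3%bc%c3%bc"` decodes to the expected three-character string, while `b"%41"` (unnecessary encoding), `b"%00"` (control character) and `b"%ff"` (not valid UTF-8) are all rejected.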

> In general, it is safer to send raw UTF-8 over the wire in HTTP
> than it is to send arbitrary pct-encoded octets, simply because
> pct-encoding is going to bypass most security checks long enough
> for the data to reach an application where people do stupid
> things with strings that they assume contain something that is
> safe to display.

That's an odd assertion - where are those security checks taking place?


> Note that I am not saying that we should consider normalization
> or any other weirdness specific to Unicode. We don't need to.
> We just need to stay within the confines of what has already
> been defined as valid and safe UTF-8. Everything else is being
> actively targeted by pentesters and script kiddies, on every
> public server on the Internet, to the point where we have to
> block it within CDN configurations just to avoid overloading
> the origin servers.

Understood.

Cheers,


--
Mark Nottingham   https://www.mnot.net/