Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis

"Roy T. Fielding" <fielding@gbiv.com> Thu, 25 May 2023 17:24 UTC

From: "Roy T. Fielding" <fielding@gbiv.com>
In-Reply-To: <FC5270AF-509C-4331-AE8F-1F2D51BBC5F2@apple.com>
Date: Thu, 25 May 2023 10:21:34 -0700
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <C687C218-7793-4B74-BB51-B7C34059F9C4@gbiv.com>
References: <FC5270AF-509C-4331-AE8F-1F2D51BBC5F2@apple.com>
To: Tommy Pauly <tpauly@apple.com>
Subject: Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis
Archived-At: <https://www.w3.org/mid/C687C218-7793-4B74-BB51-B7C34059F9C4@gbiv.com>

> On May 24, 2023, at 7:26 PM, Tommy Pauly <tpauly@apple.com> wrote:
> 
> Hello HTTP WG,
> 
> As part of the WGLC for draft-ietf-httpbis-sfbis, we’ve been discussing the inclusion of "Display Strings" (strings that allow Unicode content).
> 
> While not part of the initial scope of this -bis effort, this addition has had significant discussion and support expressed for inclusion.
> 
> This email starts a formal consensus call to determine if the working group would like to expand the scope of draft-ietf-httpbis-sfbis to include Display Strings, specifically to merge in the following pull request (modulo any editorial changes that are needed):
> 
> https://github.com/httpwg/http-extensions/pull/2494

I think this would have been better asked in parts, namely

  a) should sfbis add a data type for display strings of non-ASCII content?

  b) should display strings be encoded as ASCII via pct-encoding?

  c) should the encoded characters be limited to %x22 ("), %x25 (%),
     and relatively safe non-ASCII non-control valid UTF-8?

I support (a) if we also require (c).

I think (b) is unnecessary given that HTTP is 8-bit clean for UTF-8
and we are specifically talking about new fields for which there
are no deployed parsers. Yes, I know what it says in RFC 9110.

I think (c) is a requirement regardless of how we do (b).

The PR doesn't clearly express any of these points. It says the
strings contain Unicode (a character set) but they obviously don't;
they contain sequences of unvalidated pct-encoded octets.
This allows arbitrary octets to be encoded for something that
is supposed to be a display string.
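
To illustrate the point, here is a minimal Python sketch (not taken
from the PR; the helper name classify is hypothetical) showing that
plain pct-decoding accepts any octets at all, so the result may not
be UTF-8, or may be UTF-8 that carries control characters:

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def classify(encoded: str) -> str:
    """Pct-decode without any restrictions, then report what came out.

    Hypothetical helper for illustration only.
    """
    octets = unquote_to_bytes(encoded)  # accepts every %XX, no checks
    try:
        text = octets.decode("utf-8")  # strict decoding
    except UnicodeDecodeError:
        return "invalid UTF-8"
    # Unicode category "Cc" covers the C0 and C1 control characters.
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        return "contains control characters"
    return "displayable"


if __name__ == "__main__":
    for sample in ("f%C3%BC%C3%BC", "%00%1B[2J", "%FF%FE"):
        print(sample, "->", classify(sample))
```

Only the first sample is something a recipient could reasonably treat
as a display string; the other two decode without complaint unless
the recipient adds its own checks.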

I don't think these are editorial questions. I think we need
to have at least rough consensus on *what* the feature is
allowed to contain before we add the feature to the spec.

If this is truly for a display string, the feature must be
specific about the encoding and allowed characters.
My suggestion would be to limit the string to non-CNTRL
ASCII and non-control valid UTF-8. We don't want to allow
anything that would twist the feature to some other ends.

Assuming we do this with pct-encoding, we should not allow
arbitrary octets to be encoded. We should disallow encodings
that are unnecessary (normal printable ASCII aside from % and "),
control characters, or octets not valid for UTF-8. That can
be specified by prose and reference to the IETF specs, or
we could specify the allowed ranges with a regular expression.
Either one is better than allowing arbitrary octets to be encoded.
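
As a rough sketch of the prose variant, assuming the restrictions
described above, a decoder in Python might look like the following.
The function name decode_display_content is hypothetical, the input
is assumed to be the content between the delimiters, and accepting
both upper- and lowercase hex digits is my assumption rather than
anything the PR states:

```python
import re
import unicodedata

_PCT = re.compile(r"%([0-9a-fA-F]{2})")


def decode_display_content(content: str) -> str:
    """Decode pct-encoded display string content, rejecting anything
    outside the restricted set (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(content):
        ch = content[i]
        if ch == "%":
            m = _PCT.match(content, i)
            if m is None:
                raise ValueError("malformed pct-encoding")
            octet = int(m.group(1), 16)
            if octet < 0x20 or octet == 0x7F:
                raise ValueError("encoded ASCII control character")
            # Printable ASCII other than '"' and '%' must be sent as-is.
            if octet < 0x80 and octet not in (0x22, 0x25):
                raise ValueError("unnecessary encoding of printable ASCII")
            out.append(octet)
            i += 3
        elif 0x20 <= ord(ch) <= 0x7E and ch != '"':
            out.append(ord(ch))
            i += 1
        else:
            raise ValueError("character must be pct-encoded or is not allowed")
    try:
        # Strict decoding rejects overlong forms, surrogates, and
        # truncated sequences.
        text = out.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("octets are not valid UTF-8") from exc
    # Reject C1 controls (U+0080..U+009F), which arrive as valid UTF-8.
    if any(unicodedata.category(c) == "Cc" for c in text):
        raise ValueError("encoded control character")
    return text
```

A recipient would treat any ValueError as a parse failure for the
whole field value rather than trying to salvage part of the string.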

In general, it is safer to send raw UTF-8 over the wire in HTTP
than it is to send arbitrary pct-encoded octets, simply because
pct-encoding is going to bypass most security checks long enough
for the data to reach an application where people do stupid
things with strings that they assume contain something that is
safe to display.

Note that I am not saying that we should consider normalization
or any other weirdness specific to Unicode. We don't need to.
We just need to stay within the confines of what has already
been defined as valid and safe UTF-8. Everything else is being
actively targeted by pentesters and script kiddies, on every
public server on the Internet, to the point where we have to
block it within CDN configurations just to avoid overloading
the origin servers.

....Roy