Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis

Poul-Henning Kamp <phk@phk.freebsd.dk> Thu, 25 May 2023 22:24 UTC

To: "Roy T. Fielding" <fielding@gbiv.com>
cc: Tommy Pauly <tpauly@apple.com>, HTTP Working Group <ietf-http-wg@w3.org>
In-reply-to: <C687C218-7793-4B74-BB51-B7C34059F9C4@gbiv.com>
From: Poul-Henning Kamp <phk@phk.freebsd.dk>
References: <FC5270AF-509C-4331-AE8F-1F2D51BBC5F2@apple.com> <C687C218-7793-4B74-BB51-B7C34059F9C4@gbiv.com>
Date: Thu, 25 May 2023 22:23:40 +0000
Archived-At: <https://www.w3.org/mid/202305252223.34PMNecG001082@critter.freebsd.dk>

--------
Roy T. Fielding writes:

> I think this would have been better in parts, namely

Agreed.

> My suggestion would be to limit the string to non-CNTRL
> ASCII and non-control valid UTF-8. We don't want to allow
> anything that would twist the feature to some other ends.
> [...]
> Note that I am not saying that we should consider normalization
> or any other weirdness specific to Unicode.

Each new version of Unicode adds new code points, and the Unicode
Consortium decided up front that Unicode sequences would not be
versioned.

Instead they issued guidance, and I'm paraphrasing here: "If you
receive a code point you don't recognize, assume the sender has a
newer version of Unicode than you do and display something safe and
distinct."

I have also never seen a document in which Unicode clearly and
definitively promises never to add further control characters.

So checking that you have "non-control valid UTF-8" is always going
to require a (moderately) up-to-date representation of which Unicode
code points are valid and which of those are controls.

Why would we inflict that burden at the HTTP level ?
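
To make that dependency concrete, here is a minimal Python sketch of
such a check. The names classify and is_non_control_text are
hypothetical, and treating general category Cf (format characters) as
"control" alongside Cc is an assumption, not something any draft
specifies. The answers come from whatever Unicode database ships with
the interpreter (unicodedata.unidata_version), so two peers running
different versions can disagree about the same octets.

```python
import unicodedata


def classify(ch: str) -> str:
    """Classify one code point using the *local* Unicode database."""
    cat = unicodedata.category(ch)
    if cat == "Cn":
        # Unassigned in this database version (or a noncharacter).
        # A newer database may assign it, possibly as a control.
        return "unassigned"
    if cat in ("Cc", "Cf"):
        # Assumption: format characters (e.g. bidi overrides) are
        # treated as controls here, in addition to Cc.
        return "control"
    return "other"


def is_non_control_text(raw: bytes) -> bool:
    """True if raw is valid UTF-8 with no control or unknown code points."""
    try:
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return all(classify(ch) == "other" for ch in text)


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    print(is_non_control_text("café".encode("utf-8")))
    print(is_non_control_text(b"a\x07b"))
```

Note that the result for a code point reported as "unassigned" is
only as good as the table behind it, which is the burden in question.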

> We just need to stay within the confines of what has already
> been defined as valid and safe UTF-8.

Do you have a specific document in mind here ?

> In general, it is safer to send raw UTF-8 over the wire in HTTP
> than it is to send arbitrary pct-encoded octets, simply because
> pct-encoding is going to bypass most security checks long enough
> for the data to reach an application where people do stupid
> things with strings that they assume contain something that is
> safe to display.

This is precisely why I think we should /never/ employ pct-encoding
in HTTP headers.
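
A small Python sketch of the bypass described in the quoted text,
assuming a hypothetical filter (has_control_bytes) that only inspects
the octets as they appear on the wire. The header value is made up
for illustration. The pct-encoded form contains nothing but printable
ASCII, so the filter passes it; the escape sequence only appears
after some later layer decodes it.

```python
from urllib.parse import unquote_to_bytes


def has_control_bytes(data: bytes) -> bool:
    """Hypothetical wire-level check: any C0 control or DEL present?"""
    return any(b < 0x20 or b == 0x7F for b in data)


# Illustrative value: "%1B%5B2J" is pct-encoded ESC [ 2 J
# (an ANSI "clear screen" sequence).
wire = b"name%1B%5B2J"
decoded = unquote_to_bytes(wire)

if __name__ == "__main__":
    print(has_control_bytes(wire))     # checked before decoding
    print(has_control_bytes(decoded))  # checked after decoding
```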

Given that HTTP is increasingly being treated as a transport protocol
(not that I agree with that either), I think it is a much safer
approach to handle UTF-8 as opaque binary data at the HTTP level
and transfer it as such, in sf-binary fields.
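
As a rough illustration of the opaque approach, the sketch below
carries UTF-8 text as a Structured Fields Byte Sequence (sf-binary in
RFC 8941: base64 between colons). The helper names to_sf_binary and
from_sf_binary are hypothetical. The HTTP layer only ever sees
base64; deciding what the bytes mean, and whether they are safe to
display, is left to the application that asked for them.

```python
import base64


def to_sf_binary(text: str) -> str:
    """Serialize text as an RFC 8941 Byte Sequence (":<base64>:")."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return ":" + encoded + ":"


def from_sf_binary(field: str) -> bytes:
    """Parse a Byte Sequence back to raw bytes; no UTF-8 check here."""
    if len(field) < 2 or field[0] != ":" or field[-1] != ":":
        raise ValueError("not a Byte Sequence")
    return base64.b64decode(field[1:-1], validate=True)


if __name__ == "__main__":
    value = to_sf_binary("hello")
    print(value)                  # :aGVsbG8=:
    print(from_sf_binary(value))  # b'hello'
```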

> Everything else is being
> actively targeted by pentesters and script kiddies, on every
> public server on the Internet, to the point where we have to
> block it within CDN configurations just to avoid overloading
> the origin servers.

100% agreement: the only thing DisplayString offers over sf-binary
is increased risk.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.