Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis

Julian Reschke <julian.reschke@gmx.de> Fri, 26 May 2023 05:22 UTC

To: ietf-http-wg@w3.org
Archived-At: <https://www.w3.org/mid/5a704134-ce9c-2201-62ff-3a70ba6ac775@gmx.de>

On 26.05.2023 00:23, Poul-Henning Kamp wrote:
> --------
> Roy T. Fielding writes:
>
>> I think this would have been better in parts, namely
>
> Agreed.

I agree partly; I think Mark went ahead with a concrete proposal so that
this can be done quickly. It's clear that there are many ways to do
this, and I'm pretty sure that it'll be very hard to agree on the best one.

At the end of the day, what matters is that we have that capability, as
opposed to not having it at all.

>> My suggestion would be to limit the string to non-CNTRL
>> ASCII and non-control valid UTF-8. We don't want to allow
>> anything that would twist the feature to some other ends.
>> [...]
>> Note that I am not saying that we should consider normalization
>> or any other weirdness specific to Unicode.
>
> Each new version of Unicode adds new code points, and they decided
> up front that Unicode sequences would not be versioned.
>
> Instead they issued guidance, and I'm paraphrasing here: "If you
> receive a code point you don't recognize, assume the sender has a
> newer version of Unicode than you do and display something safe and
> distinct."

How exactly does that matter for the discussion we are having here?

> I have also never seen a document where Unicode clearly and
> definitively promises never to add further control characters.
>
> So checking that you have "non-control valid UTF-8" is always going
> to require a (moderately) up-to-date representation of which Unicode
> code points are valid and which of those are controls.

Yes, that would need to be clarified; I believe Roy is referring to a
fixed definition of controls.
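A minimal sketch of what such a fixed definition could look like, assuming "control" means exactly C0 (U+0000 to U+001F), DEL (U+007F) and C1 (U+0080 to U+009F). The ranges and the helper names are illustrative assumptions, not text from the draft. Because the check uses fixed code point ranges rather than a Unicode character database, it would not need updating when a new Unicode version assigns new code points:

```python
# Hypothetical check illustrating a *fixed* definition of controls.
# Assumption: "control" means C0 (U+0000-U+001F), DEL (U+007F) or
# C1 (U+0080-U+009F). No Unicode character database is consulted, so the
# result does not change when a new Unicode version assigns new code points.

FIXED_CONTROL_RANGES = (
    (0x00, 0x1F),  # C0 controls
    (0x7F, 0x7F),  # DEL
    (0x80, 0x9F),  # C1 controls
)


def is_fixed_control(code_point: int) -> bool:
    """True if the code point lies in one of the fixed control ranges."""
    return any(lo <= code_point <= hi for lo, hi in FIXED_CONTROL_RANGES)


def is_acceptable_display_text(raw: bytes) -> bool:
    """Accept well-formed UTF-8 containing no code point from the fixed ranges."""
    try:
        # Strict decoding rejects malformed sequences, overlongs and surrogates.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not any(is_fixed_control(ord(ch)) for ch in text)


print(is_acceptable_display_text("f\u00fc\u00fc".encode("utf-8")))  # True
print(is_acceptable_display_text(b"bell\x07"))                      # False
```

Unassigned code points such as U+0378 pass this check, so a sender with a newer Unicode version is not rejected. Whether format characters such as U+202E (RIGHT-TO-LEFT OVERRIDE) should also be excluded is left open here.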

> Why would we inflict that burden at the HTTP level?
>
>> We just need to stay within the confines of what has already
>> been defined as valid and safe UTF-8.
>
> Do you have a specific document in mind here ?
>
>> In general, it is safer to send raw UTF-8 over the wire in HTTP
>> than it is to send arbitrary pct-encoded octets, simply because
>> pct-encoding is going to bypass most security checks long enough
> for the data to reach an application where people do stupid
>> things with strings that they assume contain something that is
>> safe to display.
>
> This is precisely why I think we should /never/ employ pct-encoding
> in HTTP headers.

But we do already. Also, the argument that security checks can be
bypassed applies to sf-binary as well.
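To make the point about sf-binary concrete, a sketch with a hypothetical naive filter (not any real product) that rejects raw field values containing control bytes. The same three-byte payload, a NUL between two letters, passes the filter both when percent-encoded in the Display String style and when base64-encoded as a Byte Sequence; the control byte only reappears after decoding:

```python
import base64
import re
from urllib.parse import unquote_to_bytes

# Hypothetical naive "security check": reject a raw field value if it
# contains a control byte. It only ever sees the encoded form.
CONTROL_BYTES = re.compile(rb"[\x00-\x1f\x7f]")


def naive_filter_passes(raw_field_value: bytes) -> bool:
    return CONTROL_BYTES.search(raw_field_value) is None


payload = b"a\x00b"  # NUL byte hidden between two letters

# Percent-encoded, Display String style: %"..."
display_form = b'%"a%00b"'
# Base64 between colons, sf-binary (Byte Sequence) style: :...:
binary_form = b":" + base64.b64encode(payload) + b":"

# Decoding restores the NUL in both cases.
decoded_display = unquote_to_bytes(display_form[2:-1])  # strip %" and "
decoded_binary = base64.b64decode(binary_form[1:-1])    # strip the colons

print(naive_filter_passes(display_form), naive_filter_passes(binary_form))
print(decoded_display == payload, decoded_binary == payload)
```

Either way, any check has to run on the decoded value, which is the same requirement for both types.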

> Given that HTTP is increasingly being treated as a transport protocol
> (not that I agree with that either), I think it is a much safer
> approach to handle UTF-8 as opaque binary data at the HTTP level
> and transfer it as such, in sf-binary fields.
>
>> Everything else is being
>> actively targeted by pentesters and script kiddies, on every
>> public server on the Internet, to the point where we have to
>> block it within CDN configurations just to avoid overloading
>> the origin servers.
>
> 100% agreement: The only thing DisplayString offers over sf-binary
> is increased risk.

No, it offers a way to label Unicode data as such (without requiring
out-of-band knowledge).
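A sketch of the labeling difference, assuming the Display String rules that were eventually published in RFC 9651 (UTF-8 encode; percent-encode "%", DQUOTE and bytes in %x00-1f and %x7f-ff with lowercase hex; wrap in `%"..."`). The draft under discussion in this thread may differ in detail, and the function names are hypothetical. The `%"` prefix tells the recipient the payload is Unicode text, whereas a Byte Sequence is just bytes whose meaning must come from the field definition:

```python
import base64


def serialize_display_string(text: str) -> str:
    out = ['%"']
    for byte in text.encode("utf-8"):
        # Percent-encode "%", DQUOTE, controls and all non-ASCII bytes.
        if byte in (0x25, 0x22) or byte <= 0x1F or byte >= 0x7F:
            out.append("%{:02x}".format(byte))  # lowercase hex
        else:
            out.append(chr(byte))
    out.append('"')
    return "".join(out)


def serialize_byte_sequence(data: bytes) -> str:
    # sf-binary: base64 between colons; nothing marks the bytes as text.
    return ":" + base64.b64encode(data).decode("ascii") + ":"


def parse_display_string(field: str) -> str:
    if len(field) < 3 or not field.startswith('%"') or not field.endswith('"'):
        raise ValueError("not a Display String")
    body, raw, i = field[2:-1], bytearray(), 0
    while i < len(body):
        ch = body[i]
        if ch == "%":
            hex_digits = body[i + 1:i + 3]
            if len(hex_digits) != 2 or any(c not in "0123456789abcdef" for c in hex_digits):
                raise ValueError("bad percent-encoding (lowercase hex required)")
            raw.append(int(hex_digits, 16))
            i += 3
        elif ch == '"' or not (0x20 <= ord(ch) <= 0x7E):
            raise ValueError("unexpected character in Display String")
        else:
            raw.append(ord(ch))
            i += 1
    # Invalid UTF-8 raises UnicodeDecodeError, a subclass of ValueError.
    return raw.decode("utf-8")


print(serialize_display_string("f\u00fc\u00fc"))  # %"f%c3%bc%c3%bc"
print(serialize_byte_sequence("f\u00fc\u00fc".encode("utf-8")))
```

A recipient of the second form has to know, out of band, that the bytes are UTF-8 before it can display them.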

That's *exactly* the same reason why we are adding sf-date. If that's a
concern for you, why didn't you argue against the introduction of
sf-date as well? After all, it does not add any value over sf-integer
except for inlining the type information.
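To illustrate the sf-date comparison, a sketch assuming the syntax eventually published in RFC 9651, where a Date is "@" followed by an Integer number of seconds since the Unix epoch. The helper names are hypothetical. The digits of a Date are identical to those of an Integer; only the leading character tells the recipient how to interpret them, just as `%"` does for a Display String versus ":" for a Byte Sequence:

```python
from datetime import datetime, timezone


def classify_bare_item(item: str) -> str:
    """Guess a bare item's type from its leading character(s) only."""
    if item.startswith("@"):
        return "date"
    if item.startswith('%"'):
        return "display-string"
    if item.startswith(":"):
        return "byte-sequence"
    if item.startswith('"'):
        return "string"
    if item[:1].isdigit() or item.startswith("-"):
        return "decimal" if "." in item else "integer"
    return "other"  # tokens, booleans, etc. are not distinguished here


def parse_date(item: str) -> datetime:
    # Same digits as an Integer; the "@" is the only type information.
    return datetime.fromtimestamp(int(item[1:]), tz=timezone.utc)


print(classify_bare_item("@1659578233"), classify_bare_item("1659578233"))
```

Without the prefix, the recipient would need the field definition to know that "1659578233" is a point in time rather than a count.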

Best regards, Julian