Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis

"Poul-Henning Kamp" <phk@phk.freebsd.dk> Fri, 30 June 2023 05:11 UTC

Message-Id: <202306300511.35U5BMh9055453@critter.freebsd.dk>
To: Ilari Liusvaara <ilariliusvaara@welho.com>
cc: HTTP Working Group <ietf-http-wg@w3.org>
In-reply-to: <ZJ1wDMnj9IiBbmlU@LK-Perkele-VII2.locald>
From: Poul-Henning Kamp <phk@phk.freebsd.dk>
References: <FC5270AF-509C-4331-AE8F-1F2D51BBC5F2@apple.com> <39E3B9FB-DD37-4D22-A35E-D50DAC512C69@apple.com> <84B0BBBB-6652-4442-88DF-0E3F3FEF5CEF@mnot.net> <202306260714.35Q7E4JR068513@critter.freebsd.dk> <ZJ1ALI5LKxHb7BSV@LK-Perkele-VII2.locald> <202306290919.35T9Jgus008318@critter.freebsd.dk> <ZJ1wDMnj9IiBbmlU@LK-Perkele-VII2.locald>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <55451.1688101881.1@critter.freebsd.dk>
Date: Fri, 30 Jun 2023 05:11:21 +0000
Archived-At: <https://www.w3.org/mid/202306300511.35U5BMh9055453@critter.freebsd.dk>

--------
Ilari Liusvaara writes:
> On Thu, Jun 29, 2023 at 09:19:42AM +0000, Poul-Henning Kamp wrote:

> > There is no way to make Unicode safe, because it is anyone's guess what
> > Unicode decides to add later.
>
> I did some digging about when Unicode last added some "interesting"
> stuff. The last one I could find was some additional direction
> overrides from 2013. All the other "interesting" stuff seems to be
> from 1993 (the very first version of Unicode). And the Cc stuff seems
> to be even older than that.

Did you also find where they promised to never do anything silly again?

If so, please share a link, because I cannot find it anywhere...

> > Avoiding any and all hazards related to that /at the HTTP level/, is
> > why I still think we should base64 encode them, instead of the mutant
> > percent-with-the-random-backslash-thrown-in currently proposed.
>
> How would that help? Even currently, all that stuff must be escaped.
> And the hazards of unicode are associated with displaying it, and then
> it does not matter if it was percent-encoded or base64-encoded.

All the characters in b64 output are graphical and "safe", and b64
data is already part of the "vocabulary" of HTTP fields, where it
is used to transport things you should not throw at a terminal, so
there is no risk of some program somewhere doing something stupid.
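
To illustrate the point about the base64 alphabet, here is a minimal Python sketch. It is not taken from the draft: the helper name `b64_field_value` and the sample string are hypothetical, and it assumes the standard RFC 4648 alphabet with padding. It only shows that the encoded form contains nothing but letters, digits, "+", "/" and "=", whatever the input carried.

```python
import base64
import string

# Standard base64 alphabet (RFC 4648) plus the padding character.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def b64_field_value(text: str) -> str:
    """Hypothetical helper: UTF-8 encode the text, then base64 encode it."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Sample input with an ANSI escape sequence, a right-to-left override
# (U+202E) and a NUL byte: things you should not throw at a terminal.
hostile = "abc\x1b[2J\u202edef\x00"
encoded = b64_field_value(hostile)

# Every character on the wire is a plain graphical ASCII character.
assert set(encoded) <= B64_ALPHABET

# The original text is still recoverable by whoever chooses to decode it.
assert base64.b64decode(encoded).decode("utf-8") == hostile
```

The hazard does not vanish; it moves to whatever code decodes and displays the string, which is where it has to be handled in any case.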

And you are right about efficiency:

base64 is much more efficient than %xx for the primary, and often
only, languages of approximately half the world's population.
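
The size difference can be checked with a rough Python sketch. The function names and sample strings are hypothetical, and `urllib.parse.quote` is only an approximation of the percent-encoding proposed in the draft (it escapes some ASCII characters the draft may leave alone), but for non-ASCII text the arithmetic is the same: each UTF-8 byte costs three characters as %xx, while base64 costs four characters per three bytes.

```python
import base64
from urllib.parse import quote


def percent_len(text: str) -> int:
    """Length of the text with every non-unreserved UTF-8 byte written as %XX."""
    return len(quote(text, safe=""))


def base64_len(text: str) -> int:
    """Length of the padded base64 form of the UTF-8 encoded text."""
    return len(base64.b64encode(text.encode("utf-8")))


# Hypothetical sample strings: Cyrillic, Japanese and plain ASCII.
for sample in ("Привет", "こんにちは", "hello"):
    print(sample, percent_len(sample), base64_len(sample))
```

For "Привет" (12 UTF-8 bytes) this gives 36 characters percent-encoded against 16 in base64, and for "こんにちは" (15 bytes) 45 against 20. Pure ASCII such as "hello" goes the other way, 5 against 8, which is the case the percent form favours.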

But more importantly, b64 is already part of the specification, so
less code will have to be written for SFbis.

> However, an encoding not capable of representing Cc would be an entirely
> different thing. And clearly Cc contains by far the most hazardous
> stuff in the entire Unicode.

As I said: I don't think we improve the situation by wading into that
sump, apart from clearly signposting its existence.

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.