Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis

Ilari Liusvaara <ilariliusvaara@welho.com> Thu, 29 June 2023 11:50 UTC

Date: Thu, 29 Jun 2023 14:50:36 +0300
From: Ilari Liusvaara <ilariliusvaara@welho.com>
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <ZJ1wDMnj9IiBbmlU@LK-Perkele-VII2.locald>
References: <FC5270AF-509C-4331-AE8F-1F2D51BBC5F2@apple.com> <39E3B9FB-DD37-4D22-A35E-D50DAC512C69@apple.com> <84B0BBBB-6652-4442-88DF-0E3F3FEF5CEF@mnot.net> <202306260714.35Q7E4JR068513@critter.freebsd.dk> <ZJ1ALI5LKxHb7BSV@LK-Perkele-VII2.locald> <202306290919.35T9Jgus008318@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
In-Reply-To: <202306290919.35T9Jgus008318@critter.freebsd.dk>
Subject: Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis
Archived-At: <https://www.w3.org/mid/ZJ1wDMnj9IiBbmlU@LK-Perkele-VII2.locald>

On Thu, Jun 29, 2023 at 09:19:42AM +0000, Poul-Henning Kamp wrote:
> --------
> Ilari Liusvaara writes:
> 
> > 2) I think it should be specified that any direction change characters
> > MUST NOT affect any text surrounding the displayed string. At least
> > getting this wrong causes at most some screwed up text rendering.
> 
> There is no way to make UniCode safe, because it is anyones guess what
> UniCode decides to add later.

I did some digging about when Unicode last added some "interesting"
stuff. The last one I could find was some additional direction
overrides from 2013. All the other "interesting" stuff seems to be
from 1993 (the very first version of Unicode). And the Cc stuff seems
to be even older than that.


> I dont think it makes any sense for us to wade into that sump,
> beyond a sternly written "Security Considerations" which says
> that UniCode is by definition unsafe.
> 
> Avoiding any and all hazards related to that /at the HTTP level/, is
> why I still think we should base64 encode them, instead of the mutant
> percent-with-the-random-backslash-thrown-in currently proposed.

How would that help? Even currently, all that stuff must be escaped.
And the hazards of Unicode are associated with displaying it, and at
that point it does not matter whether it was percent-encoded or
base64-encoded.
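
A minimal sketch of that point, assuming generic percent-encoding of the
UTF-8 bytes (this is not the exact Display String syntax from the draft,
and the function names and sample string are made up for illustration).
Both encodings are pure ASCII on the wire, and both decode back to the
same string, direction override included:

```python
import base64
from urllib.parse import quote, unquote


def roundtrip_percent(text: str) -> str:
    """Percent-encode the UTF-8 bytes of text, then decode them again."""
    wire = quote(text, safe="")  # escapes everything except unreserved ASCII
    assert wire.isascii()        # nothing "interesting" travels on the wire
    return unquote(wire)


def roundtrip_base64(text: str) -> str:
    """Base64-encode the UTF-8 bytes of text, then decode them again."""
    wire = base64.b64encode(text.encode("utf-8")).decode("ascii")
    assert wire.isascii()
    return base64.b64decode(wire).decode("utf-8")


# Hypothetical sample: U+202E (right-to-left override) and U+202C (pop).
SAMPLE = "abc\u202Edef\u202C"

# Whatever hazard the decoded string carries, it carries in both cases.
assert roundtrip_percent(SAMPLE) == roundtrip_base64(SAMPLE) == SAMPLE
```

The override only matters once something renders the decoded string, so
the choice of wire encoding does not change the display hazard.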

Between percent-encoding and base64, the difference is merely about
efficiency. However, an encoding not capable of representing Cc would
be an entirely different thing. And clearly Cc contains by far the most
hazardous stuff in the entire Unicode.
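
To put rough numbers on the efficiency part, and to show what filtering
on Cc would look like, here is a small sketch. It again assumes generic
percent-encoding of UTF-8 bytes rather than the draft's exact syntax, and
the helper names are hypothetical:

```python
import base64
import unicodedata
from urllib.parse import quote


def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (percent-encoded length, base64 length) for text as UTF-8."""
    raw = text.encode("utf-8")
    return len(quote(text, safe="")), len(base64.b64encode(raw))


def cc_characters(text: str) -> list[str]:
    """Return the characters of text whose general category is Cc."""
    return [ch for ch in text if unicodedata.category(ch) == "Cc"]


# Mostly-ASCII text favours percent-encoding; non-ASCII text favours base64
# (3 output bytes per input byte versus roughly 4 per 3).
ascii_sizes = encoded_sizes("hello")
cjk_sizes = encoded_sizes("日本語")

# ESC (U+001B) is Cc; the right-to-left override U+202E is Cf, not Cc,
# so a Cc-only filter would not catch direction overrides.
found = cc_characters("a\x1bb\u202e")
```

So which encoding is smaller depends on how much of the text is plain
ASCII, while a restriction on Cc is a property of the character
repertoire and is independent of the encoding chosen.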




-Ilari