Re: Libraries assuming iso-8859-1 (was: Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis)

Poul-Henning Kamp <phk@phk.freebsd.dk> Sun, 28 May 2023 07:28 UTC

Message-Id: <202305280728.34S7SBAv092547@critter.freebsd.dk>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
cc: Mark Nottingham <mnot@mnot.net>, Roy Fielding <fielding@gbiv.com>, Tommy Pauly <tpauly@apple.com>, HTTP Working Group <ietf-http-wg@w3.org>
In-reply-to: <c81e6562-7927-a342-9032-df69aba4ad43@it.aoyama.ac.jp>
From: Poul-Henning Kamp <phk@phk.freebsd.dk>
References: <FC5270AF-509C-4331-AE8F-1F2D51BBC5F2@apple.com> <C687C218-7793-4B74-BB51-B7C34059F9C4@gbiv.com> <F84B0780-7710-4F74-9830-ECBD4A926C3D@mnot.net> <c81e6562-7927-a342-9032-df69aba4ad43@it.aoyama.ac.jp>
Date: Sun, 28 May 2023 07:28:11 +0000
Archived-At: <https://www.w3.org/mid/202305280728.34S7SBAv092547@critter.freebsd.dk>

--------
Martin J. Dürst writes:

Adding base64 encoding to the table:

>                              Legacy  UTF-8   proposed  expansion  base64  b64expansion
> ASCII                        1       1       1         1          1.33    1.33
> Latin+Accents, e.g. Polish   1       ~1.5    ~2        2          2       2
> Arabic/Cyrillic/...          1       2       6         6          2.66    2.66
> Indic scripts,...            1       3       9         9          4       4
> Chinese/Japanese/...         2       3       9         4.5        4       2
>
> So some text in an Indic or South Asian Script gets expanded by a factor 
> of 9 when compared to a legacy singlebyte encoding.

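Base64 does not penalize non-Western languages nearly as much: it
adds a uniform 4/3 overhead on top of UTF-8, so the worst case in
the table is a factor of 4 rather than 9.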
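
The ratios in the table can be checked with a short Python sketch.
It assumes the "proposed" column means percent-encoding every
non-ASCII UTF-8 byte as three ASCII characters (%XX) and that
"base64" means standard base64 over the UTF-8 bytes. The function
names, sample strings, and legacy codecs (koi8_r, shift_jis) are
illustrative choices, not taken from the draft.

```python
import base64


def percent_encoded_len(text: str) -> int:
    """Length if each non-ASCII UTF-8 byte is written as %XX (assumed scheme)."""
    return sum(1 if b < 0x80 else 3 for b in text.encode("utf-8"))


def base64_len(text: str) -> int:
    """Length of standard base64 (with padding) over the UTF-8 bytes."""
    return len(base64.b64encode(text.encode("utf-8")))


def expansion(text: str, legacy_codec: str) -> tuple[float, float]:
    """Return (percent-encoding, base64) size relative to a legacy encoding."""
    legacy = len(text.encode(legacy_codec))
    return percent_encoded_len(text) / legacy, base64_len(text) / legacy


if __name__ == "__main__":
    # Sample strings are hypothetical; each has a UTF-8 length divisible
    # by 3, except "hello!", so base64 padding barely affects the ratios.
    samples = [
        ("hello!", "ascii"),
        ("привет", "koi8_r"),
        ("日本語", "shift_jis"),
    ]
    for text, codec in samples:
        pct, b64 = expansion(text, codec)
        print(f"{codec:10} percent={pct:.2f} base64={b64:.2f}")
```

For these samples the sketch gives 1.00/1.33 for ASCII, 6.00/2.67
for Cyrillic, and 4.50/2.00 for Japanese, in line with the
"expansion" and "b64expansion" columns above.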

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.