Re: Compression Dictionary follow-up from IETF 119

Patrick Meenan <patmeenan@gmail.com> Wed, 24 April 2024 13:43 UTC

From: Patrick Meenan <patmeenan@gmail.com>
Date: Wed, 24 Apr 2024 09:42:09 -0400
Message-ID: <CAJV+MGzzvREhxTg2DOmooHAHU6-zrZPwcTSe_+h4zyNe_yAL-A@mail.gmail.com>
To: Lucas Pardue <lucas@lucaspardue.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Subject: Re: Compression Dictionary follow-up from IETF 119
Archived-At: <https://www.w3.org/mid/CAJV+MGzzvREhxTg2DOmooHAHU6-zrZPwcTSe_+h4zyNe_yAL-A@mail.gmail.com>

To some extent, yes, but usually with regex matching to pull out just the
hex string characters. That said, if you have access to the HTTP headers, you
likely have access to the URL as well. The other place where hex caused less
friction was day-to-day development: inspecting the headers in dev tools and
manually checking the equivalent file on the origin (though
base64url-encoding the hash on the origin would make this easier to
eyeball).
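
As a rough illustration of the regex check, something like the sketch below
would do. It assumes the hash is a SHA-256 digest (so exactly 64 hex
characters); the function names are made up for this example and are not
taken from any deployment. The second helper shows the base64url form of the
same hash, for anyone who would rather name files that way on the origin.

```python
import base64
import re
from typing import Optional

# Assumption: the dictionary hash is a SHA-256 digest, so its hex form is
# exactly 64 hexadecimal characters.
HEX_SHA256 = re.compile(r"[0-9a-fA-F]{64}")


def extract_hex_hash(field_value: str) -> Optional[str]:
    """Return the lowercase hex hash if the value is exactly one, else None."""
    candidate = field_value.strip()
    # fullmatch rejects anything with extra characters (slashes, dots, ...),
    # so the result is safe to use as part of a file name.
    if HEX_SHA256.fullmatch(candidate) is None:
        return None
    return candidate.lower()


def hex_to_base64url(hex_hash: str) -> str:
    """Re-encode a hex hash as unpadded base64url (URL and filesystem safe)."""
    raw = bytes.fromhex(hex_hash)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
```

Anything that fails the check is treated as "no dictionary", which is the
same graceful fallback as not matching at all.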

On Wed, Apr 24, 2024 at 8:59 AM Lucas Pardue <lucas@lucaspardue.com> wrote:

> Hi Pat,
>
> On Sun, Mar 24, 2024, at 18:57, Patrick Meenan wrote:
>
> Thanks for the great discussion. There were two points of discussion that
> were left a bit open that I wanted to follow up on:
>
> 1 - Potential for "match" to be a client DOS vector
>
> All of the match patterns for a given origin (partitioned by page origin)
> need to be evaluated before a decision can be made and there was a concern
> that a lot of dictionaries could DOS the client (or be a footgun).
>
> Not matching is a graceful fallback so things are entirely within the
> client control (much as the HTTP cache is).
>
> Chrome currently has a limit of 1000 dictionaries per partition so if a
> site sets more than that, some will be evicted. We may tune that number if
> we start to see impact on the request times from running the matches.
>
>
> 2 - Questions about the use case for hex-encoded dictionary hashes.
>
> There was some question about the cases where developers are using the
> hex-encoded hash values where sf-binary was causing extra friction.
>
> The main flow where that has been an issue is when delta-encoding static
> assets (e.g. javascript bundles). At build time, the current version of a
> bundle is compressed using a previous version as a dictionary and is stored
> with the hex dictionary hash as part of the file name (then published to
> wherever they are served from).  Hex encoding is easy to use at build time
> since that is the output from cli tooling and is filesystem-safe during the
> build.  At serving time, the Available-Dictionary header value is appended
> to the URL and the file is checked, falling back to the unmodified URL.
>
> Most that I have talked to are keeping the hex encoding and adding
> processing to the serving path to convert the sf-binary to hex (e.g.
> hexencode(base64decode(strip(AvailableDictionary, ':'))) ).
>
>
> This makes me wonder something. In the version where it was hex encoded,
> were people actually parsing the HTTP field value before appending it to a
> URL? If not, that would seem like a possible attack vector.
>
> The processing step you illustrate at least adds some validation check,
> since the base64 decode would fail closed on bad input if the field value
> were passed verbatim and not parsed first.
>
> Just a thought.
>
> Cheers
> Lucas
>
>
>
> We'll keep an eye on feedback from the updated Chrome origin trial to get
> a sense for how common it is and if there are any situations where it isn't
> easy to work with.
>
> Thanks,
>
> -Pat
>
>
>
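
For reference, the hexencode(base64decode(strip(...))) step and the build-time
naming from the quoted text could look roughly like the sketch below. It is
not anyone's production code. The assumptions are: the hash is SHA-256
(32 bytes), the header carries a single sf-binary item (base64 wrapped in
colons), and the hash is joined to the path with a "." separator. The
function names and the separator are hypothetical.

```python
import base64
import binascii
import hashlib
from typing import List, Optional


def available_dictionary_to_hex(field_value: str) -> Optional[str]:
    """Convert an sf-binary value such as ":<base64>:" to lowercase hex.

    Returns None on any malformed input so the caller fails closed.
    """
    value = field_value.strip()
    if len(value) < 2 or not (value.startswith(":") and value.endswith(":")):
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        digest = base64.b64decode(value[1:-1], validate=True)
    except (binascii.Error, ValueError):
        return None
    if len(digest) != 32:  # assumption: SHA-256 digest
        return None
    return digest.hex()


def delta_file_name(path: str, previous_version: bytes) -> str:
    """Build-time name: the path plus the hex hash of the dictionary."""
    return f"{path}.{hashlib.sha256(previous_version).hexdigest()}"


def candidate_paths(path: str, field_value: Optional[str]) -> List[str]:
    """Serving-time lookup order: delta-compressed file first, then the
    unmodified path as the fallback."""
    hex_hash = available_dictionary_to_hex(field_value) if field_value else None
    if hex_hash is None:
        return [path]
    return [f"{path}.{hex_hash}", path]
```

Because the decoded value has to be exactly 32 bytes before it is hex-encoded,
nothing from the request header reaches the file path unvalidated.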