Re: Call for Adoption: draft-meenan-httpbis-compression-dictionary

Patrick Meenan <patmeenan@gmail.com> Fri, 18 August 2023 13:43 UTC

References: <02E987DB-018F-45B9-9871-4D7CFE25A37E@mnot.net> <005742F6-1383-4814-9E85-F9C2CDB7525E@gbiv.com> <CAJV+MGxySTUmQqp++OxeVACt8zSFTo=ETjO=PHeB7o1HshpkAg@mail.gmail.com> <A10BBDFB-044D-4446-80FA-B9985B2FF783@eissing.org>
In-Reply-To: <A10BBDFB-044D-4446-80FA-B9985B2FF783@eissing.org>
From: Patrick Meenan <patmeenan@gmail.com>
Date: Fri, 18 Aug 2023 09:43:27 -0400
Message-ID: <CAJV+MGxh5jzTWGzAmohyWeGd+9928HP=YfZPu1ch0pXo=LUhnQ@mail.gmail.com>
To: Stefan Eissing <stefan@eissing.org>
Cc: Fielding Roy <fielding@gbiv.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Tommy Pauly <tpauly@apple.com>
Subject: Re: Call for Adoption: draft-meenan-httpbis-compression-dictionary
Archived-At: <https://www.w3.org/mid/CAJV+MGxh5jzTWGzAmohyWeGd+9928HP=YfZPu1ch0pXo=LUhnQ@mail.gmail.com>

Content-Encoding is end-to-end and follows the content (though some reverse
proxies decode and re-encode the payload, for example if you are relying on
the reverse proxy for your compression or it modifies the payload).
Transfer-Encoding or anything at the HTTP/2 or HTTP/3 layer would be
hop-by-hop.

The main requirements for a reverse proxy to "work" with an origin using
dictionary compression are:
- Pass unknown "Accept-Encoding" values through (if they are stripped, the
responses will still work but dictionary compression won't be used)
- Treat "Content-Encoding" responses with unknown encodings as opaque
responses (most that I have tested already do this)
- Support "Vary" for cache keys (if it is a caching proxy) for
"Accept-Encoding" and "Sec-Available-Dictionary" request headers (may
require some config depending on the proxy)
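
As a rough illustration of the first two requirements, here is a minimal
Python sketch. The names KNOWN_ENCODINGS, parse_codings,
forward_accept_encoding and is_opaque are hypothetical and not taken from
any real proxy; the set of codings the proxy can decode is also an
assumption.

```python
# Hypothetical: the content-codings this particular proxy knows how to decode.
KNOWN_ENCODINGS = {"gzip", "deflate", "br"}


def parse_codings(value):
    """Split a comma-separated coding list, dropping parameters such as ;q=0.5."""
    return [t.split(";")[0].strip().lower() for t in value.split(",") if t.strip()]


def forward_accept_encoding(client_value):
    """Pass the client's Accept-Encoding through unchanged, unknown values included."""
    return client_value


def is_opaque(content_encoding):
    """True if the response uses any coding the proxy does not understand.

    Such a response should be relayed (and cached) byte-for-byte, without
    attempting to decode or re-compress it.
    """
    return any(c not in KNOWN_ENCODINGS for c in parse_codings(content_encoding))
```

With these assumptions, a "Content-Encoding: br-d" response is treated as
opaque while a "Content-Encoding: gzip" response is not.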

Here are some notes from April when I tested it on Fastly, CloudFront and
Cloudflare, all of which are reverse-proxies:
https://github.com/pmeenan/compression-dictionary-notes/blob/main/CDN.md

The basic flow looks something like this:

- Request comes in to reverse proxy from c1 for
https://example.com/v2/main.js with "Accept-Encoding: deflate, gzip, br,
zstd, br-d, zstd-d" and "Sec-Available-Dictionary: xxxyyyzzz"
- Resource isn't found in cache, request is made from reverse-proxy to b1
for the URL with the same request headers
- Response from b1 comes back with "Content-Encoding: br-d" and "Vary:
accept-encoding, sec-available-dictionary" (and appropriate cache headers
making it cache eligible)
- Proxy stores it in cache, keyed by URL, the Accept-Encoding string and
the xxxyyyzzz dictionary
- Proxy responds with the dictionary-compressed resource (it doesn't try to
re-compress it since it is already using content-encoding, possibly with
an encoding the proxy doesn't understand)

- Request comes in to reverse proxy from c2 for
https://example.com/v2/main.js with "Accept-Encoding: deflate, gzip, br,
zstd, br-d, zstd-d" and "Sec-Available-Dictionary: xxxyyyzzz"
- Proxy finds resource in cache, keyed by URL, Accept-Encoding and
xxxyyyzzz dictionary and serves the dictionary-compressed resource from
cache
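
The c1/c2 flow above can be sketched as a toy Vary-aware cache in Python.
This is only a sketch under stated assumptions: VaryCache, handle and
fetch_from_origin are hypothetical names, request and response headers are
assumed to be plain dicts with lower-cased names, and real proxies normalize
header values and honor cache lifetimes in ways that are omitted here.

```python
class VaryCache:
    """Toy cache keyed by URL plus the request headers named in Vary."""

    def __init__(self):
        self._vary = {}   # url -> tuple of lower-cased header names from Vary
        self._store = {}  # (url, tuple of request header values) -> response

    @staticmethod
    def _key(url, vary_names, request_headers):
        # Missing request headers are keyed as the empty string.
        return (url, tuple(request_headers.get(n, "") for n in vary_names))

    def lookup(self, url, request_headers):
        names = self._vary.get(url)
        if names is None:
            return None
        return self._store.get(self._key(url, names, request_headers))

    def store(self, url, request_headers, response):
        vary = response["headers"].get("vary", "")
        names = tuple(n.strip().lower() for n in vary.split(",") if n.strip())
        self._vary[url] = names
        self._store[self._key(url, names, request_headers)] = response


def handle(cache, url, request_headers, fetch_from_origin):
    """Serve from cache when possible, otherwise forward to the backend (b1)."""
    cached = cache.lookup(url, request_headers)
    if cached is not None:
        return cached, "HIT"
    # Forward with the same request headers and store the response as-is,
    # without decoding or re-compressing the body.
    response = fetch_from_origin(url, request_headers)
    cache.store(url, request_headers, response)
    return response, "MISS"
```

In this sketch c1's request is a miss that populates the cache, and c2's
request with the same Accept-Encoding and Sec-Available-Dictionary values is
served from cache. A client advertising a different dictionary gets a
different cache key and goes back to the origin.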

On Fri, Aug 18, 2023 at 6:25 AM Stefan Eissing <stefan@eissing.org> wrote:

>
>
> > On 17.08.2023 at 22:39, Patrick Meenan <patmeenan@gmail.com> wrote:
> >
> > Probably worth continuing the discussion in a dedicated thread if
> > adopted but hopefully it won't hurt to take a first pass (inline)...
> >
> > On Thu, Aug 17, 2023 at 1:55 PM Roy T. Fielding <fielding@gbiv.com> wrote:
> > I think implementation of such through content-codings is fundamentally
> > misguided because it changes the resource itself and impacts all caching
> > along the chain of requests in ways that are non-recoverable. That is due
> > to the lost metadata and variance on whatever request field is used to
> > indicate that some downstream client can grok some possible dictionary.
> >
> > The decoded version of the resource is unchanged. It's not fundamentally
> > different than brotli which happens to include a default dictionary and the
> > caching is guaranteed to be maintained in a consistent way as long as
> > "Vary" works on "Accept-Encoding" as well as whatever header negotiates the
> > dictionary.  Even without the dictionary, if something in the middle
> > doesn't know how to process one of the content-encodings (and needs to be
> > able to access the content) then the accept-encoding should be modified to
> > only include encodings that it knows how to work with.  This isn't really
> > notably different than "br" or "zstd".
>
> How would a caching reverse proxy work here? Assume there are frontend
> connection c1 and c2 and backend connection b1?
>
> Can there be dictionary state shared between the clients and the backend?
> If not, and the reverse proxy would need to decode/re-encode content, this
> looks like a Hop-By-Hop thing. Which transfer-encoding seems to suit
> better, e.g. better suited to work with the existing infra.
>
> Maybe I just have an incomplete understanding how this is supposed to work.
>
> Kind Regards,
> Stefan
>
> >  In short, it looks like an easy solution for a browser, but will wreak
> > havoc with the larger architecture of the Web.
> >
> > The right way to do this is to implement it as a transfer encoding that
> > can be decoded without loss or confusion with the unencoded resource,
> > which would require extending h2 and h3 to support that feature of
> > HTTP/1.1.
> >
> > For the existing draft, there is a lot of unnecessary confusion regarding
> > features of fetch, like CORS, that don't make any sense from a security
> > perspective. That's not what CORS is capable of covering, nor how it is
> > implemented in practice, so reusing it doesn't make any sense.
> > The same goes for use of the Sec- prefix on header fields.
> >
> > CORS covers privacy from a browser perspective as far as the readability
> > of responses relative to the origin of the containing document which is
> > exactly the context that it is needed for here. The concern that it takes
> > care of is to make sure that responses that shouldn't be readable from the
> > document context of the client can't be exposed to oracle timing attacks
> > (because there won't be any client-opaque responses). HTTP itself doesn't
> > really have the same document framing context and need for protecting read
> > access of individual responses on a shared connection by clients running in
> > different document contexts.
> >
> > Allowing a response from one origin to define a compression dictionary
> > for responses received from some other origin would clearly violate the
> > assumptions of https in so many ways (space, time, and cross-analysis).
> > I don't see how we could possibly allow that even if both origins were
> > covered by the same certificate. It would be far easier to require that
> > everything have the same origin (as defined in RFC9110, not fetch) or
> > by having the response origin define specifically which dictionary is
> > being used (identifying both the dictionary URL and hash).  In the latter
> > case, it would be possible to pre-define common dictionaries and thus
> > reduce or remove the need to download them.
> >
> > Maybe we crossed wires somewhere, but the dictionaries and the responses
> > they apply to MUST be same-origin to each other in this ID. Where CORS
> > comes into play is the dictionary or compressed response's relation to the
> > document context that they are being fetched from (in a browser case
> > anyway).
> >
> > Moving the compression down into the transport layer is what we tried
> > before but failed to navigate the browser security issues because the
> > transport layer doesn't have the context of which responses need to be
> > opaque, which responses are partitioned across document or frame
> > boundaries, etc and that the dictionary compression could be used to
> > perform oracle attacks across those boundaries.
> >
> > Likewise, using * as a wildcard in arbitrary URL references is a foot gun.
> > It would make more sense to have two attributes, prefix and suffix, and
> > have them only match within the URL path (i.e., exclude the origin and
> > query portions, preventing matches on full URIs or user-supplied
> > query parameters). That is far more likely to get right than allowing
> > things like "//example.com/*/*/*/*/****"
> >
> > The origin is already excluded from being configurable. There is some
> > discussion about only supporting relative paths but allowing for full URLs
> > just made it easier to reference the existing URL RFC without having to
> > re-define just the parts we need to support.
> >
> > Query params can't necessarily be excluded and some sites are going to
> > want to allow for either fixed query param matching or wildcard (and maybe
> > for both the static and dynamic use case).  Allowing for * allows for some
> > flexibility in site URL structure while still keeping the matching
> > relatively simple and without the complexity of URLPattern (
> > https://github.com/WICG/urlpattern/blob/main/mdn-drafts/QUICK-REFERENCE.md
> > )
> >
> > Anyway, I look forward to shaking these issues out.  I'll see about
> > creating issues in the github repo that I have been using for the ID for
> > all of the questions and concerns raised to make sure we don't lose track
> > of any of them (repo is here:
> > https://github.com/pmeenan/i-d-compression-dictionary ).
> >
> > Thanks,
> >
> > -Pat
>
>