Re: Call for Adoption: draft-meenan-httpbis-compression-dictionary
Patrick Meenan <patmeenan@gmail.com> Fri, 18 August 2023 13:43 UTC
References: <02E987DB-018F-45B9-9871-4D7CFE25A37E@mnot.net> <005742F6-1383-4814-9E85-F9C2CDB7525E@gbiv.com> <CAJV+MGxySTUmQqp++OxeVACt8zSFTo=ETjO=PHeB7o1HshpkAg@mail.gmail.com> <A10BBDFB-044D-4446-80FA-B9985B2FF783@eissing.org>
In-Reply-To: <A10BBDFB-044D-4446-80FA-B9985B2FF783@eissing.org>
From: Patrick Meenan <patmeenan@gmail.com>
Date: Fri, 18 Aug 2023 09:43:27 -0400
Message-ID: <CAJV+MGxh5jzTWGzAmohyWeGd+9928HP=YfZPu1ch0pXo=LUhnQ@mail.gmail.com>
To: Stefan Eissing <stefan@eissing.org>
Cc: Fielding Roy <fielding@gbiv.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Tommy Pauly <tpauly@apple.com>
Subject: Re: Call for Adoption: draft-meenan-httpbis-compression-dictionary
Archived-At: <https://www.w3.org/mid/CAJV+MGxh5jzTWGzAmohyWeGd+9928HP=YfZPu1ch0pXo=LUhnQ@mail.gmail.com>
Content-Encoding is end-to-end and follows the content (though some
reverse proxies decode/re-encode the content if you are relying on the
reverse proxy for your compression or for modifying the payload).
Transfer-Encoding or anything at the HTTP/2 or HTTP/3 layer would be
hop-by-hop.

The main requirements for a reverse proxy to "work" with an origin using
dictionary compression are:

- Pass unknown "Accept-Encoding" values through (if they are stripped,
  the responses will still work but dictionary compression won't be used)
- Treat "Content-Encoding" responses with unknown encodings as opaque
  responses (most that I have tested already do this)
- Support "Vary" for cache keys (if it is a caching proxy) for the
  "Accept-Encoding" and "Sec-Available-Dictionary" request headers (may
  require some config depending on the proxy)

Here are some notes from April when I tested it on Fastly, CloudFront and
Cloudflare, all of which are reverse proxies:
https://github.com/pmeenan/compression-dictionary-notes/blob/main/CDN.md

The basic flow looks something like this:

- Request comes in to the reverse proxy from c1 for
  https://example.com/v2/main.js with "Accept-Encoding: deflate, gzip,
  br, zstd, br-d, zstd-d" and "Sec-Available-Dictionary: xxxyyyzzz"
- Resource isn't found in cache, so a request is made from the reverse
  proxy to b1 for the URL with the same request headers
- Response from b1 comes back with "Content-Encoding: br-d" and "Vary:
  accept-encoding, sec-available-dictionary" (and appropriate cache
  headers making it cache eligible)
- Proxy stores it in cache, keyed by URL, the Accept-Encoding string and
  the xxxyyyzzz dictionary
- Proxy responds with the dictionary-compressed resource (it doesn't try
  to re-compress it since it is already using content-encoding, and
  maybe with an encoding the proxy doesn't understand)
- Request comes in to the reverse proxy from c2 for
  https://example.com/v2/main.js with "Accept-Encoding: deflate, gzip,
  br, zstd, br-d, zstd-d" and "Sec-Available-Dictionary: xxxyyyzzz"
- Proxy finds the resource in cache, keyed by URL, Accept-Encoding and
  the xxxyyyzzz dictionary, and serves the dictionary-compressed resource
  from cache

On Fri, Aug 18, 2023 at 6:25 AM Stefan Eissing <stefan@eissing.org> wrote:

> > On 17.08.2023 at 22:39, Patrick Meenan <patmeenan@gmail.com> wrote:
> >
> > Probably worth continuing the discussion in a dedicated thread if
> > adopted but hopefully it won't hurt to take a first pass (inline)...
> >
> > On Thu, Aug 17, 2023 at 1:55 PM Roy T. Fielding <fielding@gbiv.com>
> > wrote:
> > I think implementation of such through content-codings is fundamentally
> > misguided because it changes the resource itself and impacts all caching
> > along the chain of requests in ways that are non-recoverable. That is due
> > to the lost metadata and variance on whatever request field is used to
> > indicate that some downstream client can grok some possible dictionary.
> >
> > The decoded version of the resource is unchanged. It's not fundamentally
> > different than brotli which happens to include a default dictionary and
> > the caching is guaranteed to be maintained in a consistent way as long as
> > "Vary" works on "Accept-Encoding" as well as whatever header negotiates
> > the dictionary. Even without the dictionary, if something in the middle
> > doesn't know how to process one of the content-encodings (and needs to be
> > able to access the content) then the accept-encoding should be modified
> > to only include encodings that it knows how to work with. This isn't
> > really notably different than "br" or "zstd".
>
> How would a caching reverse proxy work here? Assume there are frontend
> connection c1 and c2 and backend connection b1?
>
> Can there be dictionary state shared between the clients and the backend?
> If not, and the reverse proxy would need to decode/re-encode content, this
> looks like a Hop-By-Hop thing. Which transfer-encoding seems to suite
> better, e.g. better suited to work with the existing infra.
>
> Maybe I just have an incomplete understanding how this is supposed to
> work.
>
> Kind Regards,
> Stefan
>
> > In short, it looks like an easy solution for a browser, but will wreak
> > havoc with the larger architecture of the Web.
> >
> > The right way to do this is to implement it as a transfer encoding that
> > can be decoded without loss or confusion with the unencoded resource,
> > which would require extending h2 and h3 to support that feature of
> > HTTP/1.1.
> >
> > For the existing draft, there is a lot of unnecessary confusion regarding
> > features of fetch, like CORS, that don't make any sense from a security
> > perspective. That's not what CORS is capable of covering, nor how it is
> > implemented in practice, so reusing it doesn't make any sense.
> > The same goes for use of the Sec- prefix on header fields.
> >
> > CORS covers privacy from a browser perspective as far as the readability
> > of responses relative to the origin of the containing document which is
> > exactly the context that it is needed for here. The concern that it takes
> > care of is to make sure that responses that shouldn't be readable from
> > the document context of the client can't be exposed to oracle timing
> > attacks (because there won't be any client-opaque responses). HTTP itself
> > doesn't really have the same document framing context and need for
> > protecting read access of individual responses on a shared connection by
> > clients running in different document contexts.
> >
> > Allowing a response from one origin to define a compression dictionary
> > for responses received from some other origin would clearly violate the
> > assumptions of https in so many ways (space, time, and cross-analysis).
> > I don't see how we could possibly allow that even if both origins were
> > covered by the same certificate. It would be far easier to require that
> > everything have the same origin (as defined in RFC9110, not fetch) or
> > by having the response origin define specifically which dictionary is
> > being used (identifying both the dictionary URL and hash). In the latter
> > case, it would be possible to pre-define common dictionaries and thus
> > reduce or remove the need to download them.
> >
> > Maybe we crossed wires somewhere, but the dictionaries and the responses
> > they apply to MUST be same-origin to each other in this ID. Where CORS
> > comes into play is the dictionary or compressed response's relation to
> > the document context that they are being fetched from (in a browser case
> > anyway).
> >
> > Moving the compression down into the transport layer is what we tried
> > before but failed to navigate the browser security issues because the
> > transport layer doesn't have the context of which responses need to be
> > opaque, which responses are partitioned across document or frame
> > boundaries, etc and that the dictionary compression could be used to
> > perform oracle attacks across those boundaries.
> >
> > Likewise, using * as a wildcard in arbitrary URL references is a foot
> > gun.
> > It would make more sense to have two attributes, prefix and suffix, and
> > have them only match within the URL path (i.e., exclude the origin and
> > query portions, preventing matches on full URIs or user-supplied
> > query parameters). That is far more likely to get right than allowing
> > things like "//example.com/*/*/*/*/****"
> >
> > The origin is already excluded from being configurable. There is some
> > discussion about only supporting relative paths but allowing for full
> > URLs just made it easier to reference the existing URL RFC without having
> > to re-define just the parts we need to support.
> >
> > Query params can't necessarily be excluded and some sites are going to
> > want to allow for either fixed query param matching or wildcard (and
> > maybe for both the static and dynamic use case). Allowing for * allows
> > for some flexibility in site URL structure while still keeping the
> > matching relatively simple and without the complexity of URLPattern (
> > https://github.com/WICG/urlpattern/blob/main/mdn-drafts/QUICK-REFERENCE.md
> > )
> >
> > Anyway, I look forward to shaking these issues out. I'll see about
> > creating issues in the github repo that I have been using for the ID for
> > all of the questions and concerns raised to make sure we don't lose track
> > of any of them (repo is here:
> > https://github.com/pmeenan/i-d-compression-dictionary ).
> >
> > Thanks,
> >
> > -Pat
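
For anyone who wants to poke at the caching behavior, the cache-key part of
the c1/b1/c2 flow at the top of this message can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not how
any particular CDN or proxy implements it: Response, VaryCache, proxy_fetch
and fake_origin are made-up names, the cache is a plain in-memory dict,
freshness and "Vary: *" are ignored, and the origin is a stub that returns
placeholder bytes rather than real br-d output.

```python
from dataclasses import dataclass


@dataclass
class Response:
    body: bytes
    headers: dict[str, str]


def vary_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url, tuple((n, lowered.get(n, "")) for n in names))


class VaryCache:
    """Hypothetical in-memory cache; no expiry, no handling of 'Vary: *'."""

    def __init__(self):
        self._vary = {}   # url -> Vary value from the last stored response
        self._store = {}  # (url, varied header values) -> Response

    def lookup(self, url, request_headers):
        vary = self._vary.get(url)
        if vary is None:
            return None
        return self._store.get(vary_key(url, request_headers, vary))

    def store(self, url, request_headers, response):
        vary = response.headers.get("Vary", "")
        self._vary[url] = vary
        self._store[vary_key(url, request_headers, vary)] = response


def proxy_fetch(cache, origin, url, request_headers):
    """Serve from cache if possible, else forward the request unchanged."""
    cached = cache.lookup(url, request_headers)
    if cached is not None:
        return cached, "HIT"
    # Unknown Accept-Encoding values are passed through untouched, and the
    # response body is treated as opaque (never decoded or re-compressed).
    response = origin(url, request_headers)
    cache.store(url, request_headers, response)
    return response, "MISS"


def fake_origin(url, request_headers):
    """Stub for b1: picks an encoding label, returns placeholder bytes."""
    vary = "Accept-Encoding, Sec-Available-Dictionary"
    encodings = [e.strip() for e in
                 request_headers.get("Accept-Encoding", "").split(",")]
    if "br-d" in encodings and request_headers.get("Sec-Available-Dictionary"):
        return Response(b"<dictionary-compressed placeholder>",
                        {"Content-Encoding": "br-d", "Vary": vary})
    return Response(b"<brotli placeholder>",
                    {"Content-Encoding": "br", "Vary": vary})
```

With this sketch, two requests carrying the same Accept-Encoding string and
the same Sec-Available-Dictionary value share one cache entry, while a
request advertising a different dictionary (or none) misses and goes back to
the origin. Keying on the literal Accept-Encoding string is the simplest
choice; a real proxy might normalize it to improve the hit rate.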