Re: Broader discussion - limit dictionary encoding to one compression algorithm?
Patrick Meenan <patmeenan@gmail.com> Wed, 22 May 2024 14:04 UTC
MIME-Version: 1.0
References: <CAJV+MGzjUnZZ=XFn5veOvuhVWyZNP2b9U0fxpS3UmrDC_bc_wQ@mail.gmail.com> <202405211641.44LGfY2U006906@critter.freebsd.dk> <CAJV+MGzwbFAC7NhP611HDaVYMjMX0Q+KZ-QYirFu5WzjWL771g@mail.gmail.com>
In-Reply-To: <CAJV+MGzwbFAC7NhP611HDaVYMjMX0Q+KZ-QYirFu5WzjWL771g@mail.gmail.com>
From: Patrick Meenan <patmeenan@gmail.com>
Date: Wed, 22 May 2024 10:03:03 -0400
Message-ID: <CAJV+MGwWLuujKUG1vAz8e8F0SoYWxN3KFv-nWfRYiFVk1Q9U2Q@mail.gmail.com>
To: HTTP Working Group <ietf-http-wg@w3.org>
Archived-At: <https://www.w3.org/mid/CAJV+MGwWLuujKUG1vAz8e8F0SoYWxN3KFv-nWfRYiFVk1Q9U2Q@mail.gmail.com>
It's probably worth noting that the draft does not specify "Brotli" and "Zstandard" but, rather, "dcb" and "dcz", which fix specific parameters for each algorithm (window size in particular), and those parameters lead to the restrictions I mentioned. They are effectively the dictionary equivalents of "br" and "zstd", which use the same 16 MB and 8 MB windows, respectively, that "dcb" and "dcz" define.

Dictionary compression for delta updates is more likely to benefit from large-window variants when you want to use HTTP to deliver delta updates of large files, since the window and other parameters directly affect both the effectiveness of the delta encoding and the size of the resources it can be applied to. I would not be surprised to see large/huge variants of the content encodings defined and used outside of the browser case. They can still leverage the same dictionary mechanism, just with a different content-encoding (which would need to be defined appropriately).

There are other compression algorithms, specific to resource types, that can do MUCH better delta encoding than what Zstandard and Brotli provide in the general case. Courgette, for example:
https://www.chromium.org/developers/design-documents/software-updates-courgette/

I wouldn't be surprised if a better diff format were developed for ML models, one that knows the format of the file (a giant collection of weights) and can do better than generic pattern matching, particularly given the size of the Gen AI models, where even the smallest are multiple gigabytes.

I don't expect dictionary updates over HTTP (using the compression dictionary transport mechanism) to be limited to 1-2 content-encodings for very long, so the main question is whether we define both "dcb" and "dcz" now, or define only one of them and let other content-encodings follow for different use cases in future RFCs.

I think it makes sense to spec the dictionary-aware versions of both "zstd" and "br", since we already have both of them, they are both in broad use, and their parameters map directly to what is currently defined for "dcz" and "dcb". This is effectively defining how the existing encodings should behave when using dictionaries.

On Tue, May 21, 2024 at 1:02 PM Patrick Meenan <patmeenan@gmail.com> wrote:

> On Tue, May 21, 2024 at 12:41 PM Poul-Henning Kamp <phk@phk.freebsd.dk>
> wrote:
>
>> Patrick Meenan writes:
>>
>> > ** The case for a single content-encoding:
>> > […]
>> > ** The case for both Brotli and Zstandard:
>>
>> First, those are not really the two choices before us.
>>
>> Option one is: Pick one single algorithm
>>
>> Option two is: Add a negotiation mechanism and seed a new IANA registry
>> with those two algorithms
>>
>> As far as I can tell, there are no credible data which shows any
>> performance difference between the two, and no of reason to think that any
>> future compression algorithm will do significantly better.
>
> We already have a negotiation mechanism. It uses "Accept-Encoding" and
> "Content-Encoding" and the existing registry. Nothing about the negotiation
> changes if we use one, two or more. The question is if we specify and
> register the "dcb" content-encoding as well as the "dcz" content encoding
> as part of this draft or if we only register one (or if we also add a
> restriction that no other content encodings can use the dictionary
> negotiation).
>
> As far as future encodings, we don't know if any algorithms will do better
> but there is the potential for content-aware delta encodings to do better
> (with things like reallocated addresses in WASM, etc). More likely, there
> will probably come a time where someone wants to delta-encode
> multi-gigabyte resources where the 50/128MB limitations laid out for "dcb"
> and "dcz" won't work and a "large window" variant may need to be specified
> (as a new content encoding).
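
The point in the quoted reply, that negotiation already runs through "Accept-Encoding" and "Content-Encoding" and does not change with the number of dictionary encodings, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the server preference order, and the `dictionary_available` flag (standing in for whatever check decides that client and server share a dictionary) are hypothetical and are not taken from the draft. Wildcard ("*") handling is deliberately left out.

```python
# Hypothetical server preference order, for illustration only.
DICTIONARY_ENCODINGS = ("dcb", "dcz")    # dictionary-aware encodings discussed in the thread
PLAIN_ENCODINGS = ("br", "zstd", "gzip")  # existing encodings without a dictionary


def parse_accept_encoding(header):
    """Parse an Accept-Encoding value into a {coding: qvalue} mapping."""
    codings = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a coding listed without a qvalue defaults to 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed qvalue as "not acceptable"
        codings[name.strip().lower()] = q
    return codings


def choose_encoding(accept_encoding, server_supported, dictionary_available):
    """Return the Content-Encoding to use, or None to send the response unencoded.

    Adding or removing a dictionary encoding only changes the candidate
    list; the negotiation logic itself stays the same.
    """
    accepted = parse_accept_encoding(accept_encoding)
    candidates = []
    if dictionary_available:
        candidates += DICTIONARY_ENCODINGS
    candidates += PLAIN_ENCODINGS
    for coding in candidates:
        if coding in server_supported and accepted.get(coding, 0.0) > 0:
            return coding
    return None
```

For example, `choose_encoding("gzip, br, zstd, dcb, dcz", {"dcz", "zstd", "gzip"}, True)` picks "dcz" on a server that implements only the Zstandard variants, while the same request with no usable dictionary falls back to a plain encoding.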