Re: Continuing discussion on Cache Digest

Alcides Viamontes E <> Sat, 20 August 2016 10:09 UTC

From: Alcides Viamontes E <>
Date: Sat, 20 Aug 2016 12:04:23 +0200
Message-ID: <>
To: Mark Nottingham <>
Cc: HTTP Working Group <>, Kazuho Oku <>

Just a few quick opinions below.

On Sat, Aug 20, 2016 at 2:59 AM, Mark Nottingham <> wrote:

> [ with my "cache digest co-author" hat on ]
> In discussions about Cache Digest, one of the questions that came up was
> whether or not it was necessary to use a digest mechanism (e.g., Bloom
> filter, Golomb-compressed set), or whether we could just send a list
> of the cached representations.
> Curious about this, I whipped up a script to parse the contents of
> Chrome's cache, to get some idea as to how many cached responses per origin
> a browser keeps.
> See:
> and responses to:
> The caveats around this are too numerous to cover, but to mention a few:
>   - this is just anecdata, and a very small sample at that
>   - it's skewed towards:
>         a) people who follow me on Twitter;
>         b) people who use Chrome;
>         c) people who can easily run a Python program (leaving most
> Windows users out)
>   - it includes both fresh and stale cached responses
>   - it assumes that the Chrome URL gives the complete and correct state of
> the cache
> Looking at the responses (five so far) and keeping that in mind, a few
> observations:
> 1. Unsurprisingly, the number of cached responses per origin appears to
> follow (roughly) a Zipf curve, like so many other Web stats do
> 2. Origins with tens of cached responses appear to be very common
> 3. Origins with hundreds of cached responses appear to be not uncommon at
> all
> 4. Origins with thousands of cached responses are encountered
> More data is, of course, welcome.
> My early take-away is that if we design a mechanism where the cached
> responses are enumerated, instead of having the entire cache's contents for
> the origin digested, there needs to be some mechanism whereby the most
> relevant cached responses are selected.

I would very much like a selection mechanism even with cache digests. In my
experience with cache-digests-as-a-cookie, the digest size is far smaller
than most authentication cookies, but there may be scenarios where people
will want more control over the number of bytes spent on a digest.
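As a rough back-of-the-envelope (my own sketch, not taken from the draft), the size gap between a Bloom filter and a Golomb-compressed set for a given false-positive probability can be estimated like this; the formulas are the standard textbook bounds, and the byte counts are illustrative only:

```python
import math

def bloom_bits(n, fpp):
    # Optimally sized Bloom filter: n * log2(1/fpp) * log2(e) bits.
    return n * math.log2(1 / fpp) * math.log2(math.e)

def gcs_bits(n, fpp):
    # Golomb-compressed set: close to the information-theoretic
    # minimum of log2(1/fpp) bits per element, plus roughly half a
    # bit of coding overhead per element.
    return n * (math.log2(1 / fpp) + 0.5)

fpp = 1 / 128  # one false positive per 128 lookups
for n in (10, 100, 1000):
    print(n, round(bloom_bits(n, fpp) / 8), round(gcs_bits(n, fpp) / 8))
```

Even at a thousand cached responses per origin the GCS digest stays under a kilobyte at this false-positive rate, which is indeed smaller than many authentication cookies.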

> The most likely time to do that is when the responses themselves are first
> cached; e.g., with a cache-control extension. I think the challenges that
> such a scheme would face are:
> a) Keeping the advertisement concise (because it should fit into a
> navigation request, without bumping into another RT of congestion window)
> b) Being able to express the presence of a larger number of URLs (since
> one of the effects of HTTP/2 is atomisation into a larger number of smaller
> resources), with bits of state like "fresh/stale" attached
> c) Being manageable for the origin (since they'll effectively have to
> predict what URLs are important to know about ahead of time, and in the
> face of site changes)
> To me, this makes CD more attractive, because we have more confidence that
> (a) and (b) are in hand, and (c) isn't a worry because the entire origin's
> cache state will be sent. Provided that the security/privacy issues are in
> hand, and that it's reasonably implementable by clients, I think CD also
> has a better chance of success because it decouples the sending of the
> cache state from its use, making it easier to reuse the data on the server
> side without close client coordination.
> So, I think the things that we do need to work on in CD are:
> 1) Choosing a more efficient hash algorithm and assuring that it's
> reasonable to implement in browsers
> 2) Refining the flags / operation models so that it's as simple and
> sensible as possible (but we need feedback on how clients want to send it)
> 3) Defining a way for origins to opt into getting CD, rather than always
> sending it.

Thumbs up for all of this! Although I see 1) as difficult to achieve in
practice: GCS is already quite good.
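To illustrate why GCS is hard to beat, here is a minimal encoder sketch of the idea (my own illustration, loosely following the construction in the Cache Digest draft; the hash choice and parameter names are assumptions): hash each URL into [0, N*P), sort, and Golomb-Rice-code the gaps with modulus P.

```python
import hashlib

def gcs_encode(urls, p_log2=7):
    # Sketch of a Golomb-compressed set. P = 2**p_log2 sets the
    # false-positive probability to roughly 1/P.
    n = len(urls)
    p = 1 << p_log2
    domain = n * p
    # Hash each URL uniformly into [0, n * P) and sort.
    values = sorted(
        int.from_bytes(hashlib.sha256(u.encode()).digest()[:8], 'big') % domain
        for u in urls
    )
    bits = []
    prev = 0
    for v in values:
        delta = v - prev
        prev = v
        q, r = divmod(delta, p)
        bits.append('1' * q + '0')             # unary-coded quotient
        bits.append(format(r, f'0{p_log2}b'))  # fixed-width remainder
    return ''.join(bits)

digest = gcs_encode(['/app.js', '/style.css', '/logo.png'])
```

Because the sorted deltas are geometrically distributed with mean P, each member costs the 7-bit remainder plus only a couple of unary bits on average, which is already close to the information-theoretic floor; a "more efficient hash algorithm" can mostly only shave implementation cost, not digest size.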

Alcides Viamontes E.
Zunzun AB
(+46) 722294542
( is a property of Zunzun AB)