Re: Continuing discussion on Cache Digest

Alcides Viamontes E <alcidesv@shimmercat.com> Sat, 20 August 2016 10:09 UTC

In-Reply-To: <1C76B7AB-A7B9-4759-AC52-475C3E030137@mnot.net>
References: <1C76B7AB-A7B9-4759-AC52-475C3E030137@mnot.net>
From: Alcides Viamontes E <alcidesv@shimmercat.com>
Date: Sat, 20 Aug 2016 12:04:23 +0200
Message-ID: <CAAMqGzZJ0rD_2DkruvHNu1vwVs2ERngcun9jGnSD22dq3eY07g@mail.gmail.com>
To: Mark Nottingham <mnot@mnot.net>
Cc: HTTP Working Group <ietf-http-wg@w3.org>, Kazuho Oku <kazuhooku@gmail.com>
Archived-At: <http://www.w3.org/mid/CAAMqGzZJ0rD_2DkruvHNu1vwVs2ERngcun9jGnSD22dq3eY07g@mail.gmail.com>

Just a few quick opinions below.

On Sat, Aug 20, 2016 at 2:59 AM, Mark Nottingham <mnot@mnot.net> wrote:

> [ with my "cache digest co-author" hat on ]
>
> In discussions about Cache Digest, one of the questions that came up was
> whether or not it was necessary to use a digest mechanism (e.g., Bloom
> filter, Golomb compressed set), or whether or not we could just send a list
> of the cached representations.
>
> Curious about this, I whipped up a script to parse the contents of
> Chrome's cache, to get some idea as to how many cached responses per origin
> a browser keeps.
>
> See:
>   https://gist.github.com/mnot/793fcfb0d003e87ea7e8035c43eafdb9
> and responses to:
>   https://twitter.com/mnot/status/766542805980155905
>
> The caveats around this are too numerous to cover, but to mention a few:
>   - this is just anecdata, and a very small sample at that
>   - it's skewed towards:
>         a) people who follow me on Twitter;
>         b) people who use Chrome;
>         c) people who can easily run a Python program (leaving most
> Windows users out)
>   - it includes both fresh and stale cached responses
>   - it assumes that the Chrome URL gives the complete and correct state of
> the cache
>
> Looking at the responses (five so far) and keeping that in mind, a few
> observations:
>
> 1. Unsurprisingly, the number of cached responses per origin appears to
> follow (roughly) a Zipf curve, like so many other Web stats do
> 2. Origins with tens of cached responses appear to be very common
> 3. Origins with hundreds of cached responses appear to be not uncommon at
> all
> 4. Origins with thousands of cached responses are encountered
>
> More data is, of course, welcome.
>
> My early take-away is that if we design a mechanism where the cached
> responses are enumerated, instead of having the entire cache's contents for
> the origin digested, there needs to be some mechanism whereby the most
> relevant cached responses are selected.
>

I would very much like a selection mechanism even with cache digests. In my
experience with cache-digests-as-a-cookie, the digest is far smaller
than most authentication cookies, but there may be scenarios where people
want more control over the number of bytes spent on a digest.
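
To give a feel for the byte counts involved, here is a minimal sketch of a
Golomb-Rice coded set over a list of URLs. It is a generic illustration, not
the exact wire format of the Cache Digest draft: the use of SHA-256, the
function names `gcs_encode` and `gcs_decode`, the `p_bits` parameter and the
example.com URLs are all assumptions made for this example.

```python
import hashlib


def gcs_encode(urls, p_bits):
    """Encode `urls` as a Golomb-Rice coded set, returned as a '0'/'1' string.

    `p_bits` (assumed >= 1) sets the false-positive probability to roughly
    1 / 2**p_bits. Hashes are mapped into the range [0, N * 2**p_bits).
    """
    n = len(urls)
    modulus = n << p_bits
    hashes = sorted({
        int.from_bytes(hashlib.sha256(u.encode()).digest()[:8], "big") % modulus
        for u in urls
    })
    bits = []
    previous = 0
    for h in hashes:
        delta = h - previous
        previous = h
        quotient = delta >> p_bits
        remainder = delta & ((1 << p_bits) - 1)
        bits.append("1" * quotient + "0")              # quotient in unary
        bits.append(format(remainder, f"0{p_bits}b"))  # fixed-width remainder
    return "".join(bits)


def gcs_decode(bitstring, p_bits):
    """Recover the sorted list of hash values from `gcs_encode` output."""
    values, pos, previous = [], 0, 0
    while pos < len(bitstring):
        quotient = 0
        while bitstring[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the '0' that terminates the unary quotient
        remainder = int(bitstring[pos:pos + p_bits], 2)
        pos += p_bits
        previous += (quotient << p_bits) | remainder
        values.append(previous)
    return values


if __name__ == "__main__":
    urls = [f"https://example.com/asset-{i}.js" for i in range(100)]
    digest = gcs_encode(urls, 7)
    print(f"{len(digest)} bits, about {(len(digest) + 7) // 8} bytes")
```

With this encoding each entry costs `p_bits` remainder bits plus one
terminator bit, and the unary quotients add up to fewer than N bits in total,
so the digest stays below N * (p_bits + 2) bits. For 100 URLs and p_bits = 7
that is under 900 bits, i.e. at most 113 bytes.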


> The most likely time to do that is when the responses themselves are first
> cached; e.g., with a cache-control extension. I think the challenges that
> such a scheme would face are:
>
> a) Keeping the advertisement concise (because it should fit into a
> navigation request, without bumping into another RT of congestion window)
> b) Being able to express the presence of a larger number of URLs (since
> one of the effects of HTTP/2 is atomisation into a larger number of smaller
> resources), with bits of state like "fresh/stale" attached
> c) Being manageable for the origin (since they'll effectively have to
> predict what URLs are important to know about ahead of time, and in the
> face of site changes)
>
> To me, this makes CD more attractive, because we have more confidence that
> (a) and (b) are in hand, and (c) isn't a worry because the entire origin's
> cache state will be sent. Provided that the security/privacy issues are in
> hand, and that it's reasonably implementable by clients, I think CD also
> has a better chance of success because it decouples the sending of the
> cache state from its use, making it easier to reuse the data on the server
> side without close client coordination.
>
> So, I think the things that we do need to work on in CD are:
>
> 1) Choosing a more efficient hash algorithm and assuring that it's
> reasonable to implement in browsers
> 2) Refining the flags / operation models so that it's as simple and
> sensible as possible (but we need feedback on how clients want to send it)
> 3) Defining a way for origins to opt into getting CD, rather than always
> sending it.
>
>
Thumbs up for all of this! That said, I see 1) as difficult to achieve in
practice: GCS is already quite good.
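
As a rough illustration of why GCS is hard to beat, the sketch below compares
bits per cached entry for an optimally sized Bloom filter against an upper
bound for a Golomb-Rice coded set. The function names are hypothetical, and
the GCS figure assumes hashes spread over a range of N * 2**p_bits with Rice
parameter 2**p_bits.

```python
import math


def bloom_bits_per_element(false_positive_rate):
    """Bits per element of an optimally sized Bloom filter.

    Standard result: m / n = -ln(p) / (ln 2)**2.
    """
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def gcs_bits_per_element_upper_bound(p_bits):
    """Upper bound on bits per element for a Golomb-Rice coded set.

    Each entry costs p_bits remainder bits and one terminator bit; the unary
    quotients over the whole set sum to fewer than N bits, i.e. less than one
    extra bit per entry on average.
    """
    return p_bits + 2


if __name__ == "__main__":
    p_bits = 7  # false-positive probability of 1/128
    print("Bloom filter:", bloom_bits_per_element(1 / 2 ** p_bits))
    print("GCS (upper bound):", gcs_bits_per_element_upper_bound(p_bits))
```

At a false-positive probability of 1/128 the Bloom filter needs 7 / ln 2 bits
per entry, a little over 10, while the GCS bound is 9, and the lower bound for
any such structure is log2(128) = 7 bits per entry, which leaves little room
for a different algorithm to improve on it.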




-- 
Alcides Viamontes E.
Zunzun AB
(+46) 722294542
(www.shimmercat.com is a property of Zunzun AB)
