Re: Continuing discussion on Cache Digest

Alcides Viamontes E <alcidesv@shimmercat.com> Sat, 20 August 2016 10:09 UTC

In-Reply-To: <1C76B7AB-A7B9-4759-AC52-475C3E030137@mnot.net>
References: <1C76B7AB-A7B9-4759-AC52-475C3E030137@mnot.net>
From: Alcides Viamontes E <alcidesv@shimmercat.com>
Date: Sat, 20 Aug 2016 12:04:23 +0200
Message-ID: <CAAMqGzZJ0rD_2DkruvHNu1vwVs2ERngcun9jGnSD22dq3eY07g@mail.gmail.com>
To: Mark Nottingham <mnot@mnot.net>
Cc: HTTP Working Group <ietf-http-wg@w3.org>, Kazuho Oku <kazuhooku@gmail.com>
Subject: Re: Continuing discussion on Cache Digest
Archived-At: <http://www.w3.org/mid/CAAMqGzZJ0rD_2DkruvHNu1vwVs2ERngcun9jGnSD22dq3eY07g@mail.gmail.com>

Just a few quick opinions below.

On Sat, Aug 20, 2016 at 2:59 AM, Mark Nottingham <mnot@mnot.net> wrote:

> [ with my "cache digest co-author" hat on ]
>
> In discussions about Cache Digest, one of the questions that came up was
> whether or not it was necessary to use a digest mechanism (e.g., Bloom
> filter, Golomb compressed set), or whether or not we could just send a list
> of the cached representations.
>
> Curious about this, I whipped up a script to parse the contents of
> Chrome's cache, to get some idea as to how many cached responses per origin
> a browser keeps.
>
> See:
>   https://gist.github.com/mnot/793fcfb0d003e87ea7e8035c43eafdb9
> and responses to:
>   https://twitter.com/mnot/status/766542805980155905
>
> The caveats around this are too numerous to cover, but to mention a few:
>   - this is just anecdata, and a very small sample at that
>   - it's skewed towards:
>         a) people who follow me on Twitter;
>         b) people who use Chrome;
>         c) people who can easily run a Python program (leaving most
> Windows users out)
>   - it includes both fresh and stale cached responses
>   - it assumes that the Chrome URL gives the complete and correct state of
> the cache
>
> Looking at the responses (five so far) and keeping that in mind, a few
> observations:
>
> 1. Unsurprisingly, the number of cached responses per origin appears to
> follow (roughly) a Zipf curve, like so many other Web stats do
> 2. Origins with tens of cached responses appear to be very common
> 3. Origins with hundreds of cached responses appear to be not uncommon at
> all
> 4. Origins with thousands of cached responses are encountered
>
> More data is, of course, welcome.
>
> My early take-away is that if we design a mechanism where the cached
> responses are enumerated, instead of having the entire cache's contents for
> the origin digested, there needs to be some mechanism whereby the most
> relevant cached responses are selected.
>

I would very much like a selection mechanism even with cache digests. In my
experience with cache-digests-as-a-cookie, the digest size is far smaller
than most authentication cookies, but there may be scenarios where people
will want more control over the number of bytes spent on a digest.


> The most likely time to do that is when the responses themselves are first
> cached; e.g., with a cache-control extension. I think the challenges that
> such a scheme would face are:
>
> a) Keeping the advertisement concise (because it should fit into a
> navigation request, without bumping into another RT of congestion window)
> b) Being able to express the presence of a larger number of URLs (since
> one of the effects of HTTP/2 is atomisation into a larger number of smaller
> resources), with bits of state like "fresh/stale" attached
> c) Being manageable for the origin (since they'll effectively have to
> predict what URLs are important to know about ahead of time, and in the
> face of site changes)
>
> To me, this makes CD more attractive, because we have more confidence that
> (a) and (b) are in hand, and (c) isn't a worry because the entire origin's
> cache state will be sent. Provided that the security/privacy issues are in
> hand, and that it's reasonably implementable by clients, I think CD also
> has a better chance of success because it decouples the sending of the
> cache state from its use, making it easier to reuse the data on the server
> side without close client coordination.
>
> So, I think the things that we do need to work on in CD are:
>
> 1) Choosing a more efficient hash algorithm and assuring that it's
> reasonable to implement in browsers
> 2) Refining the flags / operation models so that it's as simple and
> sensible as possible (but we need feedback on how clients want to send it)
> 3) Defining a way for origins to opt into getting CD, rather than always
> sending it.
>
>
Thumbs up for all of this! That said, I see 1) as difficult to achieve in
practice, because GCS is already quite good.
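
To make the size argument concrete, here is a minimal sketch of a
Golomb-Rice coded set in Python. It is not the wire format from the Cache
Digest draft: the 8-byte SHA-256 truncation, the p_bits default, the
example URLs and all function names are assumptions chosen only for
illustration.

```python
import hashlib


def hash_url(url, n, p_bits):
    """Map a URL into [0, n * 2**p_bits) (assumed scheme: truncated SHA-256)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (n << p_bits)


def gcs_encode(urls, p_bits=7):
    """Return (encoded bytes, number of distinct hashed values)."""
    n = len(urls)
    if n == 0:
        return b"", 0
    values = sorted({hash_url(u, n, p_bits) for u in urls})
    bits = []
    prev = 0
    for value in values:
        quotient, remainder = divmod(value - prev, 1 << p_bits)
        prev = value
        bits.extend([1] * quotient)  # quotient in unary
        bits.append(0)               # unary terminator
        # remainder as a fixed-width binary field, most significant bit first
        bits.extend((remainder >> i) & 1 for i in reversed(range(p_bits)))
    bits.extend([0] * (-len(bits) % 8))  # pad to a whole number of bytes
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out), len(values)


def gcs_decode(data, count, p_bits=7):
    """Recover the sorted hashed values from an encoded set."""
    bits = [(byte >> i) & 1 for byte in data for i in reversed(range(8))]
    pos = 0
    prev = 0
    values = []
    for _ in range(count):
        quotient = 0
        while bits[pos] == 1:
            quotient += 1
            pos += 1
        pos += 1  # skip the unary terminator
        remainder = 0
        for _bit in range(p_bits):
            remainder = (remainder << 1) | bits[pos]
            pos += 1
        prev += (quotient << p_bits) | remainder
        values.append(prev)
    return values


if __name__ == "__main__":
    # Hypothetical cache contents for one origin.
    cached = ["https://example.com/asset/%d.js" % i for i in range(200)]
    encoded, count = gcs_encode(cached)
    plain = sum(len(u) for u in cached)
    print("digest bytes:", len(encoded), "plain URL list bytes:", plain)
```

With p_bits=7 each entry costs at most about 9 bits, which is why the
digest stays small next to a plain list of URLs, at the price of a
false-positive rate of roughly 1 in 128.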




-- 
Alcides Viamontes E.
Zunzun AB
(+46) 722294542
(www.shimmercat.com is a property of Zunzun AB)