Re: dont-revalidate Cache-Control header

Ilya Grigorik <igrigorik@gmail.com> Mon, 13 July 2015 22:36 UTC

In-Reply-To: <CABgOVaLZotBxWrjaY=m987eGKr+avKqsHsiMtmgPgvPgkiJf2A@mail.gmail.com>
References: <CABgOVaLHBb4zcgvO4NUUmAzUjNkocBGYY3atFA9iuYyoLaLQsA@mail.gmail.com> <559F9E90.4020801@treenet.co.nz> <CABgOVaLG6QZyjqk2AGYupShST_u3ty9BpxUcPX+_yMEC1hyHAQ@mail.gmail.com> <961203FE-7E54-410F-923E-71C04914CD2E@mnot.net> <CABgOVaJxntEyT0v4GvWm0Qi9jbUPEnzxJgg4KyQSM1T_gN1mjQ@mail.gmail.com> <CAHixhFoWXSGeVOqDPX5D51b7EjpyPCzKovEn6n6aoWxhYk_6Vg@mail.gmail.com> <CABgOVaLZotBxWrjaY=m987eGKr+avKqsHsiMtmgPgvPgkiJf2A@mail.gmail.com>
From: Ilya Grigorik <igrigorik@gmail.com>
Date: Mon, 13 Jul 2015 15:31:04 -0700
Message-ID: <CAKRe7JH0_6b=CSE5bGBWa8pNoQYBNcoyjt8iKYP-G8JJCmgY9Q@mail.gmail.com>
To: Ben Maurer <ben.maurer@gmail.com>
Cc: Adam Rice <ricea@chromium.org>, Mark Nottingham <mnot@mnot.net>, Amos Jeffries <squid3@treenet.co.nz>, HTTP Working Group <ietf-http-wg@w3.org>
Subject: Re: dont-revalidate Cache-Control header
Archived-At: <http://www.w3.org/mid/CAKRe7JH0_6b=CSE5bGBWa8pNoQYBNcoyjt8iKYP-G8JJCmgY9Q@mail.gmail.com>

On Fri, Jul 10, 2015 at 3:29 AM, Amos Jeffries <squid3@treenet.co.nz> wrote:

> > At Facebook, we use this method to serve our static resources. However
> > we've noticed that despite our nearly infinite expiration dates we see
> > 10-20% of requests (depending on browser) for static resource being
> > conditional revalidation. We believe this happens because UAs perform
> > revalidation of requests if a user refreshes the page. Our statistics
> > show that about 2% of navigations to FB are reloads -- however these
> > requests cause a disproportionate amount of traffic to our static
> > resources because they are never served by the user's cache.
>
> That tells me that 10-20% of your traffic is probably coming from a
> HTTP/1.1 proxy cache. Whether it reveals itself as a proxy or not.
>
> Speaking for Squid, we limit caching time at 1 year**. After which
> objects get revalidated before use. Expires header in HTTP/1.1 only
> means that objects are stale and must be revalidated before next use.
> Proxy with existing content does that with a synthesized revalidation
> request even if the client that triggered it did a plain GET. Thereafter
> the proxy has a new Expires value to use*** until that itself expires.


Amos, not sure I follow the proxy conclusion. If I'm reading this correctly,
it sounds like if I specify a 1 year+ max-age, then Squid will revalidate
the object for each request? If so, ouch. However, unless that gotcha
accounts for all of the extra revalidations, why would the proxy cause more
revalidations? Intuitively, shouldn't it reduce the number of revalidations
by collapsing the number of requests to the FB origin?

(also, as Ben noted, due to HTTPS, I doubt that's the culprit...)
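
One way to read the quoted Squid description is as a cap on the freshness lifetime, sketched below as a toy model. This is not Squid's code: the function names, the exact one-year constant, and the "age strictly greater than lifetime" rule are all assumptions made for illustration.

```python
# Toy model of a cache that caps freshness at one year (assumed behaviour,
# based only on the description quoted above; not Squid's implementation).
ONE_YEAR = 365 * 24 * 3600  # seconds; assumed value of the cap


def freshness_lifetime(max_age: int, cap: int = ONE_YEAR) -> int:
    """Freshness lifetime after applying the cache's own upper bound."""
    return min(max_age, cap)


def needs_revalidation(age: int, max_age: int, cap: int = ONE_YEAR) -> bool:
    """True once the stored response is older than the capped lifetime."""
    return age > freshness_lifetime(max_age, cap)


ten_years = 10 * ONE_YEAR
print(needs_revalidation(age=3600, max_age=ten_years))          # entry stored an hour ago
print(needs_revalidation(age=ONE_YEAR + 1, max_age=ten_years))  # entry older than the cap
```

Under this reading, a 1 year+ max-age would only trigger a revalidation once the stored copy is older than the cap, not on every request, since a successful revalidation resets the age.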


On Sat, Jul 11, 2015 at 10:58 AM, Ben Maurer <ben.maurer@gmail.com> wrote:

> One major issue with this solution is that it doesn't address situations
> where content is embedded in a third party site. Eg, if a user includes an
> API like Google Maps or the Facebook like button those APIs may load
> subresources that should fall under this stricter policy. This issue cuts
> both ways -- if 3rd party content on your site isn't prepared for these
> semantics you could break it.


Hmm, I think a markup solution would still work for the embed case:
- you provide a stable embed URL with relatively short TTL (for quick
updates)
- embedded resource is typically HTML (iframe) or script, that initiates
subresource fetches
-- said resource can add appropriate attributes/markup on its subresources
to trigger the mode we're discussing here

^^ I think that would work, no? Also, slight tangent: the Fetch API has a
notion of "only-if-cached" and "force-cache", albeit both of those are
skipped on "reload", see step 11:
https://fetch.spec.whatwg.org/#http-network-or-cache-fetch.

On Mon, Jul 13, 2015 at 2:57 AM, Ben Maurer <ben.maurer@gmail.com> wrote:

> We could also study this in the HTTP Archive -- if I took all resources
> that had a 30 day or greater max age and send their servers revalidation
> requests 1 week from today, what % of them return a 304 vs other responses.
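
The measurement proposed in the quote above could be sketched roughly as follows. The helper names and the sample status codes are made up for illustration, and the sketch only builds the conditional request headers and tallies the results; it does not send any requests.

```python
from collections import Counter


def conditional_headers(etag=None, last_modified=None):
    """Build revalidation headers from whatever validators were stored."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def tally(status_codes):
    """Fraction of revalidations answered with 304 vs anything else."""
    counts = Counter("304" if s == 304 else "other" for s in status_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total for key in ("304", "other")}


# Made-up sample of response codes, not real measurements.
print(conditional_headers(etag='"abc123"'))
print(tally([304, 304, 200, 304]))
```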


Not perfect, but I think it should offer a pretty good estimate:
http://bigqueri.es/t/how-many-resources-persist-across-a-months-period/607

- ~48% of resource requests end up requesting the same URL (after 30 days).
Of those...
-- ~84% fetch the same content (~40% of all requests and ~33% of total bytes)
-- ~16% fetch different content (~8% of all requests and ~9% of total bytes)
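
The "of all requests" figures follow from multiplying the rounded percentages above, as this quick check shows. The variable names are just for illustration; the byte shares cannot be derived this way, since they depend on resource sizes.

```python
# Rounded figures from the list above.
same_url = 0.48                          # requests hitting the same URL after 30 days
same_content_given_same_url = 0.84       # of those, same content
different_content_given_same_url = 0.16  # of those, different content

same_content_overall = same_url * same_content_given_same_url
different_content_overall = same_url * different_content_given_same_url

print(f"{same_content_overall:.1%}")       # 40.3%
print(f"{different_content_overall:.1%}")  # 7.7%
```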

ig