Re: dont-revalidate Cache-Control header

Ben Maurer <ben.maurer@gmail.com> Tue, 14 July 2015 10:06 UTC

In-Reply-To: <55A4D289.20304@treenet.co.nz>
References: <CABgOVaLHBb4zcgvO4NUUmAzUjNkocBGYY3atFA9iuYyoLaLQsA@mail.gmail.com> <559F9E90.4020801@treenet.co.nz> <CABgOVaLG6QZyjqk2AGYupShST_u3ty9BpxUcPX+_yMEC1hyHAQ@mail.gmail.com> <961203FE-7E54-410F-923E-71C04914CD2E@mnot.net> <CABgOVaJxntEyT0v4GvWm0Qi9jbUPEnzxJgg4KyQSM1T_gN1mjQ@mail.gmail.com> <CAHixhFoWXSGeVOqDPX5D51b7EjpyPCzKovEn6n6aoWxhYk_6Vg@mail.gmail.com> <CABgOVaLZotBxWrjaY=m987eGKr+avKqsHsiMtmgPgvPgkiJf2A@mail.gmail.com> <CAKRe7JH0_6b=CSE5bGBWa8pNoQYBNcoyjt8iKYP-G8JJCmgY9Q@mail.gmail.com> <55A4D289.20304@treenet.co.nz>
Date: Tue, 14 Jul 2015 11:03:10 +0100
Message-ID: <CABgOVaKer1SDK6P6nj58ccZcouzyfy6J39sfYAPxAnnPqT13RA@mail.gmail.com>
From: Ben Maurer <ben.maurer@gmail.com>
To: Amos Jeffries <squid3@treenet.co.nz>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Subject: Re: dont-revalidate Cache-Control header
Archived-At: <http://www.w3.org/mid/CABgOVaKer1SDK6P6nj58ccZcouzyfy6J39sfYAPxAnnPqT13RA@mail.gmail.com>

On Tue, Jul 14, 2015 at 10:12 AM, Amos Jeffries <squid3@treenet.co.nz>
wrote:
>
> > Amos, not sure I follow the proxy conclusion.. If I'm reading this correctly,
> > it sounds like if I specify a 1 year+ max-age, then Squid will revalidate
> > the object for each request?
>
> No, for the first year you get normal caching behaviour. Then from the
> 1yr mark you get one-ish revalidation, and the new copy is used, which
> resets the 1yr counter. So instead of getting things cached forever /
> 68yrs (possibly by error), you get at least one revalidation check per
> year per object.
>

We are seeing this behavior on objects that have existed for less than a
year, so I don't think we are triggering that behavior.
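
(For concreteness -- these are illustrative values, not our actual URLs --
the revalidations we are seeing are the usual conditional-request exchange:

    GET /static/app-3f2a9c.js HTTP/1.1
    Host: example.com
    If-None-Match: "3f2a9c"

    HTTP/1.1 304 Not Modified
    ETag: "3f2a9c"
    Cache-Control: max-age=31536000

i.e. the cache holds a copy that is still well within its max-age, yet it
goes back to the network and gets a 304.)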

> >> One major issue with this solution is that it doesn't address situations
> >> where content is embedded in a third party site. Eg, if a user includes
> >> an API like Google Maps or the Facebook like button those APIs may load
> >> subresources that should fall under this stricter policy. This issue cuts
> >> both ways -- if 3rd party content on your site isn't prepared for these
> >> semantics you could break it.
> >
> >
> > Hmm, I think a markup solution would still work for the embed case:
> > - you provide a stable embed URL with relatively short TTL (for quick
> >   updates)
> > - embedded resource is typically HTML (iframe) or script, that initiates
> >   subresource fetches
> > -- said resource can add appropriate attributes/markup on its subresources
> >    to trigger the mode we're discussing here
> >
> > ^^ I think that would work, no? Also, slight tangent.. Fetch API has notion
> > of "only-if-cached" and "force-cache", albeit both of those are skipped on
> > "reload", see step 11:
> > https://fetch.spec.whatwg.org/#http-network-or-cache-fetch.
>

One version of the "markup option" could be allowing us to explicitly
override the Fetch attributes of a subresource, e.g. setting the cache mode
to force-cache. This option does present the challenge of ensuring that we
address every possible subresource (think url()s in CSS). Being able to
control fetch options for subresources is extremely useful.
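
As a rough sketch of what I'm imagining (the markup attribute below is made
up purely for illustration; the fetch() call uses the Fetch API's existing
cache mode mentioned above):

    // Imperative: the Fetch API already exposes a cache mode.
    async function loadAsset(url: string): Promise<Response> {
      // 'force-cache' uses any cached copy, fresh or stale, and only
      // goes to the network if nothing is cached at all.
      return fetch(url, { cache: 'force-cache' });
    }

    // Declarative: a hypothetical attribute for markup-initiated
    // subresources, which is what the "markup option" would need:
    // <script src="/static/app-v42.js" fetch-cache="force-cache"></script>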

That said, this doesn't feel like a great thing for us to promote as a web
performance best practice. "If you use long cache lifetimes for your static
content, the dont-revalidate Cache-Control header will reduce the cost of
client reloads" seems like a piece of advice folks might take, as would
"Use the <meta> tag 'dont-reload-non-expired-resources' to avoid browsers
revalidating your content when the user presses reload". On the other hand,
"you should find every image, script, stylesheet, etc. and set the fetch
option on each to say force-cache" feels more tedious and unlikely to be
used.
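
For example, the first piece of advice would boil down to serving versioned
static assets with something like (using the directive name proposed in this
thread):

    Cache-Control: max-age=31536000, dont-revalidate

which is a one-line change for a site that already versions its static URLs.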

> > On Mon, Jul 13, 2015 at 2:57 AM, Ben Maurer wrote:
> >
> >> We could also study this in the HTTP Archive -- if I took all resources
> >> that had a 30 day or greater max-age and sent their servers revalidation
> >> requests 1 week from today, what % of them would return a 304 vs other
> >> responses?
> >
> >
> > Not perfect, but I think it should offer a pretty good estimate:
> > http://bigqueri.es/t/how-many-resources-persist-across-a-months-period/607
> >
> > - ~48% of resource requests end up requesting the same URL (after 30 days).
> >   Of those...
> > -- ~84% fetch the same content (~40% of all requests and ~33% of total bytes)
> > -- ~16% fetch different content (~8% of all requests and ~9% of total bytes)
>

Is it possible to limit this to only resources which claimed a > 30 day
max-age in the first request?

-b