Re: dont-revalidate Cache-Control header

Ben Maurer <ben.maurer@gmail.com> Fri, 10 July 2015 17:18 UTC

In-Reply-To: <559F9E90.4020801@treenet.co.nz>
References: <CABgOVaLHBb4zcgvO4NUUmAzUjNkocBGYY3atFA9iuYyoLaLQsA@mail.gmail.com> <559F9E90.4020801@treenet.co.nz>
Date: Fri, 10 Jul 2015 10:14:26 -0700
Message-ID: <CABgOVaLG6QZyjqk2AGYupShST_u3ty9BpxUcPX+_yMEC1hyHAQ@mail.gmail.com>
From: Ben Maurer <ben.maurer@gmail.com>
To: Amos Jeffries <squid3@treenet.co.nz>
Cc: ietf-http-wg@w3.org
Subject: Re: dont-revalidate Cache-Control header
Archived-At: <http://www.w3.org/mid/CABgOVaLG6QZyjqk2AGYupShST_u3ty9BpxUcPX+_yMEC1hyHAQ@mail.gmail.com>

On Fri, Jul 10, 2015 at 3:29 AM, Amos Jeffries <squid3@treenet.co.nz> wrote:

> On 10/07/2015 10:25 a.m., Ben Maurer wrote:

> > In long caching, if a website has a
> > resource X.js which might change from time to time, rather than referencing
> > X.js and giving the endpoint a short expiration date, they reference
> > X-v1.js with a nearly infinite expiration date.
>
> I know where this started... An old tutorial written back in the late
> 1990's when HTTP/1.0 expiration was the only type of caching available
> and the first Browser War was in full swing. "Archaic" is the best word
> to describe it.
>
> It applies badly to HTTP/1.1 caching situations and can actually
> *reduce* cacheability of objects if applied indiscriminately. Also
> introducing the possibility of nasty human errors via the manual version
> control system.
>

Nobody is recommending people do this manually, just as we don't recommend
people write JavaScript without whitespace. A build process should take
resources such as JS and CSS, minify them, and create uniquely named versions.
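
To make the build step concrete, here is a minimal sketch of one way to derive a unique name from a file's content. The function name, the choice of SHA-256, and the 8-character digest length are all assumptions for illustration; this is not a description of any particular site's build system.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Return a content-derived name such as 'static/X-<digest>.js' for 'static/X.js'.

    Hypothetical helper: the name changes whenever the bytes change, so a
    far-future expiration on the resulting URL is safe.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    # Keep the directory and extension, insert the digest after the stem.
    return str(p.with_name(f"{p.stem}-{digest}{p.suffix}"))


old_name = versioned_name("static/X.js", b"console.log('v1');")
new_name = versioned_name("static/X.js", b"console.log('v2');")
print(old_name, new_name)
```

The same content always maps to the same name, and any change to the content yields a new URL, which is what lets the page reference the current version without the client having to ask.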


> There are a few edge cases where it applies well. But Best Practice it
> certainly is NOT in the current web environment.


This practice is widely recommended, e.g. in the Google guide from my
original email and in Steve Souders's book:
https://books.google.com/books?id=jRVlgNDOr60C&lpg=PA23&ots=pbw_DA5ce0&dq=cache%20expiration%20steve%20souders&pg=PA27#v=onepage&q=cache%20expiration%20steve%20souders&f=false


At Facebook we've found that this technique dramatically increases user
performance. A number of other large websites also deploy it.


> > When X.js changes, the
> > website uploads X-v2.js and changes any references to use the new version.
> > This has the benefit that the browser never needs to revalidate resources
> > and that it sees changes instantly. [1]
>
> These days we have HTTP/1.1 revalidation. Where the object ETag is
> derived from either stored mtime value, a hash of the object, or both
> the HTTP/1.1 software out there today can take care of version control
> easily and fast without any manual assistance needed from the web dev or
> duplicated copies of things hanging around.


ETag validation is exactly what we want to avoid. Validating an ETag still
requires a round trip from the client to the server. When we serve a
resource X.js on our page and the user has the current version of X.js in
their cache, we want them to use it without a round trip to the server.
Naming the file X-<version>.js accomplishes this: if the server references
X-v20.js, the client knows that v20 is the current preferred version, and
if it has cached v20 it can use it without an extra RTT.
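
The difference can be sketched with a toy model of a client cache. Everything here (the `CachedEntry` type, the `round_trips_needed` function, the example URLs and ETag values) is hypothetical and only counts round trips; it does not model real browser behavior.

```python
from dataclasses import dataclass


@dataclass
class CachedEntry:
    etag: str
    fresh: bool  # True while the entry is within its freshness lifetime


def round_trips_needed(cache: dict[str, CachedEntry], url: str) -> int:
    """Count the server round trips needed before the resource can be used."""
    entry = cache.get(url)
    if entry is None:
        return 1  # not cached: full fetch
    if entry.fresh:
        return 0  # fresh: served from cache with no network traffic
    return 1  # stale: conditional request (If-None-Match), even if it ends in a 304


cache = {
    "/X.js": CachedEntry(etag='"abc"', fresh=False),     # short expiration plus ETag
    "/X-v20.js": CachedEntry(etag='"abc"', fresh=True),  # far-future expiration
}

print(round_trips_needed(cache, "/X.js"))      # stale entry must be revalidated
print(round_trips_needed(cache, "/X-v20.js"))  # versioned entry is used directly
print(round_trips_needed(cache, "/X-v21.js"))  # new version is a plain cache miss
```

In this model the ETag approach pays a round trip on every use once the short lifetime has passed, while the versioned URL pays one only when the version actually changes.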



> > At Facebook, we use this method to serve our static resources. However
> > we've noticed that despite our nearly infinite expiration dates we see
> > 10-20% of requests (depending on browser) for static resources being
> > conditional revalidation. We believe this happens because UAs perform
> > revalidation of requests if a user refreshes the page. Our statistics show
> > that about 2% of navigations to FB are reloads -- however these requests
> > cause a disproportionate amount of traffic to our static resources because
> > they are never served by the user's cache.
>
> That tells me that 10-20% of your traffic is probably coming from an
> HTTP/1.1 proxy cache. Whether it reveals itself as a proxy or not.


Facebook is served exclusively over HTTPS, meaning that we see relatively
few proxies. We are fairly sure that we're not seeing this level of proxy
traffic and that the revalidations are from UA-triggered refreshes.

>
> I do think I know where you are coming from with this, and kind of
> agree. The UA whose refresh button goes straight to reload-everything
> instead of efficient revalidate-only-as-needed behaviour is broken IMHO.
> However that is a UA bug, not a protocol problem.


This is the crux of the problem -- as a website that carefully manages its
cache control headers, we want UAs to treat a reload as a normal navigation
and to respect normal caching rules. However the de facto behavior of UAs
is that a refresh causes revalidation of all resources on the page.

There are two possible solutions:

1) Ask UAs to change their behavior. This will silently change the behavior
of websites that do nothing. Maybe there's a website out there that says
"please hit the refresh button on your website to see the latest weather".
That site will no longer work if it depends on the revalidation behavior.
In my discussions with UA implementers, they seem unwilling (or at least
extremely hesitant) to take such a risk.

2) Create a new behavior that websites can opt in to. Ensure that UAs
implement it consistently. This has less risk of breaking existing sites,
though I understand the hesitance to have a header that says "no, *REALLY*
trust my expiration times". Perhaps the header name poorly advertises the
functionality that we wish to achieve. A better name/behavior might be
Cache-Control: content-addressed. content-addressed would signal that the
content at the current URL is a pure function of the URL itself, i.e., that
the content will never change. It would take priority over a max-age
directive and signal to the browser that the resource should be permanently
cached.