[http-state] some notes on cookies

Dan Winship <dan.winship@gmail.com> Wed, 05 August 2009 17:28 UTC


When I was implementing cookie handling for libsoup, I set up Firefox to
log all cookie-related activity for a few weeks and then went over the
logs afterward trying to learn things. Here are some things I found.


Netscape vs RFC 2109 vs RFC 2965
--------------------------------

I didn't see a single Set-Cookie2 header from any of the sites I visited
during that time.
(http://www.nextthing.org/archives/2005/08/07/fun-with-http-headers and
http://code.google.com/webstats/2005-12/httpheaders.html, both from late
2005, suggest Set-Cookie is [was?] about 100 times more common than
Set-Cookie2, but in my experience of "web sites I actually visit", it
was infinity times more common.)
http://www.mnot.net/blog/2006/10/27/cookie_fun says that among browsers,
only Opera supports Set-Cookie2. (That was pre-Chrome, but I'm assuming
Chrome does not support it either.)

A handful of sites return cookies with Version=1 (i.e., RFC 2109), but
this is generally a lie; they are no more likely to parse according to
the RFC 2109 grammar than cookies without Version=1. No cookie included
any RFC 2109 or 2965 attributes other than Version and Max-Age.

Lots of cookies include both Max-Age and Expires (making them violate
both the Netscape spec and RFC 2109). Very few had *only* Max-Age. (I am
not sure which browsers understand Max-Age, or what they do when both
Max-Age and Expires are present but don't match.)


Parsing
-------

Every cookie NAME I saw matched the Netscape spec rule ("a sequence of
characters excluding semi-colon, comma and white space [and ending
with an equals sign]"). (Actually, all but two of them matched the RFC
2109 rule, "token", as well.)

99% of the cookie VALUEs matched the Netscape spec rule ("a sequence of
characters excluding semi-colon, comma and white space"), but some had
commas or spaces (often both) in the VALUE.

Many many VALUEs did not match the RFC 2109 rule (token|quoted-string).
A handful of cookies do put quotes around the VALUE, but
http://codereview.chromium.org/17045 suggests that all browsers but
Firefox treat the quotes the same as they'd treat any other character.
Firefox's behavior is odd; it parses the VALUE as a quoted-string
(meaning it could in theory include a semicolon), but then includes the
quotes as part of the parsed value. For all the quoted cookie values I
saw, Firefox's behavior was equivalent to other browsers.
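
To make the two VALUE rules above concrete, here is a rough Python
sketch of how one might classify logged values. The function names are
mine, the "token" character set is taken from the RFC 2616 grammar, and
the quoted-string pattern is a simplification, so treat it as an
illustration rather than a reference parser.

```python
import re

# Netscape rule: any characters except semicolon, comma and white space.
NETSCAPE_VALUE = re.compile(r"[^;,\s]*")

# RFC 2616 "token": printable ASCII minus the separator characters.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# Simplified quoted-string: quotes around text with backslash escapes.
QUOTED_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


def matches_netscape(value):
    """True if the whole value satisfies the Netscape VALUE rule."""
    return NETSCAPE_VALUE.fullmatch(value) is not None


def matches_rfc2109(value):
    """True if the whole value is a token or a (simplified) quoted-string."""
    return (TOKEN.fullmatch(value) is not None
            or QUOTED_STRING.fullmatch(value) is not None)
```

For example, "abc=def" passes the Netscape rule but is not a token,
while a quoted value containing a space passes the RFC 2109 rule but
not the Netscape one.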

Other than HttpOnly, Version, and Max-Age, no cookie had any attributes
that weren't defined in the Netscape spec. (Note that neither the
Netscape spec nor RFC 2109 allows unrecognized attributes to appear,
which is something we should fix, since I believe all clients just
ignore unrecognized attributes.)

RFC 2109 allows attribute values to be quoted-strings, but I only saw
one cookie that did this, and it came from an ad server. Even if
browsers parsed it completely bogusly, users would never notice or
complain, so it provides no evidence about how browsers treat quoted
attribute values.

Some cookies included an erroneous trailing ";".
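
Putting the parsing observations together, a tolerant parser might look
something like the Python sketch below. This is only an illustration of
the behavior described above (quotes treated like any other character,
unrecognized attributes ignored, trailing ";" tolerated); the function
name and the set of "known" attributes are my own assumptions, not taken
from any browser or from libsoup.

```python
# Attributes seen in the logs; anything else is silently dropped.
KNOWN_ATTRIBUTES = {"expires", "max-age", "domain", "path",
                    "secure", "httponly", "version"}


def parse_set_cookie(header):
    """Leniently split a Set-Cookie header value into (name, value, attrs)."""
    parts = header.split(";")
    # Split only on the first "=" so the VALUE may itself contain "=".
    name, _, value = parts[0].partition("=")
    attrs = {}
    for part in parts[1:]:
        part = part.strip()
        if not part:
            # An empty piece comes from an erroneous trailing ";".
            continue
        key, _, attr_value = part.partition("=")
        key = key.strip().lower()
        if key in KNOWN_ATTRIBUTES:
            attrs[key] = attr_value.strip()
    return name.strip(), value.strip(), attrs
```

Because the split is on ";" only, commas and spaces inside the VALUE
(and the comma in an Expires date) survive, and a quoted VALUE keeps its
quotes. A quoted VALUE containing a semicolon would be cut short, which
matches the non-Firefox behavior described above.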


Expires
-------

The majority of cookies with an Expires attribute used the date syntax
specified by the most-recent version of the Netscape spec:

    Wdy, DD-Mon-YYYY HH:MM:SS GMT

(which is to say, an rfc1123-date with the spaces in the date part
replaced with hyphens). However, about 1/3 of cookies used some
other format instead, such as:

    Wdy, DD Mon YYYY HH:MM:SS GMT     [1]
    Wdy, DD-Mon-YY HH:MM:SS GMT       [2]
    Weekday, DD-Mon-YY HH:MM:SS GMT   [3]

[1] is an unmodified rfc1123-date and is the most popular alternative
format, especially when you include cookies set from JavaScript, where
Netscape had explicitly told web authors that they could use
Date.toGMTString when setting document.cookie.
(http://web.archive.org/web/20080208100914/http://wp.netscape.com/eng/mozilla/3.0/handbook/javascript/advtopic.htm)

[2] is the date format given in the original version of the Netscape
spec (which is also quoted in RFC 2109). [3] is the format used by the
*example* in the Netscape spec (both original and updated versions),
which doesn't actually match its own definition.

There were also various other not-quite-right formats, including every
other possible combination of 2- and 4-digit years, and of
zero-prefixed, space-prefixed, and non-prefixed single-digit days. The
absolute worst was "Wdy Mon DD HH:MM:SS YYYY GMT", which is an *invalid*
asctime-date (a real asctime-date doesn't specify "GMT"), making it
three degrees of separation away from being reasonable, but it's the
format used by "a certain well-known online bookseller", so browsers
really have no choice but to accept it.

So... "Clients SHOULD accept any string they can manage to extract any
remotely plausible date out of." Or something.
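
As a sketch of what "extract any remotely plausible date" could mean in
practice, the Python below pulls out a clock time, a month name, and two
numbers (day, then year) without caring about separators or the weekday.
The function name is hypothetical, the time zone is assumed to be GMT
whatever the string says, and the two-digit-year pivot (70-99 means
19xx, 00-69 means 20xx) is my own assumption.

```python
import re
from datetime import datetime, timezone

MONTHS = {name: number for number, name in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1)}


def parse_cookie_date(text):
    """Return an aware UTC datetime, or None if no plausible date is found."""
    clock = re.search(r"(\d{1,2}):(\d{2}):(\d{2})", text)
    if clock is None:
        return None
    hour, minute, second = (int(group) for group in clock.groups())
    # Drop the time so its digits are not mistaken for the day or year.
    rest = text[:clock.start()] + " " + text[clock.end():]
    month = None
    numbers = []
    for token in re.findall(r"[A-Za-z]+|\d+", rest):
        if token.isdigit():
            numbers.append(token)
        elif month is None and token[:3].lower() in MONTHS:
            month = MONTHS[token[:3].lower()]
    if month is None or len(numbers) < 2:
        return None
    # In every format listed above, the day appears before the year.
    day, year = int(numbers[0]), int(numbers[1])
    if len(numbers[1]) <= 2:
        year += 1900 if year >= 70 else 2000  # assumed pivot
    try:
        return datetime(year, month, day, hour, minute, second,
                        tzinfo=timezone.utc)
    except ValueError:
        # Out-of-range fields, e.g. day 32 or hour 25.
        return None
```

This accepts all of the formats listed above, including the invalid
asctime-style one, and returns None for anything it cannot make sense of.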

It may also be worth warning implementors explicitly about
overflow/underflow; some sites issue cookies with expiration dates
outside the range of 32-bit time_t.
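
One possible way to cope, sketched in Python: compute the expiry as
seconds since the epoch using arbitrary-precision integers, then clamp
it into the signed 32-bit range before handing it to anything that
stores a 32-bit time_t. Clamping (rather than rejecting the cookie) is
just one choice, and the names here are made up for the example.

```python
import calendar

TIME_T32_MIN = -(2 ** 31)
TIME_T32_MAX = 2 ** 31 - 1


def clamp_expiry(year, month, day, hour=0, minute=0, second=0):
    """Convert a UTC date to epoch seconds, clamped to 32-bit time_t."""
    # calendar.timegm treats the tuple as UTC; Python ints cannot overflow.
    seconds = calendar.timegm((year, month, day, hour, minute, second))
    return max(TIME_T32_MIN, min(TIME_T32_MAX, seconds))
```

With this, a cookie expiring in 2099 is treated as expiring at the last
representable second in January 2038 instead of wrapping around to a
date in the past.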


Path
----

The Netscape spec says that:

    the path "/foo" would match "/foobar"

but this is crazy and I don't think anyone actually implements it that way.
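
To illustrate the difference, here is a small Python sketch comparing
the literal prefix rule from the Netscape spec with a segment-aware
rule. The second function is my guess at the saner behavior, not
something I have verified against any particular browser, and both
function names are invented for the example.

```python
def netscape_path_match(cookie_path, request_path):
    """Literal reading of the Netscape spec: plain string prefix."""
    return request_path.startswith(cookie_path)


def segment_path_match(cookie_path, request_path):
    """Prefix match that only succeeds on a "/" boundary (assumed rule)."""
    if not request_path.startswith(cookie_path):
        return False
    if len(request_path) == len(cookie_path):
        return True
    # The prefix must end at a path separator, either its own or the next one.
    return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
```

Under the first rule, a cookie with Path=/foo is sent to /foobar; under
the second it is sent to /foo and /foo/bar but not to /foobar.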

RFC 2109 says that the Path specified in a Set-Cookie must be a prefix
of the Request-URI, but the original spec did not make this requirement,
and since some sites require the original behavior, browsers don't
implement the RFC 2109 rule; any web page on a given host is allowed to
set a cookie with any Path. Thus, Path cannot be used as a security
measure; it's solely an optimization, used to inform the browser that it
can save bandwidth by sending certain cookies only to the resources
that actually care about them.


-- Dan