[http-state] parser rules of draft-ietf-httpstate-cookie-22

"Roy T. Fielding" <fielding@gbiv.com> Wed, 23 February 2011 22:06 UTC

To: Adam Barth <ietf@adambarth.com>
Cc: iesg@iesg.org, http-state <http-state@ietf.org>

On Feb 15, 2011, at 5:51 PM, Adam Barth wrote:
> On Tue, Feb 15, 2011 at 5:46 PM, Adam Barth <ietf@adambarth.com> wrote:
>> On Tue, Feb 15, 2011 at 3:55 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
>>> On Feb 15, 2011, at 2:05 PM, Adam Barth wrote:
>>>> On Tue, Feb 15, 2011 at 1:22 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
>>>>> On 15.02.2011 22:11, Adam Barth wrote:
>>>>>> ...
>>>>>> You really think we should recommend that servers use invalid UTF-8
>>>>>> sequences as cookie-values?  That sounds like bad advice...
>>>>>> ...
>>> 
>>> I am not recommending that they use invalid UTF-8 sequences.
>>> The ABNF only tells the implementer what octets can be used
>>> and what needs to be anticipated while parsing.  If I were
>>> designing a new Cookie protocol, I would exclude the high bits,
>>> but this is how the current Cookie protocol actually works in
>>> practice.
>> 
>> I feel like I'm just repeating myself.  The text you're complaining
>> about does not define how to parse a Set-Cookie header.
> 
> To be crystal clear, this is the only text in the document that refers
> to this figure:
> 
>          Servers SHOULD NOT send Set-Cookie headers
>          that fail to conform to the following grammar:

Which, as I said, is an incorrect requirement if this spec is
supposed to reflect how Cookie and Set-Cookie are used on
the Internet.  That should be more obvious in context:

Abstract:

   This document defines the HTTP Cookie and Set-Cookie header fields.
   These header fields can be used by HTTP servers to store state
   (called cookies) at HTTP user agents, letting the servers maintain a
   stateful session over the mostly stateless HTTP protocol.  Although
   cookies have many historical infelicities that degrade their security
   and privacy, the Cookie and Set-Cookie header fields are widely used
   on the Internet.

ToC

   4.  Server Requirements  . . . . . . . . . . . . . . . . . . . . . 12
     4.1.  Set-Cookie . . . . . . . . . . . . . . . . . . . . . . . . 12
       4.1.1.  Syntax . . . . . . . . . . . . . . . . . . . . . . . . 12
       4.1.2.  Semantics (Non-Normative)  . . . . . . . . . . . . . . 13
     4.2.  Cookie . . . . . . . . . . . . . . . . . . . . . . . . . . 16
       4.2.1.  Syntax . . . . . . . . . . . . . . . . . . . . . . . . 16
       4.2.2.  Semantics  . . . . . . . . . . . . . . . . . . . . . . 16
   5.  User Agent Requirements  . . . . . . . . . . . . . . . . . . . 17

4.1.1.  Syntax

   Informally, the Set-Cookie response header contains the header name
   "Set-Cookie" followed by a ":" and a cookie.  Each cookie begins with
   a name-value pair, followed by zero or more attribute-value pairs.
   Servers SHOULD NOT send Set-Cookie headers that fail to conform to
   the following grammar:


set-cookie-header = "Set-Cookie:" SP set-cookie-string
set-cookie-string = cookie-pair *( ";" SP cookie-av )
cookie-pair       = cookie-name "=" cookie-value
cookie-name       = token
cookie-value      = token / *base64-character
base64-character  = ALPHA / DIGIT / "+" / "/" / "="
token             = <token, defined in [RFC2616], Section 2.2>


[BTW, the grammar loses indentation because the comment on
max-age-av is too long]

Your argument appears to be that the cookie-value production
doesn't need to reflect actual usage because the specification
does not include a separate section on how servers should parse
received Cookie values.  In other words, the spec is incomplete.

My argument is that server developers (like me) already have
implementations that parse the Cookie value in much the same way
that user agents parse the Set-Cookie value, largely because
those values are set by third-party modules that the server
has no real control over.  BTW, server implementations like
libapreq <http://httpd.apache.org/apreq/> will only allow
embedded whitespace in a value if it is a quoted-string.

Therefore, I would like you to change the ABNF so that it
reflects the reality of (Set-)Cookie usage on the Internet,
for the same reason that you have insisted the algorithm
for user agent parsing reflects reality.  Changing the ABNF
to include base64 does not do that -- it is just another
fantasy production that differs from all prior specs of
the cookie algorithm.  Changing it to

 cookie-value      = *( %x21-2B / %x2D-3A / %x3C-7E / %x80-FF )

or just the minimum

 cookie-value      = *( %x21-2B / %x2D-3A / %x3C-7E )

returns the definition to the original Netscape spec (at
least in the first case), reflects how they are implemented
on the Internet, and eliminates this artificial distinction
between the server and user agent requirements.
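
As a rough illustration only (not taken from the draft or from any
server implementation), a byte-level check against the octet ranges
proposed above might look like the sketch below.  The function name
is_proposed_cookie_value and the allow_high_octets flag are
hypothetical, and the sketch assumes the production is meant to apply
to each octet of the value in turn.

```python
# Sketch only: octet ranges copied from the proposed cookie-value production.
# Assumption: every octet of the value must fall inside one of these ranges.
ALLOWED_RANGES = [(0x21, 0x2B), (0x2D, 0x3A), (0x3C, 0x7E), (0x80, 0xFF)]


def is_proposed_cookie_value(value: bytes, allow_high_octets: bool = True) -> bool:
    """Return True if every octet of value is in the proposed ranges.

    With allow_high_octets=False the %x80-FF range is dropped, which
    corresponds to the "minimum" variant of the production.
    """
    ranges = ALLOWED_RANGES if allow_high_octets else ALLOWED_RANGES[:-1]
    return all(any(lo <= octet <= hi for lo, hi in ranges) for octet in value)
```

The only visible US-ASCII octets this rejects are comma (%x2C) and
semicolon (%x3B); space, control characters, and DEL fall outside
every range as well.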

>>> I don't have any objection to a sentence that says servers
>>> SHOULD NOT send %x80-FF even though the grammar allows it.
>>> But I don't see any need for it either given the apparent
>>> interoperability shown by user agents.
>> 
>> That's essentially what the current document says.  The only
>> difference is that it also recommends that servers not generate
>> characters such as %x24, whose role in cookies has been muddied by RFC
>> 2965.

I don't see any restrictions on '$' (it is a valid token char).

The current document says

 cookie-value      = token / *base64-character

which unnecessarily excludes DQUOTE, "(", ")", "<", ">",
"@", ":", "\", "[", "]", "?", "{", and "}".  It excludes
Google's very common __utmz cookie because that cookie's format
uses both pipe "|" (only allowed in token) and equals "="
(only allowed in *base64-character).
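
To make the pipe-plus-equals problem concrete, here is a minimal
sketch of the draft's cookie-value production, using the token and
separators definitions from RFC 2616, Section 2.2.  The function name
matches_draft_cookie_value is hypothetical, and the sample value is
made up for illustration; it is not a real __utmz cookie, it only
combines "|" and "=" the way that format does.

```python
import string

# RFC 2616, Section 2.2: separators that may not appear in a token.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')
# token = 1*<any CHAR except CTLs or separators>; visible US-ASCII only.
TOKEN_CHARS = {chr(c) for c in range(0x21, 0x7F)} - SEPARATORS
# base64-character = ALPHA / DIGIT / "+" / "/" / "="
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def matches_draft_cookie_value(value: str) -> bool:
    """Check value against: cookie-value = token / *base64-character."""
    is_token = len(value) > 0 and all(ch in TOKEN_CHARS for ch in value)
    is_base64_run = all(ch in BASE64_CHARS for ch in value)
    return is_token or is_base64_run


# Made-up sample: "=" rules out token, "|" rules out *base64-character.
sample = "utmcsr=example|utmccn=demo"
rejected = not matches_draft_cookie_value(sample)
```

Under these assumptions the sample matches neither alternative, while
a value such as "$x" is accepted as a token.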

I do not see any reason for this working group to invent a
new grammar that has no implementation experience and differs
from every single prior specification of (Set-)Cookies.

The IESG should not approve such an error for publication,
since it will just require us to start this process over.

Cheers,

Roy T. Fielding                            <http://roy.gbiv.com/>
Principal Scientist, Adobe          <http://adobe.com/enterprise>