Re: [http-state] Is this an omission in the parser rules of draft-ietf-httpstate-cookie-21?

Adam Barth <ietf@adambarth.com> Tue, 15 February 2011 04:42 UTC

In-Reply-To: <49225418-A1AF-4299-8C4F-2E608D34265D@gbiv.com>
References: <20110204184735.26023.qmail@mm01.prod.mesa1.secureserver.net> <AANLkTi=qBVkGwMHqAidtwP5_A8pPrF-Y9MV4jgYS5_QM@mail.gmail.com> <7384878F-C44A-42A4-9694-1BB1C18AA5E6@gbiv.com> <AANLkTinFq7bE_e3SSgdjuFvZ8hGn1xy4Hc1VKwc=vp1D@mail.gmail.com> <49225418-A1AF-4299-8C4F-2E608D34265D@gbiv.com>
From: Adam Barth <ietf@adambarth.com>
Date: Mon, 14 Feb 2011 20:42:13 -0800
Message-ID: <AANLkTimrJF3LFR4t4j=U2L33kFh+wf-R=sjjwexcmyPi@mail.gmail.com>
To: "Roy T. Fielding" <fielding@gbiv.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: http-state@ietf.org
Subject: Re: [http-state] Is this an omission in the parser rules of draft-ietf-httpstate-cookie-21?

On Mon, Feb 14, 2011 at 7:23 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
> On Feb 4, 2011, at 11:29 AM, Adam Barth wrote:
>> On Fri, Feb 4, 2011 at 11:24 AM, Roy T. Fielding <fielding@gbiv.com> wrote:
>>> On Feb 4, 2011, at 10:51 AM, Adam Barth wrote:
>>>> On Fri, Feb 4, 2011 at 10:47 AM, Remy Lebeau <remy@lebeausoftware.org> wrote:
>>>>> -------- Original Message --------
>>>>> Subject: Re: [http-state] Is this an omission in the parser rules of
>>>>> draft-ietf-httpstate-cookie-21?
>>>>> From: Adam Barth
>>>>> Date: Fri, February 04, 2011 10:19 am
>>>>> To: Remy Lebeau
>>>>> Cc: http-state@ietf.org
>>>>>
>>>>>> The draft gives user agents precise
>>>>>> instructions for how to parse all
>>>>>> manner of cookies, including cookies with
>>>>>> values that contain quote characters. That
>>>>>> information is contained in Section 5
>>>>>
>>>>> I have re-read Section 5 and I do not see its grammar or parsing rules
>>>>> accounting for quoted-string values at all. It only says to remove WSP
>>>>> characters surrounding extracted names and values, and quote characters
>>>>> are not part of the WSP definition. So what am I missing? Where exactly
>>>>> does it say how to unquote a quoted-string used in attribute values?
>>>>
>>>> Precisely.  It does not say to unquote a quoted-string because that's
>>>> not how cookies work.  The role of the quote character in cookies is
>>>> identical to the role of the "!" character.  That is, neither plays a
>>>> special role in the protocol.  Any representations to the contrary by
>>>> 2109 or any other document are fiction and have only caused pain and
>>>> misery in the world.
>>>
>>> That may be, but the grammar for server generation of set-cookie
>>> values is clearly wrong because use of DQUOTE in cookie values is
>>> common (roughly 10% of the values in my browser cookie store) and
>>> previously defined, even if we consider DQUOTE to be part of the
>>> value string.  Let's just change the generating grammar for value to
>>> match how cookies are actually parsed and only exclude characters
>>> that are known to cause failures.
>>
>> The grammar is not used for parsing.  Parsing is defined in Section 5,
>> not Section 4.
>
> Parsing for user agents is defined in section 5.  Servers have to
> parse cookies as well, and the grammar provided in section 4 is wrong
> for both generation and parsing.

I agree that the grammar depicted in Section 4 is not appropriate for
parsing.  That grammar is not for parsing.  It's for generation.
Perhaps we should add instructions for how servers should parse the
Cookie header?  That's absent from the current document.
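
To make the question concrete, here is a rough sketch of what server-side
parsing of the Cookie header could look like. It is only an illustration:
the draft defines no such algorithm, the function name is hypothetical, and
the choices to skip pieces without "=" and to trim only SP and HTAB are
assumptions. Quote characters are left in the value untouched, in line with
the point above that DQUOTE plays no special role.

```python
from typing import List, Tuple


def parse_cookie_header(header_value: str) -> List[Tuple[str, str]]:
    """Split a Cookie header value into (name, value) pairs.

    Hypothetical helper; not an algorithm taken from the draft.
    """
    pairs = []
    for piece in header_value.split(";"):
        # Split on the first "=" only, so "=" may appear inside the value.
        name, sep, value = piece.partition("=")
        if not sep:
            # Assumption: a piece with no "=" is ignored.
            continue
        # Trim surrounding SP / HTAB; do not strip or interpret DQUOTE.
        pairs.append((name.strip(" \t"), value.strip(" \t")))
    return pairs
```

With this sketch, a header value of `SID="31d4"; lang=en-US` yields the pairs
`("SID", '"31d4"')` and `("lang", "en-US")`; the quotes stay part of the value.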

As for generation, there's no exogenous notion of correctness.  We can
speak about usefulness, if you like, but correctness is endogenous to
this document.

> Why do you want to push forward a document with a known error in the
> grammar?  Just fix it.

I disagree that it's an "error."

> This is not a trivial detail.  As written, the specification says
> server developers SHOULD break sites like amazon.com.
>
>   cookie-value      = token / ""

Can you explain how the developers of site B using a restricted
grammar for generation will break amazon.com?  That seems entirely
nonsensical.

> The correct grammar is in the original Netscape cookie spec: "This
> string is a sequence of characters excluding semi-colon, comma and
> white space."  Which easily translates (conservatively) into
>
>   cookie-value      = %x21-2B / %x2D-3A / %x3C-7E / %x80-FF
>
> and it does not conflict with section 5.

I've updated the draft to recommend the grammar below for generating
cookie-values:

cookie-value      = token / *base64-character
base64-character  = ALPHA / DIGIT / "+" / "/" / "="
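
As a minimal sketch, the two candidate grammars can be compared with a pair
of membership checks. The function names are hypothetical. The sketch assumes
that "token" is the RFC 2616 token (visible US-ASCII minus separators) and
that the quoted Netscape-derived rule, which as written matches a single
octet, is meant to be repeated.

```python
# Sketch only: helper names are hypothetical and not taken from the draft.

# Assumption: "token" follows RFC 2616 (visible US-ASCII minus separators).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')
TOKEN_CHARS = {chr(c) for c in range(0x21, 0x7F)} - SEPARATORS
BASE64_CHARS = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
)


def matches_recommended(value: str) -> bool:
    """cookie-value = token / *base64-character"""
    is_token = len(value) > 0 and all(c in TOKEN_CHARS for c in value)
    is_base64 = all(c in BASE64_CHARS for c in value)  # "*" allows empty
    return is_token or is_base64


def matches_netscape_derived(value: bytes) -> bool:
    """Octets in %x21-2B / %x2D-3A / %x3C-7E / %x80-FF, assumed repeated."""
    return all(
        0x21 <= b <= 0x2B or 0x2D <= b <= 0x3A or 0x3C <= b <= 0x7E or b >= 0x80
        for b in value
    )
```

Under these assumptions a value such as `"abc"` (including the quotes) is
accepted by the Netscape-derived check and rejected by the recommended
grammar, which is exactly the case under dispute in this thread.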

Do you have a particular use case in mind for generating Set-Cookie
headers outside this grammar?

Adam