Re: [http-state] Comments on draft-ietf-httpstate-cookie-08

Adam Barth <ietf@adambarth.com> Sun, 30 May 2010 20:48 UTC

On Fri, May 28, 2010 at 6:05 PM, Yngve Nysaeter Pettersen
<yngve@opera.com> wrote:
> Here are some more comments about the httpstate draft.

Thanks for your feedback.  See below for detailed comments:

> * Sec 3 (overview)
>
>  - "In subsequent requests, the user agent returns a Cookie request header
> to the origin server"
>
> Should this say that a client may send a Cookie header (if it decides to
> accept the Cookie)? (I also prefer "send" instead of "returns",
> alternatively "return the stored cookie in a Cookie header")

I've changed this to "send."  As this is just the overview, I've left
it focusing on the common case, which is that the user agent does
actually end up sending the cookie.

>  - The section also mentions gateways, concerning changes to the Set-Cookie
> headers. Might it be useful to also specify that gateways must not insert
> Set-Cookie headers?

Why must gateways not insert Set-Cookie headers?  Granted, it might or
might not be friendly behavior, but it seems ok from a protocol
standpoint.

> * Sec 3.1
>
> - Should the expiration example mention max-age?

These are just examples.  I think it's fine to focus on the common case.

> - Should "a expiration date" be changed to "a validity period" or words to
> that effect?

Hum...  The Expires attribute doesn't contain a period.  It contains a date...

> * Sec 4.1.1
>
> - "Servers SHOULD NOT include two Set-Cookie header fields in the same
> response with the same cookie-name."
>
> Should a caution about "unpredictable results" be added to this?

The results are predictable, just nutty.
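
To illustrate what "predictable" means here, a minimal Python sketch.
It assumes the user agent keys its cookie store on (name, domain, path)
and processes Set-Cookie fields in the order they appear, so a later
cookie replaces an earlier one with the same key.  The function name
store_cookie and the dict-based store are hypothetical, chosen only for
this example:

```python
def store_cookie(store, name, value, domain, path):
    """Insert a cookie, replacing any stored cookie with the same key."""
    # Assumption: the key is (name, domain, path); a later cookie wins.
    store[(name, domain, path)] = value


store = {}
# Two Set-Cookie fields with the same cookie-name in one response,
# processed in the order they appear.
store_cookie(store, "SID", "first", "example.com", "/")
store_cookie(store, "SID", "second", "example.com", "/")
print(store)
```

Under those assumptions only the second value survives, which is
deterministic but probably not what the server intended.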

> * Sec 4.1.2.3
>
> -   For example, some user agents will reject Domain attributes of "com" or
> "co.uk".
>
> This seems to imply that there are agents that will accept a TLD as a valid
> domain (except ".local" from 2965, which is a special case). I'd rather have
> the section say that a domain attribute need to specify at least a second
> level domain.

There are some user agents that will accept a TLD as a valid domain.
I'd advise against building such a user agent, but not everyone will
take my advice.

> * sec 4.1.2.4
>
> - %XX escaping and path compares. Maybe add a reference to the URI compare
> definition in 2616?

Can you provide a test case that demonstrates that %-escaping is
relevant in this context?  My testing shows that user agents ignore
%-escaping in the Path attribute.  Now, some user agents canonicalize
%-escaping in URLs, but that's a matter best left to another working
group, IMHO.
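
As a rough sketch of what "ignoring %-escaping" looks like, here is a
Python comparison that treats both paths as raw strings and never
decodes them.  The matching rule itself (exact match, or prefix match
ending at a "/" boundary) is an assumption for illustration, not a
transcription of the draft, and path_matches is a hypothetical name:

```python
def path_matches(cookie_path, request_path):
    """Compare paths as raw strings, with no %-decoding on either side."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Assumed rule: the prefix must end at a "/" boundary.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


print(path_matches("/docs", "/docs/index.html"))
print(path_matches("/a%20b", "/a%20b/page"))
print(path_matches("/a%20b", "/a b/page"))
```

With this comparison "/a%20b" and "/a b" are simply different strings,
so the third call does not match.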

> * Sec 4.1.2.6
>
> - Maybe an idea to mention that HttpOnly is a relatively new attribute,
> which may not be implemented by legacy clients?

I believe HttpOnly is broadly implemented among user agents for which
you can detect whether it's implemented.  Note that for the vast
majority of user agents, you can't tell whether they implement
HttpOnly, so we might as well act as if they did implement it.

> * sec 7.1
>
>  - Should this section have a recommendation to servers about graceful
> handling if the client does not send cookies? (Yes, it is mentioned
> generally further up)

We could, but I don't think it would make much of a difference.

> * Double Quotes in the cookie-value field

I've replied to this section earlier.

Generally, this whole business of double-quote being a meaningful
character in cookie values was a fantasy invented by RFC 2109.  As
far as I can tell, IE has never treated double-quote differently from
any other character in cookie values, which means that the vast
majority of web sites will work fine if we don't either.

The cargo cult of double-quotes being meaningful in cookie attributes
has caused nothing but pain and misery in the world and should be
shot.

(Details below.)

> While not permitted by the Server side part of the spec, the draft seems to
> allow the double quote character <"> (DQUOTE) anywhere in a cookie value,
> whether they are balanced, or not.

Indeed.

> This is IMO of significant concern since it breaks with how parameter values
> are defined and used in both MIME, HTTP, and RFC 2965, as (token |
> quoted-string), and may cause interoperability problems with clients and
> servers that support RFC 2965 (Opera supports 2965, and AFAIK the Apache
> Tomcat server also supports it).

The syntax of the Set-Cookie header differs from the syntax of other
things related to HTTP.  That's just an unfortunate fact of life.  If
we were designing things to be beautiful from the start, we would
probably do something different, but that's not how things work today.

As for RFC 2965, we're not creating any problems that don't exist
today.  We're just writing down how the cookie protocol works.  If
Cookie2 had deployment problems, that's a topic for another time.

> The handling of cookie values that are not tokens, and contain double quotes
> inside the value in a form that does not match the 2616 definition of
> quoted-string is according to my quick testing inconsistent between a few
> browsers (IE8, FF 3.6 and Opera 10.53),

Indeed.  Cookie values are not quoted strings.

> and AFAICT there is likely to be
> problem with Python based server-environments, like Django 1.1, which is the
> only one I have tested.

I suspect that Django 1.1 works fine in IE5, IE6, IE7, and IE8, which
means it will work fine w.r.t. the spec's treatment of quotes in
cookie values as (in this respect) the spec's behavior matches every
version of IE I could dig up for testing.

> As for the server-side, when sending this header
>
>   Cookie:  name1="foo1; name2=bar"; name3=foo2"2; name4=bar"4; name5=foo
>
> to Django 1.1 or the Python 2.6 Cookie module, the result is the following
> set of cookies:
>
>   name1="foo1; name2=bar"
>   name3=foo2
>   name4=bar
>   name5=foo

That's only an issue if Django sends a Set-Cookie header that
generates that Cookie header.  If somewhere there's a Django server
that does send such a Set-Cookie header, the author of that server is
likely expecting the Cookie header to parse that way because that's
how it works for the majority of user agents.  If they aren't
expecting that parsing, they'll discover it as soon as they test their
site with any version of IE.

By contrast, if we adopt Opera's parsing of the Set-Cookie header,
then we're much more likely to cause interoperability problems for
sites that rely on the more common parsing.

> In other words, when using quotes inside a value with any syntax other than
> quoted-string (which permit internal quoting if backslash escaping is used),
> the result does not just depend on the client the server is sending it to,
> it also depends on the script engine and its cookie parser.

Indeed.  Such is life.
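
To make the parser dependence concrete, here is a minimal Python sketch
of a Cookie header splitter that gives double-quote no special meaning:
it separates pairs at every ";" and splits each pair at its first "=".
The name split_cookie_header and the policy of skipping chunks without
an "=" are assumptions made for this example; this is not Django's or
the Python Cookie module's code:

```python
def split_cookie_header(header_value):
    """Split a Cookie header value into (name, value) pairs.

    Double quotes get no special treatment: pairs are separated at
    every ';' and each pair is split at its first '='.
    """
    pairs = []
    for chunk in header_value.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        name, sep, value = chunk.partition("=")
        if not sep:
            continue  # hypothetical policy: skip chunks with no '='
        pairs.append((name.strip(), value.strip()))
    return pairs


header = 'name1="foo1; name2=bar"; name3=foo2"2; name4=bar"4; name5=foo'
for name, value in split_cookie_header(header):
    print(name, "=", value)
```

On the header quoted above, this splitter yields five cookies (name1
through name5, with the quote characters kept as ordinary data) where
the quote-aware parse reported above yields four.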

> IMO the specification should specifically comment on these issues, and
> preferably allow clients to discard cookies with quotes that does not match
> the quoted-string syntax, as well as specifically tell servers not to use
> double quotes, except as quoted-string (if they absolutely want to break the
> spec's requirement of only using tokens). Perhaps one way to do that is to
> specifically say that the result of using quotes in this fashion is
> undefined?

I replied to this paragraph earlier.  Essentially, everything you're
asking for is already provided by the current draft.

On Sat, May 29, 2010 at 5:50 AM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Yngve Nysaeter Pettersen wrote:
>>* Sec 4.1.2.3
>>
>>-   For example, some user agents will reject Domain attributes of "com"
>>or "co.uk".
>>
>>This seems to imply that there are agents that will accept a TLD as a
>>valid domain (except ".local" from 2965, which is a special case). I'd
>>rather have the section say that a domain attribute need to specify at
>>least a second level domain.
>
> As I understand it, there are a number of frameworks that will auto-
> matically use the server name, or failing that its address, as default
> value for the domain attribute. It does not strike me as a good idea to
> prohibit `Domain=localhost` for instance.

The spec just works the same way user agents work today.  If folks
with such servers are living happy lives today, they'll continue to
live happy lives.  If they're living unhappy lives, then we're not
doing them any additional harm.

On Sat, May 29, 2010 at 7:48 AM, Yngve N. Pettersen (Developer Opera
Software ASA) <yngve@opera.com> wrote:
> Come to think of it, is the single label (local intranet server) situation
> covered in the domain semantics?

Yes.  Feel free to read the spec for details, but what happens
(essentially) is that you can set a host-only cookie for intranet
hosts but you cannot set a cookie without the host-only flag.
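
Roughly, the host-only flag affects which requests a stored cookie is
sent with.  The Python sketch below is a simplification under stated
assumptions: the StoredCookie type, its field names, and domain_applies
are hypothetical, and the suffix test omits details a real
implementation needs (for example, special handling of IP addresses):

```python
from dataclasses import dataclass


@dataclass
class StoredCookie:
    name: str
    value: str
    domain: str      # lower-cased host or domain the cookie is stored for
    host_only: bool  # True: send only to exactly that host


def domain_applies(cookie, request_host):
    """Decide whether the cookie's domain covers the request host."""
    host = request_host.lower()
    if cookie.host_only:
        return host == cookie.domain
    # Simplified suffix match for cookies without the host-only flag.
    return host == cookie.domain or host.endswith("." + cookie.domain)


intranet = StoredCookie("SID", "abc", "intranet", host_only=True)
site_wide = StoredCookie("SID", "xyz", "example.com", host_only=False)

print(domain_applies(intranet, "intranet"))
print(domain_applies(intranet, "www.intranet"))
print(domain_applies(site_wide, "www.example.com"))
```

The host-only cookie for the single-label host goes back only to that
exact host, while the cookie without the flag also covers subdomains.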

On Sat, May 29, 2010 at 10:54 AM, Yngve N. Pettersen (Developer Opera
Software ASA) <yngve@opera.com> wrote:
> On Sat, 29 May 2010 18:51:41 +0200, Adam Barth <ietf@adambarth.com> wrote:
>> I'll reply to your full email in detail later, but another note:
>>
>> On Fri, May 28, 2010 at 6:05 PM, Yngve Nysaeter Pettersen
>> <yngve@opera.com> wrote:
>>>
>>> IMO the specification should specifically comment on these issues, and
>>> preferably allow clients to discard cookies with quotes that does not
>>> match
>>> the quoted-string syntax,
>>
>> User agents already have the option to discard any cookie for any
>> reason.  Adding this text to the spec would be redundant.
>
> I think it may still be useful to point out specific issues where such
> discarding policies are used automatically. My impression of that text is
> that it focus more on cleanup and user controlled filtering, rather than
> what policies the client may have implemented. I suspect that most
> administrators will read it that way too. For example, the draft
> specifically mentions public suffix as a reason why a client can discard a
> cookie.

Yes.  The draft suggests that people discard cookies in ways that are
productive.  I'd rather not encourage folks to discard cookies in ways
that don't interoperate, but, from a protocol point of view, we need
to allow them to do that.  Hence the current text in the spec.

You're welcome to build a user agent that discards cookies with
unbalanced double-quotes.  However, you'll find your user agent in the
minority.  Instead, I'd recommend you build a user agent that doesn't
treat double-quote as a special character and we'll all live happier
lives.

>>> Perhaps one way to do that is to
>>> specifically say that the result of using quotes in this fashion is
>>> undefined?
>>
>> Leaving things like this undefined hurts interoperability.
>
> The situation for this particular value syntax is already undefined, and
> while it is a corner case, it can have unanticipated effects, particularly
> in combination with different cookie ordering algorithms. Putting the server
> administrators on notice about the issue should hopefully increase
> interoperability because they can then avoid the problematic cases.

The behavior for this particular value syntax is indeed defined.  You
can find the definition in Section 5.  We're already advising server
administrators not to use double-quotes.  That's the most helpful
piece of advice I know, as it avoids the complexities here.  For
servers, there's no real reason to use double-quotes anyway.  It's
just a trail of tears.

> If we are going to document cookies as used on the Internet, then I think
> that includes pointing out the areas that should be avoided, particularly
> the ones where the results will be undefined. That should not just be done
> by how the syntax is defined, but also directly mentioning them.
>
> This information can be mentioned either as notes (or Notes) in the
> definitions themselves, or collected in a separate "Implementation
> pitfalls"-section.

I agree that we should discuss this issue in more depth in the
deviation description document.  However, I don't think we should fill
up this document with a bunch of informative text about legacy user
agents except in cases where understanding that information is of
critical importance (e.g., for security).

Adam