[http-state] comments on -07

Dan Winship <dan.winship@gmail.com> Mon, 19 April 2010 20:07 UTC

> 1.  Introduction

should explain the relation to 2109 and 2965? And are we obsoleting
2965 too?

> The scope indicates the
> maximum amount of time the user agent should retain the cookie, to
> which servers the user agent should return the cookie, and for which
> protocols the cookie is applicable.

would read a little better if the 2nd and 3rd clauses were flipped to
be parallel with the first: "the maximum amount of time the user agent
should retain the cookie, the servers to which the user agent should
return the cookie, and the protocols for which the cookie is
applicable".

> The OWS (optional whitespace) rule

...is not explicitly defined either here or in RFC 5234.

It only ends up being used in two rules:

> set-cookie-header = "Set-Cookie:" OWS set-cookie-string OWS
> cookie-header = "Cookie:" OWS cookie-string OWS

And really, since set-cookie-header is only used for production, not
parsing, it should just be

     set-cookie-header = "Set-Cookie:" SP set-cookie-string

and then for cookie-header, we can just do

     cookie-header = "Cookie:" *WSP cookie-string

and put the notes about correct spacing in the text near there.
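A small Python sketch of the asymmetry being proposed: emit exactly one SP after the colon when generating, but accept any run of WSP when parsing. The helper names are made up for illustration; the case-insensitive match on the header name reflects the fact that quoted strings in RFC 5234 ABNF are case-insensitive.

```python
WSP = " \t"


def format_set_cookie_header(set_cookie_string: str) -> str:
    """Production: exactly one SP between "Set-Cookie:" and the value."""
    return "Set-Cookie: " + set_cookie_string


def parse_cookie_header(line: str) -> str:
    """Parsing: accept "Cookie:" (case-insensitively) followed by any
    amount of WSP, and return the cookie-string that follows."""
    prefix = "Cookie:"
    if line[:len(prefix)].lower() != prefix.lower():
        raise ValueError("not a Cookie header: %r" % line)
    return line[len(prefix):].lstrip(WSP)


print(format_set_cookie_header("SID=31d4"))        # Set-Cookie: SID=31d4
print(parse_cookie_header("Cookie:\t  SID=31d4"))  # SID=31d4
print(parse_cookie_header("cookie:SID=31d4"))      # SID=31d4
```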

> Servers MAY return a Set-Cookie response header with any response.

Even non-2xx responses? Have we tested that?

> User agents SHOULD send a Cookie request header, subject to other
> rules detailed below, with every request.

"subject to other rules detailed below" makes this rule basically a
no-op, since user agents are allowed to discard or ignore cookies for
any reason they want to.

Assuming we keep the sentence, should we say "User agents that support
cookies" or something like that? We said before "The terms user agent,
client, server, proxy, and origin server have the same meaning as in
the HTTP/1.1 specification", which implies that "User agents SHOULD
send a Cookie request header" applies to ALL user agents... At a
minimum, we should be clear here that servers can't assume that all
clients will support cookies.

> An origin server MAY include multiple Set-Cookie header fields in a
> single response.  Note that an intervening gateway MUST NOT fold
> multiple Set-Cookie header fields into a single header field.

"Gateways that want to be transparent to cookies MUST NOT fold
multiple Set-Cookie header fields into a single header field."

> If a server sends multiple responses containing Set-Cookie headers
> concurrently to the user agent (e.g., when communicating with the
> user agent over multiple sockets), these responses create a "race
> condition" that can lead to unpredictable behavior.

might be better somewhere else?

> == Server -> User Agent ==
> Set-Cookie: lang=; Expires=Sun, 06 Nov 1994 08:49:37 GMT
>
> == User Agent -> Server ==
> (No Cookie header)

you should either state explicitly that each of the examples is
independent of the others, or else redo them to be cumulative (eg,
still returning the SID cookie in this example). (I think cumulative
would be better.)

> Informally, the Set-Cookie response header comprises the token Set-
> Cookie:

Don't say "token" since "Set-Cookie:" is not an HTTP token.

    the header name "Set-Cookie", followed by a ":" and the cookie
    value

or something.

> Each cookie begins with a name-value-
> pair, followed by zero or more attribute-value pairs.

no hyphen between "name-value" and "pair"

> cookie-pair       = cookie-name "=" cookie-value
> cookie-name       = token
> cookie-value      = token

Did we come to a consensus on restricting cookie-value to "token"? It
seems a little limiting (though I guess the server can always just
encode the data...)

I had a thought at one point that we could do something like

    cookie-value         = token / cookie-quoted-string
    cookie-quoted-string = DQUOTE *(cqdtext / cookie-quoted-pair) DQUOTE
    cqdtext              = WSP / %x21 / %x23-3A / %x3C-5B / %x5D-7E
    cookie-quoted-pair   = "\" ( %x20-3A / %x3C-7E )

ie, you can use quoted strings as long as they don't contain
semicolons, such that clients-that-treat-quoted-strings-specially and
clients-that-just-treat-cookie-value-as-a-pile-of-bytes would treat
them exactly the same way.
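A rough Python checker for the grammar above, treating "token" as the RFC 2616 token rule (any CHAR except CTLs and separators). The function names are hypothetical; the point is that no value it accepts can contain a raw ";", so a naive split-on-";" client and a quoted-string-aware client find the same boundaries.

```python
# RFC 2616 separators; a token may contain none of these (nor CTLs).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s: str) -> bool:
    return len(s) > 0 and all(
        33 <= ord(c) <= 126 and c not in SEPARATORS for c in s)


def is_cookie_quoted_string(s: str) -> bool:
    """DQUOTE *(cqdtext / cookie-quoted-pair) DQUOTE; neither
    alternative can produce ";" (0x3B)."""
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        return False
    body, i = s[1:-1], 0
    while i < len(body):
        o = ord(body[i])
        if body[i] == "\\":
            # cookie-quoted-pair = "\" ( %x20-3A / %x3C-7E )
            if i + 1 >= len(body):
                return False  # the backslash would escape the closing quote
            n = ord(body[i + 1])
            if not (0x20 <= n <= 0x3A or 0x3C <= n <= 0x7E):
                return False
            i += 2
            continue
        # cqdtext = WSP / %x21 / %x23-3A / %x3C-5B / %x5D-7E
        if not (body[i] in " \t" or o == 0x21 or 0x23 <= o <= 0x3A
                or 0x3C <= o <= 0x5B or 0x5D <= o <= 0x7E):
            return False
        i += 1
    return True


def is_cookie_value(s: str) -> bool:
    return is_token(s) or is_cookie_quoted_string(s)
```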

> cookie-av         = expires-av / max-age-av / domain-av /
>                     path-av / secure-av / httponly-av

 / extension-av ?

> expires-av        = "Expires" "=" sane-cookie-date

it may be worth pointing out explicitly in the running text that
expires-av is NOT quoted.

> path-value        = <abs_path, as defined in RFC 2616>

except that it can't contain any ";"s

> To maximize compatibility with user agents, servers that wish to
> store non-ASCII data in a cookie-value SHOULD encode that data using
> a printable ASCII encoding, such as base64.

Base64 uses "/" and "=", which are not allowed in tokens, and thus not
allowed in cookie-value according to the current grammar. See, I told
you token was too restrictive. :-)
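To make the base64 point concrete, here is a quick Python check. The is_token helper is a hand-rolled version of the RFC 2616 token rule, not a library function.

```python
import base64

# RFC 2616 separators; a token may contain none of these (nor CTLs).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s: str) -> bool:
    return len(s) > 0 and all(
        33 <= ord(c) <= 126 and c not in SEPARATORS for c in s)


# Standard base64 of the UTF-8 bytes for "é" ends in "=" padding.
std = base64.b64encode("\u00e9".encode("utf-8")).decode("ascii")           # "w6k="
# The bytes 0xFF 0xFE exercise the "/" character as well.
slashy = base64.b64encode(b"\xff\xfe").decode("ascii")                     # "//4="
# URL-safe alphabet ("-" and "_") with the padding stripped stays a token.
safe = base64.urlsafe_b64encode(b"\xff\xfe").rstrip(b"=").decode("ascii")  # "__4"

for label, value in [("std", std), ("slashy", slashy), ("safe", safe)]:
    print(label, value, is_token(value))
```

The last case shows one possible workaround (URL-safe alphabet, no padding), offered only as an illustration; it isn't something the draft currently says.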

> NOTE: Some user agents represent dates using 32-bit integers.  Some
> of these user agents might contain bugs that cause them process dates
> after the year 2038 incorrectly.  Servers wishing to interoperate
> with these user agents might wish to use dates before 2038.

Such user agents might also wrap around on the other side and treat
dates before 1902 as being in the future, so we might want to suggest
a lower limit on pre-expired dates too (1970?). Maybe both of these
belong in 4.1.2.1 though?

(Also, it's not really "32-bit integers", it's specifically "32-bit
UNIX time_t values".)
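A Python sketch of the wraparound on both ends. Python integers don't overflow, so the truncation to a signed 32-bit time_t is simulated by the hypothetical as_int32 helper. A signed 32-bit time_t covers roughly 1901-12-13 through 2038-01-19.

```python
import calendar

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def epoch_seconds(year: int, month: int, day: int) -> int:
    """Seconds since 1970-01-01T00:00:00Z for midnight UTC on the given day."""
    return calendar.timegm((year, month, day, 0, 0, 0, 0, 0, 0))


def as_int32(t: int) -> int:
    """What a signed 32-bit time_t would hold after silently truncating t."""
    return (t - INT32_MIN) % 2**32 + INT32_MIN


for year in (1900, 1970, 2030, 2040):
    t = epoch_seconds(year, 1, 1)
    print(year, t, as_int32(t))
```

1900-01-01 wraps to a positive value (a date in 2036, so a "pre-expired" cookie would live on), and 2040-01-01 wraps to a negative one (a date in 1903, so a long-lived cookie would be treated as already expired).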

> NOTE: The syntax above allows whitespace around the U+003D ("=")
> characters.

Actually, it doesn't, since we're using RFC 5234 ABNF, not RFC 2616
crazy implied LWS ABNF.

That means you also need an "SP" after the ";" in set-cookie-string.

> and it expires at the end of the
> current session (as defined by the user agent).

We might want more discussion about that (maybe in the security
considerations section?) In particular wrt mobile devices.

>    WARNING: Not all user agents support the Max-Age attribute.  User
>    agents that do not support the Max-Age attribute will retain the
>    cookie for the current session only.

Well, unless there's an Expires attribute. Move this after the "both
Max-Age and Expires" paragraph, and change the second sentence to:
"User agents that do not support the Max-Age attribute will determine
the cookie lifetime based solely on the Expires attribute".

> 4.1.2.3.  The Domain Attribute

Since the 4.1.1 grammar limits Domain to "token", we should probably
point out that this uses the IDNA A-label form.

> is ignored.)  If the server omits the Domain attribute, the user
> agent will return the cookie only to the origin server.

OK, I was annoyed that the Expires section didn't say anything about
session cookies, but then decided, "well, I guess he mentioned it in
4.1.2 so he doesn't have to mention it here". But you mentioned
origin-server-only cookies there too, and mention them again here. So
maybe it would be nice to mention session cookies a second time too.
(Maybe after the "both Max-Age and Expires" paragraph, have a "neither
Max-Age nor Expires" paragraph.)

>    host name.  For example, if example.com returns a Set-Cookie
>    header without a Domain attribute, these user agents will
>    erroneously send the cookie to www.example.com.

add "as well" to the end?

> The Path attribute limits the scope of the cookie to a set of paths.

the most common usage of Path is to *un*limit the scope ("Path=/").
How about:

     The scope of the cookie also includes a set of paths. The Path
     attribute can be used to set this explicitly; if the server omits
     the Path attribute, the user agent will use the directory of the
     Request-URI's path component as the default value.

     The user agent will include the cookie in an HTTP request only if
     the path portion of the Request-URI matches (or is a subdirectory
     of) the cookie's Path attribute, where the U+002F ("/") character
     is interpreted as a directory separator.
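The matching rule in the suggested text could be sketched in Python roughly like this (the function name is hypothetical). Treating "/" as a directory separator means "/foo" covers "/foo/bar" but not "/foobar".

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """True if request_path is cookie_path or lies "below" it."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Either the cookie path already ends in "/", or the next
        # character of the request path begins a new path segment.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False
```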

> 2.  The "same-origin" policy implemented by many user agents does not
>     isolate different paths within an origin.  For example, /foo/
>     bar.html can read cookies with a Path attribute of "/baz" because
>     they are within the "same origin".

This is talking about javascript, right? It's pretty confusing if you
don't know that. ("can read cookies" how?)

     2.  The "same-origin" policy implemented by web browsers does not
         isolate different paths within an origin. For example, a
         script running on a page downloaded from /foo/bar.html can
         read cookies from the browser's cookie store with a Path
         attribute of "/baz", because they are within the "same
         origin".

or maybe just forward-ref the whole thing to the security
considerations section.

> 4.1.2.6.  The HttpOnly Attribute
>
> The HttpOnly attribute limits the scope of the cookie to HTTP
> requests.  In particular, the attribute instructs the user agent to
> omit the cookie when providing access to its cookie store via "non-
> HTTP" APIs (as defined by the user agent).

That makes it almost uselessly vague. I'd change "(as defined by the
user agent)" to "(such as JavaScript's document.cookie API)". And we
should say somewhere *why* you'd want to do that? (Maybe in 8.5?)

> 4.2.1.  Syntax
>
> The user agent returns stored cookies to the origin server in the
> Cookie header.  If the server conforms to the requirements in this
> section, the requirements in the next section will cause the user
> agent to return a Cookie header that conforms to the following
> grammar:

"If the server conforms to the requirements in section 4.1, then the
requirements in section 5 will cause the user agent to..."

> 4.2.2.  Semantics

is this, like 4.1.2, non-normative?

> In particular,
> if the Cookie header contains two cookies with the same name, servers
> SHOULD NOT rely upon the order in which these cookies appear in the
> header.

maybe "In particular, if a user agent has two cookies with the same
name (but different Path or Domain attributes) for a given server, the
server SHOULD NOT assume that the two cookies will appear in the
header in a particular (or even consistent) order."

> The user agent MUST use the following algorithm to *parse a cookie-
> date*:

the asterisks there seem superfluous?

also, "MUST"? Is anyone actually planning to rewrite their cookie date
parser to be exactly equivalent to this grammar? Mm... I think I'll
split "conformance" discussion into a separate mail.

> mystery         = <anything except a delimiter>

not valid ABNF, you need to specify the byte-ranges

It would be nice to add a comment to "delimiter" explaining what
characters it includes too.

> 3.  Abort these steps and *fail to parse* if

asterisks again

>     *  the year-value is less than 1601 or greater than 30827,

30827???

It's not clear to me that there's any good reason for
allowing/requiring 5DIGIT years.

> 4.  If the year-value is greater than 68 and less than 100, increment
>     the year-value by 1900.
>
> 5.  If the year-value is greater than or equal to 0 and less than 69,
>     increment the year-value by 2000.

You need to move these steps to before step 3, since you've already
aborted if the year was less than 1601. Probably just make them
sub-steps of the "date-token matches the year production" step.
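A Python sketch of just the year handling, in both orders, to show the problem. Only the year-range condition of step 3 is modeled here; the other abort conditions are omitted, and None stands in for "fail to parse".

```python
from typing import Optional


def year_as_drafted(year_value: int) -> Optional[int]:
    """Steps 3-5 in the draft's order: the range check runs first."""
    if year_value < 1601 or year_value > 30827:
        return None                  # step 3: fail to parse
    if 68 < year_value < 100:
        year_value += 1900           # step 4: can never fire here
    elif 0 <= year_value < 69:
        year_value += 2000           # step 5: can never fire here
    return year_value


def year_reordered(year_value: int) -> Optional[int]:
    """Same steps, with the two-digit fixups moved before the range check."""
    if 68 < year_value < 100:
        year_value += 1900
    elif 0 <= year_value < 69:
        year_value += 2000
    if year_value < 1601 or year_value > 30827:
        return None
    return year_value
```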

> A *canonicalized* host-name is the host-name converted to lower case
> and expressed in punycode [RFC3492].

The idnabis drafts ought to be real RFCs soon... Per their rules,
"punycode" refers only to the *algorithm* defined by 3492, and you
want to refer to "A-labels". Or probably something like "converted to
the A-label form as used for DNS lookup according to RFC xxxx"
(whatever draft-idnabis-protocol becomes).

It's weird that it talks about lowercasing and punycoding the
request-host, but not the cookie-domain. Of course, that turns out to
be because we "already" lowercased the cookie-domain, even though we
won't learn about that until later on in the document...

> 5.1.3.  Paths
>
> The user agent MUST use the following algorithm to compute the
> *default-path* of a cookie:

It is confusing that the algorithm for determining cookie-domain from
the Domain attribute is in 5.2, but the algorithm for determining
cookie-path from Path is (for the most part) in 5.1. This ties in to
the comment above about request-host vs cookie-domain
canonicalization. I'm not sure having the algorithms split out from
the processing really makes sense. Or alternatively, it might make
more sense if you swapped 5.1 and 5.2, so the high-level description
comes first, and the details afterward.

> 1.  Let uri-path be the path portion of the Request-URI.

More precision here might be good, especially since Request-URI as
defined in RFC 2616 is known to be wrong. (The "abs_path" case is
supposed to be "abs_path [ "?" query ]".)

     1.  Let uri-path be the abs_path portion of the Request-URI.
         That is, if the Request-URI contains just a path (and
         optional query string), then the uri-path is that path
         (without the "?" or query), and if the Request-URI contains a
         full absoluteURI, the uri-path is the abs_path component of
         that URI.

What about the "OPTIONS * HTTP/1.1" and "CONNECT example.com:443
HTTP/1.1" cases? Do clients send cookies with those as though path was
"/"?

> 2.  If the first character of the uri-path is not a U+002F ("/")
>     character, output U+002F ("/") and skip the remaining steps.

As defined above, uri-path must either start with "/" or be empty, so
this is equivalent to "If the uri-path is the empty string, ..."
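A Python sketch of default-path using the abs_path reading suggested above. Only steps 1 and 2 are quoted in this message; the remaining steps here (a path with a single "/" gives "/", otherwise everything up to but not including the right-most "/") are an assumption about what follows. Treating "*" (from OPTIONS) as "/" is likewise just one possible answer to the question above.

```python
from urllib.parse import urlsplit


def default_path(request_uri: str) -> str:
    # Step 1: the abs_path only, without "?query", for both the
    # "/a/b?x" and "http://host/a/b?x" forms of Request-URI.
    uri_path = urlsplit(request_uri).path
    # Step 2: empty (or not starting with "/") defaults to "/".
    if not uri_path.startswith("/"):
        return "/"
    # Assumed remaining steps: a single "/" gives "/"; otherwise
    # everything up to, but not including, the right-most "/".
    if uri_path.count("/") == 1:
        return "/"
    return uri_path[:uri_path.rindex("/")]
```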

> the user agent *receives a set-cookie-string* consisting of the value

more gratuitous asterisking. I didn't comment on some of the other
ones before, but this one, like the others I mentioned, is never
referred to again. I think in general it's probably better to just
refer to "parsing a domain attribute according to section 5.blah"
rather than having magic defined terms.

> 1.  If the set-cookie-string is empty or consists entirely of WSP
>     characters, ignore the set-cookie-string entirely.

this is unnecessary, since it's covered by step 3. OTOH, you should say
explicitly that any leading whitespace is not part of the
set-cookie-string, otherwise "  =foo" would result in setting a
nameless cookie (via step 6).
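A Python sketch of the name/value split that trims WSP around both name and value and then rejects an empty name, so "  =foo" is ignored rather than creating a nameless cookie. The function name is hypothetical, and returning the unparsed-attributes with its leading ";" is an assumption meant to match the later "Consume the ';'" step.

```python
from typing import Optional, Tuple

WSP = " \t"


def parse_name_value(set_cookie_string: str) -> Optional[Tuple[str, str, str]]:
    """Return (name, value, unparsed_attributes), or None to ignore."""
    pair, sep, rest = set_cookie_string.partition(";")
    if "=" not in pair:
        return None  # also covers "" and all-WSP input, making step 1 redundant
    name, _, value = pair.partition("=")
    name, value = name.strip(WSP), value.strip(WSP)
    if not name:
        return None  # e.g. "  =foo"
    return name, value, sep + rest
```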

> 7.  The cookie-name is the name string, and the cookie-value is the
>     value string.

Somewhere in here we need a note mentioning that the client can reject
cookies that are too large.

>        Consume the characters of the unparsed-attributes up to, but
...
>     Let the cookie-av string be the characters consumed in this step.

"consume" tends to imply "throw away" to me. (Especially since you
used it in exactly that sense in the previous step, "Consume the
';'".) It might be that the easiest fix is to replace the earlier use
of "consume" with something else...

> 6.  Process the attribute-name and attribute-value according to the
>     requirements in the following subsections.

Need to say somewhere that unrecognized attribute-names are ignored.
Sort of related, if the unparsed-attributes ends with a trailing ";",
then you'll end up parsing out an attribute with an empty name and
value.
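A Python sketch of an attribute loop that ignores both unrecognized names and the empty attribute produced by a trailing ";". The set of known names mirrors the cookie-av rule quoted earlier; the function itself is hypothetical.

```python
from typing import List, Tuple

KNOWN = {"expires", "max-age", "domain", "path", "secure", "httponly"}


def parse_attributes(unparsed_attributes: str) -> List[Tuple[str, str]]:
    attrs = []
    for cookie_av in unparsed_attributes.split(";"):
        name, _, value = cookie_av.partition("=")
        name, value = name.strip(" \t"), value.strip(" \t")
        if not name:
            continue  # empty piece, e.g. from a trailing ";"
        if name.lower() not in KNOWN:
            continue  # unrecognized attribute-names are ignored
        attrs.append((name.lower(), value))
    return attrs
```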

> If delta-seconds is less than or equal to zero (0), let expiry-time
> be the earliest representable date and time.  Otherwise, let the
> expiry-time be the current date and time plus delta-seconds seconds.

To be consistent with Expires parsing, shouldn't it be the Date header
plus delta-seconds seconds?
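A Python sketch of the Max-Age computation with the base time as a parameter, to show what the choice changes. The earliest representable date is approximated by datetime.min, and the timestamps are made up. With a client clock five minutes ahead of the server, the two readings differ by exactly that skew, while the Date-based one lands where an equivalent Expires value would.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for "the earliest representable date and time".
EARLIEST = datetime.min.replace(tzinfo=timezone.utc)


def max_age_expiry(delta_seconds: int, base: datetime) -> datetime:
    """base is the current time as drafted, or the response's Date
    header under the suggestion above."""
    if delta_seconds <= 0:
        return EARLIEST
    return base + timedelta(seconds=delta_seconds)


date_header = datetime(2010, 4, 19, 20, 0, 0, tzinfo=timezone.utc)
client_now = datetime(2010, 4, 19, 20, 5, 0, tzinfo=timezone.utc)  # 5 min ahead
print(max_age_expiry(3600, date_header))  # 2010-04-19 21:00:00+00:00
print(max_age_expiry(3600, client_now))   # 2010-04-19 21:05:00+00:00
```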

> Convert the cookie-domain to lower case.

and IDNA A-label? I guess you're assuming people aren't going to try
to use non-ASCII Domain attributes, but given that we warn about
non-ASCII cookie-values, is that reasonable?

> 1.   A user agent MAY ignore a received cookie in its entirety.  For
>      example, the user agent might wish to block receiving cookies
>      from "third-party" responses.

But we've already said that they MUST have parsed it... So they MUST
parse it but then MAY ignore it?

> 5.   If the user agent is configured to use a "public suffix" list
>      and the domain-attribute is a public suffix:

Maybe add some wiggle room here for draft-pettersen-subtld-structure
or other future improvements and just say "If the user agent is able
to determine that the domain-attribute is a public suffix" (keeping
the "NOTE" pointing out that the best current solution is the public
suffix list).

>         If the domain-attribute is identical to the canonicalized
>         Request-URI's host:

you say "canonicalized host of the Request-URI" below, which is
clearer that it's the host being canonicalized, not the URI.

> 10.  If the cookie's name and value are both empty, abort these steps
>      and ignore the cookie entirely.

That's an artifact from an earlier draft. We've already thrown out
nameless cookies.

> 11.  If the cookie was received from a non-HTTP API and the cookie's
>      http-only-flag is set, abort these steps and ignore the cookie
>      entirely.

There is no defined way to *receive[s] a cookie* other than via HTTP.
And if we're going to define semantics for processing document.cookie
then we need to allow nameless cookies, right? (Or did we decide we
don't even in that case? I forget.)

>      2.  If the newly created cookie was received from an non-HTTP
>          API and the old-cookie's host-only-flag is set, abort these
>          steps and ignore the newly created cookie entirely.

s/host-only/http-only/ ?

> 13.  Insert the newly created cookie into the cookie store.

should probably say "if it doesn't have an expiry-time in the past"?

> The user agent MUST evict a cookie from the cookie store if, at any
> time, a cookie exists in the cookie store with an expiry date in the
> past.

That reads like "if any cookie is expired, then you have to evict
*SOME* cookie". Hm... reading on, it appears that that's the intended
reading. That's bizarre. I think you should just say that cookies are
evicted when they expire, and also, that when inserting new cookies
into the store, old cookies can be evicted if adding the new cookie
would result in too many per-host or total cookies.
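A Python sketch of the eviction rules suggested above: expired cookies are dropped, and on insertion the least recently accessed cookies are evicted only when a per-host or total limit is exceeded. The class names, the LRU policy, and the limits are all hypothetical; the draft leaves the numbers to implementations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cookie:
    name: str
    host: str
    expiry_time: float        # seconds since the epoch
    last_access_time: float   # used to choose eviction victims (LRU)


class CookieStore:
    def __init__(self, per_host_limit: int = 50, total_limit: int = 3000):
        self.per_host_limit = per_host_limit
        self.total_limit = total_limit
        self.cookies: List[Cookie] = []

    def evict_expired(self, now: float) -> None:
        """Cookies are evicted when they expire; nothing else is implied."""
        self.cookies = [c for c in self.cookies if c.expiry_time > now]

    def _evict_lru(self, candidates: List[Cookie], limit: int) -> None:
        """Drop the least recently accessed candidates until within limit."""
        excess = len(candidates) - limit
        if excess <= 0:
            return
        victims = sorted(candidates, key=lambda c: c.last_access_time)[:excess]
        victim_ids = {id(v) for v in victims}
        self.cookies = [c for c in self.cookies if id(c) not in victim_ids]

    def insert(self, cookie: Cookie, now: float) -> None:
        self.evict_expired(now)
        if cookie.expiry_time <= now:
            return  # already expired: never enters the store
        self.cookies.append(cookie)
        same_host = [c for c in self.cookies if c.host == cookie.host]
        self._evict_lru(same_host, self.per_host_limit)
        self._evict_lru(list(self.cookies), self.total_limit)
```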

>     *  The Request-URI's path patch-matches cookie's path.

s/patch/path/

>     *  If the cookie's http-only-flag is true, then exclude the
>        cookie unless the cookie-string is being generated for an
>        "HTTP" API (as defined by the user agent).

We're talking about generating a Cookie header for an HTTP request.
It's an HTTP API.

>     NOTE: Not all user agents sort the cookie-list in this order, but
>     this order reflects common practice when this document was
>     written.  The specific ordering might not be optimal in every
>     metric, but using the consensus ordering is a relatively low cost
>     way to improve interoperability between user agents.

"not be optimal" isn't the point. The debate wasn't about a "best"
ordering, it was about whether there should be an ordering at all.

       NOTE: Not all user agents sort the cookie-list in this order,
       but this order reflects common practice when this document was
       written, and historically, there have occasionally been servers
       that (erroneously) depended on this order.
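For reference, a Python sketch of building the cookie-string in the consensus order. The NOTE doesn't restate the order, so the sketch assumes it is longer paths first, then earlier creation-time first among equal-length paths; the dataclass and function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredCookie:
    name: str
    value: str
    path: str
    creation_time: float


def cookie_string(cookies: List[StoredCookie]) -> str:
    # Assumed order: longer paths first; ties broken by earlier creation-time.
    ordered = sorted(cookies, key=lambda c: (-len(c.path), c.creation_time))
    return "; ".join(f"{c.name}={c.value}" for c in ordered)
```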

> NOTE: Despite its name, the cookie-string is actually a sequence of
> octets, not a sequence of characters.  To convert the cookie-string
> into a sequence of characters (e.g., for presentation to the user),
> the user agent SHOULD use the UTF-8 character encoding [RFC3629].

That's a little confusing... I'd say something more like "When
presenting the cookie-string to the user, user agents SHOULD assume
that the string is UTF-8". But also, it doesn't belong here anyway;
user agents aren't generally going to present cookie-strings, they're
going to present cookie-names and cookie-values. Probably this should
go in 7.2.

> Servers SHOULD use as few and as small cookies as possible to avoid
> reaching these implementation limits and to avoid network latency due
> to the Cookie header being included in every request.

s/avoid network latency/minimize network bandwidth/.

> Servers should gracefully degrade if the user agent fails to return

s/should/SHOULD/ ?

> One reason the Cookie and Set-Cookie headers uses such esoteric
> syntax is because many platforms (both in servers and user agents)
> provide a string-based application programmer interface (API) to
> cookies, requiring application-layer programmers to generate and
> parse the syntax used by the Cookie and Set-Cookie headers.

Needs to end with something like ", which many programmers have done
incorrectly, resulting in interoperability problems."

> Cookies have a number of security and privacy pitfalls.

add something like "The following are merely a brief overview." ?

(also, should it still say "privacy" here, since this is the "Security
Considerations" section and you already talked about privacy in
section 7?)

> Servers SHOULD encrypt and sign the contents of cookies when

really? I mean, yeah, they should, but... no one's going to. And
especially if the cookie content is just a nonce, a
signed-and-encrypted nonce is not much different from a plain nonce.

Maybe just "SHOULD NOT transmit sensitive information unencrypted in
cookies".

> Cookies do not always provide isolation by path.  Although the
> network-level protocol does not send cookie stored for one path to

s/cookie/cookies/

> 8.6.  Weak Integrity

the breakdown of sections here still seems odd to me. In particular,
the fact that "servers SHOULD NOT both run mutually distrusting
services on different ports of the same host and use cookies to store
security-sensitive information" appears under "Weak Confidentiality",
but "servers SHOULD NOT both run mutually distrusting services on
different paths of the same host and use cookies store
security-sensitive information" appears under "Weak Integrity".

> cookies by storing large number of cookies.  Once the user agent

s/storing large/storing a large/ (or s/number/numbers/)