Re: [http-state] Comments on draft-ietf-httpstate-cookie-08.txt (4.1.2.2 - 5.2.6)

Adam Barth <ietf@adambarth.com> Sun, 30 May 2010 21:22 UTC

In-Reply-To: <9c8006lme07degqokchpkk0jq5vkt1njdu@hive.bjoern.hoehrmann.de>
References: <hovvv5ph6ipqda3mm7rr9v2jo8pp59l2bn@hive.bjoern.hoehrmann.de> <AANLkTil_uXcJcbYcHMat7kTxpV2JL0dtLf1uyxuBlfTp@mail.gmail.com> <9c8006lme07degqokchpkk0jq5vkt1njdu@hive.bjoern.hoehrmann.de>
From: Adam Barth <ietf@adambarth.com>
Date: Sun, 30 May 2010 14:22:00 -0700
Message-ID: <AANLkTimdleqfOVKOSc3ZuTIfrOs2EDLOBCvHMvzhxQL8@mail.gmail.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: http-state@ietf.org
Subject: Re: [http-state] Comments on draft-ietf-httpstate-cookie-08.txt (4.1.2.2 - 5.2.6)

On Fri, May 28, 2010 at 4:05 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Adam Barth wrote:
>>> Later in the section: "WARNING: Not all user agents support the Max-Age
>>> attribute. User agents that do not support the Max-Age attribute will
>>> ignore the attribute." I see no reason why this needs to be a "WARNING"
>>> rather than a note,
>>
>>Changed to a NOTE.
>>
>>> and the text needs to clarify that support for the
>>> attribute is not optional but rather required.
>>
>>That requirement is contained in the section entitled User Agent Requirements.
>
> The point is that the text does not say those user agents are non-con-
> forming, so the casual reader may well assume support is optional. It
> should say something like "Some older user agents do not support it",
> or whatever, so long as it is clear that they are not compliant.

I've changed this to the following:

[[
              Some legacy user agents do not support the Max-Age
              attribute.
]]

Hopefully that should be sufficient to convey that this is not desired.

>>> And "will" is also the wrong word.
>>
>>What word do you suggest instead?
>
> I don't have a ready proposal how to rephrase it, but the point is that
> they may not do so and still conform to the specification.

I've removed the word "will."

>>> The subsequent "Warning" needs to explain how and why the described be-
>>> haviour is erroneous.
>>
>>I don't understand this comment.  It's a warning that some existing
>>user agents behave incorrectly w.r.t. the description in that section.
>>That behavior is erroneous: it is in error.
>
> A remedy would be to state in the note what the correct behavior is,
> rather than simply say it is wrong. Obviously the developers of the
> misbehaving user agents did not think it wrong.

On the contrary, even the developers of the misbehaving user agents
agree that the behavior is wrong, at least according to their public
statements on the matter.

>>> In section 4.1.2.4, "If the server omits the Path attribute, the user
>>> agent will use the directory of the request-uri's path component as the
>>> default value." It is not clear what the "directory" of a path component
>>> is, RFC 3986 for example does not define it.
>>
>>I've put directory in quotes.  The definition is in Section 5.
>
> This probably works if you include the forward reference to the re-
> levant sub-section of section 5.

Done.

>>> "If the server conforms to the requirements in Section 4.1, the
>>> requirements in the Section 5 will cause the user agent to return a
>>> Cookie header that conforms to the following grammar" This seems some-
>>> what confused. If the user agent is free to send whatever it wants in
>>> the header, perhaps because the `Set-Cookie` it processed was invalid,
>>> then this should be clearly stated here, more so if it is required to.
>>
>>Section 4 does not contain any requirements on the user agent.  All
>>the user agent requirements are contained in Section 5.  This sentence
>>just states some consequences of those requirements.
>
> The text draws attention to a condition, either it matters that the
> server conforms to the requirements in section 4.1 for the user agent
> to meet the requirements in section 5, then it needs to be explained
> what the implications are of the server not doing so, or the user agent
> will behave according to the requirements in section 5 regardless of
> whether the server conforms to its requirements, then no attention
> should be drawn to the condition.

The consequent follows only if both of the antecedents are satisfied.
There's no entanglement between the antecedents.

I've changed the structure of this sentence to be more parallel to
reflect the parallelism in the situation.  I've also demoted the
mention of Section 5 to a parenthetical.

[[
          <t>The user agent sends stored cookies to the origin server in the
          Cookie header. If the server conforms to the requirements in <xref
          target="sane-set-cookie" /> (and the user agent conforms to the
          requirements in the <xref target="ua-requirements" />), the user
          agent will send a Cookie header that conforms to the following
          grammar:</t>
]]

>>> In section 5.1.1, "If the found-day-of-month flag is not set ..." Up to
>>> this point it has not been explained what this flag is. The same goes
>>> probably for all "flags". Definitions need to be added.
>>
>>They don't mean anything other than what the text says.  There's
>>nothing to define.
>
> There are two occurrences in the draft of `found-day-of-month`, the
> first is "If the found-day-of-month flag is not set and ..."; the other
> is in "Abort these steps and fail to parse if ... at least one of the
> found-day-of-month, found-month, found-year, or found-time flags is not
> set";

Ah, I understand your confusion.  You missed the third occurrence of
the term in which the flag is actually set.  :)

> The variable is read twice but never written to. You might try to read
> the draft after replacing all occurrences of "found-day-of-month" with
> "squirrel0815"; that is, in effect, what I do. I could guess that maybe
> `found-day-of-month` is supposed to be set when the `day-of-month`
> symbol is used when parsing the input, but the draft does not say that,
> so my guess may very well be wrong.

I think the text still makes sense if read with squirrel0815:

[[
              <t>If the squirrel0815 flag is not set and the date-token
              matches the day-of-month production, set the squirrel0815
              flag and ...</t>

              [...]

              <t>at least one of the squirrel0815, found-month,
              found-year, or found-time flags is not set, ...</t>
]]
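For illustration only (this is not text from the draft), the flag mechanics discussed above can be sketched in Python. The regular expressions and the delimiter set are simplified assumptions, and `scan_date_tokens` is a hypothetical name. The point is that each flag is set in the same step that consumes a matching date-token, and parsing fails if any flag is still unset at the end.

```python
import re

# Assumed, simplified stand-ins for the draft's productions.
DELIMITERS = re.compile(rb"[\x09\x20-\x2f\x3b-\x40\x5b-\x60\x7b-\x7e]+")
TIME = re.compile(rb"^(\d{1,2}):(\d{1,2}):(\d{1,2})(?:\D.*)?$", re.S)
DAY_OF_MONTH = re.compile(rb"^(\d{1,2})(?:\D.*)?$", re.S)
MONTH = re.compile(rb"^(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)", re.I)
YEAR = re.compile(rb"^(\d{2,4})(?:\D.*)?$", re.S)


def scan_date_tokens(cookie_date: bytes):
    """Return the recognised date fields, or None if any field is missing."""
    # A value of None means the corresponding "found-..." flag is not set.
    found = {"time": None, "day-of-month": None, "month": None, "year": None}
    for token in DELIMITERS.split(cookie_date):
        if not token:
            continue
        # Each branch reads the flag, and sets it only when the token matches.
        if found["time"] is None and (m := TIME.match(token)):
            found["time"] = tuple(int(g) for g in m.groups())
        elif found["day-of-month"] is None and (m := DAY_OF_MONTH.match(token)):
            found["day-of-month"] = int(m.group(1))
        elif found["month"] is None and (m := MONTH.match(token)):
            found["month"] = m.group(1).lower().decode("ascii")
        elif found["year"] is None and (m := YEAR.match(token)):
            found["year"] = int(m.group(1))
    # "at least one of the ... flags is not set" -> fail to parse.
    if any(value is None for value in found.values()):
        return None
    return found
```

Calling `scan_date_tokens(b"Sun, 30 May 2010 21:22:40 GMT")` yields all four fields, while a string with no day-of-month token yields `None`.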

>>> The surprising requirements in this section need to be accompanied by
>>> some rationale, for example, accepting 1601 as year but not 1600 is
>>> surprising and so is the 68/69 boundary.
>>
>>The reason is just that's how the protocol works.  It's odd, but
>>that's how it works.
>
> I think it is unlikely that surprising requirements will be widely im-
> plemented unless the reasoning behind them is explained.

That requirement is already widely implemented, so we have nothing to fear here.

> I also note that the draft fails to address handling impossible dates
> like the 31st of february, but that is obviously an error.

Fixed.  We now fail explicitly when no such date exists.
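As a rough sketch of the checks discussed in this exchange (not the draft's wording), the following Python turns already-extracted fields into a timestamp. The two-digit-year pivot is an assumption read off the "68/69 boundary" mentioned above (0 to 68 maps to 20xx, 69 to 99 maps to 19xx), and `to_datetime` and `MONTHS` are hypothetical names. Years before 1601 and dates that do not exist both fail.

```python
from datetime import datetime, timezone

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def to_datetime(day, month, year, time, pivot=69):
    """Return a UTC datetime, or None if the fields do not form a valid date."""
    hour, minute, second = time
    # Assumed two-digit-year rule: values at or above the pivot are 19xx.
    if year < 100:
        year += 1900 if year >= pivot else 2000
    # 1601 is accepted, 1600 is not.
    if year < 1601:
        return None
    try:
        # datetime() raises ValueError for impossible dates such as 31 Feb,
        # and MONTHS.index() raises ValueError for an unknown month name.
        return datetime(year, MONTHS.index(month) + 1, day,
                        hour, minute, second, tzinfo=timezone.utc)
    except ValueError:
        return None
```

With these assumptions, `to_datetime(31, "feb", 2010, (0, 0, 0))` and `to_datetime(1, "jan", 1600, (0, 0, 0))` both return `None`.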

>>> In section 5.1.2 it needs to be explained what is supposed to happen if
>>> the normalization process fails, e.g. when ToASCII fails.
>>
>>Can you provide a test case that illustrates behavior that you think
>>is not defined?
>
> For instance, ToASCII(ToASCII(x)) fails for any `x`, and the draft does
> not discuss this failure.
>
>>> There should
>>> be a reminder that using non-ASCII characters directly is prohibited.
>>
>>Who is prohibited from using non-ASCII characters for what?  I don't
>>understand your request.
>
> A `domain-value` cannot contain non-ASCII characters directly.

Do you mean there's a certain sequence of octets that the server is
prohibited from sending?  Please read the draft more carefully.  The
server is permitted to send any sequence of octets it pleases.  We
recommend against sending some crazy ones, but none are prohibited by
this document.

>>> There needs to be a discussion, I am not sure this is the right place,
>>> on the handling of permissible HTTP "hosts" (non-DNS names, IP-Literals
>>> and IPv4 addresses, for instance); some questions are, is it intentional
>>> that the Domain attribute disallows setting the attribute to an IPv4
>>> address, would a leading '.' in the attribute still be stripped, if such
>>> host specifications are disallowed, must all cookies coming from them be
>>> rejected? With a Set-Cookie from 127.0.0.1 with no Domain attribute, is
>>> it incorrect to imply Domain=127.0.0.1? (I don't care, the specification
>>> however needs to discuss those questions).
>>
>>Can you provide a test case that illustrates behavior that you think
>>is not defined?
>
> I believe I did.

I should be more clear.  In this case, I'm asking for a sequence of
octets exchanged by the server and the user agent that causes the
protocol to arrive in a situation that you believe is not defined.
You have not provided any such sequence of octets and so I cannot use
them to test the protocol definition.

>>> The draft needs to explain how to handle a Set-Cookie header in a reply
>>> to an `OPTIONS *` request. Right now it seems `uri-path` would be unde-
>>> fined in that case, but the draft does not say what the consequence is.
>>> There should be an explicit mention of the case, not just adjusted algo-
>>> rithms.
>>
>>The draft says:
>>
>>[[
>>            <t>If the uri-path is empty or if first character of the uri-path
>>            is not a U+002F ("/") character, output U+002F ("/") and skip the
>>            remaining steps.</t>
>>]]
>>
>>Doesn't this fall under the case of the uri-path being empty?
>
> Since `uri-path` is "the path portion", and "*" is not a path, we do not
> know what `uri-path` is.

I've changed the text to say explicitly that uri-path is empty if the
path portion of the request-uri does not exist.  In that case, the
default-path will be computed as "/", which is correct.
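A minimal Python sketch of the default-path computation, for illustration. Only the first step (empty uri-path, or a first character other than "/") is quoted in this thread; the remaining steps shown here are an assumption about how the "directory" of the path is derived, and `default_path` is a hypothetical name.

```python
def default_path(uri_path: str) -> str:
    """Compute a cookie's default path from the request's uri-path."""
    # Quoted step: empty uri-path (e.g. no path portion at all) or a
    # uri-path that does not start with "/" yields "/".
    if not uri_path or not uri_path.startswith("/"):
        return "/"
    # Assumed step: a path with a single "/" has no directory part.
    if uri_path.count("/") == 1:
        return "/"
    # Assumed step: keep everything up to, not including, the last "/".
    return uri_path[: uri_path.rfind("/")]
```

Under this sketch an `OPTIONS *` request, whose uri-path is treated as empty, gets the default path "/", and a request for `/a/b/c.html` gets `/a/b`.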

>>> In section 5.2, if the draft takes the position that HTTP header values
>>> are sequences of octets and not Unicode code points, then it needs to be
>>> explained at the beginning of this section how those octets are turned
>>> into characters (at least so long as it refers directly to the header
>>> and not some pre-processed value based on it, and so long as it refers to
>>> characters instead of octets).
>>
>>The octets are never turned into characters.
>
> You eventually pass some of the octets to the ToASCII function which
> only accepts characters, not octets. So, either you do convert, or you
> cannot use ToASCII.

What algorithm should I use instead?  It's quite difficult to work
with underlying specifications that can't provide such elementary
concepts as the URL of an HTTP request or the host name of a URL.

>>> In section 5.2.1 and subsequent sections the term "case-insensitively
>>> matches" is not defined; this should probably be defined in the termi-
>>> nology section.
>>
>>I'll be happy to add this if you can provide the definition you'd like
>>to see in the terminology section.
>
> There should be plenty of specifications to steal the text from, you
> could say "In this specification two strings are said to case-insen-
> sitively match each other if and only if they are equivalent under the
> i;ascii-casemap collation defined in RFC 4790". I am sure you can find
> more appropriate text.

[[
   The "i;ascii-casemap" collation is a simple collation that operates
   on octet strings and treats US-ASCII letters case-insensitively.
]]

That does appear to be what we'd like.  Thanks.
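As an illustration of what matching under such a collation amounts to, here is a small Python sketch that compares octet strings while folding only the US-ASCII letters. The function name `case_insensitively_matches` is made up for this example; octets outside a-z and A-Z are compared as-is.

```python
# Translation table that maps only the 26 ASCII lowercase letters to
# uppercase; every other octet value is left unchanged.
_FOLD = bytes.maketrans(b"abcdefghijklmnopqrstuvwxyz",
                        b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def case_insensitively_matches(a: bytes, b: bytes) -> bool:
    """Compare two octet strings, ignoring the case of US-ASCII letters."""
    return a.translate(_FOLD) == b.translate(_FOLD)
```

For example, `b"Max-Age"` matches `b"max-age"`, but the octets 0xE9 and 0xC9 do not match each other, because no character-set interpretation is applied to non-ASCII octets.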

>>> I doubt the draft should require implementations to use the "last" Date
>>> header, there are plenty of HTTP implementations where that is not
>>> possible as the headers are only accessible in folded form.
>>
>>As far as I can tell, that's how the protocol works.  Can you provide
>>a test case that shows otherwise?
>
> A quick reading of the libwww-perl code suggests it does not use the
> "Date" header at all when determining the expiration time. Qt would
> seem to be another example.

See below.

>>> Section
>>> 4.1.2.1 also should be updated to clearly express that even though the
>>> expiry time is set in an absolute form, it'll be handled as relative
>>> value, as that is rather surprising.
>>
>>There are lots of details explained in Section 5 that are not
>>presented in Section 4.  It's unclear why this particular detail
>>merits inclusion.
>
> It is common for cookie processing libraries to operate only on the values
> of the Set-Cookie and Cookie headers, not on whole request and response
> objects or the value of the headers plus the value of a Date header; for
> example, QNetworkCookie::parseCookies only takes a byte array as input.
> Obviously the requirement has not been anticipated or has been ignored.
> That should be sufficient reason to draw special attention to it.

See below.

On Sat, May 29, 2010 at 11:15 AM, Daniel Stenberg <daniel@haxx.se> wrote:
> On Fri, 28 May 2010, Bjoern Hoehrmann wrote:
>> I doubt the draft should require implementations to use the "last" Date
>> header, there are plenty of HTTP implementations where that is not possible
>> as the headers are only accessible in folded form. Section 4.1.2.1 also
>> should be updated to clearly express that even though the expiry time is set
>> in an absolute form, it'll be handled as relative value, as that is rather
>> surprising.
>
> I agree. I don't think we've discussed this properly on the list. (Or if we
> did I might've missed it.)

We discussed it briefly in the context of time zones.  The current
text is based on the Firefox behavior.

> Why would the 'expires' attribute be treated relative like 5.2.1 describes?

Dunno.

> Expires is an absolute time, and I don't see why clients shouldn't try to
> use exactly that time in time comparisons. If the server/client clocks are
> off in any significant way, every response-with-cookies without Date: header
> would then be subject to flaws and I expect that a majority of
> cookie-sending server responses are done without Date:
>
> I also would expect that lots of implementations treat that time as an
> absolute and not a relative unconditionally.
>
> Are the big-5 doing it this way?

I will test in more detail.  Hopefully this is a Firefox-only
behavior, in which case I'd be in favor of removing it because it's
very subtle.
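To make the two interpretations concrete, here is a hedged Python sketch. The exact algorithm in Section 5.2.1 is not quoted in this thread, so the "relative" variant is only one plausible reading: the server's Date header is used to turn Expires into a lifetime, which is then added to the client's own clock. The function names and the fallback when no Date header is present are assumptions.

```python
from datetime import datetime, timedelta, timezone


def expiry_absolute(expires, server_date, received_at):
    """Treat Expires as an absolute UTC time; ignore the Date header."""
    return expires


def expiry_relative(expires, server_date, received_at):
    """Treat Expires as a lifetime measured from the server's Date header."""
    if server_date is None:
        # Assumed fallback: with no Date header, use Expires as-is.
        return expires
    return received_at + (expires - server_date)


# A server whose clock runs one hour fast sets a 10-minute cookie.
utc = timezone.utc
received_at = datetime(2010, 5, 30, 21, 0, 0, tzinfo=utc)   # client clock
server_date = datetime(2010, 5, 30, 22, 0, 0, tzinfo=utc)   # Date header
expires = server_date + timedelta(minutes=10)                # Expires attribute

absolute = expiry_absolute(expires, server_date, received_at)
relative = expiry_relative(expires, server_date, received_at)
```

In this scenario the absolute reading keeps the cookie for 70 minutes of client time, while the relative reading keeps it for the 10 minutes the server presumably intended. When the response carries no Date header, the two readings coincide under the assumed fallback.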

On Sat, May 29, 2010 at 11:37 AM, Yngve N. Pettersen (Developer Opera
Software ASA) <yngve@opera.com> wrote:
> On Sat, 29 May 2010 20:15:42 +0200, Daniel Stenberg <daniel@haxx.se> wrote:
>
>> On Fri, 28 May 2010, Bjoern Hoehrmann wrote:
>>
>>> I doubt the draft should require implementations to use the "last" Date
>>> header, there are plenty of HTTP implementations where that is not possible
>>> as the headers are only accessible in folded form. Section 4.1.2.1 also
>>> should be updated to clearly express that even though the expiry time is set
>>> in an absolute form, it'll be handled as relative value, as that is rather
>>> surprising.
>>
>> I agree. I don't think we've discussed this properly on the list. (Or if
>> we did I might've missed it.)
>>
>> Why would the 'expires' attribute be treated relative like 5.2.1
>> describes? Expires is an absolute time, and I don't see why clients
>> shouldn't try to use exactly that time in time comparisons. If the
>> server/client clocks are off in any significant way, every
>> response-with-cookies without Date: header would then be subject to flaws
>> and I expect that a majority of cookie-sending server responses are done
>> without Date:
>>
>> I also would expect that lots of implementations treat that time as an
>> absolute and not a relative unconditionally.
>>
>> Are the big-5 doing it this way?
>
> Opera is using the Expires attribute as an absolute UTC date and time.
> Frankly, I think it should stay that way.

Thanks, this is useful data.

> Servers should keep their clock well in sync, so should clients,
> particularly since quite a few security features in SSL depend on a
> reasonably accurate clock.

There are plenty of non-SSL sites that we need to worry about as well.

> The situation where a slightly off clock can cause problems is for
> shortlived cookies, and those are better handled through maxage.

Indeed, but that doesn't stop existing sites from using Expires.

> My guess is that this has been adopted from the cache lifetime definitions
> in 2616 and 2616-bis.

The text in the spec was adopted from studying how Firefox behaves at
the suggestion of dwitte.  I don't know the origin of the behavior in
Firefox.

> I have no idea whether other clients use relative time at present, but if
> they don't, then all the clients that could be adapted to it would already
> support maxage.

Considerations about Max-Age are irrelevant.  We're worried about
already deployed servers who are using Expires instead of Max-Age.

On Sat, May 29, 2010 at 1:43 PM, Daniel Stenberg <daniel@haxx.se> wrote:
> On Sat, 29 May 2010, Yngve N. Pettersen (Developer Opera Software ASA)
> wrote:
>> The situation where a slightly off clock can cause problems is for
>> shortlived cookies, and those are better handled through maxage.
>
> Well, one could imagine a similar concept for maxage, like for the case
> where the server wants the cookie to be alive 10 seconds, but the mere
> transfer of the cookie to the client takes 5... I would expect most client
> implementations to consider the maxage time to start ticking the moment the
> cookie is actually parsed and not from the Date: of the http response.

I'll add this to my testing matrix.
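The case to be tested can be sketched as follows, assuming for simplicity that client and server clocks agree and that the response takes 5 seconds to arrive. The helper `max_age_expiry` is hypothetical; the only difference between the two variants is the instant from which Max-Age starts counting.

```python
from datetime import datetime, timedelta, timezone


def max_age_expiry(max_age_seconds, start):
    """Expiry time for a cookie whose Max-Age starts counting at `start`."""
    return start + timedelta(seconds=max_age_seconds)


date_header = datetime(2010, 5, 30, 21, 22, 0, tzinfo=timezone.utc)
parsed_at = date_header + timedelta(seconds=5)   # 5 seconds in transit

# Variant 1: the clock starts when the user agent parses the cookie.
from_parse = max_age_expiry(10, parsed_at)
# Variant 2: the clock starts at the time given in the Date header.
from_date = max_age_expiry(10, date_header)

remaining_if_from_parse = from_parse - parsed_at   # lifetime left at parse time
remaining_if_from_date = from_date - parsed_at
```

Counting from parse time leaves the full 10 seconds at the moment of parsing; counting from the Date header leaves only 5.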

Adam