Re: [http-state] Comments on draft-ietf-httpstate-cookie-08.txt (1 - 4.1.2.)

Adam Barth <ietf@adambarth.com> Mon, 31 May 2010 01:59 UTC

In-Reply-To: <g13rv59fpsefi1jhuuuds4evqqc8baia7o@hive.bjoern.hoehrmann.de>
References: <f5jqv5pu3oksmjndegd5a329gp40opqsr5@hive.bjoern.hoehrmann.de> <AANLkTin2dZ3v681D2W4yZEHhnc_0G8mAQRsMA8ZQ6wWF@mail.gmail.com> <g13rv59fpsefi1jhuuuds4evqqc8baia7o@hive.bjoern.hoehrmann.de>
From: Adam Barth <ietf@adambarth.com>
Date: Sun, 30 May 2010 18:58:23 -0700
Message-ID: <AANLkTimUXR0FI3D3KKZ2rOO2QEEGReKlRBCY5ZanwL24@mail.gmail.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Cc: http-state@ietf.org
Subject: Re: [http-state] Comments on draft-ietf-httpstate-cookie-08.txt (1 - 4.1.2.)

On Wed, May 26, 2010 at 2:52 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Adam Barth wrote:
>>> Later in the section, there should be a bibliographic reference for the
>>> "Netscape cookie specification".
>>
>>I'd be happy to add this if you let me know what citation you think we
>>should use here.  The problem is that I don't know of a canonical URL
>>to cite now that the original Netscape site is defunct.
>
> It does not have to include an address, something like 'Netscape
> Communications Corp., "Persistent Client State - HTTP Cookies"' would
> do. http://wp.netscape.com/newsref/std/cookie_spec.html should be the
> address, the Internet Archive has archived copies of it, and it should
> be fine to cite the Internet Archive address (at least I did do that
> in RFC 4329 for the JavaScript reference manual).

Done.

>>> In section 2.2 the imported rules should be phrased as grammar to make
>>> them easier to scan and read, i.e., e.g.
>>>
>>>  ALPHA = <as defined in ...> ; A-Z / a-z
>>>  ...
>>
>>I looked at a bunch of RFCs and this didn't appear to be that common.
>>I'd be happy to do this if you can show me three relatively recent
>>RFCs that use this pattern.
>
> RFC 5536, RFC 5537, RFC 5538. The draft itself does this for other
> rules, like for `token`.

See below.

On Thu, May 27, 2010 at 12:49 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 26.05.2010 22:45, Adam Barth wrote:
>> On Wed, May 26, 2010 at 10:22 AM, Bjoern Hoehrmann <derhoermi@gmx.net>
>>  wrote:
>>> Later in the section, there should be a bibliographic reference for the
>>> "Netscape cookie specification".
>>
>> I'd be happy to add this if you let me know what citation you think we
>> should use here.  The problem is that I don't know of a canonical URL
>> to cite now that the original Netscape site is defunct.
>
> I agree that this needs a citation, even if we can't provide the original
> URI. Proposal: create something on purl.org (I know the RFC Editor is likely
> to accept that), and let it redirect to the best copy we can find. If that
> breaks, we still can update the purl.

I've added the citation without a URL.  There are lots of broken URLs
in the old cookie specs.  I don't really want to add to the sadness of
broken URLs.  Hopefully search engines in the future will be powerful
enough to find the document if it still exists.

>>> In the same paragraph "However, none of
>>> these documents describe how the Cookie and Set-Cookie headers are
>>> actually used on the Internet" is rather unclear and does not appear to
>>> do the relevant documents justice. As reader I am left wondering if the
>>> intent is to say they did not attempt to do so, or were incomplete, or
>>> what else is wrong with them.
>>
>> That's just a statement of fact.  I'm not sure what the intentions
>> were of their creators, but (however we got here) we find ourselves in
>> that situation.
>
> That statement is kind of confusing to people who do not already know the
> history.
>
> Maybe it would make sense to cite
>
> [Kri2001]       Kristol, D., “HTTP Cookies: Standards, Privacy, and
> Politics”, ACM Transactions on Internet Technology Vol. 1, #2, November
> 2001, <http://arxiv.org/abs/cs.SE/0105018>.
>
> to give some more context.

Done.

>>> In section 2.2 the imported rules should be phrased as grammar to make
>>> them easier to scan and read, i.e., e.g.
>>>
>>>  ALPHA = <as defined in ...> ; A-Z / a-z
>>>  ...
>>
>> I looked at a bunch of RFCs and this didn't appear to be that common.
>> I'd be happy to do this if you can show me three relatively recent
>> RFCs that use this pattern.
>
> Adam is right; not expanding the RFC 5234 base rules is quite common.

Ok.  I've left these as is.

>>> In section 2.3 we have "The terms request-host and request-uri refer to
>>> the values the user agent would send to the server as, respectively, the
>>> host (but not port) and the absoluteURI (http_URL) of the HTTP
>>> Request-Line." I am not entirely sure how to read that; for example, the
>>> "host" would be part of "absoluteURI" if it is sent in the Request-Line,
>>> and sending an absoluteURI there is unusual (unless talking to a proxy).
>>> Also, the reference to http_URL is probably incorrect if "https" URLs
>>> are also permissible.
>>
>> I find it completely ridiculous that the specs make it so hard to say
>> what URL and host we're sending a request to, which would seem to be
>> the two most important things about an HTTP request.  Hopefully
>> someone will chime in with the magic incantation we're supposed to use
>> here.
>
> If you're willing to wait for httpbis, "Effective Request URI" might be the
> right thing
> (<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-latest.html#effective.request.uri>).

I'm glad this is getting fixed.  I don't particularly want to take a
dependency on HTTPbis, as discussed before.
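
For readers following along, here is a minimal sketch of what "request-host"
and "request-uri" are meant to capture. It is an illustration under stated
assumptions, not text from the draft: the helper name effective_request, its
parameters, and the default scheme are all hypothetical.

```python
from urllib.parse import urlsplit


def effective_request(request_target, host_header, scheme="http"):
    """Return (request_host, request_uri) for one HTTP request.

    Hypothetical helper; the name and signature are illustrative only.
    """
    if request_target.lower().startswith(("http://", "https://")):
        # Absolute form, as a client would send to a proxy.
        request_uri = request_target
    else:
        # Origin form: rebuild the URI from the scheme, Host header, and path.
        request_uri = f"{scheme}://{host_header}{request_target}"
    # urlsplit(...).hostname drops the port and lower-cases the host.
    request_host = urlsplit(request_uri).hostname
    return request_host, request_uri
```

The sketch covers both "http" and "https" because it never depends on the
http_URL production, which is the concern raised above.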

>>> Later in the section, "The origin server MAY send the user agent a
>>> Set-Cookie response header with the same or different information, or it
>>> MAY send no Set-Cookie header at all." It is unclear what "same" refers
>>> to here. It sounds like this might refer to servers setting the same
>>> cookie over and over again even if the client is already sending it back
>>> but that does not become clear from the paragraph.
>>
>> It just means the server can send whatever it likes, including things
>> that are the same or different from what it sent before or not sending
>> things at all.
>
> I also have to point out that this use of MAY is irritating; just state that
> this is optional behavior.

Ok.  I've just removed this sentence.  It was meant to be helpful, but
it seems to be causing more problems than it's worth.  It's not needed
for the normative content of the document anyway.

>>> At the end of section 3 it should be pointed out that the do-not-fold
>>> rule is inconsistent with RFC 2616.
>>
>> This is pointed out in HTTPbis.  At worst, this inconsistency will
>> exist for only a short time.
>
> HTTPbis currently says:
>
> "Note: The "Set-Cookie" header as implemented in practice (as opposed to how
> it is specified in [RFC2109]) can occur multiple times, but does not use the
> list syntax, and thus cannot be combined into a single line. (See Appendix
> A.2.3 of [Kri2001] for details.) Also note that the Set-Cookie2 header
> specified in [RFC2965]  does not share this problem."
> --
> <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-latest.html#rfc.section.3.2.p.7>
>
> So the Cookie spec has an additional requirement compared both to RFC 2616
> and HTTPbis, and that should be pointed out (and explained!).

I've changed this text to be purely descriptive:

[[
      Note that folding multiple Set-Cookie header fields into a
      single header field might change the semantics of the header because the
      U+002C (",") character is used by the Set-Cookie header in a way that
      conflicts with such folding.
]]
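
A small sketch of why folding is a problem, assuming two made-up Set-Cookie
values (the cookie names and dates are illustrative, not from the draft). The
Expires attribute contains a comma, so a recipient that splits a folded
header on "," cannot recover the original two fields:

```python
# Two separate Set-Cookie header values (made-up examples).
first = "SID=31d4d96e407aad42; Expires=Wed, 09 Jun 2021 10:18:14 GMT"
second = "lang=en-US; Path=/"

# Fold them the way generic HTTP list headers are combined.
folded = ", ".join([first, second])

# A naive recipient splits the folded value on commas again.
parts = [p.strip() for p in folded.split(",")]

# The split yields three pieces, not the two original values,
# because the date inside Expires also contains a comma.
print(len(parts))
print(parts == [first, second])
```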

>>> The grammar makes reference to "abs_path", but does not define what it
>>> is.
>>
>> I've just made this <any CHAR except CTLs or ";">.
>
> Is this supposed to be the same as abs_path in RFC 2396? If it is, RFC 3986
> has replaced this with path-absolute. More consistency with these specs
> would be good.

This production is just advice for servers.  It doesn't really matter
that much how closely they stick to abs_path and it's easier to have
something concrete in the grammar instead of making the reader chase a
mountain of definitions.
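
As a rough sketch of how concrete that production is, the check below accepts
exactly the strings matching <any CHAR except CTLs or ";">, taking CHAR and
CTL from RFC 5234 (CHAR is %x01-7F, CTL is %x00-1F and %x7F). The function
name is_path_value is hypothetical.

```python
import re

# CHAR minus CTLs leaves 0x20-0x7E; removing ";" (0x3B) splits the range in two.
_PATH_VALUE = re.compile(r"[\x20-\x3A\x3C-\x7E]*")


def is_path_value(s):
    """Return True if s consists only of CHARs that are not CTLs or ';'."""
    return _PATH_VALUE.fullmatch(s) is not None
```

Note that this is deliberately looser than abs_path or path-absolute: it does
not even require a leading "/".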

>> ...
>> I've removed the concept of a cookie store from this section because
>> it's not needed to explain what's going on.
>> ...
>
> It's good that you already improved the spec; can you please post this
> somewhere (not as new ID!), so people doing the LC review can check whether
> something has already been addressed?

Sure.  As always, you can find the up-to-the-minute version of the draft here:

http://github.com/abarth/http-state/blob/master/drafts/cookie.xml

On Thu, May 27, 2010 at 7:33 AM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Adam Barth wrote:
>>I don't agree.  The purpose of the text in the spec is to explain why
>>the text in this document isn't consistent with those documents.
>>Those documents do not reflect reality.  This document is intended to
>>reflect reality.
>
> The purpose of an introductory section is to take the reader from where
> he is standing and explain to him where things will be going from there.
>
> Pointing out that the ten years old specification for the Set-Cookie2
> and Cookie2 headers does not describe how the Set-Cookie and Cookie
> headers are used today does not do that, it rather misleads the reader.
>
> Telling the reader that even older specifications do not describe how
> the headers are used currently is not helpful either unless it is also
> explained how and to what degree things are different; what it says is
> "You know stuff about cookies. It's wrong or not, continue with the next
> three dozens of pages to find out." That leaves the reader constantly
> wondering.
>
> Drawing attention to the obvious should be avoided as readers would have
> to expend additional effort to confirm that what is being pointed out is
> indeed obvious -- and not some subtlety they've missed -- for no gain.
> If a specification replaces two older ones based on implementation
> experience, then it is obvious that the older specifications do not
> reflect current practise equally well.

Hopefully the added citation to Kristol's missive should suffice.  The
old specs are actively harmful.  This text is meant to reassure the
reader that we're contrite for our previous sins and are trying to do
no harm in this document.

>>> Right now the first occurrence of "host-name" is in "A canonicalized
>>> host-name is the host-name converted to ..." which proceeds without
>>> saying what a "host-name" is in the first place.
>>
>>Precisely.  For example "f(x) = x + 1" doesn't proceed to say what "x"
>>is in the first place.
>
> In mathematics, function definitions require a specification of their
> domain and codomain; you would know what `x` is, namely a member of
> the domain of the function; but without specifying the domain you do
> not have a function.

That's a matter for philosophers of mathematics, but certainly not all
of them would agree with you.  :)

> In the context of "hosts" the term "canonicalized" is overloaded with
> definitions, RFC 1123 for example defines it quite differently than
> draft-ietf-httpstate-cookie-08.txt.

I've changed the text to define a "canonicalized string", which should
emphasize that the name of the formal parameter is irrelevant.

> Further, the permissible range of values for `request-host` is larger
> than the apparent domain of the draft's `canonicalized` "function".
> For instance, the "host" can be an IP-Literal, but you cannot apply
> the ToASCII function to those -- they are outside its domain.

Ok.  Well, I'm happy to write whatever you like in that part of the
spec.  The point is we need the puny code version of the host name.
When I wrote that before, someone complained that I couldn't use the
words "puny code" in that way, so I adopted the text they recommended.
Now I'm told that I can't use the text they gave me.

This text is really trying to accomplish a very simple task, which is
to obtain the lower case, puny code version of a host name from a URL.
In code, this is as simple as writing "url->host()".  Let me know
what the magic phrase is, and I'm happy to appease.
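
A minimal sketch of that task in Python, offered as an assumption-laden
illustration rather than as the draft's algorithm. The function name
canonical_host is hypothetical. Python's built-in "idna" codec implements the
IDNA 2003 ToASCII operation, and IP literals are passed through unchanged
because ToASCII is not defined for them:

```python
import ipaddress
from urllib.parse import urlsplit


def canonical_host(url):
    """Return the lower-case, ASCII (Punycode) form of the URL's host."""
    # .hostname lower-cases the host and strips the port and any brackets.
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    try:
        ipaddress.ip_address(host)
        return host  # IP literal: outside the domain of ToASCII
    except ValueError:
        pass
    # Apply IDNA 2003 ToASCII label by label.
    return host.encode("idna").decode("ascii")
```

For example, canonical_host("http://WWW.Example.COM:8080/") yields
"www.example.com".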

> So I've found no reason to assume that I am supposed to lookup
> "canonicalized request-host" under "A canonicalized host-name is".

Hopefully replacing the string "host-name" with "string" has made this
easier for you to understand.

>>> The implied lws rule in RFC 2616 has been a source of considerable
>>> confusion as it implied optional white space where people do not
>>> expect it and because some rules use it and some rules do not and that
>>> is not made clear through the grammar but rather through prose and
>>> thus easy to miss. The draft builds on RFC 2616 but unlike most other
>>> extension header specifications does not re-use the grammar notation
>>> defined in RFC 2616 but a very slightly different notation, and then
>>> the draft imports rules from RFC 2616 where one has to pay very close
>>> attention to those notational differences and the surrounding prose
>>> to read the grammar correctly. I do not care much how, but I do think
>>> the draft needs to draw attention to this to avoid confusion.
>>
>>Feel free to send me a diff with a proposed clarification.
>
> If you require help to turn my sketch into proper text, you would need
> to tell what difficulty you are having.

The very first sentence of the "Syntax Notation" section says:

[[
        <t>This specification uses the Augmented Backus-Naur Form (ABNF)
        notation of <xref target="RFC5234"/>.</t>
]]

Would you prefer that line said something else to clarify that we
really mean the ABNF notation of RFC5234 as opposed to something else?
Would you like that text copy/pasted to every location where we use
an ABNF grammar?  I'm happy to review a diff to the spec.

In general, the more specific your feedback, the more actionable it is
by me.  Currently, your feedback is "I do not care much how, but I do
think the draft needs to draw attention to this to avoid confusion."
I don't know how to draw more attention to the issue other than
stating unequivocally which grammar form we're using as the first
normative requirement after stating our conformance criteria.

Adam