[http-state] algorithm definitions

Julian Reschke <julian.reschke@gmx.de> Fri, 16 July 2010 13:42 UTC


Hi,

from 
<http://tools.ietf.org/html/draft-ietf-httpstate-cookie-09#section-5.2>:

    A user agent MUST use an algorithm equivalent to the following
    algorithm to parse set-cookie-strings:

    1.  If the set-cookie-string contains a U+003B (";") character:

           The name-value-pair string consists of the characters up to,
           but not including, the first U+003B (";"), and the unparsed-
           attributes consist of the remainder of the set-cookie-string
           (including the U+003B (";") in question).

        Otherwise:

           The name-value-pair string consists of all the characters
           contained in the set-cookie-string, and the unparsed-
           attributes is the empty string.

    2.  If the name-value-pair string lacks a U+003D ("=") character,
        ignore the set-cookie-string entirely.

    3.  The (possibly empty) name string consists of the characters up
        to, but not including, the first U+003D ("=") character, and the
        (possibly empty) value string consists of the characters after
        the first U+003D ("=") character.

    4.  Remove any leading or trailing WSP characters from the name
        string and the value string.

    5.  If the name string is empty, ignore the set-cookie-string
        entirely.

    6.  The cookie-name is the name string, and the cookie-value is the
        value string.

    The user agent MUST use an algorithm equivalent to the following
    algorithm to parse the unparsed-attributes:

    1.  If the unparsed-attributes string is empty, skip the rest of
        these steps.

    2.  Discard the first character of the unparsed-attributes (which
        will be a U+003B (";") character).

    3.  If the remaining unparsed-attributes contains a U+003B (";")
        character:

           Consume the characters of the unparsed-attributes up to, but
           not including, the first U+003B (";") character.

        Otherwise:

           Consume the remainder of the unparsed-attributes.

        Let the cookie-av string be the characters consumed in this step.

    4.  If the cookie-av string contains a U+003D ("=") character:

           The (possibly empty) attribute-name string consists of the
           characters up to, but not including, the first U+003D ("=")
           character, and the (possibly empty) attribute-value string
           consists of the characters after the first U+003D ("=")
           character.

        Otherwise:

           The attribute-name string consists of the entire cookie-av
           string, and the attribute-value string is empty.

    5.  Remove any leading or trailing WSP characters from the attribute-
        name string and the attribute-value string.

    6.  Process the attribute-name and attribute-value according to the
        requirements in the following subsections.  (Notice that
        attributes with unrecognized attribute-names are ignored.)

    7.  Return to Step 1.
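
For illustration only (this is not part of the draft): a minimal Python
sketch that transcribes the two quoted algorithms step by step. The
function name parse_set_cookie_string and the WSP constant are made up
for this example, WSP is assumed to mean space and horizontal tab, and
the attribute processing of step 6 is left out.

```python
WSP = " \t"  # assumption: WSP means space and horizontal tab


def parse_set_cookie_string(s):
    """Follow the quoted steps literally; return None if the string is ignored."""
    # Part 1, step 1: cut at the first ";" (the ";" stays in the remainder).
    if ";" in s:
        i = s.index(";")
        name_value_pair, unparsed = s[:i], s[i:]
    else:
        name_value_pair, unparsed = s, ""
    # Step 2: without "=" the whole set-cookie-string is ignored.
    if "=" not in name_value_pair:
        return None
    # Step 3: split at the first "=".
    name, value = name_value_pair.split("=", 1)
    # Step 4: trim leading and trailing WSP.
    name, value = name.strip(WSP), value.strip(WSP)
    # Step 5: an empty name means the string is ignored.
    if name == "":
        return None

    # Part 2: the loop expressed by "Return to Step 1".
    attributes = []
    while unparsed != "":                      # step 1
        unparsed = unparsed[1:]                # step 2: drop the leading ";"
        if ";" in unparsed:                    # step 3
            j = unparsed.index(";")
            cookie_av, unparsed = unparsed[:j], unparsed[j:]
        else:
            cookie_av, unparsed = unparsed, ""
        if "=" in cookie_av:                   # step 4
            attr_name, attr_value = cookie_av.split("=", 1)
        else:
            attr_name, attr_value = cookie_av, ""
        # Step 5: trim WSP; step 6 (processing each attribute) is omitted.
        attributes.append((attr_name.strip(WSP), attr_value.strip(WSP)))
    return name, value, attributes
```

Calling parse_set_cookie_string("SID=31d4; Path=/; Secure") yields the
name "SID", the value "31d4", and the attribute pairs ("Path", "/") and
("Secure", "").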


Wow -- all of this to say that a string should be tokenized where ";" 
occurs, that the first token and the remaining tokens have different 
roles, and how to parse the individual tokens.

A few ideas on how to compress this:

- If part 2 / step 2 removes the leading semicolon, why include it in
the first place?

- Maybe just say ";" and "=" after stating the Unicode code point once? 
Speaking of which, is *anybody* confused about what these characters 
might be?

- Instead of expressing a for-loop in prose, simply state that the 
string is to be split on semicolons, and a certain set of steps is to be 
applied to each fragment.

Etc.
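
To make the "split on semicolons" idea concrete, here is a rough Python
sketch. The function name parse_by_splitting and the WSP constant are
invented for this example, WSP is assumed to mean space and horizontal
tab, and the handling of individual attributes is left out. I believe
it gives the same results as the quoted steps, but treat it as an
illustration rather than a verified replacement.

```python
WSP = " \t"  # assumption: WSP means space and horizontal tab


def parse_by_splitting(s):
    """Split on ";"; the first fragment is the cookie, the rest are attributes."""
    first, *rest = s.split(";")

    # First fragment: name and value around the first "=".
    name, sep, value = first.partition("=")
    name, value = name.strip(WSP), value.strip(WSP)
    if not sep or not name:
        return None  # no "=" or an empty name: ignore the whole string

    # Remaining fragments: attribute name and value around the first "=".
    attributes = []
    for fragment in rest:
        attr_name, _, attr_value = fragment.partition("=")
        attributes.append((attr_name.strip(WSP), attr_value.strip(WSP)))
    return name, value, attributes
```

For "SID=31d4; Path=/; Secure" this returns the name "SID", the value
"31d4", and the attribute pairs ("Path", "/") and ("Secure", "").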

I've heard that this part is exclusively for those who actually write
the parsing code, and that nobody else needs to care. I disagree with
that. If the spec makes normative requirements on handling
non-conforming input, then it should be phrased so that it's clear how
each input gets processed.

Giving an example of a conforming algorithm is fine, but replacing
the description with that algorithm IMHO is not.

For instance, when I debug an HTTP/cookie problem and look at an HTTP 
trace, I want to be able to understand how the recipient is going to 
parse the string. Reading the algorithm really isn't very helpful for that.

Also, if we need algorithms instead of format descriptions, why is it ok 
to define date parsing using an ABNF (see section 5.1.1)?

Best regards, Julian