Re: [http-state] Ticket 11: Character encoding for non-ASCII cookies values

Adam Barth <ietf@adambarth.com> Thu, 04 March 2010 01:17 UTC

From: Adam Barth <ietf@adambarth.com>
Date: Wed, 03 Mar 2010 17:17:28 -0800
Message-ID: <5c4444771003031717n390ce79fkdebdfcf51693d877@mail.gmail.com>
To: Mark Pauley <mpauley@apple.com>
Cc: Daniel Stenberg <daniel@haxx.se>, http-state <http-state@ietf.org>
Subject: Re: [http-state] Ticket 11: Character encoding for non-ASCII cookies values

See inline.

On Wed, Mar 3, 2010 at 4:51 PM, Mark Pauley <mpauley@apple.com> wrote:
> You're correct, we do allow those.  What I'm worried about though is that
> valid UTF-8 sequences will contain \r\n .   In any case, writing good
> parsers for a variable-length encoding is really tough.
> To give you a sense of what Safari will accept, we'll accept the following
> for names or values in cookies:
> CTL = [\x00-\x1F\x7F];
> NETSCAPE_SPECIAL = ("," | ";" | "=" | SPACE | HTAB);
> NONETSCAPE_VALCHAR = (CTL | NETSCAPE_SPECIAL);
> NETSCAPE_CHAR = ((CHAR \ NONETSCAPE_VALCHAR) | COMMA_NOSPACE);
> COMMA_NOSPACE = ((",")(CHAR \ (SPACE | HTAB | ";")));
> NETSCAPE_VAL = (CHAR \ NONETSCAPE_VALCHAR)((NETSCAPE_CHAR | SPACE | HTAB)* NETSCAPE_CHAR)?;
>
> And NETSCAPE_VAL is effectively our TOKEN.

This grammar is significantly more complicated than what appears to be
necessary.  Is there some reason the parsing algorithm in
<http://tools.ietf.org/html/draft-ietf-httpstate-cookie-04#section-5.2>
is insufficient?
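
For comparison, here is a rough sketch of how little is needed to pull
the name and value out of a Set-Cookie header value when you split at
the first ";" and then at the first "=".  This is only an illustration
of that general approach, not a transcription of the draft; the
function name parse_name_value is made up for the example, and the
draft remains authoritative on details such as empty names.

```python
def parse_name_value(set_cookie_string: str):
    """Return (name, value) from a Set-Cookie header value, or None to ignore it."""
    # Everything up to the first ';' is the name-value pair; the rest is attributes.
    pair = set_cookie_string.split(";", 1)[0]
    # In this sketch, a pair without '=' is simply ignored.
    if "=" not in pair:
        return None
    name, value = pair.split("=", 1)
    # Trim surrounding spaces and tabs; every other character passes through,
    # including quotes and commas, which get no special treatment here.
    return name.strip(" \t"), value.strip(" \t")


print(parse_name_value('SID="a,b c"; Path=/'))
print(parse_name_value("no-equals-sign; Path=/"))
```

Note that nothing in this sketch needs a character-class grammar: the
value is whatever sits between the first "=" and the first ";".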

> We allow for quoted values, but those still reject the CTL class of
> character.

Safari is currently the only browser to treat quote characters in
cookie values differently from other characters.  (Note that Firefox
used to have this behavior but changed recently to match IE and
Chrome.)  Safari seems to have adopted this behavior in the past two
years.  Sources inside Apple have indicated that this change was made
to improve compliance with RFC 2109 and not because of specific web
compatibility concerns.

> So, we certainly will choke on many UTF-8 character sequences.  In that
> case, we'll just ditch the cookie header as a whole.

Indeed.  Safari is currently the only browser that has this behavior.
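
On the earlier worry that valid UTF-8 sequences might contain \r\n or
other control characters: by design, every byte of a multi-byte UTF-8
sequence is 0x80 or above, so encoding a non-ASCII character never
produces a byte in the CTL class.  A quick check, with a made-up
sample value and a hypothetical helper contains_ctl:

```python
def contains_ctl(data: bytes) -> bool:
    """True if any byte falls in the CTL class (0x00-0x1F or 0x7F)."""
    return any(b <= 0x1F or b == 0x7F for b in data)


# Made-up cookie pair with non-ASCII characters in name and value.
sample = "na\u00efve=\u65e5\u672c\u8a9e"
encoded = sample.encode("utf-8")

# Every byte produced by a non-ASCII character has its high bit set.
non_ascii_bytes_high = all(
    b >= 0x80
    for ch in sample
    if ord(ch) > 0x7F
    for b in ch.encode("utf-8")
)

print(contains_ctl(encoded))
print(non_ascii_bytes_high)
```

A value can of course still contain a literal CR or LF if someone puts
one there; the point is only that UTF-8 encoding does not introduce
them.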

> One of the main problems so far with cookies has been that the Date spec
> used in the Netscape implementation clashes with the standard of HTTP to
> separate values in headers by a comma token.  That is, I believe, some sites
> wish to set multiple cookies by using a comma separator whereas the comma is
> also used as part of an unquoted field.  If we can verify that nobody is
> actually trying to set multiple cookies per Set-Cookie header, then we can
> easily pull this functionality and be a bit more blind to the character set
> of the Set-Cookie header value.

The date parsing algorithm is specified in detail in
<http://tools.ietf.org/html/draft-ietf-httpstate-cookie-04#section-5.1.1>.

Unlike most HTTP headers, a comma cannot be used to combine multiple
Set-Cookie headers.  Safari appears to be the only browser that
attempts to support this behavior.
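
To illustrate why comma splitting goes wrong, here is a small sketch
using a made-up header value.  The helper split_on_commas is
hypothetical and just applies the folding rule that works for most
list-valued HTTP headers:

```python
# Made-up Set-Cookie value with a Netscape-style Expires date.
HEADER = "SID=31d4d96e407aad42; Expires=Wed, 03 Mar 2010 17:17:28 GMT; Path=/"


def split_on_commas(header_value: str) -> list[str]:
    """Naively treat the header value as a comma-separated list."""
    return [part.strip() for part in header_value.split(",")]


for piece in split_on_commas(HEADER):
    print(repr(piece))
```

The single cookie comes out as two pieces, cut in the middle of the
Expires date, which is why each Set-Cookie header has to be treated as
exactly one cookie.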

If you haven't already, I'd encourage you to read the latest draft
available at <http://tools.ietf.org/html/draft-ietf-httpstate-cookie>.
Hopefully that will answer some number of your questions.

Adam


> On Mar 3, 2010, at 12:53 PM, Daniel Stenberg wrote:
>
> On Wed, 3 Mar 2010, Mark Pauley wrote:
>
> In the future, we ought to treat these as opaque octets.  However, the
> current cookie spec would lead me to believe that we should reject any
> cookies that contain control characters, which would be most non-ascii UTF-8
> sequences, right?
>
> Isn't the RFC2616 'token' a bit too strict for cookie-value ? The netscape
> spec is _very_ liberal ("a sequence of characters excluding semi-colon,
> comma and white space") so the current wording is a great deal more
> restrictive.
>
> Don't cookie implementations already allow and use for example ()<>:@? etc?
>
> --
>
> / daniel.haxx.se
>
>