Re: [http-state] Ticket 11: Character encoding for non-ASCII cookies values

Mark Pauley <mpauley@apple.com> Thu, 04 March 2010 01:24 UTC

From: Mark Pauley <mpauley@apple.com>
Date: Wed, 03 Mar 2010 17:24:42 -0800
To: Adam Barth <ietf@adambarth.com>
Cc: Daniel Stenberg <daniel@haxx.se>, http-state <http-state@ietf.org>
Subject: Re: [http-state] Ticket 11: Character encoding for non-ASCII cookies values

On Mar 3, 2010, at 5:17 PM, Adam Barth wrote:

> See inline.
> 
> On Wed, Mar 3, 2010 at 4:51 PM, Mark Pauley <mpauley@apple.com> wrote:
>> You're correct, we do allow those.  What I'm worried about, though, is that
>> valid UTF-8 sequences will contain \r\n.  In any case, writing good
>> parsers for a variable-length encoding is really tough.
>> To give you a sense of what Safari will accept, we'll accept the following
>> for names or values in cookies:
>> CTL = [\x00-\x1F\x7F];
>> NETSCAPE_SPECIAL = ("," | ";" | "=" | SPACE | HTAB);
>> NONETSCAPE_VALCHAR = (CTL | NETSCAPE_SPECIAL);
>> NETSCAPE_CHAR = ((CHAR \ NONETSCAPE_VALCHAR) | COMMA_NOSPACE);
>> COMMA_NOSPACE = ((",")(CHAR \ (SPACE | HTAB | ";")));
>> NETSCAPE_VAL = (CHAR \ NONETSCAPE_VALCHAR)((NETSCAPE_CHAR | SPACE | HTAB)* NETSCAPE_CHAR)?;
>> 
>> And NETSCAPE_VAL is effectively our TOKEN.
> 
> This grammar is significantly more complicated than what appears to be
> necessary.  Is there some reason the parsing algorithm in
> <http://tools.ietf.org/html/draft-ietf-httpstate-cookie-04#section-5.2>
> is insufficient?
> 
>> We allow for quoted values, but those still reject the CTL class of
>> character.
> 
> Safari is currently the only browser to treat quote characters in
> cookie values differently from other characters.  (Note that Firefox
> used to have this behavior but changed recently to match IE and
> Chrome.)  Safari seems to have adopted this behavior in the past two
> years.  Sources inside Apple have indicated that this change was made
> to improve compliance with RFC 2109 and not because of specific web
> compatibility concerns.

That's partially correct.  We received a bug report from an external source that drove this change, though we didn't make it immediately because, as you say, there were no compatibility issues forcing it.

> 
>> So, we certainly will choke on many UTF-8 character sequences.  In that
>> case, we'll just ditch the cookie header as a whole.
> 
> Indeed.  Safari is currently the only browser that has this behavior.
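
On the \r\n worry specifically: every octet of a multi-byte UTF-8 sequence is 0x80 or higher, so a CR, LF, or other CTL octet can only come from an actual ASCII control character in the value. If CHAR in the grammar above is the RFC 2616 CHAR (octets 0-127), which is an assumption here, non-ASCII values are rejected for falling outside CHAR rather than for matching CTL. A minimal Python sketch to check this; the function names are made up for illustration:

```python
# Sketch: classify the octets of a UTF-8 encoded cookie value.
# Assumptions: CTL is the class quoted above (0x00-0x1F and 0x7F), and
# "ascii" here means any other octet below 0x80.

def classify_octets(value: str) -> dict:
    """Count CTL, other ASCII, and non-ASCII octets in value's UTF-8 form."""
    counts = {"ctl": 0, "ascii": 0, "non_ascii": 0}
    for octet in value.encode("utf-8"):
        if octet <= 0x1F or octet == 0x7F:
            counts["ctl"] += 1
        elif octet < 0x80:
            counts["ascii"] += 1
        else:
            counts["non_ascii"] += 1
    return counts


def multibyte_octets_are_high(value: str) -> bool:
    """True if every non-ASCII character encodes only to octets >= 0x80."""
    return all(
        octet >= 0x80
        for ch in value
        if ord(ch) > 0x7F
        for octet in ch.encode("utf-8")
    )


if __name__ == "__main__":
    # "caf\u00e9" is "cafe" with an acute accent on the final letter.
    print(classify_octets("caf\u00e9"))
    print(multibyte_octets_are_high("\u65e5\u672c\u8a9e"))
```

So a parser that treats octets at or above 0x80 as opaque would not risk seeing a stray line ending inside a multi-byte character.
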
> 
>> One of the main problems so far with cookies has been that the Date spec
>> used in the Netscape implementation clashes with the standard of HTTP to
>> separate values in headers by a comma token.  That is, I believe some sites
>> wish to set multiple cookies by using a comma separator whereas the comma is
>> also used as part of an unquoted field.  If we can verify that nobody is
>> actually trying to set multiple cookies per Set-Cookie header, then we can
>> easily pull this functionality and be a bit more blind to the character set
>> of the Set-Cookie header value.
> 
> The date parsing algorithm is specified in detail in
> <http://tools.ietf.org/html/draft-ietf-httpstate-cookie-04#section-5.1.1>.
> 
> Unlike most HTTP headers, a comma cannot be used to combine multiple
> Set-Cookie headers.  Safari appears to be the only browser that
> attempts to support this behavior.
> 
> If you haven't already, I'd encourage you to read the latest draft
> available at <http://tools.ietf.org/html/draft-ietf-httpstate-cookie>.
> Hopefully that will answer some number of your questions.

Interesting.  Well, this could certainly simplify much of our internal cookie parsing mechanism.  Is there going to be a push to deprecate RFC 2109?  Could you point me at a page with results on how the browsers handle different Set-Cookie forms in practice?  Changes to our behavior are nearly always driven by compatibility breakage with existing sites.
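
To make the comma clash discussed above concrete, here is a minimal Python sketch. The header value and both helper names are invented for illustration, and `cookie_pair` only approximates the first step of the draft's parsing algorithm (take everything before the first ";"); it is not a full implementation of it.

```python
# Sketch: why folding Set-Cookie headers with commas is ambiguous.
# The header value below is a made-up example, not taken from a real site.

def naive_comma_split(header_value: str) -> list:
    """Split on commas, as generic HTTP list-header parsing would."""
    return [part.strip() for part in header_value.split(",")]


def cookie_pair(set_cookie_value: str) -> tuple:
    """Return (name, value) from the text before the first ';'.

    No comma splitting is attempted, so a comma inside the Expires
    date is harmless.
    """
    pair = set_cookie_value.split(";", 1)[0]
    name, _, value = pair.partition("=")
    return name.strip(), value.strip()


header = "sid=abc123; Expires=Wed, 03 Mar 2010 17:24:42 GMT; Path=/"

if __name__ == "__main__":
    # The comma in the date cuts the single cookie into two fragments.
    print(naive_comma_split(header))
    # Treating the whole header value as one cookie recovers the pair.
    print(cookie_pair(header))
```

The naive split yields two fragments, the second of which ("03 Mar 2010 17:24:42 GMT; Path=/") is not a cookie at all, which is the ambiguity a comma-aware grammar has to work around.
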

> 
> Adam
> 
> 
>> On Mar 3, 2010, at 12:53 PM, Daniel Stenberg wrote:
>> 
>> On Wed, 3 Mar 2010, Mark Pauley wrote:
>> 
>> In the future, we ought to treat these as opaque octets.  However, the
>> current cookie spec would lead me to believe that we should reject any
>> cookies that contain control characters, which would be most non-ASCII UTF-8
>> sequences, right?
>> 
>> Isn't the RFC2616 'token' a bit too strict for cookie-value ? The netscape
>> spec is _very_ liberal ("a sequence of characters excluding semi-colon,
>> comma and white space") so the current wording is a great deal more
>> restrictive.
>> 
>> Don't cookie implementations already allow and use for example ()<>:@? etc?
>> 
>> --
>> 
>> / daniel.haxx.se
>> 
>>