Re: [http-state] Ticket 11: Character encoding for non-ASCII cookies values
Mark Pauley <mpauley@apple.com> Thu, 04 March 2010 01:24 UTC
From: Mark Pauley <mpauley@apple.com>
In-Reply-To: <5c4444771003031717n390ce79fkdebdfcf51693d877@mail.gmail.com>
Date: Wed, 03 Mar 2010 17:24:42 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <F5AA8646-DD59-4B2F-A731-6673BAAEC51A@apple.com>
References: <5c4444771003021624qc0b00cet27e348cb6d023b08@mail.gmail.com> <CB794A2E-2F2F-4CE4-8B15-BBE1A1E1B50F@apple.com> <alpine.DEB.2.00.1003032150381.3143@tvnag.unkk.fr> <6EFCDA9A-C4AA-479D-895B-F9229FCF8AB3@apple.com> <5c4444771003031717n390ce79fkdebdfcf51693d877@mail.gmail.com>
To: Adam Barth <ietf@adambarth.com>
Cc: Daniel Stenberg <daniel@haxx.se>, http-state <http-state@ietf.org>

On Mar 3, 2010, at 5:17 PM, Adam Barth wrote:

> See inline.
>
> On Wed, Mar 3, 2010 at 4:51 PM, Mark Pauley <mpauley@apple.com> wrote:
>> You're correct, we do allow those.  What I'm worried about though is that
>> valid UTF-8 sequences will contain \r\n .  In any case, writing good
>> parsers for a variable-length encoding is really tough.
>> To give you a sense of what Safari will accept, we'll accept the following
>> for names or values in cookies:
>> CTL = [\x00-\x1F\x7F];
>> NETSCAPE_SPECIAL = ("," | ";" | "=" | SPACE | HTAB);
>> NONETSCAPE_VALCHAR = (CTL | NETSCAPE_SPECIAL);
>> NETSCAPE_CHAR = ((CHAR \ NONETSCAPE_VALCHAR) | COMMA_NOSPACE);
>> COMMA_NOSPACE = ((",")(CHAR \ (SPACE | HTAB | ";")));
>> NETSCAPE_VAL = (CHAR \ NONETSCAPE_VALCHAR)((NETSCAPE_CHAR | SPACE | HTAB)*
>> NETSCAPE_CHAR)?;
>>
>> And NETSCAPE_VAL is effectively our TOKEN.
>
> This grammar is significantly more complicated than what appears to be
> necessary.  Is there some reason the parsing algorithm in
> <http://tools.ietf.org/html/draft-ietf-httpstate-cookie-04#section-5.2>
> is insufficient?
>
>> We allow for quoted values, but those still reject the CTL class of
>> character.
>
> Safari is currently the only browser to treat quote characters in
> cookie values differently from other characters.  (Note that Firefox
> used to have this behavior but changed recently to match IE and
> Chrome.)  Safari seems to have adopted this behavior in the past two
> years.  Sources inside Apple have indicated that this change was made
> to improve compliance with RFC 2109 and not because of specific web
> compatibility concerns.

That's partially correct.  We received a bug from an external source that
drove this, though we didn't immediately change because of the lack of
compatibility issues, as you say.

>
>> So, we certainly will choke on many UTF-8 character sequences.  In that
>> case, we'll just ditch the cookie header as a whole.
>
> Indeed.  Safari is currently the only browser that has this behavior.
>
>> One of the main problems so far with cookies has been that the Date spec
>> used in the Netscape implementation clashes with the standard of HTTP to
>> separate values in headers by a comma token.  That is, I believe some sites
>> wish to set multiple cookies by using a comma separator whereas the comma is
>> also used as part of an unquoted field.  If we can verify that nobody is
>> actually trying to set multiple cookies per Set-Cookie header, then we can
>> easily pull this functionality and be a bit more blind to the character set
>> of the Set-Cookie header value.
>
> The date parsing algorithm is specified in detail in
> <http://tools.ietf.org/html/draft-ietf-httpstate-cookie-04#section-5.1.1>.
>
> Unlike most HTTP headers, a comma cannot be used to combine multiple
> Set-Cookie headers.  Safari appears to be the only browser that
> attempts to support this behavior.
>
> If you haven't already, I'd encourage you to read the latest draft
> available at <http://tools.ietf.org/html/draft-ietf-httpstate-cookie>.
> Hopefully that will answer some number of your questions.

Interesting.  Well, this could certainly simplify much of our internal
cookie parsing mechanism.  Is there going to be a push to deprecate
RFC2109?  Could you point me at a page with results on how the browsers
handle different Set-Cookie forms in practice?  Changing our behavior is
nearly always dictated by broken compatibility with other sites.

>
> Adam
>
>
>> On Mar 3, 2010, at 12:53 PM, Daniel Stenberg wrote:
>>
>> On Wed, 3 Mar 2010, Mark Pauley wrote:
>>
>> In the future, we ought to treat these as opaque octets.  However, the
>> current cookie spec would lead me to believe that we should reject any
>> cookies that contain control characters, which would be most non-ascii UTF-8
>> sequences, right?
>>
>> Isn't the RFC2616 'token' a bit too strict for cookie-value ?  The netscape
>> spec is _very_ liberal ("a sequence of characters excluding semi-colon,
>> comma and white space") so the current wording is a great deal more
>> restrictive.
>>
>> Don't cookie implementations already allow and use for example ()<>:@? etc?
>>
>> --
>>
>> / daniel.haxx.se
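
The two parsing questions raised in this message can be explored with a small
sketch.  It is only an illustration in Python: the helper names are
hypothetical and come from neither Safari nor the draft, and it assumes that
CHAR in the quoted grammar means US-ASCII octets 0-127, as in RFC 2616.  Under
that assumption, the octets of a non-ASCII UTF-8 character are all 0x80 or
above, so they fall outside CHAR rather than inside the CTL class, and a
plain comma split breaks a Netscape-style Expires date in two.

```python
from typing import List

# Character classes copied from the grammar quoted above.
CTL = set(range(0x00, 0x20)) | {0x7F}      # CTL = [\x00-\x1F\x7F]
NETSCAPE_SPECIAL = set(b",;= \t")          # "," | ";" | "=" | SPACE | HTAB


def control_octets(value: str) -> List[int]:
    """Return the CTL octets present in the UTF-8 encoding of value."""
    return [b for b in value.encode("utf-8") if b in CTL]


def non_ascii_octets(value: str) -> List[int]:
    """Return the octets >= 0x80, i.e. outside CHAR (assumed to be 0-127)."""
    return [b for b in value.encode("utf-8") if b >= 0x80]


def naive_comma_split(header_value: str) -> List[str]:
    """Split a header value on every comma, as a generic HTTP list parser might."""
    return [part.strip() for part in header_value.split(",")]


if __name__ == "__main__":
    # A non-ASCII character yields high octets but no control octets.
    print(control_octets("caf\u00e9"), non_ascii_octets("caf\u00e9"))
    # CR and LF only appear if the text itself contains them.
    print(control_octets("a\r\nb"))
    # The comma inside the date is indistinguishable from a list separator.
    print(naive_comma_split("id=1; Expires=Wed, 03 Mar 2010 17:24:42 GMT"))
```

Whether octets at or above 0x80 should be rejected, or passed through as
opaque octets, is the open question of this ticket; the sketch only shows
which class those octets land in.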