Re: [http-state] Ticket 11: Character encoding for non-ASCII cookies values
Adam Barth <ietf@adambarth.com> Thu, 04 March 2010 01:17 UTC
In-Reply-To: <6EFCDA9A-C4AA-479D-895B-F9229FCF8AB3@apple.com>
References: <5c4444771003021624qc0b00cet27e348cb6d023b08@mail.gmail.com> <CB794A2E-2F2F-4CE4-8B15-BBE1A1E1B50F@apple.com> <alpine.DEB.2.00.1003032150381.3143@tvnag.unkk.fr> <6EFCDA9A-C4AA-479D-895B-F9229FCF8AB3@apple.com>
From: Adam Barth <ietf@adambarth.com>
Date: Wed, 03 Mar 2010 17:17:28 -0800
Message-ID: <5c4444771003031717n390ce79fkdebdfcf51693d877@mail.gmail.com>
To: Mark Pauley <mpauley@apple.com>
Cc: Daniel Stenberg <daniel@haxx.se>, http-state <http-state@ietf.org>
See inline.

On Wed, Mar 3, 2010 at 4:51 PM, Mark Pauley <mpauley@apple.com> wrote:
> You're correct, we do allow those. What I'm worried about though is that
> valid UTF-8 sequences will contain \r\n . In any case, writing good
> parsers for a variable-length is really tough.
> To give you a sense of what Safari will accept, we'll accept the following
> for names or values in cookies:
> CTL = [\x00-\x1F\x7F];
> NETSCAPE_SPECIAL = ("," | ";" | "=" | SPACE | HTAB);
> NONETSCAPE_VALCHAR = (CTL | NETSCAPE_SPECIAL);
> NETSCAPE_CHAR = ((CHAR \ NONETSCAPE_VALCHAR) | COMMA_NOSPACE);
> COMMA_NOSPACE = ((",")(CHAR \ (SPACE | HTAB | ";")));
> NETSCAPE_VAL = (CHAR \ NONETSCAPE_VALCHAR)((NETSCAPE_CHAR | SPACE | HTAB)*
> NETSCAPE_CHAR)?;
>
> And NETSCAPE_VAL is effectively our TOKEN.

This grammar is significantly more complicated than what appears to be
necessary. Is there some reason the parsing algorithm in
<http://tools.ietf.org/html/draft-ietf-httpstate-cookie-04#section-5.2>
is insufficient?

> We allow for quoted values, but those still reject the CTL class of
> character.

Safari is currently the only browser to treat quote characters in
cookie values differently from other characters. (Note that Firefox
used to have this behavior but changed recently to match IE and
Chrome.) Safari seems to have adopted this behavior in the past two
years. Sources inside Apple have indicated that this change was made
to improve compliance with RFC 2109 and not because of specific web
compatibility concerns.

> So, we certainly will choke on many UTF-8 character sequences. In that
> case, we'll just ditch the cookie header as a whole.

Indeed. Safari is currently the only browser that has this behavior.

> One of the main problems so far with cookies has been that the Date spec
> used in the Netscape implementation clashes with the standard of HTTP to
> separate values in headers by a comma token. That is I believe, some sites
> wish to set multiple cookies by using a comma separator whereas the comma is
> also used as part of an unquoted field. If we can verify that nobody is
> actually trying to set multiple cookies per Set-Cookies header, then we can
> easily pull this functionality and be a bit more blind to the character set
> of the Set-Cookie header value.

The date parsing algorithm is specified in detail in
<http://tools.ietf.org/html/draft-ietf-httpstate-cookie-04#section-5.1.1>.
Unlike most HTTP headers, a comma cannot be used to combine multiple
Set-Cookie headers. Safari appears to be the only browser that
attempts to support this behavior.

If you haven't already, I'd encourage you to read the latest draft
available at <http://tools.ietf.org/html/draft-ietf-httpstate-cookie>.
Hopefully that will answer some number of your questions.

Adam


> On Mar 3, 2010, at 12:53 PM, Daniel Stenberg wrote:
>
> On Wed, 3 Mar 2010, Mark Pauley wrote:
>
> In the future, we ought to treat these as opaque octets. However, the
> current cookie spec would lead me to believe that we should reject any
> cookies that contain control characters, which would be most non-ascii UTF-8
> sequences, right?
>
> Isn't the RFC2616 'token' a bit too strict for cookie-value ? The netscape
> spec is _very_ liberal ("a sequence of characters excluding semi-colon,
> comma and white space") so the current wording is a great deal more
> restrictive.
>
> Don't cookie implementations already allow and use for example ()<>:@? etc?
>
> --
>
> / daniel.haxx.se
>
>
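
On the UTF-8 concern raised in the quoted text: by the definition of
UTF-8, every octet of a multi-byte sequence has its high bit set
(0x80-0xFF), so a non-ASCII character cannot by itself introduce CR, LF,
or any other octet in the CTL class quoted above. The following is only
a minimal Python sketch that checks this on a few sample values. The
helper names contains_ctl and non_ascii_octets_are_high are hypothetical
and do not come from the draft or from any browser.

```python
import re

# CTL class as written in the quoted grammar: [\x00-\x1F\x7F]
CTL = re.compile(rb"[\x00-\x1F\x7F]")


def contains_ctl(value: str) -> bool:
    """Return True if the UTF-8 encoding of `value` contains a CTL octet."""
    return CTL.search(value.encode("utf-8")) is not None


def non_ascii_octets_are_high(value: str) -> bool:
    """Return True if every octet produced by a non-ASCII character is >= 0x80."""
    return all(
        octet >= 0x80
        for ch in value
        if ord(ch) > 0x7F
        for octet in ch.encode("utf-8")
    )


if __name__ == "__main__":
    # Hypothetical sample cookie values, chosen only for illustration.
    for sample in ["caf\u00e9", "\u65e5\u672c\u8a9e", "plain", "a\r\nb"]:
        encoded = sample.encode("utf-8")
        print(repr(sample), encoded.hex(), contains_ctl(sample))
```

Only the last sample, which contains a literal CR LF, is flagged. A
parser that rejects the CTL class would therefore not reject well-formed
non-ASCII UTF-8 on that basis alone, although it may still reject it for
other reasons, such as restricting CHAR to 7-bit octets.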