Re: [http-state] Ticket 11: Character encoding for non-ASCII cookies values

Mark Pauley <mpauley@apple.com> Thu, 04 March 2010 00:51 UTC


You're correct, we do allow those.  What I'm worried about, though, is that valid UTF-8 sequences will contain \r\n.  In any case, writing good parsers for a variable-length encoding is really tough.

To give you a sense of what Safari will accept, we'll accept the following for names or values in cookies:

CTL = [\x00-\x1F\x7F];
NETSCAPE_SPECIAL = ("," | ";" | "=" | SPACE | HTAB);
NONETSCAPE_VALCHAR = (CTL | NETSCAPE_SPECIAL);
NETSCAPE_CHAR = ((CHAR \ NONETSCAPE_VALCHAR) | COMMA_NOSPACE);
COMMA_NOSPACE = ((",")(CHAR \ (SPACE | HTAB | ";")));
NETSCAPE_VAL = (CHAR \ NONETSCAPE_VALCHAR)((NETSCAPE_CHAR | SPACE | HTAB)* NETSCAPE_CHAR)?;


And NETSCAPE_VAL is effectively our TOKEN.
We allow for quoted values, but those still reject the CTL class of character.

So, we certainly will choke on many UTF-8 character sequences.  In that case, we'll just ditch the cookie header as a whole.
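As a rough illustration of the grammar above (a sketch only, not Safari's actual code), the rules can be transliterated into a Python byte regex.  It assumes CHAR is the RFC 2616 CHAR (octets 0-127), SPACE is 0x20 and HTAB is 0x09; the names PLAIN and is_netscape_val are made up for this example.

```python
import re

# Assumption: CHAR is the RFC 2616 CHAR, i.e. octets 0-127, so bytes
# 0x80-0xFF are excluded from every class below.

# CHAR \ NONETSCAPE_VALCHAR: no CTLs (which include HTAB), no "," ";" "=" SPACE.
PLAIN = rb"[^\x00-\x1f\x7f,;= \x80-\xff]"

# COMMA_NOSPACE: a comma followed by any CHAR except SPACE, HTAB or ";".
COMMA_NOSPACE = rb",[^\t ;\x80-\xff]"

NETSCAPE_CHAR = rb"(?:" + PLAIN + rb"|" + COMMA_NOSPACE + rb")"

# NETSCAPE_VAL: starts with a plain char; interior whitespace is allowed,
# but the value must end on a NETSCAPE_CHAR.
NETSCAPE_VAL = re.compile(
    PLAIN + rb"(?:(?:" + NETSCAPE_CHAR + rb"|[ \t])*" + NETSCAPE_CHAR + rb")?"
)


def is_netscape_val(raw: bytes) -> bool:
    """Return True if the whole byte string matches NETSCAPE_VAL."""
    return NETSCAPE_VAL.fullmatch(raw) is not None


if __name__ == "__main__":
    samples = [b"abc", b"a b", b"a,b", b"a ", b"a, b", b"a=b",
               b"a\r\nb", "caf\u00e9".encode("utf-8")]
    for sample in samples:
        print(sample, is_netscape_val(sample))
```

Under these assumptions, a non-ASCII character such as "é" encodes to bytes at or above 0x80, which fall outside CHAR, so the value fails to match.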


One of the main problems so far with cookies has been that the date format used in the Netscape implementation clashes with the HTTP convention of separating multiple values in a header with a comma.  That is, I believe some sites wish to set multiple cookies in one header by using a comma separator, whereas the comma is also used as part of an unquoted field.  If we can verify that nobody is actually trying to set multiple cookies per Set-Cookie header, then we can easily pull this functionality and be a bit more blind to the character set of the Set-Cookie header value.
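To make the comma clash concrete, here is a small Python sketch.  The header string is invented for illustration, and heuristic_split is a hypothetical workaround (split only where the comma is followed by something that looks like "name="), not a description of what Safari or any other client does.

```python
import re

# Invented example: two cookies folded into one Set-Cookie value, the first
# carrying a Netscape-style expires date that itself contains a comma.
HEADER = "a=1; expires=Wed, 03-Mar-2010 16:51:52 GMT, b=2"


def naive_split(value: str) -> list[str]:
    """Split on every comma, as generic HTTP list parsing would."""
    return [part.strip() for part in value.split(",")]


def heuristic_split(value: str) -> list[str]:
    """Hypothetical heuristic: split only on a comma that is followed by
    optional whitespace and then something shaped like 'name='."""
    return re.split(r",\s*(?=[^;,=\s]+=)", value)


if __name__ == "__main__":
    print(naive_split(HEADER))      # the date is cut in two
    print(heuristic_split(HEADER))  # the date stays inside the first cookie
```

The heuristic is only a guess at intent: a cookie value that legitimately contains ", x=" would still be split in the wrong place, which is why dropping multi-cookie headers altogether would be simpler.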


On Mar 3, 2010, at 12:53 PM, Daniel Stenberg wrote:

> On Wed, 3 Mar 2010, Mark Pauley wrote:
> 
>> In the future, we ought to treat these as opaque octets.  However, the current cookie spec would lead me to believe that we should reject any cookies that contain control characters, which would be most non-ascii UTF-8 sequences, right?
> 
> Isn't the RFC2616 'token' a bit too strict for cookie-value ? The netscape spec is _very_ liberal ("a sequence of characters excluding semi-colon, comma and white space") so the current wording is a great deal more restrictive.
> 
> Don't cookie implementations already allow and use for example ()<>:@? etc?
> 
> -- 
> 
> / daniel.haxx.se