[http-state] Ticket 11: Character encoding for non-ASCII cookies values

Adam Barth <ietf@adambarth.com> Wed, 03 March 2010 00:25 UTC

From: Adam Barth <ietf@adambarth.com>
Date: Tue, 2 Mar 2010 16:24:44 -0800
Message-ID: <5c4444771003021624qc0b00cet27e348cb6d023b08@mail.gmail.com>
To: http-state <http-state@ietf.org>
Content-Type: text/plain; charset=ISO-8859-1
Subject: [http-state] Ticket 11: Character encoding for non-ASCII cookies values

We had some earlier discussion about what to do with non-ASCII
characters in cookie values.

== On the wire ==

* IE, Firefox, Chrome, and Opera seem to treat non-ASCII characters as
opaque octets on the wire.
* Safari seems to drop cookies with non-ASCII characters (although
Maciej said code inspection leads him to believe Safari's behavior is
a bit more complicated).

== In document.cookie ==

In <http://github.com/abarth/http-state/blob/master/notes/2010-02-03-Julian-Reschke.txt>,
Julian Reschke wrote:
[[
I just did a quick test with an ISO-8859-1 encoded cookie value,
client-side javascript and "alert(document.cookie)":
- IE and Firefox appear to treat the cookie as ISO-8859-1
- Safari appears to ignore the cookie
- Chrome and Opera appear to try to decode as UTF-8 (and returns a
REPLACEMENT CHAR in place of the umlaut I tried)
]]
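
The results Julian reports are consistent with how a single ISO-8859-1
octet decodes under each charset. A minimal Python sketch, assuming the
umlaut was U+00FC (his note does not say which one he tried) and that a
UTF-8 decoder substitutes U+FFFD for an invalid sequence:

```python
# Assumption: the test cookie value was "ü" (U+00FC); the original note
# only says "the umlaut I tried".
raw = "\u00fc".encode("iso-8859-1")  # the single octet 0xFC on the wire

# Decoding as ISO-8859-1 recovers the umlaut (the IE/Firefox result).
as_latin1 = raw.decode("iso-8859-1")

# 0xFC is not a valid UTF-8 sequence, so a lenient UTF-8 decoder yields
# REPLACEMENT CHARACTER instead (the Chrome/Opera result).
as_utf8 = raw.decode("utf-8", errors="replace")

print(repr(raw), repr(as_latin1), repr(as_utf8))
```

This only models the charset step; it says nothing about why Safari
ignores the cookie entirely.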

== Proposal ==

The draft treats cookie values as opaque octets throughout for use
on the wire.  I've added a SHOULD-level requirement to use UTF-8 when
converting the octets to characters (e.g., for use in the user agent's
user interface).
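
As a rough illustration of that split (not text from the draft), the
Python sketch below keeps the value as bytes for storage and for the
wire, and decodes only when a string is needed for display. The function
names are hypothetical, the parsing is far simpler than the draft's
algorithm, and replacing undecodable octets with U+FFFD is an assumption
about how a user agent might handle non-UTF-8 values:

```python
from typing import Tuple


def parse_cookie_pair(pair: bytes) -> Tuple[bytes, bytes]:
    """Split b'name=value' without interpreting the octets as characters."""
    name, _, value = pair.partition(b"=")
    return name.strip(), value.strip()


def serialize_cookie_pair(name: bytes, value: bytes) -> bytes:
    """Emit the stored octets unchanged for the Cookie header."""
    return name + b"=" + value


def value_for_display(value: bytes) -> str:
    """Convert to characters only for the UI, using UTF-8 (the SHOULD)."""
    # Assumption: undecodable octets are shown as U+FFFD rather than
    # causing the cookie to be dropped.
    return value.decode("utf-8", errors="replace")


name, value = parse_cookie_pair(b"lang=\xc3\xbc")
print(serialize_cookie_pair(name, value))  # same octets as received
print(value_for_display(value))
```

Because the stored value is never re-encoded, a server that sends
ISO-8859-1 octets still gets the same octets back; only the displayed
form differs.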

Given that the encoding issue doesn't appear to affect
interoperability on the wire, I think a SHOULD-level recommendation is
appropriate here.  If specific APIs (e.g., document.cookie) have more
specific needs, they can add additional requirements.

Thoughts?

Adam