Re: [http-state] Ticket 11: Character encoding for non-ASCII cookies values

Achim Hoffmann <ah@securenet.de> Wed, 03 March 2010 14:32 UTC

Adam Barth wrote on 03.03.2010 06:46:
> On Tue, Mar 2, 2010 at 5:08 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
>> On Mar 2, 2010, at 4:24 PM, Adam Barth wrote:
>>> The draft treats the cookie values as opaque octets throughout for use
>>> on the wire.  I've added a SHOULD-level requirement to use UTF-8 when
>>> converting the octets to characters (e.g., for use in the user agent's
>>> user interface).
>>>
>>> Given that the encoding issue doesn't appear to affect
>>> interoperability on the wire, I think a SHOULD-level recommendation is
>>> appropriate here.  If specific APIs (e.g., document.cookie) have more
>>> specific needs, they can add additional requirements.
>>>
>>> Thoughts?
>> I think that is fine if it is made clear that UTF-8 is only applicable
>> after the field value is extracted from the rest of the message.  I.e.,
>> the HTTP parser must be ASCII-based and thus not vulnerable to
>> invalid Unicode byte sequences.
> 
> Hopefully that should be clear in the draft.  The encoding is mentioned
> at the end of the serialization section (which is two sections after
> the parsing section).

IIRC, previous discussions revealed that some browsers allow arbitrary data
in the cookie.
If the draft now (phase 1) recommends a specific encoding, i.e. UTF-8, does
it not violate the status quo?

If an encoding like UTF-8 is recommended in phase 2, then this may again
result in a transformation/canonicalisation/best-fit-mapping nightmare.
Think of browser APIs (like JavaScript) which might use a different
encoding (e.g. UCS-2 as in ECMA-262).
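
To illustrate the kind of mismatch I mean, here is a minimal Python sketch.
The sample octets and variable names are made up for this example and are
not taken from any browser or from the draft; it only shows that octets
which are not valid UTF-8 do not survive a decode/encode round trip, and
that a 16-bit based API counts a non-BMP character differently.

```python
# Hypothetical cookie value octets, e.g. as sent by an ISO-8859-1 application.
raw = b"caf\xe9"

# Strict UTF-8 decoding rejects these octets outright.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Lenient decoding substitutes U+FFFD, so the original octet is lost and
# re-encoding does not give back what was on the wire.
lossy = raw.decode("utf-8", errors="replace")
round_trip = lossy.encode("utf-8")

# A character outside the BMP is one code point, but two 16-bit code units
# and four UTF-8 octets, so length and truncation rules differ per API.
clef = "\U0001D11E"
utf16_units = len(clef.encode("utf-16-le")) // 2
utf8_octets = len(clef.encode("utf-8"))
```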

Am I missing something here? If not, all cookie data (key=value) SHOULD
(phase 1) or MUST (phase 2) be URL-encoded.
This leaves the final data format open to whatever the application and/or
the browser wants, but is transparent and secure at the protocol level.
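
A rough sketch of what I have in mind, in Python. The helper names
encode_cookie_pair and decode_cookie_pair are hypothetical and the sample
value is invented; this is only one way to do it under the assumption that
both name and value are percent-encoded, not text proposed for the draft.

```python
from typing import Tuple
from urllib.parse import quote, unquote_to_bytes


def encode_cookie_pair(name: bytes, value: bytes) -> str:
    """Percent-encode arbitrary octets so only ASCII goes on the wire."""
    # safe="" makes quote() escape "/" as well; "=", ";", "," and
    # whitespace are always escaped.
    return quote(name, safe="") + "=" + quote(value, safe="")


def decode_cookie_pair(pair: str) -> Tuple[bytes, bytes]:
    """Split at the first "=" and recover the original octets."""
    name, _, value = pair.partition("=")
    return unquote_to_bytes(name), unquote_to_bytes(value)


# Octets that are neither valid UTF-8 nor free of cookie delimiters.
wire = encode_cookie_pair(b"name", b"caf\xe9; a=b")
decoded = decode_cookie_pair(wire)
```

The receiving side gets the octets back unchanged and is free to interpret
them in whatever encoding the application uses.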

Achim