Re: [hybi] Technical feedback. was: Process!

Jamie Lokier <jamie@shareable.org> Wed, 03 February 2010 01:34 UTC

Date: Wed, 03 Feb 2010 01:35:00 +0000
From: Jamie Lokier <jamie@shareable.org>
To: Greg Wilkins <gregw@webtide.com>
Message-ID: <20100203013500.GJ32743@shareable.org>
References: <4B62C5FE.8090904@it.aoyama.ac.jp> <Pine.LNX.4.64.1001291134350.22020@ps20323.dreamhostps.com> <4B62E516.2010003@webtide.com> <5c902b9e1001290756r3f585204h32cacd6e64fbebaa@mail.gmail.com> <4B636757.3040307@webtide.com> <BBF3CE06-3276-4A7C-8961-7B3DDEE406D0@apple.com> <4B63DC2D.4090702@webtide.com> <5c902b9e1001292325p423d7e82o9478441893e34523@mail.gmail.com> <DF402A25-D858-4E56-811D-464C85226800@apple.com> <4B64B42D.5090007@webtide.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <4B64B42D.5090007@webtide.com>
User-Agent: Mutt/1.5.13 (2006-08-11)
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] Technical feedback. was: Process!

Greg Wilkins wrote:

>   + Why have two framing techniques when binary is sufficient to carry
>     everything.

I agree.  While I acknowledge the argument that

    print length($text), $text

is an invitation to do the wrong thing in some languages, I think that
there are more ways that 0xff can lead to the wrong thing, because so
many languages pass around "nominal UTF-8" which is not guaranteed to
be UTF-8, making 0xff delimiting unreliable too.
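
To make both failure modes concrete, here is a small Python sketch (my
own illustration, not anything from the draft).  The sample strings
and the helper name is_really_utf8 are assumptions made up for the
example.  It shows that a character count is not a byte count, and
that "nominal UTF-8" can carry a 0xff byte which real UTF-8 never
contains.

```python
# Length pitfall: a length prefix must count bytes on the wire, not characters.
text = "na\u00efve"                 # 5 characters; the accented letter is U+00EF
payload = text.encode("utf-8")      # 6 bytes, because U+00EF takes two bytes

char_count = len(text)
byte_count = len(payload)

# Sentinel pitfall: bytes that a program merely *treats* as UTF-8.
# Valid UTF-8 never contains 0xff, but unchecked input can.
nominal_utf8 = b"caf\xff"           # hypothetical unchecked input
contains_sentinel = 0xFF in nominal_utf8


def is_really_utf8(data: bytes) -> bool:
    """Return True only if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Either mistake is avoidable, but only the length mistake is visible
from the sending code alone; the 0xff one depends on where the data
came from.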

>   + Who controls allocation of the frame type byte?  So far every
>     suggestion of usage for that (eg a bit to indicate that the
>     frame contains meta-data headers) has been rejected.  So are
>     binary users simply to pick their own bytes and hope for no
>     collisions?  Will IANA eventually allocate values?  is 7 bits
>     enough?

There will be no collisions for frame bytes which depend on the
sub-protocol name, as those frame bytes are privately agreed between
client and server.

The only ones needing global agreement are:

   - Whatever agents and proxies need to handle the basic frame
     boundaries, e.g. for choosing when to forward sooner and when to
     buffer indefinitely for TCP performance.

     This is provided by the high bit.  Once widely deployed, this
     must not be changed unless a "mandatory feature" negotiation
     rule is already in place.

   - Any subprotocol frame bytes which are given generic meanings,
     e.g. as discussed for the close message.

     Even these can be subprotocol negotiation dependent.
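
As a rough sketch of the one globally agreed part, this is all an
intermediary would need to know about the frame type byte.  It
assumes, as I read the current draft, that a set high bit means a
length-delimited frame and a clear high bit means a 0xff-terminated
one.  The function name is made up for illustration.

```python
def is_length_delimited(frame_type: int) -> bool:
    """Classify a frame by the high bit of its type byte.

    Assumption (my reading of the draft): high bit set means the frame
    carries an explicit length; high bit clear means it runs until 0xff.
    The low 7 bits are left to the subprotocol and are not interpreted here.
    """
    if not 0 <= frame_type <= 0xFF:
        raise ValueError("frame type must fit in one byte")
    return bool(frame_type & 0x80)
```

Everything in the low 7 bits can then stay a private matter between
client and server.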

>   + Sentinel framing is unsafe.   It relies on the fact that there
>     are no 0 bytes

I made that mistake, before reading the draft properly :-)
It's 0xff.

A 0 byte is *valid* UTF-8, so of course it's ok to send that.

But you're still right:

>     in the utf-8 strings that are passed to it.  Strangely enough,
>     users can't be trusted to always provide valid utf-8 data, so if
>     user data is not validated then sentinel encoding allows frame
>     injection attacks.  After all we have learnt with HTTP, it seams
>     silly to repeat the mistake of a protocol that is exposed to
>     such attacks

If you have several bits of code sharing a connection, even if it's
just by sharing a common JavaScript framework on a single page, then
you have security issues from this.

Less obviously, frame injection attacks can occur *even* with a
private connection between one client and one server.  All it takes is
for user data to get in which isn't really UTF-8 (perhaps because the
author never even tried non-ASCII text), and attacks become possible.

If nowhere else, this should be in the SECURITY CONSIDERATIONS section
of the draft.
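
A minimal sketch of the injection, assuming the sentinel layout of
a 0x00 type byte, the payload, then 0xff.  The function names and the
attacker payload are hypothetical, and the splitter only understands
sentinel frames.

```python
from typing import List


def sentinel_frame(payload: bytes) -> bytes:
    """Wrap payload as a sentinel frame without validating it (the bug)."""
    return b"\x00" + payload + b"\xff"


def split_sentinel_frames(stream: bytes) -> List[bytes]:
    """Split a stream of sentinel frames the way a receiver or proxy would.

    Assumes every frame type byte has its high bit clear and that the
    stream ends on a frame boundary.
    """
    frames = []
    pos = 0
    while pos < len(stream):
        end = stream.index(b"\xff", pos + 1)   # scan for the terminator
        frames.append(stream[pos + 1:end])     # drop the type byte
        pos = end + 1
    return frames


# "User data" that is not really UTF-8: it smuggles a terminator and a
# fresh frame type byte.
hostile = b"hello\xff\x00injected"
received = split_sentinel_frames(sentinel_frame(hostile))
```

The sender believes it sent one message; the receiver sees two, and
the second one is entirely under the attacker's control.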

Another common place for invalid UTF-8 injection is filenames and text
files that haven't been checked:

   - Imagine a simple WebSocket server which reads a directory,
     on a system where filenames are UTF-8, and sends the names.  But on
     such systems, even though filenames are *nominally* UTF-8, there are
     inevitably some which aren't.

   - Imagine a simple WebSocket server which sends a file, whose
     contents are expected to be UTF-8, but for whatever reason are not.
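
For the filename case, here is a sketch of how a lenient decode lets
the bad byte pass straight through to the output.  The filename and
the helper safe_name are made up, and replacing the bad byte with
U+FFFD is just one possible policy, not something the draft asks for.

```python
raw_name = b"report\xff.txt"   # hypothetical filename that is not valid UTF-8

# Lenient handling: the invalid byte is preserved as a lone surrogate on
# decode and comes back out unchanged on encode.
lenient = raw_name.decode("utf-8", errors="surrogateescape")
leaked = lenient.encode("utf-8", errors="surrogateescape")


def safe_name(raw: bytes) -> str:
    """Return text that is guaranteed to encode as valid UTF-8.

    Invalid bytes are replaced with U+FFFD; rejecting the name outright
    would be an equally reasonable choice.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("utf-8", errors="replace")
```

With the lenient path the 0xff byte reaches the wire; with safe_name
it cannot, because U+FFFD encodes as EF BF BD.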

And then there are URI query arguments, which code may assume are UTF-8
encoded because they always are when used normally on the accompanying
web pages... But maliciously crafted URIs break the assumption.
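
A sketch of checking a query argument rather than assuming, using
Python's urllib.parse.unquote_to_bytes.  The function name is
hypothetical, and it deliberately ignores "+" handling and everything
else about form encoding.

```python
from urllib.parse import unquote_to_bytes


def query_value_as_utf8(raw_value: str) -> str:
    """Percent-decode one query value and insist that it is real UTF-8.

    Raises UnicodeDecodeError for values such as "%FF" instead of
    passing the raw byte along.
    """
    data = unquote_to_bytes(raw_value)
    return data.decode("utf-8")   # strict by default
```

The pages you wrote will only ever produce things like "caf%C3%A9";
an attacker is free to send "%FF".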

As a general rule, I've seen that many languages which handle UTF-8 do
not check that all inputs are valid UTF-8, and those which do often
emit a warning rather than preventing it from infecting outputs.  This
is not an area to expect care from authors.

>   + the utf-8 Sentinel framing is inflexible.  It sends only raw
>   utf-8.  What if I want to send gzipped utf-8, or utf-16 etc.  This
>   could simply be handled with a content encoding header in the
>   upgrade request and use of binary framing.

Not so much an issue.  You can just pick a binary frame type in the
existing framework, and use that to carry gzipped frames.

Provided both ends agree, you can just tunnel UTF-8 WebSocket
over binary WebSocket :-)
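
Roughly like this.  The frame type 0x81 is a value I picked for the
example (it would be agreed per subprotocol), and the length encoding
assumes my reading of the draft: 7 bits per byte, most significant
group first, high bit set on every byte except the last.

```python
import gzip

GZIP_TEXT = 0x81   # hypothetical subprotocol-private frame type


def encode_length(n: int) -> bytes:
    """Encode n in base 128, big-endian, continuation bit on all but the last byte."""
    if n < 0:
        raise ValueError("length must be non-negative")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(groups))


def binary_frame(frame_type: int, payload: bytes) -> bytes:
    """Build a length-delimited frame: type byte, length, payload."""
    if not 0x80 <= frame_type <= 0xFF:
        raise ValueError("binary frames need a type with the high bit set")
    return bytes([frame_type]) + encode_length(len(payload)) + payload


def pack_compressed_text(text: str) -> bytes:
    """Carry gzipped UTF-8 text inside a binary frame."""
    return binary_frame(GZIP_TEXT, gzip.compress(text.encode("utf-8")))
```

The compressed payload may contain any byte at all, including 0xff,
and the framing does not care.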

What one should not do is assume it's ok to use frame types < 0x80 to
carry binary frames when accompanied by mutually agreed content
negotiation (assuming the current format gets deployed), because it
is *inevitable* that WebSocket-aware proxies will parse frame
boundaries at some point - if only to improve buffering decisions.

Aside from frame injection attacks due to "not really UTF-8", the
other reason to prefer length-delimited framing is performance.  I'm
not surprised at Google's measurement.  With length-delimited frames,
you don't have to examine every byte when forwarding it to keep track
of boundaries for buffering decisions.
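
To illustrate the difference in work (this is a sketch, not a
benchmark, and it says nothing about what Google actually measured),
here are two boundary finders that also count how many bytes they had
to look at.  The function names are made up, the length encoding is
again my reading of the draft, and both assume a complete frame is
already in the buffer.

```python
from typing import Tuple


def boundary_length_delimited(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (offset just past the frame, bytes examined)."""
    examined = 1                    # the frame type byte
    i = pos + 1
    length = 0
    while True:
        b = buf[i]
        i += 1
        examined += 1
        length = length * 128 + (b & 0x7F)
        if not b & 0x80:            # last length byte has the high bit clear
            break
    return i + length, examined     # payload is skipped, never inspected


def boundary_sentinel(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (offset just past the frame, bytes examined)."""
    examined = 1                    # the frame type byte
    i = pos + 1
    while buf[i] != 0xFF:           # every payload byte must be checked
        i += 1
        examined += 1
    examined += 1                   # the terminator itself
    return i + 1, examined
```

For a 1000-byte payload the first looks at 3 bytes and the second at
1002, and that gap grows with the frame size.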

-- Jamie