Re: [hybi] Fwd: Gen-ART last call review of draft-ietf-hybi-thewebsocketprotocol-10

John Tamplin <jat@google.com> Thu, 21 July 2011 03:23 UTC

MIME-Version: 1.0
In-Reply-To: <4E2792EB.2070408@stpeter.im>
References: <4E2792EB.2070408@stpeter.im>
From: John Tamplin <jat@google.com>
Date: Wed, 20 Jul 2011 23:23:27 -0400
Message-ID: <CABLsOLCy3xAtXavSGc1mJA18Yhh7gZoaVX9Rg07Dyka1sNx0Tw@mail.gmail.com>
To: rbarnes@bbn.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Cc: "hybi@ietf.org" <hybi@ietf.org>
Subject: Re: [hybi] Fwd: Gen-ART last call review of draft-ietf-hybi-thewebsocketprotocol-10
Precedence: list

On Wed, Jul 20, 2011 at 10:46 PM, Peter Saint-Andre <stpeter@stpeter.im> wrote:
> -------- Original Message --------
> Subject: Gen-ART  last call review of draft-ietf-hybi-thewebsocketprotocol-10
> Date: Tue, 19 Jul 2011 23:01:53 -0400
> From: Richard L. Barnes <rbarnes@bbn.com>
> To: General Area Review Team <gen-art@ietf.org>, draft-ietf-websec-thewebsocketprotocol@tools.ietf.org, IETF Discussion <ietf@ietf.org>
>
> I am the assigned Gen-ART reviewer for this draft. For background on
> Gen-ART, please see the FAQ at
> <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
>
> Please resolve these comments along with any other Last Call comments
> you may receive.
>
> Document: draft-ietf-hybi-thewebsocketprotocol-10.txt
> Reviewer: Richard Barnes
> Review Date: 19 July 2011
> IETF LC End Date: 25 July 2011
> IESG Telechat date: (if known) -
>
> Summary:
> Not ready.
>
> Major issues:
>
> [Huge buffers]
> The frame length can be 7, 16, or 64 bits long.  Since the client is expected to buffer data until the end of a frame,
> this is asking clients to buffer 128 B, 64 KB, or 16 EB.  If it were 32 bits, the max would be 4 GB.  Why not just
> make this a 32-bit fixed length field?

It is a compromise to get everyone to agree.  There were several
requirements different groups were interested in:
 - Small frames should have minimal overhead.  A chat program sending
   individual keystrokes should not have to pay an extra 3 bytes of
   length, and there are small-packet thresholds to consider for mobile
   environments, where larger packets require powering up the radio to
   a higher power state.
 - Some people wanted to be able to use facilities like Unix's sendfile
   to transmit an entire file without having to stay in the loop and
   write framing.  For future compatibility, this requires a larger
   size than 4 GB.  Just like IPv6, it seemed better to use an
   "obviously large enough" value rather than argue further about
   exactly how large it would need to be.
 - Having a variable-size length field means the cost to support the
   very large buffers is small (one value wasted of the lead length
   byte), while saving a significant percentage of the frame size for
   very small payloads.

As there were holdouts on both sides, some wanting small headers for
small frames and others wanting to send large messages without having
to fragment, this compromise was necessary to make progress.
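
To make the trade-off concrete, here is a minimal sketch of a
variable-size length field.  It assumes the layout later published in
RFC 6455 (lead values 0-125 carry the length directly, 126 means a
16-bit length follows, 127 means a 64-bit length follows); the function
names are hypothetical and the mask bit is left clear.

```python
import struct
from typing import Tuple


def encode_payload_length(n: int) -> bytes:
    """Return the length bytes for a payload of n bytes (mask bit clear)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 125:
        # Small payloads: the length fits in the 7-bit lead value itself.
        return bytes([n])
    if n <= 0xFFFF:
        # Lead value 126 signals a 16-bit length in network byte order.
        return bytes([126]) + struct.pack("!H", n)
    if n < 2 ** 63:
        # Lead value 127 signals a 64-bit length (top bit assumed zero).
        return bytes([127]) + struct.pack("!Q", n)
    raise ValueError("length too large")


def decode_payload_length(buf: bytes) -> Tuple[int, int]:
    """Return (payload_length, number_of_length_bytes_consumed)."""
    lead = buf[0] & 0x7F  # ignore the mask bit
    if lead <= 125:
        return lead, 1
    if lead == 126:
        return struct.unpack("!H", buf[1:3])[0], 3
    return struct.unpack("!Q", buf[1:9])[0], 9
```

A single keystroke costs one length byte, while a payload above 64 KB
costs nine; only the escape values in the lead byte are spent on
supporting the large case.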

> [Why is masking necessary?]
> I seriously question the necessity of the masking of data frames.  As I understand it, the goal is to prevent
> proxies that don't understand Upgrade from confusing WebSocket data with HTTP data.  This risk seems a
> little dubious to me; has such a poisoning attack been demonstrated?  It seems like there are much simpler
> ways of doing this, like using a method other than GET (either CONNECT or something new).

There was a very long, contentious discussion about this based on
security research that found transparent proxies could be fooled into
believing the content following the WebSocket handshake was HTTP,
allowing an attacker to poison a transparent cache.  Imagine if an
attacker could replace the contents of www.google.com/ga.js on some
cache serving many users; it is basically a wildcard XSS for users of
those caches.  There was no attack demonstrated using WebSocket
framing, but one was demonstrated using just the WS handshake followed
by user-controlled data.

The scenario here is that the attacker controls the server completely
and controls the JS code running in the client.  What needs protecting
are the transparent intermediaries, which by their nature cannot
participate in the protocol, so we cannot determine their compatibility
before proceeding.  While there was some discussion of approaches to
limit the risk to currently discovered vulnerabilities (such as
sending a bogus CONNECT message as part of the handshake), a few
people were concerned that further vulnerabilities could be discovered,
and argued that masking the client->server traffic prevents attacker
control over those bytes at an acceptable cost.
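
As an illustration of what masking does to attacker-chosen bytes, here
is a minimal sketch.  It assumes the scheme later published in RFC 6455:
a 4-byte key chosen by the client for each frame, XORed over the payload
byte by byte.  The function name is hypothetical.

```python
import os


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with key[i % 4]; the same call unmasks."""
    if len(key) != 4:
        raise ValueError("masking key must be 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


# The script supplies the payload, but the key is picked by the browser
# per frame, so the script cannot predict the bytes put on the wire.
key = os.urandom(4)
on_wire = mask_payload(b"GET /ga.js HTTP/1.1\r\n", key)
recovered = mask_payload(on_wire, key)
```

Because XOR is its own inverse, the server recovers the payload with the
same function and the key carried in the frame.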

> [Why only client-to-server masking?]
> Why isn't masking required on server-to-client frames?

In the attack scenario, the server is under complete control of the
attacker and can send any bytes it chooses anyway.

> [Unlimited buffering with fragmentation]
> Much like with the frame length issue above, the fragmentation mechanism here seems like it imposes a
> heavy burden on the receive side.  Since the receiving client is supposed to buffer data until the end of a
> frame, it seems like fragmentation could be used to cause a receiving client to buffer a frame of indefinite size.

Obviously, an implementation will have to have a maximum message size
that it can support.  In the spec as written, the only recourse when
this size is exceeded is to terminate the connection (perhaps retrying
with smaller messages).  There have been some proposals to allow
each side to state its maximum frame and/or message sizes in the
handshake, but there hasn't been agreement to put them in the spec.
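
A minimal sketch of such a limit on the receive side, assuming the
implementation picks its own cap and treats an oversized message as
fatal.  The class, exception, and parameter names are hypothetical and
not from the draft.

```python
class MessageTooBig(Exception):
    """Raised when a fragmented message exceeds the local limit."""


class Reassembler:
    """Collects fragments of one message, enforcing a local size cap."""

    def __init__(self, max_message_size: int):
        self.max_message_size = max_message_size
        self._parts = []
        self._size = 0

    def add_fragment(self, payload: bytes, fin: bool):
        """Return the whole message once fin is set, otherwise None."""
        self._size += len(payload)
        if self._size > self.max_message_size:
            # Drop what was buffered; the caller is expected to
            # terminate the connection at this point.
            self._parts = []
            self._size = 0
            raise MessageTooBig("message exceeds local limit")
        self._parts.append(payload)
        if not fin:
            return None
        message = b"".join(self._parts)
        self._parts = []
        self._size = 0
        return message
```

The check runs before the fragment is stored, so the receiver never
holds more than the cap regardless of how many fragments arrive.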

> [Why not plain sockets?]
> The introduction makes clear why this protocol is needed instead of HTTP, but not why this protocol
> improves over providing a plain socket interface.  Presumably this is because the HTTP header provides
> a space where the browser can inject trusted information?

The browser is executing code on behalf of a potential attacker, and
would not give access to raw TCP sockets to such code as that would
allow circumvention of many protections, such as scanning machines
behind a corporate firewall, for example.  If you mean just opening a
socket subject to the same origin restrictions, you would have to have
a special handshake to validate those restrictions, and you need some
framing to delineate messages since TCP is just a stream of bytes and
the API is message-oriented.  If you do those things, then you have
essentially WebSockets (of course you could solve the same problems in
different ways, but it is the same class of solution).
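
To illustrate the framing point only, here is a minimal sketch of
recovering message boundaries from a TCP-style byte stream.  It uses a
plain 4-byte length prefix, which is an assumption chosen for brevity
and is not the WebSocket frame format; all names are hypothetical.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


class Deframer:
    """Turns arbitrary stream chunks back into whole messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes; return every message completed so far."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack("!I", self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # rest of this message has not arrived yet
            messages.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return messages
```

TCP may deliver the bytes in any chunking, so without some such marker
the receiver cannot tell where one message ends and the next begins.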

--
John A. Tamplin
Software Engineer (GWT), Google
