[hybi] WebSocket feedback

Ian Hickson <ian@hixie.ch> Thu, 04 March 2010 03:21 UTC

Date: Thu, 04 Mar 2010 03:21:28 +0000
From: Ian Hickson <ian@hixie.ch>
To: Hybi <hybi@ietf.org>
In-Reply-To: <8B0A9FCBB9832F43971E38010638454F032E566DDF@SISPE7MB1.commscope.com>
Message-ID: <Pine.LNX.4.64.1002150605580.29686@ps20323.dreamhostps.com>
References: <8B0A9FCBB9832F43971E38010638454F032E566DDF@SISPE7MB1.commscope.com>
Content-Language: en-GB-hixie
Subject: [hybi] WebSocket feedback

In writing this e-mail I went through all the recent threads, and tried to 
respond to every point. (I haven't quoted all the e-mails, or this one 
would be kilometers long, but I think I've responded to every substantive 
point somewhere.) In this e-mail I discuss changes that I've made to the 
opening handshake, the addition of a closing handshake, and the reasons 
for not changing some other aspects of the protocol.

On an unrelated note, it has been brought to my attention that uploading 
drafts to the IETF site requires human intervention, and that the person 
behind this process has requested that I not update the draft as 
frequently. In order to maintain the transparency of the process, I've 
switched to putting the latest drafts here:

   http://www.whatwg.org/specs/web-socket-protocol/

When I update the IETF copy (which I'll now do more rarely) I'll also 
update it to clearly point to the above draft to avoid any confusion about 
where the latest copy is.


OPENING HANDSHAKES

There are several requirements that were not met by the previous 
handshake:

1. it didn't fail quickly enough with man-in-the-middle proxies, such that 
   the server's response would be sent back in some cases even though the
   connection wasn't established

2. its cross-protocol attack protection could be made stronger

3. it was hard to implement in some existing HTTP stacks

To address these, I've made some changes to the protocol.

To address #1, I've moved some of the handshake to the WebSocket 
connection, after the headers.

To address #2, I've introduced some of the ideas various people have 
suggested, such as requiring the server to prove that it has read the 
handshake, and designing the protocol such that it is impossible to 
include some of the unique elements using features that the Web platform 
supports. (For example, using Sec-prefixed headers.)

With the changes made for #1 and #2, I've been able to relax the parsing 
requirements, which should hopefully address #3.
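To make the "prove it has read the handshake" idea concrete, here is a 
minimal Python sketch of the server-side proof. It assumes the key scheme 
of the revised draft rather than anything spelled out in this message: two 
headers, Sec-WebSocket-Key1 and Sec-WebSocket-Key2, whose ASCII digits 
divided by their count of spaces give two 32-bit numbers, plus eight bytes 
(key3) sent after the headers. The server answers, after its own headers, 
with the MD5 of all three. Function names and sample keys are illustrative.

```python
import hashlib
import struct


def key_number(key: str) -> int:
    """Read the ASCII digits of a key as one decimal number and divide it
    by the number of U+0020 SPACE characters in the key."""
    digits = "".join(ch for ch in key if "0" <= ch <= "9")
    spaces = key.count(" ")
    if not digits or spaces == 0 or int(digits) % spaces != 0:
        raise ValueError("malformed key")  # handshake must be rejected
    return int(digits) // spaces


def handshake_proof(key1: str, key2: str, key3: bytes) -> bytes:
    """16-byte MD5 digest the server sends after its headers, showing it
    read all three parts of the client handshake."""
    if len(key3) != 8:
        raise ValueError("key3 must be exactly 8 bytes")
    # Two big-endian 32-bit numbers followed by the 8 raw bytes.
    challenge = struct.pack(">II", key_number(key1), key_number(key2)) + key3
    return hashlib.md5(challenge).digest()


# Sample values, for illustration only.
k1 = "4 @1  46546xW%0l 1 5"
k2 = "12998 5 Y3 1  .P00"
print(key_number(k1), key_number(k2))  # 829309203 259970620
print(handshake_proof(k1, k2, b"^n:ds[4U").hex())
```

Because key3 only arrives after the headers and the proof is only sent 
after the response headers, a proxy that mangles the connection tends to 
break the exchange early, which is the point of change #1.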


On Wed, 3 Feb 2010, Maciej Stachowiak wrote:
> 
> According to a security expert I asked, the strongest defense against 
> cross-protocol attacks is in the first few bytes. So I think removing 
> all requirements on the status-phrase would make the protocol less 
> robust against cross-protocol attacks.

This is true.

For the blind attack case (using a client of some other protocol to cause 
a WebSocket server to change state), I've tried to mitigate that 
issue by splitting the "proof" part of the handshake into three separate 
parts of the client handshake, and by making use of certain features that 
can't be emulated from the HTTP features of Web browsers, but it's 
certainly possible that these changes have actually made it easier for 
other protocols to be used to attack WebSocket servers. Fundamentally I 
don't know what we can do about this while remaining HTTP-compatible; the 
ideal solution, namely making the handshake appear to be essentially 
noise, is basically a non-starter if we have to work with HTTP.

For a WebSocket-to-other attack, where the goal is to connect to a remote 
server using the WebSocket API, the new handshake really does make it 
quite hard to achieve passively. You'd need to know what the client is 
expecting back, which means seeing the handshake and causing the server to 
somehow respond with the right MD5 fingerprint. I don't know of a way to 
do that without control of the server and the network. I wouldn't know how 
to defend against an attacker who already controls the network, even with 
the first few bytes being fixed.


On Wed, 3 Feb 2010, Jamie Lokier wrote:
> 
> The real trouble is how will a WebSocket/1.1 client interoperate with a 
> WebSocket/1.0 server, without trying, failing, then starting another 
> connection to try again which is wasteful and not entirely reliable.

The WebSocket protocol is (by design) not versioned. New features can be 
added by opt-in options, in a manner very similar to how the 
Sec-WebSocket-Protocol header is sent.
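A minimal sketch of how such an opt-in option could be negotiated without 
a second connection, assuming a hypothetical comma-separated option header 
(its name and the option tokens below are made up): the server echoes only 
what it implements, and a feature is active only if it appears on both 
sides, so a server that knows nothing of the option just echoes nothing.

```python
from __future__ import annotations


def parse_options(header_value: str) -> set[str]:
    """Split a comma-separated option list into lower-case tokens."""
    return {tok.strip().lower() for tok in header_value.split(",") if tok.strip()}


def server_reply(client_offer: str, supported: set[str]) -> str:
    """The server echoes back only the offered options it also implements."""
    return ", ".join(sorted(parse_options(client_offer) & supported))


def enabled_options(client_offer: str, server_echo: str) -> set[str]:
    """An option is active only if the client offered it AND the server echoed it."""
    return parse_options(client_offer) & parse_options(server_echo)


offer = "message-numbers, deflate-frames"  # hypothetical option names
print(enabled_options(offer, server_reply(offer, {"message-numbers"})))
# {'message-numbers'}
print(enabled_options(offer, ""))  # an older server echoes nothing: set()
```

The older-server case is the one Jamie raised: nothing fails, the client 
just carries on without the option on the same connection.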



FRAMING

On Wed, 3 Feb 2010, Jamie Lokier wrote:
> Greg Wilkins wrote:
> >   + Why have two framing techniques when binary is sufficient to carry
> >     everything.
> 
> I agree.  While I acknowledge the argument that
> 
>     print length($text), $text
> 
> is an invitation to do the wrong thing in some languages, I think that 
> there are more ways that 0xff can lead to the wrong thing, because so 
> many languages pass around "nominal UTF-8" which is not guaranteed to be 
> UTF-8, making 0xff delimiting unreliable too.

It's true that if the server doesn't handle UTF-8 correctly, it can end up 
outputting 0xFF bytes. In practice, this would need some out-of-band 
erroneous data (e.g. an ISO-8859-1 form submission); you couldn't trigger 
this bug easily by sending data to the server and having it return it 
later, for instance. The length bug, on the other hand, could occur with 
no external data: sending non-ASCII data could very easily result in the 
server screwing things up if we use lengths rather than delimiters.
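A small Python illustration of that length bug, assuming a toy framing 
with a 1-byte length prefix (so payloads must stay under 256 bytes): 
counting characters instead of UTF-8 bytes silently truncates a non-ASCII 
message and leaks its tail into whatever the receiver parses next. A 
0xFF-delimited frame has no count to get wrong.

```python
from __future__ import annotations


def frame_with_char_count(text: str) -> bytes:
    # BUG (deliberate): counts characters, but sends UTF-8 bytes.
    return bytes([len(text)]) + text.encode("utf-8")


def frame_with_byte_count(text: str) -> bytes:
    payload = text.encode("utf-8")
    return bytes([len(payload)]) + payload


def read_one(buf: bytes) -> tuple[bytes, bytes]:
    """Read one 1-byte-length-prefixed message; return (payload, rest)."""
    n = buf[0]
    return buf[1 : 1 + n], buf[1 + n :]


text = "na\u00efve caf\u00e9"  # "naive cafe" with accents: 10 chars, 12 UTF-8 bytes
print(read_one(frame_with_char_count(text)))
# (b'na\xc3\xafve caf', b'\xc3\xa9')  <- 2 bytes leak into the "next" frame
print(read_one(frame_with_byte_count(text)))
# (b'na\xc3\xafve caf\xc3\xa9', b'')
```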

For scalability it's probably ideal if we can use lengths, but in the long 
run, for environments where that matters, we'll probably just use compressed 
frames which would be binary anyway (and thus length-encoded), so this 
will probably become a non-issue in that kind of environment.


> >   + Who controls allocation of the frame type byte?  So far every
> >     suggestion of usage for that (eg a bit to indicate that the
> >     frame contains meta-data headers) has been rejected.  So are
> >     binary users simply to pick their own bytes and hope for no
> >     collisions?  Will IANA eventually allocate values?  is 7 bits
> >     enough?
> 
> There will be no collisions for frame bytes which depend on the 
> sub-protocol name, as those frame bytes are privately agreed between 
> client and server.

If a client and a server are speaking a specific sub-protocol, they don't 
actually have to even use WebSockets -- they can just use a protocol that 
happens to look like WebSockets but defines whatever frame types they 
want.

If the client is a browser, and thus they're speaking generic WebSockets, 
then the frame types would be just those supported by the API, and there 
wouldn't be any custom types. So extensions would just be "registered" by 
revving the protocol.


> > [...] users can't be trusted to always provide valid utf-8 data, so if 
> > user data is not validated then sentinel encoding allows frame 
> > injection attacks.  After all we have learnt with HTTP, it seems silly 
> > to repeat the mistake of a protocol that is exposed to such attacks
> 
> If you have several bits of code sharing a connection - even if it's 
> just by sharing a common Javascript framework on a single page, then you 
> have security issues from this.

You certainly have security issues, but you don't have this issue (user 
data containing 0xFF), because the protocol requires the browser to make 
the data be valid UTF-8, and so you'll never be able to inject a 0xFF byte 
from the client.

Of course, if the server has other sources of data, and it doesn't 
validate them, then it's possible you'll have this problem in the other 
direction.
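For that server-side direction, a sketch of the obvious defence, assuming 
the server builds the 0x00 ... 0xFF text frames itself: validate every 
payload as UTF-8 before framing it (valid UTF-8 never contains the byte 
0xFF), and transcode out-of-band sources whose legacy encoding is known. 
Both helper names are hypothetical.

```python
def encode_text_frame(payload: bytes) -> bytes:
    """Wrap payload as 0x00 <payload> 0xFF, refusing anything that is not
    valid UTF-8, so the payload can never end the frame early."""
    payload.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return b"\x00" + payload + b"\xff"


def from_latin1_source(raw: bytes) -> bytes:
    """Transcode data from an ISO-8859-1 source (e.g. a legacy form
    submission) to UTF-8 before it is framed."""
    return raw.decode("latin-1").encode("utf-8")


print(encode_text_frame(from_latin1_source(b"caf\xe9")))  # b'\x00caf\xc3\xa9\xff'
try:
    encode_text_frame(b"caf\xe9")  # untranscoded Latin-1 is rejected
except UnicodeDecodeError:
    print("rejected")
```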


> If nowhere else, this should be in the SECURITY CONSIDERATIONS section 
> of the draft.

The spec does actually mention this in the security section:

# The biggest security risk when sending text data using this
# protocol is sending data using the wrong encoding. If an attacker
# can trick the server into sending data encoded as ISO-8859-1
# verbatim (for instance), rather than encoded as UTF-8, then the
# attacker could inject arbitrary frames into the data stream.

I'm happy to elaborate on this if you think it should be changed.
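The attack in that paragraph can be reproduced with a toy receiver. This 
is a sketch, not spec text: the receiver treats the first 0xFF after a 
0x00 type byte as the end of a text frame, and the payload string is made 
up. Sent verbatim as ISO-8859-1, U+00FF becomes a raw 0xFF byte and the 
attacker gets a second, forged frame; as UTF-8 it becomes 0xC3 0xBF and 
stays inside the original frame.

```python
from __future__ import annotations


def split_text_frames(stream: bytes) -> list[bytes]:
    """Minimal receiver for a stream of 0x00 <payload> 0xFF text frames."""
    frames = []
    i = 0
    while i < len(stream):
        if stream[i] != 0x00:
            raise ValueError("unexpected frame type byte")
        end = stream.index(b"\xff", i + 1)  # first 0xFF ends the frame
        frames.append(stream[i + 1 : end])
        i = end + 1
    return frames


# Attacker-supplied text: U+00FF, then a forged frame-type byte and payload.
evil = 'hello\u00ff\u0000{"admin": true}'

latin1 = b"\x00" + evil.encode("latin-1") + b"\xff"  # wrong encoding
utf8 = b"\x00" + evil.encode("utf-8") + b"\xff"      # correct encoding

print(split_text_frames(latin1))     # [b'hello', b'{"admin": true}']  <- injected
print(len(split_text_frames(utf8)))  # 1
```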


On Mon, 15 Feb 2010, Thomson, Martin wrote:
> 
> 1. The ABNF for "the wire protocol as allowed by this specification" 
> does not permit binary frames.

That is correct. There are no binary frames in this version of the 
protocol.


>      frame         = text-frame / binary-frame
>      binary-frame  = %x80 length *%x00-FF
>      length        = %x00 / %x01-7f / ( %x81-FF *%x80-FF %x00-7F )
> 
> This is a canonical length encoding in that it doesn't allow for leading 
> zeroes.

Leading zeros are not disallowed. (There wouldn't be much point 
disallowing them as far as I can tell.)
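To show why leading zeros are harmless, a sketch of the base-128 length 
decoding (low 7 bits per byte, high bit meaning "another length byte 
follows"), written to match the draft's length steps quoted later in this 
thread rather than the stricter ABNF proposed above. A leading 0x80 adds 
a zero group and leaves the value unchanged.

```python
from __future__ import annotations


def decode_length(data: bytes) -> tuple[int, int]:
    """Return (length, number of bytes consumed)."""
    length = 0
    for i, b in enumerate(data):
        length = length * 128 + (b & 0x7F)
        if not b & 0x80:  # high bit clear: this was the last length byte
            return length, i + 1
    raise ValueError("truncated length")


print(decode_length(b"\x05"))      # (5, 1)
print(decode_length(b"\x80\x05"))  # (5, 2): leading zero group, same value
print(decode_length(b"\x81\x00"))  # (128, 2)
```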


> 2. Value range alternatives (Section 3.4 of RFC 5234) should not include 
> a second "%x".

Fixed, thanks.


> 3. The definition for text-frame in the "the wire protocol including 
> error-handling and forward-compatible parsing rules" could be readily 
> simplified to
> 
>     text-frame   = %x00-7F *%x00-FE %xFF

Fair enough.


> I assume that %x80-7E, as included, is in error and %xFE was the 
> intended end of this range.

Oops, yes, thanks. Fixed.
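The simplified text-frame rule maps directly onto a byte-level regular 
expression. A Python sketch that checks a single complete frame (the 
function name is illustrative):

```python
import re

# text-frame = %x00-7F *%x00-FE %xFF
TEXT_FRAME = re.compile(rb"[\x00-\x7f][\x00-\xfe]*\xff")


def is_text_frame(frame: bytes) -> bool:
    return TEXT_FRAME.fullmatch(frame) is not None


print(is_text_frame(b"\x00hello\xff"))  # True
print(is_text_frame(b"\x7fhello\xff"))  # True: any type 0x00-0x7F uses this framing
print(is_text_frame(b"\x80hello\xff"))  # False: high bit set means length-prefixed
print(is_text_frame(b"\x00hello"))      # False: no terminating 0xFF
```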



CLOSING HANDSHAKES

As I see it, there are several use cases here that are important:

1. Server is done, and wants to close the connection, but doesn't want to 
miss any messages from the client, and doesn't want the client to miss any 
messages that the server just sent. The server might be done because it 
needs to shut down and have the client reconnect to some other server, or 
because e.g. the server is a game server and the game is finished, but the 
client might still send some user profile updates that need saving.

2. Server is done, and doesn't care about whether the client receives any 
of its messages or whether it receives any of the client's messages. e.g. 
the server could be sending regular stock ticker updates and the only 
possible messages from the client might be to decide which stocks to 
return.

3. Client is done, and wants to close the connection, but doesn't want to 
miss any messages that the server sent, and wants to ensure the server has 
had a chance to receive all the messages the client sent. The client might 
be done because e.g. the server is an IM server, and the user asked to go 
offline but the server might still send some configuration information 
e.g. the host to use when reconnecting.

4. Client is done, and wants to close the connection, and doesn't care 
about any pending messages either way. For example, the user closed the 
tab that was using the socket.

5. Client is done, and wants to close the connection, but wants all its 
messages to make it to the other side safely. It doesn't care about 
incoming messages from the server. (For example, the script called 
.close() or unregistered its event listener or the page unloaded.)

6. Neither server nor client is done, but the connection terminates 
anyway. The client wants to reconnect but wants to make sure the 
connection picks up where it left off.

Use cases #2 and #4 are trivial. The server doesn't have to do anything 
special to handle them -- for #2 it can close the connection willy-nilly, 
and for #4 it doesn't need to do anything when the connection is closed 
except close it.

Use case #3 can either be handled by the browser having an API that means 
"start closing but don't actually close yet, just don't accept any more 
messages to send", or by the application layer sending a message to that 
effect. It seems like having an API to do this is redundant, since it's so 
trivial to just encode this into the application-layer protocol (basically 
by asking the server to close the connection).

Use case #6 requires some processing overhead to track which messages have 
been received, which won't always be desired. In particular, I don't think 
we can make it automatic without requiring either all servers to implement 
it (which seems like a non-starter) or making it optional. Making it 
optional suggests leaving it to a future version, so that we get the 
basics right first. It would be relatively easy to add this as an option 
-- just have UAs that support it say so in their handshake, and servers 
that want to opt-in say so in _their_ handshake, and if both parties said 
it, then modify the framing accordingly. For example, maybe when the 
option is enabled, all messages are preceded by a frame giving a message 
number, and periodically a message is sent back reporting the last message 
seen. We could add that relatively easily.
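A rough sketch of that hypothetical option, with all class and method 
names invented for illustration: the sender keeps each numbered message 
until the peer's periodic "last seen" report covers it and replays the 
rest after reconnecting, and the receiver drops replayed duplicates. It 
assumes an ordered transport such as TCP, so numbers never arrive out of 
order within one connection.

```python
class NumberedSender:
    def __init__(self):
        self.next_seq = 1
        self.unacked = {}  # seq -> message, kept in send order

    def send(self, message):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = message
        return seq, message  # what would go on the wire

    def on_ack(self, last_seen):
        """Peer periodically reports the last sequence number it received."""
        for seq in [s for s in self.unacked if s <= last_seen]:
            del self.unacked[seq]

    def resend_after_reconnect(self):
        return list(self.unacked.items())


class NumberedReceiver:
    def __init__(self):
        self.last_seen = 0

    def on_message(self, seq, message):
        if seq <= self.last_seen:
            return None  # duplicate caused by a replay; drop it
        self.last_seen = seq
        return message


tx, rx = NumberedSender(), NumberedReceiver()
for text in ["a", "b", "c"]:
    wire = tx.send(text)
    if wire[0] < 3:  # pretend "c" was lost when the connection dropped
        rx.on_message(*wire)
tx.on_ack(1)  # the last report that got through said "seen up to 1"
replay = tx.resend_after_reconnect()                  # [(2, 'b'), (3, 'c')]
delivered = [rx.on_message(s, m) for s, m in replay]  # [None, 'c']
```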

That leaves #1 and #5.

Both use cases are relatively easy to do at the protocol level if one side 
is always responsible for starting the connection shutdown. It gets more 
exciting if both peers are allowed to start the shutdown. Case #5 in 
particular is interesting, because it would be reasonable to assert that 
the API should guarantee that when you .send() some text and then .close() 
the WebSocket object, the server should be given a chance to receive the 
text.

I think the net result is that ideally .close() should map to sending a 
FIN, aka using shutdown(SHUT_WR). Unfortunately this is not always 
possible -- people cited numerous examples of cases where we can't rely on 
TCP semantics in practice -- so we probably want to simulate this at the 
WebSocket layer or the application layer. Since the JS API exposes a 
.close() method, we should probably expose it at the WebSocket layer.
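For reference, this is what the TCP-level half-close looks like with 
Python's socket module: shutdown(SHUT_WR) sends a FIN, the peer's recv() 
returns b"" once it has drained the data, and the other direction stays 
usable. A local socketpair() stands in for a real connection here; the 
helper names are illustrative.

```python
from __future__ import annotations

import socket


def read_until_fin(sock: socket.socket) -> bytes:
    """Read until recv() returns b"" (the peer shut down its sending side)."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def half_close_exchange() -> tuple[bytes, bytes]:
    a, b = socket.socketpair()
    try:
        a.sendall(b"last message")
        a.shutdown(socket.SHUT_WR)  # FIN: "I'm done sending", but keep reading
        first = read_until_fin(b)
        b.sendall(b"late reply")    # b can still send after a's FIN
        b.shutdown(socket.SHUT_WR)
        second = read_until_fin(a)
        return first, second
    finally:
        a.close()
        b.close()


print(half_close_exchange())  # (b'last message', b'late reply')
```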

This suggests that each side should be able to tell the other "I'm done 
sending". When it receives the "I'm done sending", or when it gets a FIN 
(recv() == 0), it can close the connection as soon as it's done sending.

I've now added this to the spec, using a simplified version of some of the 
ideas people put forward that involved a new frame.
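A minimal sketch of the per-peer state this implies, assuming the closing 
frame is the two bytes 0xFF 0x00 (the value later revisions of the draft 
use; this message doesn't give it) and that queued messages are flushed 
before the close frame is written. Class and method names are 
hypothetical.

```python
CLOSE_FRAME = b"\xff\x00"  # assumed closing frame, see above


class Closer:
    """Tracks the two halves of the closing handshake for one peer."""

    def __init__(self):
        self.sent_close = False      # we said "I'm done sending"
        self.received_close = False  # peer said so, or recv() returned 0

    def close(self) -> bytes:
        """Local .close(): bytes to write after any queued messages.
        The close frame is only ever sent once."""
        if self.sent_close:
            return b""
        self.sent_close = True
        return CLOSE_FRAME

    def on_peer_done(self) -> bytes:
        """Peer's close frame (or a FIN) arrived. Once our own queued
        messages are flushed we are done too, so answer with our close."""
        self.received_close = True
        return self.close()

    def may_drop_tcp(self) -> bool:
        return self.sent_close and self.received_close


# Simultaneous close: both send at once, neither sends twice.
a, b = Closer(), Closer()
to_b, to_a = a.close(), b.close()
a.on_peer_done()
b.on_peer_done()

# One-sided close: d answers c's close frame with its own.
c, d = Closer(), Closer()
frame = c.close()
reply = d.on_peer_done()
ack = c.on_peer_done()  # b"": c already sent its close frame
```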

As far as I can tell, it works fine if both peers decide to terminate at 
the same time; it's not dependent on who is the close initiator. It also 
accommodates peers that can't do half-closes, since that is essentially an 
optional part of the protocol (with no black-box detectable effects at the 
Web Socket semantic layer).


On Tue, 9 Feb 2010, Greg Wilkins wrote:
> 
> if we have orderly close and error close, then
> it would be good to indicate some status in the onclose
> call back, so a websocket user/framework can handle
> differently the cases of - a) can't connect via network
> b) can connect, but permission denied c) was connected
> but got disorderly disconnected d) was connected and
> got orderly shutdown.

Done.



ISSUES OF EDITORIAL STYLE

I've made a few concessions regarding the editorial style, based on some 
of the more commonly raised issues: I've moved away from listing byte 
sequences, and have instead, where appropriate, listed things using their 
UTF-8 equivalent; and I've included more rationale in the introduction, to 
explain some of the design decisions.


On Mon, 1 Feb 2010, Justin Erenkrantz wrote:
> On Mon, Feb 1, 2010 at 3:48 PM, Ian Hickson <ian@hixie.ch> wrote:
> > On Fri, 29 Jan 2010, Justin Erenkrantz wrote:
> >>
> >> Again, this is why the current draft is so impenetrable (to me) since 
> >> it expects that the only person implementing the draft is a client 
> >> vendor using synchronous socket methods...which sort of defeats the 
> >> purpose when you are trying to write a fully async client...or any 
> >> type of server.
> >
> > Could you elaborate on this? I looked for text that assumed 
> > synchronous socket methods but couldn't find any. Could you quote the 
> > offending text?
> 
> All of them are...such as:
> 
> ---
>           Run these steps.  If at any point during these steps a read is
>           attempted but fails because the Web Socket connection is
>           closed, then abort.
> 
>           1.  Let /length/ be zero.
> 
>           2.  _Length_: Read a byte, let /b/ be that byte.
> 
>           3.  Let /b_v/ be integer corresponding to the low 7 bits of
>               /b/ (the value you would get by _and_ing /b/ with 0x7F).
> 
>           4.  Multiply /length/ by 128, add /b_v/ to that result, and
>               store the final result in /length/.
> 
>           5.  If the high-order bit of /b/ is set (i.e. if /b/ _and_ed
>               with 0x80 returns 0x80), then return to the step above
>               labeled _length_.
> 
>           6.  Read /length/ bytes.
> 
>           7.  Discard the read bytes.
> ---

I don't understand why this implies synchronous socket methods. The spec 
explicitly says "Conformance requirements phrased as algorithms or 
specific steps may be implemented in any manner, so long as the end result 
is equivalent". There's nothing that requires that you implement the above 
using synchronous socket methods, so long as the end result is the same. 
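As an illustration that the quoted steps don't need blocking reads, here 
is a sketch of the same algorithm as a push-driven state machine, the 
shape an async client or server would use. It assumes the frame-type byte 
has already been consumed and that the caller hands over bytes as they 
arrive; the class name is made up.

```python
class SkipLengthFrame:
    """Event-driven version of the quoted steps: instead of blocking on
    "read a byte", feed() consumes whatever has arrived and remembers
    where it stopped."""

    def __init__(self) -> None:
        self.length = 0        # step 1
        self.in_length = True  # still in the _Length_ loop (steps 2-5)
        self.remaining = 0     # bytes still to discard (steps 6-7)
        self.done = False

    def feed(self, data: bytes) -> int:
        """Consume bytes; return how many were used. Any bytes after
        that belong to the next frame."""
        i = 0
        while i < len(data) and not self.done:
            if self.in_length:
                b = data[i]
                i += 1
                self.length = self.length * 128 + (b & 0x7F)  # steps 3-4
                if not b & 0x80:                              # step 5
                    self.in_length = False
                    self.remaining = self.length
                    self.done = self.remaining == 0
            else:
                take = min(self.remaining, len(data) - i)     # steps 6-7
                self.remaining -= take
                i += take
                self.done = self.remaining == 0
        return i

    def on_eof(self) -> None:
        """Buffered bytes are fed first; only a read that can no longer
        be satisfied aborts, as the quoted text says."""
        if not self.done:
            raise ConnectionAbortedError("connection closed mid-frame")


skipper = SkipLengthFrame()
chunks = [b"\x81", b"\x00" + b"x" * 50, b"x" * 78 + b"\x00next\xff"]
for chunk in chunks:
    used = skipper.feed(chunk)
leftover = chunks[-1][used:]  # b'\x00next\xff', the start of the next frame
```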


On Wed, 3 Feb 2010, Jamie Lokier wrote:
> 
> Looking at the above, it would be quite a bit clearer to talk about 
> reading the end of the input (EOF, TCP FIN) than "if the connection is 
> closed", as the latter has several ambiguous meanings, and you don't 
> actually want to abort parsing when you know the socket is closed, if 
> you still have bytes to read from your buffer

It doesn't say to abort when the connection is closed; it says to abort 
when a read fails because the connection is closed. As far as I can tell, 
that doesn't have the same ambiguities. In any case, hopefully some of the 
other edits made for the closing handshake have resolved any 
other similar ambiguities.


On Tue, 2 Feb 2010, Greg Wilkins wrote:
> >> 
> >> The length encoding currently allows for a length of 0x80 0x80 0x80 
> >> .... to be sent forever.  This is a nonsense length, but could be 
> >> used for DOS attacks on servers.  I think the 0x80 value should be 
> >> explicitly defined as an error if given as the first byte of a 
> >> length.
> > 
> > How would this be different than sending 0x81 forever?
> 
> Sending 0x81 forever should also be caught, but by the server detecting 
> an overflow of the accumulating length byte.

Ok, how is it different from just sending infinite text in a 0x00 frame? 
Or, in HTTP, sending an HTTP header with an infinite value?


> Again, the point is - if the algorithm style is meant to convey all the 
> error handling, then it needs to be thorough.

That's not an error, just an inefficient way of encoding the length.

The point isn't to catch every error, but that the processing be defined 
for all inputs. It doesn't matter how errors are handled, so long as they 
are handled.


On Tue, 2 Feb 2010, Justin Erenkrantz wrote:
> On Mon, Feb 1, 2010 at 6:32 PM, Ian Hickson <ian@hixie.ch> wrote:
> > 
> > It may disconnect, but it doesn't have to (there's no "must", it's a 
> > "may" -- the spec uses RFC2119 terminology). So there's no guarantee 
> > that the type is 0x00 at step 3.
> 
> If, as you dictate in your reply, code up *exactly* what it says - you 
> must ignore the "may" (as you don't provide the conditions to satisfy), 
> then the server never disconnects.

"may" means that it is allowed, there are no conditions to satisfy.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'