[hybi] WebSockets feedback

Ian Hickson <ian@hixie.ch> Wed, 14 April 2010 22:58 UTC

This e-mail has replies to various e-mails sent to the list in the past 
few weeks that I had put aside to reply to. The changes referenced in this 
e-mail are just to the editor's working copy (and the WHATWG copy) of the 
Web Sockets draft, and are not an attempt at making any proposals for 
working group consensus at this time. There's nothing especially important 
in this e-mail; it's mostly just discussion of goals and priorities and 
some minor (mostly editorial) changes to the draft.


On Thu, 4 Mar 2010, Greg Wilkins wrote:
> 
> I hope that this type of mass update is a once off due to the 
> circumstances we find in starting off the WG.

I am open to whatever working style people prefer; in the past people have 
indicated to me that they much prefer when I do bulk replies so they can 
get up to date on all the issues at once rather than having me post 
separate messages to each thread (which has been described as "spamming" 
the group).


> In general I think it would be far easier to handle changes in smaller 
> increments and individual threads. Also it would be good to see proposed 
> diffs to the draft before they are actually put in the draft.

The changes are available in diff form from the HTML5 Revision Tracker:

   http://html5.org/tools/web-apps-tracker

The source document is in Subversion if you want specific versions:

   http://svn.whatwg.org/webapps/source

When it comes to editing the group's document I'm happy to work in 
whatever way is preferred by the chairs.


> If the handshake messages are to contain content, then they MUST have 
> headers to indicate the content length to be legal HTTP: in this case, a 
> Content-Length header would be appropriate

The idea here is that the 8 random bytes are part of the post-Upgrade Web 
Socket connection, not the GET request, so that any intermediaries will 
fail to send the data along if they do any interpretation, thus failing 
early. If we instead declared them with a Content-Length header, that 
early failure mode of intermediaries would be lost.
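
To make that concrete, here is a minimal Python sketch of the client side of that handshake; the header names and values are illustrative of the editor's draft at the time, not normative, and `build_handshake` is my own helper:

```python
import os

# Sketch of the handshake described above: the eight random bytes follow
# the blank line that terminates the GET request, so they belong to the
# post-Upgrade stream rather than to the HTTP request itself.
# Header names/values here are illustrative, not normative.
def build_handshake(host, resource):
    request = (
        "GET %s HTTP/1.1\r\n"
        "Upgrade: WebSocket\r\n"
        "Connection: Upgrade\r\n"
        "Host: %s\r\n"
        "Origin: http://%s\r\n"
        "\r\n" % (resource, host, host)
    ).encode("ascii")
    # Deliberately no Content-Length covering these bytes: an intermediary
    # that interprets the request has nothing telling it to forward them,
    # so it fails early rather than silently passing broken data along.
    return request + os.urandom(8)

handshake = build_handshake("example.com", "/chat")
```

An intermediary that parses this as plain HTTP sees a GET with no body, and the trailing eight bytes look like the start of a malformed second request.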


> Considering that we are already shipping products with websocket 
> implementations from the existing draft, can we specify a transition to 
> the new handshake in an appendix.

I'm not sure what you mean; these are all highly immature drafts, and I 
wouldn't expect anyone using any of them to consider them in any way 
reliable.


> I.e. while the standard is under development, connections without
> websocket keys MAY be accepted, but implementations SHOULD warn
> that support for such connections is deprecated.

While the standard is under development, any connections are going to be 
purely experimental, so it doesn't seem especially important to have 
conformance criteria specifically for them.


> The text of 1.3, still kind of implies that the HTTP fields must be 
> ordered as per the spec.  Can we add a sentence to say that header 
> ordering may be different.

Quite early in that section it says:

   Fields in the handshake are sent by the client in a random order; the
   order is not meaningful.


> Also I'm not sure this sentence is really that clear:
> 
>  "Additional fields are used to select options in the WebSocket
>    protocol.  The only option available in this version is the
>    subprotocol selector, |Sec-WebSocket-Protocol|:"
> 
> I thought we were going to allow arbitrary extra headers, but that their 
> interpretation was not defined by the ws spec other than the optional 
> Sec-WebSocket-Protocol.  We still need to allow headers for cookies, 
> authentication etc.

Well obviously we wouldn't want to allow arbitrary proprietary fields, 
since, by definition, those would be proprietary and thus not 
interoperable. The only legitimate use case I can think of for such 
extensions would be experimentation, but we don't need those to be 
conforming since experimentation is by its very nature done in controlled 
environments and not expected to interoperate.

I've added "Cookie" to the list, though. As you say, that one is relevant 
in this version.


On Fri, 5 Mar 2010, Thomson, Martin wrote:
> >
> > For this reason, and because it is generally much easier, 
> > sentinel-based framing is used for text frames.
> 
> A lot of accommodation has been made for this mythical brain-dead 
> programmer.

As any Web browser vendor can tell you, he's not so mythical. The antics 
of such authors are so common that the term "tag soup" was coined to 
describe how such authors manhandled HTML.


> Build an "idiot-proof" system and you'll just breed a more 
> virulent strain of idiot.

I don't think anyone is arguing that the proposal is idiot-proof; only 
that amateurs are an important consideration.


> And, as Greg continues to point out, easier is subjective.  For some 
> value of easier, you also get injection attacks and other wonderful 
> things.

I agree that it is subjective. That doesn't mean we should ignore the 
problem, however.


> > From the Web Socket point of view, it shouldn't fail for any size.
> 
> There's a marked difference between "designed for large files" and 
> "suitable for large files".  I personally don't get the use case.  If I 
> want to transfer a large file, it seems obvious to me that indirection 
> is the right solution.  Put a URI in your WS message and let the other 
> end fetch the data.

It's not clear to me what URI you would provide if you are wanting to 
upload (from the client) a 4GB video file, though I agree that it may 
well be wise to use XHR instead of Web Socket to do so.


> One reason you don't want to overload protocol use is that you might 
> give WS traffic a higher priority; file transfers usually get lower 
> priority.  Dumping them all in the same stream leaves you with no way to 
> treat these things separately.

Indeed. That's somewhat academic though since currently there's no way to 
send binary files at all in the API.


> > > > Leading zeros are not disallowed. (There wouldn't be much point 
> > > > disallowing them as far as I can tell.)
> > >
> > > Yes, because multiple representations of the same values have *never* 
> > > caused a security issue.
> > 
> > On the contrary, it's often the source of problems. However, in this 
> > case, I don't really see how it could be. Do you have an attack model 
> > in mind?
> 
> Establishing a covert channel using leading zeroes on each frame might 
> be fun.

Given that you control the client and the server, why would you need such 
a complicated channel? Just use Web Socket.

There's no way to either send or receive these leading zeros from the API 
anyway, so I don't see how you could use this. (If you just wanted to send 
data from a command-line app, you might as well just use raw TCP instead 
of using an obscure part of Web Sockets.)


> Causing a buffer overrun might also be possible.  Assume that an 
> implementation assumes that it can reject a frame when its size gets 
> above a certain threshold.  That implementation supports 10K frames.  A 
> bad implementation incorrectly assumes that the size is never going to 
> take more than two octets, so they only read a few to start:
> 
> Byte[] octets = readSomeOctets();
> Int size = 0;
> for( int i = 0; octets[i] & 0x80 == 0x80; ++i) {
>    size = size << 7 + (octets[i] & 0x7f);
>    if (size > 10000) { throw error; }
> }
> 
> (Yes, that's terrible code.)

Yup. It's not clear to me what you're proposing instead, though. Short of 
just using a fixed-length length field, which I would consider short- 
sighted given the way that network abilities grow over time, I don't see a 
good way to avoid this problem.
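
For contrast, here is a Python sketch of a decoder for the draft's 7-bits-per-byte length encoding that avoids both bugs in the quoted snippet: the cap is enforced as each byte arrives, and the terminating byte's bits are included. The 10,000-byte limit is only an example policy.

```python
MAX_FRAME = 10_000  # example policy limit, per the scenario above

def decode_length(octets):
    # Each byte contributes its low 7 bits to the length; a set high bit
    # means another length byte follows.  Note that in C/Java the quoted
    # `size << 7 + x` parses as `size << (7 + x)` -- the parentheses
    # below are not optional.
    size = 0
    for i, b in enumerate(octets):
        size = (size << 7) | (b & 0x7F)
        if size > MAX_FRAME:
            raise ValueError("frame too large")
        if not (b & 0x80):        # high bit clear: this was the last byte
            return size, i + 1    # (length, octets consumed)
    raise ValueError("length truncated")
```

Here `decode_length(bytes([0x81, 0x00]))` yields `(128, 2)`, and a run of 0xFF bytes raises before the size can grow without bound.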


> > More importantly, I don't really see what our options are here.
> 
> You can stop insisting that implementers are stupid or lazy.  While 
> there's plenty of evidence to the contrary, there's also evidence that 
> stupid or lazy implementations don't live long enough to thrive.

I see no evidence that the "stupid or lazy" programmers "don't live". 
Quite the contrary, it seems to me that most of the Web is built from 
amateurs -- the long tail is almost all amateurs and I see no reason why 
we should ignore them, especially not just for our convenience.


> If you are assuming that people are idiots, it might be worth pointing 
> out that calling read() on a TCP socket doesn't return an entire frame 
> always.

As the spec is written, such an assumption is harmless, since the spec 
leads an amateur towards reading one byte at a time.
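
As a sketch of that byte-at-a-time style (the 0x00 ... 0xFF text framing is the draft's; the chunk-simulating helper is mine):

```python
def read_text_frame(recv_byte):
    # Read one sentinel-framed text message: 0x00, UTF-8 payload, 0xFF.
    # Fetching a single byte per call makes it irrelevant how TCP split
    # the stream into read() results.
    if recv_byte() != 0x00:
        raise ValueError("expected frame start byte 0x00")
    payload = bytearray()
    while True:
        b = recv_byte()
        if b == 0xFF:             # sentinel: end of frame
            return payload.decode("utf-8")
        payload.append(b)

def make_recv(chunks):
    # Simulate a socket delivering bytes with arbitrary chunk boundaries.
    stream = (b for chunk in chunks for b in chunk)
    return lambda: next(stream)

# Three "packets" that do not line up with the message boundary:
msg = read_text_frame(make_recv([b"\x00he", b"llo", b"\xff"]))
```

The message comes out as "hello" no matter how the chunks are sliced, because no code path ever assumes a read() boundary means anything.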


> I've seen code that assumes that one call to read() corresponds to a 
> write() on the other end. When two messages are written close enough 
> together that they get bundled into the same packet, the second message 
> is lost; when a write is too big to fit into the same packet, the tail 
> of that message is lost.
> 
> How far do you want to go?

Pretty much exactly as far as we have gone.


> > I don't see what else we can say with respect to handling DOS or 
> > overflow attacks.
> 
> We can explain how someone might detect such an attack.  We can explain 
> what they might do to mitigate the damage.

I'm happy to add such text; do you have any concrete suggestions?


On Fri, 5 Mar 2010, Greg Wilkins wrote:
> 
> I know you respond to a lot of feedback... but continually merging it to 
> be under a single "Feedback" thread is really not conducive to 
> discussion.  It feels more like you are handing down your decisions from 
> the mountain... and I think contributes somewhat to the excessive heat 
> on this list.

I understand that it may appear that way; as noted, though, other people 
have in fact encouraged me to use this style, so unless I hear otherwise 
from the chairs, I will continue to use this style.


On Fri, 5 Mar 2010, Greg Wilkins wrote:
> > On Wed, 24 Feb 2010, Arman Djusupov wrote:
> >> We currently use 0x81 as the identifier of the first and subsequent 
> >> frames, and 0x80 as the identifier of the last frame. When a message 
> >> is contained in a single frame then it starts with 0x80.
> >
> > Please don't use frame types that aren't defined yet.
> 
> I thought that a sub protocol was free to use the frame type bytes.

No? What suggests that?


> The frame type byte is defined in the protocol, but with a meaning 
> allocated to only 1 bit - leaving the other 7 available for 
> subprotocols.

Oh good lord, no, the frame type is intended to be reserved for future 
revisions of the protocol. The spec is very precise about what can be 
sent, and currently the only frame types that the spec says a peer can 
send are 0x00 and 0xFF. There'd be no point sending anything else since 
the client implementations would ignore them.

(Some people might want to use a Web Sockets-like protocol that isn't 
Web Sockets but looks a lot like it, for dedicated client-to-server 
communications unrelated to the Web Socket API. For those, this spec is 
irrelevant, though. They would just write their own spec that happened to 
look a lot like Web Sockets.)


> >> However, I'd be inclined to split the reservation into "reserved for 
> >> future Web Socket revisions and approved extensions" versus "reserved 
> >> for applications to use as they want".
> >
> > Since the API doesn't expose the frame types at all, I don't 
> > understand why any would fall into the second category.
> 
> Ian, I think you are not considering one of the most likely implementors 
> of subprotocols - browsers!
> 
> A subprotocol that provides one or more of: multiplexing, compression, 
> fragmentation, etc would be most useful if implemented transparently 
> below the websocket API by the browsers themselves.
> 
> I certainly do not think that frame bytes should be exposed to the js 
> application, as they are primarily a transport concern and [sic]

I would fully expect us to extend the protocol in the future with more 
features, but that would just be part of the Web Sockets protocol, it 
wouldn't be UA-defined additional frame types.


> > If you're exposing a Web Socket server for non-browser UAs, just make 
> > the client send something in the handshake that opts in to using a 
> > different protocol (that looks very similar to Web Socket, but isn't), 
> > and then this whole spec becomes irrelevant, and you can use whatever 
> > mechanisms you want.
> 
> We don't want an infinite variety of websocket like protocols. One of 
> the huge challenges for websocket is to get the intermediaries and other 
> network appliances to work well with websocket.

People are using BitTorrent, Tor, IMAP, etc, today. Why would future 
non-Web-browser client-side protocols be any different?


> Once that is done, then ideally any subprotocol variation can be 
> transparent to intermediaries and they will not need to be updated as 
> new and interesting uses for websocket are invented.

Well certainly the "infinite variety" of protocols you speak of could look 
enough like Web Socket that that would work, but I really don't think that 
should be a concern for this working group.


> I think this is the acid test of what should be in the base protocol and 
> what should not.  I.e. the base protocol should not have any feature that 
> can be subsequently implemented without affecting intermediaries.

Why isn't TCP the base protocol?


> > On Thu, 4 Mar 2010, Dave Cridland wrote:
> >> I'd be happy with, at this point, mere reservation of a range for 
> >> protocol purposes, and leaving a range clear for subprotocol usage.
> >
> > I don't understand how a subprotocol would ever make use of these 
> > frame types. The API doesn't expose them.
> 
> I think it is really important that we all come to an understanding of 
> how subprotocols are going to work.  It's not sufficient to boot a whole 
> bunch of requested features "to be implemented in subprotocols", but 
> then every time somebody proposes how a subprotocol could work they are 
> told that they can't do that because it can't be implemented in the 
> javascript API.

I don't understand what is unclear here. Web Socket exposes a mechanism 
whereby strings of text are sent client to server or server to client. 
Connections are identified by a resource name. Subprotocols therefore have 
that to work with.


> I know you think that application programmers will write absolutely 
> everything except the browser. Well good for you and I wish you well.
> 
> But can you also respect the position that many of us want to continue 
> to stand on the shoulders of giants and reuse software developed by 
> others.
> 
> I think it is entirely reasonable to expect that websocket extensions 
> and subprotocols will be implemented by browsers and server side 
> frameworks and provide a whole range of transport features hidden from 
> the application developer.

Why would that happen separate from this working group?


On Fri, 5 Mar 2010, Greg Wilkins wrote:
> Ian Hickson wrote:
> > The HTTP auth headers aren't currently added to the Web Socket 
> > connection; this was briefly in the spec but was taken out based on 
> > implementor feedback a few months ago.
> 
> I think this is shortsighted and you are disabling an existing 
> authentication mechanism without providing an alternative.
> 
> The upgrade request should be a standard HTTP request and should be able 
> to be authenticated as any other HTTP request.

In practice HTTP requests are authenticated by cookies, which are allowed.


> > On Thu, 4 Mar 2010, Greg Wilkins wrote:
> >> But the problem with this is it assumes success and that a 101 is the 
> >> only possible response.
> > 
> > Why is that a problem?
> 
> I explained the problem in the next paragraph!
> 
> This "solution" will prevent the use of HTTP return status codes. The 
> server will only have the option of sending a 101 or closing the 
> connection.
> 
> Even if you don't accept the example potential uses I've outlined, it 
> strikes me as shortsighted to disable yet another standard mechanism in 
> the name of protecting against some phantom menace.

I don't think that's the reasoning... The reasoning is simply that it 
isn't needed. If you want to use HTTP, the HTTP spec already exists. 
There's no need for Web Sockets to be involved _unless_ you actually 
connect to a Web Socket server. If you're just doing HTTP-to-HTTP, the 
Web Sockets spec is irrelevant.


> >> If the bytes after the request header are sent as request content, 
> >> what attack vector is opened up?
> > 
> > I think Maciej described the cross-protocol attack in detail last 
> > month, that's probably the best description of the problem I've seen.
> 
> You are misrepresenting this.
> 
> Maciej described an attack vector that is easily addressed simply by 
> having a unique ID in the headers that the server needs to include in 
> the response. If the unique ID is generated by the browser, then it will 
> not even be in existence when any injection attack is formulated and 
> thus we are safe unless the ID is predictable.

The key is protecting against HTTP client to Web Socket server attacks, 
e.g. using XHR to send fake Web Socket requests to the server, that trick 
the server into performing actions that were not requested and that, had 
the client actually been a Web Socket client, would not have been done, 
since the connection would not have been accepted. If the key was only 
sent as a single header, then a simple server-side implementation would 
just match a regexp against the entire request and would easily be tricked 
by header-like text smuggled into the path. By having two Sec-prefixed 
headers, requiring that the headers contain one or more spaces, requiring 
that the server look for a newline to terminate the header, and requiring 
that there be text after the request, it becomes extremely hard to cause a 
client to send everything needed to the server to cause any harm.

(We're only worried about Web browsers doing this because they have the 
user's cookies. Direct connections from an attacker aren't a problem since 
they do not have any ambient authority.)


> The issue that the random bytes after the GET request and after the 101 
> response are trying to handle is that of fast fail.  While fast fail is 
> a desirable attribute, there has been no rigorous explanation of how 
> this "solution" achieves that - nor was there any discussion of 
> alternative ways this could be achieved.

It seemed pretty obvious... intermediaries that don't recognise Web Socket 
would consider the eight bytes part of an unrelated request and would thus 
fail to parse them, rather than sending them as part of the first request.


> There has been no evidence or argument presented that this "solution" 
> actually provides fail fast semantics.

Agreed; this is merely hypothetical at this point. Hopefully we will be 
able to test the design before the spec is done.


> Yet this speculative "solution" has been added to the spec at the cost 
> of disabling the possibility of using the established HTTP response.

Not sure what this means.


> There are a significant number of response codes that may be of value in 
> the handshake, either now or in the future. While there may be some 
> problems with using some of them, those issues should be identified and 
> discussed - not wholesale disabling of the possibility to use them.

I don't think anyone is suggesting we change HTTP. If an HTTP server 
receives a request, it is perfectly legitimate for it to respond using 
HTTP. Same for an HTTP client.


On Fri, 5 Mar 2010, Jamie Lokier wrote:
> Ian Hickson wrote:
> > length measurement wrong without realising it (because of only testing 
> > ASCII, but outputting the string length instead of the byte length). 
> > For this reason, and because it is generally much easier, 
> > sentinel-based framing is used for text frames.
> > 
> > In practice this means that most authors need only implement the 
> > sentinel framing (which is trivial); only more advanced (and thus 
> > competent) authors will need to implement the more complicated 
> > framing.
> 
> What about "UTF-8 text" that incorrectly, but in reality, contains 0xFF 
> bytes?  This can happen in most scripting languages unintentionally, for 
> example a script which reads lines from a file that "should" be in UTF-8 
> (but really contains some 0xff bytes) and sends them as messages, will 
> result in a frame injection error.

Yes, that is indeed a concern. The spec mentions this explicitly actually.


> That's even more likely for, say, reading a directory from a filesystem 
> where UTF-8 is the standard name encoding, and sending those names in a 
> message.  Except... in reality, it's easy for someone to subvert it by 
> putting something non-UTF-8 in there.

Indeed, if there's any non-Web Socket way of getting data into the app, it 
may be possible to get an invalid 0xFF in.
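
That failure is easy to demonstrate. In this self-contained Python sketch (the naive sender and toy receiver are mine; the 0x00/0xFF sentinels are the draft's), a value that "should" be UTF-8 but contains raw bytes turns one intended message into two received frames:

```python
def frame_text(payload_bytes):
    # Naive sender: wraps bytes it *believes* are UTF-8 in sentinel
    # framing without validating them first.
    return b"\x00" + payload_bytes + b"\xff"

def split_frames(stream):
    # Receiver's view of the byte stream: 0x00 opens a frame, 0xFF ends it.
    frames, cur, in_frame = [], bytearray(), False
    for b in stream:
        if not in_frame:
            if b == 0x00:
                in_frame, cur = True, bytearray()
        elif b == 0xFF:
            frames.append(bytes(cur))
            in_frame = False
        else:
            cur.append(b)
    return frames

# A "filename" that is not valid UTF-8: the stray 0xFF terminates the
# frame early, and the attacker-chosen tail is parsed as a second frame.
bad = b"name" + b"\xff" + b"\x00injected"
frames = split_frames(frame_text(bad))
```

The receiver reports two frames, b"name" and b"injected", even though the sender called frame_text exactly once: frame injection.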


> > On Thu, 18 Feb 2010, Jamie Lokier wrote:
> > > But that ignores something: An endpoint shouldn't be sending frame 
> > > types that the other end doesn't know about.  So there is no need to 
> > > specify how to discard binary frames, unless there is a general 
> > > principle of discarding frames with unknown type byte too.
> > 
> > There _is_ a general principle of discarding frames with unknown type 
> > bytes.
> > 
> > We need to make sure that today's clients don't handle tomorrow's 
> > servers in unpredictable ways, because if they do, then we might 
> > never be able to upgrade the protocol, due to Web pages depending on 
> > particular behaviours, the same way that, for example, many Web pages 
> > depend on browsers doing browser sniffing, or on browsers ignoring 
> > Content-Location headers.
> 
> But that's what the subprotocol negotiation is for.  We should encourage 
> its use, or spell out the circumstances in which it's useful for an 
> endpoint to speculatively use frame types that the other end might not 
> recognise.

I don't see how subprotocol negotiation would be relevant here.
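
The discard principle quoted above can be sketched as a forward-compatible receiver. The frame layouts are the draft's (high bit of the type byte selects length-prefixed framing, otherwise sentinel framing), only type 0x00 is meaningful today, and the driver at the bottom is an illustration of mine:

```python
def handle_frame(recv_byte, on_text):
    # Consume exactly one frame, delivering text (type 0x00) and
    # discarding any unknown type while keeping the stream in sync.
    ftype = recv_byte()
    if ftype & 0x80:
        # Length-prefixed framing: decode the 7-bits-per-byte length,
        # then skip exactly that many payload bytes.
        size = 0
        while True:
            b = recv_byte()
            size = (size << 7) | (b & 0x7F)
            if not (b & 0x80):
                break
        for _ in range(size):
            recv_byte()
    else:
        # Sentinel framing: consume up to the 0xFF terminator.
        payload = bytearray()
        b = recv_byte()
        while b != 0xFF:
            payload.append(b)
            b = recv_byte()
        if ftype == 0x00:
            on_text(payload.decode("utf-8"))
        # any other sentinel-framed type: consumed and silently dropped

# Unknown sentinel frame, unknown length-prefixed frame, then a text frame:
stream = iter(b"\x01junk\xff" + b"\x80\x02ab" + b"\x00hi\xff")
received = []
for _ in range(3):
    handle_frame(lambda: next(stream), received.append)
```

Today's client skips both unknown frames without desynchronising, and still delivers "hi" -- which is exactly why tomorrow's servers can add frame types safely.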


> > > One thing that *really* needs to be right early on is making sure 
> > > future proxies know the frame boundaries, for efficient forwarding.
> > > 
> > > If the spec didn't describe binary frame delimiting, we could find, 
> > > in a few years, that we'd have no choice but to use UTF-8 for 
> > > everything, even having to encode binary messages into Unicode text, 
> > > simply because some parts of the net would have proxies parsing 
> > > WebSocket frames and failing to forward anything else properly or 
> > > with useful timing/buffering.
> > > 
> > > That's not stated in the spec, but it's an implied consequence of 
> > > the binary frame rule being described that WebSocket-aware proxies 
> > > will probably apply it.  Perhaps it should be explicit.
> > 
> > I don't see why we'd want any client-side proxies other than SOCKS 
> > proxies or HTTP proxies (via CONNECT), and on the server-side it seems 
> > that upgrading dedicated Web Socket-aware proxies would be relatively 
> > easy in this kind of scenario.
> 
> It's not about what you/we want - it's what we'll get *anyway*, because 
> (a) other people will insist on firewalling your WebSocket connections, 
> to make sure you can only send non-pornographic text messages or 
> whatever, and (b) there are significant transport optimisations possible 
> using proxies, so they will be deployed.
> 
> I'm already finding that I have to write a client-side proxy - to merge 
> keepalives from multiple WebSocket connections, and reduce the number of 
> TCPs to a manageable level for a slow network.  I anticipate installing 
> it as a site-wide TCP-intercepting proxy.  That's an example of 
> transport optimisation using a proxy - invisible to the actual clients 
> and servers, and they can't avoid it.  By the way, you mentioned 
> "author-friendly" elsewhere as a criterion.  This is very author 
> friendly because the endpoints don't know or care.
> 
> If I'm already looking at this at this early stage, then I won't be the 
> only one doing it when it's widely deployed.

I don't really understand what you're asking for here. Are you suggesting 
we should define a third conformance class in the spec for 
man-in-the-middle proxies? I suppose we could do that. Would it need any 
requirements other than passing through the bytes unchanged?


> If I send the frame length 2^128, so that I can follow it with any 
> amount of binary data (effectively turning the connection into a byte 
> stream), I am sure that some future implementation, perhaps a proxy, 
> will treat it as zero and then break.

Well there's no binary sending or receiving mechanism at all right now, so 
it's not yet clear how it'll work, but I would presume that to send it'll 
merely take a File or Blob object or whatever JS binary type is made, and 
send it, and to receive it'll wait for the whole frame and then create a 
Blob or whatever JS binary type is made, and fire a single event. So you 
wouldn't be able to stream things in a single frame; the API wouldn't 
expose it that way.

(If you're doing that in a non-browser setting, i.e. you've made an API 
that is specifically for non-browser clients, then the Web Sockets spec is 
irrelevant -- you can just make your own protocol that looks as close to 
Web Sockets as you want, and just call it your own protocol. No need to 
worry about the Web Sockets spec.)


> There is also the open question of what buffering and forwarding 
> behaviour is expected, both of proxies, and of receivers.  If I send a 
> large frame incrementally (i.e. by writing bytes), is it permitted to 
> buffer it with unlimited delay until the end of the frame, or must the 
> received bytes be passed on to the application, or the next hop in the 
> case of a proxy?

For the client (using the API), the data is exposed as an event, so 
there's no choice.

For the server, there doesn't seem to be any particular practical 
difference from an interoperability perspective, it's just an 
implementation detail.

For a man-in-the-middle proxy, either behaviour can naturally occur due to 
regular network weather, so I don't think there's much point requiring one 
or the other.


> > On Wed, 24 Feb 2010, Arman Djusupov wrote:
> > > 
> > > I have a use case which doesn't seem to have been taken into account 
> > > in the spec. When large binary messages of initially unknown size 
> > > are getting streamed over a connection, the transport does not know 
> > > the final size of the binary message when it starts serializing it 
> > > and so it cannot encode the message's length at the start of the 
> > > binary frame. In our implementation we handle such cases by 
> > > buffering data until the buffer is full and then flushing it into a 
> > > frame of specific length; subsequent data of the same message are 
> > > sent in the following frames.
> > 
> > Currently there's no way to send binary data with Web Socket at all, 
> > but going forward, when we add binary support, if one of the use cases 
> > is sending binary data of unknown size, we can definitely use a 
> > chunk-based encoding scheme.
> > 
> > Whether it's useful or not depends on what the API for binary data 
> > ends up looking like. If the API only exposes binary data using Blob 
> > objects, then there's not really a reason to avoid requiring that the 
> > server prebuffer all the data ahead of time to determine the length. 
> > If we expose the data using something akin to the Stream object 
> > currently in the HTML spec, then we'd probably want to make it chunks 
> > of fixed (maximum) size. Since the API doesn't have any binary support 
> > at all right now, though, it's probably premature to worry about this.
> 
> What does the API have to do with it?
>
> Arman's question was not about web browsers as far as I can tell.  It 
> was about non-browser clients, using other languages.

The reason for this protocol is to have two-way communication between a 
script in a Web browser and a server. This is exposed through that API.

Now you can of course, once you have defined a subprotocol that uses 
Web Sockets for use between a Web page and a server, also talk to that 
server from a command-line app, but I don't see why we would add features 
to the protocol for that case specifically. If that use case is so 
important, then the server should just provide a dedicated protocol 
specifically for those clients. It could look like Web Sockets, if that is 
easier (though frankly I think Web Sockets makes a pretty awful generic 
mechanism compared to raw TCP), but there's no reason to claim it _is_ 
Web Sockets, and so the Web Sockets spec is irrelevant here. It's 
effectively just a dedicated custom protocol, and can be specified thus.


> > > We currently use 0x81 as the identifier of the first and subsequent 
> > > frames, and 0x80 as the identifier of the last frame. When a message 
> > > is contained in a single frame then it starts with 0x80.
> > 
> > Please don't use frame types that aren't defined yet. If you are doing 
> > something for use unrelated to the Web Socket API, there's really no 
> > reason not to just use TCP. You can use the same framing as Web 
> > Sockets (though I don't see why, it's not that great for general 
> > purposes), but if you use Web Socket itself, it's just going to cause 
> > problems for you when the API is updated to do binary.
> 
> Are frame types free to use for applications and/or protocol extensions 
> now, or are they reserved for future WebSocket specifications?

They are all reserved for future Web Socket specifications.


> That needs to be made clear, because it's clear people are starting to 
> use them for protocol extensions.

I've updated the spec to make it clear.


> > On Mon, 1 Mar 2010, Dave Cridland wrote:
> > > This behavioural flux, though, is an excellent argument for putting 
> > > keepalive behaviour into the core, as the people designing the 
> > > frameworks are likely to have good ideas of the defaults, and keepalive 
> > > frequencies will need to be controlled per-deployment, typically, rather 
> > > than per-application.
> > 
> > We could reserve a frame byte for control messages (server talking to the 
> > browser rather than to the script), with the null message being put aside 
> > for keep-alive purposes, if people think that would be especially useful. 
> > I presume we would want the server in charge of sending keepalives rather 
> > than the browser or the script?
> 
> Keepalives and timeouts (they go together) are used for three
> equally critical things:
> 
>    - Keeping TCP open over NATs.
> 
>    - Detecting when the connection is broken at the server, so you can
>      clean it up instead of running out of memory and crashing the
>      server after a few days.  (The Tic-Tac-Toe demo has this problem.)
> 
>    - Detecting when the connection is broken at the client, so it
>      can initiate another one.  (Interactive applications need this.)
> 
> To achieve all three, it's necessary to have keepalive messages sent
> from both sides.  It is not enough to send a message from one side,
> and rely on the TCP ACK coming back.  (That'll keep the NAT open, but
> it won't allow both sides to detect a broken connection in a reasonable time.)
> 
> This can be done as a "ping" style request+response, or each side can
> transmit independently when it's not transmitted anything for a
> keepalive interval.
> 
> For a given NAT timeout, the "ping" style uses more bandwidth than the
> "independent" style if the client and server's broken-connection
> timeouts are quite different.  This is because you must reduce the
> keepalive interval by more due to the extra jitter from pinging, and
> because it causes a message in both directions resulting in 3 TCP
> packets, when 2 packets would be enough most of the time.
> 
> When the server and client's broken-connection timeout needs are the
> same, then the "ping" style uses less bandwidth than "independent" style.
> 
> Note that applications like, say, Facebook and Gmail would be better
> off with asymmetric broken-connection timeouts and therefore using the
> "independent" style.
> 
> In some applications, keepalive bandwidth is actually the main
> bandwidth consumer, and it gets quite expensive, so minimising it
> is quite desirable.
> 
> What I conclude from this is that each application must be able to
> decide for itself what keepalive strategy it's going to use, and it
> must not be assumed that server-initiated pings are always a good choice.

These cases all sound like the script talking to the server, not the 
browser talking to the server, and thus can just be done by defining such 
mechanisms in the subprotocol. For example, if the protocol is some chat 
protocol, then the subprotocol could be something like:

Client-to-server:
   MSG <buddy-id> text...
   ADD <buddy-id>
   ACCEPT <buddy-id>
   REJECT <buddy-id>
   PING

Server-to-client:
   MSG <buddy-id> text...
   ADD-REQUEST <buddy-id>
   STATE <buddy-id> <state>
   PING

Here, "PING" is the message that would be used as the keepalive.

We should only add things to the core Web Socket protocol if it's 
something that will really apply to all subprotocols. Otherwise, we're 
adding an implementation burden on server-side implementors who don't need 
the feature but still have to deal with it when the client uses it.


> > IMHO, *we are responsible* for helping authors make the right choice 
> > here, even if in some cases that means simply not giving them a 
> > choice.
> 
> So why do you want to push things like message dispatch, flow control, 
> and cooperation between independent components on a page onto the 
> application authors, instead of helping to make the right choices with 
> those technical things?

Because forcing these features on everybody introduces an implementation 
burden on server-side implementors who don't need the features.


> > > However, I'd be inclined to split the reservation into "reserved for 
> > > future WebSocket revisions and approved extensions" versus "reserved 
> > > for applications to use as they want".
> > 
> > Since the API doesn't expose the frame types at all, I don't 
> > understand why any would fall into the second category.
> 
> The API is not relevant, because the discussion here is all about 
> implementations that aren't using the WebSocket API at all.

While I think we should make it possible to use the API from a non-browser 
client, I don't see why such a client would need to use Web Sockets if its 
needs are so involved that the server has dedicated code to handle it that 
is not used by its browser-side clients.

If you are writing dedicated server code for non-browser clients, then you 
don't need Web Sockets. Just use TCP.


> > If you're exposing a Web Socket server for non-browser UAs, just make 
> > the client send something in the handshake that opts in to using a 
> > different protocol (that looks very similar to Web Socket, but isn't), 
> > and then this whole spec becomes irrelevant, and you can use whatever 
> > mechanisms you want.
> 
> I'm guessing the intention is they want to use WebSocket so that, 
> eventually, browsers will be able to speak to the services they've 
> deployed without having to rewrite those services.

That's quite reasonable, but then you don't need to use frame types, since 
those frames aren't going to be used by those services.


> And to reuse the inevitable Web Socket client APIs that will appear 
> quite soon for Java, .NET, Python, Perl etc.

Why not use the already existing TCP client APIs?


> Or maybe it's just a herd instinct, using the protocol even though
> it's not well suited to their problem? ;-)

We shouldn't attempt to make the protocol suit problems that are _by 
definition_ not problems for which the protocol is suited. That makes no 
sense.


> > > If the handshake messages are to contain content, then they MUST 
> > > have headers to indicate the content length to be legal HTTP: in 
> > > this case, a Content-Length header would be appropriate
> > 
> > The handshake messages don't contain content. Should I add a 
> > "Content-Length: 0" field to the handshake to make this clearer?
> 
> Definitely not, because something will inevitably parse the following 
> data as the beginning of another HTTP response if you do that.

That's the idea.


> One argument in favour of length-delimiting is that forwarders 
> (including the APIs) don't have to repeatedly examine the data at all to 
> ensure its integrity, e.g. for things like parsing and inserting 
> transport control messages (you mentioned them earlier).  Sentinels 
> force each component to examine every byte that it passes along, which 
> is a noticeable cost under high load - especially for large messages. 
> (Modern OSes and hardware can move data between files and among network 
> ports without the CPU ever loading it into its cache.)
> 
> That's a practical efficiency thing.  Google people have said that SPDY 
> experiments showed length-delimiting is generally faster too.

In practice I would imagine that anyone with such needs is going to wait 
until Web Sockets supports compression (probably the first thing to be 
added once we have the basic protocol figured out), at which point they'll 
never need to see text frames at all.

Amateur programmers (the ones more likely to just use text frames) aren't 
generally going to have so much traffic that it matters.


> > > Up above, you implied this was rare.
> > 
> > I believe this problem would be less common than the problem of people 
> > misimplementing string measurement, yes.
> 
> I think you're probably right, but I would vote for length-delimiting on 
> efficiency grounds anyway.

I completely agree that if amateur programmers were not a priority here 
that we'd just use length encodings.


> If you wanted to be more secure, it's possible to protect against both
> issues at once: Require a leading byte length *and* 0xff sentinel following.

I don't really see how that would work, unless we made them alternatives, 
but then we'd still have to pick one for the client to use. I'm not sure 
how that would really solve the problem.


> That won't make amateur code magically work with non-ASCII characters, 
> but it will protect better against injection attacks taking advantage of 
> amateur code.

That just seems like it would become the worst of both worlds, with 
complicated error-handling rules for when they didn't match.


> > > > Leading zeros are not disallowed. (There wouldn't be much point 
> > > > disallowing them as far as I can tell.)
> > > 
> > > Yes, because multiple representations of the same values has *never* 
> > > caused a security issue.
> > 
> > On the contrary, it's often the source of problems. However, in this 
> > case, I don't really see how it could be. Do you have an attack model 
> > in mind?
> 
> How about sending a continuous stream of leading zeros, and causing 
> denial of service because the parser author didn't think to empty the 
> input buffer while parsing the number, or uses quadratic parsing 
> (repeatedly starting from the beginning on receiving more input) because 
> they expected the number to be a short byte sequence?

If we're worried about that kind of attack, you could just as easily send 
an infinite handshake. Why is it a problem here but not with the 
handshake?
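For what it's worth, a parser can bound that cost by capping how many length bytes it will accept before failing the connection. A sketch in Python; the base-128 encoding follows the draft's length-delimited framing, but the eight-byte cap is an illustrative choice, not something the spec mandates:

```python
def parse_frame_length(data: bytes, max_length_bytes: int = 8):
    """Parse a base-128 length prefix (7 value bits per byte, high bit
    set on every byte except the last).

    Capping the number of length bytes means a stream of leading 0x80
    zeros cannot tie up the parser, and it also bounds the magnitude
    of the resulting length.  Returns (length, bytes_consumed).
    """
    length = 0
    for i, b in enumerate(data):
        if i >= max_length_bytes:
            raise ValueError("length field too long; fail the connection")
        length = length * 128 + (b & 0x7F)
        if not b & 0x80:  # high bit clear: this was the last length byte
            return length, i + 1
    raise ValueError("truncated length field")
```

With this shape, accepting or rejecting leading zeros is just a question of whether 0x80 bytes count against the cap.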


> Actually I think we should support split-message chunks and then define 
> a fixed maximum chunk size of 2^31-1.  It wouldn't limit message size, 
> only chunk size.  It's much more likely to be treated correctly by real 
> implementations.

Well, we don't do binary frames at all yet, but in principle for binary 
files I would agree. For something like just a compressed text frame, I 
think it'd be better to keep it as one frame, as if it had been a text 
frame. That's probably a discussion for once we have the basics pinned 
down, though; the current design doesn't preclude this kind of approach.


> > More importantly, I don't really see what our options are here. We can 
> > certainly ban the client sending leading nulls -- and indeed we have, 
> > since the client isn't allowed to send binary data at all currently, 
> > let alone binary data with leading nulls in the length. But what 
> > should the server do when faced with a buggy or hostile client? If we 
> > don't define how to handle this, but say that servers should expect it 
> > and close the connection if they see it, then it's clear to me that 
> > most implementations would _not_ check for it (that has been my 
> > experience with other languages and protocols -- asking implementors 
> > to check for something for the purposes of failing usually leads to 
> > them ignoring the requirement on the basis that doing something else 
> > is "more reliable"). So that leaves us with the choice the spec has: 
> > define it in a way that is trivial to implement interoperably (it's 
> > just an intrinsic part of handling valid data), and which is 
> > apparently harmless.
> 
> The something else is only "more reliable" if these things ever occur in 
> the wild.  If it's very clear from the start that there are no leading 
> zeros, nothing will need to parse it, and nothing will send it because 

Agreed so far...

> they get rejected when they first write code which does so.

...but not with this. If we don't define how they are handled, as I 
described above, then what happens will vary from implementation to 
implementation. By defining it in the spec we ensure that it will be 
tested when we write the test suite, and thus that servers are at least 
going to be exposed to it if their developers look at the test suite.


> > > So, your defined behaviour in the face of a DOS or overflow attack 
> > > is to, what? Just work?
> > 
> > The spec explicitly says that "If the user agent is faced with content that 
> > is too large to be handled appropriately, runs out of resources for 
> > buffering incoming data, or hits an artificial resource limit intended 
> > to avoid resource starvation, then it must fail the WebSocket 
> > connection". It furthermore says that "Servers may close the WebSocket 
> > connection whenever desired". I don't see what else we can say with 
> > respect to handling DOS or overflow attacks.
> 
> I'm concerned about client APIs, server APIs, and network proxies,
> that may pass along data incrementally and have to keep a running
> counter based on the frame length they have parsed (or sent).
> 
> That's how large length-delimited frames are likely to be handled in
> practice inside frameworks.  And WS-parsing proxies must handle any
> legitimate frame size.
> 
> Both are quite likely to break when presented with a length which
> doesn't fit into 2^32 or 2^64, depending on the variable size they use
> internally.  By not specifying a maximum, I think we're encouraging
> this as a failure point - in the same way that old HTTP agents crashed
> when faced with a Content-Length >= 2^31 - because the authors didn't
> know what was a sensible limit to implement, but practically they had
> to use some fixed size type for it - a "bignum" implementation would
> be ridiculous despite being the only formally correct thing to do.

The server-side developer is also the subprotocol writer, so they do know 
what the limit is: they decide it when designing their protocol.

Bypassing man-in-the-middle proxies entirely by using TLS seems like the 
best solution if your protocol uses frames so big that those proxies are 
going to screw it up.


> That's why I'm thinking that an explicit maximum length of 2^31-1 per 
> chunk, combined with permitting messages composed of multiple chunks 
> (because that's useful for other reasons anyway), is the safest choice 
> coming to mind at the moment.  For the length-delimited frames, that is. 
> Sentinel-delimited frames can already be any length.

Well, right now the only length-delimited frame that's allowed has a 
maximum allowed length of zero. But I agree that when we add binary frames 
it might make sense to have some limits.


> Note that this is *different* from running out of resource limits or 
> protecting against starvation, because this is just about counting, so 
> it doesn't come under the section which talks about resource limits - 
> and it shouldn't.

Using a four-byte word for counting and overflowing that count is 
technically a resource limit. :-)


On Thu, 4 Mar 2010, Scott Ferguson wrote:
>
> A strawman for discussion purposes.

I'm not really sure what problem this strawman is intended to address. Can 
you elaborate?


On Fri, 5 Mar 2010, Greg Wilkins wrote:
> 
> I think we need to decide who will be doing the extending and creating 
> new subprotocols.
> 
> Ian appears to be advancing the position that if it can't be done in 
> javascript via the websocket API, then it can't be done.  Moreover, that 
> if you are not doing it in javascript then you shouldn't be using 
> websockets.

That's an oversimplification of my position.

My position is this:

Web browser vendors want to expose to Web page scripts the ability to open 
essentially a TCP connection to arbitrary servers, to replace the 
combination of XHR and the "infinite <iframe>" hack people use today. This 
obviously would have huge security problems, so instead they are willing 
to settle on a compromise: layering over TCP a minimal handshake to 
enforce an "origin"-based security policy with explicit in-band server 
opt-in. In addition, to avoid having to expose stream APIs to scripts, it 
is desired to make the mechanism use packets (frames) rather than exposing 
a stream. There's also a desire to make this use ports 80 or 443 and be 
able to use this mechanism to talk to servers who currently expose HTTP or 
HTTPS on those ports.

Web Sockets is intended to address this use case.

In addition, there is naturally a desire to make it possible for authors 
to use whatever solutions get developed in this space to talk directly to 
dedicated clients as well as talking from scripts, so that servers who do 
not support a dedicated TCP-based protocol but do expose a Web Socket- 
based protocol for their Web page's scripts can still be used from 
dedicated client programs. (This is similar to how JSON APIs intended for 
JavaScript, e.g. the Twitter API, are also usable from native clients.)

Once a server starts writing dedicated code to handle connections from 
dedicated clients, though, there's no need for the origin-based model or 
the frames, and therefore no reason why the server can't just provide a 
dedicated protocol unrelated to Web Sockets.


Now it's quite possible that there is a need for another protocol also, 
one that addresses some native-client-specific needs that I'm not aware 
of. It's also possible that the majority of this working group's members 
are interested in such a protocol, and not in the protocol needed by 
browser vendors as described above. If that is the case, then we should 
work on that protocol, and the browser vendors' needs shouldn't be a 
concern. IMHO it would be a huge mistake to try to take two such unrelated 
use cases and try to address them simultaneously in one protocol. It would 
be equivalent to trying to address the use cases that led to IMAP and HTTP 
simultaneously in one protocol.


> Examples that come to mind include:
> 
>   Browser implementing a simple multiplexing standard
>   to make all websockets from the same tab/window share
>   a common connection.
>
>   Browser implementing compression.

Why wouldn't we just build these straight into the Web Socket spec? Surely 
if browsers just implement their own random extensions, it would only be 
for experimentation, since it wouldn't have a chance to interoperate with 
other browsers and servers.


>   Firewalls acting as aggregators and combining
>   multiple base connections into fewer multiplexed
>   connections to the business servers.

Assuming you mean on the server-side, these are just considered part of 
the server from the spec's point of view.


>   Appliances doing SSL offload and converting wss to ws
>   connections with injected certificate information.

Same here.


> I also think that there will be extensions done in javascript by 
> frameworks/applications, but by definition they work above the API and 
> need no more support from the base protocol than the API already 
> provides.

Right. (Not sure what you mean by "extension" here though.)


On Fri, 5 Mar 2010, Salvatore Loreto wrote:
> 
> 1) do we want WebSocket protocol to be an extensible protocol?

Extensible by whom? In what sense? It's clear that we need to make sure 
the protocol can be extended for future versions of the protocol itself, 
in a backwards- and forwards-compatible fashion, and it also seems clear 
that we need to make sure clients and servers can experiment with such 
proposals (not in the wild) to see how they would work. Is that what you 
mean?


On Mon, 8 Mar 2010, Vladimir Katardjiev wrote:
> 
> First, a copy/paste error. Page 31 seems to be missing 
> Sec-WebSocket-Key2.

Oops, that was a bug in my HTML-to-text convertor. Fixed.


> Same page, you probably meant | instead of " in
> 	
> 	(such as |Cookie")

Also a bug in my script! Fixed.


> Also, this might be just my reading comprehension, but the following 
> paragraph caused a double-take
> 
> 	if a server assumes that its clients are authorized
>       on the basis that they can connect [...] then the server
>       should also verify that the client's handshake includes the
>       invariant "Upgrade" and "Connection" parts of the handshake, and
>       should send the server's handshake before changing any user data.
> 
> Checking Upgrade and Connection is redundant if Sec-WebSocket-* are 
> present from a browser PoV; that was the point of them, no?

Yeah, that note is obsolete now. Removed.


> Right, time to graduate from reading comprehension into actual 
> questions. The big addition obviously Sec-WebSocket-Key1/2. Are there 
> any fundamental limitations on where spaces can and should occur in the 
> header? According to my interpretation as it stands, the following 
> string is a perfectly valid key: "  4abcd". Is this intended? And, if 
> so, can the leading (or trailing, if so inclined) spaces cause issues 
> when parsing by proxies/existing stacks?

I shouldn't think so, at least not with any proxies or existing stacks 
that are compatible enough in other ways to not screw up Web Sockets in 
general. However, I'm just guessing here.


> Also, I must've missed the big discussion, but what's the rationale of 
> having two key headers. One I can buy, but since it should already be 
> impossible to create a Sec-* header from a browser, AND it is impossible 
> to add spaces to the header value, AND it is a GET with a body, it's 
> already triple impossible (impossibler? impossiblest?) for XHR/CORS to 
> forge. If someone has a handy link to where this suggestion was born 
> it'd be appreciated, because this seems like quite a bit of computation 
> involved.

I originally had just one header:

  GET /demo HTTP/1.1
  Host: example.com
  Connection: Upgrade
  Sec-WebSocket-Key: 12345
  Sec-WebSocket-Protocol: sample
  Upgrade: WebSocket
  Origin: http://example.com

However, the obvious (and wrong) way to implement this is to read the 
entire header, and then do something like (in this case in Perl):

   $header =~ m/Sec-WebSocket-Key: (.+)\r\n/;

...or some such. Now consider this request:

  GET /demo?Sec-WebSocket-Key: HTTP/1.1
  Host: example.com
  ...

The simplest implementation is thus vulnerable to a trivial attack.
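To make that concrete, here is roughly the same naive parse in Python, fooled by the request above (hypothetical code, but any regex run over the raw header block fails the same way):

```python
import re

# The attacker controls the request path, so the string
# "Sec-WebSocket-Key: " appears in the GET line even though no such
# header field was ever sent.
raw_handshake = (
    "GET /demo?Sec-WebSocket-Key: 12345 HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "\r\n"
)

# Naive parsing: one regex over the whole header block instead of a
# line-by-line field parser.
match = re.search(r"Sec-WebSocket-Key: (.+)\r\n", raw_handshake)
print(match.group(1))  # attacker-chosen value, taken from the request line
```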

As defense in depth, I introduced several overlapping solutions to this:

 * There are two keys, both of which have to end in \r. This means you 
   have to find two places to smuggle data into the connection, not just 
   one, and they have to be on different lines.

 * The keys have to include at least one space, and this is verified by 
   requiring the implementation to divide by that count -- if there aren't 
   enough spaces and you didn't check for it, you'll get a divide-by-zero 
   in many languages.

 * Non-numeric characters are stripped, so that the division is definitely 
   done on a number, so that any side-effects of dividing a string or NaN 
   or something like that are avoided in the smuggling case.

 * The client is required to randomly insert the spaces, as well as random 
   non-numeric characters, and is required to randomly shuffle the 
   headers, so servers can't rely on any one convention, they have to at 
   least do a half-hearted attempt at really parsing the header.

 * The key also involves eight bytes after the handshake, so you have to 
   properly determine the end of the handshake, so things can't be 
   smuggled into the body, even if you otherwise ignore frames from the 
   client.
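Putting the space-counting check together, here is a sketch of the server-side key processing in Python (the key values are made up for illustration; the point is that the division fails loudly on a smuggled, space-free key):

```python
def websocket_key_number(key: str) -> int:
    """Recover the number hidden in a key field value.

    Keep only the digits, then divide by the number of spaces.  A key
    smuggled in via the request line contains no spaces, so the
    division raises ZeroDivisionError instead of quietly yielding a
    usable value.  (The client constructs the key so that the digits
    divide evenly by the space count.)
    """
    digits = int("".join(ch for ch in key if ch.isdigit()))
    return digits // key.count(" ")
```

For example, a (made-up) key of "P1 x2 4%" has digits 124 and two spaces, giving 62; the non-numeric junk is stripped before the division so the arithmetic always operates on a number.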


> The junk characters in the handshake also seem redundant. If you already 
> need to verify the correct number of spaces this means you are checking 
> for spaces already. As for the garbage characters, all they made me do 
> was replace("[^0-9]", '') instead of replace(' ', '') so I'd say they 
> didn't alter the functionality of my code other than take up more CPU 
> cycles.

They're intended to make sure that the behaviour is well-defined even if 
the key header is being smuggled in somewhere. Otherwise, different 
servers might use different ways of handling invalid handshakes, and that 
is likely to lead to some being more vulnerable. Now we can't entirely 
prevent this, obviously, but we can at least mitigate it a little here.


> The closing handshake is also interesting. Maybe state in 5.3. that the 
> server shouldn't disconnect on 0xFF (instead of may; it still may, but 
> is discouraged from doing so since it may cause data loss. If the target 
> audience really is the kind of programmer you need to verify their code 
> by actually ensuring they read the spaces, then make the default level 
> of the protocol ensure data integrity, because that's what they'll see 
> in the lab network).

Good point. Fixed.


> Finally, and probably my smallest serious gripe, but what's up with the 
> mix of references to ASCII and UTF-8 as characters. It is all 
> technically correct, to nobody's surprise, but it's distracting to read. 
> I hope it's just a transitional thing and it'll eventually be UTF-8 
> throughout but it was just too boring to do it all at once.

Not sure what you mean.


> Page 20:  EXAMPLE: For example

Yeah, there's a few of these. I'll look into what to do about it. Maybe I 
should indent the examples rather than prefixing them with "EXAMPLE:"?


> Page 22: -> If the byte is 0x3A (ASCII :)
> 
> The internet has damaged me so badly this was a smiley face.

That, I can't help you with. :-)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'