Re: [hybi] WebSockets feedback

Pieter Hintjens <ph@imatix.com> Thu, 15 April 2010 06:23 UTC

From: Pieter Hintjens <ph@imatix.com>
Date: Thu, 15 Apr 2010 08:23:04 +0200
Message-ID: <w2y5821ea241004142323h949c0b07l771171500a625a6c@mail.gmail.com>
To: Ian Hickson <ian@hixie.ch>
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] WebSockets feedback

tl;dr

On Thu, Apr 15, 2010 at 12:58 AM, Ian Hickson <ian@hixie.ch> wrote:
>
> This e-mail has replies to various e-mails sent to the list in the past
> few weeks that I had put aside to reply to. The changes referenced in this
> e-mail are just to the editor's working copy (and the WHATWG copy) of the
> Web Sockets draft, and are not an attempt at making any proposals for
> working group consensus at this time. There's nothing especially important
> in this e-mail, it's mostly just discussion of goals and priorities and
> some minor (mostly editorial) changes to the draft.
>
>
> On Thu, 4 Mar 2010, Greg Wilkins wrote:
>>
>> I hope that this type of mass update is a one-off due to the
>> circumstances we find ourselves in, starting off the WG.
>
> I am open to whatever working style people prefer; in the past people have
> indicated to me that they much prefer when I do bulk replies so they can
> get up to date on all the issues at once rather than having me post
> separate messages to each thread (which has been described as "spamming"
> the group).
>
>
>> In general I think it would be far easier to handle changes in smaller
>> increments and individual threads. Also it would be good to see proposed
>> diffs to the draft before they are actually just put in the draft.
>
> The changes are available in diff form from the HTML5 Revision Tracker:
>
>   http://html5.org/tools/web-apps-tracker
>
> The source document is in Subversion if you want specific versions:
>
>   http://svn.whatwg.org/webapps/source
>
> When it comes to editing the group's document I'm happy to work in
> whatever way is preferred by the chairs.
>
>
>> If the handshake messages are to contain content, then they MUST have
>> headers to indicate the content length to be legal HTTP: in this case, a
>> Content-Length header would be appropriate
>
> The idea here is that the 8 random bytes are part of the post-Upgrade Web
> Socket connection, not the GET request, so that any intermediaries will
> fail to send the data along if they do any interpretation, thus failing
> early. If we included them in the Content-Length then unfortunately the
> early failure mode of intermediaries would be lost.
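[Editor's note: to make that placement concrete, here is a minimal sketch of such a client handshake. Header names follow the draft under discussion; the key values, host, and resource are placeholders. The significant detail is that the 8 random bytes sit after the blank line, outside the HTTP request proper.]

```python
import os

def build_client_handshake(host: str, resource: str) -> bytes:
    # The headers form an ordinary HTTP Upgrade request; the 8 random
    # bytes that follow the blank line are NOT part of the request, so
    # an intermediary that interprets this as plain HTTP will fail to
    # forward them and the handshake fails early, as intended.
    headers = (
        f"GET {resource} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: WebSocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Key1: 4 @1  46546xW%0l 1 5\r\n"  # placeholder value
        "Sec-WebSocket-Key2: 12998 5 Y3 1  .P00\r\n"    # placeholder value
        "\r\n"
    ).encode("latin-1")
    return headers + os.urandom(8)  # the 8 random trailing bytes
```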
>
>
>> Considering that we are already shipping products with websocket
>> implementations from the existing draft, can we specify a transition to
>> the new handshake in an appendix?
>
> I'm not sure what you mean; these are all highly immature drafts, I
> wouldn't expect anyone using any of them to expect them to be in any way
> reliable.
>
>
>> I.e. while the standard is under development, connections without
>> websocket keys MAY be accepted, but implementations SHOULD warn
>> that support for such connections is deprecated.
>
> While the standard is under development, any connections are going to be
> purely experimental, so it doesn't seem especially important to have
> conformance criteria specifically for them.
>
>
>> The text of 1.3 still kind of implies that the HTTP fields must be
>> ordered as per the spec.  Can we add a sentence to say that header
>> ordering may be different?
>
> Quite early in that section it says:
>
>   Fields in the handshake are sent by the client in a random order; the
>   order is not meaningful.
>
>
>> Also I'm not sure this sentence is really that clear:
>>
>>  "Additional fields are used to select options in the WebSocket
>>    protocol.  The only option available in this version is the
>>    subprotocol selector, |Sec-WebSocket-Protocol|:"
>>
>> I thought we were going to allow arbitrary extra headers, but that their
>> interpretation was not defined by the ws spec other than the optional
>> Sec-WebSocket-Protocol.  We still need to allow headers for cookies,
>> authentication etc.
>
> Well obviously we wouldn't want to allow arbitrary proprietary fields,
> since, by definition, those would be proprietary and thus not
> interoperable. The only legitimate use case I can think of for such
> extensions would be experimentation, but we don't need those to be
> conforming since experimentation is by its very nature done in controlled
> environments and not expected to interoperate.
>
> I've added "Cookie" to the list, though. As you say, that one is relevant
> in this version.
>
>
> On Fri, 5 Mar 2010, Thomson, Martin wrote:
>> >
>> > For this reason, and because it is generally much easier,
>> > sentinel-based framing is used for text frames.
>>
>> A lot of accommodation has been made for this mythical brain-dead
>> programmer.
>
> As any Web browser vendor can tell you, he's not so mythical. The antics
> of such authors are so common that the term "tag soup" was coined to
> describe how such authors manhandled HTML.
>
>
>> Build an "idiot-proof" system and you'll just breed a more
>> virulent strain of idiot.
>
> I don't think anyone is arguing that the proposal is idiot-proof; only
> that amateurs are an important consideration.
>
>
>> And, as Greg continues to point out, easier is subjective.  For some
>> value of easier, you also get injection attacks and other wonderful
>> things.
>
> I agree that it is subjective. That doesn't mean we should ignore the
> problem, however.
>
>
>> > From the Web Socket point of view, it shouldn't fail for any size.
>>
>> There's a marked difference between "designed for large files" and
>> "suitable for large files".  I personally don't get the use case.  If I
>> want to transfer a large file, it seems obvious to me that indirection
>> is the right solution.  Put a URI in your WS message and let the other
>> end fetch the data.
>
> It's not clear to me what URI you would provide if you are wanting to
> upload (from the client) a 4GB video file, though I agree that it may
> well be wise to use XHR instead of Web Socket to do so.
>
>
>> One reason you don't want to overload protocol use is that you might
>> give WS traffic a higher priority; file transfers usually get lower
>> priority.  Dumping them all in the same stream leaves you with no way to
>> treat these things separately.
>
> Indeed. That's somewhat academic though since currently there's no way to
> send binary files at all in the API.
>
>
>> > > > Leading zeros are not disallowed. (There wouldn't be much point
>> > > > disallowing them as far as I can tell.)
>> > >
>> > > Yes, because multiple representations of the same values has *never*
>> > > caused a security issue.
>> >
>> > On the contrary, it's often the source of problems. However, in this
>> > case, I don't really see how it could be. Do you have an attack model
>> > in mind?
>>
>> Establishing a covert channel using leading zeroes on each frame might
>> be fun.
>
> Given that you control the client and the server, why would you need such
> a complicated channel? Just use Web Socket.
>
> There's no way to either send or receive these leading zeros from the API
> anyway, so I don't see how you could use this. (If you just wanted to send
> data from a command-line app, you might as well just use raw TCP instead
> of using an obscure part of Web Sockets.)
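[Editor's note: the redundancy being discussed can be shown directly. This assumes my reading of the draft's length prefix for high-bit frame types: a big-endian sequence of 7-bit groups, where a set high bit means "more length bytes follow". A 0x80 byte contributes zero, so any number of them can pad the front, giving multiple representations of the same value.]

```python
def decode_length(data: bytes):
    """Decode the draft's 7-bit big-endian length prefix.
    Returns (length, bytes_consumed)."""
    length, i = 0, 0
    while True:
        b = data[i]
        length = length * 128 + (b & 0x7F)
        i += 1
        if not (b & 0x80):  # high bit clear terminates the length
            return length, i

# 10 can be written as one byte...
assert decode_length(bytes([0x0A])) == (10, 1)
# ...or padded with redundant 0x80 "leading zero" bytes:
assert decode_length(bytes([0x80, 0x80, 0x0A])) == (10, 3)
```

The number of padding bytes is invisible to the value, which is what makes it usable as a covert channel between a cooperating sender and receiver.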
>
>
>> Causing a buffer overrun might also be possible.  Assume that an
>> implementation assumes that it can reject a frame when its size gets
>> above a certain threshold.  That implementation supports 10K frames.  A
>> bad implementation incorrectly assumes that the size is never going to
>> take more than two octets, so they only read a few to start:
>>
>> Byte[] octets = readSomeOctets();
>> Int size = 0;
>> for( int i = 0; octets[i] & 0x80 == 0x80; ++i) {
>>    size = size << 7 + (octets[i] & 0x7f);
>>    if (size > 10000) { throw error; }
>> }
>>
>> (Yes, that's terrible code.)
>
> Yup. It's not clear to me what you're proposing instead, though. Short of
> just using a fixed-length length field, which I would consider short-
> sighted given the way that network abilities grow over time, I don't see a
> good way to avoid this problem.
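[Editor's note: for contrast, here is a defensive version of the buggy sketch quoted above (names and limits are illustrative). The original has two precedence bugs in C-family syntax: `size << 7 + x` parses as `size << (7 + x)`, and `octets[i] & 0x80 == 0x80` parses as `octets[i] & (0x80 == 0x80)`; it also never folds in the terminating byte's low 7 bits. The version below reads one octet at a time, parenthesises explicitly, includes the final byte, and caps both the value and the number of prefix octets.]

```python
MAX_FRAME = 10_000       # reject frames above this size
MAX_LENGTH_BYTES = 8     # reject absurdly long length prefixes outright

def read_length(read_byte) -> int:
    """read_byte() returns the next octet from the stream."""
    length = 0
    for _ in range(MAX_LENGTH_BYTES):
        b = read_byte()
        length = (length << 7) | (b & 0x7F)  # explicit parentheses
        if length > MAX_FRAME:
            raise ValueError("frame too large")
        if not (b & 0x80):  # terminating byte: high bit clear
            return length
    raise ValueError("length prefix too long")
```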
>
>
>> > More importantly, I don't really see what our options are here.
>>
>> You can stop insisting that implementers are stupid or lazy.  While
>> there's plenty of evidence to the contrary, there's also evidence that
>> stupid or lazy implementations don't live long enough to thrive.
>
> I see no evidence that the "stupid or lazy" programmers "don't live".
> Quite the contrary, it seems to me that most of the Web is built from
> amateurs -- the long tail is almost all amateurs and I see no reason why
> we should ignore them, especially not just for our convenience.
>
>
>> If you are assuming that people are idiots, it might be worth pointing
>> out that calling read() on a TCP socket doesn't return an entire frame
>> always.
>
> As the spec is written, such an assumption is harmless, since the spec
> leads an amateur towards reading one byte at a time.
>
>
>> I've seen code that assumes that one call to read() corresponds to a
>> write() on the other end. When two messages are written close enough
>> together that they get bundled into the same packet, the second message
>> is lost; when a write is too big to fit into the same packet, the tail
>> of that message is lost.
>>
>> How far do you want to go?
>
> Pretty much exactly as far as we have gone.
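[Editor's note: the read()/write() mismatch described above is easy to demonstrate. A sketch of a receiver that buffers arbitrary chunks and extracts complete 0x00 ... 0xFF sentinel frames, regardless of how the bytes were packetised; the class name is made up.]

```python
class FrameReader:
    """Reassembles 0x00 ... 0xFF sentinel frames from arbitrary read()
    chunks, showing that one read() need not match one peer write()."""
    def __init__(self):
        self.buf = bytearray()

    def feed(self, chunk: bytes):
        """Feed whatever read() returned; return any complete messages."""
        self.buf.extend(chunk)
        msgs = []
        while True:
            start = self.buf.find(0x00)
            if start < 0:
                break
            end = self.buf.find(0xFF, start + 1)
            if end < 0:
                break  # frame still incomplete; keep buffering
            msgs.append(bytes(self.buf[start + 1:end]).decode("utf-8"))
            del self.buf[:end + 1]
        return msgs

r = FrameReader()
# two messages written back-to-back arrive in one chunk, split mid-frame:
assert r.feed(b"\x00hi\xff\x00wor") == ["hi"]
assert r.feed(b"ld\xff") == ["world"]
```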
>
>
>> > I don't see what else we can say with respect to handling DOS or
>> > overflow attacks.
>>
>> We can explain how someone might detect such an attack.  We can explain
>> what they might do to mitigate the damage.
>
> I'm happy to add such text; do you have any concrete suggestions?
>
>
> On Fri, 5 Mar 2010, Greg Wilkins wrote:
>>
>> I know you respond to a lot of feedback... but continually merging it to
>> be under a single "Feedback" thread is really not conducive to
>> discussion.  It feels more like you are handing down your decisions from
>> the mountain... and I think contributes somewhat to the excessive heat
>> on this list.
>
> I understand that it may appear that way; as noted, though, other people
> have in fact encouraged me to use this style, so unless I hear otherwise
> from the chairs, I will continue to use this style.
>
>
> On Fri, 5 Mar 2010, Greg Wilkins wrote:
>> > On Wed, 24 Feb 2010, Arman Djusupov wrote:
>> >> We currently use 0x81 as the identifier of the first and subsequent
>> >> frames, and 0x80 as the identifier of the last frame. When a message
>> >> is contained in a single frame then it starts with 0x80.
>> >
>> > Please don't use frame types that aren't defined yet.
>>
>> I thought that a sub protocol was free to use the frame type bytes.
>
> No? What suggests that?
>
>
>> The frame type byte is defined in the protocol, but with a meaning
>> allocated to only 1 bit - leaving the other 7 available for
>> subprotocols.
>
> Oh good lord, no, the frame type is intended to be reserved for future
> revisions of the protocol. The spec is very precise about what can be
> sent, and currently the only frame types that the spec says a peer can
> send are 0x00 and 0xFF. There'd be no point sending anything else since
> the client implementations would ignore them.
>
> (Some people might want to use a Web Sockets-like protocol, that isn't
> Web Sockets but looks a lot like it, for dedicated client-to-server
> communications unrelated to the Web Socket API. For those, this spec is
> irrelevant, though. They would just write their own spec that happened to
> look a lot like Web Sockets.)
>
>
>> >> However, I'd be inclined to split the reservation into "reserved for
>> >> future Web Socket revisions and approved extensions" versus "reserved
>> >> for applications to use as they want".
>> >
>> > Since the API doesn't expose the frame types at all, I don't
>> > understand why any would fall into the second category.
>>
>> Ian, I think you are not considering one of the most likely implementors
>> of subprotocols - browsers!
>>
>> A subprotocol that provides one or more of: multiplexing, compression,
>> fragmentation, etc would be most useful if implemented transparently
>> below the websocket API by the browsers themselves.
>>
>> I certainly do not think that frame bytes should be exposed to the js
>> application, as they are primarily a transport concern and [sic]
>
> I would fully expect us to extend the protocol in the future with more
> features, but that would just be part of the Web Sockets protocol, it
> wouldn't be UA-defined additional frame types.
>
>
>> > If you're exposing a Web Socket server for non-browser UAs, just make
>> > the client send something in the handshake that opts in to using a
>> > different protocol (that looks very similar to Web Socket, but isn't),
>> > and then this whole spec becomes irrelevant, and you can use whatever
>> > mechanisms you want.
>>
>> We don't want an infinite variety of websocket like protocols. One of
>> the huge challenges for websocket is to get the intermediaries and other
>> network appliances to work well with websocket.
>
> People are using BitTorrent, Tor, IMAP, etc, today. Why would future
> non-Web-browser client-side protocols be any different?
>
>
>> Once that is done, then ideally any subprotocol variation can be
>> transparent to intermediaries and they will not need to be updated as
>> new and interesting uses for websocket are invented.
>
> Well certainly the "infinite variety" of protocols you speak of could look
> enough like Web Socket that that would work, but I really don't think that
> should be a concern for this working group.
>
>
>> I think this is the acid test of what should be in the base protocol and
>> what should not.  I.e. the base protocol should not have any feature that
>> can be subsequently implemented without affecting intermediaries.
>
> Why isn't TCP the base protocol?
>
>
>> > On Thu, 4 Mar 2010, Dave Cridland wrote:
>> >> I'd be happy with, at this point, mere reservation of a range for
>> >> protocol purposes, and leaving a range clear for subprotocol usage.
>> >
>> > I don't understand how a subprotocol would ever make use of these
>> > frame types. The API doesn't expose them.
>>
>> I think it is really important that we all come to an understanding of
>> how subprotocols are going to work.  It's not sufficient to boot a whole
>> bunch of requested features "to be implemented in subprotocols", but
>> then every time somebody proposes how a subprotocol could work they are
>> told that they can't do that because it can't be implemented in the
>> javascript API.
>
> I don't understand what is unclear here. Web Socket exposes a mechanism
> whereby strings of text are sent client to server or server to client.
> Connections are identified by a resource name. Subprotocols therefore have
> that to work with.
>
>
>> I know you think that application programmers will write absolutely
>> everything except the browser. Well good for you and I wish you well.
>>
>> But can you also respect the position that many of us want to continue
>> to stand on the shoulders of giants and reuse software developed by
>> others.
>>
>> I think it is entirely reasonable to expect that websocket extensions
>> and subprotocols will be implemented by browsers and server side
>> frameworks and provide a whole range of transport features hidden from
>> the application developer.
>
> Why would that happen separate from this working group?
>
>
> On Fri, 5 Mar 2010, Greg Wilkins wrote:
>> Ian Hickson wrote:
>> > The HTTP auth headers aren't currently added to the Web Socket
>> > connection; this was briefly in the spec but was taken out based on
>> > implementor feedback a few months ago.
>>
>> I think this is shortsighted and you are disabling an existing
>> authentication mechanism without providing an alternative.
>>
>> The upgrade request should be a standard HTTP request and should be able
>> to be authenticated as any other HTTP request.
>
> In practice HTTP requests are authenticated by cookies, which are allowed.
>
>
>> > On Thu, 4 Mar 2010, Greg Wilkins wrote:
>> >> But the problem with this is it assumes success and that a 101 is the
>> >> only possible response.
>> >
>> > Why is that a problem?
>>
>> I explained the problem in the next paragraph!
>>
>> This "solution" will prevent the use of HTTP return status codes. The
>> server will only have the option of sending a 101 or closing the
>> connection.
>>
>> Even if you don't accept the example potential uses I've outlined, it
>> strikes me as shortsighted to disable yet another standard mechanism in
>> the name of protecting against some phantom menace.
>
> I don't think that's the reasoning... The reasoning is simply that it
> isn't needed. If you want to use HTTP, the HTTP spec already exists.
> There's no need for Web Sockets to be involved _unless_ you actually
> connect to a Web Socket server. If you're just doing HTTP-to-HTTP, the
> Web Sockets spec is irrelevant.
>
>
>> >> If the bytes after the request header are sent as request content,
>> >> what attack vector is opened up?
>> >
>> > I think Maciej described the cross-protocol attack in detail last
>> > month, that's probably the best description of the problem I've seen.
>>
>> You are misrepresenting this.
>>
>> Maciej described an attack vector that is easily addressed simply by
>> having a unique ID in the headers that the server needs to include in
>> the response. If the unique ID is generated by the browser, then it will
>> not even be in existence when any injection attack is formulated and
>> thus we are safe unless the ID is predictable.
>
> The key is protecting against HTTP client to Web Socket server attacks,
> e.g. using XHR to send fake Web Socket requests to the server, that trick
> the server into performing actions that were not requested and that, had
> the client actually been a Web Socket client, would not have been done,
> since the connection would not have been accepted. If the key was only
> sent as a single header, then a simple server-side implementation would
> just match a regexp against the entire request and would easily be tricked
> by header-like text smuggled into the path. By having two Sec-prefixed
> headers, requiring that the headers contain one or more spaces, requiring
> that the server look for a newline to terminate the header, and requiring
> that there be text after the request, it becomes extremely hard to cause a
> client to send everything needed to the server to cause any harm.
>
> (We're only worried about Web browsers doing this because they have the
> user's cookies. Direct connections from an attacker aren't a problem since
> they do not have any ambient authority.)
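[Editor's note: a sketch of the server-side challenge computation this describes, paraphrased from memory of the draft under discussion, so treat the details as illustrative rather than normative. The server must combine digits and spaces from both Sec- headers with the 8 post-header bytes; a request whose "headers" were smuggled into a path cannot supply all three inputs with the required structure.]

```python
import hashlib
import re
import struct

def handshake_response(key1: str, key2: str, key3: bytes) -> bytes:
    """key1/key2 are the Sec-WebSocket-Key1/2 header values; key3 is
    the 8 bytes sent after the request headers.  Returns the 16-byte
    challenge response the server would send after its 101."""
    def keynum(k: str) -> int:
        digits = int(re.sub(r"[^0-9]", "", k))  # concatenated digits
        spaces = k.count(" ")                   # must be >= 1
        if spaces == 0 or digits % spaces != 0:
            raise ValueError("invalid key")     # reject non-WebSocket clients
        return digits // spaces
    challenge = struct.pack(">II", keynum(key1), keynum(key2)) + key3
    return hashlib.md5(challenge).digest()
```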
>
>
>> The issue the random bytes after the GET request and after the 101
>> response are trying to handle is that of fast fail.  While fast fail is
>> a desirable attribute, there has been no rigorous explanation of how
>> this "solution" achieves that - nor was there any discussion of
>> alternative ways this could be achieved.
>
> It seemed pretty obvious... intermediaries that don't recognise Web Socket
> would consider the eight bytes part of an unrelated request and would thus
> fail to parse them, rather than sending them as part of the first request.
>
>
>> There has been no evidence or argument presented that this "solution"
>> actually provides fail fast semantics.
>
> Agreed; this is merely hypothetical at this point. Hopefully we will be
> able to test the design before the spec is done.
>
>
>> Yet this speculative "solution" has been added to the spec at the cost
>> of disabling the possibility of using the established HTTP response.
>
> Not sure what this means.
>
>
>> There are a significant number of response codes that may be of value in
>> the handshake, either now or in the future. While there may be some
>> problems with using some of them, those issues should be identified and
>> discussed - not wholesale disabling of the possibility to use them.
>
> I don't think anyone is suggesting we change HTTP. If an HTTP server
> receives a request, it is perfectly legitimately allowed to respond using
> HTTP. Same with an HTTP client.
>
>
> On Fri, 5 Mar 2010, Jamie Lokier wrote:
>> Ian Hickson wrote:
>> > length measurement wrong without realising it (because of only testing
>> > ASCII, but outputting the string length instead of the byte length).
>> > For this reason, and because it is generally much easier,
>> > sentinel-based framing is used for text frames.
>> >
>> > In practice this means that most authors need only implement the
>> > sentinel framing (which is trivial); only more advanced (and thus
>> > competent) authors will need to implement the more complicated
>> > framing.
>>
>> What about "UTF-8 text" that incorrectly, but in reality, contains 0xFF
>> bytes?  This can happen in most scripting languages unintentionally, for
>> example a script which reads lines from a file that "should" be in UTF-8
>> (but really contains some 0xff bytes) and sends them as messages, will
>> result in a frame injection error.
>
> Yes, that is indeed a concern. The spec mentions this explicitly actually.
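[Editor's note: a compact demonstration of the frame-injection hazard described above. If a sender sentinel-frames "text" without validating it, a stray 0xFF (legal in no UTF-8 sequence) terminates the frame early, and attacker-chosen bytes after it can be parsed as a second frame. Function names are made up for illustration.]

```python
def frame_text_naive(payload: bytes) -> bytes:
    """Sentinel-frame 'text' without checking it is really UTF-8."""
    return b"\x00" + payload + b"\xff"

def split_frames(stream: bytes):
    """How a receiver sees the byte stream: 0xFF always ends a frame."""
    frames, cur, in_frame = [], bytearray(), False
    for b in stream:
        if not in_frame:
            in_frame = (b == 0x00)
        elif b == 0xFF:
            frames.append(bytes(cur))
            cur.clear()
            in_frame = False
        else:
            cur.append(b)
    return frames

# A "line read from a file" that is not actually valid UTF-8:
tainted = b"hello\xff\x00injected"
# The sender meant one message; the receiver sees two frames:
assert split_frames(frame_text_naive(tainted)) == [b"hello", b"injected"]
```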
>
>
>> That's even more likely for, say, reading a directory from a filesystem
>> where UTF-8 is the standard name encoding, and sending those names in a
>> message.  Except... in reality, it's easy for someone to subvert it by
>> putting something non-UTF-8 in there.
>
> Indeed, if there's any non-Web Socket way of getting data into the app, it
> may be possible to get an invalid 0xFF in.
>
>
>> > On Thu, 18 Feb 2010, Jamie Lokier wrote:
>> > > But that ignores something: An endpoint shouldn't be sending frame
>> > > types that the other end doesn't know about.  So there is no need to
>> > > specify how to discard binary frames, unless there is a general
>> > > principle of discarding frames with unknown type byte too.
>> >
>> > There _is_ a general principle of discarding frames with unknown type
>> > bytes.
>> >
>> > We need to make sure that today's clients don't handle tomorrow's
>> > servers in unpredictable ways, because if they do, then we might
>> > never be able to upgrade the protocol, due to Web pages depending on
>> > particular behaviours, the same way that, for example, many Web pages
>> > depend on browsers doing browser sniffing, or on browsers ignoring
>> > Content-Location headers.
>>
>> But that's what the subprotocol negotiation is for.  We should encourage
>> its use, or spell out the circumstances in which it's useful for an
>> endpoint to speculatively use frame types that the other end might not
>> recognise.
>
> I don't see how subprotocol negotiation would be relevant here.
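[Editor's note: the "discard frames with unknown type bytes" rule referred to above can be sketched as follows. This is my paraphrase of the draft's two framing families, so treat it as illustrative: types with the high bit set carry a 7-bit length prefix, types with it clear run to a 0xFF sentinel; either way an unknown frame can be consumed and dropped without desynchronising the stream.]

```python
def skip_unknown_frame(frame_type: int, read_byte) -> None:
    """Consume and discard one frame of an unrecognised type.
    read_byte() returns the next octet from the stream."""
    if frame_type & 0x80:
        # Length-prefixed family: decode the length, then skip that many.
        length = 0
        while True:
            b = read_byte()
            length = (length << 7) | (b & 0x7F)
            if not (b & 0x80):
                break
        for _ in range(length):
            read_byte()  # discard the payload
    else:
        # Sentinel family: discard until the 0xFF terminator.
        while read_byte() != 0xFF:
            pass
```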
>
>
>> > > One thing that *really* needs to be right early on is making sure
>> > > future proxies know the frame boundaries, for efficient forwarding.
>> > >
>> > > If the spec didn't describe binary frame delimiting, we could find,
>> > > in a few years, that we'd have no choice but to use UTF-8 for
>> > > everything, even having to encode binary messages into Unicode text,
>> > > simply because some parts of the net would have proxies parsing
>> > > WebSocket frames and failing to forward anything else properly or
>> > > with useful timing/buffering.
>> > >
>> > > That's not stated in the spec, but it's an implied consequence of
>> > > the binary frame rule being described that WebSocket-aware proxies
>> > > will probably apply it.  Perhaps it should be explicit.
>> >
>> > I don't see why we'd want any client-side proxies other than SOCKS
>> > proxies or HTTP proxies (via CONNECT), and on the server-side it seems
>> > that upgrading dedicated Web Socket-aware proxies would be relatively
>> > easy in this kind of scenario.
>>
>> It's not about what you/we want - it's what we'll get *anyway*, because
>> (a) other people will insist on firewalling your WebSocket connections,
>> to make sure you can only send non-pornographic text messages or
>> whatever, and (b) there are significant transport optimisations possible
>> using proxies, so they will be deployed.
>>
>> I'm already finding that I have to write a client-side proxy - to merge
>> keepalives from multiple WebSocket connections, and reduce the number of
>> TCPs to a manageable level for a slow network.  I anticipate installing
>> it as a site-wide TCP-intercepting proxy.  That's an example of
>> transport optimisation using a proxy - invisible to the actual clients
>> and servers, and they can't avoid it.  By the way, you mentioned
>> "author-friendly" elsewhere as a criterion.  This is very author
>> friendly because the endpoints don't know or care.
>>
>> If I'm already looking at this at this early stage, then I won't be the
>> only one doing it when it's widely deployed.
>
> I don't really understand what you're asking for here. Are you suggesting
> we should define a third conformance class in the spec for
> man-in-the-middle proxies? I suppose we could do that. Would it need any
> requirements other than passing through the bytes unchanged?
>
>
>> If I send the frame length 2^128, so that I can follow it with any
>> amount of binary data (effectively turning the connection into a byte
>> stream), I am sure that some future implementation, perhaps a proxy,
>> will treat it as zero and then break.
>
> Well there's no binary sending or receiving mechanism at all right now, so
> it's not yet clear how it'll work, but I would presume that to send it'll
> merely take a File or Blob object or whatever JS binary type is made, and
> send it, and to receive it'll wait for the whole frame and then create a
> Blob or whatever JS binary type is made, and fire a single event. So you
> wouldn't be able to stream things in a single frame; the API wouldn't
> expose it that way.
>
> (If you're doing that in a non-browser setting, i.e. you've made an API
> that is specifically for non-browser clients, then the Web Sockets spec is
> irrelevant -- you can just make your own protocol that looks as close to
> Web Sockets as you want, and just call it your own protocol. No need to
> worry about the Web Sockets spec.)
>
>
>> There is also the open question of what buffering and forwarding
>> behaviour is expected, both of proxies, and of receivers.  If I send a
>> large frame incrementally (i.e. by writing bytes), is it permitted to
>> buffer it with unlimited delay until the end of the frame, or must the
>> received bytes be passed on to the application, or the next hop in the
>> case of a proxy?
>
> For the client (using the API), the data is exposed as an event, so
> there's no choice.
>
> For the server, there doesn't seem to be any particular practical
> difference from an interoperability perspective, it's just an
> implementation detail.
>
> For a man-in-the-middle proxy, either behaviour can naturally occur due to
> regular network weather, so I don't think there's much point requiring one
> or the other.
>
>
>> > On Wed, 24 Feb 2010, Arman Djusupov wrote:
>> > >
>> > > I have a use case which doesn't seem to have been taken into account
>> > > in the spec. When large binary messages of initially unknown size
>> > > are getting streamed over a connection, the transport does not know
>> > > the final size of the binary message when it starts serializing it
>> > > and so it cannot encode the message's length at the start of the
>> > > binary frame. In our implementation we handle such cases by
>> > > buffering data until the buffer is full and then flushing it into a
>> > > frame of specific length; subsequent data of the same message are
>> > > sent in the following frames.
>> >
>> > Currently there's no way to send binary data with Web Socket at all,
>> > but going forward, when we add binary support, if one of the use cases
>> > is sending binary data of unknown size, we can definitely use a
>> > chunk-based encoding scheme.
>> >
>> > Whether it's useful or not depends on what the API for binary data
>> > ends up looking like. If the API only exposes binary data using Blob
>> > objects, then there's not really a reason to avoid requiring that the
>> > server prebuffer all the data ahead of time to determine the length.
>> > If we expose the data using something akin to the Stream object
>> > currently in the HTML spec, then we'd probably want to make it chunks
>> > of fixed (maximum) size. Since the API doesn't have any binary support
>> > at all right now, though, it's probably premature to worry about this.
>>
>> What does the API have to do with it?
>>
>> Arman's question was not about web browsers as far as I can tell.  It
>> was about non-browser clients, using other languages.
>
> The reason for this protocol is to have two-way communication between a
> script in a Web browser and a server. This is exposed through that API.
>
> Now you can of course, once you have defined a subprotocol that uses
> Web Sockets for use between a Web page and a server, also talk to that
> server from a command-line app, but I don't see why we would add features
> to the protocol for that case specifically. If that use case is so
> important, then the server should just provide a dedicated protocol
> specifically for those clients. It could look like Web Sockets, if that is
> easier (though frankly I think Web Sockets makes a pretty awful generic
> mechanism compared to raw TCP), but there's no reason to claim it _is_
> Web Sockets, and so the Web Sockets spec is irrelevant here. It's
> effectively just a dedicated custom protocol, and can be specified thus.
>
>
>> > > We currently use 0x81 as the identifier of the first and subsequent
>> > > frames, and 0x80 as the identifier of the last frame. When a message
>> > > is contained in a single frame, it starts with 0x80.
>> >
>> > Please don't use frame types that aren't defined yet. If you are doing
>> > something for use unrelated to the Web Socket API, there's really no
>> > reason not to just use TCP. You can use the same framing as Web
>> > Sockets (though I don't see why, it's not that great for general
>> > purposes), but if you use Web Socket itself, it's just going to cause
>> > problems for you when the API is updated to do binary.
>>
>> Are frame types free to use for applications and/or protocol extensions
>> now, or are they reserved for future WebSocket specifications?
>
> They are all reserved for future Web Socket specifications.
>
>
>> That needs to be made clear, because it's clear people are starting to
>> use them for protocol extensions.
>
> I've updated the spec to make it clear.
>
>
>> > On Mon, 1 Mar 2010, Dave Cridland wrote:
>> > > This behavioural flux, though, is an excellent argument for putting
>> > > keepalive behaviour into the core, as the people designing the
>> > > frameworks are likely to have good ideas of the defaults, and keepalive
>> > > frequencies will need to be controlled per-deployment, typically, rather
>> > > than per-application.
>> >
>> > We could reserve a frame byte for control messages (server talking to the
>> > browser rather than to the script), with the null message being put aside
>> > for keep-alive purposes, if people think that would be especially useful.
>> > I presume we would want the server in charge of sending keepalives rather
>> > than the browser or the script?
>>
>> Keepalives and timeouts (they go together) are used for three
>> equally critical things:
>>
>>    - Keeping TCP open over NATs.
>>
>>    - Detecting when the connection is broken at the server, so you can
>>      clean it up instead of running out of memory and crashing the
>>      server after a few days.  (The Tic-Tac-Toe demo has this problem.)
>>
>>    - Detecting when the connection is broken at the client, so it
>>      can initiate another one.  (Interactive applications need this.)
>>
>> To achieve all three, it's necessary to have keepalive messages sent
>> from both sides.  It is not enough to send a message from one side,
>> and rely on the TCP ACK coming back.  (That'll keep the NAT open, but
>> it won't allow both sides to detect broken connection in a reasonable time.)
>>
>> This can be done as a "ping" style request+response, or each side can
>> transmit independently when it hasn't transmitted anything for a
>> keepalive interval.
>>
>> For a given NAT timeout, the "ping" style uses more bandwidth than the
>> "independent" style if the client and server's broken-connection
>> timeouts are quite different.  This is because you must reduce the
>> keepalive interval by more due to the extra jitter from pinging, and
>> because it causes a message in both directions resulting in 3 TCP
>> packets, when 2 packets would be enough most of the time.
>>
>> When the server and client's broken-connection timeout needs are the
>> same, then the "ping" style uses less bandwidth than "independent" style.
>>
>> Note that applications like, say, Facebook and Gmail would be better
>> off with asymmetric broken-connection timeouts and therefore using the
>> "independent" style.
>>
>> In some applications, keepalive bandwidth is actually the main
>> bandwidth consumer, and it gets quite expensive, so minimising it
>> is quite desirable.
>>
>> What I conclude from this is that each application must be able to
>> decide for itself what keepalive strategy it's going to use, and it
>> must not be assumed that server-initiated pings are always a good choice.
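The "independent" style argued for above can be sketched as a small state machine: each side sends a keepalive whenever it has been silent for its own send interval, and declares the connection broken when it has heard nothing for its own (possibly different) timeout. All names here are illustrative, not from any spec.

```python
# Minimal sketch of the "independent" keepalive style: each side keeps its
# own send interval and dead timeout, so the two ends can be asymmetric.
class KeepaliveState:
    def __init__(self, send_interval, dead_timeout):
        self.send_interval = send_interval  # send a keepalive after this much silence
        self.dead_timeout = dead_timeout    # declare broken after this much silence
        self.last_sent = 0.0
        self.last_heard = 0.0

    def on_send(self, now):        # call after sending any message, not just keepalives
        self.last_sent = now

    def on_receive(self, now):     # call after receiving any message
        self.last_heard = now

    def should_send_keepalive(self, now):
        return now - self.last_sent >= self.send_interval

    def is_broken(self, now):
        return now - self.last_heard >= self.dead_timeout
```

Because any application traffic refreshes the timers, keepalives are only sent when the connection is otherwise idle, which is what keeps the bandwidth cost down relative to a mandatory ping/pong.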
>
> These cases all sound like the script talking to the server, not the
> browser talking to the server, and thus can just be done by defining such
> mechanisms in the subprotocol. For example, if the protocol is some chat
> protocol, then the subprotocol could be something like:
>
> Client-to-server:
>   MSG <buddy-id> text...
>   ADD <buddy-id>
>   ACCEPT <buddy-id>
>   REJECT <buddy-id>
>   PING
>
> Server-to-client:
>   MSG <buddy-id> text...
>   ADD-REQUEST <buddy-id>
>   STATE <buddy-id> <state>
>   PING
>
> Here, "PING" is the message that would be used as the keepalive.
>
> We should only add things to the core Web Socket protocol if it's
> something that will really apply to all subprotocols. Otherwise, we're
> adding an implementation burden on server-side implementors who don't need
> the feature but still have to deal with it when the client uses it.
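A server handling the line-based subprotocol above might dispatch on the first token and treat PING purely as a keepalive, with no application-visible effect. This is a sketch only; the handler wiring and names are invented for illustration.

```python
# Sketch of server-side dispatch for a line-based subprotocol in which
# "PING" is consumed at this layer as a keepalive and everything else is
# handed to an application handler. Handler names are illustrative.
def dispatch(line, handlers, on_keepalive):
    verb, _, rest = line.partition(" ")
    if verb == "PING":
        on_keepalive()  # refresh the liveness timer; no reply required
        return None
    handler = handlers.get(verb)
    if handler is None:
        raise ValueError("unknown verb: " + verb)
    return handler(rest)

# Example wiring:
seen = []
handlers = {
    "MSG": lambda rest: seen.append(tuple(rest.split(" ", 1))),
    "ADD": lambda rest: seen.append(("add", rest)),
}
```

This keeps the keepalive decision entirely inside the subprotocol, which is Ian's point: nothing in the core framing needs to know about it.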
>
>
>> > IMHO, *we are responsible* for helping authors make the right choice
>> > here, even if in some cases that means simply not giving them a
>> > choice.
>>
>> So why do you want to push things like message dispatch, flow control,
>> and cooperation between independent components on a page onto the
>> application authors, instead of helping to make the right choices with
>> those technical things?
>
> Because forcing these features on everybody introduces an implementation
> burden on server-side implementors who don't need the features.
>
>
>> > > However, I'd be inclined to split the reservation into "reserved for
>> > > future WebSocket revisions and approved extensions" versus "reserved
>> > > for applications to use as they want".
>> >
>> > Since the API doesn't expose the frame types at all, I don't
>> > understand why any would fall into the second category.
>>
>> The API is not relevant: because the discussion here is all about
>> implementations that aren't using the WebSocket API at all.
>
> While I think we should make it possible to use the API from a non-browser
> client, I don't see why such a client would need to use Web Sockets if its
> needs are so involved that the server has dedicated code to handle it that
> is not used by its browser-side clients.
>
> If you are writing dedicated server code for non-browser clients, then you
> don't need Web Sockets. Just use TCP.
>
>
>> > If you're exposing a Web Socket server for non-browser UAs, just make
>> > the client send something in the handshake that opts in to using a
>> > different protocol (that looks very similar to Web Socket, but isn't),
>> > and then this whole spec becomes irrelevant, and you can use whatever
>> > mechanisms you want.
>>
>> I'm guessing the intention is that they want to use WebSocket so that,
>> eventually, browsers will be able to speak to the services they've
>> deployed without having to rewrite those services.
>
> That's quite reasonable, but then you don't need to use frame types, since
> those frames aren't going to be used by those services.
>
>
>> And to reuse the inevitable Web Socket client APIs that will appear
>> quite soon for Java, .NET, Python, Perl etc.
>
> Why not use the already existing TCP client APIs?
>
>
>> Or maybe it's just a herd instinct, using the protocol even though
>> it's not well suited to their problem? ;-)
>
> We shouldn't attempt to make the protocol suit problems that are _by
> definition_ not problems for which the protocol is suited. That makes no
> sense.
>
>
>> > > If the handshake messages are to contain content, then they MUST
>> > > have headers to indicate the content length to be legal HTTP: in
>> > > this case, a Content-Length header would be appropriate
>> >
>> > The handshake messages don't contain content. Should I add a
>> > "Content-Length: 0" field to the handshake to make this clearer?
>>
>> Definitely not, because something will inevitably parse the following
>> data as the beginning of another HTTP response if you do that.
>
> That's the idea.
>
>
>> One argument in favour of length-delimiting is that forwarders
>> (including the APIs) don't have to repeatedly examine the data at all to
>> ensure its integrity, e.g. for things like parsing and inserting
>> transport control messages (you mentioned them earlier).  Sentinels
>> force each component to examine every byte that it passes along, which
>> is a noticeable cost under high load - especially for large messages.
>> (Modern OSes and hardware can move data between files and among network
>> ports without the CPU ever loading it into its cache.)
>>
>> That's a practical efficiency thing.  Google people have said that SPDY
>> experiments showed length-delimiting is generally faster too.
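The cost difference described above is easy to see in code: a sentinel reader must inspect every payload byte looking for the terminator, while a length-prefixed reader can take the payload in one exact-sized read without looking at it. The framing details here (0xFF sentinel, 4-byte length) are illustrative simplifications.

```python
# Sketch contrasting the two framing styles: the sentinel reader scans
# every byte for the 0xFF terminator; the length-prefixed reader copies
# the payload without inspecting it.
import io
import struct

def read_sentinel_frame(stream):
    out = bytearray()
    while True:
        b = stream.read(1)  # every payload byte passes through this scanner
        if b == b"\xff" or not b:
            return bytes(out)
        out += b

def read_length_frame(stream):
    (n,) = struct.unpack(">I", stream.read(4))
    return stream.read(n)   # payload taken in one read, never examined
```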
>
> In practice I would imagine that anyone with such needs is going to wait
> until Web Sockets supports compression (probably the first thing to be
> added once we have the basic protocol figured out), at which point they'll
> never need to see text frames at all.
>
> Amateur programmers (the ones more likely to just use text frames) aren't
> generally going to have so much traffic that it matters.
>
>
>> > > Up above, you implied this was rare.
>> >
>> > I believe this problem would be less common than the problem of people
>> > misimplementing string measurement, yes.
>>
>> I think you're probably right, but I would vote for length-delimiting on
>> efficiency grounds anyway.
>
> I completely agree that if amateur programmers were not a priority here,
> we'd just use length encodings.
>
>
>> If you wanted to be more secure, it's possible to protect against both
>> issues at once: Require a leading byte length *and* 0xff sentinel following.
>
> I don't really see how that would work, unless we made them alternatives,
> but then we'd still have to pick one for the client to use. I'm not sure
> how that would really solve the problem.
>
>
>> That won't make amateur code magically work with non-ASCII characters,
>> but it will protect better against injection attacks taking advantage of
>> amateur code.
>
> That just seems like it would become the worst of both worlds, with
> complicated error-handling rules for when they didn't match.
>
>
>> > > > Leading zeros are not disallowed. (There wouldn't be much point
>> > > > disallowing them as far as I can tell.)
>> > >
>> > > Yes, because multiple representations of the same values have *never*
>> > > caused a security issue.
>> >
>> > On the contrary, it's often the source of problems. However, in this
>> > case, I don't really see how it could be. Do you have an attack model
>> > in mind?
>>
>> How about sending a continuous stream of leading zeros, and causing
>> denial of service because the parser author didn't think to empty the
>> input buffer while parsing the number, or uses a quadratic parsing
>> (repeatedly start from the beginning on receiving more input) because
>> they expected the number to be a short byte sequence?
>
> If we're worried about that kind of attack, you could just as easily send
> an infinite handshake. Why is it a problem here but not with the
> handshake?
>
>
>> Actually I think we should support split-message chunks and then define
>> a fixed maximum chunk size of 2^31-1.  It wouldn't limit message size,
>> only chunk size.  It's much more likely to be treated correctly by real
>> implementations.
>
> Well we don't do binary frames at all yet, but in principle for binary
> files I would agree. For something like just a compressed text frame, I
> think it'd be better to keep it as one frame, as if it had been a text
> frame. That's probably a discussion for once we have the basics pinned
> down, though; the current design doesn't preclude this kind of approach.
>
>
>> > More importantly, I don't really see what our options are here. We can
>> > certainly ban the client sending leading nulls -- and indeed we have,
>> > since the client isn't allowed to send binary data at all currently,
>> > let alone binary data with leading nulls in the length. But what
>> > should the server do when faced with a buggy or hostile client? If we
>> > don't define how to handle this, but say that servers should expect it
>> > and close the connection if they see it, then it's clear to me that
>> > most implementations would _not_ check for it (that has been my
>> > experience with other languages and protocols -- asking implementors
>> > to check for something for the purposes of failing usually leads to
>> > them ignoring the requirement on the basis that doing something else
>> > is "more reliable"). So that leaves us with the choice the spec has:
>> > define it in a way that is trivial to implement interoperably (it's
>> > just an intrinsic part of handling valid data), and which is
>> > apparently harmless.
>>
>> The "something else" is only "more reliable" if these things ever occur in
>> the wild.  If it's very clear from the start that there are no leading
>> zeros, nothing will need to parse it, and nothing will send it because
>
> Agreed so far...
>
>> they get rejected when they first write code which does so.
>
> ...but not with this. If we don't define how they are handled, as I
> described above, then what happens will vary from implementation to
> implementation. By defining it in the spec we ensure that it will be
> tested when we write the test suite, and thus that servers are at least
> going to be exposed to it if their developers look at the test suite.
>
>
>> > > So, your defined behaviour in the face of a DOS or overflow attack
>> > > is to, what? Just work?
>> >
> The spec explicitly says that "If the user agent is faced with content that
>> > is too large to be handled appropriately, runs out of resources for
>> > buffering incoming data, or hits an artificial resource limit intended
>> > to avoid resource starvation, then it must fail the WebSocket
>> > connection". It furthermore says that "Servers may close the WebSocket
>> > connection whenever desired". I don't see what else we can say with
>> > respect to handling DOS or overflow attacks.
>>
>> I'm concerned about client APIs, server APIs, and network proxies,
>> that may pass along data incrementally and have to keep a running
>> counter based on the frame length they have parsed (or sent).
>>
>> That's how large length-delimited frames are likely to be handled in
>> practice inside frameworks.  And WS-parsing proxies must handle any
>> legitimate frame size.
>>
>> Both are quite likely to break when presented with a length which
>> doesn't fit into 2^32 or 2^64, depending on the variable size they use
>> internally.  By not specifying a maximum, I think we're encouraging
>> this as a failure point - in the same way that old HTTP agents crashed
>> when faced with a Content-Length >= 2^31 - because the authors didn't
>> know what was a sensible limit to implement, but practically they had
>> to use some fixed size type for it - a "bignum" implementation would
>> be ridiculous despite being the only formally correct thing to do.
>
> The server-side developer is also the subprotocol writer, so they do know
> what the limit is; they decide it when designing their protocol.
>
> Bypassing man-in-the-middle proxies entirely by using TLS seems like the
> best solution if your protocol uses frames so big that those proxies are
> going to screw it up.
>
>
>> That's why I'm thinking that an explicit maximum length of 2^31-1 per
>> chunk, combined with permitting messages composed of multiple chunks
>> (because that's useful for other reasons anyway), is the safest choice
>> coming to mind at the moment.  For the length-delimited frames, that is.
>> Sentinel-delimited frames can already be any length.
>
> Well, right now the only length-delimited frame that's allowed has a
> maximum allowed length of zero. But I agree that when we add binary frames
> it might make sense to have some limits.
>
>
>> Note that this is *different* from running out of resource limits or
>> protecting against starvation, because this is just about counting, so
>> it doesn't come under the section which talks about resource limits -
>> and it shouldn't.
>
> Using a four-byte word for counting and overflowing that count is
> technically a resource limit. :-)
>
>
> On Thu, 4 Mar 2010, Scott Ferguson wrote:
>>
>> A strawman for discussion purposes.
>
> I'm not really sure what problem this strawman is intended to address. Can
> you elaborate?
>
>
> On Fri, 5 Mar 2010, Greg Wilkins wrote:
>>
>> I think we need to decide who will be doing the extending and creating
>> new subprotocols.
>>
>> Ian appears to be advancing the position that if it can't be done in
>> javascript via the websocket API, then it can't be done.  Moreover, that
>> if you are not doing it in javascript then you shouldn't be using
>> websockets.
>
> That's an oversimplification of my position.
>
> My position is this:
>
> Web browser vendors want to expose to Web page scripts the ability to open
> essentially a TCP connection to arbitrary servers, to replace the
> combination of XHR and the "infinite <iframe>" hack people use today. This
> obviously would have huge security problems, so instead they are willing
> to settle on a compromise: layering over TCP a minimal handshake to
> enforce an "origin"-based security policy with explicit in-band server
> opt-in. In addition, to avoid having to expose stream APIs to scripts, it
> is desired to make the mechanism use packets (frames) rather than exposing
> a stream. There's also a desire to make this use ports 80 or 443 and be
> able to use this mechanism to talk to servers who currently expose HTTP or
> HTTPS on those ports.
>
> Web Sockets is intended to address this use case.
>
> In addition, there is naturally a desire to make it possible for authors
> to use whatever solutions get developed in this space to talk directly to
> dedicated clients as well as talking from scripts, so that servers who do
> not support a dedicated TCP-based protocol but do expose a Web Socket-
> based protocol for their Web page's scripts can still be used from
> dedicated client programs. (This is similar to how JSON APIs intended for
> JavaScript, e.g. the Twitter API, are also usable from native clients.)
>
> Once a server starts writing dedicated code to handle connections from
> dedicated clients, though, there's no need for the origin-based model or
> the frames, and therefore no reason why the server can't just provide a
> dedicated protocol unrelated to Web Sockets.
>
>
> Now it's quite possible that there is a need for another protocol also,
> one that addresses some native-client-specific needs that I'm not aware
> of. It's also possible that the majority of this working group's members
> are interested in such a protocol, and not in the protocol needed by
> browser vendors as described above. If that is the case, then we should
> work on that protocol, and the browser vendors' needs shouldn't be a
> concern. IMHO it would be a huge mistake to try to take two such unrelated
> use cases and try to address them simultaneously in one protocol. It would
> be equivalent to trying to address the use cases that led to IMAP and HTTP
> simultaneously in one protocol.
>
>
>> Examples that come to mind include:
>>
>>   Browser implementing a simple multiplexing standard
>>   to make all websockets from the same tab/window share
>>   a common connection.
>>
>>   Browser implementing compression.
>
> Why wouldn't we just build these straight into the Web Socket spec? Surely
> if browsers just implement their own random extensions, it would only be
> for experimentation, since it wouldn't have a chance to interoperate with
> other browsers and servers.
>
>
>>   Firewalls acting as aggregators and combining
>>   multiple base connections into fewer multiplexed
>>   connections to the business servers.
>
> Assuming you mean on the server-side, these are just considered part of
> the server from the spec's point of view.
>
>
>>   Appliances doing SSL offload and converting wss to ws
>>   connections with injected certificate information.
>
> Same here.
>
>
>> I also think that there will be extensions done in javascript by
>> frameworks/applications, but by definition they work above the API and
>> need no more support from the base protocol than the API already
>> provides.
>
> Right. (Not sure what you mean by "extension" here though.)
>
>
> On Fri, 5 Mar 2010, Salvatore Loreto wrote:
>>
>> 1) do we want WebSocket protocol to be an extensible protocol?
>
> Extensible by whom? In what sense? It's clear that we need to make sure
> the protocol can be extended for future versions of the protocol itself,
> in a backwards- and forwards-compatible fashion, and it also seems clear
> that we need to make sure clients and servers can experiment with such
> proposals (not in the wild) to see how they would work. Is that what you
> mean?
>
>
> On Mon, 8 Mar 2010, Vladimir Katardjiev wrote:
>>
>> First, a copy/paste error. Page 31 seems to be missing
>> Sec-WebSocket-Key2.
>
> Oops, that was a bug in my HTML-to-text converter. Fixed.
>
>
>> Same page, you probably meant | instead of " in
>>
>>       (such as |Cookie")
>
> Also a bug in my script! Fixed.
>
>
>> Also, this might be just my reading comprehension, but the following
>> paragraph caused a double-take
>>
>>       if a server assumes that its clients are authorized
>>       on the basis that they can connect [...] then the server
>>       should also verify that the client's handshake includes the
>>       invariant "Upgrade" and "Connection" parts of the handshake, and
>>       should send the server's handshake before changing any user data.
>>
>> Checking Upgrade and Connection is redundant if Sec-WebSocket-* are
>> present from a browser PoV; that was the point of them, no?
>
> Yeah, that note is obsolete now. Removed.
>
>
>> Right, time to graduate from reading comprehension into actual
>> questions. The big addition obviously Sec-WebSocket-Key1/2. Are there
>> any fundamental limitations on where spaces can and should occur in the
>> header? According to my interpretation as it stands, the following
>> string is a perfectly valid key: "  4abcd". Is this intended? And, if
>> so, can the leading (or trailing, if so inclined) spaces cause issues
>> when parsing by proxies/existing stacks?
>
> I shouldn't think so, at least not with any proxies or existing stacks
> that are compatible enough in other ways to not screw up Web Sockets in
> general. However, I'm just guessing here.
>
>
>> Also, I must've missed the big discussion, but what's the rationale of
>> having two key headers. One I can buy, but since it should already be
>> impossible to create a Sec-* header from a browser, AND it is impossible
>> to add spaces to the header value, AND it is a GET with a body, it's
>> already triple impossible (impossibler? impossiblest?) for XHR/CORS to
>> forge. If someone has a handy link to where this suggestion was born
>> it'd be appreciated, because this seems like quite a bit of computation
>> involved.
>
> I originally had just one header:
>
>  GET /demo HTTP/1.1
>  Host: example.com
>  Connection: Upgrade
>  Sec-WebSocket-Key: 12345
>  Sec-WebSocket-Protocol: sample
>  Upgrade: WebSocket
>  Origin: http://example.com
>
> However, the obvious (and wrong) way to implement this is to read the
> entire header, and then do something like (in this case in Perl):
>
>   $header =~ m/Sec-WebSocket-Key: (.+)\r\n/;
>
> ...or some such. Now consider this request:
>
>  GET /demo?Sec-WebSocket-Key: HTTP/1.1
>  Host: example.com
>  ...
>
> The simplest implementation is thus vulnerable to a trivial attack.
>
> As defense in depth, I introduced several overlapping solutions to this:
>
>  * There are two keys, both of which have to end in \r. This means you
>   have to find two places to smuggle data into the connection, not just
>   one, and they have to be on different lines.
>
>  * The keys have to include at least one space, and this is verified by
>   requiring the implementation to divide by that count -- if there aren't
>   enough spaces and you didn't check for it, you'll get a divide-by-zero
>   in many languages.
>
>  * Non-numeric characters are stripped, so that the division is definitely
>   done on a number, so that any side-effects of dividing a string or NaN
>   or something like that are avoided in the smuggling case.
>
>  * The client is required to randomly insert the spaces, as well as random
>   non-numeric characters, and is required to randomly shuffle the
>   headers, so servers can't rely on any one convention, they have to at
>   least do a half-hearted attempt at really parsing the header.
>
>  * The key also involves eight bytes after the handshake, so you have to
>   properly determine the end of the handshake, so things can't be
>   smuggled into the body, even if you otherwise ignore frames from the
>   client.
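Putting the pieces above together, the server-side check might look like the following sketch (my reading of the draft-76 scheme: strip non-digits, divide by the space count, combine the two 32-bit results with the 8 bytes that follow the header, and hash with MD5; the function names are invented for illustration).

```python
# Sketch of the server's key verification as described above. A malformed
# key with no spaces raises ZeroDivisionError, which is exactly the "trap"
# for lazy implementations Ian describes. Assumes the divided results fit
# in 32 bits, as valid clients are required to guarantee.
import hashlib
import struct

def key_to_number(key):
    digits = "".join(c for c in key if c.isdigit())
    spaces = key.count(" ")
    return int(digits) // spaces  # no spaces -> ZeroDivisionError, by design

def handshake_response(key1, key2, key3):
    """key3 is the 8 raw bytes the client sends after its header."""
    challenge = struct.pack(">II", key_to_number(key1), key_to_number(key2)) + key3
    return hashlib.md5(challenge).digest()
```

Note how every defence in the list surfaces here: a smuggled, space-free key blows up the division, stray non-digits are stripped before the arithmetic, and the 8 trailing bytes force the server to find the true end of the handshake.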
>
>
>> The junk characters in the handshake also seem redundant. If you already
>> need to verify the correct number of spaces this means you are checking
>> for spaces already. As for the garbage characters, all they made me do
>> was replace("[^0-9]", '') instead of replace(' ', '') so I'd say they
>> didn't alter the functionality of my code other than take up more CPU
>> cycles.
>
> They're intended to make sure that the behaviour is well-defined even if
> the key header is being smuggled in somewhere. Otherwise, different
> servers might use different ways of handling invalid handshakes, and that
> is likely to lead to some being more vulnerable. Now we can't entirely
> prevent this, obviously, but we can at least help a little here.
>
>
>> The closing handshake is also interesting. Maybe state in 5.3. that the
>> server shouldn't disconnect on 0xFF (instead of may; it still may, but
>> is discouraged from doing so since it may cause data loss. If the target
>> audience really is the kind of programmer you need to verify their code
>> by actually ensuring they read the spaces, then make the default level
>> of the protocol ensure data integrity, because that's what they'll see
>> in the lab network).
>
> Good point. Fixed.
>
>
>> Finally, and probably my smallest serious gripe, but what's up with the
>> mix of references to ASCII and UTF-8 as characters. It is all
>> technically correct, to nobody's surprise, but it's distracting to read.
>> I hope it's just a transitional thing and it'll eventually be UTF-8
>> throughout but it was just too boring to do it all at once.
>
> Not sure what you mean.
>
>
>> Page 20:  EXAMPLE: For example
>
> Yeah, there's a few of these. I'll look into what to do about it. Maybe I
> should indent the examples rather than prefixing them with "EXAMPLE:"?
>
>
>> Page 22: -> If the byte is 0x3A (ASCII :)
>>
>> The internet has damaged me so badly this was a smiley face.
>
> That, I can't help you with. :-)
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
> _______________________________________________
> hybi mailing list
> hybi@ietf.org
> https://www.ietf.org/mailman/listinfo/hybi