Re: [hybi] WebSockets feedback (Was: Bayeux / Jetty perspective.)

Ian Hickson <ian@hixie.ch> Fri, 03 April 2009 21:06 UTC

Date: Fri, 03 Apr 2009 21:07:23 +0000
From: Ian Hickson <ian@hixie.ch>
To: Paul Prescod <paul@prescod.net>, Maciej Stachowiak <mjs@apple.com>
In-Reply-To: <DC8B3ACC-329A-40A4-A41D-70F580350CDA@apple.com>
Message-ID: <Pine.LNX.4.62.0904032027050.25082@hixie.dreamhostps.com>
References: <49D1AE22.1080409@webtide.com> <Pine.LNX.4.62.0903310624180.25058@hixie.dreamhostps.com> <49D5AA5F.8030503@webtide.com> <Pine.LNX.4.62.0904030757200.25082@hixie.dreamhostps.com> <DC8B3ACC-329A-40A4-A41D-70F580350CDA@apple.com>
Content-Language: en-GB-hixie
Content-Style-Type: text/css
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Cc: hybi@ietf.org

On Fri, 3 Apr 2009, Paul Prescod wrote:
>
> In general, the benefit of using an existing protocol is that you take 
> advantage of a lot of infrastructure and in particular of protocols 
> built ON TOP of that protocol. Maybe that doesn't matter for train 
> control, but it matters a lot for webchat, which as far as I'm concerned 
> is the single biggest application of a "server->client" protocol.

It's _an_ application, but I don't think it's the biggest. I would expect 
games to be bigger, for example, and would expect background update 
notifications in systems like Google Calendar to be even more common.

But it's hard to say in advance.


> but I'll make the case for HTTP because I'm more familiar with it:
> 
>  * new specification is very short and simple (e.g., not much more than [1])

The Web Socket draft is pretty simple.

I'm not convinced that it's really a fair comparison, either. rHTTP as a
protocol is remarkably complex, in that it includes by reference the whole
of HTTP, which is itself far from simple.


>  * client-libraries and intermediaries are more easily adapted (e.g. 
> Curl, httplib, ...)

One advantage of Web Socket is that the protocol is so simple that there's
no need to adapt an existing library; new code can easily be written to
support the protocol.


>  * domain knowledge is easily applied ("it's just like writing a PHP script
> except you use Javascript and listen to Firefox rather than Apache")

Actually using rHTTP for true full-duplex connections would be much more 
complicated, since you'd need to somehow get two separate connections from 
the client to the server into the same CGI script. With Web Socket, there 
is only one underlying TCP connection.


>  * standards built on HTTP will work more easily (cookies, AtomPub, 
> WebDAV, Subversion, etc.)

I agree entirely that if you want to do Subversion, or AtomPub, that you 
should not try to layer it on top of WebSocket. That isn't the use case 
we're targeting, though. (I don't know anyone who wants their server to
be able to speak WebDAV to their JavaScript running in a browser.)


>  * in general, the information system called "the Web" will have fewer 
> protocols and will be simpler, which will help it evolve and survive in 
> the long run. For example, if we add message signing to HTTP then we get 
> it "for free" in both directions. If we add a reliability protocol to 
> HTTP then we get it "for free" in both directions. ("for free" from a 
> specification point of view...from an implementation point of view it is 
> "for cheap" because you can reuse C code between Firefox and Apache if 
> you like, or IIS and IE, etc.)

I don't think that a small number of very complex technologies is a better 
situation than a large number of simple technologies.


> That all said, my current feeling is still that Jabber is the best 
> choice because it is already the dominant protocol for the asynchronous 
> bi-directional use cases we're talking about. But I really cannot see 
> the advantage in inventing a new protocol from scratch. The Web and 
> Internet are complicated enough as they are without a proliferation of 
> framing protocols.

Jabber doesn't really meet the requirements for which Web Socket is 
designed [1]. Specifically:

 - It's not clear that one can easily upgrade from HTTP to a bidirectional 
   Jabber connection. Jabber over HTTP uses two TCP connections, as I 
   understand it.

 - It's not at all clear that a fully-conforming server-side component for 
   Jabber can be written in a few dozen lines of scripting code.

 - Jabber uses a separate sidechannel for streaming video, if I 
   understand correctly; it's not extensible to support efficient transfer 
   of binary data over the same channel.

 - It's not clear that Jabber's handling of virtual hosting is really 
   compatible with the "same-origin" model.

 - It's not clear that Jabber's protocol is designed to be resistant to 
   the attacks we expect (such as connecting to an SMTP server, or an 
   existing Jabber server, or whatnot).

 - It's not clear that Jabber's protocol supports the opt-in security 
   model that is desired for an HTML API.

Also, it's not clear to me that streaming XML is a particularly good 
solution. Experience with Web authors suggests that XML's draconian error 
handling and XML's namespaces are more confusing than desirable.

[1] http://www.ietf.org/mail-archive/web/hybi/current/msg00007.html


On Fri, 3 Apr 2009, Paul Prescod wrote:
>
> As I said, Jabber is probably a better fit. But if it were to use HTTP I 
> would not suggest to use the "surface syntax". If it were to use HTTP 
> then it would just add the minimum required to reverse the 
> communications direction:
>
>  * http://www.ietf.org/internet-drafts/draft-lentczner-rhttp-00.txt

I think rHTTP makes sense for certain use cases, but it doesn't fit the 
requirements that led to Web Socket's development (in particular, as you 
say, you'd need two TCP connections, which is non-trivial to set up in 
most scripting environments shared with a Web server).


On Fri, 3 Apr 2009, Maciej Stachowiak wrote:
> > 
> > Yes, that's the disadvantage of using sentinel markers. The reason for 
> > this design decision is to reduce the likelihood of buffer overruns on 
> > the server side.
> 
> Do you mean a buffer overrun in the sense of a security problem? I don't 
> see how the protocol framing makes this more or less likely.

Not necessarily a security problem. The problem with lengths is that since 
the data is UTF-8, but the length would have to be bytes, you are relying 
on the programmer being able to measure the byte-length of a Unicode 
string, which internally might well be stored as UTF-16 or some other 
encoding, only to be serialised to UTF-8 on output. Sentinel markers are 
significantly easier to deal with from the point of view of an amateur 
script author, IMHO.
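
To illustrate the mismatch, here is a minimal Python sketch that measures
one string three ways. The helper name length_report and the sample text
are assumptions made up for this example; nothing here comes from the
Web Socket draft itself.

```python
def length_report(text: str) -> dict:
    """Return the size of `text` under three different measures."""
    return {
        # What a scripting language's string length typically reports.
        "code_points": len(text),
        # What a UTF-16 based runtime would report as the length.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # What a byte-length prefix on the wire would have to carry.
        "utf8_bytes": len(text.encode("utf-8")),
    }


# U+00EF needs 2 bytes in UTF-8, U+263A needs 3, and U+1047E needs 4
# (and a surrogate pair in UTF-16).
report = length_report("na\u00efve \u263a \U0001047e")
print(report)
```

The three numbers differ for the same string, so an author who writes the
language-level length into a byte-length field would produce a corrupt
frame as soon as the text stops being pure ASCII.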


> If anything, servers have to be careful to avoid being tricked into 
> sending the end marker in the middle of a message. Multipart MIME has 
> this problem, and implementing it in the context of form submission it's 
> been difficult to get the security details right.

The sentinel marker in Web Socket is 0xFF, which is never valid UTF-8. So 
long as the author properly encodes all text to UTF-8 on output, there 
shouldn't be a problem. If they fail to do that, then their system will 
have bigger problems than just being able to have 0xFF bytes inserted at 
random.
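
The claim about 0xFF can be checked by brute force. The following Python
sketch encodes every Unicode scalar value and inspects the resulting
bytes; the function name all_scalar_values is an assumption for this
example only.

```python
def all_scalar_values() -> str:
    """Build a string holding every Unicode scalar value once.

    Surrogates (U+D800..U+DFFF) are skipped because they are not scalar
    values and cannot be encoded as UTF-8.
    """
    return "".join(
        chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF
    )


encoded = all_scalar_values().encode("utf-8")

# The largest byte any valid UTF-8 sequence can contain is the lead byte
# of the highest code points, which is 0xF4.
highest_byte = max(encoded)
contains_sentinel = 0xFF in encoded
print(hex(highest_byte), contains_sentinel)
```

Since no correctly encoded text can contain the sentinel byte, a peer
cannot smuggle an early end marker into a message through text alone.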


> I think a declared length has better security properties in a network 
> protocol than a distinguished end marker.

The Web Socket protocol uses a declared length for the case of binary 
data, where measuring the data length on the server is likely easy.


> > An interruptible UTF-8 decoder doesn't seem like a huge problem.
> 
> It's doable but you have to preserve state. However, having tricky edge 
> cases that normally don't come up is a design problem. It is, once 
> again, the sort of thing that tends to lead to security holes.

I agree, but in practice I don't think most authors are going to decode 
UTF-8 in parallel with looking for the sentinel. They're just going to 
buffer bytes until they have the sentinel, then get their scripting 
language to just treat the bytes as UTF-8 data.
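
That buffering approach might look roughly like the following Python
sketch. The class name SentinelReader and its feed method are
hypothetical, and any frame-type byte that precedes the payload on the
wire is left out for brevity; that simplification is an assumption of
this example, not a description of the draft.

```python
from typing import List

SENTINEL = b"\xff"  # the end marker discussed above; never valid in UTF-8


class SentinelReader:
    """Collect raw bytes and decode only once a full message has arrived."""

    def __init__(self) -> None:
        self._pending = bytearray()

    def feed(self, chunk: bytes) -> List[str]:
        """Add a chunk from the socket and return any completed messages."""
        self._pending.extend(chunk)
        messages = []
        while True:
            end = self._pending.find(SENTINEL)
            if end < 0:
                # No complete message yet; keep the bytes for next time.
                return messages
            payload = bytes(self._pending[:end])
            del self._pending[: end + 1]  # drop the payload and the sentinel
            # Decoding happens on whole messages only, so a character split
            # across two chunks never reaches the decoder in pieces.
            messages.append(payload.decode("utf-8"))


reader = SentinelReader()
first = reader.feed(b"caf\xc3")             # ends mid-character (U+00E9)
second = reader.feed(b"\xa9\xffok\xffpar")  # completes two messages
print(first, second)
```

No interruptible decoder is needed here: the only state kept between
reads is the byte buffer itself.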

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'