Re: [hybi] Apples and Orangutans

Maciej Stachowiak <mjs@apple.com> Mon, 13 April 2009 00:39 UTC

Message-id: <C0524F14-6C1C-4B33-89D8-4C558B66B5CE@apple.com>
From: Maciej Stachowiak <mjs@apple.com>
To: Ian Hickson <ian@hixie.ch>
In-reply-to: <Pine.LNX.4.62.0904122350370.10339@hixie.dreamhostps.com>
Date: Sun, 12 Apr 2009 17:40:18 -0700
References: <49DEF171.4080506@mozilla.com> <A3699591-148C-4795-967A-6CDE23FE75F0@apple.com> <Pine.LNX.4.62.0904122230550.10339@hixie.dreamhostps.com> <71289452-5A19-48F4-9819-7FD9747EE9CA@apple.com> <Pine.LNX.4.62.0904122251070.10339@hixie.dreamhostps.com> <1D801D05-C7F1-4DDC-8034-FAA458626F53@apple.com> <Pine.LNX.4.62.0904122350370.10339@hixie.dreamhostps.com>
Cc: hybi@ietf.org
Subject: Re: [hybi] Apples and Orangutans

On Apr 12, 2009, at 5:11 PM, Ian Hickson wrote:

> On Sun, 12 Apr 2009, Maciej Stachowiak wrote:
>>>
>>> I don't understand why a 16bit integer solves the problem for TCP, but
>>> a string doesn't solve the problem for WebSocket. They seem exactly
>>> equivalent to me -- they identify the target of the connection, and by
>>> convention, the protocol that that target supports.
>>
>> HTTP is a better analogy here than TCP in my opinion. The fact that TCP
>> only maps ports to protocols by convention is not so great - we have to
>> go to extra design effort to prevent cross-protocol attacks as a result.
>
> WebSocket has built-in support for preventing cross-protocol attacks --
> only trusted peers can send arbitrary data, everyone else is just
> disconnected before the server receives the data.
>
> Why would HTTP be a better analogy? WebSocket really is just trying to be
> TCP for the Web. It doesn't really have anything in common with HTTP other
> than the handshake, and that's really a red herring.

WebSocket is quite different from TCP, since it has defined message
boundaries whereas TCP is just a two-way byte stream. TCP is much
lower level than HTTP or WebSocket, as it imposes no structure on the
data stream at all. It is expected that protocols built on top of TCP
will provide the needed structure. I think HTTP is a good protocol to
use as inspiration: identifying resources by URL and having the server
report their type has proven to be a very effective model for data
resources and for request-response services on the Web, and I think it
would be similarly effective for bidirectional messaging services.
There is little reason to think that bidirectional messaging services
are fundamentally different in this respect.
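
To illustrate the difference, here is a minimal sketch of the kind of
structure a protocol has to add on top of a raw byte stream. It assumes
newline-delimited messages purely for illustration; createFramer is a
hypothetical helper, not part of WebSocket or any existing API, and this
is not the framing WebSocket itself uses.

```javascript
// Hypothetical helper: turns arbitrary stream chunks into whole messages.
// Assumption: messages are separated by '\n' (chosen only for this sketch).
function createFramer(onMessage) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    let index;
    // Deliver every complete message currently sitting in the buffer.
    while ((index = buffer.indexOf('\n')) !== -1) {
      onMessage(buffer.slice(0, index));
      buffer = buffer.slice(index + 1);
    }
    // Anything left over is an incomplete message; keep it for later.
  };
}
```

A byte stream may deliver "hel", "lo\nwor", "ld\n" as three chunks; the
framer hands the application two messages. With defined message
boundaries the application never has to write this layer itself.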


>>> Why couldn't we expect multiple chat services to use the same path for
>>> their chat server, in the same way that multiple chat services written
>>> on TCP use the same port?
>>
>> I think that would be a poor design in the same way as relying on the
>> exact path of "/favicon.ico" or "/crossdomain.xml" is a poor design.
>
> Fixed URLs are bad in HTTP because they cause people who don't know that
> the resources exist to try to obtain them. They're also a bad idea because
> HTTP is split across multiple users, but the fixed URLs are per-host.
>
> With WebSocket the first point doesn't apply -- we aren't talking about
> people connecting to arbitrary hosts, we're talking about a convention by
> which someone can guess what protocol a server speaks.

Or more importantly, verify that it is the protocol they expect rather  
than interpret the response as garbage.

> The second point need not apply either, since we don't have to lock the
> entire path down, we can just say that the filename, or the extension, or
> whatever, is where you put the protocol you're willing to speak. So if you
> want to write a chat server that speaks WebSocketJabber, you can make it
> available at:
>
>   ws://example.com/~jsmith/chat.wsj
>
> ...or some such.

Using part of the path to identify the type sounds acceptable to me,  
as long as we define what part of the path holds type information as  
part of the WebSocket protocol instead of leaving it completely ad-hoc.
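
As a rough sketch of what a defined path convention could look like,
the function below assumes that the extension of the last path segment
carries the type, which is only one of the options mentioned above. The
name subprotocolFromPath and the choice of the extension are
assumptions for illustration, not anything the protocol defines.

```javascript
// Hypothetical convention: the extension of the last path segment names
// the subprotocol, e.g. ws://example.com/~jsmith/chat.wsj -> "wsj".
function subprotocolFromPath(url) {
  const path = new URL(url).pathname;
  const lastSegment = path.slice(path.lastIndexOf('/') + 1);
  const dot = lastSegment.lastIndexOf('.');
  // No dot, or a leading dot only, means no type information is present.
  if (dot <= 0) {
    return null;
  }
  const extension = lastSegment.slice(dot + 1).toLowerCase();
  return extension === '' ? null : extension;
}
```

The point of fixing the rule in the protocol is that every client would
extract the type the same way, instead of each library guessing.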

>>
>> Exposing the information in the API would be trivial. An additional
>> callback, or a property on the WebSocket object guaranteed to be set by
>> the time the first message is received, would both work OK.
>
> Current API:
>
>   socket = new WebSocket('ws://example.com/chat.dcp');
>   // The first message must name the protocol; later ones go to the chat handler.
>   socket.onmessage = function (e) {
>     if (e.data != 'DCP') {
>       error('The server mysteriously stopped speaking DCP!');
>       socket.disconnect();
>       return;
>     }
>     socket.onmessage = chatHandler;
>   };
>   socket.postMessage('DCP');
>
> Explicit API:
>
>   socket = new WebSocket('ws://example.com/chat', 'DCP');
>   socket.onprotocol = function (e) {
>     if (e.data != 'DCP') {
>       error('The server mysteriously stopped speaking DCP!');
>       socket.disconnect();
>     }
>   };
>   socket.onmessage = chatHandler;
>
> It doesn't seem to gain us much.

Putting type information in-band is not very reliable. How can you tell
a server that speaks DCP from one that happens to send the string "DCP"
as its first message but has no convention of sending the protocol type
first? This also seems to be completely different from your path
convention suggestion. Which do you think is the right way to identify
WebSocket subprotocols: the path or in-band messages? And how will
subprotocol identifiers be registered?

>> By comparison, XMLHttpRequest provides for type checking by exposing the
>> HTTP response headers. [...]
>>
>> It is true that most Web content authors don't currently check the type
>> of XMLHttpRequest responses, except perhaps implicitly by using
>> responseXML.
>
> So why should we bother?

I explained below - we are offering the ability to connect to unknown
services, something that XHR hasn't offered at all. Consider a
WebSocket-based syndication service - it initially sends you the current
copy of the feed, then incremental updates as they come in. This would
clearly be better in many cases than, say, Atom over HTTP plus periodic
polling. And it's the kind of service you would likely want to open to
cross-domain access. But it seems pretty useful for Web-based feed
readers that use the hypothetical Atom-over-WebSocket to be able to tell
that this is indeed what they are talking to, and to distinguish it from
Jabber-over-WebSocket or CalDAV-over-WebSocket, instead of possibly
displaying garbage data to the user. After all, the user typed in some
URL that is supposedly a feed, and the reader can't just trust that to
be right.
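
A minimal sketch of the check such a feed reader could run, assuming the
subprotocol declared by the server is somehow made available to script
before any message is parsed. The function name, the result shape, the
identifiers "atom" and "jabber", and the case-insensitive comparison are
all assumptions made for this example.

```javascript
// Hypothetical check run once the server's declared subprotocol is known.
// Assumption: identifiers are compared case-insensitively.
function checkSubprotocol(declared, expected) {
  if (declared === null || declared === undefined) {
    return { ok: false, error: 'Server did not declare a subprotocol' };
  }
  if (declared.toLowerCase() !== expected.toLowerCase()) {
    return {
      ok: false,
      error: 'Expected ' + expected + ' but server speaks ' + declared,
    };
  }
  return { ok: true, error: null };
}
```

A reader expecting "atom" that is told "jabber" can then report a clear
error to the user rather than trying to render the data as a feed.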

>> However, until very recently there was no ability to
>> connect to an unknown service via XMLHttpRequest since cross-domain
>> support is a recent innovation that is still being deployed.
>
> It's not like the people connecting to remote JSON resources using
> <script> ever considered that the remote script might not be JSON. They
> say "json" in the URL they connected to, and they assumed it would work.

Embedding remote scripts in the hope that they are JSON is an example
of a very bad way to do things that should not be replicated.

>
>> Browsers check types all the time however (even though in various cases
>> that information is not considered completely reliable). Browsers do not
>> rely on hardcoded well-known paths to identify the type of the resource,
>> and it's hard to imagine the Web working as well as it does if they did.
>
> Browsers check types by sniffing more than anything, and rely on
> well-known paths quite a lot. I think it's a stretch to say that they check
> types "all the time". But this is academic, because the browsers are
> acting on the users' behalf, whereas these scripts are acting on the
> authors' behalf. As the subject line of this thread suggests, it's an
> apples and oranges comparison.

I can only think of one well-known path relied on by the browser I  
work on, that being favicon.ico. While in some cases sniffing is  
applied, in other cases the type information is authoritative. Most  
importantly, other than a few types considered to effectively mean  
"unknown", the type of the main resource is generally trusted.

>
>> One use case that we can foresee is a JavaScript library using WebSocket
>> to connect to a chat service using a well-known protocol (perhaps
>> someone will specify a protocol for Jabber over WebSocket). In such a
>> case, it is much more likely that the protocol type will be checked than
>> with hand-coded client code, since the library will want to report
>> errors to the content author in a reasonable way rather than return
>> garbage data.
>
> Why is a trivial handshake not enough for this?

It provides a positive network effect if everyone does the handshake
the same way, thus it is preferable to have subprotocol negotiation
built into the protocol. The cost of doing so is minimal. Even using
part of the path would be adequate, although I think it is preferable
for the server to state the protocol rather than for the client to
request it. Having the server state it allows for the possibility that
the same service may change protocols, just as on the Web the same
resource may change types.

>> Adding a protocol type would be a modest extension that is trivially
>> added to the protocol and client-side API. It would add very little
>> complexity to the case of connecting to a known service, but creates
>> the possibility of a sane way of connecting to unknown services.
>
> I don't understand what value it adds.
>
> For custom protocols, authors aren't going to want to worry about the
> details of an additional handshake after the one we require now. Why add
> anything at all, when you can do it all -- if you want it -- over the
> regular WebSocket channel?

Doing protocol negotiation in-band, with everyone doing it differently,
is more error-prone than having a simple out-of-band mechanism. It
could be as simple as an additional HTTP header in the WebSocket
handshake response. I would consider in-band subprotocol handshakes to
be a worse alternative than a defined path convention. Now, WebSocket
could require the first messages exchanged to be subprotocol
identifiers, and require something distinct from any subprotocol
identifier (such as the empty string). But that seems worse in just
about every way than defining out-of-band subprotocol identification.
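
To make the header idea concrete, here is a small sketch of reading a
subprotocol from the handshake response. The header name
"WebSocket-Protocol" is an assumption chosen for this example, as is the
function name; nothing here is defined by the current draft.

```javascript
// Assumed header name, for illustration only.
const PROTOCOL_HEADER = 'websocket-protocol';

// Returns the subprotocol named in the handshake response headers,
// or null if the server did not state one.
function subprotocolFromHandshake(responseText) {
  // Headers end at the first empty line; the first line is the status line.
  const headerBlock = responseText.split('\r\n\r\n')[0];
  const lines = headerBlock.split('\r\n').slice(1);
  for (const line of lines) {
    const colon = line.indexOf(':');
    if (colon === -1) {
      continue; // Not a "Name: value" line; ignore it.
    }
    const name = line.slice(0, colon).trim().toLowerCase();
    if (name === PROTOCOL_HEADER) {
      return line.slice(colon + 1).trim();
    }
  }
  return null;
}
```

Because the value arrives before any message does, the client can
reject an unexpected subprotocol without ever interpreting message data.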

Regards,
Maciej