Re: [hybi] Apples and Orangutans

Maciej Stachowiak <mjs@apple.com> Sun, 12 April 2009 23:35 UTC

From: Maciej Stachowiak <mjs@apple.com>
To: Ian Hickson <ian@hixie.ch>
In-reply-to: <Pine.LNX.4.62.0904122251070.10339@hixie.dreamhostps.com>
Date: Sun, 12 Apr 2009 16:36:15 -0700

On Apr 12, 2009, at 4:00 PM, Ian Hickson wrote:

> On Sun, 12 Apr 2009, Maciej Stachowiak wrote:
>>>
>>> WebSocket includes a path (from the ws:// URL), which is intended to
>>> be analogous to the TCP port number.
>>
>> I'm not sure the path is a good choice for identifying the type of
>> service available. HTTP has both a path and a type. The path is sent by
>> the client and identifies the specific resource. The type is sent by the
>> server and directs how the resource is to be interpreted. I do not think
>> we can expect, for example, multiple chat services to use the same path
>> format, but there may be multiple chat services which choose to use the
>> same protocol-by-convention over WebSocket. Right now, there doesn't
>> seem to be a good way to give this information to the client, in the
>> same way that any arbitrary URL can tell the client that its contents
>> are to be interpreted as XML or PDF or plain text.
>
> I don't understand why a 16bit integer solves the problem for TCP, but a
> string doesn't solve the problem for WebSocket. They seem exactly
> equivalent to me -- they identify the target of the connection, and by
> convention, the protocol that that target supports.

HTTP is a better analogy here than TCP, in my opinion. The fact that
TCP only maps ports to protocols by convention is a weakness: as a
result, we have to go to extra design effort to prevent cross-protocol
attacks.

> Why couldn't we expect multiple chat services to use the same path for
> their chat server, in the same way that multiple chat services written on
> TCP use the same port?

I think that would be a poor design in the same way as relying on the
exact path of "/favicon.ico" or "/crossdomain.xml" is a poor design.
For one thing, it would mean you can't host multiple chat services on
the same host and port. That seems like an arbitrary limitation. Also,
if your WebSocket service is running on the same host and port as your
web service, you may not have a free choice of what path to use, since
some paths may already be taken.
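
To make the limitation concrete, here is a minimal sketch (in Python,
purely for illustration) of a routing table for a single host and port
where the path selects the resource and a separate label names the
protocol. The paths, service names, and protocol labels are all
hypothetical. Two chat services coexist because neither has to claim
one well-known path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str       # which specific resource this is
    protocol: str   # what kind of resource it is (hypothetical label)


# Hypothetical routing table for one host and port. The path picks the
# resource; the protocol label is stored separately from the path.
ROUTES = {
    "/support/chat": Service("support desk", "example-chat"),
    "/sales/chat": Service("sales desk", "example-chat"),
    "/ticker": Service("stock ticker", "example-quotes"),
}


def protocol_for(path: str) -> str:
    """Return the protocol label the server would announce for this path."""
    return ROUTES[path].protocol


def paths_speaking(protocol: str) -> list:
    """List every path on this host whose service speaks the protocol."""
    return sorted(p for p, s in ROUTES.items() if s.protocol == protocol)
```

If the path itself had to identify the protocol, the two chat entries
above would have to share a single key, and only one of them could
exist on this host and port.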

> There's also plenty of precedent on the Web of including type information
> in the URL -- so much so, in fact, that some people find the path
> (specifically the "extension" part of the "filename" component of the
> path) more reliable than the explicit Content-Type data. There's also
> "well-known paths" like robots.txt, which, while suboptimal from HTTP's
> point of view, seem like they'd fit in well with WebSocket.

I think well-known paths are problematic and we should not design
WebSocket around using them for everything. I cited two examples above
where I believe you've stated on the record that well-known paths are
a design problem. What is different about WebSocket that makes relying
on well-known paths for everything a good idea?

It seems like extremely poor design to conflate the path (which  
identifies a specific resource) with the type (which identifies the  
*kind* of resource).

> It's not really clear to me how we would expose the information in the API
> if we were to use something other than the path.

Exposing the information in the API would be trivial. An additional
callback, or a property on the WebSocket object guaranteed to be set
by the time the first message is received, would work; either is fine.
By comparison, XMLHttpRequest provides for type checking by exposing
the HTTP response headers.
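
As a rough illustration of the property option, here is a minimal
sketch in Python (a real browser API would of course be JavaScript).
Everything in it is an assumption made for the example: the
SketchSocket class, its feed_* methods, and the "Example-Protocol"
header name are hypothetical and are not part of any WebSocket draft.

```python
class SketchSocket:
    """Toy stand-in for a socket object; not the real WebSocket API."""

    def __init__(self, on_message):
        self.protocol = None      # filled in from the server's handshake
        self._handshake_done = False
        self._on_message = on_message

    def feed_handshake(self, header_lines):
        # Parse "Name: value" lines. "Example-Protocol" is a made-up header.
        for line in header_lines:
            name, _, value = line.partition(":")
            if name.strip().lower() == "example-protocol":
                self.protocol = value.strip()
        self._handshake_done = True

    def feed_message(self, data):
        # Refuse to deliver anything before the handshake has been parsed,
        # so self.protocol is final by the time the callback runs.
        if not self._handshake_done:
            raise RuntimeError("message received before handshake")
        self._on_message(self, data)


# Usage: the callback reads the property when the first message arrives.
seen = []
sock = SketchSocket(lambda s, data: seen.append((s.protocol, data)))
sock.feed_handshake(["Upgrade: WebSocket", "Example-Protocol: example-chat"])
sock.feed_message("hello")
```

The ordering check in feed_message is what provides the "guaranteed to
be set by the time the first message is received" behavior in this
sketch.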

> It's not like most authors are going to care, or check to see what the
> remote end thinks it is speaking -- they'll connect to their own servers,
> or to well-known remote third-party servers, and just talk the protocol
> that that endpoint supports. The path is convenient in that we already
> have it and it seems like it's exactly where authors would put the
> distinction (just as they do today on the Web -- content negotiation is
> used far less than just different URLs for different types, after all).

The rest of your argument is that most authors won't bother to check.
It is true that most Web content authors don't currently check the
type of XMLHttpRequest responses, except perhaps implicitly by using
responseXML. However, until very recently there was no way to
connect to an unknown service via XMLHttpRequest, since cross-domain
support is a recent innovation that is still being deployed. Browsers,
on the other hand, check types all the time (even though in various
cases that information is not considered completely reliable). Browsers
do not rely on hardcoded well-known paths to identify the type of a
resource, and it's hard to imagine the Web working as well as it does
if they did.

One use case that we can foresee is a JavaScript library
using WebSocket to connect to a chat service using a well-known
protocol (perhaps someone will specify a protocol for Jabber over
WebSocket). In such a case, it is much more likely that the protocol
type will be checked than with hand-coded client code, since the
library will want to report errors to the content author in a
reasonable way rather than return garbage data.
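
A library-side check of this kind could be as small as the following
sketch (Python, for illustration only). The function name, the
exception class, and the protocol labels are hypothetical; the point is
only that a mismatch surfaces as a clear error rather than as
misparsed data.

```python
class ProtocolMismatch(Exception):
    """Raised when the server announces a protocol the library can't speak."""


def check_protocol(announced, expected):
    # `announced` is whatever type label the server sent (None if absent);
    # `expected` is the label this hypothetical library implements.
    if announced != expected:
        raise ProtocolMismatch(
            f"expected protocol {expected!r}, server announced {announced!r}"
        )
    return announced
```

A library would call this once, before handing any message to the
content author's code.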

Adding a protocol type would be a modest extension that is trivial to
add to the protocol and the client-side API. It would add very little
complexity to the case of connecting to a known service, but it creates
the possibility of a sane way of connecting to unknown services.

Regards,
Maciej