Re: [hybi] #1: HTTP Compliance

On Wed, Jul 21, 2010 at 11:32 PM, Ian Hickson <ian@hixie.ch> wrote:

> On Wed, 21 Jul 2010, John Tamplin wrote:
> > On Tue, May 18, 2010 at 5:16 PM, Ian Hickson <ian@hixie.ch> wrote:
> > > On Tue, 18 May 2010, Greg Wilkins wrote:
> > > >
> > > > If the handshake is HTTP compliant, then the connection for a
> > > > websocket handshake could be taken from the existing pool of idle
> > > > connections to a host.  That would save the time needed to establish
> > > > the connection.
> > >
> > > The resemblance to HTTP is nothing more than a hack to allow us to
> > > share ports in certain advanced scenarios. Most Web Socket servers
> > > will know nothing about HTTP.
> >
> > I disagree completely -- there is going to be a web server involved in
> > delivering the web application, and I think the more usual scenario is
> > that web server also implements the WebSocket server.  Why would someone
> > prefer to deploy two servers rather than one, except in the case where
> > their web server doesn't yet support WebSocket?  If this protocol is
> > successful, over time that will drop to 0.
>
> I see no advantage in the common case to using the same software for both.
> It's far easier to just host them separately.
>

It is wise to assume that none of us (you, we, or anybody else) has all the
answers, and to build a protocol that lets people adjust to the realities of
their time instead of rigidly adhering to something that covers only today's
use cases.

>
> Why don't people use the same software for HTTP and DNS? Or IMAP and SMTP?
> Or IRC and FTP? Why would they act differently for HTTP and WebSocket?
>

Is this a rhetorical question? I suppose I'll assume not.

Well, let's see. HTTP and DNS don't share a transport protocol in the common
case (DNS typically runs over UDP).
IMAP and SMTP sometimes are collocated on the same server.
IRC and FTP serve completely different feature sets.
Many of these were not developed by the same parties, nor at the same time.
That is often a very good reason that they don't reside in the same
binary/server.

HTTP and WebSocket share the same transport protocol, and are used by the
same clients and servers to do the same tasks, and it is extremely likely
that they'll be written/maintained by the same people at the same time.
There is a strong desire to put this into the same binary/server.

>
>
> > > Reusing connections is a level of complexity that is completely
> > > unwarranted and that would only be useful in the rarest of cases. It's
> > > a proposal that lies on completely the wrong side of the 80/20 line
> > > and would introduce _massive_ complexity for authors, who would have
> > > no idea why their WebSocket servers were suddenly receiving random
> > > HTTP requests and vice versa.
> >
> > I'm not sold on connection reuse, but I am not sure where these random
> > HTTP requests would be coming from.  If a connection was to
> > ws://foo.org/socket, the connection was closed, and then another
> > connection was needed for http://foo.org/image.gif, presumably the
> > server at foo.org:80 is capable of answering either request since it
> > would have had to handle either request on a new connection.
>
> In this scenario, we are assuming that the server _can't_ answer the Web
> Socket request (otherwise the connection wouldn't be reused). So we are
> talking about cases where people are attempting to connect to servers that
> don't exist. If we're talking about that, then I don't see why it's any
> more of a stretch to imagine trying to connect to a Web Socket server,
> having it succeed from the server's point of view but fail from the
> client's point of view, and then having the client reuse the connection
> for some bogus HTTP request.
>
> In any case, reusing connections when the server fails to return a valid
> Web Socket response but does return a valid HTTP response is an
> optimisation that will help in only the rarest of cases, all of which are
> indicating failure, and thus likely to be cases where the user doesn't
> really care about the milliseconds saved.
>

Failure to upgrade to WebSocket doesn't make the connection useless.
Idle connections that have already carried traffic have had their congestion
window (CWND) expanded, which is a big latency win for further uses of the
channel.
This assumes that we're willing to take idle connections from the pool of
HTTP connections and upgrade them to WS. This seems very reasonable to me.
We may find that it isn't: there may be more demand for HTTP connections
than for WS connections, in which case leaving WS to open its own connection
may result in a net latency win.
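A rough way to see why a warm connection matters: under idealised TCP slow start the congestion window doubles every round trip, so the number of RTTs needed to push a given amount of data depends heavily on the window the connection starts with. The sketch below assumes no loss, no receive-window limit, a 1460-byte MSS, and a cold initial window of 3 segments (roughly what RFC 3390 permits for that MSS); the warm window of 32 segments is an arbitrary illustrative value. Real stacks may also shrink the window again after a long idle period.

```python
def rtts_to_deliver(payload_bytes: int, cwnd_segments: int, mss: int = 1460) -> int:
    """Count round trips needed to send payload_bytes under idealised slow start.

    Assumptions (illustrative only): no packet loss, no receive-window limit,
    the window doubles every RTT, and the sender always has data ready.
    """
    if cwnd_segments < 1:
        raise ValueError("congestion window must be at least one segment")
    if payload_bytes <= 0:
        return 0
    sent = 0
    rtts = 0
    cwnd = cwnd_segments
    while sent < payload_bytes:
        sent += cwnd * mss  # one full window goes out per round trip
        cwnd *= 2           # slow start doubles the window each RTT
        rtts += 1
    return rtts


# 64 KiB of initial messages on a cold vs. an already-warmed connection.
print(rtts_to_deliver(64 * 1024, cwnd_segments=3))   # cold: 4 RTTs
print(rtts_to_deliver(64 * 1024, cwnd_segments=32))  # warm: 2 RTTs
```

With these assumptions the warm connection saves two full round trips on the first 64 KiB, which is exactly the kind of saving that matters on high-RTT links.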

>
> On Thu, 22 Jul 2010, Greg Wilkins wrote:
> >
> > Currently the WS handshake can only be rejected by closing the
> > connection and discarding any potential HTTP response.  Thus a webapp
> > that wishes to fall back to a non-ws transport will have to establish a
> > new connection, maybe negotiate TLS, then handshake the new transport.
> > Thus there will be an extra 2 or 3 round trips to establish the
> > fall-back transport.
>
> The only time this would be useful is when the script doesn't know ahead
> of time which host it will be connecting to, and doesn't know ahead of
> time what protocols that host will support, but where it does know that it
> will support either a Web Socket server or an HTTP-based mechanism. This
> will only occur during the transition period where some sites provide an
> HTTP-based protocol but not a Web Socket version, but where other sites
> provide Web Socket equivalents.
>

How are clients to know, before making contact, whether the server speaks
HTTP, WS, or some future protocol?
Keep in mind that the fact that the server speaks WS doesn't imply that the
channel between the client and server allows it to happen, and that possibly
neither the client nor the server controls any of this.
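One way a client could avoid needing to know in advance is to send the upgrade request on a (possibly pooled) connection and decide afterwards what that connection is good for. The sketch below is hypothetical: `classify_handshake_response` and the `Outcome` values are illustrative names, it looks only at the status line and a `Connection: close` header, and a real client would still have to read any response body completely before reusing the connection for HTTP.

```python
from enum import Enum


class Outcome(Enum):
    WEBSOCKET = "websocket"   # 101: the upgrade succeeded
    HTTP_FALLBACK = "http"    # ordinary HTTP reply: connection may still serve HTTP
    CLOSE = "close"           # unusable or explicitly closing: drop the connection


def classify_handshake_response(head: bytes) -> Outcome:
    """Classify a response header block (everything before the blank line).

    Simplifications: only the status line and a `Connection: close` header are
    examined; HTTP/1.0 keep-alive defaults and response bodies are ignored.
    """
    lines = head.split(b"\r\n")
    parts = lines[0].split(b" ", 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/1."):
        return Outcome.CLOSE
    if len(parts[1]) != 3 or not parts[1].isdigit():
        return Outcome.CLOSE
    if int(parts[1]) == 101:
        return Outcome.WEBSOCKET
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"connection" and b"close" in value.lower():
            return Outcome.CLOSE
    return Outcome.HTTP_FALLBACK


print(classify_handshake_response(b"HTTP/1.1 200 OK\r\nContent-Length: 0"))
# Outcome.HTTP_FALLBACK
```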

>
> This is such an edge case that optimising for it should only be done if it
> can be essentially done for free. This is not the case here. Debugging
> connection reuse will be a huge pain. It's not worth it.
>

I'm hearing:
I don't think that saving the billions of users in the world any of their
time is worth any of mine.
If you can prove that reuse cannot happen, or that it offers no latency
benefit, then I'd agree. You've not proven that; you're just stating an
opinion that doesn't seem to have a basis in experience.

>
> If you really truly want to handle this case, just invoke the
> XMLHttpRequest constructor at the same time as the WebSocket constructor,
> and then drop whichever one fails.
>

That seems likely to cause problems for clients on slow links, is much more
likely to cause congestion, is much more likely to needlessly waste server
resources, and needlessly increases complexity for the client in the form of
wonderful races.

>
>
> On Thu, 22 Jul 2010, Jamie Lokier wrote:
> >
> > As noted some time ago, even when WS negotiation *succeeds*, it can be
> > slower than comet-style HTTP, both slower in sending the first messages,
> > and slower in receiving the first responses.
> >
> > It means latency-optimised apps may open *two* connections in parallel:
> > One comet-style HTTP, and one WebSocket.  They will communicate initial
> > messages over the HTTP connection, and switch to the WebSocket
> > connection when that is ready.  That's not kind on low bandwidth links,
> > nor easy to program, so it's an ugly compromise.
>
> Could you give an example of an app where the speed in which the Web
> Socket connection is established matters? I can't think of any case where
> the client needs to send information that quickly -- after all, the user
> won't have started doing anything within one RTT of the page loading. (The
> server can easily include any data it wants in the original HTTP request,
> so this is presumably not to _get_ information.)
>

Chat. Games. Web browsing. Protocol discussions. Just about anything,
actually.
"The user" here is a user-agent. It is fully capable of taking action on
time horizons that the user cannot perceive directly.
RTTs can be fairly large, especially for devices using wireless links.
Starting off with an assumption that latency doesn't matter in the beginning
is just... amazingly, surprisingly silly.
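Counting round trips alone (ignoring bandwidth and server think time) makes the cost concrete. The 300 ms RTT below is an assumed figure for a slow wireless link, and a full TLS handshake is counted as 2 round trips, as for a non-resumed TLS 1.x session; `setup_rtts` is an illustrative helper, not anything from the draft.

```python
def setup_rtts(*, new_tcp: bool, tls_rtts: int, request_rtts: int = 1) -> int:
    """Round trips before the first response arrives on a connection."""
    tcp_rtts = 1 if new_tcp else 0  # SYN / SYN-ACK before any data can be sent
    return tcp_rtts + tls_rtts + request_rtts


RTT_MS = 300  # assumed round-trip time for a slow wireless link

fresh = setup_rtts(new_tcp=True, tls_rtts=2)    # new TCP + full TLS + request
reused = setup_rtts(new_tcp=False, tls_rtts=0)  # request on the already-open connection
print(f"fresh: {fresh * RTT_MS} ms, reused: {reused * RTT_MS} ms")
# fresh: 1200 ms, reused: 300 ms
```

Under these assumptions a fallback that must open a fresh TLS connection costs three extra round trips, close to a second of added delay before the user-agent can do anything useful.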

>
> On Wed, 21 Jul 2010, Roberto Peon wrote:
> > >
> > > I could see trying multiple WebSocket protocols over one connection,
> > > but trying both HTTP and WebSocket connections, not to mention
> > > any other protocols the servers might provide, seems like massive
> > > complexity for negligible gain overall.
> >
> > I fully expect that we'll end up with multiple websocket "sockets" per
> > tab
>
> Presumably to different hosts.
>

Why would you presume that?

>
>
> > and we typically end up with many tabs.
>
> Tabs can share a single WebSocket to a single host using shared workers.
>

Or they can just open a WebSocket and have it multiplexed to the server
without any additional complexity at all for application writers, with no
degradation at all in the security model, and without requiring trust of
anything in any other tab.
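The multiplexing and flow control described in the quoted message below can be sketched in a few lines. This is a hypothetical framing, not anything in the current draft: each frame carries a 1-byte type, a 4-byte channel ID and a 4-byte length, and the receiver grants the sender more credit per channel with `WINDOW_UPDATE` frames. All names and the 64 KiB default window are assumptions.

```python
import struct

DATA, WINDOW_UPDATE = 0, 1
HEADER = struct.Struct(">BII")  # frame type, channel id, payload length


class FlowControlBlocked(Exception):
    """Raised when a channel has no send credit left for the requested data."""


def encode_frame(ftype: int, channel: int, payload: bytes) -> bytes:
    return HEADER.pack(ftype, channel, len(payload)) + payload


def window_update(channel: int, credit: int) -> bytes:
    """Frame granting the peer `credit` more bytes on `channel`."""
    return encode_frame(WINDOW_UPDATE, channel, struct.pack(">I", credit))


def decode_frames(buf: bytearray) -> list:
    """Remove and return every complete frame in buf; a partial frame stays buffered."""
    frames = []
    while len(buf) >= HEADER.size:
        ftype, channel, length = HEADER.unpack_from(buf, 0)
        end = HEADER.size + length
        if len(buf) < end:
            break  # wait for the rest of this frame
        frames.append((ftype, channel, bytes(buf[HEADER.size:end])))
        del buf[:end]
    return frames


class MuxSender:
    """Tracks per-channel send windows over one shared connection."""

    def __init__(self, initial_window: int = 64 * 1024):
        self.initial_window = initial_window
        self.windows = {}  # channel id -> bytes we may still send

    def send(self, channel: int, data: bytes) -> bytes:
        window = self.windows.setdefault(channel, self.initial_window)
        if len(data) > window:
            raise FlowControlBlocked(f"channel {channel}: {len(data)} > {window}")
        self.windows[channel] = window - len(data)
        return encode_frame(DATA, channel, data)

    def on_frame(self, ftype: int, channel: int, payload: bytes) -> None:
        if ftype == WINDOW_UPDATE:
            (credit,) = struct.unpack(">I", payload)
            current = self.windows.get(channel, self.initial_window)
            self.windows[channel] = current + credit


sender = MuxSender(initial_window=10)
wire = bytearray(sender.send(1, b"hello") + sender.send(2, b"hi"))
print(decode_frames(wire))  # [(0, 1, b'hello'), (0, 2, b'hi')]
```

Each tab (or shared worker) would own a channel ID on the one connection, so none of them needs to trust the others.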

>
>
> > [...] As for complexity! At worst, you have flow control and
> > multiplexing. Multiplexing involves a unique ID per channel. Flow
> > control involves sending periodic updates telling the other side how
> > much it can send safely. Of course you also need to have a table in
> > which you do a lookup to see that there is already a connection for that
> > domain, including a reference to that connection. None of this is
> > difficult, even in concert.
>
> Over the past couple of months, we've had several Web developers come into
> the #whatwg channel and ask for help implementing the current Web Socket
> draft. We've seen all kinds of difficulties implementing just the current
> spec! People using regular expressions over the buffer to parse the
> handshake [1], people not considering that the handshake might be split
> into two packets, people writing code that reads straight off the end of
> their buffer if the data sent to their server isn't well-formed per the
> handshake... none of what's in the current draft is "difficult", but it's
> still difficult enough, as far as I can tell.
>

... so?
People who use regular expressions over a buffer to parse a handshake, who
don't understand the basics of TCP, or who can't handle other bits of I/O
shouldn't be our target audience.
The target audience should be the consumers of the effects of the protocol:
the billions of people using web browsers, etc.
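For what it's worth, the incremental reading that trips people up is small. A minimal sketch, assuming a handshake whose header block ends at the first CRLFCRLF (any bytes after it, such as the 8-byte key in the current draft, are returned separately); `HandshakeReader`, `parse_header_block` and the 8 KiB limit are illustrative choices.

```python
class HandshakeReader:
    """Buffers bytes until the header block terminator (CRLFCRLF) has arrived.

    The terminator may be split across reads, so the whole buffer is searched
    on every feed instead of trusting each chunk to be a complete handshake.
    """

    TERMINATOR = b"\r\n\r\n"

    def __init__(self, max_header_bytes: int = 8192):
        self.buf = bytearray()
        self.max_header_bytes = max_header_bytes

    def feed(self, chunk: bytes):
        """Return (header_block, leftover_bytes) once complete, else None."""
        self.buf += chunk
        end = self.buf.find(self.TERMINATOR)
        if end == -1:
            if len(self.buf) > self.max_header_bytes:
                raise ValueError("handshake header block too large")
            return None  # need more data; never index past what we have
        if end > self.max_header_bytes:
            raise ValueError("handshake header block too large")
        split = end + len(self.TERMINATOR)
        return bytes(self.buf[:end]), bytes(self.buf[split:])


def parse_header_block(head: bytes):
    """Split a header block into its first line and a dict of lower-cased fields."""
    lines = head.decode("latin-1").split("\r\n")
    fields = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        fields[name.strip().lower()] = value.strip()
    return lines[0], fields


reader = HandshakeReader()
reader.feed(b"GET /demo HTTP/1.1\r\nHost: exa")         # None: incomplete
reader.feed(b"mple.com\r\nUpgrade: WebSocket\r\n\r")   # None: terminator split
head, rest = reader.feed(b"\n12345678")
print(parse_header_block(head))
# ('GET /demo HTTP/1.1', {'host': 'example.com', 'upgrade': 'WebSocket'})
print(rest)  # b'12345678'
```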

People making these kinds of mistakes are very unlikely to be writing the
next successful and widely used server (at least, not until they learn to do
these things properly, which some certainly may).
There is nothing wrong with tinkering on the network. I'm sure a lot of
people on this list did that in the past, and I'm certain that the lessons
learned have made those who did so much more effective. If we want a
protocol for learning, we should build one for that. That is not my intended
audience. I have my eye on the audience of billions.

This is the current list rathole, though. Targeting "amateur programmers"
shouldn't be a requirement.
-=R

> [1] (Which I incidentally expected; that's why there are two keys, so you
> can't trick such naive implementations by smuggling a key in the resource
> name. I didn't expect to see code saved by this so soon.)
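For readers unfamiliar with the two keys: in the draft under discussion (draft-hixie-thewebsocketprotocol-76, also published as draft-ietf-hybi-thewebsocketprotocol-00), the server derives a number from each of Sec-WebSocket-Key1 and Sec-WebSocket-Key2 by concatenating the key's ASCII digits and dividing by the number of spaces in it, packs both results as 32-bit big-endian integers, appends the 8 bytes the client sends after the request headers, and replies with the MD5 digest of that 16-byte string. The following is a sketch based on that description, not a conformance-tested implementation; `key_number` and `handshake_response` are illustrative names.

```python
import hashlib
import struct


def key_number(key: str) -> int:
    """Digits of the key read as one integer, divided by the key's space count."""
    digits = "".join(ch for ch in key if ch in "0123456789")
    spaces = key.count(" ")
    if not digits or spaces == 0:
        raise ValueError("key must contain digits and at least one space")
    number = int(digits)
    if number % spaces != 0:
        raise ValueError("key number is not a multiple of the space count")
    result = number // spaces
    if result > 0xFFFFFFFF:
        raise ValueError("key value does not fit in 32 bits")
    return result


def handshake_response(key1: str, key2: str, key3: bytes) -> bytes:
    """16-byte MD5 challenge response computed from both keys and the 8-byte body."""
    if len(key3) != 8:
        raise ValueError("key3 must be exactly 8 bytes")
    challenge = struct.pack(">II", key_number(key1), key_number(key2)) + key3
    return hashlib.md5(challenge).digest()
```

Because the response depends on header values that a regular expression over the whole request could confuse with text in the resource name, a naive parser tends to produce a wrong digest rather than a silently accepted handshake.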
>
>
> On Thu, 22 Jul 2010, Jamie Lokier wrote:
> > Willy Tarreau wrote:
> > >
> > > [Good description of transparent proxies at ISPs with configurable
> > > HTTP-aware rules on the routers.]
>
> What Willy wrote was not a description of transparent proxies but of
> man-in-the-middle proxies. Transparent proxies are a different beast
> altogether. Please see the HTTP spec for details. Man-in-the-middle
> proxies are not legitimate per the HTTP spec as far as I can tell, and are
> the cause of many problems on the Web (such as our inability to
> deploy pipelining).
>
>
Legitimate or not, they did exist, do exist, and likely will continue to
exist.

>
> On Thu, 22 Jul 2010, Willy Tarreau wrote:
> >
> > There are not that many ISPs in each country; I mean there are far fewer
> > ISPs than there are web sites or potential WebSocket implementers.
> > There's a high pressure on them to work as expected by customers.
>
> Not high enough, clearly, or we'd be able to deploy pipelining.
>
>
> On Thu, 22 Jul 2010, Willy Tarreau wrote:
> > >
> > > (Note: from a conformance standpoint, the "server" includes the
> > > proxy.)
> >
> > as seen from the client, yes. As seen from the proxy or the server or
> > any intermediate between them, no :-)
>
> I mean as seen from the point of view of conformance to the specification.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
> _______________________________________________
> hybi mailing list
> hybi@ietf.org
> https://www.ietf.org/mailman/listinfo/hybi
>