Re: [hybi] WebSocket -76 is incompatible with HTTP reverse proxies

Willy Tarreau <w@1wt.eu> Wed, 21 July 2010 22:10 UTC

Date: Thu, 22 Jul 2010 00:10:31 +0200
From: Willy Tarreau <w@1wt.eu>
To: Ian Hickson <ian@hixie.ch>
Cc: "hybi@ietf.org" <hybi@ietf.org>
Subject: Re: [hybi] WebSocket -76 is incompatible with HTTP reverse proxies

On Wed, Jul 21, 2010 at 09:24:16PM +0000, Ian Hickson wrote:
> On Tue, 6 Jul 2010, Willy Tarreau wrote:
> > 
> > Last week, it was reported to me that a site that was running fine on 
> > draft 75 could not get the draft 76 handshake to complete via a HAProxy 
> > load balancer, which runs as an HTTP reverse proxy. The connection would 
> > remain open between the client and haproxy, and between haproxy and the 
> > server, with the server never responding. The same client (Chromium 
> > 6.0.414.0) directly connected to the server worked fine.
> > 
> > The guy was kind enough to send me some network captures which show an 
> > obvious problem : the 8-bytes nonce from the client is not advertised as 
> > a content-length, so it is not forwarded by the reverse proxy as it is 
> > either part of a next request or pending data for when the handshake 
> > completes.
> 
> Right, you need to update all the server-side components to support 
> WebSocket. WebSocket is a new protocol. Similar updates would be needed to 
> support other new protocols.

Ian, I don't know how else to explain this to you. I've exhausted every
explanation a normal human is able to understand, so I think that you
are deliberately refusing to understand the facts :-(

I've told you *several* times that it's not a matter of "updating" server-
side components, but that your cross-dressed protocol will not be mergeable
with HTTP on reverse-proxies. So even if the reverse-proxies are "updated",
to use your term, they will have to be configured either to support
WebSocket *OR* to support HTTP, but not both on the same IP:port
pair.

What you are suggesting is that a shared component which would have to
support both protocols would have to trust any request that contains an
Upgrade header without even knowing if the server will see it. That's
simply not acceptable. Anyone will be able to play with HTTP servers
behind HTTP reverse-proxies pretending to be talking WebSocket. That's
silly and undesired.

The HTTP spec is clear: the protocol resulting from an Upgrade is switched
*AFTER* the "101" response, not *BEFORE*. So the reverse-proxy MUST see
the 101 before it agrees to pass non-HTTP bytes to the other side if they
are not advertised in the request.
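
To make that rule concrete, here is a minimal Python sketch of the decision
a strict reverse-proxy would make. It is only an illustration: the function
name and its parameters are hypothetical and do not describe how haproxy or
any other proxy is actually written.

```python
from typing import Optional

SWITCHING_PROTOCOLS = 101


def allowed_upstream(client_bytes: bytes, content_length: int,
                     upstream_status: Optional[int]) -> bytes:
    """Return the bytes received after the request headers that may be
    passed to the server so far.

    client_bytes    -- everything the client sent after the header block
    content_length  -- body length advertised in the request (0 if none)
    upstream_status -- status code of the server's response, or None if
                       the server has not answered yet
    """
    body = client_bytes[:content_length]
    extra = client_bytes[content_length:]
    if upstream_status == SWITCHING_PROTOCOLS:
        # The protocol has been switched: the extra bytes belong to it.
        return body + extra
    # Still plain HTTP: only the advertised body may be forwarded.
    return body
```

With a request that advertises no body, an 8-byte nonce is held back until
the 101 arrives; with "Content-Length: 8" it is forwarded immediately as the
request body.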

> > I can't agree with that because until the handshake completes, the proxy 
> > does not know whether the server will handle the request as a WS 
> > handshake or anything else, and it must absolutely not accept to blindly 
> > trust any random client who sets an Upgrade header that any server is 
> > free to ignore.
> 
> Obviously all server-side components have to be configured to know the 
> setup that they are in. This includes telling load balancers and other 
> front-end intermediaries which hosts are ready to handle WebSocket 
> connections and which are not. Just like a reverse proxy would not be 
> configured to forward a connection to an SMTP server behind the firewall, 
> it wouldn't be configured to send WebSocket traffic to HTTP servers.

No, this is totally different here, because requests are routed through
multiple layers of reverse-proxies on server-side which fortunately don't
all have to know every single bit of URLs. The outer ones just know how
to process global confs and *some* inner components may specifically be
tuned to know that *some* URLs will be working differently. But that's
clearly not even remotely conceivable for hosting providers. They offer
you a host name, with everyone behind the same IP, and they don't want to
know how you will be managing your URLs; they just forward you the traffic
for that Host: header and you do whatever you want with it. And if you
want to change your WS URL twice a day, it's your problem and they don't
have to reconfigure all of their components just because of you.

> > Conversely, having no Content-Length header in the request means that we 
> > don't know what a reverse proxy will do if it receives a valid one. For 
> > instance, we could very well imagine that some reverse proxies which 
> > will assume that Content-Length == 8 for any request containing 
> > "Upgrade: WebSocket" will have trouble when receiving a different 
> > Content-Length header. This could be used to pass larger amounts of data 
> > than what is allowed by the protocol to a second reverse-proxy, which, 
> > if it is able to parallelize pipelined requests, will forward the first 
> > one to the server and the second one (embedded in the apparent data) to 
> > another server.
> 
> The spec is very clear about how a server side is to parse the handshake. 
> I don't think there's any ambiguity here. There's no need for the reverse 
> proxy to "assume a Content-Length" or anything like that; if it decides 
> that the request is a WebSocket request (e.g. based on the presence of an 
> "Upgrade: WebSocket" field, or based on the target IP or the given 
> resource name), then it should follow the Web Socket spec.

Please see above, for the nth time, why it cannot "decide" that it is a WS
request. The whole problem comes from that. It can only decide based on what
the client decides to tell it, and when the server responds, it's too late.

> > The first obvious solution that comes to mind is to comply with the HTTP 
> > protocol which will be implemented along the whole chain and to simply 
> > add a "Content-Length: 8" header in the request.
> 
> As far as I can tell there is nothing here that contradicts the HTTP spec. 
> If there is a specific requirement in the HTTP spec that is being 
> contradicted, please cite it.

I really think you're trying to make all of us waste our time while we're
trying to help you release something which works, instead of something that
people will attempt to use for 6 months and then abandon due to massive
failures everywhere. That's a real pity. So here it comes,
from RFC2616 (and this part remained unchanged in http-bis-p1-10):

4.4 Message Length
...
  HTTP/1.1 requests containing a message-body MUST include a valid Content-Length
  header field unless the server is known to be HTTP/1.1 compliant. If a request
  contains a message-body and a Content-Length is not given, the server SHOULD
  respond with 400 (bad request) if it cannot determine the length of the message,
  or with 411 (length required) if it wishes to insist on receiving a valid
  Content-Length.

  All HTTP/1.1 applications that receive entities MUST accept the "chunked"
  transfer-coding (section 3.6), thus allowing this mechanism to be used for
  messages when the message length cannot be determined in advance.

  Messages MUST NOT include both a Content-Length header field and a non-identity
  transfer-coding. If the message does include a non-identity transfer-coding,
  the Content-Length MUST be ignored.

  When a Content-Length is given in a message where a message-body is allowed,
  its field value MUST exactly match the number of OCTETs in the message-body.
  HTTP/1.1 user agents MUST notify the user when an invalid length is received
  and detected.


In short, there is a message body, so either you advertise it using Content-Length
or you advertise it using Transfer-Encoding: chunked, though the latter requires
that you're certain that the server supports HTTP/1.1, which is not necessarily
the case.
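
As a rough illustration of how an HTTP component applies the rules quoted
above to a request, here is a small Python sketch. The function name and the
lowercase-keyed header dictionary are assumptions made for the example, and
every non-identity transfer-coding is simply treated as chunked.

```python
def request_body_framing(headers: dict) -> tuple:
    """Return (mode, length) for a request.

    headers is assumed to map lowercase header names to their values.
    A non-identity Transfer-Encoding wins over Content-Length, which
    must then be ignored. With neither header, the request has no body.
    """
    coding = headers.get("transfer-encoding", "identity").strip().lower()
    if coding != "identity":
        return ("chunked", None)
    if "content-length" in headers:
        return ("content-length", int(headers["content-length"]))
    return ("no-body", 0)


# A handshake that advertises nothing: any bytes after the headers are
# not part of this request as far as an HTTP component is concerned.
handshake = {"host": "example.com", "upgrade": "WebSocket",
             "connection": "Upgrade"}
print(request_body_framing(handshake))
```

Adding "content-length": "8" to the dictionary above makes the same function
report an 8-byte body, which is what lets an intermediary deliver it.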

> We could add Content-Length: 0, but as far as I can tell that's implied 
> for GET anyway, so it wouldn't change anything in conforming software. 
> (This isn't very clear in the HTTP spec though.)

Indeed, it would change nothing; it would just clarify the fact that you
want to send nothing, which is already implicit when there's neither
content-length nor transfer-encoding in the request.

> We can't add Content-Length: 8, since that would mean the data would be 
> sent through with the first request even in non-WebSocket-aware man-in- 
> the-middle proxies, which defeats the point.

No, this is *what you need*. You need any HTTP-compliant component to reliably
deliver this request because HTTP is the medium over which WebSocket will be
used, like it or not. What you're currently doing is ensuring that HTTP-compliant
software that is already working correctly everywhere will not be able to pass
WebSocket requests to its peers, which will result in the protocol never being
used beyond what it has been since the beginning: tests between your local
browser and your local hand-written server.

> > Anyway, we have to do something now because we've reached the point Ian
> > tried to ensure we would avoid a long time ago : the deadlock which is
> > undetectable by the client.
> 
> A deadlock isn't a big deal. The problem was a false-positive situation, 
> where the handshake works but frames don't go through.

No, the handshake does not work because the server never gets the first
8 bytes, so it does not respond. I would have no problem if those 8 bytes
were not required before the 101 response, but it so happens that the server
needs them before responding, which causes the chicken-and-egg problem:

  client : hey, I'm sending you my handshake, and don't care about my extra bytes
  rev-proxy: hey server, I'm sending you a handshake, and if you reply with 101,
             I will send you the extra bytes
  server : I won't complete my handshake until you send me those bytes.
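
The exchange above can be modelled in a few lines of Python. This is a toy
sketch under stated assumptions, not real network code: the function names
are hypothetical, the proxy forwards only advertised body bytes before a
101, and the server answers 101 only once it holds the full 8-byte nonce.

```python
NONCE_LEN = 8


def server_answers_101(received: bytes) -> bool:
    """Model of the server: it replies only once it has the whole nonce."""
    return len(received) >= NONCE_LEN


def run_handshake(nonce: bytes, advertised_length: int) -> str:
    """Model of a strict reverse-proxy sitting between client and server."""
    # Before any 101, only the advertised request body goes upstream.
    sent_upstream = nonce[:advertised_length]
    if server_answers_101(sent_upstream):
        return "101 received"
    # The proxy waits for the 101 before releasing the remaining bytes,
    # and the server waits for those bytes before sending the 101.
    return "deadlock"


print(run_handshake(b"12345678", 0))  # no Content-Length in the request
print(run_handshake(b"12345678", 8))  # Content-Length: 8 in the request
```

The first call models the handshake with no advertised length and ends in
"deadlock"; the second models the same handshake with the nonce advertised
as an 8-byte body and completes.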


> On Thu, 8 Jul 2010, Greg Wilkins wrote:
> >
> > You are correct that it is not an extra round trip.  But I do not think 
> > it is a good solution to send a complete HTTP message PLUS extra stuff 
> > in the request.
> >
> > If the handshake is legal HTTP, the server should be able to reject the 
> > websocket upgrade without closing the connection.  This would allow the 
> > connection to remain in the browsers pool of connections and avoid an 
> > extra round trip to establish another connection if the application 
> > falls back to non-websocket transports.
> 
> The browser can't know if the server is really an HTTP server, so it can't 
> possibly reuse the connection. It could in fact be a huge security hole, 
> depending on how we did this. It is, in either case, far more complexity 
> than is in any way justified here.

I don't agree. As long as the server has not responded with 101, it *IS* HTTP.
So you can do whatever you want on the connection before the 101, including
authenticating with a challenge if you like.

> All of these problems come from thinking of Web Sockets as a subprotocol 
> of HTTP. It isn't. Web Sockets is its own high-level protocol built on top 
> of TCP. It just happens to look enough like HTTP that you can reuse the 
> port, but that doesn't mean it's an HTTP-based protocol. Thinking of Web 
> Sockets as having anything to do with HTTP is a mistake.

Please stop denying the initial goals. You know it won't be used if you
can't share the port, and it's even still written in draft-76:

   When a connection is to be made to a port that is shared by an HTTP
   server (a situation that is quite likely to occur with traffic to
   ports 80 and 443), the connection will appear to the HTTP server to
   be a regular GET request with an Upgrade offer.  In relatively simple
   setups with just one IP address and a single server for all traffic
   to a single hostname, this might allow a practical way for systems
   based on the WebSocket protocol to be deployed. 

Willy