Re: [hybi] Handshake was: The WebSocket protocol issues.

Maciej Stachowiak <mjs@apple.com> Wed, 29 September 2010 08:55 UTC

From: Maciej Stachowiak <mjs@apple.com>
In-reply-to: <AANLkTikcH1W3bQwumqHbe-Yqa3XdoJqCa2b-mZuvoQ7g@mail.gmail.com>
Date: Wed, 29 Sep 2010 01:56:12 -0700
Message-id: <9746E847-DC8B-45A7-ADF3-2ADB9DA7F82E@apple.com>
References: <AANLkTikszM0pVE-0dpZ2kv=i=y5yzS2ekeyZxtz9N=fQ@mail.gmail.com> <62B5CCE3-79AF-4F60-B3A0-5937C9D291D7@apple.com> <AANLkTikKc+4q_Q1+9uDo=ZpFF6S49i6vj2agZOGWVqKm@mail.gmail.com> <E2D38FF3-F1B9-4305-A7FC-A9690D2AEB4A@apple.com> <AANLkTikRYB_suPmSdH3uzGmdynozECRszDx+BpUvtZ4h@mail.gmail.com> <5CBF797D-A58E-4129-96B3-164F6E7409B9@apple.com> <4CA0D0D2.4040006@caucho.com> <AANLkTinACqm-GxUPhvFMf6_sGfeJofwy1r=28o=vgM43@mail.gmail.com> <4CA12810.8020006@caucho.com> <AANLkTimrMfXrnVMjU3f57L_sO7usyYQ56rBM4aMb2Pfr@mail.gmail.com> <20100928052501.GD12373@1wt.eu> <CA8029B0-71A3-44ED-88C6-934FE833BBA2@apple.com> <AANLkTim+fXj-h6OS3OdcfVfh3Q1UwxD8NLVawb=AWHX+@mail.gmail.com> <4FAC5C93-9BDF-4752-AFBC-162D718397AB@apple.com> <AANLkTikcH1W3bQwumqHbe-Yqa3XdoJqCa2b-mZuvoQ7g@mail.gmail.com>
To: Greg Wilkins <gregw@webtide.com>
Cc: hybi <hybi@ietf.org>
Subject: Re: [hybi] Handshake was: The WebSocket protocol issues.

On Sep 29, 2010, at 12:58 AM, Greg Wilkins wrote:

> On 29 September 2010 09:50, Maciej Stachowiak <mjs@apple.com> wrote:
> 
>>> + Reliance on the browser not treating the 101 response as an error.
>>> + The inability to read the ping frame after the 101 response.
>> 
>> The threat model I described involves an attack on integrity, not confidentiality, so these two defenses do no good.
>> 
>> In other words, the threat model is that the attacker sends commands with side effects that it shouldn't be able to, and doesn't care about reading the response.
> 
> This would rely on a WS server taking an undesirable side effect on
> the basis of a partially negotiated WS connection.
> If such a server was written, surely it would be vulnerable to any WS
> client and thus this is not a cross protocol issue, just
> a poorly written server.   More importantly, is there anything about a
> raw HTTP request that cannot be done in a normal WS upgrade request
> that is likely to trigger such a side effect?

The raw HTTP request can include a body, which is sent immediately, rather than waiting for the remainder of the handshake. The body can include content framed as WS protocol messages. A WebSocket client will wait until the handshake is complete. So yes, the possibility of attempting an HTTP connection to a WebSocket server does create additional attack surface.
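
To make the difference concrete, here is a minimal Python sketch of what a server may find in its receive buffer in each case. The request lines are simplified placeholders, not a real handshake, the body bytes merely stand in for attacker-chosen frame bytes, and the helper name is hypothetical.

```python
HEADER_END = b"\r\n\r\n"


def split_head_and_early_bytes(raw: bytes) -> tuple[bytes, bytes]:
    """Split a received buffer into the request head and whatever follows it."""
    head, sep, rest = raw.partition(HEADER_END)
    if not sep:
        raise ValueError("incomplete request head")
    return head + sep, rest


# A client that waits for the handshake to finish sends only its request head.
# (Simplified: real handshake fields are omitted.)
waiting_client = (
    b"GET /chat HTTP/1.1\r\nHost: example.com\r\nUpgrade: WebSocket\r\n\r\n"
)

# An HTTP client (for example an XHR POST) sends head and body together.
# These body bytes are placeholders for attacker-chosen frame bytes.
body = b"\x00\x01\x02\x03"
http_client = (
    b"POST /chat HTTP/1.1\r\nHost: example.com\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n" + body
)

_, early_from_waiting_client = split_head_and_early_bytes(waiting_client)
_, early_from_http_client = split_head_and_early_bytes(http_client)
```

In this sketch the waiting client leaves nothing after the head, while the HTTP client's body is already sitting in the buffer before the server has replied.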

> 
> 
>>> + The inability to send a pong frame
>> 
>> I don't think this one is true. Why couldn't you include it in an HTTP message body? Your original proposal does nothing to prevent this. Adding a server-provided nonce to the ping and requiring it to appear in some form in the pong does add some protection, but only as strong as the server's (possibly ad-hoc) choice of RNG.
> 
> It is conceivable that some HTTP requests might also be valid WS
> frames.   However, as HTTP sent from the server will start with
> HTTP/1.1 XXX reason, then only very contrived frames could be sent.
> The first two bytes of a ping frame carrying a 16 byte hash are STX
> DLE, and certainly not a valid HTTP response and an invalid method
> name in a HTTP request.
> 
> So I stand by my claim that a pong frame cannot be sent by a HTTP
> client, but perhaps some very contrived frames could be (see below).

You don't need the WebSocket frame to be a full HTTP message. You simply put as many WebSocket frames as you want in your HTTP body ahead of time. (That's the original threat model I suggested: you queue up an XHR that has a number of WebSocket frames in the body.) This is the reason the client parts of the handshake must not be predictable up front by an attacker.
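
One way to picture the unpredictability requirement: if a valid pong must depend on a fresh nonce chosen by the server, then bytes queued before the ping was sent cannot contain it. The Python sketch below assumes a 16-byte nonce and a truncated SHA-256 of the nonce as the pong payload. Both are illustrative choices, not taken from any draft, and the function names are hypothetical. It only helps if the sender cannot read the ping, and it is only as strong as the random source behind the nonce.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16  # assumed size, chosen for illustration only


def new_ping_nonce() -> bytes:
    """Return a fresh nonce from the operating system's CSPRNG."""
    return secrets.token_bytes(NONCE_LEN)


def expected_pong(nonce: bytes) -> bytes:
    """Assumed transform: the pong carries a truncated SHA-256 of the nonce."""
    return hashlib.sha256(nonce).digest()[:NONCE_LEN]


def pong_is_valid(nonce: bytes, pong_payload: bytes) -> bool:
    """Compare in constant time so the check leaks no timing information."""
    return hmac.compare_digest(expected_pong(nonce), pong_payload)
```

A server would call `new_ping_nonce()` once per connection, send it in the ping, and accept client frames only after `pong_is_valid` returns true for the payload it gets back.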

> 
> 
> 
>>> + The inability to provide credentials (not otherwise available to
>>> normal HTTP) to access authorized content
>> 
>> Network position is itself effectively a credential. Some servers are only protected by a firewall. This is common not only in enterprise scenarios but also with many consumer devices that offer Web-based configuration. Browsers generally consider it their responsibility not to expand the attack surface in such cases.
> 
> If the network is the only credential needed, then the compromised
> browser will eventually have a WS client and thus this is not a cross
> protocol issue.     There is nothing about WS that is different than
> just HTTP in this regard and we would have to hope that the browser
> origin policy would protect both HTTP and WS from such poorly
> authenticated services being attacked by local compromised clients.
> Unfortunately many such environments probably have XSS vulnerabilities
> as well, so the origin model can be subverted, but this is an existing
> risk and there is no additional risk introduced by WS here.

WS is different for the reasons stated under the first point. A WS client won't send messages of the caller's choice until the handshake is complete, whereas an HTTP client sends its body immediately.

> 
> 
> 
>>> + The inability to make a WS server send unauthorized content as
>>> valid WS frames
>> 
>> Not relevant to the threat model. Again, because the threat is an attack on integrity, not confidentiality.
>>> + The inability to provide credentials (not otherwise available to
>>> normal HTTP) that might trigger server side effects
>> 
>> Not relevant to the threat model, which is existing HTTP functionality being used to attack WebSocket servers.
> 
> They are still defences against other cross protocol attacks, hence I
> listed them as part of the defences.

OK, but not defenses against the particular threat I suggested.

> 
> 
> 
> 
>>> + The inability to send a WS frame from a HTTP client that might
>>> trigger server side effect.
>> 
>> I don't think this is true. What stops you from sending the bytes of a WS frame from an in-browser HTTP client?
> 
> It is indeed possible that some HTTP messages may indeed be valid WS
> frames, and I think that perhaps we
> need to do some further consideration of the framing mechanism to make
> sure that ASCII method names do not
> become legal frame headers.

You don't need a full HTTP message to carry a WS frame; you can put any number of frames in the body of a single message. That's what my original threat model imagined.

Using multiple HTTP requests in combination with HTTP pipelining is an interesting additional threat that I hadn't thought of.



> 
>>> + That even if all of the above could be subverted, the attacker
>>> would only have a capability (to talk WS), that we plan to have widely
>>> available within the browser.
>> 
>> WebSocket itself will not send any frames until the handshake is complete, but HTTP sends the body along with the headers. So it's not the same. You can't count on browser-side checking of the server response to protect WS servers from HTTP clients.
> 
> I don't think you have understood my point.   Let's say that we can
> subvert a HTTP client so that it passes the WS handshake, so we now
> have a WS connection wired to a HTTP client.   Does that now give us
> any capabilities that represent a threat that we would not have if we
> were using a WS client?

The HTTP client doesn't need to really pass the handshake. It just needs to do a good enough job to fool the server for long enough that the server reads some frames. If the server expects that it will never even get client frames until the handshake is complete, it may be in for a bad surprise.

You could say it's just the server author's responsibility to code defensively, but I think it would be even better if the protocol is designed to make it hard to make implementation mistakes of this kind.
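
As an illustration of what coding defensively could mean here, the following Python sketch refuses any frame bytes that arrive before the handshake has been verified, including bytes that were already buffered behind the request head. The class and method names are hypothetical and the handshake verification itself is left out.

```python
class ProtocolError(Exception):
    """Raised when the peer sends data the connection state does not allow."""


class ServerConnection:
    def __init__(self) -> None:
        self.handshake_complete = False
        self.frames: list[bytes] = []

    def on_request_head(self, early_bytes: bytes) -> None:
        # Bytes already queued behind the request head were sent before the
        # server replied, so they cannot come from a client that waited.
        if early_bytes:
            raise ProtocolError("data received before handshake completed")

    def on_handshake_verified(self) -> None:
        # Called only after the (omitted) handshake checks have passed.
        self.handshake_complete = True

    def on_frame_bytes(self, data: bytes) -> None:
        # Never act on frames from a partially negotiated connection.
        if not self.handshake_complete:
            raise ProtocolError("frame received before handshake completed")
        self.frames.append(data)
```

A check like this depends on every server author remembering to write it, which is the reason for preferring a protocol design that makes the mistake hard to make.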

> 
> Actually to answer my own question, a HTTP message that is contrived
> to look like a WS frame will be able to have the opcode and RSV bits
> set by the caller, which otherwise is not the case for WS clients.  So
> this last defence is weaker than I represented, but that's why we have
> defence in depth.   Again, we might want to revisit framing to ensure
> that HTTP methods cannot form valid WS headers.
> 
> Excellent - this exchange has finally been productive and I think we
> have indeed identified some weaknesses (albeit already existing in
> -76) that may need to be strengthened if we suspect that a HTTP client
> could somehow get past the other handshake defences.
> 
> But so far, there has been no indication that my proposal to change
> the nonce encoding and to frame the hashes as WS packets has weakened
> the defences.  I believe that the analysis we are doing is still
> showing that it is at least as secure as -76.

I suspect -76 is not strong enough, but it is slightly stronger in some specific ways. First, by spreading the data across multiple header fields and incorporating whitespace, it makes it slightly trickier to do injection attacks. Second, by requiring the server to combine data from multiple places, it makes it more likely the server will read more of the handshake.
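
For reference, a rough Python sketch of the -76 challenge computation as that draft describes it: each of the two key header fields yields a number (its digits concatenated, divided by its count of spaces), and the server hashes both numbers together with the 8 bytes that follow the request headers. The function names are hypothetical and error handling is minimal.

```python
import hashlib
import struct


def key_number(key: str) -> int:
    """Concatenate the digits in a key field and divide by its space count."""
    digits = "".join(c for c in key if c in "0123456789")
    spaces = key.count(" ")
    if not digits or spaces == 0:
        raise ValueError("key needs at least one digit and one space")
    number = int(digits)
    if number % spaces != 0:
        raise ValueError("digits are not a multiple of the space count")
    return number // spaces


def challenge_response(key1: str, key2: str, key3: bytes) -> bytes:
    """MD5 over two big-endian 32-bit key numbers followed by the 8-byte key3."""
    if len(key3) != 8:
        raise ValueError("key3 must be exactly 8 bytes")
    challenge = struct.pack(">II", key_number(key1), key_number(key2)) + key3
    return hashlib.md5(challenge).digest()
```

The server cannot produce the 16-byte response without reading both header fields and the trailing bytes, which is the sense in which it is pushed to read more of the handshake.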


Another possible strengthening is to design the actual message framing such that it is affected by data from both the client and server parts of the handshake. Let's say the whole frame header is XOR'd with the sum of the client and server nonces. Then it is impossible for the server to read frames produced by a party that didn't really participate in the handshake, and likewise impossible for it to read frames without checking the handshake. Effectively this (lightly) encrypts frame headers so they look like random bytes if there wasn't a proper handshake. For bonus points, encrypt the message bodies too; this would shore up the defense against attackers using WebSocket to talk to another protocol (since, past the handshake, their bytes would look random and not actually be controlled by them).
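
A minimal Python sketch of the XOR idea, under stated assumptions: the nonces are equal-length byte strings, their sum is taken modulo 2^(8n) so that it stays n bytes long, and the resulting key is repeated across the header. None of these details come from a draft, the function names are hypothetical, and a repeated XOR key is obfuscation rather than real encryption.

```python
def nonce_sum_key(client_nonce: bytes, server_nonce: bytes) -> bytes:
    """Derive an n-byte key from the sum of two n-byte nonces (mod 2**(8n))."""
    if not client_nonce or len(client_nonce) != len(server_nonce):
        raise ValueError("nonces must be non-empty and of equal length")
    n = len(client_nonce)
    total = int.from_bytes(client_nonce, "big") + int.from_bytes(server_nonce, "big")
    return (total % (1 << (8 * n))).to_bytes(n, "big")


def xor_header(header: bytes, key: bytes) -> bytes:
    """XOR each header byte with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(header))
```

Because XOR is its own inverse, the receiver applies `xor_header` with the same key to recover the header. A sender that never learned the server nonce cannot compute the key, so header bytes it chose in advance decode to something it did not pick.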

I don't know offhand what the perf impact would be of this kind of approach. 

I note that this ad-hoc approach starts to resemble TLS the more it has added to it, only without the years of review and deployment experience, which is why I am somewhat skeptical of heading further in this direction.


Regards,
Maciej