Re: [hybi] CONNECT handshake text

"Simon Pieters" <simonp@opera.com> Wed, 08 December 2010 18:53 UTC

Content-Type: text/plain; charset="utf-8"; format="flowed"; delsp="yes"
To: "Ian Fette (イアンフェッティ)" <ifette@google.com>
References: <AANLkTinEXHBeaUPo4gK2CHbq7ZHYnY2PE3Vb+Oi+K1NM@mail.gmail.com> <op.vnd6ijrzidj3kv@dhcp-190.linkoping.osa> <AANLkTimWpLUFuNR62Titix5WyJumnJg7rKXty1yX7G6O@mail.gmail.com>
Date: Wed, 08 Dec 2010 19:55:05 +0100
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Simon Pieters <simonp@opera.com>
Message-ID: <op.vner53njidj3kv@dhcp-190.linkoping.osa>
In-Reply-To: <AANLkTimWpLUFuNR62Titix5WyJumnJg7rKXty1yX7G6O@mail.gmail.com>
User-Agent: Opera Mail/11.00 (MacIntel)
Cc: hybi@ietf.org
Subject: Re: [hybi] CONNECT handshake text

On Wed, 08 Dec 2010 16:42:04 +0100, Ian Fette (イアンフェッティ)  
<ifette@google.com> wrote:

> 2010/12/8 Simon Pieters <simonp@opera.com>
>
>> On Tue, 07 Dec 2010 19:01:34 +0100, Ian Fette (イアンフェッティ) <
>> ifette@google.com> wrote:
>>
>>> There's a lot of back and forth lately about a possible CONNECT handshake
>>> and how it should look. I thought it might be helpful to send out some
>>> draft text that is a somewhat minimal CONNECT handshake, without things like
>>> JSON-encoded data and payload masking, as I think we may be getting
>>> distracted by some of these things and I'm not sure they're actually
>>> necessary or material.
>>>
>>> Below is some text that I would like the group to consider for a CONNECT
>>> handshake, as it would fit into the WebSockets protocol.
>>>
>>
>> At the risk of distracting more, I'll provide some quick comments.
>>
>>
>>> 1.2.  Protocol overview
>>>
>>
>> I didn't look at the non-normative sections.
>>
>>
>>
>>> 5.  Opening Handshake
>>>
>>> 5.1.  Client Requirements
>>>
>>>   User agents running in controlled environments, e.g. browsers on
>>>   mobile handsets tied to specific carriers, may offload the management
>>>   of the connection to another agent on the network.  In such a
>>>   situation, the user agent for the purposes of conformance is
>>>   considered to include both the handset software and any such agents.
>>>
>>>   When the user agent is to *establish a WebSocket connection* to a
>>>   WebSocket URL /url/, it must meet the following requirements.  In the
>>>   following text, we will use terms from Section 3 such as "/host/" and
>>>   "/secure/ flag" as defined in that section.
>>>
>>
>> The WebSocket API passes the following "parameters" to the "establish a
>> WebSocket connection" algorithm:
>>
>> [[
>> Establish a WebSocket connection to a host host, on port port (if one was
>> specified), from origin, with the flag secure, with resource name as the
>> resource name, with protocols as the (possibly empty) list of protocols,
>> and with the defer cookies flag set. [WSP]
>> ]]
>>
>> i.e. host, port, origin, secure, resource name, protocols, defer cookies.
>> The URL is *not* passed.
>
>
> This needs to be parsed from the Sec-WebSocket-URL header. I will clarify.
> (Or, if we decide not to mask host, then it may change slightly.)

I don't follow... The WebSocket API spec says to parse the URL passed to  
the WebSocket() constructor and invoke "establish a WebSocket connection"  
with the above parameters -- it does not pass the URL to the algorithm.

>
>>
>>
>>
>>>   1.  The WebSocket URL and its components MUST be valid according to
>>>       Section 3.3.  If any of the requirements are not met, the client
>>>       MUST fail the WebSocket connection and abort these steps.
>>>
>>
>> The WebSocket API validates the URL.
>>
>
>
> I'm not sure I understand your comment. Are you saying it's validated
> client-side?

Yes. See http://dev.w3.org/html5/websockets/#dom-websocket


> Even if so, the HTML5 clients are not necessarily the only
> users of the protocol. I think it's reasonable to leave validation text in
> for servers; if the browser also validates, which it should, then there
> shouldn't be any problem. Or am I misinterpreting?

We're discussing the section which covers client-side requirements, not  
the server-side requirements.


>>
>>
>>>   2.  If the user agent already has a WebSocket connection to the
>>>       remote host (IP address) identified by /host/, even if known by
>>>       another name, the user agent MUST wait until that connection has
>>>       been established or for that connection to have failed.  If
>>>       multiple connections to the same IP address are attempted
>>>       simultaneously, the user agent MUST serialize them so that there
>>>       is no more than one connection at a time running through the
>>>       following steps.
>>>
>>>       If the user agent cannot determine the IP address of the remote
>>>       host (for example because all communication is being done through
>>>       a proxy server that performs DNS queries itself), then the user
>>>       agent MUST assume for the purposes of this step that each host
>>>       name refers to a distinct remote host, but should instead limit
>>>       the total number of simultaneous connections that are not
>>>       established to a reasonably low number (e.g., in a Web browser,
>>>       to the number of tabs the user has open).
>>>
>>>       NOTE: This makes it harder for a script to perform a denial of
>>>       service attack by just opening a large number of WebSocket
>>>       connections to a remote host.  A server can further reduce the
>>>       load on itself when attacked by making use of this by pausing
>>>       before closing the connection, as that will reduce the rate at
>>>       which the client reconnects.
>>>
>>>       NOTE: There is no limit to the number of established WebSocket
>>>       connections a user agent can have with a single remote host.
>>>       Servers can refuse to connect users with an excessive number of
>>>       connections, or disconnect resource-hogging users when suffering
>>>       high load.
>>>
>>> Fette                     Expires June 8, 2011                 [Page 24]
>>> Internet-Draft           The WebSocket protocol            December 2010
>>>
>>>   3.  _Proxy Usage_: If the user agent is configured to use a proxy
>>>       when using the WebSocket protocol to connect to host /host/
>>>       and/or port /port/, then the user agent SHOULD connect to that
>>>       proxy and ask it to open a TCP connection to the host given by
>>>       /host/ and the port given by /port/.
>>>
>>>          EXAMPLE: For example, if the user agent uses an HTTP proxy for
>>>          all traffic, then if it was to try to connect to port 80 on
>>>          server example.com, it might send the following lines to the
>>>          proxy server:
>>>
>>>
>>>              CONNECT example.com:80 HTTP/1.1
>>>              Host: example.com
>>>
>>>          If there was a password, the connection might look like:
>>>
>>>
>>>              CONNECT example.com:80 HTTP/1.1
>>>              Host: example.com
>>>              Proxy-authorization: Basic ZWRuYW1vZGU6bm9jYXBlcyE=
>>>
>>>       If the user agent is not configured to use a proxy, then a direct
>>>       TCP connection SHOULD be opened to the host given by /host/ and
>>>       the port given by /port/.
>>>
>>>       NOTE: Implementations that do not expose explicit UI for
>>>       selecting a proxy for WebSocket connections separate from other
>>>       proxies are encouraged to use a SOCKS proxy for WebSocket
>>>       connections, if available, or failing that, to prefer the proxy
>>>       configured for HTTPS connections over the proxy configured for
>>>       HTTP connections.
>>>
>>>       For the purpose of proxy autoconfiguration scripts, the URL to
>>>       pass the function must be constructed from /host/, /port/,
>>>       /resource name/, and the /secure/ flag using the steps to
>>>       construct a WebSocket URL.
>>>
>>>       NOTE: The WebSocket protocol can be identified in proxy
>>>       autoconfiguration scripts from the scheme ("ws:" for unencrypted
>>>       connections and "wss:" for encrypted connections).
>>>
>>>   4.  If the connection could not be opened, either because a direct
>>>       connection failed or because any proxy used returned an error,
>>>       then the user agent MUST fail the WebSocket connection and abort
>>>       the connection attempt.
>>>
>>>   5.  If /secure/ is true, the user agent MUST perform a TLS handshake
>>>       over the connection.  If this fails (e.g. the server's
>>>       certificate could not be verified), then the user agent MUST fail
>>>       the WebSocket connection and abort the connection.  Otherwise,
>>>       all further communication on this channel MUST run through the
>>>       encrypted tunnel.  [RFC2246]
>>>
>>>       User agents MUST use the Server Name Indication extension in the
>>>       TLS handshake.  [RFC4366]
>>>
>>>   Once a connection to the server has been established (including a
>>>   connection via a proxy or over a TLS-encrypted tunnel), the client
>>>   MUST send a handshake to the server.  The handshake consists of a
>>>   CONNECT request, along with a list of required and optional headers.
>>>   The requirements for this handshake are as follows.
>>>
>>>   1.   The handshake must be a valid HTTP request as specified by
>>>        [RFC2616].
>>>
>>
>> The current draft says which bytes should be put on the wire exactly. I
>> prefer the current draft over referencing RFC2616 since doing the latter
>> makes many variations conforming which can cause interop problems.
>>
>>
> And this was a source of contention for many people.
>
>
>>
>>
>>>   2.   The Method of the request MUST be CONNECT, and the authority
>>>        MUST be websocket.invalid:443.  The HTTP version MUST be at
>>>        least 1.1.
>>>
>>
>> Why at least 1.1 and not exactly 1.1?
>
>
> Is there any particular reason to exclude future versions? I don't know if
> it will really matter; happy to change if people want it to be exactly 1.1.
>
>
>>
>>
>>
>>>        The first line sent SHOULD be "CONNECT websocket.invalid:443
>>>        HTTP/1.1"
>>>
>>>   3.   The request MUST contain a "HOST" header whose value is equal to
>>>        "websocket.invalid:443"
>>>
>>
>> "HOST" or "Host"?
>
>
> Typo, Host. Thanks.
>
>
>>
>>
>>
>>>   4.   The request MUST include a header with the name
>>>        "Sec-WebSocket-Key".  The value of this header MUST be a nonce
>>>        consisting of a randomly selected 16-byte value that has been
>>>        base64-encoded [RFC3548].  The nonce MUST be randomly selected
>>>        for each connection.
>>>
>>>        NOTE: As an example, if the randomly selected value was the set
>>>        of bytes 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0a 0x0b
>>>        0x0c 0x0d 0x0e 0x0f 0x10, the value of the header should be
>>>        "AQIDBAUGBwgJCgsMDQ4PEC=="
>>>
>>>   5.   The request MUST include a header with the name "Sec-WebSocket-
>>>        Origin".  The value of this header MUST be the origin of the
>>>        context in which the code establishing the connection is
>>>        running.
>>>
>>
>> The value must be /origin/ that is passed to the algorithm. The WebSocket
>> API has requirements on what /origin/ is.
>>
>
> OK, here I wasn't sure as, again, the HTML5 WebSocket API is not necessarily
> the only thing implementing the protocol. Perhaps I can change it, though, to
> simply reference the definition in the HTML5 spec for HTML clients, and
> leave the somewhat more generic description for other non-HTML clients?
> Thoughts? E.g. it's not clear referencing the HTML5 API exactly would be
> appropriate if a plugin wished to implement the API.

I'm not saying you should reference the WebSocket API spec. I'm saying you  
should do what the current spec does and expect the input parameter  
/origin/ to be correct. See  
http://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-03#section-11

>
>>
>>
>>>        As an example, if code is running on www.example.com attempting
>>>        to establish a connection to ww2.example.com, the value of the
>>>        header should be "http://www.example.com".
>>>
>>>   6.   The request MUST include a header with the name "Sec-WebSocket-
>>>        URL".  The value of this header MUST be the WebSocket URL to
>>>        which the connection is to be made.
>>>
>>
>> Why URL? Current handshake uses /resource name/.
>
>
> Because the current handshake also has the Host and resource as part of the
> GET, and the new handshake using CONNECT doesn't have that, so it has to go
> somewhere. At that point, it becomes a question of adding a host and
> resource name header, or adding a single URL header. Adding a single URL
> header seems cleaner.

OK. Then you either need to change section 11 to say to pass /URL/ to  
"establish a WebSocket connection" and get the WebSocket API spec changed,  
or let the input parameters be the same as in -03 and invoke "*construct a  
WebSocket URL* from a /host/, a /port/, a /resource name/, and a /secure/  
flag" to get the URL.
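
A rough sketch of the second option, for illustration only. The function
name is made up here, and the steps are a paraphrase of "construct a
WebSocket URL" (scheme from the /secure/ flag, port omitted when it is the
scheme default of 80 or 443), not text copied from any draft:

```python
def construct_websocket_url(host: str, port: int, resource_name: str,
                            secure: bool) -> str:
    """Build a ws:/wss: URL from the parameters passed to the algorithm.

    Hypothetical helper: a paraphrase of the "construct a WebSocket URL"
    steps, assuming default ports 80 (ws) and 443 (wss).
    """
    scheme = "wss" if secure else "ws"
    default_port = 443 if secure else 80
    url = f"{scheme}://{host}"
    # Only spell out the port when it differs from the scheme default.
    if port != default_port:
        url += f":{port}"
    # /resource name/ is assumed to start with "/" already.
    return url + resource_name


if __name__ == "__main__":
    print(construct_websocket_url("example.com", 80, "/chat", False))
    print(construct_websocket_url("example.com", 8443, "/chat", True))
```

With this shape, the value of Sec-WebSocket-URL is derived inside the
protocol algorithm, and the WebSocket API spec keeps passing the same
parameters it passes today.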


>
>>
>>
>>
>>>   7.   The request MUST include a header with the name
>>>        "Sec-WebSocket-Draft".  The value of this header must be 4.
>>>
>>>   8.   The request MAY include a header with the name "Sec-WebSocket-
>>>        Protocol".  If present, this value indicates the subprotocol(s)
>>>        the client wishes to speak.  The ABNF for the value of this
>>>        header is 1#(token | quoted-string), where the definitions of
>>>        /token/ and /quoted-string/ are as given in [RFC2616].
>>>
>>
>> MAY? This should be MUST if /protocols/ is not empty, with the value of
>> /protocols/ joined with U+0020, or MUST NOT if /protocols/ is empty.
>
>
> I was trying to describe the requirements for a handshake. The header is
> optional in that the protocols field is not a required component. I also
> thought we had moved towards agreement on getting away from re-defining
> header parsing in this spec and instead referencing 2616, e.g.
> Sec-WebSocket-Protocol: chat, superchat, "my, cool, thing"

So spaces should be allowed in a subprotocol then? I assume backslash
escaping is also necessary? My knee-jerk reaction is that using the above
syntax instead of the simpler separate-with-U+0020 will lead to more bugs
in server implementations.
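
To make the difference concrete, here is a rough Python sketch of the two
parsing styles. Both function names are invented for illustration. The
second function is a simplified reading of the 1#(token | quoted-string)
rule (it handles commas, quoted strings and backslash escapes, but does not
validate token characters), and it is an assumption, not text from the
draft:

```python
def parse_space_separated(value: str) -> list[str]:
    """The simpler rule: subprotocols separated by U+0020."""
    return [part for part in value.split(" ") if part]


def parse_rfc2616_list(value: str) -> list[str]:
    """Simplified parser for a comma-separated list of tokens and
    quoted-strings, with backslash escapes inside quotes."""
    items: list[str] = []
    current: list[str] = []
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(value):
                # quoted-pair: take the next character literally
                current.append(value[i + 1])
                i += 2
                continue
            if ch == '"':
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            items.append("".join(current))
            current = []
        elif ch in " \t":
            pass  # whitespace between list elements is ignored
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted-string")
    items.append("".join(current))
    # Empty list elements are dropped.
    return [item for item in items if item]


if __name__ == "__main__":
    print(parse_space_separated("chat superchat"))
    print(parse_rfc2616_list('chat, superchat, "my, cool, thing"'))
```

The first function is a single split; the second needs a small state
machine plus an error path, which is where server bugs tend to creep in.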

>
>
>>
>>
>>
>>>   9.   The request MAY include a header with the name
>>>        "Sec-WebSocket-Extensions".  If present, this value indicates the
>>>        protocol-level extension(s) the client wishes to speak.  The ABNF
>>>        for the value of this header is 1#(token | quoted-string), where
>>>        the definitions of /token/ and /quoted-string/ are as given in
>>>        [RFC2616].
>>>
>>
>> There are no extensions defined yet, right? Are vendor-proprietary
>> extensions allowed or not?
>
>
> There's one at the end of the spec (compression) in -03. And vendor
> proprietary extensions would certainly be allowed.

Ok.

>>
>>
>>
>>>   10.  The request MAY include headers associated with sending cookies,
>>>        as defined by the appropriate specifications.  [RFC2109]
>>>        [RFC2965]
>>>
>>
>> MAY? Which cookies? Current draft has more precise rules here.
>
>
> Again, MAY as this is not required for a successful handshake. I'm happy to
> throw in a MUST for browser clients specifically, but it seems like that
> ought to be in the HTML spec and not the protocol spec. The protocol spec
> should really be specifying the protocol for all clients, and what is
> allowed to be sent over that protocol. Instructions specific to a particular
> client seem like they belong with the specification for that client, e.g.
> the HTML spec should specify what cookies must be sent with a WebSocket
> request for browsers, not the protocol spec.

Ok. Then I guess a list of cookies is also needed as an input parameter to  
"establish a WebSocket connection" (and section 11 changed as appropriate)  
so that the WebSocket API spec can say which cookies to pass to the  
algorithm. I'd still like the protocol spec to say that if there are  
cookies passed to the algorithm, then they must be sent.
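
As a sketch of what that could look like, assuming the input parameters
discussed above plus a list of cookies: the names ConnectParams and
build_handshake are hypothetical, the header names come from the quoted
draft text, and the header order, the Cookie formatting and the
space-joined Sec-WebSocket-Protocol value are assumptions made for this
example only.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectParams:
    """Hypothetical inputs to "establish a WebSocket connection"."""
    host: str
    port: int
    origin: str
    secure: bool
    resource_name: str
    protocols: list[str] = field(default_factory=list)
    cookies: dict[str, str] = field(default_factory=dict)


def build_handshake(params: ConnectParams, key: str) -> bytes:
    """Emit one fixed byte sequence for the CONNECT handshake.

    `key` is the base64-encoded nonce, assumed to be generated elsewhere.
    """
    scheme = "wss" if params.secure else "ws"
    default_port = 443 if params.secure else 80
    port_part = "" if params.port == default_port else f":{params.port}"
    url = f"{scheme}://{params.host}{port_part}{params.resource_name}"
    lines = [
        "CONNECT websocket.invalid:443 HTTP/1.1",
        "Host: websocket.invalid:443",
        f"Sec-WebSocket-Key: {key}",
        f"Sec-WebSocket-Origin: {params.origin}",
        f"Sec-WebSocket-URL: {url}",
        "Sec-WebSocket-Draft: 4",
    ]
    # Sent if and only if /protocols/ is not empty.
    if params.protocols:
        lines.append("Sec-WebSocket-Protocol: " + " ".join(params.protocols))
    # If cookies were passed to the algorithm, they are always sent.
    if params.cookies:
        pairs = "; ".join(f"{n}={v}" for n, v in params.cookies.items())
        lines.append(f"Cookie: {pairs}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    params = ConnectParams("example.com", 80, "http://www.example.com",
                           False, "/chat", cookies={"sid": "abc"})
    print(build_handshake(params, "AQIDBAUGBwgJCgsMDQ4PEA==").decode("ascii"))
```

The point of the shape is that the protocol spec decides what goes on the
wire, while the API spec only decides what is passed in.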

>
>>
>>
>>
>>>   Once the client's opening handshake has been sent, the client MUST
>>>   wait for a response from the server before sending any further data.
>>>   The server's response is detailed in the next section.
>>>
>>
>> *Sending* the server's response is detailed in the next section. Reading it
>> is not.
>
>
> It seemed somewhat duplicative to describe the exact same thing twice. It's
> something I find quite frustrating in -03. I want to know what the format of
> the handshake is, I don't want a precise set of steps each side must take --
> that is much harder to interpret, and frankly I doubt people will code it
> exactly as written.

Well, at Opera we implemented it exactly as written. Browser vendors tend
to be meticulous about interoperability these days. Loose rules lead to
interop problems down the road: some sites end up working only in the most
popular browser, and the other browsers have to reverse-engineer its
behavior to stay compatible instead of just implementing the spec.

>
>>
>>
>>>   Specifically,
>>>   the client MUST receive from the server a 200 status code in the HTTP
>>>   response, and a "Sec-WebSocket-Accept" header.  The value of this
>>>   header MUST be the value specified in the following section, namely a
>>>   SHA-1 hash of the concatenation of the Sec-WebSocket-Key with the
>>>   string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", base64-encoded.  Any
>>>   other status code or value of "Sec-WebSocket-Accept" MUST NOT be
>>>   interpreted as having completed a WebSocket handshake, and the client
>>>   MUST NOT send WebSocket framed data over this connection until a
>>>   successful handshake is completed.
>>>
>>
>> Why not put this as part of the algorithm? See step 31 to 51 in the current
>> draft. It misses a lot of critical details present in the current draft,
>> like actually invoking _fail the WebSocket connection_, defining *the
>> WebSocket connection is established*, applying cookies, and other things.
>>
>
> It is. As for applying cookies, text welcome.

See -03, step 44 and step 51.
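
For reference, the key and accept computation described in the quoted draft
text can be sketched as follows. The function names are made up for this
example, and it assumes the Sec-WebSocket-Key value is concatenated in its
base64 text form, exactly as sent in the header. It covers only the status
and Sec-WebSocket-Accept comparison, not failing the connection or applying
cookies:

```python
import base64
import hashlib
import secrets
from typing import Optional

# Fixed string from the quoted draft text.
GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_key() -> str:
    """Fresh 16-byte random nonce, base64-encoded, for Sec-WebSocket-Key."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """base64(SHA-1(key + GUID)), with key taken as the header text."""
    digest = hashlib.sha1((key + GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_succeeded(status: int, accept_header: Optional[str],
                        key: str) -> bool:
    """True only for a 200 response carrying the expected accept value."""
    return status == 200 and accept_header == expected_accept(key)


if __name__ == "__main__":
    key = new_key()
    print(key, expected_accept(key))
    # The draft's example nonce, bytes 0x01 through 0x10:
    print(base64.b64encode(bytes(range(1, 17))).decode("ascii"))
```

Note that a standard encoder turns the example bytes 0x01 through 0x10 into
"AQIDBAUGBwgJCgsMDQ4PEA==", which differs by one character from the
"...PEC==" string in the quoted NOTE. A client that compares header values
byte for byte would treat the two as different.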


>
>>
>>
>>
>>>   Where the algorithm above requires that a user agent fail the
>>>   WebSocket connection, the user agent may first read an arbitrary
>>>   number of further bytes from the connection (and then discard them)
>>>   before actually *failing the WebSocket connection*.  Similarly, if a
>>>   user agent can show that the bytes read from the connection so far
>>>   are such that there is no subsequent sequence of bytes that the
>>>   server can send that would not result in the user agent being
>>>   required to *fail the WebSocket connection*, the user agent may
>>>   immediately *fail the WebSocket connection* without waiting for those
>>>   bytes.
>>>
>>>   NOTE: The previous paragraph is intended to make it conforming for
>>>   user agents to implement the algorithm in subtly different ways that
>>>   are equivalent in all ways except that they terminate the connection
>>>   at earlier or later points.  For example, it enables an
>>>   implementation to buffer the entire handshake response before
>>>   checking it, or to verify each field as it is received rather than
>>>   collecting all the fields and then checking them as a block.
>>>
>>> 5.2.  Server-side requirements
>>>
>>
>> I haven't looked at this section.
>>
>> --
>> Simon Pieters
>> Opera Software
>>

Cheers,
-- 
Simon Pieters
Opera Software