[hybi] Review of draft-ietf-hybi-thewebsocketprotocol-06.txt
Alexey Melnikov <alexey.melnikov@isode.com> Mon, 28 March 2011 12:18 UTC
Return-Path: <alexey.melnikov@isode.com>
X-Original-To: hybi@core3.amsl.com
Delivered-To: hybi@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 528663A6452 for <hybi@core3.amsl.com>; Mon, 28 Mar 2011 05:18:05 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.175
X-Spam-Level:
X-Spam-Status: No, score=-102.175 tagged_above=-999 required=5 tests=[AWL=-0.176, BAYES_00=-2.599, J_CHICKENPOX_14=0.6, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id S+Izl9CQvhY7 for <hybi@core3.amsl.com>; Mon, 28 Mar 2011 05:18:03 -0700 (PDT)
Received: from rufus.isode.com (rufus.isode.com [62.3.217.251]) by core3.amsl.com (Postfix) with ESMTP id DCDFA3A6817 for <hybi@ietf.org>; Mon, 28 Mar 2011 05:18:02 -0700 (PDT)
Received: from [130.129.20.248] (dhcp-14f8.meeting.ietf.org [130.129.20.248]) by rufus.isode.com (submission channel) via TCP with ESMTPA id <TZB82wADLx84@rufus.isode.com>; Mon, 28 Mar 2011 13:19:39 +0100
Message-ID: <4D907CC6.7080707@isode.com>
Date: Mon, 28 Mar 2011 14:19:18 +0200
From: Alexey Melnikov <alexey.melnikov@isode.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915
X-Accept-Language: en-us, en
To: hybi@ietf.org
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: [hybi] Review of draft-ietf-hybi-thewebsocketprotocol-06.txt
X-BeenThere: hybi@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Server-Initiated HTTP <hybi.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/hybi>
List-Post: <mailto:hybi@ietf.org>
List-Help: <mailto:hybi-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 28 Mar 2011 12:18:05 -0000
Hi, I've performed a detailed Area Director-style review of the document, which is described below. More of the changes listed below are nits or IETF-specific changes (e.g. IANA registration) that need to be done before this document is submitted for IESG review. 1.2. Protocol overview Data is sent on the wire in the form of frames that have an associated type. Broadly speaking, there are types for textual data, which is interpreted as UTF-8 text, binary data (whose interpretation First reference to UTF-8 needs a reference. is left up to the application), and control frames, which are not intended to carry data for the application, but instead for protocol- level signaling, such as to signal that the connection should be closed. This version of the protocol defines six frame types and leaves ten reserved for future use. Some text explaining that a message consist of one or more frames is missing in this section. In Section 1.3: Additional headers are used to select options in the WebSocket protocol. Options available in this version are the subprotocol selector, |Sec-WebSocket-Protocol|, and |Cookie|, which can used for sending cookies to the server (e.g. as an authentication mechanism). The |Sec-WebSocket-Protocol| request-header field can be used to indicate what subprotocols (application-level protocols layered over the WebSocket protocol) are acceptable to the client. The server selects one of the acceptable protocols and echoes that value in its handshake to indicate that it has selected that protocol. Sec-WebSocket-Protocol: chat Why is Sec-WebSocket-Protocol an "option"? Can anything useful be done without it? If yes, this needs additional explanation. If not, this needs rephrasing. [...] To prove that the handshake was received, the server has to take two pieces of information and combine them to form a response. The first piece of information comes from the |Sec-WebSocket-Key| header in the client handshake: Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ== For this header, the server has to take the value (as present in the header, e.g. the base64-encoded version), base64 needs a reference. I think this needs to be clarified a bit: are leading and traling whitespaces removed from the value? and concatenate this with the GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" in string form, which is unlikely to be used by network endpoints that do not understand the WebSocket protocol. A SHA-1 hash, base64-encoded, of this concatenation is then returned in the server's handshake [FIPS.180-2.2002]. The handshake from the server is much simpler than the client handshake. The first line is an HTTP Status-Line, with the status code 101: HTTP/1.1 101 Switching Protocols Any status code other than 101 MUST be treated as a failure if semantics of that status code are not defined in the context of a WebSocket connection, What does "if semantics of that status code are not defined in the context of a WebSocket connection" mean? and the websocket connection aborted. The headers follow the status code. Option fields can also be included. In this version of the protocol, the main option field is |Sec-WebSocket-Protocol|, which indicates the subprotocol that the server has selected. Web browsers verify that the server included one of the values as was specified in the |WebSocket| constructor. A server that speaks multiple subprotocols has to make sure it selects one based on the client's handshake and specifies it in its handshake. As above - why is this an option? The server can also set cookie-related option fields to _set_ cookies, as in HTTP. This needs a reference to the Cookie draft. In Section 1.4: By sending a close frame and waiting for a close frame in response, Unfinished sentence. In Section 1.6: This protocol is intended to fail to establish a connection with servers of pre-existing protocols like SMTP or HTTP, while allowing An Informative reference for SMTP is needed. HTTP servers to opt-in to supporting this protocol if desired. This is achieved by having a strict and elaborate handshake, and by limiting the data that can be inserted into the connection before the handshake is finished (thus limiting how much the server can be influenced). 1.7. Relationship to TCP and HTTP Based on the expert recommendation of the IANA, the WebSocket protocol by default uses port 80 for regular WebSocket connections and port 443 for WebSocket connections tunneled over TLS. A reference to TLS is needed here. 1.9. Subprotocols using the WebSocket protocol _This section is non-normative._ The client can request that the server use a specific subprotocol by including the |Sec-WebSocket-Protocol| field in its handshake. If it is specified, the server needs to include the same field and one of the selected subprotocol values in its response for the connection to be established. These subprotocol names do not need to be registered, but if a I question the wisdom of not having standard subprotocols registered. Defining an IANA registry with a liberal registration policy (e.g. "First Come First Served") is easy. 2.1. Terminology The term "URI" is used in this section in a manner consistent with the terminology used in HTML, namely, to denote a string that might or might not be a valid URI or IRI and to which certain error handling behaviors will be applied when the string is parsed. [RFC3986] I don't think this is very useful. I think for the purposes used in this document, only valid URI are in scope. 3.1. Parsing WebSocket URIs The steps to *parse a WebSocket URI's components* from a string /uri/ are as follows. These steps return either a /host/, a /port/, a /resource name/, and a /secure/ flag, or they fail. 1. If the /uri/ string is not an absolute URI, then fail this Did you mean URI or IRI here? [RFC3987] defines IRIs, is it needed? algorithm. [RFC3986] [RFC3987] 2. Resolve the /uri/ string using the resolve a Web address algorithm defined by the Web addresses specification, with the I think this is underspecified. URI character encoding set to UTF-8. [RFC3629] [RFC3986] [RFC3987] 3.3. Valid WebSocket URIs For a WebSocket URI to be considered valid, the following conditions MUST hold. o The /host/ must be ASCII-only (i.e. it must have been punycode- encoded This needs an Informative reference to punycode already if necessary, and MUST NOT contain any characters above U+007E). 4.1. Overview The base framing protocol defines a frame type with an opcode, a payload length, and designated locations for extension and application data, which together define the _payload_ data. Certain bits and opcodes are reserved for future expansion of the protocol. As such, In the absence of extensions negotiated during the opening handshake (Section 5), all reserved bits MUST be 0 and reserved opcode values MUST NOT be used. I think this needs to talk separately about what can be sent by compliant senders and what must be done by receivers if they see a non 0 reserved bit or an unrecognized opcode. The trac ticket about error handling is pretty much talking about the same thing (and mentions more). 4.3. Base Framing Protocol This wire format for the data transfer part is described by the ABNF I think the reference to [RFC5234] should go after "ABNF" and not at the end of this paragraph. Otherwise this is confusing. given in detail in this section. A high level overview of the framing is given in the following figure. [RFC5234] Extension data: n bytes The extension data is 0 bytes unless there is a reserved op-code or reserved bit present in the frame which indicates an extension has been negotiated. Any extension MUST specify the length of the extension data, or how that length may be calculated, and its use "its" refers to extension, not to the length, right? I suggest clarifying that. MUST be negotiated during the handshake. If present, the extension data is included in the total payload length. 4.5.1. Close Following the 2-byte integer the body MAY contain UTF-8 encoded data, the interpretation of which is not defined by this specification. Is this data human readable? If yes, please say so. If it is human readable, you need to be able to include optional language tags as per BCP 18. See <http://tools.ietf.org/group/iesg/trac/wiki/TypicalAppsAreaIssues> for more details on how this can be addressed. 4.5.2. Ping The Ping message contains an opcode of 0x02. Upon receipt of a Ping message, an endpoint MUST send a Pong message in response. It SHOULD do so as soon as is practical. The message bodies of the Ping and Pong MUST be the same. I think this needs to clarify that a ping message doesn't contain any Extension data and that "message bodies" is the Application data. In Section 5.1: 2. If the user agent already has a WebSocket connection to the remote host (IP address) identified by /host/, even if known by It is not enough to use only the IP address to identify a WebSocket connection. The port number must also be used. Due to NATs/firewalls there is no guaranty that a connection to the same IP address and a different port will even be served by the same machine. another name, the user agent MUST wait until that connection has been established or for that connection to have failed. There MUST be no more than one connection in a CONNECTING state. If multiple connections to the same IP address are attempted simultaneously, the user agent MUST serialize them so that there is no more than one connection at a time running through the following steps. 3. _Proxy Usage_: If the user agent is configured to use a proxy when using the WebSocket protocol to connect to host /host/ and/or port /port/, then the user agent SHOULD connect to that Why SHOULD and not a MUST? I.e., what are [possible] reasons for violating the SHOULD? proxy and ask it to open a TCP connection to the host given by /host/ and the port given by /port/. If the user agent is not configured to use a proxy, then a direct TCP connection SHOULD be opened to the host given by /host/ and the port given by /port/. As above. NOTE: Implementations that do not expose explicit UI for selecting a proxy for WebSocket connections separate from other proxies are encouraged to use a SOCKS proxy for WebSocket SOCKS needs an Informative reference. connections, if available, or failing that, to prefer the proxy configured for HTTPS connections over the proxy configured for HTTP connections. 5. If /secure/ is true, the user agent MUST perform a TLS handshake over the connection. If this fails (e.g. the server's certificate could not be verified), This also need a Normative reference to RFC 2818, section 3.1, because TLS server certificate verification procedures are protocol specific. then the user agent MUST fail the WebSocket connection and abort the connection. Otherwise, all further communication on this channel MUST run through the encrypted tunnel. [RFC2246] 7. The request MUST include a header with the name "Sec-WebSocket- Key". The value of this header MUST be a nonce consisting of a randomly selected 16-byte value that has been base64-encoded [RFC3548]. The nonce MUST be randomly selected randomly for Delete one of the two "randomly". each connection. 10. The request MAY include a header with the name "Sec-WebSocket- What is the semantics of the request if it doesn't contain this header field? Protocol". If present, this value indicates the subprotocol(s) the client wishes to speak. The elements that comprise this value MUST be non-empty strings with characters in the range U+0021 to U+007E and MUST all be unique. The ABNF for the value of this header is 1#(token | quoted-string), where the definitions of constructs and rules are as given in [RFC2616]. 5.2.1. Reading the client's opening handshake The client handshake consists of the following parts. If the server, while reading the handshake, finds that the client did not send a handshake that matches the description below, the server must abort s/must/MUST the WebSocket connection. 5.2.1. Reading the client's opening handshake 6. Optionally, a "Sec-WebSocket-Protocol header, with a list of values indicating which protocols the client would like to speak, ordered by preference. What is the semantics of the request if it doesn't contain this header field? 5.2.2. Sending the server's opening handshake 2. Establish the following information: [...] /version/ The |Sec-WebSocket-Version| header in the client's handshake includes the version of the WebSocket protocol the client is attempting to communicate with. If this version does not match a version understood by the server, the server MUST abort the WebSocket connection. The server MAY send a non-200 response code with a |Sec-WebSocket-Version| header indicating the version(s) the server is capable of understanding along with this non-200 response code. I think "along with this non-200 response code" can be deleted. 3. ... 3. Optionally, a "Sec-WebSocket-Protocol" header, with a value /subprotocol/ as defined in Paragraph 2 of Section 5.2.2. As mentioned above: why is this optional? 6.2. Handling errors in UTF-8 from the client When a server is to interpret a byte stream as UTF-8 but finds that the byte stream is not in fact a valid UTF-8 stream, behavior is undefined. A server could close the connection, convert invalid byte sequences to U+FFFD REPLACEMENT CHARACTERs, store the data verbatim, The latter is a security issue (e.g. can lead to buffer overflows when naive UTF-8 parsers are used), so it should be mentioned in the Security Considerations section. or perform application-specific processing. Subprotocols layered on the WebSocket protocol might define specific behavior for servers. 7.1.1. Close the WebSocket Connection To _Close the WebSocket Connection_, an endpoint closes the underlying TCP connection. An endpoint SHOULD use a method that cleanly closes the TCP connection, Actually the TLS session should also be closed cleanly, if possible/applicable. discarding any trailing bytes that may be received. And endpoint MAY close the connection via any means available when necessary, such as when under attack. 7.4. Status codes When closing an established connection (e.g. when sending a Close frame, after the handshake has completed), an endpoint MAY indicate a reason for closure. The interpretation of this reason by an endpoint, and the action an endpoint should take given this reason, are left undefined by this specification. This specification defines a set of pre-defined status codes, I think an IANA registry for these would be needed. and specifies which ranges may be used by extensions, frameworks, and end applications. 8.1. Negotiating extensions A client requests extensions by including a "Sec-WebSocket- Extensions" header, which follows the normal rules for HTTP headers (see [RFC2616] section 4.2) and the value of the header is defined by the following ABNF: extension-list = 1#extension extension = extension-token *( ";" extension-param ) extension-token = registered-token | private-use-token registered-token = token private-use-token = "x-" token extension-param = token [ "=" ( token | quoted-string ) ] Here you are using RFC 2616 ABNF and not RFC 5234 ABNF. You need to make this clear, because there are differences (e.g. implied LWS) Any extension-token used must either be a registered token (registration TBD), Yes, these need to be registered with IANA. or have a prefix of "x-" to indicate a private- use token. 8.2.1. Compression Senders using this extension MUST apply RFC 1951 encodings to all bytes of the data stream following the handshake including both data and control messages. The data stream MAY include multiple blocks of both compressed and uncompressed types as defined by RFC 1951. [RFC1951] Does this apply before or after masking? 9. Security considerations Servers that are not intended to process input from any Web page but only for certain sites should verify the "Origin" field is an origin they expect, and should only respond with the corresponding "Sec- WebSocket-Origin" if it is an accepted origin. Servers that only accept input from one origin can just send back that value in the "Sec-WebSocket-Origin" field, without bothering to check the client's value. Is "without bothering to check the client's value" part actually correct? 10.1. Registration of ws: scheme URI scheme syntax. In ABNF terms using the terminals from the URI specifications: [RFC5234] [RFC3986] "ws" ":" hier-part [ "?" query ] The path and query components form the resource name sent to the server to identify the kind of service desired. Other components have the meanings described in RFC3986. Which "other components"? I think the last sentence should be: They have the meanings described in RFC3986. The same comment for section 10.2. 10.4. Sec-WebSocket-Key Status reserved; do not use outside WebSocket handshake This is not correct. According to RFC 3864, this field should contain the value "standard". The "do not use outside WebSocket handshake" can be moved to the "Related information" of the registration template. The same comment applies to sections 10.5-10.9 14. Normative References [RFC2246] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0", RFC 2246, January 1999. Should this point to TLS 1.2? [RFC4366] Blake-Wilson, S., Nystrom, M., Hopwood, D., Mikkelsen, J., and T. Wright, "Transport Layer Security (TLS) Extensions", RFC 4366, April 2006. This RFC just got obsoleted, so the reference needs to be updated. [WSAPI] Hickson, I., "The Web Sockets API", August 2010, <http://dev.w3.org/html5/websockets/>. This must be an Informative reference.
- Re: [hybi] Review of draft-ietf-hybi-thewebsocket… Brian
- [hybi] Review of draft-ietf-hybi-thewebsocketprot… Alexey Melnikov
- Re: [hybi] Review of draft-ietf-hybi-thewebsocket… Alexey Melnikov
- Re: [hybi] Review of draft-ietf-hybi-thewebsocket… Greg Wilkins
- Re: [hybi] Review of draft-ietf-hybi-thewebsocket… Iñaki Baz Castillo
- Re: [hybi] Review of draft-ietf-hybi-thewebsocket… Philipp Serafin