[hybi] Review of draft-ietf-hybi-thewebsocketprotocol-06.txt

Alexey Melnikov <alexey.melnikov@isode.com> Mon, 28 March 2011 12:18 UTC

Return-Path: <alexey.melnikov@isode.com>
X-Original-To: hybi@core3.amsl.com
Delivered-To: hybi@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 528663A6452 for <hybi@core3.amsl.com>; Mon, 28 Mar 2011 05:18:05 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.175
X-Spam-Level:
X-Spam-Status: No, score=-102.175 tagged_above=-999 required=5 tests=[AWL=-0.176, BAYES_00=-2.599, J_CHICKENPOX_14=0.6, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id S+Izl9CQvhY7 for <hybi@core3.amsl.com>; Mon, 28 Mar 2011 05:18:03 -0700 (PDT)
Received: from rufus.isode.com (rufus.isode.com [62.3.217.251]) by core3.amsl.com (Postfix) with ESMTP id DCDFA3A6817 for <hybi@ietf.org>; Mon, 28 Mar 2011 05:18:02 -0700 (PDT)
Received: from [130.129.20.248] (dhcp-14f8.meeting.ietf.org [130.129.20.248]) by rufus.isode.com (submission channel) via TCP with ESMTPA id <TZB82wADLx84@rufus.isode.com>; Mon, 28 Mar 2011 13:19:39 +0100
Message-ID: <4D907CC6.7080707@isode.com>
Date: Mon, 28 Mar 2011 14:19:18 +0200
From: Alexey Melnikov <alexey.melnikov@isode.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915
X-Accept-Language: en-us, en
To: hybi@ietf.org
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: [hybi] Review of draft-ietf-hybi-thewebsocketprotocol-06.txt
X-BeenThere: hybi@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Server-Initiated HTTP <hybi.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/hybi>
List-Post: <mailto:hybi@ietf.org>
List-Help: <mailto:hybi-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 28 Mar 2011 12:18:05 -0000

Hi,
I've performed a detailed Area Director-style review of the document, 
which is described below. More of the changes listed below are nits or 
IETF-specific changes (e.g. IANA registration) that need to be done 
before this document is submitted for IESG review.

1.2.  Protocol overview

   Data is sent on the wire in the form of frames that have an
   associated type.  Broadly speaking, there are types for textual data,
   which is interpreted as UTF-8 text, binary data (whose interpretation

First reference to UTF-8 needs a reference.

   is left up to the application), and control frames, which are not
   intended to carry data for the application, but instead for protocol-
   level signaling, such as to signal that the connection should be
   closed.  This version of the protocol defines six frame types and
   leaves ten reserved for future use.

Some text explaining that a message consist of one or more frames
is missing in this section.


In Section 1.3:

   Additional headers are used to select options in the WebSocket
   protocol.  Options available in this version are the subprotocol
   selector, |Sec-WebSocket-Protocol|, and |Cookie|, which can used for
   sending cookies to the server (e.g. as an authentication mechanism).
   The |Sec-WebSocket-Protocol| request-header field can be used to
   indicate what subprotocols (application-level protocols layered over
   the WebSocket protocol) are acceptable to the client.  The server
   selects one of the acceptable protocols and echoes that value in its
   handshake to indicate that it has selected that protocol.
        Sec-WebSocket-Protocol: chat

Why is Sec-WebSocket-Protocol an "option"? Can anything useful be done 
without it? If yes, this needs additional explanation. If not, this 
needs rephrasing.

 [...]

   To prove that the handshake was received, the server has to take two
   pieces of information and combine them to form a response.  The first
   piece of information comes from the |Sec-WebSocket-Key| header in the
   client handshake:

        Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

   For this header, the server has to take the value (as present in the
   header, e.g. the base64-encoded version),

base64 needs a reference.

I think this needs to be clarified a bit: are leading and traling 
whitespaces
removed from the value?

   and concatenate this with
   the GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" in string form, which
   is unlikely to be used by network endpoints that do not understand
   the WebSocket protocol.  A SHA-1 hash, base64-encoded, of this
   concatenation is then returned in the server's handshake
   [FIPS.180-2.2002].


   The handshake from the server is much simpler than the client
   handshake.  The first line is an HTTP Status-Line, with the status
   code 101:

        HTTP/1.1 101 Switching Protocols

   Any status code other than 101 MUST be treated as a failure if
   semantics of that status code are not defined in the context of a
   WebSocket connection,

What does "if semantics of that status code are not defined in the 
context of a
WebSocket connection" mean?

   and the websocket connection aborted.  The
   headers follow the status code.



   Option fields can also be included.  In this version of the protocol,
   the main option field is |Sec-WebSocket-Protocol|, which indicates
   the subprotocol that the server has selected.  Web browsers verify
   that the server included one of the values as was specified in the
   |WebSocket| constructor.  A server that speaks multiple subprotocols
   has to make sure it selects one based on the client's handshake and
   specifies it in its handshake.

As above - why is this an option?

   The server can also set cookie-related option fields to _set_
   cookies, as in HTTP.

This needs a reference to the Cookie draft.


In Section 1.4:

   By sending a close frame and waiting for a close frame in response,

Unfinished sentence.


In Section 1.6:

   This protocol is intended to fail to establish a connection with
   servers of pre-existing protocols like SMTP or HTTP, while allowing

An Informative reference for SMTP is needed.

   HTTP servers to opt-in to supporting this protocol if desired.  This
   is achieved by having a strict and elaborate handshake, and by
   limiting the data that can be inserted into the connection before the
   handshake is finished (thus limiting how much the server can be
   influenced).


1.7.  Relationship to TCP and HTTP

   Based on the expert recommendation of the IANA, the WebSocket
   protocol by default uses port 80 for regular WebSocket connections
   and port 443 for WebSocket connections tunneled over TLS.

A reference to TLS is needed here.


1.9.  Subprotocols using the WebSocket protocol

   _This section is non-normative._

   The client can request that the server use a specific subprotocol by
   including the |Sec-WebSocket-Protocol| field in its handshake.  If it
   is specified, the server needs to include the same field and one of
   the selected subprotocol values in its response for the connection to
   be established.

   These subprotocol names do not need to be registered, but if a
 
I question the wisdom of not having standard subprotocols registered.
Defining an IANA registry with a liberal registration policy (e.g. 
"First Come First Served") is easy.

2.1.  Terminology

   The term "URI" is used in this section in a manner consistent with
   the terminology used in HTML, namely, to denote a string that might
   or might not be a valid URI or IRI and to which certain error
   handling behaviors will be applied when the string is parsed.
   [RFC3986]

I don't think this is very useful. I think for the purposes used in this 
document, only valid URI are in scope.

3.1.  Parsing WebSocket URIs

   The steps to *parse a WebSocket URI's components* from a string /uri/
   are as follows.  These steps return either a /host/, a /port/, a
   /resource name/, and a /secure/ flag, or they fail.

   1.   If the /uri/ string is not an absolute URI, then fail this

Did you mean URI or IRI here? [RFC3987] defines IRIs, is it needed?

        algorithm.  [RFC3986] [RFC3987]

   2.   Resolve the /uri/ string using the resolve a Web address
        algorithm defined by the Web addresses specification, with the

I think this is underspecified.

        URI character encoding set to UTF-8.  [RFC3629] [RFC3986]
        [RFC3987]

3.3.  Valid WebSocket URIs

   For a WebSocket URI to be considered valid, the following conditions
   MUST hold.

   o  The /host/ must be ASCII-only (i.e. it must have been punycode-
      encoded

This needs an Informative reference to punycode

      already if necessary, and MUST NOT contain any characters
      above U+007E).


4.1.  Overview

   The base framing protocol defines a frame type with an opcode, a
   payload length, and designated locations for extension and
   application data, which together define the _payload_ data.  Certain
   bits and opcodes are reserved for future expansion of the protocol.
   As such, In the absence of extensions negotiated during the opening
   handshake (Section 5), all reserved bits MUST be 0 and reserved
   opcode values MUST NOT be used.

I think this needs to talk separately about what can be sent by 
compliant senders
and what must be done by receivers if they see a non 0 reserved bit
or an unrecognized opcode. The trac ticket about error handling is pretty
much talking about the same thing (and mentions more).

4.3.  Base Framing Protocol

   This wire format for the data transfer part is described by the ABNF

I think the reference to [RFC5234] should go after "ABNF" and not at the
end of this paragraph. Otherwise this is confusing.

   given in detail in this section.  A high level overview of the
   framing is given in the following figure.  [RFC5234]


   Extension data:  n bytes

      The extension data is 0 bytes unless there is a reserved op-code
      or reserved bit present in the frame which indicates an extension
      has been negotiated.  Any extension MUST specify the length of the
      extension data, or how that length may be calculated, and its use

"its" refers to extension, not to the length, right?
I suggest clarifying that.

      MUST be negotiated during the handshake.  If present, the
      extension data is included in the total payload length.


4.5.1.  Close

   Following the 2-byte integer the body MAY contain UTF-8 encoded data,
   the interpretation of which is not defined by this specification.

Is this data human readable? If yes, please say so.
If it is human readable, you need to be able to include optional 
language tags
as per BCP 18. See 
<http://tools.ietf.org/group/iesg/trac/wiki/TypicalAppsAreaIssues>
for more details on how this can be addressed.

4.5.2.  Ping

   The Ping message contains an opcode of 0x02.

   Upon receipt of a Ping message, an endpoint MUST send a Pong message
   in response.  It SHOULD do so as soon as is practical.  The message
   bodies of the Ping and Pong MUST be the same.

I think this needs to clarify that a ping message doesn't contain any
Extension data and that "message bodies" is the Application data.


In Section 5.1:

   2.  If the user agent already has a WebSocket connection to the
       remote host (IP address) identified by /host/, even if known by

It is not enough to use only the IP address to identify a WebSocket 
connection.
The port number must also be used.

Due to NATs/firewalls there is no guaranty that a connection to the same
IP address and a different port will even be served by the same machine.

       another name, the user agent MUST wait until that connection has
       been established or for that connection to have failed.  There
       MUST be no more than one connection in a CONNECTING state.  If
       multiple connections to the same IP address are attempted
       simultaneously, the user agent MUST serialize them so that there
       is no more than one connection at a time running through the
       following steps.

   3.  _Proxy Usage_: If the user agent is configured to use a proxy
       when using the WebSocket protocol to connect to host /host/
       and/or port /port/, then the user agent SHOULD connect to that

Why SHOULD and not a MUST? I.e., what are [possible] reasons for violating
the SHOULD?

       proxy and ask it to open a TCP connection to the host given by
       /host/ and the port given by /port/.


       If the user agent is not configured to use a proxy, then a direct
       TCP connection SHOULD be opened to the host given by /host/ and
       the port given by /port/.

As above.

       NOTE: Implementations that do not expose explicit UI for
       selecting a proxy for WebSocket connections separate from other
       proxies are encouraged to use a SOCKS proxy for WebSocket

SOCKS needs an Informative reference.

       connections, if available, or failing that, to prefer the proxy
       configured for HTTPS connections over the proxy configured for
       HTTP connections.


   5.  If /secure/ is true, the user agent MUST perform a TLS handshake
       over the connection.  If this fails (e.g. the server's
       certificate could not be verified),

This also need a Normative reference to RFC 2818, section 3.1, because
TLS server certificate verification procedures are protocol specific.

       then the user agent MUST fail
       the WebSocket connection and abort the connection.  Otherwise,
       all further communication on this channel MUST run through the
       encrypted tunnel.  [RFC2246]

   7.   The request MUST include a header with the name "Sec-WebSocket-
        Key".  The value of this header MUST be a nonce consisting of a
        randomly selected 16-byte value that has been base64-encoded
        [RFC3548].  The nonce MUST be randomly selected randomly for

Delete one of the two "randomly".

        each connection.


   10.  The request MAY include a header with the name "Sec-WebSocket-

What is the semantics of the request if it doesn't contain this header 
field?

        Protocol".  If present, this value indicates the subprotocol(s)
        the client wishes to speak.  The elements that comprise this
        value MUST be non-empty strings with characters in the range
        U+0021 to U+007E and MUST all be unique.  The ABNF for the value
        of this header is 1#(token | quoted-string), where the
        definitions of constructs and rules are as given in [RFC2616].


5.2.1.  Reading the client's opening handshake

   The client handshake consists of the following parts.  If the server,
   while reading the handshake, finds that the client did not send a
   handshake that matches the description below, the server must abort

s/must/MUST

   the WebSocket connection.


5.2.1.  Reading the client's opening handshake

   6.  Optionally, a "Sec-WebSocket-Protocol header, with a list of
       values indicating which protocols the client would like to speak,
       ordered by preference.

What is the semantics of the request if it doesn't contain this header 
field?


5.2.2.  Sending the server's opening handshake

   2.  Establish the following information:

 [...]

       /version/
          The |Sec-WebSocket-Version| header in the client's handshake
          includes the version of the WebSocket protocol the client is
          attempting to communicate with.  If this version does not
          match a version understood by the server, the server MUST
          abort the WebSocket connection.  The server MAY send a non-200
          response code with a |Sec-WebSocket-Version| header indicating
          the version(s) the server is capable of understanding along
          with this non-200 response code.

I think "along with this non-200 response code" can be deleted.

   3. ...

       3.  Optionally, a "Sec-WebSocket-Protocol" header, with a value
           /subprotocol/ as defined in Paragraph 2 of Section 5.2.2.

As mentioned above: why is this optional?

6.2.  Handling errors in UTF-8 from the client

   When a server is to interpret a byte stream as UTF-8 but finds that
   the byte stream is not in fact a valid UTF-8 stream, behavior is
   undefined.  A server could close the connection, convert invalid byte
   sequences to U+FFFD REPLACEMENT CHARACTERs, store the data verbatim,

The latter is a security issue (e.g. can lead to buffer overflows when
naive UTF-8 parsers are used), so it should be mentioned in the Security
Considerations section.

   or perform application-specific processing.  Subprotocols layered on
   the WebSocket protocol might define specific behavior for servers.


7.1.1.  Close the WebSocket Connection

   To _Close the WebSocket Connection_, an endpoint closes the
   underlying TCP connection.  An endpoint SHOULD use a method that
   cleanly closes the TCP connection,

Actually the TLS session should also be closed cleanly, if 
possible/applicable.

   discarding any trailing bytes that
   may be received.  And endpoint MAY close the connection via any means
   available when necessary, such as when under attack.

7.4.  Status codes

   When closing an established connection (e.g. when sending a Close
   frame, after the handshake has completed), an endpoint MAY indicate a
   reason for closure.  The interpretation of this reason by an
   endpoint, and the action an endpoint should take given this reason,
   are left undefined by this specification.  This specification defines
   a set of pre-defined status codes,

I think an IANA registry for these would be needed.

   and specifies which ranges may be
   used by extensions, frameworks, and end applications.

8.1.  Negotiating extensions

   A client requests extensions by including a "Sec-WebSocket-
   Extensions" header, which follows the normal rules for HTTP headers
   (see [RFC2616] section 4.2) and the value of the header is defined by
   the following ABNF:

         extension-list = 1#extension
         extension = extension-token *( ";" extension-param )
         extension-token = registered-token | private-use-token
         registered-token = token
         private-use-token = "x-" token
         extension-param = token [ "=" ( token | quoted-string ) ]

Here you are using RFC 2616 ABNF and not RFC 5234 ABNF. You need to make
this clear, because there are differences (e.g. implied LWS)

   Any extension-token used must either be a registered token
   (registration TBD),

Yes, these need to be registered with IANA.

   or have a prefix of "x-" to indicate a private-
   use token.

8.2.1.  Compression

   Senders using this extension MUST apply RFC 1951 encodings to all
   bytes of the data stream following the handshake including both data
   and control messages.  The data stream MAY include multiple blocks of
   both compressed and uncompressed types as defined by RFC 1951.
   [RFC1951]

Does this apply before or after masking?


9.  Security considerations

   Servers that are not intended to process input from any Web page but
   only for certain sites should verify the "Origin" field is an origin
   they expect, and should only respond with the corresponding "Sec-
   WebSocket-Origin" if it is an accepted origin.  Servers that only
   accept input from one origin can just send back that value in the
   "Sec-WebSocket-Origin" field, without bothering to check the client's
   value.

Is "without bothering to check the client's value" part actually correct?


10.1.  Registration of ws: scheme

   URI scheme syntax.
      In ABNF terms using the terminals from the URI specifications:
      [RFC5234] [RFC3986]

           "ws" ":" hier-part [ "?" query ]

      The path and query components form the resource name sent to the
      server to identify the kind of service desired.  Other components
      have the meanings described in RFC3986.

Which "other components"? I think the last sentence should be:

     They have the meanings described in RFC3986.


The same comment for section 10.2.



10.4.  Sec-WebSocket-Key

   Status
      reserved; do not use outside WebSocket handshake

This is not correct. According to RFC 3864, this field should contain 
the value "standard".

The "do not use outside WebSocket handshake" can be moved to the
"Related information" of the registration template.

The same comment applies to sections 10.5-10.9



14.  Normative References

   [RFC2246]  Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
              RFC 2246, January 1999.

Should this point to TLS 1.2?

   [RFC4366]  Blake-Wilson, S., Nystrom, M., Hopwood, D., Mikkelsen, J.,
              and T. Wright, "Transport Layer Security (TLS)
              Extensions", RFC 4366, April 2006.

This RFC just got obsoleted, so the reference needs to be updated.

   [WSAPI]    Hickson, I., "The Web Sockets API", August 2010,
              <http://dev.w3.org/html5/websockets/>.

This must be an Informative reference.