Re: [hybi] Multiplexing in WebSocket

Greg Wilkins <gregw@webtide.com> Fri, 23 October 2009 23:34 UTC

Ian Hickson wrote:
> On Fri, 23 Oct 2009, Greg Wilkins wrote:
>> Ian Hickson wrote:
>>> On Thu, 22 Oct 2009, Joshua Bell wrote:
>>>> * I seem to recall that one of the desires for sentinel-based frames 
>>>> was to allow octet streams for which the length was not known in 
>>>> advance.
>>> No; the only reason for sentinel-based frames was to not rely on 
>>> authors having to determine the length of their UTF-8-encoded strings, 
>>> which in many environments can be easy to get wrong.
>> Authors that can't determine the length of a UTF-8 string are not 
>> exactly the sort of developers that should be implementing network 
>> protocols.
> 
> Wow.
> 
> I cannot stand behind such a judgmental statement. Personally I would like 
> to make this kind of thing accessible to as many people as possible.

I don't understand your Wow.  Buffer overflows have historically
been one of the biggest security issues with any services exposed
to the internet.

UTF-8 will mostly have a 1 character to 1 byte mapping (at least
for English speakers), so many programmers will write code
that they think works by allocating byte buffers the same
size as their character strings.  But then when given
multi-byte characters they will suffer buffer overflows.
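
To make the mismatch concrete, here is a minimal sketch in Python
(an illustration only; the helper name utf8_length is made up for
this example). The character count and the encoded byte count agree
for plain ASCII and diverge as soon as a multi-byte character shows up:

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


ascii_text = "hello"
# "h\u00e9llo \u20ac" is "hello" with e-acute (2 bytes in UTF-8),
# followed by a space and the euro sign (3 bytes in UTF-8).
mixed_text = "h\u00e9llo \u20ac"

for sample in (ascii_text, mixed_text):
    # A buffer sized from len(sample) is too small whenever the two differ.
    print(len(sample), "characters,", utf8_length(sample), "bytes")
```

A buffer sized from the character count is only safe for the first
string; for the second it is three bytes short.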

I've made this programming error myself many times! So perhaps
I too am one of the authors that should not be writing network
protocols.

The level of programming skill needed to manage metadata
or channels is of the same order of magnitude as that needed
to manage UTF-8 encoding.

So my point is probably better expressed by saying that
programmers who are able to write a websocket implementation
that correctly handles multi-byte UTF-8 characters will
be entirely capable of handling the additional "complexity"
of some of the additional capabilities being discussed here.



>> It seems an entirely reasonable simplification of websocket to use only 
>> length limited framing and to use the type byte to indicate such things 
>> as content charset
> 
> What's the use case for doing anything other than UTF-8?


Even limiting myself to js in the browser, I can think of
reasons that a content type other than UTF-8 would be
desirable.

* Compressed UTF-8

* UTF-16 for those whose language happens to use a lot of
  characters that need more than 2 bytes in UTF-8

* UTF-16 for those that can't deal with the uncertainty and/or
  unpredictability of the length of a UTF-8 string
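
As a rough illustration of the size argument (a sketch in Python; the
sample string is my own choice, not anything from the spec), three
CJK characters from the Basic Multilingual Plane take 3 bytes each in
UTF-8 but 2 bytes each in UTF-16:

```python
# "\u65e5\u672c\u8a9e" is the three-character word for "Japanese language".
sample = "\u65e5\u672c\u8a9e"

utf8_bytes = sample.encode("utf-8")
# "utf-16-le" is used so that no byte-order mark is added to the count.
utf16_bytes = sample.encode("utf-16-le")

print(len(sample), "characters")
print(len(utf8_bytes), "bytes as UTF-8")
print(len(utf16_bytes), "bytes as UTF-16")
```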



If we consider what a browser itself might like to do with
websocket, or what a non-browser client might like, then we have:

* Mime encoded content for those that want to send other
  content types down a stream.  Images, sounds etc.

* Some other framed content for those that want to implement
  multiplexing on top of websocket (as you advocate) but don't
  want to have to do it with text based framing.

* Anybody that wants to send any content with 0xFF in it
  and does not want to base64-encode their entire content
  as a result

* A mobile phone that is restricted to a single outgoing
  connection (not uncommon) so the browser wants to
  transport HTTP over the websocket connection

* Something that none of us has thought of yet
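
The 0xFF point can be sketched as follows, in Python. I am assuming
the sentinel framing under discussion (a 0x00 byte, the payload, then
a 0xFF terminator) and comparing it with a hypothetical
length-prefixed frame that uses a 4-byte big-endian length. That
layout and all of the function names are assumptions made for
illustration only.

```python
import struct


def sentinel_frame(payload: bytes) -> bytes:
    """Wrap payload as 0x00 <payload> 0xFF (assumed sentinel framing)."""
    return b"\x00" + payload + b"\xff"


def sentinel_unframe(frame: bytes) -> bytes:
    """Read up to the first 0xFF after the leading type byte."""
    end = frame.index(b"\xff", 1)
    return frame[1:end]


def length_frame(payload: bytes) -> bytes:
    """Hypothetical frame: 4-byte big-endian length, then the payload."""
    return struct.pack(">I", len(payload)) + payload


def length_unframe(frame: bytes) -> bytes:
    (size,) = struct.unpack(">I", frame[:4])
    return frame[4:4 + size]


# Arbitrary binary content that happens to contain 0xFF.
binary = bytes([0x89, 0xFF, 0x10, 0x00])

# The sentinel reader stops at the 0xFF inside the payload.
print(sentinel_unframe(sentinel_frame(binary)) == binary)
# The length-prefixed reader does not care what the bytes are.
print(length_unframe(length_frame(binary)) == binary)
```

Valid UTF-8 never contains a 0xFF byte, which is why the sentinel
works for text and only for text.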


You also say that future multiplexing (or other complex
things) should be built on top of websocket. Yet when
anybody asks for additional content types to help do that,
you say they are not needed and it can all be done in UTF-8.


I find this amusing, because Websocket has gone against the common
convention that web protocols are human-readable and ASCII-encoded.
Instead it is a byte-squeezed binary protocol.

Yet when it comes to building protocols on top of websocket, then
all of a sudden you are an advocate of text encoded protocols.

To send a stream of images with websocket, you will need to
have a sentinel-framed UTF-8 message that contains a JSON or
MIME header to give the content type of the base64 or hex
encoded image.  That's 3 envelopes around each image!
Ouch!!!
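
A rough sketch of those three layers in Python. The JSON field names
"type" and "data", the function names, and the 0x00 ... 0xFF sentinel
bytes are all assumptions for illustration, not a proposed format:

```python
import base64
import json


def wrap_image(image: bytes, content_type: str) -> bytes:
    """Envelope 1: base64. Envelope 2: JSON header. Envelope 3: sentinel frame."""
    encoded = base64.b64encode(image).decode("ascii")
    message = json.dumps({"type": content_type, "data": encoded})
    return b"\x00" + message.encode("utf-8") + b"\xff"


def unwrap_image(frame: bytes) -> bytes:
    """Undo the three envelopes in reverse order."""
    message = json.loads(frame[1:-1].decode("utf-8"))
    return base64.b64decode(message["data"])


# 1024 bytes of stand-in image data covering every byte value.
image = bytes(range(256)) * 4
wrapped = wrap_image(image, "image/png")

print(len(image), "bytes of image,", len(wrapped), "bytes on the wire")
```

The base64 step alone adds about a third to the size, before the
header and the sentinels are counted.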


regards