Re: [hybi] thewebsocketprotocol #40 (new): Clarify binary/utf-8 mixed handling

Gabriel Montenegro <Gabriel.Montenegro@microsoft.com> Thu, 10 February 2011 09:47 UTC

From: Gabriel Montenegro <Gabriel.Montenegro@microsoft.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>, "g_e_montenegro@yahoo.com" <g_e_montenegro@yahoo.com>
Thread-Topic: [hybi] thewebsocketprotocol #40 (new): Clarify binary/utf-8 mixed handling
Thread-Index: AQHLyL3zrxOYNTkuIkOg6Nb3OoxxI5P6fJmA
Date: Thu, 10 Feb 2011 09:47:22 +0000
Message-ID: <CA566BAEAD6B3F4E8B5C5C4F61710C1126E0501B@TK5EX14MBXW605.wingroup.windeploy.ntdev.microsoft.com>
References: <063.e489b6d352cc1192d00acf7f96150ea7@tools.ietf.org> <buc6l61vlv7fh3s8nmu335g3d7897pcf0r@hive.bjoern.hoehrmann.de>
In-Reply-To: <buc6l61vlv7fh3s8nmu335g3d7897pcf0r@hive.bjoern.hoehrmann.de>
Cc: "hybi@ietf.org" <hybi@ietf.org>
Subject: Re: [hybi] thewebsocketprotocol #40 (new): Clarify binary/utf-8 mixed handling

Yes, if one accepts UTF-8, it must be validated, no question about it. The point is not to disallow UTF-8, but to be able to optimize those websocket connections that use binary only. That does not fragment the protocol any more than any other per-session negotiable feature does.

From the comments, folks seem to be thinking about the JavaScript Websockets API, and how it neither allows binary nor supports streaming. That's true, but as we've emphasized before: we're designing the websockets protocol to be usable with the JS Websockets API, but it is not limited to that API. Other APIs may offer both binary and streaming.
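
As an aside on the partial-frame concern in the quoted message below: one way to validate text that arrives in fragments is to carry a small amount of decoder state from one frame payload to the next. The following is only a minimal sketch under that assumption, not the table-driven decoder linked in the quoted message. The class name IncrementalUtf8Validator and its feed/complete methods are hypothetical names chosen for illustration.

```python
class IncrementalUtf8Validator:
    """Hypothetical helper: validates UTF-8 across arbitrary chunk boundaries."""

    def __init__(self):
        self.need = 0        # continuation bytes still expected
        self.lo = 0x80       # allowed range for the next continuation byte
        self.hi = 0xBF
        self.failed = False  # sticky error flag

    def feed(self, chunk: bytes) -> bool:
        """Consume one payload fragment; return False once the data is invalid."""
        if self.failed:
            return False
        for b in chunk:
            if self.need == 0:
                if b <= 0x7F:
                    continue                      # ASCII
                if 0xC2 <= b <= 0xDF:
                    self.need = 1                 # 2-byte sequence
                elif 0xE0 <= b <= 0xEF:
                    self.need = 3 - 1             # 3-byte sequence
                    if b == 0xE0:
                        self.lo = 0xA0            # reject overlong forms
                    elif b == 0xED:
                        self.hi = 0x9F            # reject surrogates
                elif 0xF0 <= b <= 0xF4:
                    self.need = 3                 # 4-byte sequence
                    if b == 0xF0:
                        self.lo = 0x90            # reject overlong forms
                    elif b == 0xF4:
                        self.hi = 0x8F            # reject values above U+10FFFF
                else:
                    self.failed = True            # 0x80-0xC1 or 0xF5-0xFF as lead
                    return False
            else:
                if not (self.lo <= b <= self.hi):
                    self.failed = True
                    return False
                self.lo, self.hi = 0x80, 0xBF     # later bytes use the full range
                self.need -= 1
        return True

    def complete(self) -> bool:
        """True if no error occurred and no character is left half-finished."""
        return not self.failed and self.need == 0


if __name__ == "__main__":
    v = IncrementalUtf8Validator()
    # "caf\u00e9" split in the middle of the two-byte character
    print(v.feed(b"caf\xc3"), v.complete())
    print(v.feed(b"\xa9"), v.complete())
```

The per-byte work is a couple of comparisons, and the state that survives a fragment boundary is three small integers, so a receiver would call feed() on each text payload as it arrives and complete() only at the end of the message.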

> -----Original Message-----
> From: hybi-bounces@ietf.org [mailto:hybi-bounces@ietf.org] On Behalf Of
> Bjoern Hoehrmann
> Sent: Wednesday, February 09, 2011 17:00
> To: g_e_montenegro@yahoo.com
> Cc: hybi@ietf.org
> Subject: Re: [hybi] thewebsocketprotocol #40 (new): Clarify binary/utf-8 mixed
> handling
> 
> * hybi issue tracker wrote:
> > Additionally, when only partial frames may be available, it is
> > expensive to verify that this is indeed a valid UTF-8 stream (protocol
> > implementation needs to take into account multi-byte characters and
> > end of current data payload).  If binary has been negotiated for this
> > session, processing can be optimized accordingly.
> 
> If you do accept UTF-8 encoded data then you have to validate it, otherwise
> you get strange and possibly dangerous failures if you receive malformed data,
> for instance, you can't trust that the length has been calculated correctly.
> Anyway, you would seem to have this problem due to fragmentation anyway if
> you accept text frames, and if you don't mean to accept text frames then you
> just don't, there would seem to be no need to negotiate for "binary-only". I also
> note that validating UTF-8 is not really expensive, it's just a matter of `state =
> table[state + byte]` for each byte. Work, sure, but not very "expensive". It's easy
> too if you use my http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ decoder for
> it.
> 
> > Proposal: allow negotiation to clarify if a stream will not mix binary
> > and text in order to enable optimizing for the binary-only case.
> 
> I do agree the protocol specification needs to discuss mixing frame types, like,
> what if you have a fragmented text message but one of the frames is not a text
> frame, but allowing to negotiate this complexity away will most likely lead to
> interoperability and security problems, as people will take shortcuts like not
> validating the frame type. We've in fact seen that already on this list. Essentially,
> negotiating binary-only would be subsetting and fragmenting the protocol, and
> I don't think the benefit here warrants that.
> --
> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/