Re: [hybi] WS framing alternative

Ian Hickson <ian@hixie.ch> Tue, 27 October 2009 18:34 UTC

Date: Tue, 27 Oct 2009 18:34:30 +0000
From: Ian Hickson <ian@hixie.ch>
To: Greg Wilkins <gregw@webtide.com>
In-Reply-To: <a9699fd20910270426u4aa508cepf557b362025ae5db@mail.gmail.com>
Message-ID: <Pine.LNX.4.62.0910271824200.25616@hixie.dreamhostps.com>
References: <8B0A9FCBB9832F43971E38010638454F0F1EA72C@SISPE7MB1.commscope.com> <Pine.LNX.4.62.0910270903080.9145@hixie.dreamhostps.com> <a9699fd20910270426u4aa508cepf557b362025ae5db@mail.gmail.com>
Content-Language: en-GB-hixie
Content-Style-Type: text/css
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"
Cc: "hybi@ietf.org" <hybi@ietf.org>
Subject: Re: [hybi] WS framing alternative

On Tue, 27 Oct 2009, Greg Wilkins wrote:
> Ian Hickson wrote:
> > This would have some pretty major costs:
> > 
> > - It requires length delimiting for text frames, which is more 
> > complicated to implement (it's non-trivial to tell the difference 
> > between characters and bytes).
> 
> But length delimited frames are in the protocol anyway, so they have to 
> be implemented anyway.

Only by generic clients (i.e. the browsers). Servers only have to 
implement the binary frames if they want to support binary frames in the 
protocol they implement, and dedicated clients (that just implement 
WebSocket as a necessary part of implementing whatever protocol it is that 
they are written for) would only need the binary frame support if the 
protocol they implement uses binary frames.


> If there was only 1 framing type, then we have approximately half the 
> framing complexity.

I've previously explained the reasoning for wanting to minimise the length 
measurements for UTF-8 data, so I won't repeat it here.


> > - It requires parsing using a presized buffer for variable-encoding 
> > text, which risks character/byte mismatches and thus buffer overruns.
> 
> With sentinel encoding, sending data might be marginally simpler, but 
> receiving data is much harder.

Not particularly. In languages with automatic dynamic strings (like Perl, 
Python, Object Pascal, etc.) you just concatenate and all the complexity is 
hidden from you by the compiler or language runtime. If you are using 
explicit buffers, then the complexity consists of just doubling the buffer 
size when you reach the size of the buffer; it's not a big deal either.
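
To illustrate the "just concatenate" case, here is a minimal sketch in 
Python. It is not taken from any draft: the function name split_frames is 
hypothetical, and the terminator byte is a parameter. The default of 0xFF 
is an assumption chosen because that byte never occurs in valid UTF-8. Any 
frame-start byte is ignored for simplicity. Only newly received bytes are 
scanned for the terminator, and the runtime grows the bytearray as needed.

```python
def split_frames(chunks, terminator=b"\xff"):
    """Yield complete frame payloads from an iterable of received byte chunks.

    Assumption for this sketch: every frame ends with one terminator byte
    that cannot appear inside the payload.
    """
    pending = bytearray()  # dynamic buffer; reallocation is handled by the runtime
    for chunk in chunks:
        start = 0
        while True:
            end = chunk.find(terminator, start)
            if end == -1:
                # No terminator in the rest of this chunk: keep the bytes and
                # wait for more data. Earlier bytes are never rescanned.
                pending += chunk[start:]
                break
            pending += chunk[start:end]
            yield bytes(pending)
            pending.clear()
            start = end + 1


# Example: two frames arriving split across three reads.
frames = list(split_frames([b"hel", b"lo\xffwor", b"ld\xff"]))
texts = [frame.decode("utf-8") for frame in frames]
```

Decoding to characters happens once per complete frame, so the receiver 
never has to reason about character counts while bytes are still arriving.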


> You will still have buffers of fixed sizes when you receive bytes.  You
> don't know how much data is coming, so you don't know how big to make
> your buffer or when to start turning bytes into characters.

You don't know that anyway, if your internal representation isn't UTF-8, 
since UTF-8 is a variable-length encoding (e.g. Win32 uses UTF-16 
internally, which is variable-length encoded in a different way from 
UTF-8, so you can't know how big the destination buffer should be without 
examining the whole byte string).
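
A small Python sketch of that point: strings with the same UTF-8 byte 
length can need different numbers of UTF-16 code units, so the byte length 
alone does not give the size of a UTF-16 destination buffer. The helper 
name utf8_and_utf16_sizes is hypothetical, used here only for illustration.

```python
def utf8_and_utf16_sizes(text):
    """Return (UTF-8 byte count, UTF-16 code unit count) for a string."""
    utf8_bytes = len(text.encode("utf-8"))
    # "utf-16-le" emits no byte order mark, so every code unit is exactly 2 bytes.
    utf16_units = len(text.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


# Three strings, each 3 bytes long in UTF-8:
ascii_sizes = utf8_and_utf16_sizes("abc")       # three 1-byte characters
mixed_sizes = utf8_and_utf16_sizes("a\u00e9")   # 1-byte + 2-byte character
smile_sizes = utf8_and_utf16_sizes("\u263a")    # one 3-byte character

# A character outside the Basic Multilingual Plane:
astral_sizes = utf8_and_utf16_sizes("\U0001047e")
```

Here the three 3-byte strings need 3, 2 and 1 UTF-16 code units 
respectively, and U+1047E takes 4 bytes in UTF-8 but 2 code units (a 
surrogate pair) in UTF-16.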


> I can see implementations reading a buffer.... scanning for 0x00, not 
> finding it... allocating a larger buffer... copying the bytes ... 
> reading more bytes ... scanning again for 0x00.... still not finding 
> it.... allocating yet another larger buffer... copying the bytes .... 
> etc.  etc.  until either you get a denial of service or you find 0x00, 
> when you can finally scan over all the bytes again to convert to 
> characters.

It's trivial to impose an arbitrary limit. It doesn't even have to be that 
arbitrary -- it can be whatever the server knows its protocol needs to 
support. Indeed, if the implementation uses fixed-size buffers like this, 
then it could just set its per-connection buffer to the maximum size it 
knows its protocol will ever handle, and just discard data if the limit is 
reached (or close the connection).
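
As a rough sketch of such a limit in Python (all names here are 
hypothetical, and the 0xFF terminator is an assumption chosen because that 
byte never occurs in valid UTF-8): the receiver refuses to buffer more than 
max_frame_size payload bytes for one frame and raises an error, at which 
point the caller can discard the data or close the connection.

```python
class FrameTooLarge(Exception):
    """Raised when a frame exceeds the limit the server's protocol allows."""


def split_frames_limited(chunks, max_frame_size, terminator=b"\xff"):
    """Yield frame payloads, never buffering more than max_frame_size bytes."""
    pending = bytearray()
    for chunk in chunks:
        start = 0
        while True:
            end = chunk.find(terminator, start)
            stop = len(chunk) if end == -1 else end
            # Check the limit before copying, so memory use stays bounded
            # even if the terminator never arrives.
            if len(pending) + (stop - start) > max_frame_size:
                raise FrameTooLarge("frame exceeds %d bytes" % max_frame_size)
            pending += chunk[start:stop]
            if end == -1:
                break  # frame incomplete; wait for the next chunk
            yield bytes(pending)
            pending.clear()
            start = end + 1


# Example: small frames pass, an unterminated oversized frame is rejected.
accepted = list(split_frames_limited([b"hi\xff", b"ok\xff"], 4))
try:
    list(split_frames_limited([b"abc", b"def"], 4))
    rejected = False
except FrameTooLarge:
    rejected = True
```

The limit is checked before any bytes are appended, so a peer that never 
sends the terminator cannot make the buffer grow past the configured size.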

(Martin responded to the rest of your comments.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'