Re: [hybi] WS framing alternative

Jamie Lokier <jamie@shareable.org> Fri, 30 October 2009 12:46 UTC

Date: Fri, 30 Oct 2009 12:46:44 +0000
From: Jamie Lokier <jamie@shareable.org>
To: Ian Hickson <ian@hixie.ch>
Message-ID: <20091030124644.GC3579@shareable.org>
References: <8B0A9FCBB9832F43971E38010638454F0F1EA72C@SISPE7MB1.commscope.com> <Pine.LNX.4.62.0910270903080.9145@hixie.dreamhostps.com> <a9699fd20910270426u4aa508cepf557b362025ae5db@mail.gmail.com> <Pine.LNX.4.62.0910271824200.25616@hixie.dreamhostps.com> <4AE76137.8000603@webtide.com> <Pine.LNX.4.62.0910272118590.25608@hixie.dreamhostps.com> <20091029123121.GA24268@almeida.jinsky.com> <4AEA0E6C.1060607@webtide.com> <4AEA5713.8020008@it.aoyama.ac.jp> <Pine.LNX.4.62.0910300346010.25616@hixie.dreamhostps.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.62.0910300346010.25616@hixie.dreamhostps.com>
User-Agent: Mutt/1.5.13 (2006-08-11)
Cc: hybi@ietf.org
Subject: Re: [hybi] WS framing alternative

Ian Hickson wrote:
> You are proposing that given a length and a series of bytes encoding text 
> in a variable-sized encoding (UTF-8), the application return a series of 
> characters. My point is that this means that you have two lengths (the 
> length of the string in characters and the length of the string in bytes), 
> so you risk inexperienced software authors making elementary yet dangerous 
> mistakes in terms of how to read (or write) data to the stream. WebSocket 
> tries to avoid ever mixing the two (you either deal with bytes and byte 
> lengths, or you use sentinel bytes and no lengths -- you never have 
> characters and byte lengths mixed together).

Inexperienced authors, especially those writing 100 lines of Perl,
will send ISO-8859-1 or other text which occasionally contains 0xff
bytes in the middle.

Even experienced authors will make that mistake sometimes.  What do you
think will happen when someone does something like this:

    - Read list of filenames in a directory.  They're UTF-8 (assumed),
      or the author is unfamiliar with character encodings and everything
      works fine in their ASCII development environment.

    - Concatenate the list with newlines, as people do.

    - Send the result as a frame.

Or this:

    - Read lines from a text file, which is in UTF-8 encoding.

    - Send each line as a frame.

    - (Oops, one of the text files you gave me had an 0xff byte in it.)

Result: Because of those assumptions, 0xff bytes will occasionally be
sent in the middle of a frame.  Everything afterwards will break, but
it'll be rare enough that the author doesn't notice.  That's the same
reason you gave for authors getting lengths wrong.
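
To make the failure concrete, here is a minimal sketch in Python.  It
assumes the sentinel framing under discussion (a 0x00 byte, the text,
then a 0xff byte).  The names send_frame and read_frames and the sample
filenames are made up for illustration; this is not code from any real
implementation, and a real reader might recover differently than by
raising an error.

```python
def send_frame(payload: bytes) -> bytes:
    # Assumed sentinel framing: 0x00, payload, 0xff.
    # Like the careless sender described above, this never checks
    # that the payload really is UTF-8.
    return b"\x00" + payload + b"\xff"


def read_frames(stream: bytes) -> list:
    # Split a byte stream into payloads by scanning for the 0xff sentinel.
    frames = []
    pos = 0
    while pos < len(stream):
        if stream[pos] != 0x00:
            raise ValueError(f"lost framing at offset {pos}")
        end = stream.index(b"\xff", pos + 1)  # raises ValueError if absent
        frames.append(stream[pos + 1:end])
        pos = end + 1
    return frames


# Valid UTF-8 never contains the byte 0xff, so this frame is safe.
good = send_frame("na\u00efve.txt".encode("utf-8"))

# The same kind of filename in ISO-8859-1: U+00FF becomes the byte 0xff.
bad = send_frame("L'Ha\u00ff-les-Roses.txt".encode("latin-1"))
```

Here read_frames(good + good) returns both payloads intact, while
read_frames(bad + good) raises ValueError: the reader ends the frame
early at the embedded 0xff, then finds "-" where it expects the 0x00
that starts the next frame.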

The sentinel approach does not solve this fragility problem, it merely
shifts it around to a different place.

-- Jamie