Re: [hybi] Framing Take VI (a compromise proposal)

Dave Cridland <dave@cridland.net> Mon, 16 August 2010 14:55 UTC

References: <AANLkTi=TBXO_Cbb+P+e2BVfx69shkf8E1-9ywDh_Y+Kz@mail.gmail.com> <2rlb66d01d7qn7qn8fbecr0a2tta768glk@hive.bjoern.hoehrmann.de> <AANLkTik9LrGoXxK0+v1orKF8rEUHnK0n+QEyHFR3wD-J@mail.gmail.com>
In-Reply-To: <AANLkTik9LrGoXxK0+v1orKF8rEUHnK0n+QEyHFR3wD-J@mail.gmail.com>
MIME-Version: 1.0
Message-Id: <4931.1281970585.257032@puncture>
Date: Mon, 16 Aug 2010 15:56:25 +0100
From: Dave Cridland <dave@cridland.net>
To: John Tamplin <jat@google.com>, Server-Initiated HTTP <hybi@ietf.org>, Bjoern Hoehrmann <derhoermi@gmx.net>

On Sat Aug 14 03:47:58 2010, John Tamplin wrote:
> Dave's original proposal had one opcode to start a frame, and two separate
> opcodes to end a frame - one for text, and one for binary.  That meant until
> you read the entire message, you didn't know if it was going to be UTF8 or
> not.  That is what the paragraph quoted is referring to -- you need to know
> the type of frame you are receiving at the first frame so you can decode it
> as you receive each fragment.
> 
> 
Ah - yes, I'd thought there was no reason to decode early, but this  
seems like a reasonable one. It also makes debugging easier, so I'm  
quite happy with this flagrant casting aside of my brilliant ideas.
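
To illustrate why knowing the type at the first frame matters, here is a minimal Python sketch of decoding a text message fragment by fragment. It assumes the receiver already knows from the first frame that the payload is UTF-8. The function name and the sample fragments are made up for illustration and are not part of any proposal in this thread.

```python
import codecs


def decode_fragments(fragments):
    """Decode UTF-8 text that arrives split across several fragments.

    Hypothetical helper: returns the text recovered from each fragment.
    A multi-byte character split across two fragments is held back by the
    decoder until its remaining bytes arrive.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    pieces = []
    for index, fragment in enumerate(fragments):
        is_last = index == len(fragments) - 1
        # final=True on the last fragment makes a truncated character an error.
        pieces.append(decoder.decode(fragment, final=is_last))
    return pieces


payload = "héllo €".encode("utf-8")
# Split in the middle of "é" and in the middle of "€" on purpose.
fragments = [payload[:2], payload[2:8], payload[8:]]

for piece in decode_fragments(fragments):
    print(repr(piece))
```

If the text/binary distinction only arrived with the final frame, the receiver would have to buffer every fragment as raw bytes and could not start a decoder like this until the end.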

> So imagine you are writing the code to receive a text WebSocket message.
> Ultimately, you want to pass some UTF16-based string to the client code.
> The total message length in bytes is available, but UTF8 characters of 1-5
> bytes will convert to 1-2 UTF16 characters.  So, that means that (given
> message length of n bytes from the first frame) you need to allocate
> wchar_t[n] (or char in Java, etc) in case each character in the message is
> US-ASCII and possibly waste storage when some non-ASCII characters are
> included.  Another alternative is to allocate a smaller buffer and then
> resize it in the event that it is not large enough.  If instead the number
> of UTF16 characters is known from the first fragment of the message, you can
> simply allocate the correct size and never have to reallocate.  So, it isn't
> about the processing speed of converting UTF8->UTF16, but rather buffer
> management.
>
> The downside is not all implementations may want to use UTF16 representation
> of the text data, in which case the value is useless.  So, I think if it
> were useful, it would have to be in addition to the overall message length.

This feels like something that's not a requirement, and therefore  
needs to be an extension.

It also feels like the kind of extension nobody will actually  
implement.
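
For anyone who wants to see the buffer-sizing gap John describes, here is a small Python sketch comparing the UTF-8 byte length of a message (the worst-case number of UTF-16 code units) with the exact UTF-16 code unit count. The helper name and sample strings are my own. Note that UTF-8 as defined in RFC 3629 uses at most 4 bytes per code point.

```python
def buffer_sizes(text):
    """Return (utf8_bytes, utf16_units) for a string.

    Hypothetical helper: utf8_bytes is what a byte-length header would
    carry, and is also the worst-case UTF-16 allocation (all-ASCII text).
    utf16_units is the exact number of 16-bit code units needed.
    """
    utf8_bytes = len(text.encode("utf-8"))
    # utf-16-le emits no byte order mark, so bytes // 2 is the unit count.
    utf16_units = len(text.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


for sample in ["hello", "héllo", "€uro", "😀"]:
    utf8_bytes, utf16_units = buffer_sizes(sample)
    wasted = utf8_bytes - utf16_units
    print(f"{sample!r}: {utf8_bytes} bytes, {utf16_units} units, {wasted} unused")
```

The over-allocation is zero for pure ASCII and grows with the share of non-ASCII text, which is the only case where an extra length field would save anything.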

Dave.
-- 
Dave Cridland - mailto:dave@cridland.net - xmpp:dwd@dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade