Re: [hybi] Framing Take VI (a compromise proposal)
John Tamplin <jat@google.com> Sat, 14 August 2010 02:47 UTC
In-Reply-To: <2rlb66d01d7qn7qn8fbecr0a2tta768glk@hive.bjoern.hoehrmann.de>
References: <AANLkTi=TBXO_Cbb+P+e2BVfx69shkf8E1-9ywDh_Y+Kz@mail.gmail.com> <2rlb66d01d7qn7qn8fbecr0a2tta768glk@hive.bjoern.hoehrmann.de>
From: John Tamplin <jat@google.com>
Date: Fri, 13 Aug 2010 22:47:58 -0400
Message-ID: <AANLkTik9LrGoXxK0+v1orKF8rEUHnK0n+QEyHFR3wD-J@mail.gmail.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: hybi@ietf.org
Subject: Re: [hybi] Framing Take VI (a compromise proposal)

On Fri, Aug 13, 2010 at 10:24 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

> * Ian Fette wrote:
> > -- having a single opcode to start a fragmented message and separate
> > opcodes to determine if it is a text or binary message means you can't
> > start to decode UTF8 text until you receive the entire message, which
> > means you add a buffering requirement of the undecoded message
>
> The formatting of your mail and its HTML attachment is somewhat broken
> so I am not sure what I am responding to here, but the observation seems
> incorrect; http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ for instance
> makes it rather easy to decode the bytes as you receive them, so long as
> they are not delivered out of order.

Dave's original proposal had one opcode to start a frame, and two separate
opcodes to end a frame: one for text, and one for binary. That meant that
until you had read the entire message, you didn't know whether it was going
to be UTF8 or not. That is what the quoted paragraph refers to: you need to
know the type of the message from its first frame so you can decode it as you
receive each fragment.

> > - Question: are endpoints likely enough to use UTF16 for internal
> > representation of text that it would make sense to send the number of
> > UTF16 characters instead of bytes as the message length or as an
> > additional field on text frames?
>
> Sending it .instead. is probably not an option as that would encourage
> some implementers to take shortcuts like sending just twice the number
> of bytes if they expect to only ever send US-ASCII. And sending it in
> addition would still mean the number could be wrong, and there are many
> unknowns (length of strings, which code points are in the strings, how
> scripts, in case of web browsers receiving text, use the text, how the
> recipient implements strings, and so on).
>
> The computer I am using right now is an AMD Athlon II X2 215 with some
> very cheap main memory and it can transcode UTF-8 to UTF-16 at about
> 500 KB per millisecond (using the latest version of my decoder, which
> is about the fastest I know of), that's three orders of magnitude
> removed from the computer's Internet connection's bandwidth. I don't see
> a particular indication that knowing the length of the UTF-16 buffer
> in advance would have a noticeable effect on my browsing experience.

So imagine you are writing the code to receive a text WebSocket message.
Ultimately, you want to pass some UTF16-based string to the client code. The
total message length in bytes is available, but UTF8 characters of 1-4 bytes
will convert to 1-2 UTF16 code units. That means that, given a message length
of n bytes from the first frame, you need to allocate wchar_t[n] (or char in
Java, etc) in case every character in the message is US-ASCII, and you
possibly waste storage when some non-ASCII characters are included. The
alternative is to allocate a smaller buffer and resize it if it turns out not
to be large enough.

If instead the number of UTF16 characters is known from the first fragment of
the message, you can simply allocate the correct size and never have to
reallocate. So, it isn't about the processing speed of converting
UTF8->UTF16, but rather about buffer management.

The downside is that not all implementations may want to use a UTF16
representation of the text data, in which case the value is useless. So, I
think if it were useful, it would have to be in addition to the overall
message length.

--
John A. Tamplin
Software Engineer (GWT), Google
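
The buffer-sizing argument above can be illustrated with a small sketch. This is not code from the thread: the helper names `utf16_units` and `decode_fragments` are hypothetical, the input is assumed to be valid UTF-8, and Python's standard incremental decoder stands in for whatever decoder an endpoint would really use. It shows that the byte length n is only an upper bound on the number of UTF16 code units, and that text can be decoded fragment by fragment once the message is known to be text.

```python
import codecs


def utf16_units(utf8: bytes) -> int:
    """Count the UTF-16 code units needed for valid UTF-8 input (hypothetical helper)."""
    units = 0
    for b in utf8:
        if b & 0xC0 == 0x80:
            continue  # continuation byte: its character was counted at the lead byte
        # Lead bytes 0xF0 and above start 4-byte sequences, which need a surrogate pair.
        units += 2 if b >= 0xF0 else 1
    return units


def decode_fragments(fragments):
    """Decode a text message fragment by fragment; a character may span two fragments."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(fragment) for fragment in fragments]
    # final=True raises UnicodeDecodeError if the message ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


text = "h\u00e9llo \u20ac \U0001F600"   # mixes 1-, 2-, 3- and 4-byte UTF-8 sequences
message = text.encode("utf-8")

worst_case = len(message)          # the wchar_t[n] allocation: one unit per byte
exact = utf16_units(message)       # what a "number of UTF16 characters" field would carry

# Split inside the 2-byte sequence to show that fragment boundaries need not
# align with character boundaries.
decoded = decode_fragments([message[:2], message[2:12], message[12:]])
```

The exact count only equals the worst case for pure US-ASCII text; for anything else the n-sized allocation over-reserves, which is the waste described above.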