Re: [hybi] Framing Take VI (a compromise proposal)

John Tamplin <jat@google.com> Sat, 14 August 2010 02:47 UTC

From: John Tamplin <jat@google.com>
Date: Fri, 13 Aug 2010 22:47:58 -0400
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: hybi@ietf.org
Subject: Re: [hybi] Framing Take VI (a compromise proposal)

On Fri, Aug 13, 2010 at 10:24 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

> * Ian Fette wrote:
> >> -- having a single opcode to start a fragmented message and separate
> >opcodes to determine if it is a text or binary message means you can't
> >start to decode UTF8 text until you receive the entire message, which
> >means you add a buffering requirement of the undecoded message
>
> The formatting of your mail and its HTML attachment is somewhat broken
> so I am not sure what I am responding to here, but the observation seems
> incorrect; http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ for instance
> makes it rather easy to decode the bytes as you receive them, so long as
> they are not delivered out of order.

Dave's original proposal had one opcode to start a fragmented message, and
two separate opcodes to end it - one for text, and one for binary.  That
meant that until you had read the entire message, you didn't know whether it
was going to be UTF8 or not.  That is what the quoted paragraph is referring
to: you need to know the type of message you are receiving at the first
frame so you can decode it as you receive each fragment.
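
As a rough illustration of the point (a sketch, not code from any of the
proposals): if the receiver knows from the first frame that the message is
text, it can feed each fragment to an incremental UTF8 decoder and never hold
more than a partial character of undecoded bytes.  The function name
decode_text_fragments and the fragment payloads are made up for this example;
the only real API used is the incremental decoder in Python's codecs module.

```python
import codecs


def decode_text_fragments(fragments):
    """Yield decoded text for each fragment payload as it arrives.

    `fragments` is a hypothetical iterable of bytes objects, one per frame
    of a single text message. This is only possible because the message is
    known to be text before the first payload is processed.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for payload in fragments:
        # Bytes of a character split across two fragments are kept inside
        # the decoder until the rest of the character arrives.
        yield decoder.decode(payload)
    # Flush at end of message; raises UnicodeDecodeError if the message
    # ended in the middle of a character.
    decoder.decode(b"", final=True)


if __name__ == "__main__":
    # "h\u00e9llo \u20ac" split so that two characters straddle fragments.
    parts = [b"h\xc3", b"\xa9llo \xe2\x82", b"\xac"]
    for text in decode_text_fragments(parts):
        print(repr(text))
```

If the type were only announced by the opcode on the last frame, the loop
above could not start until every payload had been buffered.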

> >> - Question: are endpoints likely enough to use UTF16 for internal
> >representation of text that it would make sense to send the number of
> >UTF16 characters instead of bytes as the message length or as an
> >additional field on text frames?
>
> Sending it .instead. is probably not an option as that would encourage
> some implementers to take shortcuts like sending just twice the number
> of bytes if they expect to only ever send US-ASCII. And sending it in
> addition would still mean the number could be wrong, and there are many
> unknowns (length of strings, which code points are in the strings, how
> scripts, in case of web browsers receiving text, use the text, how the
> recipient implements strings, and so on).
>
> The computer I am using right now is an AMD Athlon II X2 215 with some
> very cheap main memory and it can transcode UTF-8 to UTF-16 at about
> 500 KB per millisecond (using the latest version of my decoder, which
> is about the fastest I know of), that's three orders of magnitude
> removed from the computer's Internet connection's bandwidth. I don't see
> a particular indication that knowing the length of the UTF-16 buffer
> in advance would have a noticeable effect on my browsing experience.
>

So imagine you are writing the code to receive a text WebSocket message.
Ultimately, you want to pass some UTF16-based string to the client code.
The total message length in bytes is available, but UTF8 characters of 1-4
bytes will convert to 1-2 UTF16 code units.  So, given a message length of n
bytes from the first frame, you need to allocate wchar_t[n] (or char in
Java, etc) in case every character in the message is US-ASCII, and you
possibly waste storage when some non-ASCII characters are included.  Another
alternative is to allocate a smaller buffer and then resize it in the event
that it is not large enough.  If instead the number of UTF16 code units is
known from the first fragment of the message, you can simply allocate the
correct size and never have to reallocate.  So, it isn't about the
processing speed of converting UTF8->UTF16, but rather buffer management.
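
To make the sizes concrete, here is a small sketch of how a sender could
compute the UTF16 length from the UTF8 bytes it is about to send, and how
that compares with the worst-case allocation of n.  It is written in Python
only for brevity; the allocation concern itself applies to C, Java and
similar.  The function name utf16_length is made up, and the code assumes
its input is already valid UTF8.

```python
def utf16_length(utf8_bytes):
    """Count the UTF16 code units needed for a valid UTF8 byte string.

    Assumption: the input is well-formed UTF8; no validation is done here.
    """
    units = 0
    for b in utf8_bytes:
        if b & 0xC0 == 0x80:
            # Continuation byte: belongs to a character already counted.
            continue
        # Lead bytes 0xF0 and above start 4-byte sequences, which encode
        # code points beyond U+FFFF and need a surrogate pair in UTF16.
        units += 2 if b >= 0xF0 else 1
    return units


if __name__ == "__main__":
    text = "h\u00e9llo \u20ac\U0001F600"
    data = text.encode("utf-8")
    n = len(data)                # message length in bytes: worst-case size
    exact = utf16_length(data)   # what the extra field would carry
    print(n, exact)
```

For this sample string the byte length is 14 while only 9 UTF16 code units
are needed, so a buffer sized from the byte length alone would be about a
third larger than necessary; for pure US-ASCII text the two numbers are
equal.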

The downside is that not all implementations will want to use a UTF16
representation of the text data, in which case the value is useless.  So, I
think if it were useful, it would have to be sent in addition to the overall
message length.

-- 
John A. Tamplin
Software Engineer (GWT), Google
