Re: [hybi] Multiplexing in WebSocket

Greg Wilkins <gregw@webtide.com> Sun, 25 October 2009 08:02 UTC


Jamie Lokier wrote:
> Greg Wilkins wrote:
>> * UTF-16 for those that can't deal with the uncertainty and/or
>>   unpredictability of the length of a UTF-8 string
> 
> Careful.  UTF-16 is a variable length encoding too.
> 
> It is possibly worse than UTF-8 for this, because everyone knows that
> UTF-8 is variable length, but many people seem to think UTF-16 is not.

OK that's worth a Wow!

I had assumed UTF-16 was fixed length because the Java implementation
of it is.  But now I see in the Java documentation:

 "The char data type (and therefore the value that a Character object
  encapsulates) are based on the original Unicode specification, which
  defined characters as fixed-width 16-bit entities. The Unicode standard
  has since been changed to allow for characters whose representation
  requires more than 16 bits. The range of legal code points is now
  U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the
  definition of the U+n notation in the Unicode standard.)

  The set of characters from U+0000 to U+FFFF is sometimes referred
  to as the Basic Multilingual Plane (BMP). Characters whose code
  points are greater than U+FFFF are called supplementary characters.
  The Java 2 platform uses the UTF-16 representation in char arrays
  and in the String and StringBuffer classes. In this representation,
  supplementary characters are represented as a pair of char
  values, the first from the high-surrogates range, (\uD800-\uDBFF),
  the second from the low-surrogates range (\uDC00-\uDFFF)."
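
To make the surrogate-pair point concrete, here is a minimal sketch in Java.
The class name Utf16LengthDemo and its helper methods are made up for
illustration; the library calls (String.length, String.codePointCount,
Character.toChars, Character.isHighSurrogate, Character.isLowSurrogate) are
standard java.lang APIs. It compares a BMP-only string with a single
supplementary character, U+1D11E, chosen arbitrarily as an example.

```java
import java.nio.charset.StandardCharsets;

final class Utf16LengthDemo {
    // Number of 16-bit char values (UTF-16 code units) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes when the string is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every character is in the BMP, i.e. one char per code point.
    static boolean isBmpOnly(String s) {
        return utf16Units(s) == codePoints(s);
    }

    public static void main(String[] args) {
        // Three BMP characters: U+0041, U+00E9, U+20AC.
        String bmp = "A\u00e9\u20ac";
        // One supplementary character, built from its code point.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(utf16Units(bmp) + " " + codePoints(bmp) + " " + utf8Bytes(bmp));
        System.out.println(utf16Units(clef) + " " + codePoints(clef) + " " + utf8Bytes(clef));
        System.out.println(Character.isHighSurrogate(clef.charAt(0))
                + " " + Character.isLowSurrogate(clef.charAt(1)));
    }
}
```

For the BMP string the char count and the code point count agree, so UTF-16
behaves as if it were fixed width; for the supplementary character they do
not, which is exactly the case Jamie is warning about.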


So I'll modify my original point to

 * UTF-16 for Java programmers and other BMP-only users who don't want
   to deal with the uncertainty and/or unpredictability of the length
   of a UTF-8 string


Thanks for the learning experience!

cheers