Re: [hybi] name syntax, was: WebSocket sub protocol name.

Jamie Lokier <jamie@shareable.org> Fri, 18 December 2009 19:39 UTC

Date: Fri, 18 Dec 2009 19:38:43 +0000
From: Jamie Lokier <jamie@shareable.org>
To: Pieter Hintjens <ph@imatix.com>
Cc: hybi@ietf.org
Subject: Re: [hybi] name syntax, was: WebSocket sub protocol name.

Pieter Hintjens wrote:
> First, this breaks in any language where a null is significant in a
> string.  What rules are there for making this work in C?  Or is that
> not considered a valid language for implementors?

Additional language notes:

Java's "modified UTF-8" (the form used by DataOutputStream.writeUTF,
JNI and class files) encodes ASCII NUL as the two bytes 0xC0 0x80
rather than the single byte 0x00 of standard UTF-8.  Therefore, Java
and C-like environments do not interoperate perfectly in the presence
of ASCII NUL, unless extra steps are taken - which are, of course,
often ignored.
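
To make the mismatch concrete, here is a minimal sketch in Python.  The
helper name to_java_style_utf8 is made up for illustration, and it only
models the NUL difference (real modified UTF-8 also treats characters
outside the BMP differently).  It shows the same string producing
different bytes, and a strict UTF-8 decoder refusing the Java-style
form:

```python
def to_java_style_utf8(text: str) -> bytes:
    """Encode text as UTF-8, but write U+0000 as 0xC0 0x80.

    Hypothetical helper: it covers only the NUL case of Java's
    modified UTF-8. A 0x00 byte in valid UTF-8 can only come from
    U+0000, so a byte-level replace is safe here.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


name = "chat\x00v2"
standard = name.encode("utf-8")        # contains a raw 0x00 byte
java_style = to_java_style_utf8(name)  # contains 0xC0 0x80 instead

# The two encodings of the same string are not byte-identical.
print(standard == java_style)

# A strict UTF-8 decoder rejects the overlong 0xC0 0x80 sequence.
try:
    java_style.decode("utf-8")
    strict_decoder_accepts = True
except UnicodeDecodeError:
    strict_decoder_accepts = False
print(strict_decoder_accepts)
```

So a name that round-trips on one side may be rejected, or compared as
unequal, on the other.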

Unix shells also mostly fail when asked to handle NUL, due to their C
heritage.

C++ std::string can hold NUL, up until the point you pass the string
to a C API via c_str().  The C code will then see it truncated at the
first NUL.

Many system protocols, directory services and database queries will
not like being asked to look up a name containing NULs, often because
they use C-style NUL-terminated strings in their APIs.
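
A quick way to see the truncation without writing C is Python's ctypes,
which hands a byte string to C code as a NUL-terminated char pointer.
This is only a sketch; the function name seen_by_c_api and the example
name are invented for illustration:

```python
import ctypes


def seen_by_c_api(data: bytes) -> bytes:
    """Return what a C function taking `const char *` would read.

    c_char_p.value reads up to the first NUL byte, the same way
    strlen() or strcmp() would.
    """
    return ctypes.c_char_p(data).value


name = b"chat\x00.example"
print(len(name))            # full length, NUL included
print(seen_by_c_api(name))  # everything after the NUL is lost
```

Two distinct names that share a prefix before the NUL become the same
name as far as such an API is concerned.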

The POSIX regular expression matching functions (regcomp, regexec)
take NUL-terminated strings, so they do not operate correctly on a
string containing embedded NULs.  Unless the NULs are encoded with
Java-UTF-8, that is.

These concerns may apply to the data framing later in the protocol.
When NUL is transmitted in a data frame, either as the byte 0x00, or
the Java-UTF-8 encoding of ASCII NUL, you may expect some
implementations to get confused about the content of a frame.
However, as long as they operate on bytes at the frame boundary
finding stage, boundaries are likely to be found correctly.
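
As a rough sketch of that last point, assume a framing where each text
frame is a 0x00 byte, then the payload, then a 0xFF terminator (that
framing is my assumption for the example; split_frames is a made-up
name).  Scanning for the terminator on raw bytes finds the boundaries
either way; the confusion only appears when the payload is decoded:

```python
from typing import List


def split_frames(stream: bytes) -> List[bytes]:
    """Split a byte stream into payloads, assuming 0x00 ... 0xFF framing.

    Works purely on bytes: a NUL inside a payload is not a boundary,
    because only 0xFF terminates a frame. An incomplete trailing
    frame is left out.
    """
    frames = []
    pos = 0
    while pos < len(stream):
        if stream[pos] != 0x00:
            raise ValueError("frame does not start with 0x00")
        end = stream.find(b"\xff", pos + 1)
        if end == -1:
            break  # terminator not received yet
        frames.append(stream[pos + 1:end])
        pos = end + 1
    return frames


# One payload with a raw NUL, one with the Java-style 0xC0 0x80 form.
stream = b"\x00a\x00b\xff" + b"\x00c\xc0\x80d\xff"
frames = split_frames(stream)
print(frames)

# Boundaries are fine; strict decoding of the second payload is not.
for payload in frames:
    try:
        print(repr(payload.decode("utf-8")))
    except UnicodeDecodeError:
        print("not valid standard UTF-8:", payload)
```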

-- Jamie