Re: [hybi] Review of draft-ietf-hybi-thewebsocketprotocol-13

"Richard L. Barnes" <rbarnes@bbn.com> Tue, 06 September 2011 14:41 UTC

To: Iñaki Baz Castillo <ibc@aliax.net>
Cc: General Area Review Team <gen-art@ietf.org>, hybi@ietf.org

>> Section 5.6, "Note that a particular text frame might include a partial UTF-8 sequence, however the whole message MUST contain valid UTF-8"
>> This requirement is meaningless, since the concept of a "message" is not defined here.  Suggest going back to a requirement that a frame MUST contain valid UTF-8 (i.e., that it breaks at code-point boundaries).
> 
> No, please. This has already been discussed.
> 
> Imagine I must send a very big WS UTF-8 message and, due to max frame
> size requirements (still to know how such a requirement is
> "negotiated"), I need to split it into N frames. This feature would work
> at the very transport core layer.
> 
> Probably I have a function that splits the whole WS message into
> chunks of N bytes (I mean "bytes" because I do know the max frame size
> in *bytes*), so that function just counts N bytes from the WS message
> and generates a frame. Please don't force such a function to be
> Unicode/UTF-8 aware, no please.

Clearly it already has to be WebSocket aware, and it already has to read the opcode in order to distinguish data frames from control frames.  Adding a requirement to break at code point boundaries does not seem hugely onerous.  It's three lines of C:

/* 
uint8_t *new_frame_start = old_frame_start;
new_frame_start += DESIRED_FRAME_LENGTH;
*/
if ((opcode & 0x0f) == 0x01) { /* If this is a text frame */
    /* While inside a code point, i.e. on a 10xxxxxx continuation octet */
    while (new_frame_start > old_frame_start &&
           (*new_frame_start & 0xc0) == 0x80) {
        new_frame_start--; /* Back up one octet */
    }
    /* new_frame_start is now at the beginning of a code point */
}
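
For illustration, the same idea can be written as a self-contained function. This is only a sketch under stated assumptions: the name utf8_frame_length and its parameters are hypothetical (they come from neither the draft nor any library), the payload is assumed to be valid UTF-8, and max_len is assumed to be at least 4 octets so that any single code point fits in one frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the unsent remainder of a text message in
 * buf[0..len), return how many octets to place in the next frame.  The
 * result is at most max_len and never cuts a UTF-8 sequence in two,
 * assuming the input is valid UTF-8 and max_len >= 4.
 */
static size_t utf8_frame_length(const uint8_t *buf, size_t len, size_t max_len)
{
    size_t n;

    assert(max_len > 0);

    if (len <= max_len) {
        return len; /* The rest of the message fits in one frame */
    }

    n = max_len;
    /* buf[n] would be the first octet of the following frame.  Back up
     * while it is a continuation octet (10xxxxxx). */
    while (n > 0 && (buf[n] & 0xc0) == 0x80) {
        n--;
    }

    /* n == 0 means no sequence start was found within max_len octets
     * (invalid input or max_len too small).  Fall back to a raw split
     * so the caller still makes progress. */
    return n > 0 ? n : max_len;
}
```

For the 5-octet string "a€b" (61 E2 82 AC 62) and a max_len of 3, a raw split would cut the euro sign after its second octet; this function returns 1 instead, so the next frame starts at the E2 lead octet.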

In contrast, *not* requiring frames to break at UTF-8 code point boundaries means that clients can't do any meaningful validation on individual text frames, which means you might as well get rid of text frames entirely.
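
To make the validation point concrete, here is a rough sketch of the per-frame check a receiver could run if every text frame had to contain complete code points. The function name utf8_frame_is_valid is hypothetical, and the rules it applies (no truncated or overlong sequences, no surrogates, nothing above U+10FFFF) are the usual UTF-8 well-formedness rules rather than anything taken from the draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-frame check: returns true only if p[0..len) is a
 * complete, well-formed UTF-8 string on its own.
 */
static bool utf8_frame_is_valid(const uint8_t *p, size_t len)
{
    size_t i = 0;

    assert(p != NULL || len == 0);

    while (i < len) {
        uint8_t lead = p[i];
        uint32_t cp, min;
        size_t need, k;

        /* Classify the lead octet: number of continuation octets needed
         * and the smallest code point that length may encode. */
        if (lead < 0x80)                { need = 0; cp = lead;        min = 0; }
        else if ((lead & 0xe0) == 0xc0) { need = 1; cp = lead & 0x1f; min = 0x80; }
        else if ((lead & 0xf0) == 0xe0) { need = 2; cp = lead & 0x0f; min = 0x800; }
        else if ((lead & 0xf8) == 0xf0) { need = 3; cp = lead & 0x07; min = 0x10000; }
        else return false; /* Stray continuation octet or invalid lead */

        if (need > len - i - 1) {
            return false; /* Sequence is cut off by the end of the frame */
        }
        for (k = 1; k <= need; k++) {
            if ((p[i + k] & 0xc0) != 0x80) {
                return false; /* Expected a 10xxxxxx continuation octet */
            }
            cp = (cp << 6) | (uint32_t)(p[i + k] & 0x3f);
        }
        /* Reject overlong forms, surrogates, and values past U+10FFFF */
        if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
            return false;
        }
        i += need + 1;
    }
    return true;
}
```

If frames may split code points, a check like this fails on perfectly good traffic: a frame ending in E2 82 is rejected as truncated and the next one, starting with AC, is rejected as a stray continuation octet, so the receiver can only validate once the whole message has been reassembled.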

--Richard