Re: [hybi] Framing Take VI (a compromise proposal)

Scott Ferguson <ferg@caucho.com> Wed, 18 August 2010 02:03 UTC

Message-ID: <4C6B3F76.2090207@caucho.com>
Date: Tue, 17 Aug 2010 19:03:34 -0700
From: Scott Ferguson <ferg@caucho.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20100411)
MIME-Version: 1.0
To: John Tamplin <jat@google.com>
References: <AANLkTi=TBXO_Cbb+P+e2BVfx69shkf8E1-9ywDh_Y+Kz@mail.gmail.com> <AANLkTimJOGWgV6rx5JJYSJMC26OzQzskzVtkYz0L_EAg@mail.gmail.com> <op.vhe7qtmu64w2qv@anne-van-kesterens-macbook-pro.local> <AANLkTimqvQGJab-XdMuRFE8M2eB_xn_ipJZoNDuc28R2@mail.gmail.com> <4C66F67C.2080406@caucho.com> <AANLkTikY_ujn4rxuEPTidktL4Rwc1RizGBaa0-dpAhJP@mail.gmail.com>
In-Reply-To: <AANLkTikY_ujn4rxuEPTidktL4Rwc1RizGBaa0-dpAhJP@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: hybi@ietf.org
Subject: Re: [hybi] Framing Take VI (a compromise proposal)

John Tamplin wrote:
> On Sat, Aug 14, 2010 at 4:03 PM, Scott Ferguson <ferg@caucho.com 
> <mailto:ferg@caucho.com>> wrote:
>
>     Well, the complexity is more of a spec issue than a coding one
>     because none of these proposals are hugely complicated to
>     implement. (Implementing them efficiently is a different matter.)
>
>     A Jamie-style proposal is less complex because it leaves the basic
>     frame clean without needing the extension length or the extension
>     payload and without the extra complete-length field. The frame
>     would look like:
>
>
> There were objections raised about exactly this sort of thing before:
>
>     * a receiver would prefer to allocate the buffer at once for a
>       message rather than having to reallocate/copy it, which means
>       the sender should send the complete message length with the
>       first fragment if it is available
>

Yes, but it shouldn't be in the frame itself. Any of the extension 
methods would be fine for the TML. Besides, your proposal assumes that 
an implementation is fragmenting when it knows the TML, which doesn't 
make sense. If it knows the TML, it can just send a big frame.

>     * having a short length that was "mostly enough" would encourage
>       buggy implementations that don't support or test the long length
>       case
>

That's not a good reason to use 127 bytes. Picking an unusable length 
just to force people to test it is silly.

If you're concerned about people not properly testing variable lengths, 
use a fixed 64-bit length (or 56 bits, to make a nice 8-byte frame header.)

(The weird scrambling of the hash in the handshake is silly for the same 
reason.)

> Regarding d, as mentioned above those arguing against fragmentation 
> objected to not having the full message length, and after proposing 
> that addition they seemed happy with the compromise.  As fragmentation 
> support has been accepted as a requirement

Yes, the TML needs to be passed as metadata, but it should not be in the 
frame. Nor should it be tied to fragmentation. The whole point of 
fragmentation is that you don't know the total length.

I'm okay with several options for passing the TML. Just not in the frame 
definition itself.

>  
>
>     As a separate issue, your code's signature doesn't capture the
>     essence of the problem, because you really need to consider all
>     the APIs when evaluating a framing proposal, not just one that
>     happens to be easy to implement for the proposal. Your example was
>     a specialized API that really only works for utf-8 encoded 8-bit
>     C-strings.
>
>     The basic sender APIs that framing needs to support are:
>
>     a)  sendTextMessage(char []buffer, int offset, int length); //
>     JavaScript 16-bit text send
>
>
> The complication of UTF16->UTF8 conversion is the same in any other 
> length-based framing options, though supporting fragmentation means 
> you can have a fixed-size send buffer rather than having to allocate 
> all of it at once.

The difference between text and binary is that you know the frame length 
for binary immediately. With text, you do not. It's sufficiently 
different that you need to consider both signatures.
>  
>
>     a*) sendTextMessage(byte []buffer, int offset, int length); //
>     utf-8 encoded text (like PHP)
>
>     b)  sendBinaryMessage(byte []buffer, int offset, int length); //
>     binary single-buffer send
>
>
> Other than the choice of opcode, these are identical, right?

Correct, which is why your code sample was misleading: it made the 
simplifying assumptions of the binary case, but presented itself as a 
solution for the text signature.
>  
>
>     c)  class BinaryMessageOutputStream {
>       write(byte []buffer, int offset, int length);
>       close();
>      }
>
>     d)  class TextMessageWriter {
>       write(char []buffer, int offset, int length);
>       close();
>      }
>
>
> Again, I don't see these as showing any differences between the 
> various framing proposals, and fragmentation makes it easier to 
> support streams as you just write a fragment when the buffer fills.

That's only true if the frame header size is fixed. If your frame header 
size is variable, you either need to shift the bytes in the buffer, or 
are forced to use something like writev with extra pointer metadata, 
instead of just preallocating the fixed frame header size.

Variable frame header sizes are bad.
>  
>
>     The receiver variations are
>
>     a) char []receiveTextMessage()
>
>     b) byte []receiveBinaryMessage() // if returning utf-8 8-bit
>     strings, this is the text receive
>
>     c) class BinaryMessageInputStream {
>         int read(byte []buffer, int offset, int length);
>       }
>
>     d) class TextMessageReader {
>         int read(char []buffer, int offset, int length);
>       }
>
>
> I would think there would also be an event-driven API, since that is 
> what the JS API is going to look like -- where there is an onMessage 
> event called with a complete message after it has been received.  The 
> only public server-side API I know of (Jetty7) also operates this way.

Yes, but event-based doesn't make a difference to the protocol, because 
you're either moving the string read by receiveTextMessage() or passing 
a notification to a stream-based receiver.
>  
>
>     A good frame design will support all 5 sender API styles and 4
>     receiver API styles in a straightforward/efficient manner. You
>     can't just pick one API that happens to make a frame proposal work.
>
>
> Can you explain how your proposed framing is any different for any of 
> these APIs?  The difference is about the size of the short frame 
> length, leaving out the extension information which isn't going to be 
> used in the base protocol anyway, and leaving out the overall message 
> length which means you have to guess a buffer size for fragmented 
> messages and grow it if necessary (or fail the receive if using a 
> too-small caller-supplied buffer).

The 127-byte short frame length can only be used efficiently by 
sendBinaryMessage(), because only there do you know the total length 
up front.

For the stream-based binary sender, a simple implementation is to use a 
fixed-length buffer if you care about efficiency, and reserve the bytes 
for the frame header, which means its length needs to be fixed, no 
matter whether the message happened to be 4 bytes, 1024 bytes, or extend 
past the buffer size. It's the variable-length frame header that's a 
killer for streams.
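
A minimal sketch of that fixed-header stream writer. The 8-byte header 
(one flag byte plus a 56-bit length), the FLAG_MORE bit and the class name 
are assumptions for illustration only, not taken from any proposal:

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch only: the 8-byte header layout and the FLAG_MORE bit are assumptions
// made for illustration, not part of any framing proposal in this thread.
final class FixedHeaderFrameStream {
    static final int HEADER_SIZE = 8;   // assumed: 1 flag byte + 56-bit length
    static final int FLAG_MORE = 0x80;  // assumed: more fragments follow

    private final OutputStream out;
    private final byte[] buffer;
    private int pos = HEADER_SIZE;      // payload always starts right after the reserved header

    FixedHeaderFrameStream(OutputStream out, int bufferSize) {
        if (bufferSize <= HEADER_SIZE) {
            throw new IllegalArgumentException("buffer must be larger than the header");
        }
        this.out = out;
        this.buffer = new byte[bufferSize];
    }

    void write(byte[] src, int offset, int length) throws IOException {
        while (length > 0) {
            if (pos == buffer.length) {
                flushFrame(true);       // buffer full: emit a non-final fragment
            }
            int n = Math.min(length, buffer.length - pos);
            System.arraycopy(src, offset, buffer, pos, n);
            pos += n;
            offset += n;
            length -= n;
        }
    }

    void close() throws IOException {
        flushFrame(false);              // final fragment, whatever its size
    }

    private void flushFrame(boolean more) throws IOException {
        long payloadLength = pos - HEADER_SIZE;
        buffer[0] = (byte) (more ? FLAG_MORE : 0);
        for (int i = 1; i < HEADER_SIZE; i++) {
            // big-endian 56-bit length in bytes 1..7
            buffer[i] = (byte) (payloadLength >>> (8 * (HEADER_SIZE - 1 - i)));
        }
        out.write(buffer, 0, pos);      // header and payload are already contiguous
        pos = HEADER_SIZE;
    }
}
```

Because the header size never changes, the payload is copied into the 
buffer exactly once and each frame goes out in a single write() call, 
whether the message was 4 bytes or spilled over into fragments.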

In the case of your proposal, the stream writer (for an 8k fixed buffer) 
needs a different frame header pre-allocation for a message that's less 
than 127 bytes, from a message that's less than 8k - 10, from a message 
that overflows and goes into fragment mode. Basically, your proposal 
takes away a simple and efficient stream implementation, forcing more 
complicated implementations (like shifting or pointers or writev, etc.).
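
To make the cost concrete, here is a rough sketch of the bookkeeping a 
stream writer needs once the header size depends on the final payload 
length. The 2-byte short header is an assumption (the exact layout is not 
restated here); the 10-byte worst case is taken from the "8k - 10" figure 
above, and all names are hypothetical:

```java
// Sketch only: header sizes are assumed for illustration.
final class VariableHeaderSketch {
    static final int MAX_HEADER = 10;    // worst case, matching the "8k - 10" figure
    static final int SHORT_HEADER = 2;   // assumed size of the short-length header
    static final int SHORT_LIMIT = 127;  // payloads below this use the short header

    // Header size is only known once the payload length is known.
    static int headerSize(int payloadLength) {
        return payloadLength < SHORT_LIMIT ? SHORT_HEADER : MAX_HEADER;
    }

    // The writer reserved MAX_HEADER bytes and wrote the payload after them.
    // With a shorter header, the frame starts later in the buffer, so the
    // writer must track this offset (or shift the payload down instead).
    static int frameStart(int payloadLength) {
        return MAX_HEADER - headerSize(payloadLength);
    }
}
```

A writer built on this has to emit buffer[frameStart(n) .. MAX_HEADER + n) 
rather than always starting at offset 0, which is exactly the extra 
pointer metadata a fixed header avoids.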

The stream-based text sender is essentially the same as the binary 
(although there is the additional complication of splitting the final 
character across buffers.)

For sendTextMessage(), if you want a single-pass encoding, you have the 
same issues as the stream based senders. You could calculate the length 
in a second pass, but that's an added inefficiency (which would almost 
certainly be more expensive than the end-to-end overhead of the 
additional byte.) In other words, it has its own issues and trade-offs 
distinct from the other APIs.
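
For reference, the second pass would look roughly like the sketch below: 
a full scan of the 16-bit text just to learn the UTF-8 byte count before 
any frame header can be written. The class and method names are 
hypothetical, and counting an unpaired surrogate as 3 bytes assumes the 
encoder substitutes U+FFFD:

```java
// Sketch only: a length-counting pre-pass over 16-bit text.
final class Utf8Length {
    // Counts the UTF-8 bytes needed for chars[offset, offset + length)
    // without actually encoding them.
    static int utf8Length(char[] chars, int offset, int length) {
        int bytes = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            char c = chars[i];
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < end
                    && Character.isLowSurrogate(chars[i + 1])) {
                bytes += 4;                       // surrogate pair: one 4-byte sequence
                i++;                              // skip the low surrogate
            } else {
                bytes += 3;                       // rest of the BMP; unpaired surrogate
                                                  // assumed replaced by U+FFFD (3 bytes)
            }
        }
        return bytes;
    }
}
```

Only after this scan completes can a length-prefixed header be emitted, 
and the text then has to be walked again to encode it.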

-- Scott
>
> -- 
> John A. Tamplin
> Software Engineer (GWT), Google