Re: [hybi] hum #3: Message

John Tamplin <jat@google.com> Thu, 05 August 2010 19:53 UTC

From: John Tamplin <jat@google.com>
Date: Thu, 05 Aug 2010 15:53:56 -0400
Message-ID: <AANLkTim_PzXf0r=nfhCgtxpt-=s8-51hdAe0z2bSd5B9@mail.gmail.com>
To: Ian Hickson <ian@hixie.ch>
Cc: "hybi@ietf.org" <hybi@ietf.org>
Subject: Re: [hybi] hum #3: Message

On Thu, Aug 5, 2010 at 3:31 PM, Ian Hickson <ian@hixie.ch> wrote:

> I was trying to express both.
>
> Having to split a 2GB file into a bazilion pieces and write each one out
> individually with frame headers, compared to just writing the whole thing
> at once, seems inefficient far beyond simply the concern about the
> on-the-wire overhead.


Assuming you don't care about buffering requirements on the client, you only
care about the case where you already have the entire data to send in memory,
and you don't care about compressing the output, then I agree that just doing
write(socket, buf, len) is easier.  However, it isn't much easier than:

while (len > MAX_FRAME_SIZE) {
  writeHeader(socket, true, MAX_FRAME_SIZE);  // non-final fragment ("more" flag set)
  write(socket, buf, MAX_FRAME_SIZE);
  buf += MAX_FRAME_SIZE;
  len -= MAX_FRAME_SIZE;
}
writeHeader(socket, false, len);              // final fragment
write(socket, buf, len);

Also, you have previously stated that you expect compression to always be
used once we get around to supporting it.  I have previously given pseudocode
for what the compression loop looks like -- if I have to send the length of
everything up front, I have to finish compressing the entire data before I can
send a single byte over the wire, and I need an arbitrarily large output
buffer (which means reallocating it, with multiple copies), which is
inefficient.
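
For concreteness, here is a rough sketch of that loop, assuming zlib plus the
same hypothetical writeHeader() helper and MAX_FRAME_SIZE constant as above
(error handling omitted).  With fragments, each filled output buffer goes on
the wire immediately:

#include <string.h>
#include <unistd.h>
#include <zlib.h>

#ifndef MAX_FRAME_SIZE
#define MAX_FRAME_SIZE 16384                   /* placeholder, same role as above */
#endif

void writeHeader(int socket, int more, size_t len);  /* hypothetical, as above */

void sendCompressed(int socket, const unsigned char *buf, size_t len) {
  unsigned char out[MAX_FRAME_SIZE];
  z_stream zs;
  memset(&zs, 0, sizeof(zs));
  deflateInit(&zs, Z_DEFAULT_COMPRESSION);

  zs.next_in = (unsigned char *) buf;
  zs.avail_in = len;                  // assumes len fits in zlib's uInt; a real
                                      // loop would feed the input in chunks
  int ret;
  do {
    zs.next_out = out;
    zs.avail_out = sizeof(out);
    ret = deflate(&zs, Z_FINISH);     // all input is in hand here; a true stream
                                      // would use Z_NO_FLUSH until the last chunk
    size_t produced = sizeof(out) - zs.avail_out;
    writeHeader(socket, ret != Z_STREAM_END, produced);  // more fragments follow?
    write(socket, out, produced);
  } while (ret != Z_STREAM_END);

  deflateEnd(&zs);
}

With a length-up-front format, none of those write() calls could happen until
deflate() had consumed all of the input.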

Finally, there are cases where the WebSocket server isn't the ultimate
source of the data to send, so requiring a length up front means buffering
every byte from the source before you can write a single byte on the
WebSocket connection.  If instead I have fragments, I can use a fixed-size
buffer that I never have to reallocate, read the data from the upstream
source, and write individual WebSocket frames as that buffer fills.
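
A rough sketch of that case, reusing the hypothetical writeHeader() helper and
MAX_FRAME_SIZE from above, and assuming the framing allows a zero-length final
fragment to close the message (error handling omitted):

#include <unistd.h>

void relay(int socket, int upstream) {
  unsigned char buf[MAX_FRAME_SIZE];        // fixed-size, never reallocated
  ssize_t n;
  while ((n = read(upstream, buf, sizeof(buf))) > 0) {
    writeHeader(socket, 1, (size_t) n);     // non-final fragment
    write(socket, buf, (size_t) n);
  }
  writeHeader(socket, 0, 0);                // empty final fragment ends the message
}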

Regarding the small-frame case -- a variable-length length field is only
more efficient than a fixed-length one if the payload data is between 0 and
127 bytes.  In that case, the extra byte required seems insignificant compared
to the TCP/IP overhead of a small packet, where even mobile devices don't use
IP header compression.  Again, this doesn't seem to rise to the level of
"quite inefficient".

When considering inefficiency beyond bytes on the wire, consider these two
receivers:

int len = 0;
int c;
do {
  c = readByte(socket);
  len = (len << 7) + (c & 127);  // accumulate 7 bits of length per byte
} while (c & 128);               // high bit set means another length byte follows
// (re)allocate buf to hold len bytes
read(socket, buf, len);

vs

struct WSHeader hdr;
read(socket, &hdr, sizeof(hdr));
// fix up byte order (network to host)
// buf is preallocated to the maximum frame size
read(socket, buf, hdr.len);

[ignoring error handling, of course]

The former again requires either many allocations and frees of the buffer, or
reallocating it as larger frames are seen, and it requires more system calls
(one readByte() per length byte).

-- 
John A. Tamplin
Software Engineer (GWT), Google