Re: [hybi] hum #3: Message

Jack Moffitt <jack@collecta.com> Thu, 05 August 2010 19:55 UTC

In-Reply-To: <Pine.LNX.4.64.1008051930160.5947@ps20323.dreamhostps.com>
References: <4C5AE93D.4040803@ericsson.com> <Pine.LNX.4.64.1008051758290.5947@ps20323.dreamhostps.com> <AANLkTik0kbh14s2JZARY2MFh0iNGV7H+B4Px4yG+wX44@mail.gmail.com> <71BCE4BF-D3F6-4F94-BE76-306BDF6A2E67@apple.com> <Pine.LNX.4.64.1008051930160.5947@ps20323.dreamhostps.com>
Date: Thu, 05 Aug 2010 13:55:51 -0600
X-Google-Sender-Auth: ZwSIYlVc_9qC1LSIaMgvhkYqb3c
Message-ID: <AANLkTi=exPoB1dSgvQ+8JRXCJW-HpUEYq4K4yoftjgf_@mail.gmail.com>
From: Jack Moffitt <jack@collecta.com>
To: Ian Hickson <ian@hixie.ch>
Content-Type: text/plain; charset="ISO-8859-1"
Cc: "hybi@ietf.org" <hybi@ietf.org>
Subject: Re: [hybi] hum #3: Message

> Having to split a 2GB file into a bazilion pieces and write each one out
> individually with frame headers, compared to just writing the whole thing
> at once, seems inefficient far beyond simply the concern about the
> on-the-wire overhead.

Processing a 2GB file as a stream (or rather as a sequence of
variously sized chunks) is how it is normally done. You don't read a
whole 2GB file into memory in C, and you don't do that in Python
either. You typically read either in fixed-size amounts or in varying
amounts in a nonblocking fashion.
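
To make the fixed-size case concrete, here is a minimal Python sketch.
The helper name read_chunks and the 64 KiB default are illustrative
choices of mine, not anything taken from a proposal:

```python
import io


def read_chunks(stream, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size bytes until EOF.

    read_chunks and the 64 KiB default are hypothetical, chosen only
    for illustration.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        yield chunk


# Usage: an in-memory stream stands in for a large file on disk.
source = io.BytesIO(b"x" * 150_000)
sizes = [len(chunk) for chunk in read_chunks(source)]
print(sizes)
```

Only one chunk is held in memory at a time, so the same loop works for
a 2GB file as for a 2KB one.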

Sure, lots of people write code initially that reads everything at
once and writes it at once, but most of that code is thrown away or
changed as soon as the person realizes it is a bottleneck. With large
files it will fail by running out of memory, run extremely slowly, or
lock up the system in I/O.

I think both concerns are relatively minor.

a) The overhead on the wire is pretty small (though some care should
be taken that this also holds for many small messages). Every framing
proposal so far adds at least one byte of overhead per message, and I
think all the length-based proposals so far add 2 bytes. If the message
is small, it's not really much different except in the extreme case of
1 byte payloads. If the message is large, 2 bytes wasted out of every
64k or so hardly seems a high price.

b) The extra work to deal with chunks is already happening (in all but
the most trivial implementations), and is pretty minimal for writing
chunks, and only slightly harder for reading them.
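
Both points can be sketched in a few lines of Python. The wire format
here is an assumption made purely for illustration (a 2-byte big-endian
length before each chunk, with a zero-length frame ending the message);
it is not any of the actual proposals, and all function names are
hypothetical:

```python
import io
import struct

MAX_PAYLOAD = 0xFFFF  # largest length a 2-byte header can express
HEADER_BYTES = 2      # assumed header size, per the estimate above


def write_frames(src, dst, chunk_size=MAX_PAYLOAD):
    """Copy src to dst as length-prefixed frames (hypothetical format)."""
    while True:
        chunk = src.read(chunk_size)
        dst.write(struct.pack(">H", len(chunk)))
        if not chunk:  # the zero-length frame just written ends the message
            break
        dst.write(chunk)


def read_exact(stream, n):
    """Read exactly n bytes, looping over short reads."""
    buf = bytearray()
    while len(buf) < n:
        part = stream.read(n - len(buf))
        if not part:
            raise EOFError("stream ended mid-frame")
        buf += part
    return bytes(buf)


def read_frames(stream):
    """Yield each frame's payload until the zero-length terminator."""
    while True:
        (length,) = struct.unpack(">H", read_exact(stream, HEADER_BYTES))
        if length == 0:
            return
        yield read_exact(stream, length)


def frame_overhead(total_bytes, payload=MAX_PAYLOAD):
    """Header bytes this sketch adds: one per data frame plus terminator."""
    data_frames = (total_bytes + payload - 1) // payload
    return HEADER_BYTES * (data_frames + 1)


# Usage: round-trip a payload through the framing and measure the cost.
message = bytes(range(256)) * 1000  # 256,000 bytes
wire = io.BytesIO()
write_frames(io.BytesIO(message), wire)
wire.seek(0)
frames = list(read_frames(wire))
print(len(frames), len(wire.getvalue()) - len(message))
print(frame_overhead(2 * 1024**3))  # header bytes for a 2 GiB message
```

Writing is a plain copy loop with one extra pack call per chunk.
Reading needs the read_exact helper because a socket may return fewer
bytes than requested, which is the "slightly harder" part.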

jack.