Re: [hybi] frame length encoding

"Shelby Moore" <shelby@coolpage.com> Sun, 22 August 2010 22:51 UTC

Message-ID: <7dc6f8cf8a992714c55238cc526b4f94.squirrel@sm.webmail.pair.com>
In-Reply-To: <AANLkTiny4c9LuYDCtdcviUnwEvcocoCpZc82dUbwuq=d@mail.gmail.com>
Date: Sun, 22 Aug 2010 18:51:46 -0400
From: Shelby Moore <shelby@coolpage.com>
To: John Tamplin <jat@google.com>
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] frame length encoding

Exactly. Processing the header is a stateful, ordered process.

Thus, the time the CPU spends processing the header is time during which
the network may sit idle. The bottleneck is processing the header and then
shunting the frame processing to the other cores. So as frames get smaller,
the header becomes a larger percentage of network bandwidth. If the CPU
needs more time to process the header than it takes to move the header
over the network, then at the asymptote of a zero-size payload, more than
50% of network bandwidth is wasted. I am nearly certain that was Pieter's
point.
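A rough model of this argument, assuming (hypothetically) that header processing and transmission of each frame are fully serialized with no overlap: the link idles for t_proc out of every t_proc + t_wire, so with a zero-size payload the idle fraction exceeds 50% exactly when header processing takes longer than the header's own wire time. The function name and the example figures (2-byte header, 10 Gbit/s link, 5 ns of header processing) are illustrative assumptions, not measurements.

```python
def idle_fraction(header_bytes: int, payload_bytes: int,
                  link_bytes_per_sec: float, header_proc_sec: float) -> float:
    """Fraction of time the link is idle, assuming header processing and
    transmission are fully serialized (a hypothetical worst case)."""
    wire_sec = (header_bytes + payload_bytes) / link_bytes_per_sec
    return header_proc_sec / (header_proc_sec + wire_sec)


# Illustrative figures only: 2-byte header, 10 Gbit/s (1.25e9 bytes/s) link,
# 5 ns of header processing per frame.
for payload in (0, 10, 100, 1000):
    print(payload, round(idle_fraction(2, payload, 1.25e9, 5e-9), 3))
```

If the next header can be processed while the current frame is still on the wire, the idle fraction is always lower than this serialized figure, so the model above is the pessimistic case the argument relies on.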

Thus, eliminating the comparisons is crucial for very small frames.

So it is likely that massively multi-core implementations will choose to
cap the frame size at 125 bytes if you go with Option 2, or 126 with
Option 1, because that is the largest size whose length can be read
without any comparison.

That is why I proposed you go with Option 1/15-bit, so they can set the
maximum size to 32767 instead.
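For contrast, a length decoder with escape values pays one or two comparisons on every frame. This sketch assumes, hypothetically, that Option 2 means 0-125 is the literal length, 126 announces a 16-bit extended length and 127 a 64-bit one (the scheme RFC 6455 later adopted), and that Option 1 reserves only 127 for a 64-bit extended length. The high bit of the length byte is treated as a flag and masked off. Function names are illustrative.

```python
from typing import Tuple


def decode_length_option2(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (payload_length, offset just past the length field)."""
    n = buf[pos] & 0x7F                       # low 7 bits of the length byte
    if n <= 125:                              # comparison 1: small frame
        return n, pos + 1
    if n == 126:                              # comparison 2: 16-bit extension
        return int.from_bytes(buf[pos + 1:pos + 3], "big"), pos + 3
    return int.from_bytes(buf[pos + 1:pos + 9], "big"), pos + 9   # 64-bit


def decode_length_option1(buf: bytes, pos: int) -> Tuple[int, int]:
    """Same contract; only 127 is an escape, so 126 is still a literal."""
    n = buf[pos] & 0x7F
    if n <= 126:                              # the one comparison per frame
        return n, pos + 1
    return int.from_bytes(buf[pos + 1:pos + 9], "big"), pos + 9   # 64-bit
```

A frame that never exceeds 125 (or 126) bytes always takes the first branch, which is why a receiver that knows the sender will stay under that size could drop the branch entirely.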

The comparisons can be eliminated if you set the maximum at handshake time
and expect the sender to honor it. You can have a separate version of your
code for that hardcoded maximum, even self-writing (runtime-generated) code
if necessary.
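A minimal sketch of that idea, under stated assumptions: the maximum is agreed during the handshake, the sender is trusted to honor it (this sketch does not validate it), and the 15-bit variant is a hypothetical 2-byte big-endian field whose top bit is reserved. The comparison moves into select_decoder, which runs once per connection rather than once per frame. All names here are illustrative.

```python
from typing import Callable, Tuple

Decoder = Callable[[bytes, int], Tuple[int, int]]


def decode_len7(buf: bytes, pos: int) -> Tuple[int, int]:
    # Negotiated max <= 125: the length byte is always a literal value
    # under either option, so just mask off the high bit.
    return buf[pos] & 0x7F, pos + 1


def decode_len15(buf: bytes, pos: int) -> Tuple[int, int]:
    # Negotiated max <= 32767: hypothetical 2-byte big-endian field with a
    # reserved top bit; mask it off, no per-frame comparison.
    return int.from_bytes(buf[pos:pos + 2], "big") & 0x7FFF, pos + 2


def select_decoder(negotiated_max: int) -> Decoder:
    """Called once at handshake time to pick a comparison-free decoder."""
    if negotiated_max <= 125:
        return decode_len7
    if negotiated_max <= 0x7FFF:
        return decode_len15
    raise ValueError("maximum larger than 15 bits is outside this sketch")


decode = select_decoder(1000)          # chosen once per connection
print(decode(bytes([0x03, 0xE8]), 0))  # prints (1000, 2)
```

The negotiated value still has to be tested, but only once per connection; after that, per-frame work is a byte read and a mask.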

Do you finally get my point?


> On Sun, Aug 22, 2010 at 5:32 PM, Shelby Moore <shelby@coolpage.com> wrote:
>
>> Network bandwidth is increasing faster than single-core speeds are.
>> Multi-core processors are being used to keep up with Moore's Law.
>>
>
> I disagreed with that point, as did others, but it is irrelevant to the
> discussion at hand.  He brought that point up in the discussion about
> compressing frames to get the benefit of a small frame.
>
>
>> But the problem is that stateful (Turing-complete) algorithms don't scale
>> well to multi-core. Stateless (lambda pure) algorithms do scale well to
>> multi-core. Maybe you are aware that pure Haskell can be scaled by the
>> compiler to any number of cores, whereas stateful languages are brittle
>> mashups of mutex and semaphore heuristics (not mathematically provable).
>>
>
> Yes, I am aware.  You aren't going to be parallelizing decoding this frame
> anyway.  You can't start processing the payload until you have understood
> the opcode (which indirectly gets you the start of the payload data), the
> CMLP flag, and parsed the length (which gets you the end of the payload
> data).  Even if the length is a fixed 8-byte field, there is still an
> ordering requirement.
>
>> The problem with stateful code is that there is always a point where
>> adding more cores actually starts to reduce the speed (because of the
>> non-mathematical fit of the algorithm and the communication/coordination
>> overhead between cores growing faster than the speed added by the
>> additional cores). Basically you end up in mutex/semaphore deadlock hell
>> of inefficiency.
>>
>> If I am not mistaken, the comparisons in the size header field logic
>> create a state machine that has to be communicated/coordinated with the
>> multiple cores. The comparisons change the algorithm from stateless to
>> stateful. Without the comparisons (and other opcodes and state changes to
>> be added), more cores can be used before the inefficiency overwhelms.
>>
>
> You aren't going to have multiple cores trying to decode a single frame
> anyway -- the synchronization overhead will be far more than the cost to
> process even the most complex framing we have discussed here. You might
> try to farm off (de)compression to another core, but again that is
> different from trying to decide how many bytes there are in the frame.
>
> Note that Pieter initially voted for Option #1, then introduced Option #0
> which was the same but assigned a reserved bit to the length field.  It
> still had a comparison for the length, so I don't think you are accurately
> representing his opinion.
>
>
>> And also I just think 15 bits versus 7 bits as the maximum size that
>> can go through without comparisons is safer. We cannot see every
>> possible issue from this WG. We don't lose much at all, maybe 2-5% in
>> network bandwidth, but we gain a lot of comfort zone against the
>> unexpected.
>>
>
> Please explain how it can go through without comparison.  Even if you
> allocate a bit to define which length to use, you still have to test that
> bit.  If you negotiate in the handshake a maximum, you have to test some
> flag you set during negotiation.
>
> --
> John A. Tamplin
> Software Engineer (GWT), Google
>
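The ordering requirement described in the quoted reply can be illustrated with a hypothetical minimal layout (one byte whose low 4 bits are an opcode, one 7-bit length byte, then the payload; this is an assumption, not the draft's actual frame layout). The offset of each header is known only after the previous frame's length has been decoded, so splitting a stream is inherently sequential, even if the payloads can afterwards be handed to other cores.

```python
from typing import List, Tuple


def split_frames(buf: bytes) -> List[Tuple[int, bytes]]:
    """Split back-to-back frames in a hypothetical minimal layout:
    [opcode byte][7-bit length byte][payload]."""
    frames = []
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated header")
        opcode = buf[pos] & 0x0F        # opcode must be read first
        length = buf[pos + 1] & 0x7F    # payload end depends on this value
        start, end = pos + 2, pos + 2 + length
        if end > len(buf):
            raise ValueError("truncated payload")
        frames.append((opcode, buf[start:end]))
        pos = end                       # next header's offset is known only now
    return frames


stream = bytes([0x01, 3]) + b"abc" + bytes([0x02, 0])
print(split_frames(stream))  # prints [(1, b'abc'), (2, b'')]
```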