Re: [hybi] voting on frame length ideas

"Shelby Moore" <shelby@coolpage.com> Mon, 23 August 2010 19:16 UTC

Return-Path: <shelby@coolpage.com>
X-Original-To: hybi@core3.amsl.com
Delivered-To: hybi@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 388BB3A6ADD for <hybi@core3.amsl.com>; Mon, 23 Aug 2010 12:16:29 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.995
X-Spam-Level:
X-Spam-Status: No, score=-1.995 tagged_above=-999 required=5 tests=[AWL=0.004, BAYES_00=-2.599, J_CHICKENPOX_66=0.6]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Y4VjAXOZ2rpb for <hybi@core3.amsl.com>; Mon, 23 Aug 2010 12:16:27 -0700 (PDT)
Received: from www5.webmail.pair.com (www5.webmail.pair.com [66.39.3.83]) by core3.amsl.com (Postfix) with SMTP id 4B4043A6AC4 for <hybi@ietf.org>; Mon, 23 Aug 2010 12:16:25 -0700 (PDT)
Received: (qmail 31497 invoked by uid 65534); 23 Aug 2010 19:16:58 -0000
Received: from 121.97.54.174 ([121.97.54.174]) (SquirrelMail authenticated user shelby@coolpage.com) by sm.webmail.pair.com with HTTP; Mon, 23 Aug 2010 15:16:58 -0400
Message-ID: <37c00bfca6af779bbbb690f6e2a3e3e6.squirrel@sm.webmail.pair.com>
In-Reply-To: <AANLkTinsTTvoecxE-kqVTCL4QiK2WN=7Lyb1-QSE6_CN@mail.gmail.com>
References: <AANLkTinh5ON_L9yc29Y2CMrEeJHV=nvRhQauMSFi3ib1@mail.gmail.com> <4C7222EC.2000804@gmx.de> <dd27e6adb004dcab167038b2ec0ab280.squirrel@sm.webmail.pair.com> <AANLkTinmp+mG0ac4FgqFnJJZjxR5_hcaXYX5H7Y+cNNO@mail.gmail.com> <fe0a821d3c2bd14039e395ed1800263a.squirrel@sm.webmail.pair.com> <AANLkTinsTTvoecxE-kqVTCL4QiK2WN=7Lyb1-QSE6_CN@mail.gmail.com>
Date: Mon, 23 Aug 2010 15:16:58 -0400
From: Shelby Moore <shelby@coolpage.com>
To: John Tamplin <jat@google.com>
User-Agent: SquirrelMail/1.4.20
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] voting on frame length ideas
X-BeenThere: hybi@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: shelby@coolpage.com
List-Id: Server-Initiated HTTP <hybi.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/hybi>
List-Post: <mailto:hybi@ietf.org>
List-Help: <mailto:hybi-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 23 Aug 2010 19:16:29 -0000

> On Mon, Aug 23, 2010 at 12:34 PM, Shelby Moore <shelby@coolpage.com>
> wrote:
>
>> > I think it is ludicrous to suggest that an implementation would
>> advertise
>> > a
>> > maximum frame length of 126 just so its code for option #2 would look
>> > like:
>> >
>> > int len=readByte();
>> >
>> > rather than:
>> >
>> > long long len = readByte();
>> > switch (len) {
>> >   case 126:
>> >      len = readUnsignedShort();
>> >      break;
>> >   case 127:
>> >      len = readLongLong();
>> >      break;
>> > }
>>
>>
>> I do not think is it ludicrous for an implementation to choose the
>> option
>> that gives it the highest throughput.
>>
>
> I think that depends on what you call througput.  If you mean
> frames/second,
> then clearly it should always indicate a size of 1 so it can get the most
> possible frames.  If you mean payload bytes/second, then I disagree --
> having one 100k frame and doing an extra compare instruction is going to
> get
> far more payload bytes through in the same time than processing 100k
> 1-byte
> frames, even if it doesn't include a comparison.


The key epiphany is that it is a threshold issue with many interdependent
variables, which I can tell you haven't realized yet (given what you wrote
above). Around the threshold of 126, the choice might go either way
depending on the internal design and components and the external network
speed. Please go read what I originally wrote more carefully. I have
explained too many times already.  I know the list is getting tired of me
replying. I think I explained it already well enough.

Besides it is bad engineering to not add another byte and leave some
margin for error:

http://www.ietf.org/mail-archive/web/hybi/current/msg03584.html


>> I do not expect an implementation to be coded with concern as to the
>> load
>> on the senders CPU, when it conflicts with its own internal efficiency.
>> Economics and game theory.
>
>
> There was no such assumption, see above.


See above again too.


>> I already refuted that:
>>
>> http://www.ietf.org/mail-archive/web/hybi/current/msg03517.html
>>
>> If the receiver expects to be attacked, then checking the size field
>> guarantees nothing.
>
>
> Not checking it means a sender that ignored the maximum frame size could
> get
> the receiver out of sync with framing boundaries.  Experience has shown
> many
> attacks to follow from such problems.


It doesn't matter if you check the size, the size may be a lie too. I
can't believe I had to say that 4 times already. Really you can't see
that?


>> Regardless, I think the cost of a compare and an untaken branch is
>> > absolutely negligible when comparing the other costs of processing a
>> > frame.
>>
>> Someone else in list said otherwise.
>
>
> Which you are taking out of context and extending it far beyond the
> circumstances it was intended.  You also attributed that person to support
> your proposal, which in fact they did not.  Regardless, it would be best
> to
> speak your own opinions rather than restating someone elses.


I have not said any one supports my proposal.  And here are the posts and
I think it is pretty clear that he talks about CPU speed, and says "126"
threshold may miss the sweet spot (we don't know if he is only referring
to network bandwidth or to his prior point about CPU speed, but
nevertheless I think the threshold at 126 is too low and is unnecesary
risk for no gain):

http://www.ietf.org/mail-archive/web/hybi/current/msg03439.html
http://www.ietf.org/mail-archive/web/hybi/current/msg03418.html
http://www.ietf.org/mail-archive/web/hybi/current/msg03453.html
http://www.ietf.org/mail-archive/web/hybi/current/msg03435.html
http://www.ietf.org/mail-archive/web/hybi/current/msg03434.html
http://www.ietf.org/mail-archive/web/hybi/current/msg03429.html
http://www.ietf.org/mail-archive/web/hybi/current/msg03422.html

Rather I think you may be conflating this message, which was a different
question as your focus wasn't on the CPU efficiency:

http://www.ietf.org/mail-archive/web/hybi/current/msg03446.html


>> Any way, we lose nothing (negligible
>> 2-4% loss) by getting rid of the conditional branches.  And we gain more
>> reserved bits and other advantages with the "Option 8/16/32/64-bit".
>>
>> Are you arguing both sides of the fence in the post below?
>>
>> http://www.ietf.org/mail-archive/web/hybi/current/msg03532.html
>>
>> Is the extra byte costly or not?
>
>
> Between these options, there is an extra byte for small frames regardless.


So what does that byte cost us in risk??

See above. It costs us that the threshold around 126 is dangerous:

* data serializer might not be as efficient as you measured before
* maximum frame thresholding has only the 126 choice
* and who knows what else we didn't think of

Does NASA design rockets with 0% tolerance for failure?


>> >> 3) After I proposed "Option 15/63-bit", a new option was proposed
>> >> "Option
>> >> 8/16/32/64-bit". I can't yet decide between these two options, so I
>> >> voted
>> >> for both of them in the informal poll (and voted against all other
>> >> options, except "Option #2 - 7/16/63-bit"). The "Option
>> 8/16/32/64-bit"
>> >> gives more reversed bits, but it adds a CPU cost of loading the
>> LenLen,
>> >> loading a value of 1 and left shifting by LenLen.
>> >
>> >
>> > Rather than calculating a number of bytes to read for the length, it
>> seems
>> > more likely it will be:
>> >
>> > short header = readUnsignedShort();
>> > long long len;
>> > // possibly shifting to make it 0-3 for table jump implementations
>> > switch(header & LEN_LEN_MASK) {
>> >   case LEN_LEN_8:
>> >      len = readByte();
>> >      break;
>> >   case LEN_LEN_16:
>> >      len = readUnsignedShort();
>> >      break;
>> >   case LEN_LEN_32:
>> >      len = readUnsignedLong();
>> >      break;
>> >   case LEN_LEN_64:
>> >      len = readLongLong();
>> >      break;
>> > }
>>
>>
>> Yikes! Why?  Why not use the left shift, it is much more efficient than
>> a
>> branch.  Branches destroy CPU pipelining.
>>
>
> You are going to have branches anyway.  Your suggestion would be (inlining
> the loop for clarity):
>
> short header = readUnsignedShort();
> long long len;
> int lenlen = (header & LEN_LEN_MASK) >> LEN_LEN_SHIFT;
> int lengthBytes = 1 << lenlen;
> while (lengthBytes-- > 0) {
>   lenlen = (lenlen << 8) | readByte();
> }
>
> Even if you unroll the loop and branch to different points in the reading
> code, there is still a branch there:
>
> It is really pretty basic -- if you have a variable number of bytes for
> the
> length, there will be branches regardless.  If the number of branches are
> small, you can optimize for the branch-not-taken case, which will have
> minimal impact on the pipeline on most architectures in the 98% case.


Good point!  I hadn't thought that part out yet (mind has been too busy on
these other issues).

So it looks like my original proposal of "Option 15/63-bit" is best if we
care about minimum CPU load on the frame header processing.

If frames end up being as small as you hope, that could be significant.