Re: [hybi] voting on frame length ideas

"Shelby Moore" <shelby@coolpage.com> Mon, 23 August 2010 16:34 UTC

Return-Path: <shelby@coolpage.com>
X-Original-To: hybi@core3.amsl.com
Delivered-To: hybi@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 1AA1A3A68E6 for <hybi@core3.amsl.com>; Mon, 23 Aug 2010 09:34:00 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.819
X-Spam-Level:
X-Spam-Status: No, score=-1.819 tagged_above=-999 required=5 tests=[AWL=-0.135, BAYES_00=-2.599, J_CHICKENPOX_66=0.6, SARE_MILLIONSOF=0.315]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zPp17Bf7ki4E for <hybi@core3.amsl.com>; Mon, 23 Aug 2010 09:33:58 -0700 (PDT)
Received: from www5.webmail.pair.com (www5.webmail.pair.com [66.39.3.83]) by core3.amsl.com (Postfix) with SMTP id E0BDB3A67B4 for <hybi@ietf.org>; Mon, 23 Aug 2010 09:33:56 -0700 (PDT)
Received: (qmail 87393 invoked by uid 65534); 23 Aug 2010 16:34:27 -0000
Received: from 121.97.54.174 ([121.97.54.174]) (SquirrelMail authenticated user shelby@coolpage.com) by sm.webmail.pair.com with HTTP; Mon, 23 Aug 2010 12:34:27 -0400
Message-ID: <fe0a821d3c2bd14039e395ed1800263a.squirrel@sm.webmail.pair.com>
In-Reply-To: <AANLkTinmp+mG0ac4FgqFnJJZjxR5_hcaXYX5H7Y+cNNO@mail.gmail.com>
References: <AANLkTinh5ON_L9yc29Y2CMrEeJHV=nvRhQauMSFi3ib1@mail.gmail.com> <4C7222EC.2000804@gmx.de> <dd27e6adb004dcab167038b2ec0ab280.squirrel@sm.webmail.pair.com> <AANLkTinmp+mG0ac4FgqFnJJZjxR5_hcaXYX5H7Y+cNNO@mail.gmail.com>
Date: Mon, 23 Aug 2010 12:34:27 -0400
From: Shelby Moore <shelby@coolpage.com>
To: John Tamplin <jat@google.com>
User-Agent: SquirrelMail/1.4.20
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
Cc: Hybi <hybi@ietf.org>
Subject: Re: [hybi] voting on frame length ideas
X-BeenThere: hybi@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: shelby@coolpage.com
List-Id: Server-Initiated HTTP <hybi.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/hybi>
List-Post: <mailto:hybi@ietf.org>
List-Help: <mailto:hybi-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/hybi>, <mailto:hybi-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 23 Aug 2010 16:34:00 -0000

> On Mon, Aug 23, 2010 at 6:48 AM, Shelby Moore <shelby@coolpage.com> wrote:
>
>> It is my understanding that this is caused by a lack of complete
>> understanding of the issues, not by a valid difference of opinion. In
>> fact, most disagreement in life derives from incomplete communication.
>> When we think someone else is 'wrong', 99% of the time it is merely the
>> opposing sides do not yet understand the logic of each other.
>
>
> In this case, I think it is more likely that we have different use-cases
> in
> mind.  Clearly someone who wants to process millions of WS frames per
> second
> is going to have a different perspective from someone who wants to easily
> implement a WS server in python to service a handful of requests per hour.
>  It is unlikely those two will agree on how it should be done on their own
> accord.


In my experience will be always be able to converge, because never I have
I found that I have to compromise non-optimally on design.  It is always a
problem of not seeing the optimal design, before eventually finding the
optimal one that works well for everyone. If not, it means conflation has
been pushed into the layer being designed and one needs to take a broader
view and fix the conflation first.

The acrimony comes from clinging to conflation and not able to see that
can do the same thing just as efficiently without conflation.


>> Thus I explained that due to its internal economics, an implementation
>> might be optimized with its maximum size larger than 126 if there were
>> no
>> costs of doing so, but perhaps not if the cost of processing
>> conditionals
>> combined with the fact that 98% of frames are less than 126, justified
>> setting a _STRICT_ maximum size of 126, if that eliminated the
>> conditionals. Remember above I pointed out that the _STRICT_ maximum
>> size
>> was a benefit when coupled with frame size that uses conditionals,
>> because
>> it allows the implementation to control its costs and maximize its
>> economics. But the problem is that if we choose an option which can only
>> do 0 - 126 size with conditionals eliminated by the _STRICT_ maximum
>> size,
>> then the (implementation self-optimization) optimization is a disaster,
>> because it means a large frame has to be fragmented by the sender to 126
>> byte frames! However, the receiving implementation probably did not have
>> any benefit of limiting the maximum frame size to 126, except for the
>> minimization of the conditionals cost.
>>
>
> I think it is ludicrous to suggest that an implementation would advertise
> a
> maximum frame length of 126 just so its code for option #2 would look
> like:
>
> int len=readByte();
>
> rather than:
>
> long long len = readByte();
> switch (len) {
>   case 126:
>      len = readUnsignedShort();
>      break;
>   case 127:
>      len = readLongLong();
>      break;
> }


I do not think is it ludicrous for an implementation to choose the option
that gives it the highest throughput.

I do not expect an implementation to be coded with concern as to the load
on the senders CPU, when it conflicts with its own internal efficiency. 
Economics and game theory.


> while the former is certainly faster, that neglects the overhead of having
> to process thousands of frames instead of one, which is what this would
> require the sender to do when it needed to send a frame larger than 125
> bytes.


The receiver's economics are orthogonal to the sender's economics in this
case.


>  Even if the receiver didn't care about costs to the sender or the
> intervening networks, the receiver on net spends far more CPU time per
> payload byte than if it just did the checks.


Please read what I originally wrote again.  I took that into consideration
and still the disaster result occured in my game theory.  Consider all the
variables in the game theory. If I was not clear/coherent/lucid, maybe you
can point that out.


> In addition, I think the receiver would still have to check that the
> length
> byte was not 126 or 127, because their is no guarantee that the sender is
> playing by the rules.


I already refuted that:

http://www.ietf.org/mail-archive/web/hybi/current/msg03517.html

If the receiver expects to be attacked, then checking the size field
guarantees nothing.


>> So ladies and gentlemen, that is why I proposed "Option 15/63-bit" (I
>> formerly called it "Option 1/15-bit"). Do you understand now?
>>
>> The 15-bit (32767 bytes) size when conditionals are eliminated by
>> optimization of the _STRICT_ maximum size, is more than sufficient to
>> prevent the above disaster scenario.  Also it has negligible (2%? [1])
>> costs relative to the prior best option, "Option #2 - 7/16/63-bit":
>>
>
> I still think you can't get away from validating the input length, which
> means you still have conditionals.


Incorrect.  See above.


> Regardless, I think the cost of a compare and an untaken branch is
> absolutely negligible when comparing the other costs of processing a
> frame.


Someone else in list said otherwise.  Any way, we lose nothing (negligible
2-4% loss) by getting rid of the conditional branches.  And we gain more
reserved bits and other advantages with the "Option 8/16/32/64-bit".

Are you arguing both sides of the fence in the post below?

http://www.ietf.org/mail-archive/web/hybi/current/msg03532.html

Is the extra byte costly or not?


>> 3) After I proposed "Option 15/63-bit", a new option was proposed
>> "Option
>> 8/16/32/64-bit". I can't yet decide between these two options, so I
>> voted
>> for both of them in the informal poll (and voted against all other
>> options, except "Option #2 - 7/16/63-bit"). The "Option 8/16/32/64-bit"
>> gives more reversed bits, but it adds a CPU cost of loading the LenLen,
>> loading a value of 1 and left shifting by LenLen.
>
>
> Rather than calculating a number of bytes to read for the length, it seems
> more likely it will be:
>
> short header = readUnsignedShort();
> long long len;
> // possibly shifting to make it 0-3 for table jump implementations
> switch(header & LEN_LEN_MASK) {
>   case LEN_LEN_8:
>      len = readByte();
>      break;
>   case LEN_LEN_16:
>      len = readUnsignedShort();
>      break;
>   case LEN_LEN_32:
>      len = readUnsignedLong();
>      break;
>   case LEN_LEN_64:
>      len = readLongLong();
>      break;
> }


Yikes! Why?  Why not use the left shift, it is much more efficient than a
branch.  Branches destroy CPU pipelining.


[snip, because agreed]


>> Let us contemplate how the _STRICT_ maximum size impacts the 63-bit
>> sendFile() optimization?  An implementation will choose to minimize its
>> costs, so we have to consider the selfish economics of what ever we
>> choose. Perhaps with "Option 8/16/32/64-bit" we can tolerate the one
>> conditional and make maximum sizes a hint, not _STRICT_?
>>
>
> If the receiver only has a fixed 64k buffer and you send it a 128k frame,
> what is it supposed to do?  I don't see any recourse besides dropping the
> connection.  Doubly so if it told the sender during the handshake that was
> the largest frame size it could accept.


Agreed, so I see no advantage to making maximum size not _STRICT_.


>> [1] I write that only 'slightly', because John Tamplin had provided a
>> distribution of frame sizes from some real world applications, which
>> claims that only 2% of frame sizes will be greater than 126. I write
>> 'claims', because statistics of the present, do not always predict the
>> future (aka Murphy's Law). One thing I just realized now, is I forgot to
>> check/ask John the mean and std deviation of frame size? To better
>> quantify the impact of adding 1 byte to the header as the other options
>> do.
>>
>
> Mean/stddev from the sample traces:
>
>    - Wave C->S uncompressed: 287/80, compressed: 39.8/29.7;  S->C
>    uncompressed: 213/234, compressed: 34/33.4
>    - GWT Quake C->S uncompressed: 51.6/22.6, compressed: 24.7/6.2;   S->C
>    uncompressed: 225/139, compressed: 39/34.8

Thank you. So I was roughly correct to use 50 before and estimate a
negligible 2% cost for adding a byte.  Even at 4%, that is still
negligible in overall scheme of things.