Re: [bmwg] Mean vs Median

Marius Georgescu <liviumarius-g@is.naist.jp> Tue, 10 November 2015 07:16 UTC

From: Marius Georgescu <liviumarius-g@is.naist.jp>
Date: Tue, 10 Nov 2015 16:15:57 +0900
To: "bmwg@ietf.org" <bmwg@ietf.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/bmwg/1M6fhwfLW_dsqhY9toJcxFYE8i0>
Subject: Re: [bmwg] Mean vs Median

Hello Al,
> On Nov 10, 2015, at 15:22, MORTON, ALFRED C (AL) <acmorton@att.com> wrote:
> 
> Hi Marius,
> 
> On your first comment below, yes, we need to move beyond the 
> RFC2544 latency of a single packet in new work (while keeping the 
> intent of earlier work in mind).
[MG]
Then should we agree on a new latency procedure in one of the existing working group items, or in a new one? Or should we reference another latency procedure? I don’t know of a better one.
> On your second point, relaying Scott's comment about hard and fast rules:
> I guess I would tend toward reporting the Median for any distribution
> (it's not subject to the outliers like mean), but I still believe
> that one statistic is not enough - a metric of variation is needed.
> 
[MG]
I guess we can be sure there’s no perfect solution here.
After talking with you and Kostas about it, I am leaning more towards the median myself.
For physical testing, the distribution tends to be normal (at least in my testing experience). In this context the difference between mean and median would be negligible.
Since bimodal is the new normal in the virtual world (the future), if I may say so :), the median seems like the better choice.
I fully agree we should have an additional statistic that accounts for the variation. I proposed the margin of error, but it could equally be the standard deviation, the standard error, or something similar.
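
A minimal Python sketch of the point above, under stated assumptions: the latency values are synthetic (a hypothetical DUT with one mode around 100 us and, in the bimodal case, a second mode around 400 us holding about 20% of the samples), the helper name summarize is made up for illustration, and the margin of error uses a normal approximation with z = 1.96. It is not a BMWG procedure, only an illustration of how the mean and median diverge and of the candidate variation statistics.

```python
import math
import random
import statistics


def summarize(samples, z=1.96):
    """Return central and variation statistics for a list of latencies.

    z = 1.96 assumes a ~95% confidence level under a normal approximation.
    """
    n = len(samples)
    stdev = statistics.stdev(samples)   # sample standard deviation
    stderr = stdev / math.sqrt(n)       # standard error of the mean
    return {
        "n": n,
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": stdev,
        "stderr": stderr,
        "margin_of_error": z * stderr,
    }


rng = random.Random(2015)  # fixed seed so the sketch is repeatable

# Hypothetical "physical DUT": latencies roughly normal around 100 us.
unimodal = [rng.gauss(100.0, 5.0) for _ in range(2000)]

# Hypothetical "virtual DUT": about 80% of packets near 100 us,
# about 20% in a second mode near 400 us.
bimodal = [
    rng.gauss(100.0, 5.0) if rng.random() < 0.8 else rng.gauss(400.0, 20.0)
    for _ in range(2000)
]

normal_summary = summarize(unimodal)
bimodal_summary = summarize(bimodal)

for name, s in (("unimodal", normal_summary), ("bimodal", bimodal_summary)):
    print(
        f"{name}: mean={s['mean']:.1f} median={s['median']:.1f} "
        f"stdev={s['stdev']:.1f} stderr={s['stderr']:.2f} "
        f"moe={s['margin_of_error']:.2f}"
    )
```

With these synthetic inputs the mean and median nearly coincide for the unimodal sample, while for the bimodal sample the mean is pulled towards the second mode and the median stays in the first. The standard deviation grows sharply in the bimodal case, which is why a single central statistic without a measure of variation hides the second mode.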

Best regards,
Marius

> Al
> (as participant)
> 
>> -----Original Message-----
>> From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of Marius Georgescu
>> Sent: Tuesday, November 10, 2015 12:25 AM
>> To: bmwg@ietf.org
>> Subject: Re: [bmwg] Mean vs Median
>> 
>> Hello Al,
>> 
>> Thank you very much for joining the discussion.
>> Please find my comments inline.
>> 
>>> On Nov 10, 2015, at 10:40, MORTON, ALFRED C (AL) <acmorton@att.com> wrote:
>>> 
>>> Hi Marius, Paul, and all who have contributed so far.
>>> 
>>> a quick reply/differing opinion below.
>>> 
>>>> -----Original Message-----
>>>> From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of Marius
>>>> Georgescu
>>> ...
>>>>> On Nov 10, 2015, at 02:40, Paul Emmerich <emmericp@net.in.tum.de> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> On 03.11.15 09:45, Stenio Fernandes wrote:
>>>>>> a word of caution here... a number of phenomena in computer
>>>>>> networks follows a heavy-tailed probability distribution function,
>>>>>> which means that there is a non-negligible probability that a
>>>>>> random variable will take huge values. these values might be
>>>>>> erroneously considered as outliers.
>>>>> 
>>>>> this is a really important point. I have benchmarked software where
>>>>> the 99th percentile of the latency is twice the average/median and
>>>>> the 99.9th percentile ten times the average/median.
>>>> 
>>>> Can you give us more context (test setup; physical/virtualized
>>>> tester/DUT; one tester/sender_receiver tester ... ) on these
>>>> measurements?
>>> [ACM]
>>> My understanding (and I've seen some results, but I've had trouble
>>> re-locating them) is that both outliers and bimodal distributions are
>>> more common in the world of virtual DUTs than they were in the
>>> physical/past. Not only does this affect analysis, but the threshold
>>> waiting time for packet arrival must be chosen carefully to even
>>> measure such outliers.
>>>> 
>>>>> This is an important performance characteristic for
>>>>> latency-sensitive applications that isn't captured by taking just 20
>>>>> measurements. So I'd really like to see a standard that calls for
>>>>> thousands of latency measurements to capture this properly.
>>>>> 
>>>> 
>>>> I think we should keep practicality in mind here. If we follow the
>>>> RFC2544 latency measurement, the frame stream has to be 2 min long.
>>>> 1000 measurements would then take 2000 min ~ 33h of testing, which
>>>> sounds unreasonable to me for just one test. I would agree to have a
>>>> lower bound for the sample size as RFC2544 actually recommends (n > 20).
>>> [ACM]
>>> Latency (delay) and delay variation need many single delay
>>> measurements to be meaningful. One way to view the variation is for a
>>> single flow of packets with spacing that might come from an
>>> application, say 20ms spacing for VoIP. Collecting a few thousand of
>>> such packets should not take so long.
>> [MG]
>> I think there is one thing that needs clarification. The procedure in
>> RFC2544 says:
>> 
>> “The stream SHOULD be at least 120 seconds in duration. An identifying
>> tag SHOULD be included in one frame after 60 seconds with the type of
>> tag being implementation dependent.”
>> 
>> I never got to ask Scott this, but would it make sense to tag more than
>> one frame? If we tag all the frames in the stream, we would have
>> (depending on the throughput) thousands of measurements in 60 seconds.
>> 
>>> 
>>>> 
>>>>> You can also get interesting insights into a black-box device by
>>>>> looking at histograms/probability density functions. For example,
>>>>> you can figure out if the device processes packets in batches,
>>>>> estimate the batch size, figure out at which rates interrupt
>>>>> moderation algorithms change etc. (This is, of course, not really a
>>>>> performance metric, just an interesting insight.)
>>>>> 
>>>> 
>>>> I agree this is an interesting insight. It can also be the base for a
>>>> decision between summarizing functions. However, in the light of
>>>> consistency and simplicity of the methodology, I think we would need
>>>> to recommend one function. We could do that depending on the
>>>> metric/DUT characteristics, previous testing behavior …
>>> [ACM]
>>> I agree the right summary statistics can only be chosen after an
>>> examination of the raw distribution for a particular scenario.  If
>>> Bi-modal, the central statistics of the sample could be meaningless.
>>> Without this examination, I don't think one recommendation can always
>>> be right.
>>> 
>> [MG] I agree that analyzing the probability distribution is the best way
>> to choose, but maybe not the most consistent procedure. I had a short
>> discussion about this with Scott before the BMWG meeting. In my
>> understanding of his feedback, the consistency in recommending a
>> summarizing function is more important. In other words, if we leave room
>> for interpreting the data, the results might be more misleading than if
>> we choose the “wrong” summarizing function.
>> 
>> Best regards,
>> Marius