Re: [bmwg] Mean vs Median

Marius Georgescu <liviumarius-g@is.naist.jp> Tue, 10 November 2015 05:25 UTC

From: Marius Georgescu <liviumarius-g@is.naist.jp>
Date: Tue, 10 Nov 2015 14:24:45 +0900
To: "bmwg@ietf.org" <bmwg@ietf.org>
Archived-At: <http://mailarchive.ietf.org/arch/msg/bmwg/oy0NlVjgUu7LTnP6QsJL9_ZSKxI>
Subject: Re: [bmwg] Mean vs Median

Hello Al,

Thank you very much for joining the discussion. 
Please find my comments inline. 

> On Nov 10, 2015, at 10:40, MORTON, ALFRED C (AL) <acmorton@att.com> wrote:
> 
> Hi Marius, Paul, and all who have contributed so far.
> 
> a quick reply/differing opinion below.
> 
>> -----Original Message-----
>> From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of Marius Georgescu
> ...
>>> On Nov 10, 2015, at 02:40, Paul Emmerich <emmericp@net.in.tum.de>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> On 03.11.15 09:45, Stenio Fernandes wrote:
>>>> a word of caution here... a number of phenomena in computer networks
>>>> follows a heavy-tailed probability distribution function, which means
>>>> that there is a non-negligible probability that a random variable
>>>> will take huge values. these values might be erroneously considered
>> as outliers.
>>> 
>>> this is a really important point. I have benchmarked software where
>> the 99th percentile of the latency is twice the average/median and the
>> 99.9th percentile ten times the average/median.
>> 
>> Can you give us more context (test setup; physical/virtualized
>> tester/DUT; one tester/sender_receiver tester ... ) on these
>> measurements?
> [ACM] 
> My understanding (and I've seen some results, but I've had trouble
> re-locating them) is that both outliers and bimodal distributions
> are more common in the world of virtual DUTs than they were in the
> physical/past. Not only does this affect analysis, but the threshold
> waiting time for packet arrival must be chosen carefully to even
> measure such outliers.
>> 
>>> This is an important performance characteristic for latency-sensitive
>> applications that isn't captured by taking just 20 measurements. So I'd
>> really like to see a standard that calls for thousands of latency
>> measurements to capture this properly.
>>> 
>> 
>> I think we should keep practicality in mind here. If we follow
>> RFC2544.latency measurement, the frame stream has to be 2 min long. 2000
>> min ~ 33h  of testing for just one test sounds unreasonable to me. I
>> would agree to have a lower bound for the sample size as RFC2544
>> actually recommends (n > 20).
> [ACM] 
> Latency (delay) and delay variation need many single delay measurements
> to be meaningful. One way to view the variation is for a single flow of
> packets with spacing that might come from an application, say 20ms spacing
> for VoIP. Collecting a few thousand of such packets should not take so long.
[MG]  
I think there is one thing that needs clarification. The procedure in RFC2544 says:

“The stream SHOULD be at least 120 seconds in duration. An identifying tag SHOULD be included in one frame after 60 seconds with the type of tag being implementation dependent.”

I never got to ask Scott this, but would it make sense to tag more than one frame? If we tagged every frame in the stream, we would get (depending on the throughput) thousands of measurements in 60 seconds.
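
To put rough numbers on this, here is a minimal Python sketch. It assumes a constant inter-frame spacing and that every frame in a 60-second window is tagged. The function names and the 20 ms spacing (borrowed from Al's VoIP example) are illustrative only and are not part of RFC 2544.

```python
def tagged_samples(frame_interval_s: float, window_s: float = 60.0) -> int:
    """Latency samples collected if every frame in the window is tagged.

    Assumes constant frame spacing (hypothetical; real streams depend on
    the offered load and frame size).
    """
    return int(round(window_s / frame_interval_s))


def sequential_test_hours(n_samples: int, stream_s: float = 120.0) -> float:
    """Total test time when each sample needs its own stream of stream_s
    seconds, i.e. one tagged frame per stream."""
    return n_samples * stream_s / 3600.0


# One tagged frame per 120 s stream: 1000 samples take about 33 hours.
hours_single_tag = sequential_test_hours(1000)

# Every frame tagged at 20 ms spacing: 3000 samples in one 60 s window.
samples_all_tagged = tagged_samples(0.020)
```

Under these assumptions, a single 120-second stream with all frames tagged would already exceed the sample count that takes more than a day to collect with one tag per stream.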

> 
>> 
>>> You can also get interesting insights into a black-box device by
>>> looking at histograms/probability density functions. For example, you
>>> can figure out if the device processes packets in batches, estimate
>>> the batch size, figure out at which rates interrupt moderation
>>> algorithms change etc. (This is, of course, not really a performance
>>> metric, just an interesting insight.)
>>> 
>> 
>> I agree this is an interesting insight. It can also be the base for a
>> decision between summarizing functions. However, in the light of
>> consistency and simplicity of the methodology, I think we would need to
>> recommend one function. We could do that depending on the metric/DUT
>> characteristics, previous testing behavior …
> [ACM] 
> I agree the right summary statistics can only be chosen after an examination
> of the raw distribution for a particular scenario.  If Bi-modal, the central
> statistics of the sample could be meaningless. Without this examination,
> I don't think one recommendation can always be right.
> 
[MG] I agree that analyzing the probability distribution is the best way to choose a summarizing function, but it may not be the most consistent procedure. I had a short discussion about this with Scott before the BMWG meeting. As I understood his feedback, consistency in the recommended summarizing function matters more. In other words, if we leave room for interpreting the data, the results might be more misleading than if we choose the “wrong” summarizing function.
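
To illustrate the trade-off, here is a small Python sketch that applies the same summarizing functions to two synthetic latency samples. The values (in microseconds) are made up for illustration and are not measurements. The helper names and the nearest-rank percentile definition are my own assumptions, not something taken from RFC 2544.

```python
import statistics


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(samples)
    rank = (len(ordered) * pct + 99) // 100  # integer ceil(n * pct / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    """Apply the candidate summarizing functions to one sample."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "p99": percentile_nearest_rank(samples, 99),
    }


# Synthetic bimodal sample: half the frames at 100 us, half at 500 us.
bimodal = [100] * 500 + [500] * 500

# Synthetic heavy-tailed sample: 2% of the frames are 100x slower.
heavy_tail = [100] * 980 + [10000] * 20

bimodal_summary = summarize(bimodal)
heavy_tail_summary = summarize(heavy_tail)
```

In the bimodal case both mean and median come out at 300, a latency that no frame actually experienced. In the heavy-tailed case the median stays at 100 while the mean rises to 298 and the 99th percentile is 10000. A fixed summarizing function gives consistent reports, but what it hides depends on the shape of the distribution.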

Best regards,
Marius