Re: [bmwg] Mean vs Median

"MORTON, ALFRED C (AL)" <acmorton@att.com> Tue, 10 November 2015 01:40 UTC

From: "MORTON, ALFRED C (AL)" <acmorton@att.com>
To: Marius Georgescu <liviumarius-g@is.naist.jp>, "bmwg@ietf.org" <bmwg@ietf.org>
Date: Mon, 09 Nov 2015 20:40:41 -0500
Archived-At: <http://mailarchive.ietf.org/arch/msg/bmwg/8ou3dVRr3TZaWDN0y4pdwwAsjoI>

Hi Marius, Paul, and all who have contributed so far.

A quick reply/differing opinion below.

> -----Original Message-----
> From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of Marius Georgescu
...
> > On Nov 10, 2015, at 02:40, Paul Emmerich <emmericp@net.in.tum.de> wrote:
> >
> > Hi,
> >
> > On 03.11.15 09:45, Stenio Fernandes wrote:
> >> a word of caution here... a number of phenomena in computer networks
> >> follow a heavy-tailed probability distribution function, which means
> >> that there is a non-negligible probability that a random variable
> >> will take huge values. these values might be erroneously considered
> >> as outliers.
> >
> > this is a really important point. I have benchmarked software where
> > the 99th percentile of the latency is twice the average/median and the
> > 99.9th percentile ten times the average/median.
> 
> Can you give us more context (test setup; physical/virtualized
> tester/DUT; one tester/sender_receiver tester ... ) on these
> measurements?
[ACM] 
My understanding (and I've seen some results, but I've had trouble
re-locating them) is that both outliers and bimodal distributions
are more common with virtual DUTs than they were with the physical
DUTs of the past. Not only does this affect analysis, but the threshold
waiting time for packet arrival must also be chosen carefully to
measure such outliers at all.
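
To illustrate the waiting-time point, here is a minimal Python sketch.
The delay values, the thresholds, and the helper name observed_delays
are all made up for illustration; they are not taken from any
measurement discussed in this thread. The sketch only shows that a
packet arriving after the tester's waiting threshold is counted as
lost, so the outlier never appears in the delay sample.

```python
def observed_delays(true_delays_ms, wait_threshold_ms):
    """Split delays into those measured and those counted as lost.

    A packet whose delay exceeds the waiting threshold is assumed to be
    declared lost by the tester, so it contributes no delay sample.
    """
    measured = [d for d in true_delays_ms if d <= wait_threshold_ms]
    lost = len(true_delays_ms) - len(measured)
    return measured, lost


# Hypothetical sample: 997 packets near 0.1 ms plus three large outliers.
true_delays_ms = [0.1] * 997 + [50.0, 120.0, 900.0]

for threshold_ms in (10.0, 100.0, 2000.0):
    measured, lost = observed_delays(true_delays_ms, threshold_ms)
    print(f"threshold={threshold_ms} ms: measured={len(measured)}, "
          f"lost={lost}, max observed={max(measured)} ms")
```

With the shortest threshold in this made-up sample, all three outliers
are reported as loss and the largest observed delay stays at the bulk
value, which is the effect described above.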

> 
> > This is an important performance characteristic for latency-sensitive
> > applications that isn't captured by taking just 20 measurements. So I'd
> > really like to see a standard that calls for thousands of latency
> > measurements to capture this properly.
> >
> 
> I think we should keep practicality in mind here. If we follow
> RFC2544.latency measurement, the frame stream has to be 2 min long. 2000
> min ~ 33h  of testing for just one test sounds unreasonable to me. I
> would agree to have a lower bound for the sample size as RFC2544
> actually recommends (n > 20).
[ACM] 
Latency (delay) and delay variation need many single delay measurements
to be meaningful. One way to view the variation is to measure a single flow
of packets with the spacing an application might use, say 20 ms spacing
for VoIP. Collecting a few thousand such packets should not take long.
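
As a rough check of that claim, the arithmetic can be sketched in
Python. The function name collection_time_s and the packet counts are
illustrative assumptions; only the 20 ms spacing comes from the text
above. The estimate ignores the delay of the last packet and the
one-packet offset between N packets and N-1 gaps.

```python
def collection_time_s(num_packets, spacing_ms):
    """Approximate time to send num_packets at a fixed spacing, in seconds."""
    return num_packets * spacing_ms / 1000.0


# 20 ms spacing as in the VoIP example; the packet counts are illustrative.
for num_packets in (1000, 3000, 10000):
    seconds = collection_time_s(num_packets, 20.0)
    print(f"{num_packets} packets at 20 ms spacing: about {seconds} s")
```

Under these assumptions, a single flow yields thousands of individual
delay samples in a few minutes at most.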
 

> 
> > You can also get interesting insights into a black-box device by
> > looking at histograms/probability density functions. For example, you
> > can figure out if the device processes packets in batches, estimate
> > the batch size, figure out at which rates interrupt moderation
> > algorithms change etc. (This is, of course, not really a performance
> > metric, just an interesting insight.)
> >
> 
> I agree this is an interesting insight. It can also be the base for a
> decision between summarizing functions. However, in the light of
> consistency and simplicity of the methodology, I think we would need to
> recommend one function. We could do that depending on the metric/DUT
> characteristics, previous testing behavior …
[ACM] 
I agree the right summary statistics can only be chosen after an examination
of the raw distribution for a particular scenario. If the distribution is
bimodal, the central statistics of the sample could be meaningless. Without
this examination, I don't think one recommendation can always be right.
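
A small Python sketch of the bimodal case follows. The sample is
entirely hypothetical (700 packets at 0.1 ms, 300 at 10 ms), and the
helper name histogram and the 1 ms bin width are assumptions chosen for
illustration. It compares the mean and median with a coarse histogram
of the same data.

```python
import statistics
from collections import Counter


def histogram(samples_ms, bin_width_ms):
    """Count samples per bin; keys are the lower bin edges in ms."""
    counts = Counter(int(s // bin_width_ms) for s in samples_ms)
    return {b * bin_width_ms: n for b, n in sorted(counts.items())}


# Hypothetical bimodal sample: a fast mode and a slow mode.
samples_ms = [0.1] * 700 + [10.0] * 300

mean_ms = statistics.mean(samples_ms)
median_ms = statistics.median(samples_ms)
hist = histogram(samples_ms, 1.0)

print(f"mean   = {mean_ms:.2f} ms")
print(f"median = {median_ms:.2f} ms")
for edge_ms, count in hist.items():
    print(f"[{edge_ms:5.1f}, {edge_ms + 1.0:5.1f}) ms: {count}")
```

In this made-up sample the mean falls between the two modes, where no
packet was actually observed, and the median reports only the fast
mode. The histogram is what reveals that two modes exist.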

my 2 cents
Al
(as a participant)


> 
> >
> > Paul
> >
> > --
> > Paul Emmerich
> > Technical University of Munich (TUM)
> > Department of Informatics
> > Chair for Network Architectures and Services
> >
> > _______________________________________________
> > bmwg mailing list
> > bmwg@ietf.org
> > https://www.ietf.org/mailman/listinfo/bmwg
> 