Re: [bmwg] Mean vs Median

"MORTON, ALFRED C (AL)" <acmorton@att.com> Tue, 10 November 2015 06:23 UTC

From: "MORTON, ALFRED C (AL)" <acmorton@att.com>
To: Marius Georgescu <liviumarius-g@is.naist.jp>, "bmwg@ietf.org" <bmwg@ietf.org>
Date: Tue, 10 Nov 2015 01:22:09 -0500
Archived-At: <http://mailarchive.ietf.org/arch/msg/bmwg/1JCX2mYZV3M_OvgmMVsXrGCAtdg>
Subject: Re: [bmwg] Mean vs Median

Hi Marius,

On your first comment below, yes, we need to move beyond the 
RFC2544 latency of a single packet in new work (while keeping the 
intent of earlier work in mind).

On your second point, relaying Scott's comment about hard and fast rules:
I guess I would tend toward reporting the Median for any distribution
(unlike the mean, it is not skewed by outliers), but I still believe
that one statistic is not enough - a metric of variation is needed.
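
A minimal sketch of that point, assuming made-up latency values (they are
illustrative only, not measurements from any DUT) and a hypothetical
summarize() helper. It reports the median next to a measure of variation
(here the interquartile range plus min/max), and shows how two tail values
pull the mean far away from the median:

```python
import statistics

# Hypothetical latency samples in microseconds (illustrative values only):
# 18 samples near 100 us plus two large values from a heavy tail.
latencies_us = [98, 99, 100, 100, 101, 101, 102, 102, 103, 103,
                97, 99, 100, 101, 102, 100, 99, 101, 1000, 2000]


def summarize(samples):
    """Return central statistics together with measures of variation."""
    # statistics.quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "iqr": q3 - q1,  # interquartile range: spread of the middle 50%
        "min": min(samples),
        "max": max(samples),
    }


summary = summarize(latencies_us)
for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

With these sample values the mean ends up more than twice the median, while
the median and IQR still describe the bulk of the samples. The max is what
reveals the tail, which is why a single statistic is not enough.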

Al
(as participant)

> -----Original Message-----
> From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of Marius Georgescu
> Sent: Tuesday, November 10, 2015 12:25 AM
> To: bmwg@ietf.org
> Subject: Re: [bmwg] Mean vs Median
> 
> Hello Al,
> 
> Thank you very much for joining the discussion.
> Please find my comments inline.
> 
> > On Nov 10, 2015, at 10:40, MORTON, ALFRED C (AL) <acmorton@att.com> wrote:
> >
> > Hi Marius, Paul, and all who have contributed so far.
> >
> > a quick reply/differing opinion below.
> >
> >> -----Original Message-----
> >> From: bmwg [mailto:bmwg-bounces@ietf.org] On Behalf Of Marius
> >> Georgescu
> > ...
> >>> On Nov 10, 2015, at 02:40, Paul Emmerich <emmericp@net.in.tum.de> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 03.11.15 09:45, Stenio Fernandes wrote:
> >>>> a word of caution here... a number of phenomena in computer
> >>>> networks follows a heavy-tailed probability distribution function,
> >>>> which means that there is a non-negligible probability that a
> >>>> random variable will take huge values. these values might be
> >>>> erroneously considered as outliers.
> >>>
> >>> this is a really important point. I have benchmarked software where
> >>> the 99th percentile of the latency is twice the average/median and
> >>> the 99.9th percentile ten times the average/median.
> >>
> >> Can you give us more context (test setup; physical/virtualized
> >> tester/DUT; one tester/sender_receiver tester ... ) on these
> >> measurements?
> > [ACM]
> > My understanding (and I've seen some results, but I've had trouble
> > re-locating them) is that both outliers and bimodal distributions are
> > more common in the world of virtual DUTs than they were in the
> > physical/past. Not only does this affect analysis, but the threshold
> > waiting time for packet arrival must be chosen carefully to even
> > measure such outliers.
> >>
> >>> This is an important performance characteristic for
> >>> latency-sensitive applications that isn't captured by taking just 20
> >>> measurements. So I'd really like to see a standard that calls for
> >>> thousands of latency measurements to capture this properly.
> >>>
> >>
> >> I think we should keep practicality in mind here. If we follow the
> >> RFC2544 latency measurement, the frame stream has to be 2 min long.
> >> With 1000 repetitions, that is 2000 min ~ 33h of testing for just one
> >> test, which sounds unreasonable to me. I would agree to have a lower
> >> bound for the sample size as RFC2544 actually recommends (n > 20).
> > [ACM]
> > Latency (delay) and delay variation need many single delay
> > measurements to be meaningful. One way to view the variation is for a
> > single flow of packets with spacing that might come from an
> > application, say 20ms spacing for VoIP. Collecting a few thousand of
> > such packets should not take so long.
> [MG]
> I think there is one thing that needs clarification. The procedure in
> RFC2544 says:
> 
> “The stream SHOULD be at least 120 seconds in duration. An identifying
> tag SHOULD be included in one frame after 60 seconds with the type of
> tag being implementation dependent.”
> 
> I never got to ask Scott this, but would it make sense to tag more than
> one frame? If we tag all the frames in the stream, we would have
> (depending on the throughput) thousands of measurements in 60 seconds.
> 
> >
> >>
> >>> You can also get interesting insights into a black-box device by
> >>> looking at histograms/probability density functions. For example,
> >>> you can figure out if the device processes packets in batches,
> >>> estimate the batch size, figure out at which rates interrupt
> >>> moderation algorithms change etc. (This is, of course, not really a
> >>> performance metric, just an interesting insight.)
> >>>
> >>
> >> I agree this is an interesting insight. It can also be the base for a
> >> decision between summarizing functions. However, in the light of
> >> consistency and simplicity of the methodology, I think we would need
> >> to recommend one function. We could do that depending on the
> >> metric/DUT characteristics, previous testing behavior …
> > [ACM]
> > I agree the right summary statistics can only be chosen after an
> > examination of the raw distribution for a particular scenario. If
> > the distribution is bimodal, the central statistics of the sample
> > could be meaningless. Without this examination, I don't think one
> > recommendation can always be right.
> >
> [MG] I agree that analyzing the probability distribution is the best way
> to choose, but maybe not the most consistent procedure. I had a short
> discussion about this with Scott before the BMWG meeting. In my
> understanding of his feedback, the consistency in recommending a
> summarizing function is more important. In other words, if we leave room
> for interpreting the data, the results might be more misleading than if
> we choose the “wrong” summarizing function.
> 
> Best regards,
> Marius