Re: [bmwg] Mean vs Median

Marius Georgescu <liviumarius-g@is.naist.jp> Thu, 12 November 2015 08:30 UTC

Hello Stenio,

> On Nov 12, 2015, at 03:46, Stenio Fernandes <sflf@cin.ufpe.br> wrote:
> 
> Very interesting results, Paul. I'd like to have a look at the results in the journal paper.
> 
> In the past, statisticians struggled with a shortage of samples to do their stuff. This is not the case for the computer networking people as we are constantly flooded with samples. 
> 
> The discussions so far have led me to conclude that in the context here, there is no need to make any assumptions on the sample set. Any specific measure of centrality or dispersion might not be precise enough in some cases. As stated by others, mean/median would work for well-behaved (e.g., normally distributed) data, but would not work for multi-modal or heavy-tailed ones. Recall that heavy-tailed distributions are usually characterized by the shape and location parameters instead of mean and variance. Regarding the number of samples, it is really tough to characterize heavy-tailed or multi-modal distributions with a few samples, even using advanced algorithms for maximum-likelihood estimation. 

As stated in the email to Paul, I don’t think increasing the sample size would be a problem, as long as the (minimum) test time stays within reasonable limits. I agree that the stability of the dataset is important. However, even a more stable dataset would still need to be summarized and reported. If not the average (arithmetic mean) or the median, how can we meaningfully and consistently express a test result?

From Al’s and Paul’s examples/replies, as well as other discussions I had at IETF 94 about this, a “fast & hard” solution could be the median plus a measure of variance.
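
To make the "median plus a measure of variance" idea concrete, here is a minimal Python sketch. Which measure of variance to use is not settled in this thread; the interquartile range (IQR) and the median absolute deviation (MAD) are assumptions picked only for illustration. The function name `summarize` and the latency values are hypothetical as well.

```python
import statistics


def summarize(samples):
    """Return (median, iqr, mad) for a list of numeric samples.

    IQR and MAD are illustrative choices of spread measure,
    not something agreed on the list.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    med = statistics.median(samples)
    # Quartiles via linear interpolation over the observed range.
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    # Median of the absolute deviations from the median.
    mad = statistics.median([abs(x - med) for x in samples])
    return med, iqr, mad


# Hypothetical latency samples (microseconds) with one extreme outlier.
latencies = [10, 11, 12, 13, 14, 15, 16, 17, 500]

med, iqr, mad = summarize(latencies)
print("mean:  ", statistics.mean(latencies))
print("median:", med)
print("IQR:   ", iqr)
print("MAD:   ", mad)
```

With a single large outlier in the sample, the arithmetic mean is pulled far away from the bulk of the data, while the median and both spread measures stay close to where most samples lie. Whether that robustness is desirable for a given benchmark is a separate question.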

Marius