Re: [bmwg] Mean vs Median

"GEORGESCU LIVIU MARIUS" <liviumarius-g@is.naist.jp> Tue, 03 November 2015 16:41 UTC

From: GEORGESCU LIVIU MARIUS <liviumarius-g@is.naist.jp>
To: Stenio Fernandes <sflf@cin.ufpe.br>
Message-ID: <6a40a06c97e4.56396260@naist.jp>
Date: Wed, 04 Nov 2015 01:41:52 +0900
Archived-At: <http://mailarchive.ietf.org/arch/msg/bmwg/yTBC5fccXJp54HFuewRxS91LouI>
Cc: k.pentikousis@eict.de, bmwg@ietf.org
Subject: Re: [bmwg] Mean vs Median

> 
> 
> 
> 
> Hello Stenio,
>  
>  
>  
> Thanks for your comments. Please see my comments inline.
>  
>  
>  
> From: steniofernandes@gmail.com [mailto:steniofernandes@gmail.com] On Behalf Of Stenio Fernandes
> Sent: Tuesday, November 3, 2015 5:45 PM
> To: GEORGESCU LIVIU MARIUS <liviumarius-g@is.naist.jp>
> Cc: bmwg@ietf.org; k.pentikousis@eict.de
> Subject: Re: [bmwg] Mean vs Median
>  
>  
>  
> my two cents on this... see inline comments
>  
>  
>  
> this is my first interaction with the wg... so, bear
> with me if i'm a bit wordy :-)
>  
>  
>  
> stenio
>  
>  
>  
>  
>  
> Thanks for joining the discussion.
>  
>  
>  
> On Tue, Nov 3, 2015 at 3:57 AM, GEORGESCU LIVIU MARIUS <liviumarius-g@is.naist.jp> wrote:
>  
> Hello BMWG,
>  
>  
>  
> Following some of the discussion we had at IETF93 about
> using either mean or median as a summarizing function for the results of
> multiple test iterations, I added the following section to
> http://tools.ietf.org/html/draft-ietf-bmwg-ipv6-tran-tech-benchmarking-00:
>
>  10. Summarizing function and repeatability
>  
> 
>  
> 
>  To ensure the stability of the benchmarking scores obtained using
>  the tests presented in Sections 6-9, multiple test iterations are
>  recommended. Following the recommendations of RFC2544, the average
>  was chosen to be the summarizing function for the reported values.
>  While median can be an alternative summarizing function, a rationale
>  for using one or the other is needed.
>  
>  
>  
>  
>  
> average is a colloquial term. although, in this context,
> there might be nothing wrong with that, imho precise terms are preferred.
> measures of central tendency could be used as the general term, where mean and
> median fit in. 
>  
>  
>  
> Average seems to be a term accepted and used by industry, and
> not just a “colloquial” term. We could quibble about definitions and what
> we need to follow just as well. I prefer to go with RFC2544.
>  
>  
>  
>  The median can be useful for summarizing especially when outliers
>  are not a desired quantity. However, in the overall performance of a
>  network device the outliers can represent a malfunction or
>  misconfiguration in the DUT, which should be taken into account.
>  The average is a more inclusive summarizing function. Moreover, as
>  underlined in [DeNijs], the average is less exposed to statistical
>  uncertainty. These reasons make it the RECOMMENDED summarizing
>  function for the results of different test iterations, unless stated
>  otherwise.
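
To make the trade-off described in the draft text concrete, here is a minimal sketch of how the two summarizing functions react to a single bad iteration. The throughput values are invented for illustration and do not come from the draft or from any real DUT.

```python
from statistics import mean, median

# Hypothetical throughput results (frames per second) from 10 test iterations.
# The value 9200 is an artificial outlier, e.g. a misconfigured or
# malfunctioning DUT during one iteration.
iterations = [14880, 14875, 14882, 14879, 14881, 14878, 9200, 14880, 14877, 14883]

avg = mean(iterations)    # pulled down by the outlier
med = median(iterations)  # barely affected by the outlier

print(f"mean   = {avg:.1f}")
print(f"median = {med:.1f}")
```

With these made-up numbers the mean drops well below every normal iteration, so the malfunction shows up in the reported score, while the median stays within the cluster of normal results and hides it.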
>  
>  
>  
> i'm having a hard time understanding this paragraph... i)
> "inclusive" is very vague; ii) "less exposed to uncertainty"
> is confusing. the mean is just a measure of centrality, whereas measures of
> dispersion (e.g., sd, variance) can be used to assess the degree of
> uncertainty around that measure (think of confidence intervals).
>  
> i know this is not the objective of the document, but the
> recommendation could be very simple. for example, stating that one should
> assess the statistical significance of results would be enough, and then
> pointing to a strong reference. le boudec's book seems appropriate
> here (cf. chap 2).
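
The point about centrality versus dispersion can be sketched as follows: two hypothetical result sets share the same mean but differ widely in spread, so the mean alone says nothing about uncertainty. The sample values and the helper name `summarize` are assumptions made for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(samples):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(samples)
    m = mean(samples)
    s = stdev(samples)   # sample standard deviation (n - 1 denominator)
    se = s / sqrt(n)     # standard error of the mean
    return m, s, se

# Two invented sets of per-iteration results with identical means.
stable = [100, 101, 99, 100, 100]
noisy = [80, 120, 95, 105, 100]

m_stable, s_stable, se_stable = summarize(stable)
m_noisy, s_noisy, se_noisy = summarize(noisy)

print(f"stable: mean={m_stable}, sd={s_stable:.3f}, se={se_stable:.3f}")
print(f"noisy:  mean={m_noisy}, sd={s_noisy:.3f}, se={se_noisy:.3f}")
```

Both sets report the same central value, but the standard deviation and standard error differ by more than an order of magnitude, which is the information a confidence interval would carry.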
>  
>  
>  
> Just assessing statistical significance would be great. I would
> recommend the study of this book: “Jain, Raj. The art of computer systems
> performance analysis. John Wiley & Sons, 2008.”
>  
> I wonder whether text like this would ever make it into a published RFC.
>  
> I am sure this would be perfect for certain types of academic
> papers. I would rather have a clearer recommendation.
>  
>  
>  
>  To express the repeatability of the benchmarking tests through a
>  number, the Margin of error (MoE) can be used. Of course, other
>  functions, such as standard error could be employed as well. The
>  advantage the MoE has is expressing an associated confidence
>  interval by using the alpha parameter.
>
>  The recommended formula for calculating the MoE is presented in
>  Section 6.3.1.
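
As a rough illustration of how the alpha parameter ties the MoE to a confidence interval, the sketch below uses the common textbook form, a normal critical value times the standard error. This is an assumption and not necessarily the formula given in Section 6.3.1 of the draft. It also presumes independent, identically distributed iterations, and for a handful of iterations a Student t quantile is often used instead of the normal one. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def margin_of_error(samples, alpha=0.05):
    """Textbook MoE: z_(1 - alpha/2) * s / sqrt(n), normal approximation."""
    n = len(samples)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * stdev(samples) / sqrt(n)

# Invented latency results (microseconds) from 5 test iterations.
latencies_us = [100, 101, 99, 100, 100]

moe_95 = margin_of_error(latencies_us, alpha=0.05)  # 95% confidence level
moe_99 = margin_of_error(latencies_us, alpha=0.01)  # 99% confidence level

print(f"mean = {mean(latencies_us)} us, MoE(95%) = {moe_95:.3f}, MoE(99%) = {moe_99:.3f}")
```

A smaller alpha demands more confidence and therefore yields a wider margin for the same data.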
>  
>  
>  
> if the document will not give detailed approaches for
> summarizing performance data (and i think it shouldn't), it should provide the
> simplest possible recommendation. otherwise, in order to be scientifically
> correct, lots of assumptions must be made and stated, like iid, which might
> not hold true for all cases.
>  
>  
>  
> In order to be scientifically correct, the most important
> part would be to assess the probability distribution of the data. I am trying
> to find a solution where that wouldn’t be necessary. Of course it would
> not hold for all cases, but the goal is to find a Pareto-optimal solution where
> the summarized result would be representative enough for the test sample and
> simple enough to obtain. Or at least that’s how I see things.
>  
>  
>  
> Marius
>  
>  
>