Re: [codec] draft test and processing plan for the IETF Codec

Paul Coverdale <> Thu, 14 April 2011 12:34 UTC

From: Paul Coverdale <>
To: 'Gregory Maxwell' <>,
Date: Thu, 14 Apr 2011 08:34:11 -0400

>I'm surprised we haven't seen a more intense reaction to this
>proposal yet.  Perhaps people are missing the less-than-obvious
>mathematical reality of it.
>If you have 10 requirements all of which must be met, where each is
>90% likely to be met, the chance of meeting all of them is 34.8%
>(.9^10).  The chance of failure increases exponentially with
>the number of requirements.
>This amplification effect is one reason why I've opposed additional
>requirements, even though I was quite confident that Opus was better
>than the competition.  Add enough requirements and Opus is sure to fail
>due to _chance_ no matter how good the codec is, even if the
>requirements each sound reasonable individually.
>In this case we have 162 requirements proposed. 75 "better than" (BT),
>and 87 "not worse than" (NWT), once you expand out all the loss rates,
>rates, etc.
>Moreover, because of measurement noise, Opus could meet all of the
>requirements and yet still fail some of the tests. Because there are
>so many requirements, even a small chance of false failure becomes
>I did some rough numeric simulations with the tests proposed, using
>scores with a standard deviation of 1 (which is about what they were on
>the HA test), N = 144 as proposed, and Opus better than the
>comparison codec by 0.1. The chance of passing any single NWT
>requirement is then 0.9769, and the chance of passing any single BT
>requirement is 0.3802.
>The chance of passing all of them is
>0.9769^87 * 0.3802^75 = 4.1483e-33
>Which means about a 1 in 241 nonillion chance of passing all the tests,
>even assuming Opus actually met _all_ the stated requirements with a
>score +0.1 over the reference.
>This is so astronomically unlikely that I had to use an encyclopedia to
>find the name for the number.  I should have saved the time and just
>left it at "a farce".
>I urge the working group to keep this hazard in mind when considering
>the reasonableness of parallel MUST requirements on top of listening-


I don't think the situation is as dire as you make out. Your analysis
assumes that all requirements are completely independent. This is not the
case: in many instances, if you meet one requirement you are likely to meet
others of the same kind (e.g. performance as a function of bit rate).

But in any case, the statistical analysis procedure outlined in the test
plan doesn't assume that every requirement must be met with absolute
certainty; it allows for a confidence interval.
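
To illustrate how much the independence assumption matters, here is a
minimal sketch in Python. It takes the per-test pass probabilities and test
counts from the quoted message as given (they are not re-derived here) and
compares two cases: the product under full independence, and the upper
bound that holds under any dependence structure, namely the smallest
single-test pass probability. The function names are hypothetical and
chosen only for this illustration.

```python
import math

def pass_all_independent(probs):
    """Probability that every test passes if all outcomes are independent."""
    return math.prod(probs)

def pass_all_upper_bound(probs):
    """Upper bound on passing every test, whatever the dependence:
    the joint event cannot be more likely than its least likely member."""
    return min(probs)

# Figures taken from the quoted message, not recomputed here.
P_NWT, N_NWT = 0.9769, 87   # "not worse than" requirements
P_BT, N_BT = 0.3802, 75     # "better than" requirements

probs = [P_NWT] * N_NWT + [P_BT] * N_BT

independent = pass_all_independent(probs)
upper = pass_all_upper_bound(probs)

print(f"independent tests:        {independent:.4e}")
print(f"upper bound (any dependence): {upper:.4f}")
print(f"ten 90% requirements:     {pass_all_independent([0.9] * 10):.4f}")
```

The real figure lies somewhere at or below the upper bound, and where
exactly depends on how strongly the test outcomes are correlated, which
neither message quantifies.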