Re: [codec] draft test and processing plan for the IETF Codec

Dear Greg,

Thanks for raising this point. What you say is of course mathematically correct, and I am well aware of it.

I do not recall anyone suggesting that every requirement needs to be passed perfectly. How the results are analysed is open for discussion: if the codec fails a requirement, we need to understand why and whether the failure is systematic. A failure does not mean that the codec is rejected.

There are many cases in which a requirement formally fails but is numerically very close to passing. A requirement may pass in one language and fail (very narrowly) in another, yet pass when the results are combined. There are many examples in ITU-T where codecs have been selected and standardized while fulfilling 90% or 95% of the requirements. What matters in the end is the decision of the group, and the availability of data helps in reaching consensus.

Kind regards,
/Anisse

> -----Original Message-----
> From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On Behalf Of
> Gregory Maxwell
> Sent: Thursday, April 14, 2011 5:05 AM
> To: codec@ietf.org
> Subject: Re: [codec] draft test and processing plan for the IETF Codec
> 
> Anisse Taleb [anisse.taleb@huawei.com] wrote:
> > Hi,
> > Please find attached a first draft of a test plan of the IETF codec
> > (Opus).
> > The proposal does not claim to be complete, there are still many missing
> > things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider
> > it as a starting point for discussion where everyone is welcome to
> > contribute in a constructive manner. Further updates are planned,
> > but let's see first some initial comments.
> 
> I'm surprised we haven't seen a more intense reaction to this
> proposal yet.  Perhaps people are missing the less-than-obvious
> mathematical reality of it.
> 
> If you have 10 requirements all of which must be met, where each is
> 90% likely to be met, the chance of meeting all of them is 34.9%
> (.9^10).  The chance of failure increases exponentially with
> the number of requirements.
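
The compounding described in the quoted paragraph can be checked with a few lines of Python. This is only a sketch: it assumes the requirements are statistically independent and equally likely to be met, and the function names are made up for illustration.

```python
import math


def chance_of_meeting_all(p_each: float, n: int) -> float:
    """Chance that n independent requirements, each met with
    probability p_each, are all met at once (assumes independence)."""
    return p_each ** n


def requirements_until_below(p_each: float, target: float) -> int:
    """Smallest number of requirements for which the chance of meeting
    all of them drops below target (needs 0 < p_each < 1, 0 < target < 1)."""
    return math.floor(math.log(target) / math.log(p_each)) + 1


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 50):
        print(f"{n:3d} requirements: {chance_of_meeting_all(0.9, n):.4f}")
    print("requirements before a pass is less likely than a coin flip:",
          requirements_until_below(0.9, 0.5))
```

With each requirement 90% likely to be met, seven requirements are already enough to make an overall pass less likely than not.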
> 
> This amplification effect is one reason why I've opposed additional
> requirements, even though I was quite confident that Opus was better
> than the competition.  Add enough requirements and Opus is sure to fail
> due to _chance_ no matter how good the codec is, even if the
> requirements each sound reasonable individually.
> 
> In this case we have 162 requirements proposed. 75 "better than" (BT),
> and 87 "not worse than" (NWT), once you expand out all the loss rates, bit
> rates, etc.
> 
> Moreover, because of measurement noise, Opus could meet all of the
> requirements and yet still fail some of the tests. Because there are
> so many requirements, even a small chance of false failure becomes
> significant.
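
To put a rough number on the false-failure point, the sketch below computes the chance of at least one spurious failure across many tests. The per-test false-failure rates are hypothetical values chosen for illustration, not figures from the test plan, and the tests are assumed to be independent.

```python
def chance_of_any_false_failure(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n_tests independent tests fails purely
    because of measurement noise, when each has false-failure rate alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    N_TESTS = 162  # number of requirements quoted in the message
    for alpha in (0.001, 0.01, 0.05):  # hypothetical per-test rates
        p = chance_of_any_false_failure(alpha, N_TESTS)
        print(f"alpha = {alpha:5.3f}: "
              f"chance of at least one false failure = {p:.4f}")
```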
> 
> I did some rough numeric simulations with the tests proposed, using
> scores with a standard deviation of 1 (which is about what they were on
> the HA test), N = 144 as proposed, and Opus better than the
> comparison codec by 0.1. The chance of passing any single NWT
> requirement is then 0.9769, and the chance of passing any single BT
> requirement is 0.3802.
> 
> The chance of passing all of them is
> 0.9769^87 * 0.3802^75 = 4.1483e-33
> 
> Which means about a 1 in 241 nonillion chance of passing all the tests,
> even assuming Opus actually met _all_ the stated requirements with a
> score +0.1 over the reference.
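
The product in the quoted message can be reproduced from the per-test pass probabilities it gives. The sketch below takes those two probabilities as inputs rather than re-deriving them, since the exact statistical test behind the simulation is not stated. It assumes the 162 tests are independent and sums logarithms, which keeps the arithmetic stable for very small results.

```python
import math

# Counts and per-test pass probabilities as quoted in the message.
N_NWT, P_NWT = 87, 0.9769   # "not worse than" requirements
N_BT, P_BT = 75, 0.3802     # "better than" requirements


def chance_of_passing_everything(groups) -> float:
    """Chance of passing every test, given (count, pass_probability)
    pairs and assuming all tests are independent."""
    log_p = sum(n * math.log(p) for n, p in groups)
    return math.exp(log_p)


if __name__ == "__main__":
    p_all = chance_of_passing_everything([(N_NWT, P_NWT), (N_BT, P_BT)])
    print(f"chance of passing all {N_NWT + N_BT} tests: {p_all:.4e}")
    print(f"roughly 1 in {1 / p_all:.3e}")
```

Almost all of the shortfall comes from the 75 BT tests: at 0.3802 each, they dominate the product.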
> 
> This is so astronomically unlikely that I had to use an encyclopedia to
> find the name for the number.  I should have saved the time and just
> left it at "a farce".
> 
> I urge the working group to keep this hazard in mind when considering
> the reasonableness of parallel MUST requirements on top of the
> listening test.
> _______________________________________________
> codec mailing list
> codec@ietf.org
> https://www.ietf.org/mailman/listinfo/codec