Re: [codec] draft test and processing plan for the IETF Codec

Gregory Maxwell <gmaxwell@juniper.net> Thu, 14 April 2011 03:07 UTC

Anisse Taleb [anisse.taleb@huawei.com] wrote:
> Hi,
> Please find attached a first draft of a test plan of the IETF codec (Opus).
> The proposal does not claim to be complete, there are still many missing
> things, e.g. tandeming cases, tests with delay jitter, dtx etc. Consider
> it as a starting point for discussion where everyone is welcome to
> contribute in a constructive manner. Further updates are planned,
> but let's see first some initial comments.

I'm surprised we haven't seen a more intense reaction to this
proposal yet.  Perhaps people are missing the less-than-obvious
mathematical reality of it.

If you have 10 independent requirements, all of which must be met, and
each is 90% likely to be met, the chance of meeting all of them is 34.9%
(0.9^10). The chance of meeting every requirement falls off
exponentially with the number of requirements.
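The arithmetic above can be checked with a few lines of Python. This is a
minimal sketch that assumes the requirements are independent, so their
probabilities simply multiply; the function name p_all_pass is made up
for illustration.

```python
def p_all_pass(p_each: float, n: int) -> float:
    """Chance that n independent requirements, each met with probability p_each, are all met."""
    return p_each ** n

# Show how quickly the all-pass chance shrinks for a 90% per-requirement rate.
for n in (1, 5, 10, 20, 40):
    print(f"{n:3d} requirements: {p_all_pass(0.9, n):.4f}")
```

Each extra requirement multiplies the overall chance by another 0.9, which
is why the decline is exponential rather than linear.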

This amplification effect is one reason why I've opposed additional
requirements, even though I was quite confident that Opus was better
than the competition. Add enough requirements and Opus is sure to fail
due to _chance_, no matter how good the codec is, even if each
requirement sounds reasonable on its own.

In this case, once you expand out all the loss rates, bit rates, etc.,
the proposal contains 162 requirements: 75 "better than" (BT) and 87
"not worse than" (NWT).

Moreover, because of measurement noise, Opus could meet all of the
requirements and yet still fail some of the tests. Because there are
so many requirements, even a small per-test chance of a false failure
adds up to a significant overall risk.
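To make the false-failure point concrete, here is a small sketch. The 1%
per-test false-failure rate is a hypothetical value chosen only for
illustration (it is not taken from the plan), the 162 tests are the count
given above, and the tests are again assumed to be independent.

```python
def p_any_false_failure(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n_tests independent tests fails spuriously,
    when each test has a false-failure probability of alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Hypothetical 1% false-failure rate per test, applied to the 162 proposed requirements.
print(f"{p_any_false_failure(0.01, 162):.3f}")
```

Under that assumption a perfectly compliant codec would still fail at least
one test roughly four times out of five.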

I did some rough numerical simulations of the proposed tests, assuming
scores with a standard deviation of 1 (about what they were on the HA
test), N = 144 as proposed, and Opus scoring 0.1 better than the
comparison codec. The chance of passing any single NWT requirement is
then 0.9769, and the chance of passing any single BT requirement is
0.3802.
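A Monte Carlo simulation of this kind might look like the sketch below. The
decision rules are assumptions, not the plan's actual statistical procedure,
which the message does not spell out: BT passes if a one-sided z-test at the
5% level shows Opus significantly better, and NWT passes if Opus is not
significantly worse. Because of that, the estimated pass rates will
generally not reproduce the exact figures quoted above; only SD, N and
DELTA come from the message.

```python
import math
import random
import statistics

# From the message: score std. dev. 1, N = 144 listeners, Opus 0.1 better.
SD, N, DELTA = 1.0, 144, 0.1
Z_CRIT = 1.645  # assumed rule: one-sided z-test at the 5% level

def one_trial(rng: random.Random) -> tuple[bool, bool]:
    """Simulate one listening test; return (bt_pass, nwt_pass) under the assumed rules."""
    opus = [rng.gauss(DELTA, SD) for _ in range(N)]
    ref = [rng.gauss(0.0, SD) for _ in range(N)]
    diff = statistics.mean(opus) - statistics.mean(ref)
    # Standard error of the difference of two independent sample means.
    se = math.sqrt(statistics.variance(opus) / N + statistics.variance(ref) / N)
    z = diff / se
    bt_pass = z > Z_CRIT    # "better than": significantly above zero
    nwt_pass = z > -Z_CRIT  # "not worse than": not significantly below zero
    return bt_pass, nwt_pass

def estimate(trials: int = 5000, seed: int = 1) -> tuple[float, float]:
    """Estimate per-requirement pass probabilities over many simulated tests."""
    rng = random.Random(seed)
    bt = nwt = 0
    for _ in range(trials):
        b, w = one_trial(rng)
        bt += b
        nwt += w
    return bt / trials, nwt / trials

p_bt, p_nwt = estimate()
print(f"P(pass BT) ~ {p_bt:.3f}, P(pass NWT) ~ {p_nwt:.3f}")
```

Whatever rule is used, the qualitative result is the same: a codec that is
genuinely 0.1 better passes each BT test well under half the time.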

The chance of passing all of them is
0.9769^87 * 0.3802^75 = 4.1483e-33

That works out to about a 1 in 241 nonillion (2.4 x 10^32) chance of
passing all the tests, even assuming Opus actually met _all_ the stated
requirements with a score 0.1 above the reference.
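The final product can be reproduced directly from the per-requirement
figures quoted above. This sketch treats all 162 tests as independent,
which is the same assumption the multiplication makes.

```python
# Per-requirement pass probabilities and requirement counts, from the message.
P_NWT, N_NWT = 0.9769, 87
P_BT, N_BT = 0.3802, 75

# Independent tests: the chance of passing all of them is the product.
p_all = P_NWT ** N_NWT * P_BT ** N_BT
print(f"P(pass all) = {p_all:.4e}")
print(f"about 1 in {1 / p_all:.3e}")
```

The BT term dominates: 0.3802^75 alone is already around 10^-32.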

This is so astronomically unlikely that I had to use an encyclopedia to
find the name for the number.  I should have saved the time and just
left it at "a farce". 

I urge the working group to keep this hazard in mind when considering
the reasonableness of parallel MUST requirements on top of the
listening tests.