Re: [codec] A concrete proposal for requirements and testing

Gregory Maxwell <gmaxwell@juniper.net> Thu, 07 April 2011 20:58 UTC

From: Gregory Maxwell <gmaxwell@juniper.net>
To: Peter Saint-Andre <stpeter@stpeter.im>, Paul Coverdale <coverdale@sympatico.ca>
Date: Thu, 07 Apr 2011 13:57:56 -0700
Thread-Topic: [codec] A concrete proposal for requirements and testing
Thread-Index: Acv1UoktscUF1wI4TrSiMZPbcG/SAAAAfL9J
Message-ID: <BCB3F026FAC4C145A4A3330806FEFDA93BA8B6462E@EMBX01-HQ.jnpr.net>
References: <64212FE1AE068044AD567CCB214073F123A10234@MAIL2.octasic.com> <F5AD4C2E5FBF304ABAE7394E9979AF7C26BC47FA@LHREML503-MBX.china.huawei.com> <027A93CE4A670242BD91A44E37105AEF17ACA33C36@ESESSCMS0351.eemea.ericsson.se> <20110407125345.GA30415@audi.shelbyville.oz> <BANLkTimeDEPY8va6_MQVztn3YGyTZ2LmVw@mail.gmail.com> <20110407164817.GB30415@audi.shelbyville.oz> <BLU0-SMTP522E3F60CF41CCB8108C96D0A40@phx.gbl>, <4D9E0443.6040703@stpeter.im>
In-Reply-To: <4D9E0443.6040703@stpeter.im>
Cc: "codec@ietf.org" <codec@ietf.org>, 'Stephen Botzko' <stephen.botzko@gmail.com>
Subject: Re: [codec] A concrete proposal for requirements and testing

Peter Saint-Andre [stpeter@stpeter.im] wrote:
> I completely agree. That's why it is so important for us to publish
> draft-ietf-codec-opus as a Proposed Standard.
> The IETF tradition is one of "rough consensus and running code". What
> this means is that we work hard on a new technology and publish it as an
> RFC ("request for comments"). This is the rough consensus part.
> Then we implement it and deploy it. Testing is something that happens
> naturally when we run a technology on networks. This is the "running
> code" part.
> Implementation and deployment experience can lead to revisions and
> refinements and better specification of the technology. Of course we'll
> also seek rough consensus on any modifications, but it's not necessary,
> or even desirable, to get that implementation and deployment experience
> before we publish a specification as a Proposed Standard RFC.
[snip]
> Let's get to rough consensus and then start running this code on the
> network. That will be the true test.

Peter's message succinctly describes much of what I've been thinking but have had a difficult time expressing clearly.
Thank you.

High-effort post-hoc characterization is very useful for "marketing" and somewhat useful for "best practices", but it doesn't appear to be very useful for driving a collaborative development process forward.  We've seen this already: by the time the developers first saw Broadcom's listening test results, we'd already significantly improved the codec in at least one of the two cases where their test showed lower performance, mostly based on feedback from people on hydrogenaudio who tried out the codec and reported weaknesses.  Likewise, we've received considerable computational complexity and memory usage feedback from people trying to implement the codec on _real_ systems and encountering _real_ difficulties, and I think that this is far more useful than anything we could hope to achieve from a prolonged debate about abstract WMOPS figures.
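
(To make that concrete: the kind of measurement that has actually moved things forward is about as simple as it gets, a wall-clock decode of a test clip on the target hardware. A rough sketch below; the decoder binary, its arguments, and the clip duration are placeholders I've made up, not references to any actual test tool:)

    # Rough real-time-factor measurement for a command-line decoder.
    # The binary name, arguments, and clip length are placeholders;
    # substitute whatever development decoder and test clip you have.
    import subprocess, time

    DECODER = "./test_decoder"       # hypothetical decoder binary
    BITSTREAM = "testvector01.bit"   # hypothetical input bitstream
    OUTPUT = "decoded.pcm"
    CLIP_SECONDS = 10.0              # known duration of the test clip

    start = time.perf_counter()
    subprocess.run([DECODER, BITSTREAM, OUTPUT], check=True)
    elapsed = time.perf_counter() - start

    # Real-time factor: seconds of wall time spent per second of audio.
    print("decode wall time: %.3f s" % elapsed)
    print("real-time factor: %.3f" % (elapsed / CLIP_SECONDS))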

In order to expand the testing further we need to move beyond testing only parts of the codec, and we need to start creating some interoperability rather than having everyone use different, incompatible development bitstreams.
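
(As a crude illustration of the sort of interoperability check I mean once there is a common bitstream: decode the same bitstream with two implementations and compare the raw PCM. A sketch follows; the file names and the use of a simple SNR figure are mine and purely illustrative, not part of any agreed test plan:)

    # Compare two decoders' 16-bit PCM output for the same bitstream.
    # File names are placeholders; the SNR metric is only illustrative.
    import numpy as np

    def read_pcm16(path):
        return np.fromfile(path, dtype=np.int16).astype(np.float64)

    ref = read_pcm16("reference_decode.pcm")  # e.g. reference decoder output
    alt = read_pcm16("other_decode.pcm")      # e.g. an independent implementation

    n = min(len(ref), len(alt))
    ref, alt = ref[:n], alt[:n]

    err = ref - alt
    snr_db = 10 * np.log10(np.sum(ref**2) / max(np.sum(err**2), 1e-12))
    print("SNR between decoders: %.1f dB over %d samples" % (snr_db, n))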

Anisse Taleb [anisse.taleb@huawei.com] wrote:
> Dear Ron,
> There are many individuals and organizations that may be willing to help the testing effort on a voluntary basis provided there is a test plan that is agreed and represents a consensus of the group of what is to be tested.
> The results we have are incomplete, there are many cases which need to be tested and we are trying to build consensus over those and derive a test plan that can be agreed and be conducted by, hopefully, more than one organization.
> Undermining this WG testing effort doesn't encourage organizations to chip in their resources and technical expertise to make this work go smoothly and according to plan.

Many people here are not especially picky about what testing is performed by other people.  Personally, I invite all comers to perform whatever testing they find interesting and helpful for their own purposes.  I'm sure the working group—as a whole—will find almost any results interesting.  Different groups have different requirements.  For example, some people are interested in Opus' performance in non-interactive applications. These are outside the scope of the working group, but the results are still interesting.  Other people care about low bitrate performance, which I think is of less interest to many people here, or stereo support, which is interesting to a different and mostly non-overlapping set of people.  I expect that many groups will do the characterization they find most applicable to their interests, and they'll report back to the working group.  I do not see why much consensus is required in advance of this kind of testing—except to ensure basic scientific method (e.g. a non-blind test would likely be uninteresting to everyone), which we would probably find we already agree on without even having a discussion.
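
(To make the "basic scientific method" point concrete: blinding takes almost no machinery. Below is a rough sketch of randomized ABX trial assignment; the sample names, trial count, and file layout are invented for illustration and don't describe any actual test plan:)

    # Build a blinded ABX trial list: for each trial, pick a random A/B order
    # and secretly assign X to one of them.  The listener (or playback tool)
    # sees only the labels A, B, X; the key mapping X back to a codec is
    # written separately and consulted only when scoring.
    import json, random

    SAMPLES = {"codec1": "codec1_clip.wav", "codec2": "codec2_clip.wav"}  # placeholders
    TRIALS = 16

    trials, key = [], []
    for t in range(TRIALS):
        a, b = random.sample(sorted(SAMPLES), 2)  # random presentation order
        x = random.choice([a, b])                 # hidden identity of X
        trials.append({"trial": t, "A": SAMPLES[a], "B": SAMPLES[b], "X": SAMPLES[x]})
        key.append({"trial": t, "X_is": x})

    with open("abx_trials.json", "w") as f:
        json.dump(trials, f, indent=2)  # given to the playback tool / administrator
    with open("abx_key.json", "w") as f:
        json.dump(key, f, indent=2)     # kept hidden until scoring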

What I find more concerning is this late-in-the-process effort to raise the _requirements_: from requiring the WG product to be better than the nearest alternatives under the WG charter (royalty-free-ish things), to requiring it to be better than "best of breed" royalty-bearing codecs in each of several broadly defined operating areas.  I believe this change is contrary to the intended purpose of the working group: the charter acknowledges several times the existence of alternative standards which do not meet the intended licensing goals, but it never proposes that our output must have higher listening test performance than those codecs. Accepting this requirement risks the paradoxical outcome of being unable to finish a codec even if we have one which meets the explicit purpose for which the WG was created. Not being able to _guarantee_ a result, as a reality of the law, is not the same as willfully failing to achieve that result— so I don't find the argument that we may fail compelling.  The charter is abundantly clear about the purpose of the working group, and I don't think it makes any sense to switch from optimizing for the case where the WG's purpose succeeds to optimizing for the case where the working group has failed, especially when such an optimization would increase the chance of failure!

In particular, I think the initial requirements were meant to establish a floor below which collaborative development wouldn't work. If we put out something which underperformed Speex, for example, few people would deploy it and we'd gain little operational feedback from which we could further improve the codec and/or implementations. But at least beating Speex would ensure rapid deployment from the segment that is already using Speex and provide the operational feedback required to collaboratively advance the codec, as we've already seen from some users who were using Speex and who could tolerate a non-frozen, non-interoperable codec.

By conflating two issues—characterization targets and requirements—we've taken what should be two fairly uncontroversial issues and turned them into a contentious one.  The testing should be uncontroversial because, absent a requirement gate, the working group will probably support whatever testing the participants find interesting and are able to perform, and the requirements issue should be uncontroversial because there are relatively few codecs which come anywhere near the WG's stated purpose.