Re: [codec] A concrete proposal for requirements and testing

Ron <ron@debian.org> Thu, 07 April 2011 12:52 UTC

Hi,

I must say at the outset that I'm greatly encouraged by the renewed
vigour and interest we've seen here in the last few weeks.  Though much
work has gone on in the background between the active developers, we
haven't seen such a flurry of interested contributions since, well,
since many of these same people first tried to veto the formation of
this group because they didn't believe we could do it.  :>

I'm grateful to them for sticking with us, and for helping us to sanity
check that we really have done it, beyond the wildest expectations of
both the 'for' and 'against' crowds, it would seem so far.

Good engineering is about finding faults that can be fixed.  If nothing
else binds us to a common purpose, I think we can all agree on that.
And faults are faults, whether pointed out by friend or foe, so to speak.


What I'd really like to point out at this stage, though, is that there
are some very fundamental differences between the way a collaborative
group such as this one tends to work and the way a competitive group,
of the sort some other SDOs seem to prefer, does.

When you have a relatively closed organisation, composed of companies
that would each like a monopoly hold over some technology, and which
aren't particularly interested in sharing their secrets with others
except when it is to their direct advantage, then it does seem logical
to lay down some rules in advance, let each group work in isolation,
and then have a shootout at the end to see who "wins".

What we have here, though, is quite different to that, both in
principle and, as we've seen, in practice.  Instead of taking a
dog-eat-dog approach, all of the people with real technology to
contribute have banded together to create a single solution which is
better than anything they initially had to offer on their own.

Even the testing that has occurred to date has been of quite a
different flavour to what many of the currently active voices here are
probably used to.  In a competitive process, each group will naturally
advocate the tests at which their particular technology is known (to
them) to be superior.

Much of the testing I've observed for Opus, however, has been quite the
opposite.  The developers have actively sought out the tests at which
the codec sucks the worst, and compared its results against codecs
which outperform it (or should, given their relative constraints), in
order to find the things with the greatest scope for improvement.


We've already seen a number of published tests.  And we've in turn seen
some people challenge the validity of those tests.  What we haven't
seen so far are any tests which demonstrate the validity of those
challenges.

This is an open process.  Anyone is free to test anything they wish and
to provide feedback to the group on their findings.  So far, the people
who have done tests have indicated that they are satisfied with the
results they have seen, and have neither suggestions for things that
need further improvement, nor plans for further tests of their own,
beyond what they have already shared.


Roni hinted at the Prague meeting that certain "internal testing",
which he was aware of, had taken place.  Since we haven't seen the
results of those tests, I can only assume they paint us in as
favourable a light as the disclosed tests have.

Stephan intimated that he was "not trying to obstruct this process,
anymore", which makes me wonder whether he too has seen tests that
prove there is something valuable in this for his company (but I won't
speculate, beyond inviting him to explain for himself the reason(s)
for this change of heart :)

On Wed, Apr 06, 2011 at 10:39:20AM +0200, Erik Norvell wrote:
> The tests presented so far serve well in aiding the development work, but
> they are not mature enough to support general conclusions on the Opus
> performance. I think the examples from Paul are a good starting point for
> specifying the tests.

I think that's a fair statement to make, but if Erik and Paul have
doubts they want allayed, then I don't think it's really within the
power of any of us to lead them to their own conclusions, except to
invite them to present their own testing, of a kind they do find
satisfactory, and have it peer reviewed here like the rest of the
tests to date.


So I'd like to suggest something like the following:

Let's give people a week or so to formally propose the test plans which
*they* intend to conduct, along with a time by which they plan to have
them completed.  Others can give input as to how the tests might be
improved.  Then bring the results back to the group for an open
discussion of their relevance to assessing the codec.

Based on those results, we can then assess whether the codec needs
further work or is sufficient to release as is, for wider evaluation
in the real-world roles that we're all waiting to deploy it in.

I think getting bogged down in devising an intractable number of tests
that nobody here is actually proposing to perform isn't advancing the
goals of the WG.  Nobody doubts we'll beat G.711, so proposing to test
against it is pointless.  Likewise, testing against other codecs that
*cannot* fulfil the requirements the WG initially set out may be good
for bragging rights, but it offers little or nothing in the way of a
meaningful and useful comparison.

The kind of tests and minimum performance requirements that we
envisaged at the outset did not foresee that we'd be outperforming
HE-AAC in an unconstrained listening test.  We certainly weren't when
the WG was first formed.  In the light of that result, and others like
it, it doesn't seem unreasonable at all to avoid wasting time testing
against lesser codecs from a former state of the art.  But if someone
wants to do those tests themselves and report on the results, then by
all means, please do.  Just don't expect us to wait for you if it
takes a really long time.


So please, let's avoid the shootout mentality.  We have a candidate
codec.  Go test it.  If you find something wrong, then please tell us
while there is still time to fix it.  If you don't, let's move on and
ship it.  The real testing won't actually begin until we take that
next step into the real world, and no contrived test can model that
in a way that will satisfy everyone.  So throw your worst at it now,
because if you can't break it, then claims that the LC is premature
will be rather hard to sustain for very much longer.


If you made it this far,
Thanks :)

Ron