Re: [codec] A concrete proposal for requirements and testing

Stephen Botzko <stephen.botzko@gmail.com> Thu, 07 April 2011 15:44 UTC

From: Stephen Botzko <stephen.botzko@gmail.com>
To: Ron <ron@debian.org>
Cc: codec@ietf.org
Date: Thu, 07 Apr 2011 11:46:05 -0400
Subject: Re: [codec] A concrete proposal for requirements and testing

Hi Ron

This is not a debugging task; it is a codec characterization and quality
assessment that will be done on the finished product.

In my view, the proper way to go about it is to first get consensus on the
tests that need to be run, what results are needed, and how the tests need
to be conducted in order to meaningfully compare the results.  Then get
folks signed up to actually do the tests.

This is not about collaboration vs. competition; rather, it is about running
distributed tests whose results can be integrated and scientifically
analyzed and compared, and about recording the test methods so that other
people can re-run the tests later and duplicate the results.
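
To make the "recording" part concrete, here is a minimal sketch of the kind
of record I have in mind (Python; the field names, lab names, and scores are
hypothetical illustrations, not a format the WG has agreed on). Each lab
would publish its methodology alongside its scores, so that only results
gathered under the same methodology and condition get pooled:

import math

# Hypothetical per-lab test records -- illustrative only, not a WG format.
# Each record carries enough methodology detail that someone else could
# re-run the test, and that results from different labs can be pooled.
test_records = [
    {
        "lab": "lab-A",                  # hypothetical lab name
        "method": "ITU-T P.800 ACR",     # listening-test methodology used
        "codec": "opus",
        "bitrate_kbps": 20,
        "sample_rate_hz": 16000,
        "listeners": 24,
        "mos_scores": [3.9, 4.1, 4.0, 3.8, 4.2],  # made-up per-item means
    },
    {
        "lab": "lab-B",
        "method": "ITU-T P.800 ACR",
        "codec": "opus",
        "bitrate_kbps": 20,
        "sample_rate_hz": 16000,
        "listeners": 32,
        "mos_scores": [4.0, 3.9, 4.1, 4.0, 3.7],
    },
]

def pooled_mean_ci(records, z=1.96):
    """Pool per-item MOS means across labs; return mean and ~95% CI.

    Only records using the same methodology and test condition should
    be pooled together; the caller is responsible for filtering.
    """
    scores = [s for r in records for s in r["mos_scores"]]
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean, (mean - half, mean + half)

same_condition = [r for r in test_records
                  if r["method"] == "ITU-T P.800 ACR"
                  and r["bitrate_kbps"] == 20]
mean, (lo, hi) = pooled_mean_ci(same_condition)
print("pooled MOS: %.2f (95%% CI: %.2f .. %.2f)" % (mean, lo, hi))

The statistics here are deliberately naive; the point is only that when the
methodology travels with the data, anyone can check the pooling, re-run the
analysis, or repeat the test itself.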

Regards,
Stephen Botzko

On Thu, Apr 7, 2011 at 8:53 AM, Ron <ron@debian.org> wrote:

>
> Hi,
>
> So I must say at the outset that I'm greatly encouraged by the renewed
> vigour and interest we've seen here in the last few weeks.  Though much
> work has gone on in the background between the active developers, we
> haven't really seen such a flurry of interested contributions, since,
> well, since many of these same people first tried to veto the formation
> of this group because they didn't believe that we could do it.  :>
>
> I'm grateful to them for sticking with us, and for helping us sanity-check
> that we really have; beyond the wildest expectations of both
> the 'for' and 'against' crowds, it would seem, so far.
>
> Good engineering is about finding faults that can be fixed.  If nothing
> else binds us to a common purpose, I think we can all agree on that.
> And faults are faults, whether pointed out by friend or foe, so to speak.
>
>
> What I'd really like to point out at this stage, though, is that there are
> some very fundamental differences between the way that a collaborative
> group such as this one tends to work, as compared to a competitive group
> of the kind some other SDOs seem to prefer.
>
> When you have a relatively closed organisation, made up of companies
> that would each like to have a monopoly hold over some technology, and
> which aren't particularly interested in sharing their secrets with others
> except when it is to their direct advantage over the others, then it does
> seem logical to lay down some rules in advance, let each group work in
> isolation, and then have a shootout at the end to see who "wins".
>
> What we have here, though, is quite different to that, both in principle
> and, as we've seen, in practice.  Instead of taking a dog-eat-dog
> approach, all of the people with real technology to contribute have
> instead banded together to create a single solution which is better
> than what any of them initially had to offer on their own.
>
> Even the testing that has occurred to date has been of quite a different
> flavour to what many of the currently active voices here are probably
> used to.  When you have a competitive process, each group is naturally
> going to advocate the tests at which their particular technology
> is known (by them) to be superior.
>
> Much of the testing I've observed for Opus, however, has been quite the
> opposite.  The developers have actively sought out the tests at which
> the codec sucks the worst, and compared its results to those of codecs
> which outperform it (or should do, given their relative constraints),
> in order to find the things with the greatest scope for improvement.
>
>
> We've already seen a number of published tests.  And we've in turn seen
> some people challenge the validity of those tests.  What we haven't seen
> so far are any tests which prove the validity of those challenges.
>
> This is an open process.  Anyone is free to test anything they wish and
> provide feedback to the group on their findings.  So far the people
> who have done tests seem to have indicated that they are satisfied with
> the results they have seen, and have neither suggestions for things that
> need to improve further nor plans for further tests of their own that
> they haven't already shared.
>
>
> Roni hinted at the Prague meeting that he was aware of certain "internal
> testing" that had taken place.  Since we haven't seen the results of
> those tests, I can only assume they paint us in as favourable a light
> as the disclosed tests have.
>
> Stephan intimated that he was "not trying to obstruct this process,
> anymore", which makes me wonder whether he too has seen tests that prove
> there is something valuable in this for his company (but I won't
> speculate, beyond inviting him to explain for himself the reason(s)
> for this change of heart :)
>
> On Wed, Apr 06, 2011 at 10:39:20AM +0200, Erik Norvell wrote:
> > The tests presented so far serve well in aiding the development work, but
> > they are not mature enough to support general conclusions on the Opus
> > performance. I think the examples from Paul are a good starting point for
> > specifying the tests.
>
> I think that's a fair statement to make, but if Erik and Paul have
> doubts they want allayed, then I don't think it's really within the
> power of any of us to help them to their own conclusions, except to
> invite them to present their own testing that they do find satisfactory,
> and to have it peer-reviewed here like the rest of the tests to date.
>
>
> So I'd like to suggest something like the following:
>
> Let's give people a week or so to formally propose the test plans which
> *they* intend to conduct, along with a time when they plan to have them
> completed.  Others can give input as to how the tests may be improved.
> Then bring those results back to the group for an open discussion of
> their relevance to assessing the codec.
>
> Based on those results, we can then assess whether the codec needs further
> work, or whether it is sufficient to release as is for wider evaluation in
> the real-world roles that we're all waiting to deploy it in.
>
> I think getting bogged down in devising an intractable number of tests
> that nobody here is actually proposing to perform isn't advancing the
> goals of the WG.  Nobody doubts we'll beat G.711, so proposing to test
> against it is pointless.  Likewise, testing against other codecs that
> *cannot* fill the requirements the WG initially set out may be good
> for bragging rights, but offers little or nothing in the way of a
> meaningful and useful comparison.
>
> The kind of tests and minimum performance requirements that we had
> envisaged at the outset did not foresee that we'd be outperforming
> HE-AAC in an unconstrained listening test.  We certainly weren't when
> the WG was first formed.  In the light of that result, and others like
> it, it doesn't seem unreasonable at all not to waste time testing it
> against lesser codecs from a former state of the art.  But if someone
> wants to do those tests themselves and report on the results, then by
> all means, please do.  Just don't expect us to wait for you if you
> take a really long time to do that.
>
>
> So please, let's avoid the shootout mentality.  We have a candidate
> codec.  Go test it.  If you find something wrong, then please tell us
> while there is still time to fix it.  If you don't, let's move on and
> ship it.  The real testing won't actually begin until we take that
> next step into the real world, and no contrived test can model that
> in a way that will satisfy everyone.  So throw your worst at it now,
> because if you can't break it, then claims that the LC is premature
> will be rather hard to sustain for very much longer.
>
>
> If you made it this far,
> Thanks :)
>
> Ron