Re: [codec] A concrete proposal for requirements and testing

Koen Vos <koen.vos@skype.net> Fri, 08 April 2011 00:56 UTC

Date: Fri, 08 Apr 2011 02:57:39 +0200
From: Koen Vos <koen.vos@skype.net>
To: Roman Shpount <roman@telurix.com>
Message-ID: <68265148.2587531.1302224259820.JavaMail.root@lu2-zimbra>
In-Reply-To: <BANLkTinTZUyRBYUQq7igHB74r8cXsUMPCg@mail.gmail.com>
Cc: codec@ietf.org, Stephen Botzko <stephen.botzko@gmail.com>
Subject: Re: [codec] A concrete proposal for requirements and testing

Roman: 

How could what Stephen Botzko describes as "a codec characterization/quality assessment that will be done on the finished product" possibly improve coding efficiency? 

best, 
koen. 


----- Original Message -----
From: "Roman Shpount" <roman@telurix.com> 
To: "Koen Vos" <koen.vos@skype.net> 
Cc: "Stephen Botzko" <stephen.botzko@gmail.com>, codec@ietf.org 
Sent: Thursday, April 7, 2011 5:50:06 PM 
Subject: Re: [codec] A concrete proposal for requirements and testing 

Quoting Nokia's paper: "Codecs done without thorough standardization effort like Speex and iLBC offer significantly reduced efficiency, probably due to much lesser optimization, listening tests and IPR free design." Testing is an important part of the development and standardization process for a CODEC. 

It is in everybody's interest to produce the best CODEC possible. Comments that "we listen to it and we like it" and "you are free to run your own tests", even though valid, are not very productive. I think the best course possible is to treat testing the same way as we do development. We need to collaborate in putting together a comprehensive test plan, and then in the actual testing. I am not sure I can contribute a lot to creating a test plan, but I can definitely contribute to testing if such a test plan exists. I think this is a common situation for a lot of people in this group. 
_____________ 
Roman Shpount 



On Thu, Apr 7, 2011 at 8:24 PM, Koen Vos <koen.vos@skype.net> wrote: 

Stephen Botzko wrote: 
> This is not a debugging task, it is a codec characterization/quality assessment that will be done on the finished product. 

Does such testing have a realistic chance of revealing something that should alter the course of the WG? 
If not, then the testing can be done separately. 
If yes: how precisely? 

Is the concern that the codec may not be good enough to be worth publishing, given the other codecs already out there? 
--> In the tests so far, Opus has consistently outperformed iLBC, Speex, G.722.1, etc. To me it seems unrealistic to expect that pattern to reverse with more or "better" testing. 

Or is the concern that for certain input signals or network conditions the codec produces results that are somehow unacceptable? 
--> Opus has gone through all kinds of testing during development. Feel free to do some more; you don't need test plans or consensus for that. Remember that there will always be untested cases (speech with a dog barking in the background over dial-up hasn't been tried yet, I believe), and that's ok. 
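
For example, an informal batch of round-trip tests is easy to script. The sketch below (Python) assumes a command-line test program from the reference implementation (opus_demo in current source trees) whose default mode encodes and decodes 16-bit raw PCM in one pass; the tool name, argument order, option names, and file names are illustrative assumptions, so check the usage message printed by your own build. The decoded files are meant for blind listening, not for drawing formal conclusions.

# Informal round-trip sweep over bitrate and simulated loss (a sketch;
# the opus_demo invocation is an assumption, verify it against the
# usage message printed by your build of the test program).
import itertools
import subprocess
from pathlib import Path

OPUS_DEMO = "./opus_demo"              # path to the test binary (assumption)
INPUT = Path("speech_48k_mono.pcm")    # 16-bit, 48 kHz, mono raw PCM item (assumption)
OUTDIR = Path("decoded")
OUTDIR.mkdir(exist_ok=True)

bitrates = [8000, 16000, 32000, 64000]   # bits per second
loss_rates = [0, 5, 20]                  # simulated packet loss, percent

for bitrate, loss in itertools.product(bitrates, loss_rates):
    out = OUTDIR / f"opus_{bitrate}bps_{loss}pctloss.pcm"
    cmd = [OPUS_DEMO, "voip", "48000", "1", str(bitrate),
           "-loss", str(loss),            # option name assumed; check usage
           str(INPUT), str(out)]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)       # decoded PCM lands in OUTDIR

# Listen to the files in OUTDIR (ideally blind, in random order) and report
# any condition that sounds broken or unacceptable back to the list.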

Characterization may be useful, but I don't see why it should be a deliverable of the WG. 

best, 
koen. 



From: "Stephen Botzko" < stephen.botzko@gmail.com > 
To: "Ron" < ron@debian.org > 
Cc: codec@ietf.org 
Sent: Thursday, April 7, 2011 8:46:05 AM 

Subject: Re: [codec] A concrete proposal for requirements and testing 




Hi Ron 

This is not a debugging task, it is a codec characterization/quality assessment that will be done on the finished product. 

In my view, the proper way to go about it is to first get consensus on the tests that need to be run, what results are needed, and how the tests need to be conducted in order to meaningfully compare the results. Then get folks signed up to actually do the tests. 

This is not about collaboration vs competition; rather it is about running distributed tests with results that can be integrated and scientifically analyzed/compared, and about recording the test methods so that other people can re-run the tests and duplicate the results in the future. 
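
As one concrete illustration of "results that can be integrated", the Python sketch below pools per-site listening scores into per-condition means with a rough 95% interval. The CSV layout, column names, and the normal-approximation interval are assumptions for illustration only, not a proposed WG format; a real analysis would also account for per-site and per-listener effects.

# Pool listening-test scores collected at several sites (hypothetical CSV):
#   site,codec,condition,listener,score
#   labA,opus,32kbps_clean,l01,4.2
import csv
import math
from collections import defaultdict
from statistics import mean, stdev

def pool_scores(path):
    """Print mean score and an approximate 95% CI per (codec, condition)."""
    scores = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            scores[(row["codec"], row["condition"])].append(float(row["score"]))

    for (codec, condition), vals in sorted(scores.items()):
        m = mean(vals)
        # Normal approximation across all listeners at all sites.
        ci = 1.96 * stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else float("nan")
        print("%-8s %-16s n=%3d mean=%.2f +/- %.2f" % (codec, condition, len(vals), m, ci))

if __name__ == "__main__":
    pool_scores("pooled_results.csv")    # hypothetical file name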

Regards, 
Stephen Botzko 


On Thu, Apr 7, 2011 at 8:53 AM, Ron <ron@debian.org> wrote: 

Hi, 

So I must say at the outset, that I'm greatly encouraged by the renewed 
vigour and interest we've seen here in the last few weeks. Though much 
work has gone on in the background between the active developers, we 
haven't really seen such a flurry of interested contributions, since, 
well, since many of these same people first tried to veto the formation 
of this group because they didn't believe that we could do it. :> 

I'm grateful to them for sticking with us, and for helping us to sanity-check 
that we really have, beyond the wildest expectations of both the 'for' and 
'against' crowds, it would seem, so far. 

Good engineering is about finding faults that can be fixed. If nothing else 
binds us to a common purpose, I think we can all agree on that. 
And faults are faults, whether pointed out by friend or foe, so to speak. 


What I'd really like to point out at this stage though, is that there are 
some very fundamental differences between the way that a collaborative 
group such as this one tends to work, as compared to a competitive group 
such as some other SDOs seem to prefer. 

When you have a relatively closed organisation, comprised of companies 
that each would like to have a monopoly hold over some technology, and 
which aren't particularly interested in sharing their secrets with others 
except when it is to their direct advantage over them, then it does seem 
logical to lay down some rules in advance, let each group work in isolation 
and then have a shootout at the end to see who "wins". 

What we have here though, is both in-principle, and as we've seen, in 
practice, quite different to that. Instead of taking a dog-eat-dog 
approach, all of the people with real technology to contribute have 
instead banded together to create a single solution which is better 
than what any of them initially had to offer on their own. 

Even the testing that has occurred to date has been of quite a different 
flavour to what many of the currently active voices here would probably 
be used to. When you have a competitive process, each group is naturally 
going to try to advocate the tests at which their particular technology 
is known (by them) to be superior. 

Much of the testing I've observed for Opus, however, has been quite the 
opposite. The developers have actively sought out the tests the codec 
sucks the worst at, and compared its results against codecs which 
outperform it (or should, given their relative constraints), 
in order to find the things with the greatest scope for improvement. 


We've already seen a number of published tests. And we've in turn seen 
some people challenge the validity of those tests. What we haven't seen 
so far is any tests which prove the validity of those challenges. 

This is an open process. Anyone is free to test anything they wish and 
provide feedback to the group as to their findings. So far the people 
who have done tests seem to have indicated that they are satisfied with 
the results they have seen, and have neither suggestions for things that 
need to improve further, nor plans for further tests of their own, that 
they haven't already shared. 


Roni hinted at the Prague meeting that certain "internal testing" had 
taken place which he was aware of. That we haven't seen the results of 
those tests I can only assume means they paint us in as favourable a 
light as the disclosed tests have. 

Stephan intimated that he was "not trying to obstruct this process, 
anymore". From which I wonder if he also has seen tests that prove 
there is something valuable in this for his company too (but I won't 
speculate, beyond inviting him to explain for himself the reason(s) 
for this change of heart :) 


On Wed, Apr 06, 2011 at 10:39:20AM +0200, Erik Norvell wrote: 
> The tests presented so far serve well in aiding the development work, but 
> they are not mature enough to support general conclusions on the Opus 
> performance. I think the examples from Paul are a good starting point for 
> specifying the tests. 

I think that's a fair statement to make, but if Erik and Paul have 
doubts they want allayed, then I don't think it's really within the 
power of any of us to help them to their own conclusions, except to 
invite them to present their own testing that they do find satisfactory 
and have it peer reviewed here like the rest of the tests to date. 


So I'd like to suggest something like the following: 

Let's give people a week or so to formally propose the test plans which 
*they* intend to conduct, along with a time when they plan to have them 
completed. Others can give input as to how the tests may be improved. 
Then bring those results back to the group for an open discussion of 
their relevance to assessing the codec. 

Based on those results, we can then assess whether the codec needs further 
work, or whether it is sufficient to release as is for wider evaluation in 
the real-world roles that we're all waiting to deploy it in. 

I think getting bogged down in devising an intractable number of tests 
that nobody here is actually proposing to perform isn't advancing the 
goals of the WG. Nobody doubts we'll beat G.711, so proposing to test 
against it is pointless. Likewise, testing against other codecs that 
*cannot* fill the requirements the WG initially set out may be good 
for bragging rights, but offers little or nothing in the way of a 
meaningful and useful comparison. 

The kind of tests and minimum performance requirements that we had 
envisaged at the outset did not foresee that we'd be outperforming 
HE-AAC in an unconstrained listening test. We certainly weren't when 
the WG was first formed. In the light of that result, and others like 
it, it doesn't seem unreasonable at all to not waste time testing it 
against lesser codecs from a former state of the art. But if someone 
wants to do those tests themselves and report on the results, then by 
all means, please do. Just don't expect us to wait for you if you 
take a really long time to do that. 


So please, let's avoid the shootout mentality. We have a candidate 
codec. Go test it. If you find something wrong, then please tell us 
while there is still time to fix it. If you don't, let's move on and 
ship it. The real testing won't actually begin until we take that 
next step into the real world, and no contrived test can model that 
in a way that will satisfy everyone. So throw your worst at it now, 
because if you can't break it, then claims that the LC is premature 
will be rather hard to sustain for very much longer. 


If you made it this far, 
Thanks :) 

Ron 





_______________________________________________ 
codec mailing list 
codec@ietf.org 
https://www.ietf.org/mailman/listinfo/codec