Re: [codec] A concrete proposal for requirements and testing

Roman Shpount <roman@telurix.com> Fri, 08 April 2011 00:48 UTC

In-Reply-To: <607546550.2587173.1302222242947.JavaMail.root@lu2-zimbra>
References: <BANLkTimeDEPY8va6_MQVztn3YGyTZ2LmVw@mail.gmail.com> <607546550.2587173.1302222242947.JavaMail.root@lu2-zimbra>
Date: Thu, 07 Apr 2011 20:50:06 -0400
Message-ID: <BANLkTinTZUyRBYUQq7igHB74r8cXsUMPCg@mail.gmail.com>
From: Roman Shpount <roman@telurix.com>
To: Koen Vos <koen.vos@skype.net>
Cc: codec@ietf.org, Stephen Botzko <stephen.botzko@gmail.com>
Subject: Re: [codec] A concrete proposal for requirements and testing

Quoting Nokia's paper: "Codecs done without thorough standardization effort
like Speex and iLBC offer significantly reduced efficiency, probably due to
much lesser optimization, *listening tests* and IPR free design." Testing is
an important part of the development and standardization process for a CODEC.

It is in everybody's interest to produce the best CODEC possible. Comments
that "we listen to it and we like it" and "you are free to run your own
tests", even though valid, are not very productive. I think the best course
is to treat testing the same way as we treat development: collaborate first
on putting together a comprehensive test plan, and then on the actual
testing. I am not sure I can contribute a lot to creating a test plan, but I
can definitely contribute to the testing if such a plan exists. I suspect
this is a common situation for many people in this group.
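
To make that concrete, here is a rough sketch (in Python, with purely
hypothetical condition names, bitrates, and scores) of the two pieces I have
in mind: a shared test-plan matrix that every tester runs the same way, and a
small pooling step that turns independently reported MOS scores into
per-condition means with approximate 95% confidence intervals, so results
from different testers can be integrated and compared:

# Hypothetical sketch only: condition names, bitrates and scores are made up.
import math
from collections import defaultdict

# A "test plan" as a simple matrix of conditions that every participating
# tester would run in the same way.
TEST_PLAN = [
    {"item": "clean_speech", "bitrate_kbps": 12, "packet_loss_pct": 0},
    {"item": "clean_speech", "bitrate_kbps": 12, "packet_loss_pct": 5},
    {"item": "noisy_speech", "bitrate_kbps": 24, "packet_loss_pct": 0},
    {"item": "music",        "bitrate_kbps": 64, "packet_loss_pct": 0},
]

def pool_results(results):
    # results: list of (condition_index, mos_score) pairs collected from
    # different testers.  Returns per-condition mean MOS and an approximate
    # 95% confidence interval, so independently gathered scores can be
    # integrated and compared.
    by_condition = defaultdict(list)
    for cond, mos in results:
        by_condition[cond].append(mos)
    summary = {}
    for cond, scores in by_condition.items():
        n = len(scores)
        mean = sum(scores) / n
        if n > 1:
            var = sum((s - mean) ** 2 for s in scores) / (n - 1)
            ci95 = 1.96 * math.sqrt(var / n)  # normal approximation
        else:
            ci95 = float("nan")
        summary[cond] = (mean, ci95, n)
    return summary

# Example with made-up scores reported by two hypothetical testers:
pooled = pool_results([(0, 4.1), (0, 3.9), (0, 4.3), (1, 3.2), (1, 3.4)])
for cond in sorted(pooled):
    mean, ci95, n = pooled[cond]
    print(TEST_PLAN[cond], "mean MOS = %.2f +/- %.2f (n=%d)" % (mean, ci95, n))

The actual conditions, scoring methodology, and statistics would of course be
whatever the group agrees on in the plan; the point is simply that the plan
and the analysis are shared, so everyone's results end up comparable.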
_____________
Roman Shpount


On Thu, Apr 7, 2011 at 8:24 PM, Koen Vos <koen.vos@skype.net> wrote:

> Stephen Botzko wrote:
> > This is not a debugging task, it is a codec characterization/quality
> > assessment that will be done on the finished product.
>
> Does such testing have a realistic chance of revealing something that
> should alter the course of the WG?
> If not, then the testing can be done separately.
> If yes: how precisely?
>
> Is the concern that the codec may not be good enough to be worth
> publishing, given the other codecs already out there?
> --> In the tests so far, Opus has consistently outperformed iLBC, Speex,
> G.722.1, etc.  To me it seems unrealistic to expect that pattern to
> reverse with more or "better" testing.
>
> Or is the concern that for certain input signals or network conditions the
> codec produces results that are somehow unacceptable?
> --> Opus has gone through all kinds of testing during development.  Feel
> free to do some more; you don't need test plans or consensus for that.
> Remember that there will always be untested cases (speech with a dog
> barking in the background over dial-up hasn't been tried yet, I believe),
> and that's ok.
>
>
> Characterization may be useful, but I don't see why it should be a
> deliverable of the WG.
>
> best,
> koen.
>
>
> ------------------------------
> From: "Stephen Botzko" <stephen.botzko@gmail.com>
> To: "Ron" <ron@debian.org>
> Cc: codec@ietf.org
> Sent: Thursday, April 7, 2011 8:46:05 AM
>
> Subject: Re: [codec] A concrete proposal for requirements and testing
>
> Hi Ron
>
> This is not a debugging task, it is a codec characterization/quality
> assessment that will be done on the finished product.
>
> In my view, the proper way to go about it is to first get consensus on the
> tests that need to be run, what results are needed, and how the tests need
> to be conducted in order to meaningfully compare the results.  Then get
> folks signed up to actually do the tests.
>
> This is not about collaboration vs. competition; rather, it is about running
> distributed tests with results that can be integrated and scientifically
> analyzed/compared, and about recording the test methods so that other
> people can re-run the tests and duplicate the results in the future.
>
> Regards,
> Stephen Botzko
>
> On Thu, Apr 7, 2011 at 8:53 AM, Ron <ron@debian.org> wrote:
>
>>
>> Hi,
>>
>> So I must say at the outset that I'm greatly encouraged by the renewed
>> vigour and interest we've seen here in the last few weeks.  Though much
>> work has gone on in the background between the active developers, we
>> haven't really seen such a flurry of interested contributions, since,
>> well, since many of these same people first tried to veto the formation
>> of this group because they didn't believe that we could do it.  :>
>>
>> I'm grateful to them for sticking with us, and helping us to sanity check
>> that really, we actually have.  Beyond the wildest expectations of both
>> the 'for' and 'against' crowds, it would seem, so far.
>>
>> Good engineering is about finding faults that can be fixed.  If no other
>> thing binds us to a common purpose, I think we can all agree on that.
>> And faults are faults, whether pointed out by friend or foe, so to speak.
>>
>>
>> What I'd really like to point out at this stage though, is that there are
>> some very fundamental differences between the way that a collaborative
>> group such as this one tends to work, as compared to a competitive group
>> such as some other SDOs seem to prefer.
>>
>> When you have a relatively closed organisation, composed of companies
>> that each would like to have a monopoly hold over some technology, and
>> which aren't particularly interested in sharing their secrets with others
>> except when it is to their direct advantage over them, then it does seem
>> logical to lay down some rules in advance, let each group work in
>> isolation, and then have a shootout at the end to see who "wins".
>>
>> What we have here, though, is both in principle and, as we've seen, in
>> practice quite different to that.  Instead of taking a dog-eat-dog
>> approach, all of the people with real technology to contribute have
>> instead banded together to create a single solution which is better
>> than what any of them initially had to offer on their own.
>>
>> Even the testing that has occurred to date has been of quite a different
>> flavour to what many of the currently active voices here would probably
>> be used to.  When you have a competitive process, each group is naturally
>> going to try to advocate the tests at which their particular technology
>> is known (by them) to be superior.
>>
>> Much of the testing I've observed for Opus, however, has been quite the
>> opposite.  The developers have actively sought out the tests at which
>> the codec sucks the worst, and compared its results to those of codecs
>> which outperform it (or should do, given their relative constraints),
>> in order to find the things with the greatest scope for improvement.
>>
>>
>> We've already seen a number of published tests.  And we've in turn seen
>> some people challenge the validity of those tests.  What we haven't seen
>> so far are any tests which prove the validity of those challenges.
>>
>> This is an open process.  Anyone is free to test anything they wish and
>> provide feedback to the group as to their findings.  So far the people
>> who have done tests seem to have indicated that they are satisfied with
>> the results they have seen, and have neither suggestions for things that
>> need to improve further, nor plans for further tests of their own that
>> they haven't already shared.
>>
>>
>> Roni hinted at the Prague meeting that certain "internal testing" had
>> taken place which he was aware of.  Since we haven't seen the results of
>> those tests, I can only assume they paint us in as favourable a
>> light as the disclosed tests have.
>>
>> Stephan intimated that he was "not trying to obstruct this process,
>> anymore".  From which I wonder if he also has seen tests that prove
>> there is something valuable in this for his company too (but I won't
>> speculate, beyond inviting him to explain for himself the reason(s)
>> for this change of heart :)
>>
>> On Wed, Apr 06, 2011 at 10:39:20AM +0200, Erik Norvell wrote:
>> > The tests presented so far serve well in aiding the development work,
>> > but they are not mature enough to support general conclusions on the
>> > Opus performance. I think the examples from Paul are a good starting
>> > point for specifying the tests.
>>
>> I think that's a fair statement to make, but if Erik and Paul have
>> doubts they want allayed, then I don't think it's really within the
>> power of any of us to help them to their own conclusions, except to
>> invite them to present their own testing that they do find satisfactory
>> and have it peer-reviewed here like the rest of the tests to date.
>>
>>
>> So I'd like to suggest something like the following:
>>
>> Let's give people a week or so to formally propose the test plans which
>> *they* intend to conduct, along with a time when they plan to have them
>> completed.  Others can give input as to how the tests may be improved.
>> Then bring those results back to the group for an open discussion of
>> their relevance to assessing the codec.
>>
>> Based on those results, we can then assess whether the codec needs further
>> work or whether it is sufficient to release as is, for wider evaluation in
>> the real-world roles that we're all waiting to deploy it in.
>>
>> I think getting bogged down in devising an intractable number of tests
>> that nobody here is actually proposing to perform isn't advancing the
>> goals of the WG.  Nobody doubts we'll beat G.711, so proposing to test
>> against it is pointless.  Likewise, testing against other codecs that
>> *cannot* fill the requirements the WG initially set out may be good
>> for bragging rights, but offers little or nothing in the way of a
>> meaningful and useful comparison.
>>
>> The kind of tests and minimum performance requirements that we had
>> envisaged at the outset did not foresee that we'd be outperforming
>> HE-AAC on an unconstrained listening test.  We certainly weren't when
>> the WG was first formed.  In the light of that result, and others like
>> it, it doesn't seem unreasonable at all to not waste time testing it
>> against lesser codecs from a former state of the art.  But if someone
>> wants to do those tests themselves and report on the results, then by
>> all means, please do.  Just don't expect us to wait for you if you
>> take a really long time to do that.
>>
>>
>> So please, let's avoid the shootout mentality.  We have a candidate
>> codec.  Go test it.  If you find something wrong, then please tell us
>> while there is still time to fix it.  If you don't, let's move on and
>> ship it.  The real testing won't actually begin until we take that
>> next step into the real world, and no contrived test can model that
>> in a way that will satisfy everyone.  So throw your worst at it now,
>> because if you can't break it, then claims that the LC is premature will
>> be rather hard to sustain for very much longer.
>>
>>
>> If you made it this far,
>> Thanks :)
>>
>> Ron
>>
>>
>> _______________________________________________
>> codec mailing list
>> codec@ietf.org
>> https://www.ietf.org/mailman/listinfo/codec
>>
>
>
> _______________________________________________
> codec mailing list
> codec@ietf.org
> https://www.ietf.org/mailman/listinfo/codec
>
>