Re: [codec] draft test and processing plan for the IETF Codec

in-line

On Mon, Apr 18, 2011 at 7:47 AM, Ron <ron@debian.org> wrote:

>
> Hi Anisse,
>
> On Mon, Apr 18, 2011 at 09:52:42AM +0000, Anisse Taleb wrote:
> > Hi Ben,
> >
> > > On 04/13/2011 03:32 AM, Anisse Taleb wrote:
> > > > Please find attached a first draft of a test plan of the IETF codec
> > > (Opus).
> > >
> > > Thank you for drawing up this test plan, which clearly required a great
> > > deal of thought.  The results of such testing would certainly be very
> > > interesting to many.
> > >
> > > However, I think the execution of such a test is clearly _not_ an
> > > appropriate prerequisite for publishing a Proposed Standard.  By my
> > > calculations, the draft plan presently calls for over 1300 hours of
> > > listening tests, counting only audio being played, estimating 10-second
> > > samples and the minimum number of listeners.  Even if many listeners are
> > > listening in parallel, and overheads (such as delays between samples) are
> > > low, conducting such a test would still take many months.
> >
> > It is always a good practice to first have a target on what would be tested
> > and then find ways how to make the test realistic and reasonable. When it
> > comes to the proposal itself, I think that shortcuts have been taken already.
> > I am not against discussing the size of the test, the draft proposal was
> > exactly made to initiate such discussion...
> > >
> > > Such an extensive, expensive battery of tests can hardly be justified on
> > > some arbitrary codec version still under development.
> >
> > I cannot agree more. Freeze a version of Opus, and let's check the quality of
> > the codec. If it passes the quality expectations, it will become a standard.
> >
> > -- But before that, clean up the code and the specification and fix the IPR
> > issues. Right now the codec does not pass the "admin" part of requirements.
>
> I thought it was already agreed that the people acting in good faith would
> endeavour to conduct as much of this in parallel as possible.
>
> I'm sure we have plenty of time to file off the rough edges while you gather
> enough people to run the two hundred and something nonillion iterations of
> your test that Gregory showed would be necessary for it to approach an even
> remotely significant result that wasn't entirely a function of chance.
>
> It saddens me to see you play a cheap shot like this at Ben, when so many
> people are eagerly awaiting your explanation as to whether that was simply
> an error in your math, or a factor you had not considered.  Or possibly an
> essential ingredient in your insistence of a single do-or-die test?  That
> nobody could possibly afford to repeat independently ...
>
Maybe I am missing something, but I am not seeing any cheap shots at Ben or
anyone else in Anisse's post.  The "nonillion iterations" and "heat death of
the universe" stuff in your reply are perhaps pejorative, though as we all
know it is hard to judge intentions from emails.  I agree with Cullen that
we should assume good faith - that the proponents of systematic testing are
not trying to kill standardization of Opus, but instead simply believe that
such testing is an important part of codec standardization, no matter what SDO
is doing the work.  As far as I know, that is the truth of it.

The answer to the statistical argument is quite simple - people run
cross-checks and follow up as needed to verify failed results.  This allows
fairly efficient weeding out of false negatives (and the occasional false
positive), and is one reason why the test procedures need to be
well-documented (so they can be verified/reproduced). The math exercise
presumed that the test procedure could only be run once, and that failure to
pass a requirement or two could not be followed up on.  That is not the
case: if we see a result that concerns us, we can follow up.  Perhaps the
test plan should say this explicitly, or perhaps we can just agree to
discuss needed follow-ups when we see the results.
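
A rough sketch of the follow-up point, under assumed numbers that are not
taken from the test plan or from Gregory's calculation: suppose a codec truly
meets every requirement, each individual test still reports a spurious
failure with probability alpha, and the tests are independent.  The
function name, the 100 requirements and the 5% rate below are all
hypothetical.  The sketch compares a run-once procedure with one where a
failure only counts if it recurs in each independent follow-up run.

```python
def p_any_false_failure(n_requirements, alpha, retests=0):
    """Chance that a codec truly meeting every requirement still ends up
    with at least one counted failure.

    Assumptions (illustrative only): tests are independent, each has the
    same spurious-failure rate `alpha`, and a failure is counted only if
    it also recurs in each of `retests` independent follow-up runs.
    """
    # Probability that one requirement fails the first run and every retest.
    p_counted_failure = alpha ** (1 + retests)
    # Complement of "no requirement ends up with a counted failure".
    return 1.0 - (1.0 - p_counted_failure) ** n_requirements


if __name__ == "__main__":
    # Hypothetical figures: 100 requirements, 5% spurious-failure rate.
    for retests in (0, 1, 2):
        p = p_any_false_failure(100, 0.05, retests)
        print(f"retests={retests}: P(at least one counted failure) = {p:.4f}")
```

With run-once testing, a spurious failure somewhere is close to certain for
a large set of requirements; each round of follow-up on the failed items
cuts that chance sharply, and only the failed items need to be re-run.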

>
> So please, we've mapped both extremes of what a non-test might look like now,
> and we've clearly shown, with absolute certainty, that this entire group will
> never, not before the heat death of the universe, agree upon and perform one
> single tell-all test which satisfies them all.  And that's before we consider
> the users who aren't represented in this testing yet.
>
> I think Cullen very accurately plotted where the middle ground may lay.
> Let's all gravitate a little closer to that now, can we please?
>
> Thanks Much!
> Ron
>