Re: [codec] Discussion around ITU LS

Gregory Maxwell <> Wed, 14 September 2011 15:39 UTC

From: Gregory Maxwell <>
To: "Michael Ramalho (mramalho)" <>, Anisse Taleb <>, Jean-Marc Valin <>
Date: Wed, 14 Sep 2011 08:40:13 -0700
Cc: Jonathan Rosenberg <>, "" <>

Michael Ramalho (mramalho):
> I find it amazing that Anisse's constructive comments are being met with
> such resistance ... as such capabilities were touted as a major reason
> why this work needed to be performed in the IETF.

I find it unfortunate that Jean-Marc's comments are being interpreted as
"resistance", since he plainly invited the contribution of such functionality
while saying that he still thought it would best be done another way.

(And, for whatever it's worth, I think he's right: it's the jitter buffer
that this functionality needs to be integrated with, far more so than
the codec itself. I also think that Christian is right and the codec could
export some information which would be useful.)

Personally, I don't understand how simply not jumping at the opportunity
to do a bunch of software development and IPR clearance work for
functionality that you aren't interested in, won't use, and think 
could better be done another way can fairly be called resistance.

I also think you should take a careful look at the old discussion
surrounding requirements like this; you'll see that there was
clear opposition to reinventing the whole of the internet inside the
codec. There is a careful balancing act.

Generally I think we should avoid adding functionality to the codec
specification which isn't itself important for interoperability or making
the codec functional.

I expect future Opus implementers will add a variety of useful things
to their implementations: better error concealment, more intelligent
encoders, echo cancellation, etc. We can't hope to match in finite
time the quality and brilliance of the work the whole world can
produce in the indefinite future.

I would probably have left out the loss concealment and the _encoder_
from the draft, except that having these things was essential for
proving that a codec meeting the community's needs would actually
be available to them, and they are needed for the reference implementation
to be functionally complete. But I can't say the same thing about

Or perhaps there is a miscommunication here: Opus _does_ have
rudimentary "stretching", at least at the level required
to make the codec usable over the internet: if you invoke the decoder
without input (null data buffer) it will generate filler audio, making
the signal longer. Likewise, if you skip a packet on decode (especially
an inactive one) you'll shorten the signal without anything bad

This works fine, and it's how all of the pre-final adoption of CELT
in the open source world has worked. As far as I know _none_
of our great many early adopters have requested anything
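
As a rough illustration of that basic behaviour, here is a toy sketch. It
is not the Opus API: ToyDecoder, play_out, stretch_at and drop_at are
hypothetical names invented for this example, and the half-amplitude
filler is only an assumption standing in for real concealment. It shows a
receiver lengthening playout by decoding with no input and shortening it
by skipping a packet:

```python
from typing import Iterable, List, Optional

FRAME_SAMPLES = 4  # tiny frame size, chosen only to keep the example readable


class ToyDecoder:
    """Hypothetical stand-in for a codec decoder; not the Opus API."""

    def __init__(self) -> None:
        self.last_frame: List[int] = [0] * FRAME_SAMPLES

    def decode(self, packet: Optional[List[int]]) -> List[int]:
        if packet is None:
            # No input: emit filler derived from the previous frame
            # (here simply a half-amplitude copy).
            frame = [s // 2 for s in self.last_frame]
        else:
            frame = list(packet)
        self.last_frame = frame
        return frame


def play_out(
    packets: List[List[int]],
    stretch_at: Iterable[int] = (),
    drop_at: Iterable[int] = (),
) -> List[int]:
    """Decode packets in order, adding a filler frame after each index in
    stretch_at and skipping each index in drop_at."""
    stretch, drop = set(stretch_at), set(drop_at)
    dec = ToyDecoder()
    pcm: List[int] = []
    for i, pkt in enumerate(packets):
        if i in drop:
            continue  # skipping a packet shortens the output by one frame
        pcm.extend(dec.decode(pkt))
        if i in stretch:
            pcm.extend(dec.decode(None))  # no input: one extra filler frame
    return pcm
```

A jitter buffer running low would ask for a stretch at some index, and one
that has grown too deep would ask for a drop, preferably on an inactive
frame.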

When JM and I hear the complaint that Opus doesn't provide "time
stretching" we're mostly thinking about high complexity/high fidelity
algorithms for this purpose instead of that basic one. We think of
it as non-basic functionality which isn't needed to interop, which will
have a platform and application specific nature (since you can usually
afford better stretching on a desktop than on a phone), and which
can be added by anyone at any time in the future...

One way to address this, if people consider it important, would be
to develop another draft for that purpose. It's something I would
have instantly proposed in response to the request, except I know
that many people here are concerned about working group
scope creep.

(Though, if it isn't opposed as scope creep as part of the
codec draft, why oppose it as a freestanding draft?)

> How exactly is OPUS technically differentiated (other than marginal
> differences in quality at bit rates within a factor of ~ 1.3x *) from
> existing codecs developed in other SDOs?

Can you suggest a comparable codec which supports latency at the
5ms level? Another codec that can operate at _any_ bitrate congestion
control permits? Another codec that supports good fidelity speech at
11kbit/sec? Another codec that gives good stereo music at 64kbit/sec?
Transparent stereo music near 100kbit/sec while giving latencies which
are acceptable for communication? Another with similar licensing
status to Opus?

You can find examples for each of these, though you'll find
that Opus is still best of breed in each considered alone, but
you cannot find anything that matches the composition overall or
even comes close.

When I look at the applications and requirements the strongest theme
visible to me is this theme of versatility. It's one that underlies many
of the devices being added to the network today (such as smartphones,
with their 1,001 "apps") as well as the trends in web browser
technology where small collections of flexible tools and development
glue are being used to invent applications that were never dreamed
of by the developers of the infrastructure.

We are already _well_ differentiated, and if you're not willing to
accept that then why would you accept that adding additional
stretching would resolve it? After all, the same code could be copied
to the implementation of any other codec.