Re: [codec] Discussion around ITU LS

"Michael Ramalho (mramalho)" <mramalho@cisco.com> Wed, 14 September 2011 17:56 UTC

From: "Michael Ramalho (mramalho)" <mramalho@cisco.com>
To: Gregory Maxwell <gmaxwell@juniper.net>, Anisse Taleb <Anisse.Taleb@huawei.com>, Jean-Marc Valin <jmvalin@mozilla.com>
Cc: Jonathan Rosenberg <jonathan.rosenberg@skype.net>, codec@ietf.org
Subject: Re: [codec] Discussion around ITU LS

Gregory,

Thank you for your civil tone, serious debate and tradeoff explanation
of the issues here.

[I would have appreciated you "de-personalizing" your response by not
using phrases like "functionality YOU are not interested in" (perhaps
you meant that the present OPUS developers are not interested in), or
"[functionality] YOU'RE not willing" (perhaps meaning a particular
implementer not willing and not me personally).]

I will address some of your issues in-line (with "MAR:"), but the main
point is that if the time warping capability were added ... a lot of the
present criticism leveled at the CODEC WG process would evaporate.
Your other points are appreciated.

Regards,

Michael Ramalho

-----Original Message-----
From: Gregory Maxwell [mailto:gmaxwell@juniper.net] 
Sent: Wednesday, September 14, 2011 11:40 AM
To: Michael Ramalho (mramalho); Anisse Taleb; Jean-Marc Valin
Cc: Jonathan Rosenberg; codec@ietf.org
Subject: RE: [codec] Discussion around ITU LS

Michael Ramalho (mramalho) [mramalho@cisco.com]:
> I find it amazing that Anisse's constructive comments are being met
> with such resistance ... as such capabilities were touted as a major
> reason why this work needed to be performed in the IETF.

I find it unfortunate that Jean-Marc's comments are being interpreted as
"resistance" as he plainly invited the contribution of such functionality
while saying that he still thought it would best be done another way.

(And, for whatever it's worth, I think he's right: It's the jitter buffer
this functionality needs to be integrated with far more so than the codec
itself. I also think that Christian is right and the codec could export
some information which would be useful.)

Personally, I don't understand how simply not jumping at the opportunity
to do a bunch of software development and IPR clearance work for
functionality that you aren't interested in, won't use, and think 
could better be done another way can fairly be called resistance.

MAR (took me a while to absorb the 3x double negative): Assuming that
you intend to say that the OPUS developers don't want to spend
cycles developing a capability they personally intend to do another
way - I understand that. That wasn't the point of my email. Although
I responded to Jean-Marc, my reply was intended for the entire WG.
My apologies to Jean-Marc if my reply was too specific to his prior
post and to him personally.

I also think you should take a careful look at the old discussion
surrounding the requirements like this- you'll see that there was
clear opposition to reinventing the whole of the internet inside the
codec.  There is a careful balancing act.

Generally I think we should avoid adding functionality to the codec
specification which isn't itself important for interoperability or
making the codec functional.

MAR: Please humor me here. Overlap and add functionality was developed
in the 1970s, PSOLA (pitch synchronous overlap and add) was commonplace
in the 1980s, and the coder already has a knowledge of pitch (for speech).
How difficult would it be to add a "speed parameter" to the codec
whose value is 1.00 99+% of the time and modulated from 1.00 only
as the jitter buffer nears its high or low water mark? This would
likely be one of the lowest complexity functions to be added to the
decoder (overlap and add and correlation function ONLY invoked when
speed parameter != 1.0).
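
A minimal sketch of the "speed parameter" idea described above, under
stated assumptions. The function names, the watermark values (20 ms and
80 ms) and the 10% maximum deviation are all hypothetical, chosen only
for illustration. The stretch uses plain overlap-add with a linear
cross-fade; it omits the correlation (pitch-synchronous) search that
PSOLA would perform, so it is not a PSOLA implementation and says
nothing about how Opus itself works.

```python
def speed_from_buffer(depth_ms, low_ms=20.0, high_ms=80.0, max_dev=0.10):
    """Map jitter-buffer depth to a playout speed (hypothetical policy).

    Returns exactly 1.0 between the watermarks and deviates from 1.0
    only once the depth crosses the high or low water mark.
    """
    if depth_ms > high_ms:
        # Buffer too full: play slightly faster to drain it.
        excess = min((depth_ms - high_ms) / high_ms, 1.0)
        return 1.0 + max_dev * excess
    if depth_ms < low_ms:
        # Buffer nearly empty: play slightly slower to refill it.
        shortfall = min((low_ms - depth_ms) / low_ms, 1.0)
        return 1.0 - max_dev * shortfall
    return 1.0


def ola_stretch(samples, speed, frame=160, overlap=40):
    """Time-scale `samples` by plain overlap-add (no pitch alignment).

    speed > 1.0 shortens the signal, speed < 1.0 lengthens it.
    When speed == 1.0 the input is returned untouched, so the
    overlap-add work is done only when the speed actually changes.
    """
    if speed == 1.0:
        return list(samples)
    hop_out = frame - overlap                     # output advance per frame
    hop_in = max(1, int(round(hop_out * speed)))  # input advance per frame
    out = []
    pos = 0
    while pos + frame <= len(samples):
        seg = samples[pos:pos + frame]
        if not out:
            out.extend(seg)
        else:
            # Cross-fade the tail of the output with the head of the
            # next segment, then append the rest of the segment.
            base = len(out) - overlap
            for i in range(overlap):
                w = i / overlap
                out[base + i] = (1.0 - w) * out[base + i] + w * seg[i]
            out.extend(seg[overlap:])
        pos += hop_in
    return out
```

In this sketch an application would call speed_from_buffer() once per
frame and pass the result to ola_stretch(); any input samples that do
not fill a whole final frame are simply dropped, which a real
implementation would have to handle.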

I expect future Opus implementers will add a variety of useful things
to their implementations- better error concealment, more intelligent
encoders, echo cancellation, etc.  We can't hope to match in finite
time the quality and brilliance of the work the whole world can
produce in the indefinite future.

I would probably have left out the loss concealment and the _encoder_
from the draft, except that having these things was essential for
proving that a codec meeting the community's needs would actually
be available to them and it's needed for the reference implementation
to be functionally complete. But I can't say the same thing about
stretching.

Or perhaps there is a miscommunication here:  Opus _does_ have
rudimentary "stretching" of at least the level required
to make the code usable over the internet:  If you invoke the decoder
without input (null data buffer) it will generate filler audio, making
the signal longer. Likewise, if you skip a packet on decode (especially
an inactive one) you'll shorten the signal without anything bad
happening.
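
The behaviour described above could be sketched roughly as follows.
StubDecoder is a hypothetical stand-in, not a real Opus binding: it
only mimics "no input yields filler audio". The playout() function and
its high_water threshold are likewise assumptions for illustration.

```python
from collections import deque


class StubDecoder:
    """Hypothetical stand-in for a decoder; not a real Opus binding."""

    def __init__(self, frame_size=4):
        self.frame_size = frame_size

    def decode(self, payload):
        if payload is None:
            # No input: emit filler audio (zeros here, for simplicity).
            return [0] * self.frame_size
        return list(payload)


def playout(queue, decoder, ticks, high_water=3):
    """Produce one frame per playout tick from a packet queue.

    - Empty queue: decode with no input, which lengthens the signal.
    - Queue deeper than `high_water`: skip one packet before decoding,
      which shortens the signal.
    """
    frames = []
    for _ in range(ticks):
        if len(queue) > high_water:
            queue.popleft()  # skipped packet is never played
        payload = queue.popleft() if queue else None
        frames.append(decoder.decode(payload))
    return frames
```

This sketch skips whichever packet is oldest; preferring inactive
packets, as suggested above, would need extra information about each
packet that the stub does not model.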

MAR: Does the decoder know when small time length phonemes are being
produced (e.g., "t") and know NOT to drop those particular frames? I
would be OK with your suggestion if you had "voiced/unvoiced"
determination exposed outside of the decoder (and instructed the
application to preferentially drop only during known voiced or
silence periods) ... or derivatives of this determination (an
indication of how important this phoneme is to intelligibility).
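
As a rough illustration of the suggestion above, assume the decoder
exposed a per-frame label to the application. The labels "silence",
"voiced" and "unvoiced", and the function name, are hypothetical; no
such interface is claimed to exist in Opus. Trying silence before
voiced frames is also an assumption of this sketch.

```python
def pick_frame_to_drop(frame_kinds):
    """Return the index of the frame to drop, or None.

    `frame_kinds` is a list of hypothetical per-frame labels.
    Silence is tried first, then voiced frames. Unvoiced frames
    (short consonants such as "t") are never chosen.
    """
    for preferred in ("silence", "voiced"):
        for index, kind in enumerate(frame_kinds):
            if kind == preferred:
                return index
    return None
```

If the function returns None, the application in this sketch would
simply wait for a later frame rather than drop an unvoiced one.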

This works fine, and it's how all of the pre-final adoption of CELT
in the open source world has worked. As far as I know _none_
of our great many early adopters have requested anything
more.

When JM and I hear the complaint that Opus doesn't provide "time
stretching" we're mostly thinking about high complexity/high fidelity
algorithms for this purpose instead of that basic one.  We think of
non-basic functionality which isn't needed to interop, which will have
a platform and application specific nature- (since you can usually
afford better stretching on a desktop than on a phone), and which
can be added by anyone at any time in the future...  

MAR: I addressed this above. I (personally) am not asking for
this capability - but incorporating a rudimentary (not high
complexity) version of it is simple (unless you disagree with
what I stated above) AND it deflates the whole "you guys
don't have time warping capabilities" argument.

One way to address this, if people consider it important, would be
to develop another draft for that purpose.  It's something I would
have instantly proposed in response to the request- except I know
that many people here are concerned about working group
scope creep.  

(Though, if it isn't opposed as scope creep as part of the
codec draft, why oppose it as a freestanding draft?)

> How exactly is OPUS technically differentiated (other than marginal
> differences in quality at bit rates within a factor of ~ 1.3x *) from
> existing codecs developed in other SDOs?

Can you suggest a comparable codec which supports latency at the
5ms level? Another codec that can operate at _any_ bitrate that
congestion control permits?  Another codec that supports good fidelity
speech at 11kbit/sec? Another codec that gives good stereo music at
64kbit/sec? Transparent stereo music near 100kbit/sec while giving
latencies which are acceptable for communication?  Another with similar
licensing status to Opus?

MAR: I put the adjective "technically" on purpose; your last sentence
doesn't qualify as a response to my question.

MAR: That being said, you make good points - although many codecs have
one or more of these features (e.g., G.711.0, admittedly a narrow-band
only data compression algorithm, operates down to 5ms, which addresses
your first point, and is royalty-free for softclient use).

You can find examples for each of these, though you'll find
that Opus is still best of breed in each considered alone- but
you can not find anything that matches the composition overall or
even comes close.

When I look at the applications and requirements the strongest theme
visible to me is this theme of versatility- it's one that underlies many
of the devices being added to the network today (such as smartphones,
with their 1,001 "apps") as well as the trends in web browser
technology where small collections of flexible tools and development
glue are being used to invent applications that were never dreamed
of by the developers of the infrastructure.

We are already _well_ differentiated, and if you're not willing to
accept that then why would you accept that adding additional
stretching would resolve it? After all, the same code could be copied
to the implementation of any other codec.

MAR: Hoping your use of "you're not willing" isn't directed at me
personally, as that wasn't the purpose of my email. My point was that
if such capability were added ... I don't see any other major complaints
on the list ... and lack of such functionality was a major rationale
for the work being performed in the IETF. I don't see that the WG
would care if your supposition was acceptable to me personally anyway
(I am only one hum).

MAR: Thanks again for your reasoned reply.