Re: [codec] Discussion around ITU LS
"Michael Ramalho (mramalho)" <mramalho@cisco.com> Wed, 14 September 2011 17:56 UTC
Date: Wed, 14 Sep 2011 12:58:57 -0500
Message-ID: <999109E6BC528947A871CDEB5EB908A0049A1E38@XMB-RCD-209.cisco.com>
In-Reply-To: <BCB3F026FAC4C145A4A3330806FEFDA93CFB906CB3@EMBX01-HQ.jnpr.net>
From: "Michael Ramalho (mramalho)" <mramalho@cisco.com>
To: Gregory Maxwell <gmaxwell@juniper.net>, Anisse Taleb <Anisse.Taleb@huawei.com>, Jean-Marc Valin <jmvalin@mozilla.com>
Cc: Jonathan Rosenberg <jonathan.rosenberg@skype.net>, codec@ietf.org
Subject: Re: [codec] Discussion around ITU LS
Gregory,

Thank you for your civil tone, serious debate and explanation of the tradeoffs in the issues here. [I would have appreciated you "de-personalizing" your response by not using phrases like "functionality YOU are not interested in" (perhaps you meant that the present OPUS developers are not interested in), or "[functionality] YOU'RE not willing" (perhaps meaning a particular implementer not willing and not me personally).]

I will address some of your issues in-line (with "MAR:"), but the main point is that if the time warping capability was added ... a lot of the present criticism leveled at the CODEC WG process would evaporate.

Your other points are appreciated.

Regards,

Michael Ramalho

-----Original Message-----
From: Gregory Maxwell [mailto:gmaxwell@juniper.net]
Sent: Wednesday, September 14, 2011 11:40 AM
To: Michael Ramalho (mramalho); Anisse Taleb; Jean-Marc Valin
Cc: Jonathan Rosenberg; codec@ietf.org
Subject: RE: [codec] Discussion around ITU LS

Michael Ramalho (mramalho) [mramalho@cisco.com]:
> I find it amazing that Anisse's constructive comments are being met with
> such resistance ... as such capabilities were touted as a major reason
> why this work needed to be performed in the IETF.

I find it unfortunate that Jean-Marc's comments are being interpreted as "resistance" as he plainly invited the contribution of such functionality while saying that he still thought it would best be done another way. (And, for whatever it's worth, I think he's right: It's the jitter buffer this functionality needs to be integrated with far more so than the codec itself. I also think that Christian is right and the codec could export some information which would be useful.)

Personally, I don't understand how simply not jumping at the opportunity to do a bunch of software development and IPR clearance work for functionality that you aren't interested in, won't use, and think could better be done another way can fairly be called resistance.

MAR (took me a while to absorb the 3x double negative): Assuming that you intend to say that the OPUS developers don't want to spend cycles developing a capability they personally intend to do another way - I understand that. That wasn't the point of my email. Although I responded to Jean-Marc, my reply was intended for the entire WG. My apologies to Jean-Marc if my reply was too specific to his prior post and to him personally.

I also think you should take a careful look at the old discussion surrounding requirements like this - you'll see that there was clear opposition to reinventing the whole of the internet inside the codec. There is a careful balancing act. Generally I think we should avoid adding functionality to the codec specification which isn't itself important for interoperability or making the codec functional.

MAR: Please humor me here. Overlap and add functionality was developed in the 1970s, PSOLA (pitch synchronous overlap and add) was commonplace in the 1980s, and the coder already has a knowledge of pitch (for speech). How difficult would it be to add a "speed parameter" to the codec whose value is 1.00 99+% of the time and modulated from 1.00 only as the jitter buffer nears its high or low water mark? This would likely be one of the lowest complexity functions to be added to the decoder (overlap and add and correlation function ONLY invoked when speed parameter != 1.0).

I expect future Opus implementers will add a variety of useful things to their implementations - better error concealment, more intelligent encoders, echo cancellation, etc. We can't hope to match in finite time the quality and brilliance of the work the whole world can produce in the indefinite future.

I would probably have left out the loss concealment and the _encoder_ from the draft, except that having these things was essential for proving that a codec meeting the community's needs would actually be available to them and it's needed for the reference implementation to be functionally complete. But I can't say the same thing about stretching.

Or perhaps there is a miscommunication here: Opus _does_ have rudimentary "stretching" of at least the level required to make the code usable over the internet: If you invoke the decoder without input (null data buffer) it will generate filler audio, making the signal longer. Likewise, if you skip a packet on decode (especially an inactive one) you'll shorten the signal without anything bad happening.

MAR: Does the decoder know when small time length phonemes are being produced (e.g., "t") and know NOT to drop those particular frames? I would be OK with your suggestion if you had "voiced/unvoiced" determination exposed outside of the decoder (and instructed the application to preferentially drop only during known voiced or silence periods) ... or derivatives of this determination (an indication of how important this phoneme is to intelligibility).

This works fine, and it's how all of the pre-final adoption of CELT in the open source world has worked. As far as I know _none_ of our great many early adopters have requested anything more.

When JM and I hear complaints that Opus doesn't provide "time stretching" we're mostly thinking about high complexity/high fidelity algorithms for this purpose instead of that basic one. We think of non-basic functionality which isn't needed to interop, which will have a platform and application specific nature (since you can usually afford better stretching on a desktop than on a phone), and which can be added by anyone at any time in the future...

MAR: I addressed this above.
I (personally) am not asking for this capability - but incorporating a rudimentary (not high complexity) version of it is simple (unless you disagree with what I stated above) AND it deflates the whole "you guys don't have time warping capabilities" argument.

One way to address this, if people consider it important, would be to develop another draft for that purpose. It's something I would have instantly proposed in response to the request - except I know that many people here are concerned about working group scope creep. (Though, if it isn't opposed as scope creep as part of the codec draft, why oppose it as a freestanding draft?)

> How exactly is OPUS technically differentiated (other than marginal
> differences in quality at bit rates within a factor of ~ 1.3x *) from
> existing codecs developed in other SDOs?

Can you suggest a comparable codec which supports latency at the 5ms level? Another codec that can operate at _any_ bitrate congestion control permits? Another codec that supports good fidelity speech at 11kbit/sec? Another codec that gives good stereo music at 64kbit/sec? Transparent stereo music near 100kbit/sec while giving latencies which are acceptable for communication? Another with similar licensing status to Opus?

MAR: I put the adjective "technically" on purpose; your last sentence doesn't qualify as a response to my question.

MAR: That being said, you make good points - although many codecs have one or more of these features (e.g., G.711.0, admittedly a narrow-band-only data compression algorithm, operates down to 5ms to address your first point and is royalty-free for softclient use).

You can find examples for each of these, though you'll find that Opus is still best of breed in each considered alone - but you cannot find anything that matches the composition overall or even comes close.

When I look at the applications and requirements the strongest theme visible to me is this theme of versatility - it's one that underlies many of the devices being added to the network today (such as smartphones, with their 1,001 "apps") as well as the trends in web browser technology where small collections of flexible tools and development glue are being used to invent applications that were never dreamed of by the developers of the infrastructure.

We are already _well_ differentiated, and if you're not willing to accept that then why would you accept that adding additional stretching would resolve it? After all, the same code could be copied to the implementation of any other codec.

MAR: Hoping your use of "you're not willing" isn't directed at me personally, as that wasn't the purpose of my email. My point was that if such capability was added ... I don't see any other major complaints on the list ... and lack of such functionality was a major rationale for the work being performed in the IETF. I don't see that the WG would care if your supposition was acceptable to me personally anyway (I am only one hum).

MAR: Thanks again for your reasoned reply.
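
[Illustrative note, not part of the original message.] The "speed parameter" idea discussed in this message (a value that stays at 1.00 almost all the time and moves away from 1.00 only as the jitter buffer nears its high or low water mark) could be sketched roughly as below. This is only a minimal sketch under stated assumptions: the function name `playout_speed`, the water mark values, the margin, the linear ramp and the 10% maximum deviation are all hypothetical choices made for illustration. None of them come from the thread, from Opus, or from any jitter buffer implementation, and the overlap-and-add step that would actually consume the speed value is not shown.

```python
def playout_speed(buffered_ms, low_ms=40.0, high_ms=120.0,
                  margin_ms=20.0, max_dev=0.10):
    """Return a playout speed factor for the decoder (hypothetical helper).

    The factor is exactly 1.0 while the jitter buffer depth is comfortably
    between the two water marks. It ramps linearly up to 1 + max_dev as the
    depth approaches the high water mark (play faster to drain the buffer)
    and down to 1 - max_dev as it approaches the low water mark (play slower
    to let the buffer refill). All defaults are assumed values.
    """
    if low_ms + margin_ms >= high_ms - margin_ms:
        raise ValueError("water marks too close together for this margin")

    upper_start = high_ms - margin_ms   # depth where speeding up begins
    lower_start = low_ms + margin_ms    # depth where slowing down begins

    if buffered_ms >= upper_start:
        frac = min(1.0, (buffered_ms - upper_start) / margin_ms)
        return 1.0 + max_dev * frac
    if buffered_ms <= lower_start:
        frac = min(1.0, (lower_start - buffered_ms) / margin_ms)
        return 1.0 - max_dev * frac
    return 1.0


if __name__ == "__main__":
    for depth in (20, 40, 50, 80, 110, 120, 200):
        print(depth, round(playout_speed(depth), 3))
```

In a design like this, the time-scaling path (overlap and add plus the correlation search) would be skipped whenever the returned value is exactly 1.0, which matches the message's point that the extra work is only invoked when the speed parameter differs from 1.0.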
- [codec] Discussion around ITU LS Cullen Jennings
- Re: [codec] Discussion around ITU LS Stephan Wenger
- Re: [codec] Discussion around ITU LS Monty Montgomery
- Re: [codec] Discussion around ITU LS Jean-Marc Valin
- Re: [codec] Discussion around ITU LS Gregory Maxwell
- Re: [codec] Discussion around ITU LS Anisse Taleb
- Re: [codec] Discussion around ITU LS Jean-Marc Valin
- Re: [codec] Discussion around ITU LS Anisse Taleb
- Re: [codec] Discussion around ITU LS Jean-Marc Valin
- Re: [codec] Discussion around ITU LS Christian Hoene
- Re: [codec] Discussion around ITU LS Anisse Taleb
- Re: [codec] Discussion around ITU LS Michael Ramalho (mramalho)
- Re: [codec] Discussion around ITU LS Benjamin M. Schwartz
- Re: [codec] Discussion around ITU LS Gregory Maxwell
- Re: [codec] Discussion around ITU LS Jean-Marc Valin
- Re: [codec] Discussion around ITU LS Jean-Marc Valin
- Re: [codec] Discussion around ITU LS Michael Ramalho (mramalho)
- Re: [codec] Discussion around ITU LS Benjamin M. Schwartz
- Re: [codec] Discussion around ITU LS Benjamin M. Schwartz
- Re: [codec] Discussion around ITU LS Anisse Taleb
- Re: [codec] Discussion around ITU LS Koen Vos