Re: [codec] #16: Multicast?

Jean-Marc Valin <> Fri, 23 April 2010 04:04 UTC

Date: Fri, 23 Apr 2010 00:04:32 -0400
From: Jean-Marc Valin <>
To: "Raymond (Juin-Hwey) Chen" <>
Cc: "" <>, 'stephen botzko' <>
Subject: Re: [codec] #16: Multicast?


See my comments below.

> [Raymond]: High quality is a given, but I would like to emphasize the 
> importance of low latency.
> (1) It is well-known that the longer the latency, the lower the 
> perceived quality of the communication link. The E-model in the ITU-T 
> Recommendation G.107 models such communication quality in MOS_cqe, 
> which among other things depends on the so-called “delay impairment 
> factor” /Id/. Basically, MOS_cqe is a monotonically decreasing 
> function of increasing latency, and beyond about 150 ms one-way delay, 
> the perceived quality of the communication link drops rapidly with 
> further delay increase.

As the author of CELT, I obviously agree that latency is an important 
aspect for this codec :-) That being said, I tend to say that 20 ms is 
still the most widely used frame size, so we might as well optimise for 
that. This is not really a problem because as the frame size goes down, 
the overhead of the IP/UDP/RTP headers goes up, so the codec bit-rate 
becomes a bit less of an issue. For example, with 5 ms frames, we would 
already be sending 64 kb/s worth of headers (excluding the link layer), 
so we might as well spend about as many bits on the actual payload as we 
spend on the headers. And with 64 kb/s of payload, we can actually have 
high-quality full-band audio.
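
The header figure above can be checked with a little arithmetic. This is 
only a rough sketch, assuming IPv4 (20 bytes), UDP (8 bytes) and a basic 
RTP header (12 bytes), one frame per packet, and no header compression; 
the constant and function names are made up for illustration:

```python
# Assumed header sizes in bytes (IPv4 without options, UDP, RTP without
# CSRC entries or extensions). The link layer is excluded, as in the text.
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
TOTAL_HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def header_bitrate_bps(frame_ms, header_bytes=TOTAL_HEADER_BYTES):
    """Bit-rate spent on headers when one frame is sent per packet."""
    packets_per_second = 1000.0 / frame_ms
    return header_bytes * 8 * packets_per_second


if __name__ == "__main__":
    for frame_ms in (20, 10, 5, 2.5):
        kbps = header_bitrate_bps(frame_ms) / 1000
        print(f"{frame_ms} ms frames: {kbps:.0f} kb/s of headers")
```

Under these assumptions, 5 ms frames give 200 packets per second and 
40 bytes of headers each, i.e. 64 kb/s, while 20 ms frames cost 16 kb/s.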

> 1) If a conference bridge has to decode a large number of voice 
> channels, mix, and re-encode, and if compressed-domain mixing cannot 
> be done (which is usually the case), then it is important to keep the 
> decoder complexity low.

Definitely agree here. The decoder complexity is very important. Not 
only because of the mixing issue, but also because the decoder is generally 
not allowed to take shortcuts to save on complexity (unlike the 
encoder). As for compressed-domain mixing, as you say it is not always 
available, but *if* we can do it (even if only partially), then that can 
result in a "free" reduction in decoder complexity for mixing.

> 2) In topology b) of your other email 
> (IPend-to-transcoding_gateway-to-PSTNend), the transcoding gateway, or 
> VoIP gateway, often has to encode and decode thousands of voice 
> channels in a single box, so not only the computational complexity, 
> but also the per-instance RAM size requirement of the codec become 
> very important for achieving high channel density in the gateway.

Agreed here, although I would say that per-instance RAM -- as long as 
it's reasonable -- is probably a bit less important than complexity.

> 3) Many telephone terminal devices at the edge of the Internet use 
> embedded processors with limited processing power, and the processors 
> also have to handle many tasks other than speech coding. If the IETF 
> codec complexity is too high, some of such devices may not have 
> sufficient processing power to run it. Even if the codec can fit, some 
> battery-powered mobile devices may prefer to run a lower-complexity 
> codec to reduce power consumption and battery drain. For example, even 
> if you make an Internet phone call from a computer, you may like the 
> convenience of using a Bluetooth headset that allows you to walk 
> around a bit and have hands-free operation. Currently most Bluetooth 
> headsets have small form factors with a tiny battery. This puts a 
> severe constraint on power consumption. Bluetooth headset chips 
> typically have very limited processing capability, and they have to 
> handle many other tasks such as echo cancellation and noise reduction. 
> There is just not enough processing power to handle a relatively 
> high-complexity codec. Most BT headsets today rely on the extremely 
> low-complexity, hardware-based CVSD codec at 64 kb/s to transmit 
> narrowband voice, but CVSD has audible coding noise, so it degrades 
> the overall audio quality. If the IETF codec has low enough 
> complexity, it would be possible to directly encode and decode the 
> IETF codec bit-stream at the BT headset, thus avoiding the quality 
> degradation of CVSD transcoding.

Any idea what the complexity requirements would be for this use-case to 
be possible?