Re: [codec] #16: Multicast? (Bluetooth)

"Raymond (Juin-Hwey) Chen" <> Wed, 05 May 2010 01:56 UTC

From: "Raymond (Juin-Hwey) Chen" <>
To: Koen Vos <>
Date: Tue, 04 May 2010 18:55:11 -0700
Thread-Topic: [codec] #16: Multicast? (Bluetooth)
Cc: "" <>
Subject: Re: [codec] #16: Multicast? (Bluetooth)
List-Id: Codec WG <>

Hi Koen,

In the same order as your numbered list below:

(1) True, Bluetooth != Internet for now, but why not look ahead and explore what is possible and would be very good to have in the future?

(2) Your argument here doesn't make sense to me.  For PC-to-PC calls, there is no reason to use an ultra-low-complexity mode, so you don't need to suffer the lower coding efficiency, and therefore your concern is not relevant. For any call that involves a Bluetooth headset, it would be better to use a low-complexity mode of the IETF codec on the Bluetooth headset than to go through CVSD transcoding and suffer the significant quality degradation of CVSD and the additional coding delay due to transcoding.


a) The reason given in (2) is not a valid reason, as I explained above.

b) Didn't you say Moore's Law will take care of that? :-) Furthermore, I am not sure it is necessary for the Bluetooth headset to run the entire VoIP stack.

c) It is not at all clear that doing PLC twice, once on each of the two lossy links, is necessarily better than doing PLC just once after the packets have traversed both lossy links.  It may well be that the latter approach is better, considering that the coding distortion on the second link may go up substantially when the second encoder's input is the PLC output from the first lossy link.

I) Standardizing two separate codecs takes more time and effort and requires transcoding, which increases total coding distortion and total coding delay.

II) For you and some others it is out of scope, but for others it is not.  Different people have different views.  My view is that if we can cover this very useful usage scenario without too much trouble, why leave it out?


-----Original Message-----
From: Koen Vos []
Sent: Saturday, May 01, 2010 1:01 AM
To: Raymond (Juin-Hwey) Chen
Subject: RE: [codec] #16: Multicast? (Bluetooth)

Hi Raymond,

I continue to fail to see the connection between the Internet codec and Bluetooth, for the reasons below.

(1) Bluetooth != Internet:
Bluetooth devices are wireless audio devices, not VoIP end points, and are indeed used mostly for (mobile) PSTN calls.

(2) Diverging requirements:
A codec/mode that meets the BT requirements for ultra-low complexity will have a relatively poor coding efficiency, resulting in lower audio quality and/or a higher bitrate.  Both of these negatively impact the user experience over the Internet.  Therefore, you do not want to run a BT codec over the Internet if you can use a more efficient codec instead.

(3) Transcoding:
Even when using a BT audio device, a well-designed VoIP end point will always transcode between the Internet codec and the BT codec, because:
   a) the reason given in 2) above
   b) the BT device lacks the CPU power and memory to run the entire VoIP stack
   c) it allows for a packet-loss concealment operation in between the two lossy legs of the end-to-end connection.
Note that such transcoding is also standard with DECT devices, where base stations even transcode between G.722 and G.722 (yes: twice the same codec).

In short, there is no benefit from the BT and Internet codecs being modes of one and the same codec.  This complete lack of overlap means:

   I) it is better to standardize two separate codecs
  II) Bluetooth is out of scope for the Internet codec.



Quoting "Raymond (Juin-Hwey) Chen":

> Hi Koen,


> For some reason the SPAM filter accidentally routed your email below (sent last Sunday) to my junk email folder and I just saw it. Sorry about the delay in my response.


> I agree that there are some fundamental differences in the requirements for cellular codecs and Bluetooth codecs, which caused the codecs in these two types of devices to each go their own way. However, these differences are (or can be) substantially smaller between an Internet codec and Bluetooth codecs, so I think it is easier for Internet devices and Bluetooth devices to use the same codec to avoid the additional delay and coding distortion of transcoding.


> (1) Royalty-free requirement:
> Cellular codecs are usually royalty-bearing, and that's acceptable in the cellular world.  Not so with Bluetooth.  Bluetooth devices are meant to be simple and low cost.  As such, the Bluetooth SIG basically only wants to standardize royalty-free technologies.  That's an important reason why they picked the CVSD codec, an old royalty-free technology from 1970.  We are trying to make the IETF codec royalty-free, so this goal is consistent with the Bluetooth SIG's royalty-free requirement for codecs.


> (2) Bit-rate requirement:
> Cellular radio spectrum is a limited, fixed resource that doesn't change with time, and cellular operators spent billions of dollars in radio spectrum auctions. Thus, it is extremely important for cellular codecs to have bit-rates as low as possible, with an average bit-rate often going below 1 bit/sample, to maximize the number of cellular subscribers a given amount of radio spectrum can support.  In contrast, the bit-rate is not nearly as big a concern for Bluetooth. Initially the Bluetooth SIG picked the relatively high-bit-rate 64 kb/s CVSD narrowband codec (8 bits/sample) for its simplicity and royalty-free nature, among other things.  Since the speeds of the Internet backbone and access networks keep growing with time, the bit-rate of an Internet codec is also not nearly as big a concern as for cellular codecs, and an Internet codec at around 2 bits/sample can offer better trade-offs (e.g. higher quality, lower delay, and lower complexity) for Internet applications than what cellular codecs can provide.  Incidentally, the Bluetooth SIG is moving toward 4 bits/sample.  As you can see, in terms of the bit-rate requirement, an Internet codec is much closer to Bluetooth codecs than cellular codecs are.
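[The bits-per-sample figures quoted above follow directly from bit-rate divided by sampling rate. A quick sketch of that arithmetic; the 16 kHz wideband figure for a ~2 bits/sample Internet codec is an illustrative assumption, not from the thread:]

```python
# Illustrative arithmetic for the bits/sample figures discussed above.

def bits_per_sample(bitrate_bps: float, sample_rate_hz: float) -> float:
    """Bits spent per audio sample = bit-rate / sampling rate."""
    return bitrate_bps / sample_rate_hz

# CVSD: 64 kb/s narrowband (8 kHz sampling) -> 8 bits/sample
print(bits_per_sample(64_000, 8_000))  # 8.0

# A hypothetical ~2 bits/sample Internet codec at wideband (16 kHz sampling)
# would need only about 32 kb/s -- half of CVSD's rate at twice the bandwidth:
print(2 * 16_000)  # 32000 bits/s
```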


> (3) Complexity requirement:
> Bluetooth headsets have much lower processing power and much smaller batteries than cell phones. The complexity of cellular codecs, typically in the range of 20 to 40 MHz on a DSP, is too high to fit most Bluetooth headsets. However, unlike cell phones and Bluetooth headsets, where each is a specific type of device with a relatively narrow range of device complexity, Internet voice/audio applications can potentially encompass a large variety of different device types, from desktop computers at the high end with >3 GHz multi-core CPUs to IP phones and possibly even Bluetooth headsets at the low end with a processor of only a few tens of MHz.  It is up to the IETF codec WG to decide how complex the IETF codec should be.  We can standardize just one codec mode that works well for computer-to-computer calls but can't fit in low-end devices, or we can keep that mode but also have a low-complexity mode that can be implemented in low-end devices.  Frankly, I think the second approach makes much more sense, since it allows many more devices to benefit from the IETF codec and enables the large number of Bluetooth headset users to avoid the additional distortion and delay associated with transcoding when making Internet calls.


> (4) Delay requirement:
> Due to the need for cellular codecs to achieve bit-rates as low as possible, they sacrificed coding delay and used a 20 ms frame size, because using a 10 or 5 ms frame size would increase the bit-rate for a given level of speech quality.  On the other hand, a Bluetooth headset needs to have a low delay, since its delay is added to the already long cell phone delay.  For the IETF codec, again it is up to the codec WG to decide what kind of codec delay we want, and again I think it makes sense to have a higher-delay, higher-bit-rate-efficiency mode for bit-rate-sensitive applications and another low-delay mode for delay-sensitive applications, since one size doesn't fit all.  If the IETF codec delay is forced to be one size, the resulting codec will be (potentially very) suboptimal for some applications.


> You wrote:
>> Do you think it's realistic for us to come up with a design that
>> fulfills the needs of both worlds?

> With a one-size-fits-all approach, probably not, but with a multi-mode approach, I think so.


> Best Regards,
>
> Raymond


> -----Original Message-----
> From: Koen Vos []
> Sent: Sunday, April 25, 2010 4:01 AM
> To: Raymond (Juin-Hwey) Chen
> Cc:
> Subject: RE: [codec] #16: Multicast? (Bluetooth)


> Hi Raymond,
>
> You seem to suggest that the IETF Internet codec should fit Bluetooth requirements in order to enable transcoding-free operation all the way from the Internet, through the Internet-connected device, to the BT wireless audio device.

> A similar argument would hold for ITU-T cellular codecs: AMR-WB and G.718 could have been designed with BT as an application.  In reality, these codecs have very little in common with BT codecs, because of the vastly different requirements in terms of
> - complexity
> - memory footprint
> - bitrate
> - scalability
> - bit error robustness
> - packet loss robustness.


> Do you think it's realistic for us to come up with a design that fulfills the needs of both worlds?
>
> The alternative is to separately design codecs for Internet applications and BT devices, and continue the practice of transcoding on the Internet-connected device.  That would have a better chance of maximizing quality in all scenarios.
>
> best,
> koen.



> Quoting "Raymond (Juin-Hwey) Chen":


>> Hi Koen,
>>
>> Responding to your earlier email about the Bluetooth headset application:
>>
>> (1) Although BT SIG standardization is a preferred route, it is technically feasible to negotiate and use a non-Bluetooth-SIG codec.


>> (2) Someone familiar with the BT SIG told me that it would probably take only 6 months to add an optional codec to the BT SIG spec and 12 to 18 months to add a mandatory codec.


>> (3) The IETF codec is scheduled to be finalized in 14 months and submitted to the IESG in 18 months.  Even if we take the BT SIG route and spend 6 to 18 months there, the total time of 2 to 3 years from now means Moore's Law would only increase CPU resources 2X to 3X, and definitely no more than 4X, not 10X.


>> (4) Most importantly, guess what: in the last several years Bluetooth headset chips have been growing their processing power at a MUCH, MUCH slower rate than what Moore's Law says they should.  Sometimes they did not increase the speed at all for years.  The reasons? The ASP (average selling price) of Bluetooth chips plummeted, making it unattractive to invest significant resources in making them significantly faster.  Also, for low-end and mid-range BT headsets, the BT chips were often considered "good enough" and there wasn't a strong drive to increase the computing resources.  In addition, BT headsets got smaller over the last few years; the corresponding reduction in battery size required a reduction in power consumption, which also limited how fast the processor speed could grow.  In the next several years, it is highly likely that the computing capabilities of Bluetooth headset chips will continue to grow at a rate substantially below what's predicted by Moore's Law.


>> (5) Although Bluetooth supports G.711 as an optional codec, basically no one uses it because it is too sensitive to bit errors.  Essentially all the BT mono headsets on the market today are narrowband (8 kHz sampling) headsets using CVSD.  There isn't any real wideband support yet, so your comment about G.722 doesn't apply.  Even after wideband-capable BT headsets come out, for many years to come the majority of BT headsets (especially mid- to low-end) will still be narrowband only, running only CVSD. Hence, the quality degradation of CVSD transcoding is real and will be with us for quite a while, so it is desirable for the IETF codec to have a low-complexity mode that can run directly on BT headsets, to avoid the quality degradation of CVSD when using BT headsets to make Internet phone calls.


>> (6) Even if you could use G.711 or G.722 in BT headsets, they both operate at 64 kb/s.  A low-complexity mode of the IETF codec can operate at half or one quarter of that bit-rate.  This will help conserve BT headsets' radio power because of the lower transmit duty cycle.  It will also help Bluetooth + WiFi co-existence technologies.


>> (7) A lot of people are already used to using Bluetooth headsets to make phone calls today.  If they have a choice, many of these people will also want to use Bluetooth headsets to make Internet phone calls, not only through computers, but also through smart phones connected to WiFi or cellular networks.  As more and more states and countries pass laws banning the use of cell phones that are not in hands-free mode while driving, the number of Bluetooth headset users will only increase with time, and many of them will want to make Internet-based phone calls.


>> Given all the above, I would argue that the Bluetooth headset is a very relevant application that the IETF codec should address with a low-complexity mode.
>>
>> Best Regards,
>>
>> Raymond


>> -----Original Message-----
>> From: [] On Behalf Of Koen Vos
>> Sent: Friday, April 23, 2010 1:16 AM
>> To:
>> Subject: Re: [codec] #16: Multicast?


>> By the time the Bluetooth Special Interest Group will have adopted a future IETF codec standard, Moore's law will surely have multiplied CPU resources in the BT device by one order of magnitude..?  Not sure it makes sense to apply today's BT constraints to tomorrow's codec.


>> I'm not even convinced Bluetooth is a relevant use case for an Internet codec.  BT devices are audio devices more than VoIP end points: BT always connects to the Internet through another device.  You could simply first decode incoming packets and send PCM data to the BT device, or use a high-quality/high-bitrate codec like G.722.  The requirements for BT devices and the Internet are just too different.  Similarly, GSM phones use AMR on the network side and a different codec towards the BT device.  The required transcoding causes no quality problems because BT supports high bitrates.
>>
>> best,
>> koen.



>> Quoting Raymond (Juin-Hwey) Chen:


>>> Hi Christian,
>>>
>>> My comments about your question on CODEC requirements are in-line.
>>>
>>> Raymond


>>> From: [] On Behalf Of Christian Hoene
>>> Sent: Wednesday, April 21, 2010 12:27 PM
>>> To: 'stephen botzko'
>>> Cc:
>>> Subject: Re: [codec] #16: Multicast?


>>> Hi,
>>>
>>> if we take those two scenarios (high quality and scalable teleconferencing), what are then the CODEC requirements?


>>> High quality:
>>>
>>> -          Quite the same requirement as an end-to-end audio transmission: high quality and low latency.
>>> [Raymond]: High quality is a given, but I would like to emphasize the importance of low latency.

>>> (1) It is well known that the longer the latency, the lower the perceived quality of the communication link.  The E-model in ITU-T Recommendation G.107 models such communication quality as MOS_cqe, which among other things depends on the so-called "delay impairment factor" Id.  Basically, MOS_cqe is a monotonically decreasing function of increasing latency, and beyond about 150 ms one-way delay, the perceived quality of the communication link drops rapidly with further delay increase.

>>> (2) The lower the latency, the less audible the echo, and thus the lower the required echo return loss.  Hence, lower latency means easier echo control and a simpler echo canceller, and as people mentioned previously, below a certain delay an echo is simply perceived as a harmless side-tone and no echo canceller is needed. It seems to me that echo control in conference calls is more difficult than in point-to-point calls.  While I hardly ever hear echoes in domestic point-to-point calls, in my experience with conference calls at work, even with the G.711 codec (which has almost no delay), I sometimes still hear echoes (I just heard another one this afternoon).  If a relatively long-delay IETF codec is used, echo control will be even more problematic.

>>> (3) In normal phone calls or conference calls, people routinely need to interrupt each other, but beyond a certain point, long latency makes it very difficult to do so.  This is because when you try to interrupt the other person, that person doesn't hear your interruption until some time later, so he keeps talking; when you hear that he did not stop talking when you interrupted, you stop; then he hears your interruption, so he stops. When you hear him stop, you start talking again, but then he also hears you stop (due to the long delay), so he also starts talking again.  The net result is that with a long latency, when you try to interrupt him, you and he end up stopping and starting at roughly the same time for a few cycles, making it difficult to interrupt each other.

>>> (4) We need to keep in mind that the IETF codec may not be the only codec involved in a phone call or a conference call.  We cannot assume that all conference call participants will be using a computer to conduct the call. Not only do people use cell phones for point-to-point phone calls, they also often use cell phones to call in to conference calls.  The one-way delay for a cell phone call through one carrier's network is typically around 80 to 110 ms.  A call from a cell phone in one carrier network to another cell phone in a different type of carrier network can easily double this delay to 160 ~ 220 ms, making the total one-way delay already far exceed the 150 ms mentioned in (1) above.  Any coding delay added by the IETF codec will be on top of that long delay, and such coding delay will be applied twice when both cell phones call through the IETF codec to a conference bridge.  Even without the IETF codec delay, when I previously called from a Verizon cell phone to an AT&T cell phone, I sometimes already experienced the problem mentioned in (3).  If the IETF codec has a relatively long delay, adding two times the IETF codec one-way delay to the already long delay of 160 ~ 220 ms will make the situation much worse.  Even if just one cell phone is involved in a conference call, adding twice the one-way delay of a relatively long-delay IETF codec can still easily push the total one-way delay beyond 150 ms.

>>> To summarize, my point is that to help reduce potential echo problems and to ensure a high-quality experience in such a conference call, the IETF codec should have a delay as low as possible while maintaining good enough speech quality and a reasonable bit-rate.
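[The delay-budget arithmetic behind points (1) and (4) above can be sketched as follows. The carrier delays are the rough figures quoted in the message; the per-leg codec delay is a hypothetical placeholder, not a measured value:]

```python
# Rough one-way delay budget for two cell phones calling into a
# conference bridge, using the figures quoted above.

E_MODEL_THRESHOLD_MS = 150  # beyond ~150 ms one-way, quality drops fast (G.107)

def total_one_way_delay(carrier_a_ms, carrier_b_ms, codec_one_way_ms):
    # The IETF codec delay is incurred twice: once on each leg
    # between a cell phone and the conference bridge.
    return carrier_a_ms + carrier_b_ms + 2 * codec_one_way_ms

# Two carrier networks at ~100 ms each, plus a hypothetical 30 ms
# codec one-way delay per leg:
delay = total_one_way_delay(100, 100, 30)
print(delay, delay > E_MODEL_THRESHOLD_MS)  # 260 True
```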


>>> -          Maybe additionally: variable bit rate encoding to achieve a multiplexing gain at the receiver
>>>
>>> -          and thus, a fast control loop to cope with variable bitrates on transmission paths.
>>>
>>> -          Maybe stereo/multichannel support to send the spatial audio to the headphones or loudspeakers.


>>> Scalable:
>>>
>>> -          Efficient encoding/transcoding for multiple different qualities (at the conference bridge)
>>> [Raymond]: I am not sure whether by "efficient" you meant coding efficiency or computational efficiency.  In any case, I would like to take this opportunity to express my view that although codec complexity isn't much of an issue for PC-to-PC calls, where there are GHz of processing power available, codec complexity is an important issue in certain application scenarios.  The following are just some examples.

>>> 1) If a conference bridge has to decode a large number of voice channels, mix, and re-encode, and if compressed-domain mixing cannot be done (which is usually the case), then it is important to keep the decoder complexity low.

>>> 2) In topology b) of your other email (IPend-to-transcoding_gateway-to-PSTNend), the transcoding gateway, or VoIP gateway, often has to encode and decode thousands of voice channels in a single box, so not only the computational complexity but also the per-instance RAM size requirement of the codec becomes very important for achieving high channel density in the gateway.

>>> 3) Many telephone terminal devices at the edge of the Internet use embedded processors with limited processing power, and the processors also have to handle many tasks other than speech coding. If the IETF codec complexity is too high, some of these devices may not have sufficient processing power to run it.  Even if the codec can fit, some battery-powered mobile devices may prefer to run a lower-complexity codec to reduce power consumption and battery drain.  For example, even if you make an Internet phone call from a computer, you may like the convenience of using a Bluetooth headset that allows you to walk around a bit and have hands-free operation. Currently most Bluetooth headsets have small form factors with a tiny battery.  This puts a severe constraint on power consumption. Bluetooth headset chips typically have very limited processing capability, and they have to handle many other tasks such as echo cancellation and noise reduction.  There is just not enough processing power to handle a relatively high-complexity codec.  Most BT headsets today rely on the extremely low-complexity, hardware-based CVSD codec at 64 kb/s to transmit narrowband voice, but CVSD has audible coding noise, so it degrades the overall audio quality.  If the IETF codec has low enough complexity, it would be possible to encode and decode the IETF codec bit-stream directly on the BT headset, thus avoiding the quality degradation of CVSD transcoding.

>>> In summary, my point is that the IETF codec should attempt to achieve a codec complexity as low as possible, in both MHz consumption and RAM size requirement, while maintaining good enough speech quality.
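[Point 1) above, the decode/mix/re-encode loop on a bridge, can be sketched minimally; `decode`/`encode` are stand-in placeholders for a real codec, not any particular API:]

```python
# Minimal sketch of the decode -> mix -> re-encode loop on a conference
# bridge, per point 1) above.  Frames are short lists of PCM samples;
# decode/encode are placeholders standing in for a real codec.

FRAME = 4  # samples per frame (tiny, for illustration)

def decode(packet):   # placeholder decoder: packet already holds PCM
    return packet

def encode(frame):    # placeholder encoder
    return frame

def bridge_mix(packets):
    """For each participant, send the sum of everyone else (mix-minus).
    The bridge runs one decode per incoming channel and one encode per
    outgoing channel every frame, which is why per-channel codec
    complexity multiplies by the number of channels."""
    decoded = [decode(p) for p in packets]
    total = [sum(ch[i] for ch in decoded) for i in range(FRAME)]
    outputs = []
    for ch in decoded:
        minus_self = [total[i] - ch[i] for i in range(FRAME)]
        outputs.append(encode(minus_self))
    return outputs

out = bridge_mix([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
print(out)  # [[5, 5, 5, 5], [4, 4, 4, 4], [3, 3, 3, 3]]
```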


>>> -          The control loop must not react (fast), because (multicast) group communication requires encoding at low quality anyhow.
>>>
>>> -          Receiver-side activity detection for music and voice, having low complexity (for the conference bridge)
>>>
>>> -          Efficient mixing of two to four(?) active flows (is this achievable without the complete process of decoding and encoding again?)


>>> Are any teleconferencing requirements missing?
>>>
>>>  Christian





>>> ---------------------------------------------------------------
>>> Dr.-Ing. Christian Hoene
>>> Interactive Communication Systems (ICS), University of Tübingen
>>> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532



>>> From: stephen botzko []
>>> Sent: Wednesday, April 21, 2010 8:19 PM
>>> To: Christian Hoene
>>> Cc:
>>> Subject: Re: [codec] #16: Multicast?


>>> Inline
>>>
>>> On Wed, Apr 21, 2010 at 1:27 PM, Christian Hoene <<>> wrote:
>>> Hi Stephen,
>>>
>>> not too bad. You answered faster than the mailing list distributes...
>>> Not sure how that happened!
>>>
>>> Comments inline:



>>> From: stephen botzko [<>]
>>> Sent: Wednesday, April 21, 2010 7:10 PM
>>> To: Christian Hoene
>>> Cc:<>
>>> Subject: Re: [codec] #16: Multicast?


>>> I agree there are lots of use cases.
>>>
>>> Though I don't see why high quality has to be given up in order to be scalable.
>>> CH: These are just experiences from our lab. A spatial audio conference server including the acoustic 3D sound rendering needs a LOT of processing power. In the end, we have to remain realistic. Processing power is always limited, thus if we need a lot then we cannot serve many clients.

>>> Also, I am not sure why you think central mixing is more scalable than multicast (or why you think it is lower quality either).
>>> CH: With multicast, you need N times 1:N multicast distribution trees (somewhat smaller than O(n)=n²).  With central mixing you need N times 2 transmission paths (O(n)=n). Also, with distributed mixing you need N times the mixing, at each client. With centralized, you can live with one mixing for all (and some tricks for serving the talkers).
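[The scaling comparison in the exchange above can be made concrete with a quick count. The formulas below are a straightforward reading of the O(n²)-vs-O(n) argument, an illustrative model only, assuming all N participants are active:]

```python
# Stream counts for N fully-active participants, following the
# O(n^2)-vs-O(n) argument quoted above (illustrative model only).

def multicast_streams(n: int) -> int:
    # Each of the N senders feeds a 1:(N-1) distribution tree,
    # so up to N*(N-1) receiver-side stream deliveries in total.
    return n * (n - 1)

def central_mix_streams(n: int) -> int:
    # Each participant has one path up to the bridge and one down.
    return 2 * n

for n in (4, 16, 64):
    print(n, multicast_streams(n), central_mix_streams(n))
# 4 12 8
# 16 240 32
# 64 4032 128
```

In practice the gap is smaller, as Stephen notes below: capping the number of simultaneously forwarded active streams bounds the receiver-side cost.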

>>> I agree you need more distribution trees for multicast if you allow every site to talk. There is a corresponding benefit, since there is no central choke point and also less bandwidth on shared WAN links.


>>> In the distributed case,  you don't need an N-way mixer at each

>>> client, and you also don't need to continuously receive payload on

>>> all N streams at each client either.  In practice you can cap N at a

>>> relatively small number (in the 3-8 range) no matter how large the

>>> conference gets.  In a large conference, you can even choose to drop

>>> your comfort noise if you are receiving two or more streams, and

>>> just send enough to keep your firewall pinhole open.  This is all

>>> assuming a suitable voice activity measure in the RTP packet.  Of

>>> course in the worst case, you will receive all N streams.


>>> Cheers,
>>>  Christian
>>>
>>> Stephen Botzko
>>> On Wed, Apr 21, 2010 at 12:58 PM, Christian Hoene <<>> wrote:

>>> Hi,
>>>
>>> the teleconferencing issue gets complex. I am trying to compile the different requirements that have been mentioned on this list.

>>> -          low complexity (with just one active speaker) vs. multiple-speaker mixing vs. spatial audio/stereo mixing
>>>
>>> -          centralized vs. distributed
>>>
>>> -          few participants vs. hundreds of listeners and talkers
>>>
>>> -          individual distribution of audio streams vs. IP multicast or RTP group communication
>>>
>>> -          efficient encoding of multiple streams having the same content (but different quality).
>>>
>>> -          I bet I missed some.


>>> To make things easier, why not split the teleconferencing scenario in two: High Quality and Scalable?

>>> The high quality scenario, intended for a low number of users, could have features like
>>>
>>> -          Distributed processing and mixing
>>>
>>> -          High computational resources to support spatial audio mixing (at the receiver) and multiple encodings of the same audio stream at different qualities (at the sender)
>>>
>>> -          Enough bandwidth to allow direct N-to-N transmissions of audio streams (no multicast or group communication). This would be good for the latency, too.


>>> The scalable scenario is the opposite:
>>>
>>> -          Central processing and mixing for many participants.
>>>
>>> -          N-to-1 and 1-to-N communication using efficient distribution mechanisms (RTP group communication and IP multicast).
>>>
>>> -          Low-complexity mixing of many, using tricks like VAD, encoding at the lowest rate to support many receivers having different paths, you name it...


>>> Then we need not compare apples with oranges all the time.
>>>
>>> Christian


>>> ---------------------------------------------------------------
>>> Dr.-Ing. Christian Hoene
>>> Interactive Communication Systems (ICS), University of Tübingen
>>> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532



>>> From:<> [<>] On Behalf Of stephen botzko
>>> Sent: Wednesday, April 21, 2010 4:34 PM
>>> To: Colin Perkins
>>> Cc:<>;
>>> Subject: Re: [codec] #16: Multicast?


>>> in-line
>>>
>>> Stephen Botzko
>>> On Wed, Apr 21, 2010 at 8:17 AM, Colin Perkins <<>> wrote:
>>> On 21 Apr 2010, at 12:20, Marshall Eubanks wrote:
>>> On Apr 21, 2010, at 6:48 AM, Colin Perkins wrote:
>>> On 21 Apr 2010, at 10:42, codec issue tracker wrote:
>>> #16: Multicast?

>>> ------------------------------------+----------------------------------
>>>  Reporter:  hoene@...               |       Owner:
>>>      Type:  enhancement             |      Status:  new
>>>  Priority:  trivial                 |   Milestone:
>>> Component:  requirements            |     Version:
>>>  Severity:  Active WG Document      |    Keywords:
>>> ------------------------------------+----------------------------------
>>> The question arose whether the interactive CODEC MUST support multicast in addition to teleconferencing.


>>> On 04/13/2010 11:35 AM, Christian Hoene wrote:
>>> P.S. On the same note, does anybody here care about using this CODEC with multicast? Is there a single commercial multicast voice deployment? From what I've seen, all multicast does is make IETF voice standards harder to understand or implement.


>>> I think it would be a mistake to ignore multicast - not because of multicast itself, but because of Xcast (RFC 5058), which is a promising technology to replace centralized conference bridges.


>>> Regarding multicast:
>>>
>>> I think we shall start from user requirements and scenarios. Teleconferencing (including mono or spatial audio) might be a good starting point. Virtual environments like Second Life would require multicast communication, too. If the requirements of these scenarios are well understood, we can start to talk about potential solutions like IP multicast, Xcast, or conference bridges.



>>> RTP is inherently a group communication protocol, and any codec designed for use with RTP should consider operation in various different types of group communication scenarios (not just multicast). RFC 5117 is a good place to start when considering the different types of topology in which RTP is used, and the possible placement of mixing and switching functions which the codec will need to work with.


>>> It is not clear to me what supporting multicast would entail here. If this is a codec over RTP, then what is to stop it from being multicast?


>>> Nothing. However, group conferences implemented using multicast require end-system mixing of potentially large numbers of active audio streams, whereas those implemented using conference bridges do the mixing in a single central location, and generally suppress all but one speaker. The differences in mixing and the number of simultaneous active streams that might be received potentially affect the design of the codec.


>>> Conference bridges with central mixing almost always mix multiple speakers.  As you add more streams into the mix, you reduce the chance of missing onset speech and interruptions, but raise the noise floor. So even if complexity is not a consideration, there is value in gating the mixer (instead of always doing a full mix-minus).


>>> More on point, compressed-domain mixing and easy detection of VAD have both been advocated on these lists, and both simplify the large-scale mixing problem.
>>>
>>> --
>>> Colin Perkins





>>> _______________________________________________
>>> codec mailing list