Re: [codec] #16: Multicast? (Bluetooth)

stephen botzko <stephen.botzko@gmail.com> Fri, 30 April 2010 11:35 UTC

In-Reply-To: <CB68DF4CFBEF4942881AD37AE1A7E8C74B90136FC3@IRVEXCHCCR01.corp.ad.broadcom.com>
References: <062.7439ee5d5fd36480e73548f37cb10207@tools.ietf.org> <r2q6e9223711004211010gfdee1a70q972e8239fef10435@mail.gmail.com> <001101cae177$e8aa6780$b9ff3680$@de> <t2t6e9223711004211119i6b107798pa01fc4b1d33debf1@mail.gmail.com> <002d01cae188$a330b2c0$e9921840$@de> <CB68DF4CFBEF4942881AD37AE1A7E8C74AB3F4A017@IRVEXCHCCR01.corp.ad.broadcom.com> <20100423011559.20246ayxdicd9vzz@mail.skype.net> <CB68DF4CFBEF4942881AD37AE1A7E8C74AB3F4A291@IRVEXCHCCR01.corp.ad.broadcom.com> <20100425040049.69785q4ih4vowtep@mail.skype.net> <CB68DF4CFBEF4942881AD37AE1A7E8C74B90136FC3@IRVEXCHCCR01.corp.ad.broadcom.com>
Date: Fri, 30 Apr 2010 07:34:48 -0400
Message-ID: <n2i6e9223711004300434t5dad346dw5854dd785b55a28e@mail.gmail.com>
From: stephen botzko <stephen.botzko@gmail.com>
To: "Raymond (Juin-Hwey) Chen" <rchen@broadcom.com>
Content-Type: multipart/alternative; boundary="0016e6d97670769f34048572a1c5"
Cc: "codec@ietf.org" <codec@ietf.org>
Subject: Re: [codec] #16: Multicast? (Bluetooth)
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Apr 2010 11:35:44 -0000

Though the Bluetooth angle is interesting, it is clearly out-of-scope for
this WG.

Of course Bluetooth SIG could pick up CODEC later on if they think it meets
their requirements.

Stephen Botzko

On Thu, Apr 29, 2010 at 6:14 PM, Raymond (Juin-Hwey) Chen <
rchen@broadcom.com> wrote:

> Hi Koen,
>
> For some reason the SPAM filter accidentally routed your email below (sent
> last Sunday) to my junk email folder and I just saw it. Sorry about the
> delay of my response.
>
> I agree that there are some fundamental differences in the requirements for
> cellular codecs and Bluetooth codecs which caused the codecs in these two
> types of devices to each go their own way.  However, these differences are
> (or can be) substantially smaller between an Internet codec and Bluetooth
> codecs, so I think it is easier for Internet devices and Bluetooth devices
> to use the same codec to avoid the additional delay and coding distortion of
> transcoding.
>
> (1) Royalty-free requirement:
> Cellular codecs are usually royalty-bearing, and that's acceptable in the
> cellular world.  Not so with Bluetooth.  Bluetooth devices are meant to be
> simple and low cost.  As such, Bluetooth SIG basically only wants to
> standardize royalty-free technologies.  That's an important reason why they
> picked the CVSD codec, an old royalty-free technology dating from 1970.  We are trying
> to make the IETF codec royalty-free, so in this regard this goal is
> consistent with the Bluetooth SIG's royalty-free requirement for codec.
>
> (2) Bit-rate requirement:
> Cellular radio spectrum is a limited, fixed resource that doesn't change
> with time, and cellular operators spent billions of dollars in radio
> spectrum auctions. Thus, it is extremely important for cellular codecs to
> have bit-rates as low as possible, with an average bit-rate often going
> below 1 bit/sample, to maximize the number of cellular subscribers a given
> amount of radio spectrum can support.  In contrast, the bit-rate is not
> nearly as big a concern for Bluetooth. Initially Bluetooth SIG picked the
> relatively high-bit-rate 64 kb/s CVSD narrowband codec (8 bits/sample) for
> its simplicity and royalty-free nature among other things.  Since the speeds
> of the Internet backbone and access networks keep growing with time, the
> bit-rate of an Internet codec is also not nearly as big a concern as in
> cellular codecs, and an Internet codec around 2 bits/sample can have better
> trade-offs (e.g. higher quality, lower delay, and lower complexity) for
> Internet applications than what cellular codecs can provide.  Incidentally,
> Bluetooth SIG is moving toward 4 bits/sample.  As you can see, in terms of
> the bit-rate requirement, an Internet codec is much closer to Bluetooth
> codecs than cellular codecs are.
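
A minimal sketch of the bits/sample arithmetic in point (2). The 64 kb/s
CVSD figure and the 8 kHz narrowband sampling rate come from the thread; the
6 kb/s "cellular average" is a hypothetical value chosen only to illustrate
a rate below 1 bit/sample.

```python
def bits_per_sample(bit_rate_bps: float, sample_rate_hz: float) -> float:
    """Average number of coded bits spent on each audio sample."""
    return bit_rate_bps / sample_rate_hz


def bit_rate_bps(bits_per_sample_value: float, sample_rate_hz: float) -> float:
    """Bit-rate implied by a bits/sample budget at a given sampling rate."""
    return bits_per_sample_value * sample_rate_hz


NARROWBAND_HZ = 8_000  # 8 kHz sampling, as stated for narrowband CVSD

# CVSD as described in the thread: 64 kb/s narrowband.
cvsd = bits_per_sample(64_000, NARROWBAND_HZ)

# Hypothetical cellular average rate (assumption, for illustration only).
cellular = bits_per_sample(6_000, NARROWBAND_HZ)

# Bit-rates implied by the 2 and 4 bits/sample figures at narrowband.
internet_rate = bit_rate_bps(2, NARROWBAND_HZ)
bluetooth_rate = bit_rate_bps(4, NARROWBAND_HZ)

print(f"CVSD: {cvsd} bits/sample")
print(f"hypothetical cellular average: {cellular} bits/sample")
print(f"2 bits/sample -> {internet_rate / 1000} kb/s")
print(f"4 bits/sample -> {bluetooth_rate / 1000} kb/s")
```

At the same sampling rate, the 2 bits/sample target sits a factor of two
from the 4 bits/sample Bluetooth direction, and further from a sub-1
bit/sample cellular average.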
>
> (3) Complexity requirement:
> Bluetooth headsets have much lower processing power and much smaller
> batteries than cell phones. The complexity of cellular codecs, typically in
> the range of 20 to 40 MHz on a DSP, is too high to fit most Bluetooth
> headsets. However, unlike cell phones and Bluetooth headsets where each is a
> specific type of device with a relatively narrow range of device complexity,
> Internet voice/audio applications can potentially encompass a large variety
> of different device types, from desktop computers at the high end with > 3
> GHz multi-core CPU to IP phones and possibly even Bluetooth headsets at the
> low end with a processor of only a few tens of MHz.  It is up to the IETF
> codec WG how we want the complexity of the IETF codec to be.  We can
> standardize just one codec mode that works well for computer-to-computer
> calls but can't fit in low-end devices, or we can keep that mode but also
> have a low-complexity mode that can be implemented in low-end devices.
>  Frankly, I think the second approach makes much more sense since it allows
> many more devices to benefit from the IETF codec and enables the large
> number of Bluetooth headset users to avoid the additional distortion and
> delay associated with transcoding when making Internet calls.
>
> (4) Delay requirement: Due to the need for cellular codecs to achieve
> bit-rates as low as possible, they sacrificed the coding delay and used a 20
> ms frame size, because using a 10 or 5 ms frame size would increase the
> bit-rate for a given level of speech quality.  On the other hand, a
> Bluetooth headset needs to have a low delay since its delay is added to the
> already long cell phone delay.  For the IETF codec, again it is up to the
> codec WG to decide what kind of codec delay we want, and again I think it
> makes sense to have a higher-delay, higher bit-rate efficiency mode for
> bit-rate-sensitive applications and another low-delay mode for
> delay-sensitive applications, since one size doesn't fit all.  If the IETF
> codec delay is forced to be one size, the resulting codec will be
> (potentially very) suboptimal for some applications.
>
> You wrote:
> > Do you think it's realistic for us to come up with a design that
> > fulfills the needs of both worlds?
>
> With a one-size-fits-all approach, probably not, but with a multi-mode
> approach, I think so.
>
> Best Regards,
>
> Raymond
>
> -----Original Message-----
> From: Koen Vos [mailto:koen.vos@skype.net]
> Sent: Sunday, April 25, 2010 4:01 AM
> To: Raymond (Juin-Hwey) Chen
> Cc: codec@ietf.org
> Subject: RE: [codec] #16: Multicast? (Bluetooth)
>
> Hi Raymond,
>
> You seem to suggest that the IETF Internet codec should fit Bluetooth
> requirements in order to enable transcoding-free operation all the way
> from the Internet, through the Internet-connected device, to the BT
> wireless audio device.
>
> A similar argument would hold for ITU-T cellular codecs: AMR-WB and
> G.718 could have been designed with BT as an application.  In reality,
> these codecs have very little in common with BT codecs, because of the
> vastly different requirements in terms of
> - complexity
> - memory footprint
> - bitrate
> - scalability
> - bit error robustness
> - packet loss robustness.
>
> Do you think it's realistic for us to come up with a design that
> fulfills the needs of both worlds?
>
> The alternative is to separately design codecs for Internet
> applications and BT devices, and continue the practice of transcoding
> on the Internet-connected device.  That would have a better chance of
> maximizing quality in all scenarios.
>
> best,
> koen.
>
>
> Quoting "Raymond (Juin-Hwey) Chen":
>
> > Hi Koen,
> >
> > Responding to your earlier email about Bluetooth headset application:
> >
> > (1) Although BT SIG standardization is a preferred route, it is
> > technically feasible to negotiate and use a non-Bluetooth-SIG codec.
> >
> > (2) Someone familiar with BT SIG told me that it would probably take
> > only 6 months to add an optional codec to the BT SIG spec and 12 to
> > 18 months to add a mandatory codec.
> >
> > (3) The IETF codec is scheduled to be finalized in 14 months and
> > submitted to IESG in 18 months.  Even if we take the BT SIG route
> > and spend 6 to 18 months there, the total time of 2 to 3 years from
> > now means Moore's Law would only increase the CPU resources 2X
> > to 3X, and definitely no more than 4X, not 10X.
> >
> > (4) Most importantly, guess what, in the last several years the
> > Bluetooth headset chips have been growing their processing power at a
> > MUCH, MUCH slower rate than what Moore's Law says they should.
> > Sometimes they did not increase the speed at all for years.  The
> > reasons? The ASP (average sale price) of Bluetooth chips plummeted
> > very badly, making it unattractive to invest significant resources
> > to make them significantly faster.  Also, for low-end and mid-end BT
> > headsets, the BT chips were often considered "good enough" and there
> > wasn't a strong drive to increase the computing resources.  In
> > addition, the BT headsets got smaller over the last few years; the
> > corresponding reduction in battery size required a reduction in
> > power consumption, which also limited how fast the processor speed
> > could grow.  In the next several years, it is highly likely that the
> > computing capabilities of Bluetooth headset chips will continue to
> > grow at a rate substantially below what's predicted by Moore's
> > Law.
> >
> > (5) Although Bluetooth supports G.711 as an optional codec,
> > basically no one uses it because it is too sensitive to bit errors.
> > Essentially all the BT mono headsets on the market today are
> > narrowband (8 kHz sampling) headsets using CVSD.  There isn't any
> > real wideband support yet, so your comment about G.722 doesn't
> > apply.  Even after wideband-capable BT headsets come out, for many
> > years to come the majority of the BT headsets (especially mid- to
> > low-end) will still be narrowband only, running only CVSD. Hence,
> > the quality degradation of the CVSD transcoding is real and will be
> > with us for quite a while, so it is desirable for the IETF codec to
> > have a low-complexity mode that can directly run on the BT headsets
> > to avoid the quality degradation of CVSD when using BT headsets to
> > make Internet phone calls.
> >
> > (6) Even if you could use G.711 or G.722 in the BT headsets, they
> > both operate at 64 kb/s.  A low-complexity mode of the IETF codec
> > can operate at half or one quarter of that bit-rate.  This will help
> > conserve BT headsets' radio power because of the lower transmit duty
> > cycle.  It will also help the Bluetooth + WiFi co-existence
> > technologies.
> >
> > (7) Already a lot of people are used to using Bluetooth headsets to
> > make phone calls today.  If they have a choice, many of these people
> > will also want to use Bluetooth headsets to make Internet phone
> > calls, not only through computers, but also through smart phones
> > connected to WiFi or cellular networks.  As more and more states and
> > countries pass laws to ban the use of cell phones that are not in
> > hands-free mode while driving, the number of Bluetooth headset users
> > will only increase with time, and many of them will want to make
> > Internet-based phone calls.
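
A rough check of the Moore's Law estimate in point (3). The doubling
periods of 18 and 24 months are assumptions (the two figures commonly
quoted); the thread itself only gives the 2 to 3 year horizon and the
2X to 4X versus 10X comparison.

```python
import math


def growth_factor(years: float, doubling_period_years: float) -> float:
    """Resource multiplier after `years` if capacity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


def years_to_reach(factor: float, doubling_period_years: float) -> float:
    """Time needed to multiply capacity by `factor`."""
    return doubling_period_years * math.log2(factor)


# Assumed doubling periods: 18 months and 24 months.
for period in (1.5, 2.0):
    after_2y = growth_factor(2, period)
    after_3y = growth_factor(3, period)
    to_10x = years_to_reach(10, period)
    print(
        f"doubling every {period} y: "
        f"{after_2y:.2f}X after 2 y, {after_3y:.2f}X after 3 y, "
        f"10X after {to_10x:.1f} y"
    )
```

Under either assumption, an order-of-magnitude increase takes noticeably
longer than the 2 to 3 years discussed.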
> >
> > Given all the above, I would argue that Bluetooth headset is a very
> > relevant application that the IETF codec should address with a
> > low-complexity mode.
> >
> > Best Regards,
> >
> > Raymond
> >
> > -----Original Message-----
> > From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On
> > Behalf Of Koen Vos
> > Sent: Friday, April 23, 2010 1:16 AM
> > To: codec@ietf.org
> > Subject: Re: [codec] #16: Multicast?
> >
> > By the time the BlueTooth Special Interest Group will have adopted a
> > future IETF codec standard, Moore's law will surely have multiplied
> > CPU resources in the BT device by one order of magnitude..?  Not sure
> > it makes sense to apply today's BT constraints to tomorrow's codec.
> >
> > I'm not even convinced BlueTooth is a relevant use case for an
> > Internet codec.  BT devices are audio devices more than VoIP end
> > points: BT always connects to the Internet through another device.
> > You could simply first decode incoming packets and send PCM data to
> > the BT device, or use a high-quality/high-bitrate codec like G.722.
> > The requirements for BT devices and the Internet are just too
> > different.  Similarly, GSM phones use AMR on the network side and a
> > different codec towards the BT device.  The required transcoding
> > causes no quality problems because BT supports high bitrates.
> >
> > best,
> > koen.
> >
> >
> > Quoting Raymond (Juin-Hwey) Chen:
> >
> >> Hi Christian,
> >>
> >> My comments about your question of CODEC requirements are in-line.
> >>
> >> Raymond
> >>
> >> From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On
> >> Behalf Of Christian Hoene
> >> Sent: Wednesday, April 21, 2010 12:27 PM
> >> To: 'stephen botzko'
> >> Cc: codec@ietf.org
> >> Subject: Re: [codec] #16: Multicast?
> >>
> >> Hi,
> >>
> >> if we take those two scenarios (high quality and scalable
> >> teleconferencing), what are then the CODEC requirements?
> >>
> >> High quality:
> >>
> >> -          Quite the same requirement as an end-to-end audio
> >> transmission: high quality and low latency.
> >> [Raymond]: High quality is a given, but I would like to emphasize
> >> the importance of low latency.
> >> (1) It is well-known that the longer the latency, the lower the
> >> perceived quality of the communication link.  The E-model in the
> >> ITU-T Recommendation G.107 models such communication quality in
> >> MOS_cqe, which among other things depends on the so-called "delay
> >> impairment factor" Id.  Basically, MOS_cqe is a monotonically
> >> decreasing function of increasing latency, and beyond about 150 ms
> >> one-way delay, the perceived quality of the communication link drops
> >> rapidly with further delay increase.
> >> (2) The lower the latency, the less audible the echo, and thus the
> >> lower the required echo return loss.  Hence, lower latency means
> >> easier echo control and simpler echo canceller, and as people
> >> already mentioned previously, below a certain delay, an echo is
> >> simply perceived as a harmless side-tone and no echo canceller is
> >> needed. It seems to me that echo control in conference calls is more
> >> difficult than in point-to-point calls.  While I hardly ever heard
> >> echoes in domestic point-to-point calls, in my experience with
> >> conference calls at work, even with the G.711 codec (which has
> >> almost no delay), sometimes I still hear echoes (I just heard
> >> another one this afternoon).  If a relatively long-delay IETF codec
> >> is used, the echo control will be even more problematic.
> >> (3) In normal phone calls or conference calls, people routinely have
> >> a need to interrupt each other, but beyond a certain point, long
> >> latency makes it very difficult for people to interrupt each other
> >> on the call.  This is because when you try to interrupt another
> >> person, that person doesn't hear your interruption until a certain
> >> time later, so he keeps talking, but when you hear that he did not
> >> stop talking when you interrupted, you stop; then, he hears your
> >> interruption, so he stops. When you hear he stops, you start talking
> >> again, but then he also hears you stopped (due to the long delay),
> >> so he also starts talking again.  The net result is that with a long
> >> latency, when you try to interrupt him, you and he end up stopping
> >> and starting at roughly the same time for a few cycles, making it
> >> difficult to interrupt each other.
> >> (4) We need to keep in mind that the IETF codec may not be the only
> >> codec involved in a phone call or a conference call.  We cannot
> >> assume that all conference call participants will be using a
> >> computer to conduct the call. Not only do people use cell phones for
> >> point-to-point phone calls, they also often use cell phones to call
> >> in to conference calls.  The one-way delay for a cell phone call
> >> through one carrier's network is typically around 80 to 110 ms.  A
> >> call from a cell phone in a carrier network to another cell phone in
> >> a different type of carrier network can easily double this delay to
> >> 160 ~ 220 ms, so the total one-way delay already far exceeds
> >> the 150 ms mentioned in (1) above.  Any coding delay added by the
> >> IETF codec will be on top of that long delay, and such coding delay
> >> will be applied twice when both cell phones call through the IETF
> >> codec to a conference bridge.  Even without the IETF codec delay,
> >> when I previously called from a Verizon cell phone to an AT&T cell
> >> phone, I already experienced the problem mentioned in (3) sometimes.
> >>  If the IETF codec has a relatively long delay, adding two times the
> >> IETF codec one-way delay to the already long delay of 160 ~ 220 ms
> >> will make the situation much worse.  Even if just one cell phone is
> >> involved in a conference call, adding twice the one-way delay of a
> >> relatively long-delay IETF codec can still easily push the total
> >> one-way delay beyond 150 ms.
> >> To summarize, my point is that to help reduce potential echo
> >> problems and to ensure a high-quality experience in such a
> >> conference call, the IETF codec should have a delay as low as
> >> possible while maintaining good enough speech quality and a
> >> reasonable bit-rate.
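
The delay budget in point (4) can be sketched as follows. The 80 to 110 ms
per-carrier delay, the 150 ms figure, and the "codec delay applied twice
through a conference bridge" rule come from the text above; the 5 ms and
40 ms codec delays are hypothetical values for a low-delay and a
long-delay design.

```python
QUALITY_KNEE_MS = 150  # one-way delay beyond which quality drops rapidly, per (1)


def total_one_way_delay_ms(carrier_delays_ms, codec_delay_ms, codec_passes=2):
    """Sum the carrier delays and add the codec delay once per codec pass.

    codec_passes defaults to 2 because a call through a conference bridge
    is encoded and decoded once on each leg.
    """
    return sum(carrier_delays_ms) + codec_passes * codec_delay_ms


def exceeds_knee(delay_ms):
    """True when the one-way delay is past the 150 ms point."""
    return delay_ms > QUALITY_KNEE_MS


scenarios = {
    "two cell phones, best case": [80, 80],
    "two cell phones, worst case": [110, 110],
    "one cell phone, best case": [80],
    "one cell phone, worst case": [110],
}

for codec_delay_ms in (5, 40):  # hypothetical codec one-way delays
    for name, carriers in scenarios.items():
        total = total_one_way_delay_ms(carriers, codec_delay_ms)
        flag = "over" if exceeds_knee(total) else "within"
        print(f"codec {codec_delay_ms} ms, {name}: {total} ms ({flag} 150 ms)")
```

With these assumed numbers, a single cell phone leg stays within 150 ms
only for the low-delay codec.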
> >>
> >> -          Maybe additionally: variable bit rate encoding to achieve
> >> a multiplexing gain at the receiver
> >>
> >> -          and thus, a fast control loop to cope with variable
> >> bitrates on transmission paths.
> >>
> >> -          Maybe stereo/multichannel support to send the spatial
> >> audio to the headphone or loudspeakers.
> >>
> >> Scalable:
> >>
> >> -          Efficient encoding/transcoding for multiple different
> >> qualities (at the conference bridge)
> >> [Raymond]: I am not sure whether by "efficient", you meant coding
> >> efficiency or computational efficiency.  In any case, I would like
> >> to take this opportunity to express my view that although codec
> >> complexity isn't much of an issue for PC-to-PC calls where there are
> >> GHz of processing power available, the codec complexity is an
> >> important issue in certain application scenarios.  The following are
> >> just some examples.
> >> 1) If a conference bridge has to decode a large number of voice
> >> channels, mix, and re-encode, and if compressed-domain mixing cannot
> >> be done (which is usually the case), then it is important to keep
> >> the decoder complexity low.
> >> 2) In topology b) of your other email
> >> (IPend-to-transcoding_gateway-to-PSTNend), the transcoding gateway,
> >> or VoIP gateway, often has to encode and decode thousands of voice
> >> channels in a single box, so not only the computational complexity,
> >> but also the per-instance RAM size requirement of the codec become
> >> very important for achieving high channel density in the gateway.
> >> 3) Many telephone terminal devices at the edge of the Internet use
> >> embedded processors with limited processing power, and the
> >> processors also have to handle many tasks other than speech coding.
> >> If the IETF codec complexity is too high, some of such devices may
> >> not have sufficient processing power to run it.  Even if the codec
> >> can fit, some battery-powered mobile devices may prefer to run a
> >> lower-complexity codec to reduce power consumption and battery
> >> drain.  For example, even if you make an Internet phone call from a
> >> computer, you may like the convenience of using a Bluetooth headset
> >> that allows you to walk around a bit and have hands-free operation.
> >> Currently most Bluetooth headsets have small form factors with a
> >> tiny battery.  This puts a severe constraint on power consumption.
> >> Bluetooth headset chips typically have very limited processing
> >> capability, and they have to handle many other tasks such as echo
> >> cancellation and noise reduction.  There is just not enough
> >> processing power to handle a relatively high-complexity codec.  Most
> >> BT headsets today rely on the extremely low-complexity,
> >> hardware-based CVSD codec at 64 kb/s to transmit narrowband voice,
> >> but CVSD has audible coding noise, so it degrades the overall audio
> >> quality.  If the IETF codec has low enough complexity, it would be
> >> possible to directly encode and decode the IETF codec bit-stream at
> >> the BT headset, thus avoiding the quality degradation of CVSD
> >> transcoding.
> >> In summary, my point is that the IETF codec should attempt to
> >> achieve a codec complexity as low as possible in both MHz
> >> consumption and RAM size requirement while maintaining good enough
> >> speech quality.
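
A small sketch of the gateway channel-density argument in example 2): the
number of channels a box can carry is limited by whichever runs out first,
MHz or per-instance RAM. All budgets and per-channel costs below are
hypothetical and are not measurements of any real codec or DSP.

```python
def max_channels(cpu_mhz_budget, ram_bytes_budget,
                 mhz_per_channel, ram_bytes_per_channel):
    """Channels supported when both CPU and RAM must fit every instance."""
    by_cpu = int(cpu_mhz_budget // mhz_per_channel)
    by_ram = int(ram_bytes_budget // ram_bytes_per_channel)
    return min(by_cpu, by_ram)


# Hypothetical gateway DSP: 1000 MHz and 2 MiB of RAM for codec state.
CPU_MHZ = 1000
RAM_BYTES = 2 * 1024 * 1024

# Hypothetical codec A: heavy on MHz, light on RAM.
codec_a = max_channels(CPU_MHZ, RAM_BYTES, mhz_per_channel=20,
                       ram_bytes_per_channel=8 * 1024)

# Hypothetical codec B: light on MHz, heavy on RAM.
codec_b = max_channels(CPU_MHZ, RAM_BYTES, mhz_per_channel=5,
                       ram_bytes_per_channel=32 * 1024)

print(f"codec A: {codec_a} channels (CPU-bound)")
print(f"codec B: {codec_b} channels (RAM-bound)")
```

Codec B needs a quarter of the MHz yet gains far less than 4X in density
here, because RAM becomes the limit.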
> >>
> >> -          The control loop must not react (fast) because
> >> (multicast) group communication requires encoding at low quality
> >> anyhow.
> >>
> >> -          Receiver side activity detection for music and voice
> >> having low complexity (for the conference bridge)
> >>
> >> -          Efficient mixing of two to four(?) active flows (is this
> >> achievable without the complete process of decoding and encoding
> >> again?)
> >>
> >> Are any teleconferencing requirements missing?
> >>
> >>  Christian
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------
> >> Dr.-Ing. Christian Hoene
> >> Interactive Communication Systems (ICS), University of Tübingen
> >> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
> >> http://www.net.uni-tuebingen.de/
> >>
> >> From: stephen botzko [mailto:stephen.botzko@gmail.com]
> >> Sent: Wednesday, April 21, 2010 8:19 PM
> >> To: Christian Hoene
> >> Cc: codec@ietf.org
> >> Subject: Re: [codec] #16: Multicast?
> >>
> >> Inline
> >> On Wed, Apr 21, 2010 at 1:27 PM, Christian Hoene
> >> <hoene@uni-tuebingen.de<mailto:hoene@uni-tuebingen.de>> wrote:
> >> Hi Stephen,
> >>
> >> not too bad. You answered faster than the mailing list distributes...
> >> Not sure how that happened!
> >>
> >> Comments inline:
> >>
> >>
> >> From: stephen botzko
> >> [mailto:stephen.botzko@gmail.com<mailto:stephen.botzko@gmail.com>]
> >> Sent: Wednesday, April 21, 2010 7:10 PM
> >> To: Christian Hoene
> >> Cc: codec@ietf.org<mailto:codec@ietf.org>
> >>
> >> Subject: Re: [codec] #16: Multicast?
> >>
> >> I agree there are lots of use cases.
> >>
> >>
> >> Though I don't see why high quality has to be given up in order to
> >> be scalable.
> >> CH: These are just experiences from our lab. A spatial audio
> >> conference server including the acoustic 3D sound rendering needs a
> >> LOT of processing power. In the end, we have to remain realistic.
> >> Processing power is always limited thus if we need a lot then we
> >> cannot serve many clients.
> >> Also, I am not sure why you think central mixing is more scalable
> >> than multicast (or why you think it is lower quality either).
> >> CH: With multicast, you need N times 1:N multicast distribution
> >> trees (somewhat smaller than O(n²)).  With central mixing you need N
> >> times 2 transmission paths (O(n)). Also, with distributed mixing
> >> you need N times the mixing at each client. With centralized, you
> >> can live with one mixing for all (and some tricks for serving the
> >> talkers).
> >> I agree you need more distribution trees for multicast if you allow
> >> every site to talk. There is a corresponding benefit, since there is
> >> no central choke point and also less bandwidth on shared WAN links.
> >>
> >> In the distributed case,  you don't need an N-way mixer at each
> >> client, and you also don't need to continuously receive payload on
> >> all N streams at each client either.  In practice you can cap N at a
> >> relatively small number (in the 3-8 range) no matter how large the
> >> conference gets.  In a large conference, you can even choose to drop
> >> your comfort noise if you are receiving two or more streams, and
> >> just send enough to keep your firewall pinhole open.  This is all
> >> assuming a suitable voice activity measure in the RTP packet.  Of
> >> course in the worst case, you will receive all N streams.
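
The scaling comparison in this exchange can be sketched as simple counts.
The function names are hypothetical, and counting one delivery per
source/receiver pair is an assumption that ignores link sharing inside the
multicast trees (which is why the real cost is "somewhat smaller").

```python
def multicast_deliveries(n: int) -> int:
    """N sources, each delivered to the other N - 1 participants."""
    return n * (n - 1)


def central_paths(n: int) -> int:
    """One uplink and one downlink per participant to the central mixer."""
    return 2 * n


def streams_mixed_per_client(n: int, cap: int) -> int:
    """Streams a client mixes when active streams are capped (3-8 suggested)."""
    return min(n - 1, cap)


for n in (4, 10, 100):
    print(
        f"N={n}: multicast deliveries={multicast_deliveries(n)}, "
        f"central paths={central_paths(n)}, "
        f"streams mixed per client (cap 8)={streams_mixed_per_client(n, 8)}"
    )
```

The cap keeps per-client mixing bounded as N grows, although the number of
distribution trees still grows with N.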
> >>
> >> Cheers,
> >>  Christian
> >>
> >> Stephen Botzko
> >> On Wed, Apr 21, 2010 at 12:58 PM, Christian Hoene
> >> <hoene@uni-tuebingen.de<mailto:hoene@uni-tuebingen.de>> wrote:
> >> Hi,
> >>
> >> the teleconferencing issue gets complex. I am trying to compile the
> >> different requirements that have been mentioned on this list.
> >>
> >> -          low complexity (with just one active speaker) vs.
> >> multiple speaker mixing vs. spatial audio/stereo mixing
> >>
> >> -          centralized vs. distributed
> >>
> >> -          few participants vs. hundreds of listeners and talkers
> >>
> >> -          individual distribution of audio streams vs. IP multicast
> >> or RTP group communication
> >>
> >> -          efficient encoding of multiple streams having the same
> >> content (but different quality).
> >>
> >> -           I bet I missed some.
> >>
> >> To make things easier, why not split the teleconferencing
> >> scenario in two: High quality and Scalable?
> >>
> >> The high quality scenario, intended for a low number of users, could
> >> have features like
> >>
> >> -          Distributed processing and mixing
> >>
> >> -          High computational resources to support spatial audio
> >> mixing (at the receiver) and multiple encodings of the same audio
> >> stream at different qualities (at the sender)
> >>
> >> -          Enough bandwidth to allow direct N to N transmissions of
> >> audio streams (no multicast or group communication). This would be
> >> good for the latency, too.
> >>
> >> The scalable scenario is the opposite:
> >>
> >> -          Central processing and mixing for many participants .
> >>
> >> -          N to 1 and 1 to N communication using efficient
> >> distribution mechanisms (RTP group communication and IP multicast).
> >>
> >> -          Low complexity mixing of many using tricks like VAD,
> >> encoding at lowest rate to support many receivers having different
> >> paths, you name it...
> >>
> >> Then, we need not compare apples with oranges all the time.
> >>
> >> Christian
> >>
> >> ---------------------------------------------------------------
> >> Dr.-Ing. Christian Hoene
> >> Interactive Communication Systems (ICS), University of Tübingen
> >> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
> >> http://www.net.uni-tuebingen.de/
> >>
> >> From: codec-bounces@ietf.org<mailto:codec-bounces@ietf.org>
> >> [mailto:codec-bounces@ietf.org<mailto:codec-bounces@ietf.org>] On
> >> Behalf Of stephen botzko
> >> Sent: Wednesday, April 21, 2010 4:34 PM
> >> To: Colin Perkins
> >> Cc: trac@tools.ietf.org<mailto:trac@tools.ietf.org>;
> >> codec@ietf.org<mailto:codec@ietf.org>
> >> Subject: Re: [codec] #16: Multicast?
> >>
> >> in-line
> >>
> >> Stephen Botzko
> >> On Wed, Apr 21, 2010 at 8:17 AM, Colin Perkins
> >> <csp@csperkins.org<mailto:csp@csperkins.org>> wrote:
> >> On 21 Apr 2010, at 12:20, Marshall Eubanks wrote:
> >> On Apr 21, 2010, at 6:48 AM, Colin Perkins wrote:
> >> On 21 Apr 2010, at 10:42, codec issue tracker wrote:
> >> #16: Multicast?
> >> ------------------------------------+----------------------------------
> >> Reporter:  hoene@...                 |       Owner:
> >>  Type:  enhancement             |      Status:  new
> >> Priority:  trivial                 |   Milestone:
> >> Component:  requirements            |     Version:
> >> Severity:  Active WG Document      |    Keywords:
> >> ------------------------------------+----------------------------------
> >> The question arose whether the interactive CODEC MUST support
> >> multicast in addition to teleconferencing.
> >>
> >> On 04/13/2010 11:35 AM, Christian Hoene wrote:
> >> P.S. On the same note, does anybody here care about using this
> >> CODEC with multicast? Is there a single commercial multicast voice
> >> deployment? From what I've seen all multicast does is making IETF
> >> voice standards harder to understand or implement.
> >>
> >> I think it would be a mistake to ignore multicast - not because of
> >> multicast itself, but because of Xcast (RFC 5058) which is a
> >> promising technology to replace centralized conference bridges.
> >>
> >> Regarding multicast:
> >>
> >> I think we shall start at user requirements and scenarios.
> >> Teleconference (including mono or spatial audio) might be a good
> >> starting point. Virtual environments like Second Life would require
> >> multicast communication, too. If the requirements of these scenarios
> >> are well understood, we can start to talk about potential solutions
> >> like IP multicast, Xcast or conference bridges.
> >>
> >>
> >> RTP is inherently a group communication protocol, and any codec
> >> designed for use with RTP should consider operation in various
> >> different types of group communication scenario (not just
> >> multicast). RFC 5117 is a good place to start when considering the
> >> different types of topology in which RTP is used, and the possible
> >> placement of mixing and switching functions which the codec will
> >> need to work with.
> >>
> >> It is not clear to me what supporting multicast would entail here.
> >> If this is a codec over RTP, then what is to stop it from being
> >> multicast ?
> >>
> >> Nothing. However group conferences implemented using multicast
> >> require end system mixing of potentially large numbers of active
> >> audio streams, whereas those implemented using conference bridges do
> >> the mixing in a single central location, and generally suppress all
> >> but one speaker. The differences in mixing and the number of
> >> simultaneous active streams that might be received potentially
> >> affect the design of the codec.
> >>
> >> Conference bridges with central mixing almost always mix multiple
> >> speakers.  As you add more streams into the mix, you reduce the
> >> chance of missing onset speech and interruptions, but raise the
> >> noise floor. So even if complexity is not a consideration, there is
> >> value in gating the mixer (instead of always doing a full mix-minus).
> >>
> >> More on point, compressed domain mixing and easy detection of VAD
> >> have both been advocated on these lists, and both simplify the
> >> large-scale mixing problem.
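
A minimal sketch of a gated mix-minus, assuming decoded PCM frames as lists
of floats. The energy-based activity measure, the threshold, and the limit
on active talkers are illustrative assumptions, not a description of any
particular conference bridge.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def gated_mix_minus(frames, max_active=3, energy_threshold=1e-4):
    """Mix only the loudest active talkers, excluding each listener's own audio.

    frames: dict mapping participant id -> list of float samples; all frames
    are assumed to have the same length.
    Returns (active_talkers, outputs), where outputs maps each participant
    to the mixed frame sent to that participant.
    """
    if not frames:
        return [], {}
    energies = {p: frame_energy(f) for p, f in frames.items()}
    # Gate: keep talkers above the threshold, loudest first, up to max_active.
    candidates = [p for p in frames if energies[p] >= energy_threshold]
    active = sorted(candidates, key=lambda p: energies[p], reverse=True)[:max_active]
    length = len(next(iter(frames.values())))
    outputs = {}
    for listener in frames:
        mix = [0.0] * length
        for talker in active:
            if talker == listener:
                continue  # mix-minus: never send a talker their own voice
            for i, sample in enumerate(frames[talker]):
                mix[i] += sample
        outputs[listener] = mix
    return active, outputs


frames = {"a": [0.5, -0.5], "b": [0.25, 0.25], "c": [0.0, 0.0]}
active, outputs = gated_mix_minus(frames, max_active=2)
print(active)
print(outputs)
```

Gating bounds the noise floor because silent or quiet inputs never enter
the sum; the trade-off noted above is that a tighter gate raises the
chance of clipping onset speech.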
> >>
> >> --
> >> Colin Perkins
> >> http://csperkins.org/
> >>
> >>
> >>
> >> _______________________________________________
> >> codec mailing list
> >> codec@ietf.org<mailto:codec@ietf.org>
> >> https://www.ietf.org/mailman/listinfo/codec
> >>
> >>
> >>
> >>
> >
> >
> >
> >
>
>
>
>