Re: [codec] #14: VAD and CNG?
Michael Knappe <mknappe@juniper.net> Mon, 24 May 2010 16:09 UTC
From: Michael Knappe <mknappe@juniper.net>
To: Brian Rosen <br@brianrosen.net>, "codec@ietf.org" <codec@ietf.org>, "hoene@uni-tuebingen.de" <hoene@uni-tuebingen.de>
Date: Mon, 24 May 2010 09:05:59 -0700
Message-ID: <C81FF1F7.16BE0%mknappe@juniper.net>
In-Reply-To: <C82009F2.35809%br@brianrosen.net>

I agree that VAD (or DTX) needs to be negotiable.

Mike


On 5/24/10 7:48 AM, "Brian Rosen" <br@brianrosen.net> wrote:

> I hope we are assuming VAD (or DTX) is negotiable. It is essential (for
> emergency calls) that VAD is disabled. While in many cases the endpoint
> will know that it is in an emergency call, and disable VAD, it is not
> always possible to know.
>
> Brian
>
>
> On 5/24/10 10:22 AM, "codec issue tracker" <trac@tools.ietf.org> wrote:
>
>> #14: VAD and CNG?
>> ---------------------------+---------------------------
>>  Reporter:  hoene@…        |       Owner:
>>      Type:  defect         |      Status:  new
>>  Priority:  major          |   Milestone:
>> Component:  requirements   |     Version:
>>  Severity:  -              |    Keywords:
>> ---------------------------+---------------------------
>>
>> Comment(by hoene@…):
>>
>> [Raymond]: I think comfort noise is more than just to let the
>> telephone user know that the connection is not dead. If voice packets
>> are sent only during active voice regions of the signal (in DTX) and
>> not during silence/background noise regions, and if there is audible
>> background noise and comfort noise is not added during background
>> noise regions, at the receiving end the background noise will be "on"
>> only during active voice and "off" otherwise. This frequent on-off
>> switching will make the background noise sound unnatural and will
>> bother many users. Adding comfort noise during background noise
>> regions will make such background noise sound more natural.
>>
>> [Benjamin]: The cheapest solution, of course, is transmit-side
>> activity detection.
>> Maybe we need to specify a way for a receiver to request that the
>> transmitter employ (or not employ) VAD.
>>
>> [Benjamin]:
>> I know that CELT makes decoder VAD very efficient, but how is decoder
>> VAD better than encoder VAD? Encoder VAD saves even more CPU, saves
>> bandwidth, and enables easier jitter buffering.
>> Are you thinking about some sort of adaptive thresholding that
>> requires knowing all streams' volume levels?
>> Anyway, VAD can run on both encode and decode sides at the same time.
>>
>> [Stephen]: It might be valuable to be able to obtain the VAD without
>> having to decrypt the bitstream, since that also can become a problem
>> as the number of streams grows.
>> There is a security consideration (traffic analysis), however, it
>> still might be worth thinking about.
>>
>> [JM]:
>> > I know that CELT makes decoder VAD very efficient,
>>
>> Not only CELT. You can do that with an LPC-based codec too.
>>
>> > but how is decoder VAD better than encoder VAD? Encoder VAD saves
>> > even more CPU, saves bandwidth, and enables easier jitter buffering.
>>
>> There's a few reasons why I think decoder-side is better:
>> - The decision for an encoder-side VAD would take some amount of
>>   space in the bit-stream
>> - If we make an encode-side VAD mandatory, then all encoders will
>>   have to spend the CPU cycles, even when it's not needed. If it's
>>   not mandatory, then the decoder cannot rely on it, so it still
>>   needs to implement a VAD
>> - A decoder VAD does not need to be specified in an exact way, so
>>   implementers can choose different implementations depending on
>>   what information they need.
>> - You cannot "game" a decode-side VAD.
>>
>> > Are you thinking about some sort of adaptive thresholding that
>> > requires knowing all streams' volume levels?
>>
>> Well, knowing the relative amplitudes of each stream can allow you to
>> take more intelligent decisions, e.g. when you have to choose the
>> "most active speaker". That's something you can't really get from an
>> encoder VAD.
>>
>> > Anyway, VAD can run on both encode and decode sides at the same
>> > time.
>>
>> That would just mean nobody would bother implementing the encode
>> side.
>>
>> [Benjamin]:
>>
>> I think I failed to communicate that by VAD I mean _not sending
>> packets_ during inactivity. For the packets that are sent, the
>> overhead should average much less than 1 bit per frame.
>>
>> I'm not suggesting sending 200 packets a second containing a flag
>> indicating no voice activity, followed by carefully coded background
>> noise. That would be silly.
>>
>> > - If we make an encode-side VAD mandatory, then all encoders will
>> > have to spend the CPU cycles, even when it's not needed. If it's
>> > not mandatory, then the decoder cannot rely on it, so it still
>> > needs to implement a VAD
>>
>> I don't see this as "mandatory". The encoder can turn off VAD, and
>> probably should for full-quality applications.
>>
>> > - A decoder VAD does not need to be specified in an exact way, so
>> > implementers can choose different implementations depending on
>> > what information they need.
>>
>> The only thing that needs exact specification is the signalling. The
>> encoder may use it or not use it as it pleases.
>>
>> > - You cannot "game" a decode-side VAD.
>>
>> I don't know what this means.
>>
>> >> Are you thinking about some sort of adaptive thresholding that
>> >> requires knowing all streams' volume levels?
>> >
>> > Well, knowing the relative amplitudes of each stream can allow you
>> > to take more intelligent decisions, e.g. when you have to choose
>> > the "most active speaker". That's something you can't really get
>> > from an encoder VAD.
>> >
>> >> Anyway, VAD can run on both encode and decode sides at the same
>> >> time.
>> >
>> > That would just mean nobody would bother implementing the encode
>> > side.
>>
>> I expect encode-side VAD on a conference call to save more than a
>> factor of 2 in bandwidth, which makes it very desirable, especially
>> for large deployments. People will use it to save bandwidth
>> (especially if it's on by default in the reference implementation).
>> The decode-side CPU savings are just a minor bonus side-effect.
>>
>> [JM]:
>> What you're describing is called DTX (discontinuous transmission).
>> This can be a useful feature, but it's very different from what we
>> were originally talking about in terms of conference mixing.
>>
>> [Ben]: Oops. Right. What I'm trying to say is that DTX, based on
>> encoder-side VAD, also greatly reduces the (average) computational
>> burden on a conference mixer. Of course, if everyone's really talking
>> at once then VAD can't help.
>>
>> [Roman]: There is one more application to efficiently combining
>> pre-encoded audio: playing announcements or recorded audio. Standard
>> network or IVR announcements can be encoded once and efficiently
>> inserted or combined into the audio stream. If pre-encoded audio is
>> supported and the client supports AVT tones, it is trivial to develop
>> a very efficient IVR server which does not require any CODEC encoding
>> or decoding.
>>
>> Efficient decoder-side VAD is also very helpful in the case of speech
>> recognition, where it saves cycles in the end-pointer. This way audio
>> needs to be decoded and passed to the speech recognition system only
>> when voice is present.
>>
>> Bottom line, if we have both efficient decoder-side VAD and combining
>> of pre-encoded audio, we can develop some very efficient VXML servers,
>> voice mail and IVR systems, not just conferencing servers.
>>
>> Of course, this works well only if the background noise is relatively
>> stationary. If the background noise is dynamically changing, then
>> comfort noise can't really sound like the true background noise.
>> Today I was put on hold in a phone call with music playing.
>> Apparently the system treated some parts of the music as background
>> noise and replaced them with comfort noise. The result is pretty
>> annoying, or amusing, depending on which way you look at it.
>>
>> Therefore, I think comfort noise has its value if DTX is used and the
>> background noise is fairly stationary, but if the background noise or
>> music is changing dynamically and high audio quality is desired, then
>> the DTX and comfort noise should be turned off and all parts of the
>> signal need to be transmitted.
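
The on/off background noise effect described in the quoted discussion, and the request that DTX be something an endpoint can switch off, can be illustrated with a minimal sketch. It is not taken from any codec or from the thread: the energy-threshold VAD, the "sid" noise-level packet, the frame size, and every function and parameter name below are assumptions chosen for illustration only.

```python
import math
import random

FRAME = 160  # samples per frame (20 ms at 8 kHz); an illustrative choice


def rms(frame):
    """Root-mean-square level of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def dtx_encode(frames, threshold=0.05, enabled=True):
    """Transmit side: yield ("voice", frame), ("sid", level), or None.

    With DTX disabled (for example, negotiated off for an emergency call)
    every frame is sent. With DTX enabled, a frame whose RMS is below
    `threshold` counts as inactive: one hypothetical "sid" packet carrying
    the noise level opens each silence period, then nothing is sent.
    """
    in_silence = False
    for frame in frames:
        level = rms(frame)
        if not enabled or level >= threshold:
            in_silence = False
            yield ("voice", frame)
        elif not in_silence:
            in_silence = True
            yield ("sid", level)
        else:
            yield None  # nothing on the wire for this frame slot


def dtx_decode(packets, comfort_noise=True, rng=None):
    """Receive side: produce one frame of samples per frame slot."""
    rng = rng or random.Random(0)
    noise_level = 0.0
    for packet in packets:
        if packet is not None and packet[0] == "voice":
            yield packet[1]
            continue
        if packet is not None:  # "sid": remember the advertised level
            noise_level = packet[1]
        if comfort_noise:
            # Gaussian noise at roughly the advertised RMS level.
            yield [rng.gauss(0.0, noise_level) for _ in range(FRAME)]
        else:
            yield [0.0] * FRAME  # background noise switches "off"


def make_frames():
    """Ten frames: talk (0-2), background noise only (3-7), talk (8-9)."""
    rng = random.Random(1)
    frames = []
    for i in range(10):
        talking = i < 3 or i >= 8
        frames.append([
            (0.5 * math.sin(2 * math.pi * 440 * n / 8000) if talking else 0.0)
            + rng.gauss(0.0, 0.01)
            for n in range(FRAME)
        ])
    return frames


if __name__ == "__main__":
    frames = make_frames()
    packets = list(dtx_encode(frames))
    sent = sum(p is not None for p in packets)
    print(f"DTX on:  {sent} of {len(frames)} frame slots carry a packet")
    all_sent = sum(p is not None for p in dtx_encode(frames, enabled=False))
    print(f"DTX off: {all_sent} of {len(frames)} frame slots carry a packet")
```

With `comfort_noise=False` the receiver plays digital silence in the gaps, which is the audible on/off switching Raymond describes. With `enabled=False` every frame is transmitted, which is the behaviour Brian asks for on emergency calls. How that flag would actually be negotiated between endpoints is outside the scope of this sketch.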