Re: [codec] #14: VAD and CNG?
Brian Rosen <br@brianrosen.net> Mon, 24 May 2010 14:48 UTC
Date: Mon, 24 May 2010 10:48:18 -0400
From: Brian Rosen <br@brianrosen.net>
To: codec@ietf.org, hoene@uni-tuebingen.de
Message-ID: <C82009F2.35809%br@brianrosen.net>
Thread-Topic: [codec] #14: VAD and CNG?
In-Reply-To: <071.305e71a2ad840434c74fe5953992951a@tools.ietf.org>
Subject: Re: [codec] #14: VAD and CNG?
I hope we are assuming VAD (or DTX) is negotiable. For emergency calls, it is essential that VAD be disabled. While in many cases the endpoint will know that it is in an emergency call and can disable VAD, it cannot always know.

Brian

On 5/24/10 10:22 AM, "codec issue tracker" <trac@tools.ietf.org> wrote:

> #14: VAD and CNG?
> ------------------------------------+---------------------------------------
>  Reporter:  hoene@                  |       Owner:
>      Type:  defect                  |      Status:  new
>  Priority:  major                   |   Milestone:
> Component:  requirements            |     Version:
>  Severity:  -                       |    Keywords:
> ------------------------------------+---------------------------------------
>
> Comment(by hoene@):
>
> [Raymond]: I think comfort noise is more than just a way to let the telephone user know that the connection is not dead. If voice packets are sent only during active voice regions of the signal (in DTX) and not during silence/background noise regions, and if there is audible background noise but comfort noise is not added during background noise regions, then at the receiving end the background noise will be "on" only during active voice and "off" otherwise. This frequent on-off switching will make the background noise sound unnatural and will bother many users. Adding comfort noise during background noise regions will make such background noise sound more natural.
>
> [Benjamin]: The cheapest solution, of course, is transmit-side activity detection. Maybe we need to specify a way for a receiver to request that the transmitter employ (or not employ) VAD.
>
> [Benjamin]: I know that CELT makes decoder VAD very efficient, but how is decoder VAD better than encoder VAD? Encoder VAD saves even more CPU, saves bandwidth, and enables easier jitter buffering. Are you thinking about some sort of adaptive thresholding that requires knowing all streams' volume levels? Anyway, VAD can run on both encode and decode sides at the same time.
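Brian's point and Benjamin's suggestion both amount to making DTX a negotiated setting that the receiver can veto. The following is a minimal Python sketch of that idea, assuming each side simply advertises whether it wants to send DTX and whether it accepts DTX from its peer; `DtxPreference` and `negotiate_dtx` are hypothetical names, not part of any SDP or codec specification. The property it illustrates is that an emergency call center can force the caller's VAD/DTX off even when the caller's endpoint does not know it is in an emergency call.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DtxPreference:
    """Hypothetical per-endpoint DTX settings (not a real SDP parameter set)."""
    willing_to_send_dtx: bool  # endpoint would like to suppress packets during silence
    accepts_dtx: bool          # endpoint tolerates receiving DTX / comfort noise


def negotiate_dtx(local: DtxPreference, remote: DtxPreference,
                  known_emergency: bool) -> Tuple[bool, bool]:
    """Return (local_may_send_dtx, remote_may_send_dtx).

    DTX is used in a direction only if the sender wants it AND the receiver
    accepts it, so each side can veto DTX for its own receive direction.
    """
    if known_emergency:
        # The endpoint knows this is an emergency call: disable DTX both ways.
        return (False, False)
    local_may = local.willing_to_send_dtx and remote.accepts_dtx
    remote_may = remote.willing_to_send_dtx and local.accepts_dtx
    return (local_may, remote_may)


if __name__ == "__main__":
    caller = DtxPreference(willing_to_send_dtx=True, accepts_dtx=True)
    psap = DtxPreference(willing_to_send_dtx=False, accepts_dtx=False)
    # The caller does not know it reached an emergency service, but the
    # call center's refusal still turns the caller's DTX off.
    print(negotiate_dtx(caller, psap, known_emergency=False))  # (False, False)
```

Putting the veto on the receive side is what covers the case Brian raises: the endpoint that knows it is handling an emergency call does not depend on the caller's endpoint knowing it too.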
>
> [Stephen]: It might be valuable to be able to obtain the VAD decision without having to decrypt the bitstream, since that can also become a problem as the number of streams grows. There is a security consideration (traffic analysis), but it still might be worth thinking about.
>
> [JM]:
> > I know that CELT makes decoder VAD very efficient,
>
> Not only CELT. You can do that with an LPC-based codec too.
>
> > but how is decoder VAD better than encoder VAD? Encoder VAD saves even more CPU, saves bandwidth, and enables easier jitter buffering.
>
> There are a few reasons why I think decoder-side is better:
> - The decision for an encoder-side VAD would take some amount of space in the bit-stream.
> - If we make an encoder-side VAD mandatory, then all encoders will have to spend the CPU cycles, even when it's not needed. If it's not mandatory, then the decoder cannot rely on it, so it still needs to implement a VAD.
> - A decoder VAD does not need to be specified in an exact way, so implementers can choose different implementations depending on what information they need.
> - You cannot "game" a decoder-side VAD.
>
> > Are you thinking about some sort of adaptive thresholding that requires knowing all streams' volume levels?
>
> Well, knowing the relative amplitudes of each stream can allow you to make more intelligent decisions, e.g. when you have to choose the "most active speaker". That's something you can't really get from an encoder VAD.
>
> > Anyway, VAD can run on both encode and decode sides at the same time.
>
> That would just mean nobody would bother implementing the encode side.
>
> [Benjamin]: I think I failed to communicate that by VAD I mean _not sending packets_ during inactivity. For the packets that are sent, the overhead should average much less than 1 bit per frame. I'm not suggesting sending 200 packets a second containing a flag indicating no voice activity, followed by carefully coded background noise. That would be silly.
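JM's argument for a decoder-side VAD is that a mixer which decodes every stream can compare their relative levels, for example to pick the "most active speaker". A minimal Python sketch, assuming decoded frames of floating-point samples in [-1, 1] and a fixed energy threshold of -50 dB; both are hypothetical choices, and a practical detector would adapt its threshold and smooth decisions over several frames. `frame_energy_db` and `most_active_speaker` are illustrative names only.

```python
import math
from typing import Dict, List, Optional


def frame_energy_db(samples: List[float]) -> float:
    """Mean-square energy of one decoded frame, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value so digital silence does not hit log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-12))


def most_active_speaker(frames: Dict[str, List[float]],
                        threshold_db: float = -50.0) -> Optional[str]:
    """Pick the loudest stream whose energy exceeds the VAD threshold.

    `frames` maps a (hypothetical) stream identifier to its decoded frame.
    Returns None when no stream is judged active.
    """
    energies = {stream: frame_energy_db(f) for stream, f in frames.items()}
    active = {stream: e for stream, e in energies.items() if e > threshold_db}
    if not active:
        return None
    return max(active, key=lambda stream: active[stream])


if __name__ == "__main__":
    frames = {"alice": [0.1] * 160, "bob": [0.01] * 160, "carol": [0.0] * 160}
    print(most_active_speaker(frames))  # alice
```

Since this logic only uses information the mixer already has after decoding, nothing in it would need to be standardized, which matches JM's point that a decoder VAD can be implemented differently by each implementer.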
>
> > - If we make an encoder-side VAD mandatory, then all encoders will have to spend the CPU cycles, even when it's not needed. If it's not mandatory, then the decoder cannot rely on it, so it still needs to implement a VAD.
>
> I don't see this as "mandatory". The encoder can turn off VAD, and probably should for full-quality applications.
>
> > - A decoder VAD does not need to be specified in an exact way, so implementers can choose different implementations depending on what information they need.
>
> The only thing that needs exact specification is the signalling. The encoder may use it or not use it as it pleases.
>
> > - You cannot "game" a decoder-side VAD.
>
> I don't know what this means.
>
> >> Are you thinking about some sort of adaptive thresholding that requires knowing all streams' volume levels?
> >
> > Well, knowing the relative amplitudes of each stream can allow you to make more intelligent decisions, e.g. when you have to choose the "most active speaker". That's something you can't really get from an encoder VAD.
> >
> >> Anyway, VAD can run on both encode and decode sides at the same time.
> >
> > That would just mean nobody would bother implementing the encode side.
>
> I expect encode-side VAD on a conference call to save more than a factor of 2 in bandwidth, which makes it very desirable, especially for large deployments. People will use it to save bandwidth (especially if it's on by default in the reference implementation). The decode-side CPU savings are just a minor bonus side-effect.
>
> [JM]: What you're describing is called DTX (discontinuous transmission). This can be a useful feature, but it's very different from what we were originally talking about in terms of conference mixing.
>
> [Ben]: Oops. Right. What I'm trying to say is that DTX, based on encoder-side VAD, also greatly reduces the (average) computational burden on a conference mixer. Of course, if everyone's really talking at once then VAD can't help.
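Benjamin's "more than a factor of 2" estimate can be checked with simple arithmetic. The sketch below assumes an idealized conference in which exactly one participant talks at a time and talk time is shared evenly; the 32 kbit/s active rate and the optional silence-update rate are hypothetical numbers, not properties of any proposed codec. As Ben notes, the savings disappear when everyone talks at once.

```python
def avg_bitrate_with_dtx(active_kbps: float, activity: float,
                         sid_kbps: float = 0.0) -> float:
    """Average sender bitrate when full packets are sent only during activity.

    `activity` is the fraction of time the talker is active; `sid_kbps`
    models any occasional comfort-noise/silence updates sent otherwise.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be a fraction between 0 and 1")
    return activity * active_kbps + (1.0 - activity) * sid_kbps


def conference_savings(n_participants: int, active_kbps: float,
                       sid_kbps: float = 0.0) -> float:
    """Ratio of total upstream bandwidth without DTX to that with DTX.

    Idealized assumption: one talker at a time and evenly shared talk time,
    so each participant is active 1/N of the time.
    """
    activity = 1.0 / n_participants
    without_dtx = n_participants * active_kbps
    with_dtx = n_participants * avg_bitrate_with_dtx(active_kbps, activity, sid_kbps)
    return without_dtx / with_dtx


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, round(conference_savings(n, active_kbps=32.0, sid_kbps=1.0), 2))
```

With no silence updates the ratio under this idealization is simply N (4.0 for four participants), so the factor-of-2 figure is already reached with two participants; small silence updates pull it slightly lower. The same assumption illustrates Ben's point about the mixer: it only has to decode streams that actually sent a packet in a given frame, on average one stream rather than N.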
>
> [Roman]: There is one more application of efficiently combining pre-encoded audio: playing announcements or recorded audio. Standard network or IVR announcements can be encoded once and efficiently inserted or combined into an audio stream. If pre-encoded audio is supported and the client supports AVT tones, it is trivial to develop a very efficient IVR server which does not require any CODEC encoding or decoding. Efficient decoder-side VAD is also very helpful for speech recognition, where it allows the end-pointer to save cycles: audio needs to be decoded and passed to the speech recognition system only when voice is present. Bottom line, if we have both efficient decoder-side VAD and the ability to combine pre-encoded audio, we can develop some very efficient VXML servers, voice mail and IVR systems, not just conferencing servers.
>
> Of course, this works well only if the background noise is relatively stationary. If the background noise is changing dynamically, then comfort noise can't really sound like the true background noise. Today I was put on hold in a phone call with music playing. Apparently the system treated some parts of the music as background noise and replaced them with comfort noise. The result is pretty annoying, or amusing, depending on which way you look at it. Therefore, I think comfort noise has its value if DTX is used and the background noise is fairly stationary, but if the background noise or music is changing dynamically and high audio quality is desired, then DTX and comfort noise should be turned off and all parts of the signal need to be transmitted.
>
> --
> Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/14#comment:2>
> codec <http://tools.ietf.org/codec/>
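The comfort-noise trade-off in this thread (Raymond's on/off switching argument and the closing remark about non-stationary background noise) can be illustrated with a toy model. This is a minimal sketch under loud assumptions: the sender conveys only a noise RMS level in a silence update (real comfort-noise schemes, such as the RTP comfort noise payload of RFC 3389, can also carry spectral shape), the receiver synthesizes white noise at that level, and "stationary" means frame levels stay within an arbitrary 3 dB band. `is_stationary`, `should_use_dtx`, and `ComfortNoiseReceiver` are hypothetical names.

```python
import math
import random
from typing import List, Optional


def rms(frame: List[float]) -> float:
    """Root-mean-square level of a frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_stationary(noise_frames: List[List[float]], max_spread_db: float = 3.0) -> bool:
    """Crude test: background counts as stationary if frame levels stay in a narrow band."""
    if not noise_frames:
        return False  # no evidence either way; stay conservative and keep transmitting
    levels = [20.0 * math.log10(max(rms(f), 1e-6)) for f in noise_frames]
    return max(levels) - min(levels) <= max_spread_db


def should_use_dtx(dtx_allowed: bool, noise_frames: List[List[float]]) -> bool:
    """Use DTX/CNG only if negotiation allows it and the background looks stationary."""
    return dtx_allowed and is_stationary(noise_frames)


class ComfortNoiseReceiver:
    """Fills DTX gaps with white noise at the level from the last silence update."""

    def __init__(self, frame_len: int, seed: int = 0) -> None:
        self.frame_len = frame_len
        self.noise_rms: Optional[float] = None
        self.rng = random.Random(seed)

    def on_silence_update(self, noise_rms: float) -> None:
        self.noise_rms = noise_rms

    def fill_gap(self) -> List[float]:
        if self.noise_rms is None:
            # No level known: plain silence, i.e. the unnatural on/off effect.
            return [0.0] * self.frame_len
        # Uniform noise on [-a, a] has RMS a / sqrt(3), so choose a = rms * sqrt(3).
        a = self.noise_rms * math.sqrt(3.0)
        return [self.rng.uniform(-a, a) for _ in range(self.frame_len)]
```

In this toy model, music on hold with large level swings would fail the stationarity check, so the sender would keep transmitting instead of substituting comfort noise. A level-only test is of course much weaker than a real detector, which would also look at how the spectrum changes over time.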