Re: [codec] #15: Efficiently combine pre-encoded audio

Jean-Marc Valin <> Wed, 12 May 2010 16:23 UTC

Benjamin M. Schwartz wrote:
>> I think you can do better than an encoder VAD.
> I know that CELT makes decoder VAD very efficient, 

Not only CELT. You can do that with an LPC-based codec too.

> but how is decoder VAD
> better than encoder VAD?  Encoder VAD saves even more CPU, saves
> bandwidth, and enables easier jitter buffering.

There are a few reasons why I think decoder-side is better:
- The decision for an encoder-side VAD would take some amount of space in
the bit-stream
- If we make an encoder-side VAD mandatory, then all encoders will have to
spend the CPU cycles, even when it's not needed. If it's not mandatory,
then the decoder cannot rely on it, so it still needs to implement a VAD
- A decoder VAD does not need to be specified in an exact way, so
implementers can choose different implementations depending on what
information they need.
- You cannot "game" a decoder-side VAD.

> Are you thinking about some sort of adaptive thresholding that requires
> knowing all streams' volume levels?

Well, knowing the relative amplitudes of each stream lets you make
more intelligent decisions, e.g. when you have to choose the "most active
speaker". That's something you can't really get from an encoder VAD.
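
To make the "most active speaker" point concrete, here is a rough sketch of what a mixer could do once it knows each stream's level. Everything in it is an assumption for illustration only: the function names, the -50 dB floor and the 12 dB margin are made up, and it computes levels from decoded PCM for simplicity rather than from anything a particular codec exposes.

```python
import math

SILENCE_DB = -100.0  # hypothetical level reported for empty or all-zero frames


def frame_energy_db(samples):
    """Mean-square energy of one frame of float samples in [-1, 1], in dB."""
    if not samples:
        return SILENCE_DB
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq <= 0.0:
        return SILENCE_DB
    return max(SILENCE_DB, 10.0 * math.log10(mean_sq))


def rank_streams(frames, floor_db=-50.0, margin_db=12.0):
    """Pick the loudest stream and the streams within margin_db of it.

    frames maps a stream id to that stream's current frame of samples.
    Returns (loudest_id, sorted_active_ids), or (None, []) when every
    stream is below floor_db. Both thresholds are illustrative values.
    """
    levels = {sid: frame_energy_db(f) for sid, f in frames.items()}
    if not levels:
        return None, []
    loudest = max(levels, key=levels.get)
    if levels[loudest] < floor_db:
        return None, []
    # The threshold is relative to the loudest stream, which is exactly
    # the information a per-stream encoder VAD flag cannot provide.
    cutoff = max(floor_db, levels[loudest] - margin_db)
    active = sorted(sid for sid, lv in levels.items() if lv >= cutoff)
    return loudest, active


if __name__ == "__main__":
    frames = {
        "alice": [0.5, -0.5] * 80,    # about -6 dB
        "bob": [0.01, -0.01] * 80,    # about -40 dB
        "carol": [0.3, -0.3] * 80,    # about -10.5 dB
    }
    print(rank_streams(frames))
```

The decision depends on all streams at once, so it has to be made wherever the streams meet, not in each sender's encoder.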

> Anyway, VAD can run on both encode and decode sides at the same time.

That would just mean nobody would bother implementing the encode side.