Re: [codec] #15: Efficiently combine pre-encoded audio

Roman Shpount <roman@telurix.com> Wed, 12 May 2010 18:38 UTC

In-Reply-To: <4BEADACD.4080609@fas.harvard.edu>
References: <062.bc75a3b3c4a980df34535f87c9484935@tools.ietf.org> <071.30b67e93d22f0bfedf46b5035d133441@tools.ietf.org> <1F68067D-33B9-4F0C-B31B-B3A56A72DBA4@cisco.com> <4BEAC888.50109@fas.harvard.edu> <4BEACCD7.8080401@octasic.com> <4BEACEBF.7080403@fas.harvard.edu> <4BEAD147.8080307@octasic.com> <4BEAD5C1.4000802@fas.harvard.edu> <4BEAD963.4010300@octasic.com> <4BEADACD.4080609@fas.harvard.edu>
Date: Wed, 12 May 2010 14:12:04 -0400
Message-ID: <AANLkTin6miu3S1lFoW903zV1VXh1rtagx-WuYXuSeVWA@mail.gmail.com>
From: Roman Shpount <roman@telurix.com>
To: bens@alum.mit.edu
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: codec@ietf.org
Subject: Re: [codec] #15: Efficiently combine pre-encoded audio

There is one more application for efficiently combining pre-encoded
audio: playing announcements or recorded audio. Standard network or
IVR announcements can be encoded once and then efficiently inserted or
combined into an audio stream. If pre-encoded audio is supported and
the client supports AVT tones, it is trivial to develop a very
efficient IVR server that does not require any CODEC encoding or
decoding.
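
To illustrate the point, here is a rough sketch (not tied to any
particular codec) of how an IVR server might send an announcement that
was encoded once ahead of time: it only prepends RTP headers to stored
frames and never invokes an encoder or decoder. The function name, the
payload type and the samples-per-frame value are assumptions made for
the example; the 12-byte header layout follows RFC 3550.

```python
import struct

RTP_VERSION = 2


def packetize_preencoded(frames, payload_type, ssrc, samples_per_frame,
                         first_seq=0, first_timestamp=0):
    """Wrap already-encoded codec frames in RTP headers, no transcoding.

    `frames` is a sequence of bytes objects, one per codec frame, as
    produced once by an offline encoder (hypothetical storage format).
    """
    packets = []
    for i, frame in enumerate(frames):
        seq = (first_seq + i) & 0xFFFF
        timestamp = (first_timestamp + i * samples_per_frame) & 0xFFFFFFFF
        marker = 1 if i == 0 else 0  # marker bit on the first packet only
        header = struct.pack(
            "!BBHII",
            RTP_VERSION << 6,                      # V=2, no padding/ext/CSRC
            (marker << 7) | (payload_type & 0x7F),
            seq,
            timestamp,
            ssrc & 0xFFFFFFFF,
        )
        packets.append(header + frame)
    return packets
```

The per-packet cost here is a header pack and a byte copy, which is
the whole attraction: the announcement payload is reused as stored.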

Efficient decoder-side VAD is also very helpful for speech
recognition, where it saves cycles in the end-pointer: audio needs to
be decoded and passed to the speech recognition system only when
voice is present.
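
A minimal sketch of that gating, assuming the bitstream exposes a
cheap voice-activity indication that can be read without a full
decode. The flag position used in flag_bit_vad is purely hypothetical
and does not describe any existing codec; has_voice and decode are
placeholders for whatever the real codec would provide.

```python
def flag_bit_vad(frame):
    """Hypothetical check: top bit of the first byte means 'speech'."""
    return bool(frame) and bool(frame[0] & 0x80)


def voiced_audio(frames, has_voice, decode):
    """Yield decoded audio only for frames that the VAD check accepts.

    Frames rejected by `has_voice` are skipped without calling `decode`,
    so the recognizer's end-pointer never sees (or pays for) silence.
    """
    for frame in frames:
        if has_voice(frame):
            yield decode(frame)
```

With three incoming frames of which one carries the speech flag, the
decoder is called once rather than three times.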

Bottom line: if we have both efficient decoder-side VAD and the
ability to combine pre-encoded audio, we can develop some very
efficient VXML servers, voice mail and IVR systems, not just
conferencing servers.
_____________________________
Roman Shpount - www.telurix.com



On Wed, May 12, 2010 at 12:43 PM, Benjamin M. Schwartz
<bmschwar@fas.harvard.edu> wrote:
> On 05/12/2010 12:37 PM, Jean-Marc Valin wrote:
>>
>> Benjamin M. Schwartz wrote:
>>>
>>> I think I failed to communicate that by VAD I mean _not sending packets_
>>> during inactivity. For the packets that are sent, the overhead should
>>> average much less than 1 bit per frame.
>>
>> What you're describing is called DTX (discontinuous transmission).
>
> Oops. Right.  What I'm trying to say is that DTX, based on encoder-side VAD,
> also greatly reduces the (average) computational burden on a conference
> mixer.  Of course, if everyone's really talking at once then VAD can't help.
>
> --Ben