Re: [codec] #16: Multicast?

"Raymond (Juin-Hwey) Chen" <> Wed, 12 May 2010 02:18 UTC


Hi Koen,

You wrote:
> Not a typo: codecs have become more wasteful with delay, while  
> delivering better fidelity.  G.718 evolved out of AMR-WB and has more  
> than twice the delay.  Same for G.729.1 versus G.729.  This is not by  
> accident.

[Raymond]: If you read the published technical papers on G.718 and 
G.729.1 carefully, I think you will find that the real reason for the 
increased delay is not that they needed a longer delay to achieve 
better fidelity for speech, but that they wanted to extend speech 
codecs to also perform well when coding general audio (music, 
etc.).  To get good music coding performance, most audio codecs use 
the Modified Discrete Cosine Transform (MDCT) with a fairly large 
transform window, so most audio codecs have longer coding delays 
than speech codecs.

To code music well, G.718 and G.729.1 developers naturally had to use 
long MDCT transform windows on top of the codec delay already in 
AMR-WB and G.729. Even so, the resulting longer delays of G.718 and 
G.729.1 are still not any longer than typical delays of audio codecs; 
in fact, they are probably somewhat shorter. 

My point is that the increased delays of G.718 and G.729.1 are purely 
a result of changing from "speech-only" to "speech and music". It's 
not because the G.718 and G.729.1 developers knew the network delay 
was getting shorter so they could be more wasteful with delay.  
Furthermore, even after they changed the codecs to handle music as 
well as speech, they still chose to make their codec delays shorter 
than the delays of most audio codecs.  Why?  They wanted to make 
their codec delays as short as they could.  In fact, they even made 
an effort to introduce a "low-delay mode" into both G.718 and 
G.729.1. That shows they were pretty concerned about the higher 
delays they needed to have in order to code music well. 

By the way, G.718 does NOT have "more than twice the delay" of AMR-WB 
as you said.  AMR-WB has a 20 ms frame size, 5 ms look-ahead, and 
1.875 ms of filtering delay, for a total algorithmic buffering delay 
of 26.875 ms.  The "normal mode" of G.718 has a buffering delay of 
42.875 ms for 16 kHz wideband input/output. That's only 59.5% higher 
than AMR-WB.  For Layers 1 and 2 coding of speech, the "low-delay 
mode" shaves 10 ms off to give a delay of 32.875 ms, or only 22.3% 
higher than AMR-WB.
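
For anyone who wants to check the arithmetic above, here is a minimal 
sketch in Python.  The delay figures are the ones quoted in this email; 
the variable names and the percent_increase helper are only 
illustrative and do not come from any codec specification.

```python
# Delay figures as quoted in this email, all in milliseconds.
AMR_WB_FRAME_MS = 20.0
AMR_WB_LOOKAHEAD_MS = 5.0
AMR_WB_FILTER_MS = 1.875

G718_NORMAL_MS = 42.875           # 16 kHz wideband input/output
G718_LOW_DELAY_SAVING_MS = 10.0   # Layers 1 and 2, low-delay mode


def percent_increase(new_ms, base_ms):
    """Relative increase of new_ms over base_ms, in percent (illustrative helper)."""
    return 100.0 * (new_ms - base_ms) / base_ms


# Total algorithmic buffering delay of AMR-WB.
amr_wb_total_ms = AMR_WB_FRAME_MS + AMR_WB_LOOKAHEAD_MS + AMR_WB_FILTER_MS

# G.718 low-delay mode: normal-mode delay minus the 10 ms saving.
g718_low_delay_ms = G718_NORMAL_MS - G718_LOW_DELAY_SAVING_MS

normal_pct = percent_increase(G718_NORMAL_MS, amr_wb_total_ms)
low_delay_pct = percent_increase(g718_low_delay_ms, amr_wb_total_ms)

print(f"AMR-WB total delay:     {amr_wb_total_ms:.3f} ms")
print(f"G.718 normal mode:      {G718_NORMAL_MS:.3f} ms (+{normal_pct:.1f}%)")
print(f"G.718 low-delay mode:   {g718_low_delay_ms:.3f} ms (+{low_delay_pct:.1f}%)")
```

Both increases come out well under 100%, which is the basis for saying 
that G.718 does not have more than twice the delay of AMR-WB.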

When G.729.1 was first standardized in May 2006, there was already a 
low-delay mode for narrowband speech at 8 and 12 kb/s with an 
algorithmic buffering delay of 25 ms.  Later, in August 2007, the 
developers made an effort to add another low-delay mode for wideband 
at 14 kb/s that has a buffering delay of 28.94 ms.  If they wanted to 
sacrifice delay to get higher fidelity as you suggested, then why 
would they bother to go back and add another low-delay mode for 
wideband?

In fact, only a few months ago in their G.729.1 paper in IEEE 
Communications Magazine, October 2009, Varga, Proust, and Taddei 
still emphasized in multiple instances the importance of achieving a 
low coding delay.  I will quote two of the instances: 

"The low-delay mode... was added to the first wideband layer at 14 
kb/s of G.729.1 (August 2007).  The motivation was to address 
applications such as VoIP in enterprise networks where low end-to-end 
delay is crucial" and 

"Indeed, delay is an important performance parameter, and 
transmitting speech with low end-to-end delay is also required in 
several applications making use of wideband signals".

In summary, I do not see a clear trend where codec developers are 
becoming more wasteful with delay in order to get higher fidelity. If 
anything, in recent years I have seen a trend toward low-delay audio 
coding, such as low-delay AAC and the CELT codec, and I have seen the 
effort by G.718 and G.729.1 developers to introduce low-delay modes.

In any case, I thought a consensus was already reached a few days ago 
on the WG email reflector that the IETF codec needs to have a 
low-delay mode with a 5 to 10 ms codec frame size so that it can 
handle delay-sensitive applications (that is, 5 of the 6 applications 
listed in the charter and codec requirements document).  Therefore, I 
think the discussion in your last email and my current email is mostly 
of academic interest and should not affect how the IETF codec is 
designed.

Best Regards,