Re: [codec] #19: How large is the frame size depended delay / the serialization delay?

"Raymond (Juin-Hwey) Chen" <> Wed, 05 May 2010 00:10 UTC

From: "Raymond (Juin-Hwey) Chen" <>
To: Christian Hoene <>, "" <>
Date: Tue, 04 May 2010 17:09:53 -0700

Hi Christian,

Not all embedded processors can dynamically adjust their clock rate or adjust it as frequently as once or twice per codec frame.

Regarding the 10000-channel gateway example you gave, the actual processing delay is MUCH larger than (frame size)/10000.  Here are the reasons:

(1) That (frame size)/10000 delay assumes that the gateway uses a single DSP fast enough to handle 10000 channels simultaneously.  No gateway is designed that way since no DSP is that fast.  Instead, typical VoIP gateways use many, many DSPs, with each DSP handling only N voice channels, where N might range from a single digit to a few dozen, so the actual time the DSP spends processing the codec frame is (frame size)/N >> (frame size)/10000.

(2) More importantly, the bulk of the processing delay is actually not the (frame size)/N delay above, but the time that each voice channel spends waiting for its turn to get processed by the DSP, plus the various other buffering and scheduling delays encountered by the DSPs and the micro-controllers. There are so many voice channels to handle simultaneously that the timing issue becomes extremely complex. These delays depend on the codec frame size chosen.
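The arithmetic in point (1) can be sketched in a few lines of Python. This is only an illustration: the 20 ms frame size and the values of N are assumptions chosen for the example, and the function name is hypothetical. It covers only the (frame size)/N compute time, not the waiting and scheduling delays of point (2).

```python
def processing_time_ms(frame_ms: float, channels: int) -> float:
    """Per-frame compute time when one fully loaded processor serves `channels` calls."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    return frame_ms / channels


FRAME_MS = 20.0  # assumed frame size, for illustration only

# Compare the idealized single-DSP case (N = 10000) with smaller per-DSP channel counts.
for n in (10000, 24, 8):
    print(f"N={n:>5}: {processing_time_ms(FRAME_MS, n):.3f} ms per frame")
```

With these assumed numbers, a DSP serving 8 channels spends 2.5 ms on each frame, more than a thousand times the 0.002 ms implied by dividing by 10000.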

The engineering manager I mentioned in my last email (who is different from the IP phone expert I previously mentioned) told me that "the devil is in the details" and that, excluding the jitter buffer delay and other codec-independent delays, a straightforward VoIP gateway implementation that pays no attention to minimizing delay may have a codec-dependent one-way delay of 5X to 6X codec frame size because of all of the various delays of (2) above due to complex timing issues that come with supporting so many channels simultaneously.  Even after analyzing all delay components carefully and "optimizing the delay to death" until there is no more room for delay reduction, the worst-case one-way codec-dependent delay is still about 3X codec frame size, excluding jitter and other codec-independent delay.  This is an independent corroboration of what the other IP phone expert said about the codec-dependent one-way delay of 3X codec frame size for VoIP gateways. (The two of them worked on different projects in different companies.)

My conclusion: while I am less familiar with VoIP soft client implementations on computers, at least for IP phones and VoIP gateways, the rule of thumb that many engineers found to work well for codec-dependent one-way delay is 3X (codec frame size) + other codec buffering delays (e.g. look-ahead and/or filtering delay).
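As a rough sketch, the rule of thumb above can be written as a small Python function. The function and parameter names are hypothetical, and the 20 ms frame with 5 ms look-ahead is an assumed example rather than any particular codec.

```python
def codec_dependent_delay_ms(frame_ms: float,
                             lookahead_ms: float = 0.0,
                             filtering_ms: float = 0.0,
                             multiplier: float = 3.0) -> float:
    """Rule of thumb: multiplier * frame size + look-ahead + filtering delay."""
    return multiplier * frame_ms + lookahead_ms + filtering_ms


# G.711 with a 5 ms frame and no look-ahead: the rule of thumb gives 15 ms.
print(codec_dependent_delay_ms(5.0))

# Assumed example: a 20 ms frame with 5 ms of look-ahead.
print(codec_dependent_delay_ms(20.0, lookahead_ms=5.0))
```

The `multiplier` argument is left adjustable because the value of 3 is exactly what is disputed in this thread.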


From: [] On Behalf Of Christian Hoene
Sent: Saturday, May 01, 2010 10:12 AM
Subject: Re: [codec] #19: How large is the frame size depended delay / the serialization delay?


I agree that serialization, processing, and implementation delay should be distinguished.

Assume a low-cost VoIP phone with its processing power being fully utilized by one call: Then, the DSP/CPU needs an entire frame duration to encode and decode frames. Thus, the latency is increased by one frame length in addition to the serialization delay, propagation delay, algorithmic delay, dejittering delay, echo cancelling delay, ...  Running the chips at 100% load is of course cheaper than adding more computational resources. But is this still a relevant issue today?

I am not sure whether it always makes sense for a mobile device to run at 100% load. Of course, from an energy consumption perspective it makes sense to run a CMOS circuit at the lowest possible frequency, as power consumption drops quadratically. But maybe running the CPU/DSP at a higher speed and switching to power-save mode after a frame has been decoded/encoded is equally energy efficient...

Even if a gateway's DSP is fully utilized, the processing delays need not be very large. For example, take a gateway serving 10000 calls with all CPUs/DSPs at 100% load. Then, the time needed for encoding/decoding should be (frame duration)/10000, if the encoding/decoding is well scheduled.

Also, I do not think we should consider implementation delay, which occurs due to suboptimal implementation. For example, some years ago we tested the RTT of two Linux softphones linked directly together using G.711. It was 400 ms. The implementation delay could be a good performance metric to differentiate two otherwise equal products. Also, the algorithmic processing delay might be subject to similar market optimization.

Having said this, I would still suggest including the processing delay in the measurement of the end-to-end (acoustic round-trip) time.  Those measurements should be part of the control loop that optimizes the overall conversation call quality.


Dr.-Ing. Christian Hoene
Interactive Communication Systems (ICS), University of Tübingen
Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532

From: [] On Behalf Of stephen botzko
Sent: Saturday, May 01, 2010 3:31 PM
To: Koen Vos
Subject: Re: [codec] #16: Multicast?

If the frame-size multiplier is due to serialization, then I agree with Koen's assessment.  In fact on many connections the multiplier would be less than 1. Dial-up is of course the worst case here, and on those links the multiplier ought to be close to 2.  Variations due to congestion (and on some links, polling) are (IMHO) better modeled as jitter.
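To make the serialization argument concrete, here is a minimal Python sketch that expresses serialization delay as a multiple of the frame duration. All figures are assumptions for illustration: a hypothetical 16 kbit/s codec with 20 ms frames, 40 bytes of uncompressed IPv4/UDP/RTP headers, and two example link rates. It counts a single link only and is not meant to reproduce the multipliers quoted in this thread.

```python
def serialization_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to clock one packet onto a link of the given bit rate."""
    return packet_bytes * 8 / link_bps * 1000.0


def frame_multiplier(packet_bytes: int, link_bps: float, frame_ms: float) -> float:
    """Serialization delay expressed as a multiple of the frame duration."""
    return serialization_delay_ms(packet_bytes, link_bps) / frame_ms


FRAME_MS = 20.0      # assumed frame duration
PAYLOAD_BYTES = 40   # assumed codec: 16 kbit/s * 20 ms = 320 bits
HEADER_BYTES = 40    # IPv4 (20) + UDP (8) + RTP (12), no header compression
PACKET_BYTES = PAYLOAD_BYTES + HEADER_BYTES

for name, bps in (("1 Mbit/s link", 1_000_000), ("33.6 kbit/s dial-up", 33_600)):
    delay = serialization_delay_ms(PACKET_BYTES, bps)
    mult = frame_multiplier(PACKET_BYTES, bps, FRAME_MS)
    print(f"{name}: {delay:.2f} ms, {mult:.3f} x frame size")
```

Under these assumptions the faster link contributes a small fraction of one frame, while a single dial-up hop alone costs nearly a full frame.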

Gateways are another matter, with the delays being highly dependent on the product architecture.  Interrupt latencies, context switching, bus architectures, etc. can dominate, so it is entirely possible that reducing the frame size might actually increase the latency (since it increases the packets-per-second load on the gateway).  So I agree with Koen on this as well.

Anecdotal models based on industry experience can be useful guides - though if we are going to use these models to drive requirements, I'd prefer something more analytical.

Stephen Botzko
On Sat, May 1, 2010 at 2:07 AM, Koen Vos wrote:
Quoting "Raymond (Juin-Hwey) Chen":
 One-way delay = codec-independent delay + 3*(codec frame size) + (codec look-ahead) + (codec filtering delay if any)

This formula was obtained from an experienced engineer who has been working on IP phones related fields for more than a decade,

At Skype we have 100+ years of combined VoIP experience, and a focus on minimizing delay as part of our goal to maximize quality.  The consensus among our engineers is that the multiplier is closer to 1 than to 2, at least for software VoIP applications over typical Internet connections.  Some years ago the situation was slightly worse because dial-up was more prevalent.

A similar 3X multiplier is also observed in VoIP gateways.  Even with a fast processor/system optimized from the ground up to be low-delay, the measured "codec-dependent" one-way delay of such a VoIP gateway using the G.711 codec with a 5 ms frame/packet size is between 12 and 17 ms, or around 3X the frame size.

As I've pointed out before, that doesn't say much about how the delay increases with larger frame sizes.  Perhaps the 12~17 ms includes a constant delay of 7 ms, and the marginal growth of delay with frame size is 1x.
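The point can be illustrated with a short Python sketch comparing two delay models that both fit the single 5 ms measurement but diverge at larger frame sizes. The function name is hypothetical, and the 20 ms frame is an assumed extrapolation point, not a measurement.

```python
def affine_delay_ms(frame_ms: float, constant_ms: float, slope: float) -> float:
    """Delay model: a fixed overhead plus a per-frame multiplier."""
    return constant_ms + slope * frame_ms


MODELS = {
    "pure 3x multiplier": (0.0, 3.0),   # no constant term
    "7 ms constant + 1x": (7.0, 1.0),   # the alternative suggested above
}

for name, (constant_ms, slope) in MODELS.items():
    at_5 = affine_delay_ms(5.0, constant_ms, slope)    # the measured frame size
    at_20 = affine_delay_ms(20.0, constant_ms, slope)  # assumed larger frame size
    print(f"{name}: {at_5:.0f} ms at 5 ms frames, {at_20:.0f} ms at 20 ms frames")
```

Both models land inside the reported 12 to 17 ms range at 5 ms frames (15 ms and 12 ms), yet they predict 60 ms versus 27 ms at 20 ms frames, so measurements at a second frame size would be needed to tell them apart.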

