Re: [codec] #16: Multicast?

Koen Vos <> Wed, 26 May 2010 22:13 UTC

Date: Wed, 26 May 2010 15:13:26 -0700
From: Koen Vos <>
To: "Raymond (Juin-Hwey) Chen" <>

Hi Raymond,

Thanks for the detailed explanation; it clarifies your earlier
statements about the 3x multiplier.

The essence, if I understand you correctly, is that there still exist  
low-end platforms with barely enough processing power to run a VoIP  
call.  If such platforms use a naive FIFO scheduler, they'll create up  
to one frame of processing delay for encoder and decoder each, on top  
of the frame of buffering delay.

The good news is that Moore's law will continue to drive down the  
fraction of platforms with such processing delay problems.

I'm a bit surprised by your analysis of "packet transmission delay",
as it has little bearing on our multiplier (i.e., the change in delay as
a function of frame size). See old posts.


Quoting "Raymond (Juin-Hwey) Chen":

> Hi Cullen,
> Sorry for the delay of my reply.  I was busy last week and could not respond
> earlier.
> Thank you for sharing the details of your delay measurements on Cisco 7960 IP
> phones.  What you observed does NOT conflict with what I have been saying.
> The reason is that the 20 ms and 30 ms you quoted are the "packet sizes", not
> the "codec frame sizes".  Codec frame size and packet size have different
> impacts on one-way delay.  The G.711 codec that you used is a sample-by-
> sample codec. Theoretically its "codec frame size" is only one sample, or
> 0.125 ms, so the (3 x 30 ms - 3 x 20 ms) formula is not the right target
> for comparison.
> Furthermore, many telephones have G.711 encoder and decoder directly built
> into the chip hardware of A/D and D/A, so they can directly digitize the
> input audio signal into 8-bit G.711 codewords and directly playback 8-bit
> G.711 codewords as the output audio signal; thus, there is essentially no
> processing delay for G.711.  Even if the G.711 encoding/decoding is done in
> software or firmware, the G.711 codec complexity is so low that it takes
> almost no time to do G.711 processing.  The almost-zero processing delay can
> contribute to the extra low delay of G.711-based VoIP systems.
> There have been so many discussions about how the codec frame size and packet
> size may affect the one-way delay, there has been confusion, and there has
> been criticism that there wasn't any rigorous theoretical analysis, so I
> thought I would spend some time to give a more rigorous delay analysis below
> so we can hopefully settle such disputes. At the end of my analysis, you will
> see how the lower bound and upper bound of the one-way delay depend on the
> codec frame size AND the packet size under various conditions. Please read on
> if you are interested; ignore if you are not; or you can quickly scroll down
> to Equations (1) through (3), which are the main results of my delay
> analysis, and read the last few paragraphs after Eq. (3).
> Before I did the following delay analysis, I consulted extensively with three
> Broadcom senior technical leads who have many years of extensive real-time
> system architecture and design experience in IP phones, VoIP gateways, and
> video systems (such as cable/satellite set-top boxes), respectively.  What
> they told me was consistent with each other and consistent with what I have
> been saying.
> Before I start the analysis, let me first discuss the multi-tasking, or Real-
> Time Scheduling (RTS) delay, because it is a critical component of the total
> one-way delay and needs to be clarified first.
> In real-time audio or video systems, many tasks have definite completion
> deadlines beyond which the real-time operation will be lost and there will be
> audible or visible glitches. One way to handle a real-time task is by
> interrupting the processor in the hope that the processor will put down
> whatever it is doing and service the interrupt first.  If there is only one
> real-time task and all other tasks in the system do not have real-time
> requirements, then the interrupt will be serviced immediately and there is no
> RTS delay.  However, this is rarely the case, since the system typically also
> has other real-time tasks. (For example, an IP phone needs to handle the
> encoding of the send-path signal, decoding of the receive-path signal, echo
> canceller, side-tone, and other real-time tasks at the same time.) Then, the
> interrupts generated by different real-time tasks need to be prioritized.
> There can be only one highest-priority task.  Any of the other tasks will
> have a lower priority and need to wait for its turn if it tries to interrupt
> a higher-priority task. That wait time, plus the time it takes the processor
> to complete the task, is the RTS delay of that task. The entire audio or
> video stream will need to be buffered and delayed by at least the worst-case
> wait time in order to have a smooth playback without any gaps or glitches.
> If there are a large number of real-time tasks in the system, then a
> prioritized interrupt-driven RTS system will become very complex and messy,
> and the associated context switching for all the interrupts will reduce the
> system efficiency.  Therefore, in IP phones, VoIP gateways, and
> cable/satellite set-top boxes, usually a different kind of real-time
> scheduling scheme is used, where each real-time task is allowed to run to
> completion, but to simplify RT scheduling, all real-time tasks are requested
> in a periodic manner, or with similar assumptions such as a minimum interval.
> In many of these designs, all real time tasks on any one processor have the
> same period (or "thread interval") for maximum real time efficiency.  In the
> case of real-time voice communication systems, the most convenient and common
> thread interval is the codec frame size.  Thus, the codec frame size
> determines how much RTS delay the system has.  I have consulted my Broadcom
> colleague Sandy MacInnis, a senior architect who specializes in video and
> system design, and who is knowledgeable about real time scheduling.  He was
> the chair of the MPEG Systems committee for MPEG-1 and MPEG-2 (i.e. MPEG
> Transport, MPEG Programs streams, and MPEG-1 Systems).  I will quote him
> below:
> "For most efficient scheduling, all tasks should have the same period, and in
> the general case, each task may be served any time from immediately after the
> request to the last instant before the next request. So, for such efficient,
> general and robust systems, the RTS (real time scheduling) latency is up to
> one request period, which in this case is a frame duration. When the request
> is serviced earlier, the data has to be buffered up because the end-end delay
> needs to be constant. While someone might say that they think an RTS scheme
> can service requests with consistently less latency than a frame time, I
> would challenge them for a theoretical basis that shows they can do so
> reliably. What happens when all the requests happen at the same time? That
> can certainly happen, in general.  ...  An extremely standard basic
> assumption of RTS, and in particular Rate Monotonic Scheduling (RMS), is that
> for each task, the deadline equals the period. That means that from the time
> a requester makes a request, the RTS system needs to ensure that the request
> is completely serviced (finished, not just started) before the period from
> that request to the next request expires. Other assumptions are possible, but
> longer deadlines don't usually help much and they make the system more
> complex, and shorter deadlines make scheduling harder.  If there is a set of
> tasks with exactly the same period, i.e. synchronous, then it's possible to
> schedule the shared resource to 100% of capacity while ensuring RT
> performance. However, in the more typical case, the various tasks do not have
> the same period, in which case in general the maximum utilization of the
> shared resource that can be scheduled for real time tasks is significantly
> less than 100%. Whether the system is real-time schedulable or not can be
> determined in various ways, including critical instant analysis.  In either
> case, in general the latency of any given request can be anywhere from zero
> plus processing time, to exactly the period = deadline."
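The latency range described in the quote above can be illustrated with a minimal Python sketch. It assumes all tasks share one period (the frame interval), are requested at the same instant, and run to completion in a fixed order. The function name and the per-task processing times are hypothetical, chosen only so that the load adds up to 100% of a 20 ms frame.

```python
def completion_latencies(processing_ms):
    """Request-to-completion latency of each task when all tasks are
    requested at the same instant and run to completion in list order."""
    latencies = []
    elapsed = 0.0
    for t in processing_ms:
        elapsed += t          # task finishes after everything ahead of it
        latencies.append(elapsed)
    return latencies


frame_ms = 20.0
# Hypothetical processing times (ms) per frame: encoder, decoder,
# echo canceller, other real-time work. They sum to the frame interval.
tasks_ms = [6.0, 4.0, 7.0, 3.0]

latencies = completion_latencies(tasks_ms)
for t, lat in zip(tasks_ms, latencies):
    print(f"processing {t:4.1f} ms -> latency {lat:4.1f} ms")

# Every task still meets the deadline = period assumption.
assert max(latencies) <= frame_ms
```

Under these assumptions the first task's latency equals its own processing time, while the last task completes exactly one frame interval after its request, which is the range the quote describes.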
> For a PC with a very powerful processor and a very light real-time load, it
> may be reasonable to expect the processor to perform the encoding and
> decoding tasks very shortly after they are requested, with the requests being
> driven by interrupts, and the processing time of each task may be very short
> relative to the interval between requests. The resulting RTS delay may be as
> low as a few percent of the frame interval.  This is possible because a
> typical PC has much higher processing power than is required by a speech
> coder.
> The same is not true for VoIP gateways or IP phones, where the processor is
> heavily loaded with real-time tasks and is often just barely fast enough to
> handle the designated number of voice channels (many for gateways and one for
> IP phones).  For example, rather than having a 2 to 3 GHz processor as in a
> PC, the processor used to do speech coding in a low-end IP phone may only
> have a clock rate of slightly more than 100 MHz.  In this case, it is
> reasonable to expect that the time required to service each request,
> including processing time, may be as much as the full frame interval.
> OK, now that the RTS delay has been discussed, let me proceed with my delay
> analysis.  I will break down the delay into many components, with each
> component occurring after the components listed earlier.  Let the codec frame
> size be F ms and the packet size be P ms.  Let each packet contain N codec
> frames, so P = N*F.  For simplicity, we will not consider the codec look-
> ahead L ms and codec filtering delay R ms in this analysis and will just add
> them at the end because we know their multiplier is 1X.
> The one-way mouth-to-ear delay includes the following codec-dependent delay
> components:
> (1) Encoder buffering delay: d1 = a1*F, where a1 = 1.
> This is the time it takes to buffer all input samples of a codec frame.
> (2) Encoder RTS delay: d2 = a2*F, where 0 < a2 <= 1.
> This includes the encoder processing delay; see the discussion above.
> (3) Packetization delay: d3 = a3*F, where a3 = (N-1).
> This is the amount of time the first frame in the packet needs to wait until
> the last frame of encoded bits in the packet is ready.
> (4) Packet transmission delay: d4 = a4*F, where 0 < a4 <= N.
> This is the time it takes to ship all bits in the packet out of the
> transmitter; this can also be considered the decoder bit buffering delay,
> since it is the time the decoder needs to wait to get all bits in the packet.
> If the speed of the communication channel is very high, then d4 can be a very
> small fraction of the packet size P = N*F ms, but it will not be zero.  If
> the channel speed is exactly the same as the bit-rate of the packet
> (including the packet header), then d4 = P = N*F ms.  Even for the case of
> high-speed channel, if we view the bit transmission task as a real-time
> scheduling problem for the micro-controller (which may run at a different
> thread rate than the DSP), then the scheduling wait time plus the processing
> time (i.e. the time to actually transmit bits) may still take up to one
> thread interval, which is P = N*F ms in this case.
> (5) Decoder RTS delay: d5 = a5*F, where 0 < a5 <= 1.
> This includes the decoder processing delay; see the discussion above.
> There may be other delay components that may depend on the codec frame size.
> For example, in gateways where a few layers of processors are used, each
> processor may have its own real-time scheduling delays for all tasks that it
> handles.  However, at least the delay components listed above are the major
> ones that are commonly encountered.  If we omit the other possible codec-
> dependent components for the moment but add back the codec look-ahead L and
> codec filtering delay R (if any), the total codec-dependent one-way delay is
> then
> D = d1 + d2 +... + d5 + L + R = {1 + (0,1] + (N-1) + (0,N] + (0,1]}*F + L + R
> Hence, the one-way delay D has a possible range of
> N*F + L + R < D <= (2*N + 2)*F + L + R, or
> P + L + R < D <= 2*P + 2*F + L + R             Eq. (1)
> For heavily loaded real-time systems such as VoIP gateways or IP phones, if
> we assume the worst case of one full frame of encoder RTS delay and decoder
> RTS delay, then a2 = 1 and a5 = 1, and we get a tighter range for the one-way
> delay:
> P + 2*F + L + R < D <= 2*P + 2*F + L + R       Eq. (2)
> In the special case of N = 1 (each packet contains only one codec frame),
> then we get
> 3*F + L + R < D <= 4*F + L + R                 Eq. (3)
> The delay lower bounds in Eq. (1) through Eq. (3) above (under their
> individual assumptions) are consistent with what I have been saying.
> If the other omitted codec-dependent delay components are significant, or
> if the system implementers have not been careful about minimizing the delay,
> then the delay upper bounds can be even higher than what are shown  
> in Eq. (1) through Eq. (3).
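The bounds in Eq. (1) through Eq. (3) can be checked numerically with a short Python sketch that sums the ranges of d1 through d5 as defined in the quoted analysis. The function name and the `worst_case_rts` flag are illustrative assumptions, not part of the original message. The flag fixes a2 = a5 = 1 as in Eq. (2).

```python
def delay_bounds(F, N=1, L=0.0, R=0.0, worst_case_rts=False):
    """Return (lower, upper) bounds in ms on the codec-dependent one-way delay.

    F: codec frame size (ms), N: frames per packet, L: look-ahead (ms),
    R: filtering delay (ms). The lower bound is exclusive and the upper
    bound inclusive, matching the quoted Eq. (1) and Eq. (2).
    """
    # Smallest values of a2 and a5: just above 0 in general, 1 if worst case.
    a2_min = a5_min = 1.0 if worst_case_rts else 0.0
    # d1 + d2 + d3 + d4 + d5 with each coefficient at its infimum / maximum.
    lower = (1 + a2_min + (N - 1) + 0 + a5_min) * F + L + R
    upper = (1 + 1 + (N - 1) + N + 1) * F + L + R
    return lower, upper


# Eq. (1): P + L + R < D <= 2*P + 2*F + L + R, here with F = 10 ms, N = 2.
print(delay_bounds(10, N=2))
# Eq. (3): 3*F + L + R < D <= 4*F + L + R, here with F = 20 ms, N = 1.
print(delay_bounds(20, N=1, worst_case_rts=True))
```

With N = 1 and worst-case RTS delay, each extra millisecond of frame size moves the lower bound by 3 ms and the upper bound by 4 ms, which is where the 3x figure under discussion comes from.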
> In your Cisco 7960 IP phone delay measurements, P = 20 ms or 30 ms, L = 0, R
> = 0, and theoretically F = 0.125 ms.  If you look at Eq. (2) above, then it
> is clear that you won't see 3 times the packet size difference as the delay
> difference.  However, here the codec frame size is 0.125 ms, not 20 or 30 ms,
> so this result doesn't conflict with what I have been saying (i.e. 3X codec
> frame size).
> Of course, in reality it is unlikely that an IP phone will use 0.125 ms as
> the thread interval.  A more likely thread interval is P.  Then, my delay
> analysis above does not apply directly.  However, it is not difficult to
> follow the same logic and procedure to see what will happen in this case.  If
> G.711 encoding and decoding are built right into the A/D and D/A, then the
> 8-bit G.711 codewords arrive directly at the input buffer or leave the output
> buffer and the RTS system does not need to schedule G.711 encoding and
> decoding tasks, so d2 = d5 = 0. Also, in this case d1 = P, d3 = 0, and 0 < d4
> <= P.  Thus, the total one-way delay is P < D <= 2*P.
> Even if the G.711 encoding and decoding operations are done in
> software/firmware, the G.711 complexity is so low that it takes the processor
> almost no time to do encoding and decoding.  In this case, the IP phone is
> closer to the case of a PC that has much more processing power than is
> required for speech coding, and if the Cisco engineers did a good job of
> optimizing RTS to minimize d2 and d5, then d2 and d5 would be closer to 0
> than to P.  Then, the total one-way codec-dependent delay would be closer to
> P than to 3*P.  This is probably what you have observed.
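As a rough check of the G.711 case above, the following Python sketch applies the quoted result P < D <= 2*P to the two packet sizes from the Cisco 7960 measurements. It assumes the thread interval equals P and that d2 = d3 = d5 = 0, as stated in the quoted text. The function name is hypothetical.

```python
def g711_delay_bounds(P):
    """(exclusive lower, inclusive upper) delay bound in ms for G.711 built
    into the A/D and D/A with thread interval P: d1 = P, 0 < d4 <= P."""
    d1 = P
    d4_min, d4_max = 0, P
    return d1 + d4_min, d1 + d4_max


lo20, hi20 = g711_delay_bounds(20)
lo30, hi30 = g711_delay_bounds(30)

# Shift of each bound when going from 20 ms to 30 ms packets.
print("lower bound shift:", lo30 - lo20, "ms")
print("upper bound shift:", hi30 - hi20, "ms")
# Difference predicted by applying a 3x multiplier to the packet size.
print("3x prediction:", 3 * 30 - 3 * 20, "ms")
```

If d4 is the same fraction of P in both configurations, the change in delay falls between the two bound shifts, both of which are below the 30 ms that a 3x multiplier on packet size would predict.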
> Best Regards,
> Raymond
> -----Original Message-----
> From: Cullen Jennings []
> Sent: Tuesday, May 18, 2010 10:34 AM
> To: Raymond (Juin-Hwey) Chen
> Cc: Koen Vos;
> Subject: Re: [codec] #16: Multicast?
> On May 12, 2010, at 12:28 PM, Raymond (Juin-Hwey) Chen wrote:
>> Hi Cullen,
>> Hmm... That's interesting.  Would you please share more details of
>> your measurement equipment setup, the codec used, the codec frame
>> size, the number of codec frames in each packet, the way you
>> measured the delay, and the measured delay value, etc.?
> Sure - it's really simple to set up. I use a signal generator that  
> makes a tone burst. Typically I do something like 550 Hz tone that  
> is a 200 ms long burst that occurs once a second. I play this into a  
> speaker near the microphone on one phone and also put it into 1  
> channel of a scope.  I then put a microphone near the speaker of
> the other phone and put that on another channel of the scope and  
> measure the mouth to ear delay. It's really easy to see the starts  
> of the two bursts from the speaker and microphone and measure the  
> delay within a few ms. I've done lots of clever things over the  
> years using stats from software in the phones but this technique is  
> easy and pretty fool proof on getting good results. The phones were  
> plugged into Netgear 100Mbps hub as I sometimes look at the timing  
> of the ethernet packets too. I set up phones for G.711, one frame  
> per packet - I chose this as it is easy to change the length of
> each frame but results are similar with other codecs.
> I was using two Cisco 7960 phones - no idea what version of the  
> software and may have been a development build but it's pretty  
> unlikely that current production software would have different  
> results from what I tested. I set the phones for G.711 with 20 ms  
> packets, measured delay, then set the phones for 30 ms packets and  
> measured delay. For a given packet length, when I make multiple  
> phone calls or reboot one of the phones, the measurements stay  
> consistent.
> The resulting change in delay between the two experiments was much
> less than the 30 ms that (3 x 30 ms - 3 x 20 ms) would have predicted. I
> feel like a total tool not providing the details of the exact  
> measurements but lots of measurements like this Cisco considers  
> confidential and I'm just not up to arguing with folks about what  
> can and can not be said publicly. I probably should have done a test  
> with 10 ms packets too but I did not. Yes I realize how nuts it is  
> to consider something that anyone can easily measure as  
> confidential. If anyone really cares, I will go do the work to be  
> able to provide the numbers.
>> I didn't come up with this 3*(codec frame size) delay number for IP
>> phones myself.  A very senior technical lead in Broadcom's IP phone
>> chips group told me that, and Broadcom is currently the #1 world-
>> wide market share leader in IP phone chips, accounting for more than
>> half of the world's IP phone chip shipments.
>> Most of the world's
>> tier-1 IP phone manufacturers use our IP phone chips at least in
>> some of their product lines.
> Yah, again, Cisco considers chipsets confidential but if you poke  
> around with google,  such as
> Folks claim the 7960 uses a Broadcom chip - of course that looks  
> like it is for the ethernet switch on the phone not much to do with  
> software that impacts audio latency.
>> I would be very interested to learn more about your measurements to
>> try to reconcile these seemingly contradictory statements from two
>> different sources.  Thanks.
> Be glad to talk to whoever it is. I really don't know how relevant this
> all is to figuring out what packet sizes we need to support.
>> Best Regards,
>> Raymond
>> -----Original Message-----
>> From: Cullen Jennings []
>> Sent: Wednesday, May 12, 2010 8:00 AM
>> To: Raymond (Juin-Hwey) Chen
>> Cc: Koen Vos;
>> Subject: Re: [codec] #16: Multicast?
>> On May 4, 2010, at 7:15 PM, Raymond (Juin-Hwey) Chen wrote:
>>> the 3*(codec frame size) delay is very real for IP phone
>> This does not match the measurements I have. And I certainly don't  
>> have 100+ year voip experience but I do have two of the #1 selling  
>> enterprise phones connected to an oscilloscope. Tests with other
>> phones suggest most of the major vendors of IP hard phones have fairly
>> comparable performance when it comes to delay.
>> Cullen
> Cullen Jennings