Re: [codec] #19: How large is the frame-size-dependent delay / the serialization delay / frame-size-dependent processing delay?

"codec issue tracker" <trac@tools.ietf.org> Thu, 24 June 2010 15:31 UTC

#19: How large is the frame-size-dependent delay / the serialization delay /
frame-size-dependent processing delay?
------------------------------------+---------------------------------------
 Reporter:  hoene@…                 |       Owner:     
     Type:  enhancement             |      Status:  new
 Priority:  minor                   |   Milestone:     
Component:  requirements            |     Version:     
 Severity:  -                       |    Keywords:     
------------------------------------+---------------------------------------

Comment(by hoene@…):

 [Raymond]:
 Thank you [Cullen] for sharing the details of your delay measurements on
 Cisco 7960 IP phones.  What you observed does NOT conflict with what I
 have been saying.
 The reason is that the 20 ms and 30 ms you quoted are the "packet sizes",
 not the "codec frame sizes".  Codec frame size and packet size have
 different impacts on one-way delay.  The G.711 codec that you used is a
 sample-by-sample codec.  Theoretically its "codec frame size" is only one
 sample, or 0.125 ms, so the (3 x 30 ms - 3 x 20 ms) formula is not the
 right target for comparison.

 Furthermore, many telephones have the G.711 encoder and decoder built
 directly into the A/D and D/A chip hardware, so they can digitize the
 input audio signal directly into 8-bit G.711 codewords and play back 8-bit
 G.711 codewords directly as the output audio signal; thus, there is
 essentially no processing delay for G.711.  Even if G.711 encoding and
 decoding are done in software or firmware, the G.711 codec complexity is
 so low that the processing takes almost no time.  This almost-zero
 processing delay contributes to the extra-low delay of G.711-based VoIP
 systems.

 There have been so many discussions about how the codec frame size and
 packet size may affect the one-way delay that there has been confusion,
 and there has been criticism that there wasn't any rigorous theoretical
 analysis, so I thought I would spend some time giving a more rigorous
 delay analysis below so that we can hopefully settle such disputes. At the
 end of my analysis, you will see how the lower and upper bounds of the
 one-way delay depend on the codec frame size AND the packet size under
 various conditions. Please read on if you are interested; ignore this if
 you are not; or you can quickly scroll down to Equations (1) through (3),
 which are the main results of my delay analysis, and read the last few
 paragraphs after Eq. (3).

 Before I did the following delay analysis, I consulted extensively with
 three senior Broadcom technical leads who have many years of real-time
 system architecture and design experience in IP phones, VoIP gateways,
 and video systems (such as cable/satellite set-top boxes), respectively.
 What they told me was consistent across all three and consistent with
 what I have been saying.

 Before I start the analysis, let me first discuss the multi-tasking, or
 Real-Time Scheduling (RTS), delay, because it is a critical component of
 the total one-way delay and needs to be clarified first.

 In real-time audio or video systems, many tasks have definite completion
 deadlines beyond which the real-time operation will be lost and there will
 be audible or visible glitches. One way to handle a real-time task is by
 interrupting the processor in the hope that the processor will put down
 whatever it is doing and service the interrupt first.  If there is only
 one real-time task and all other tasks in the system do not have real-time
 requirements, then the interrupt will be serviced immediately and there is
 no RTS delay.  However, this is rarely the case, since the system
 typically also has other real-time tasks. (For example, an IP phone needs
 to handle the encoding of the send-path signal, decoding of the receive-
 path signal, echo canceller, side-tone, and other real-time tasks at the
 same time.) Then, the interrupts generated by different real-time tasks
 need to be prioritized.
 There can be only one highest-priority task.  Any of the other tasks will
 have a lower priority and need to wait for its turn if it tries to
 interrupt a higher-priority task. That wait time, plus the time it takes
 the processor to complete the task, is the RTS delay of that task. The
 entire audio or video stream will need to be buffered and delayed by at
 least the worst-case wait time in order to have a smooth playback without
 any gaps or glitches.
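 For a prioritized, interrupt-driven scheduler, the "wait time plus
 processing time" of each task can be computed with the standard
 response-time recurrence from fixed-priority scheduling theory,
 R = C + sum over higher-priority tasks j of ceil(R / T_j) * C_j. That
 recurrence is a textbook technique brought in here for illustration, not
 something taken from this thread, and the task names, execution times,
 and periods below are hypothetical.

```python
import math


def response_times(tasks):
    """Worst-case response time (wait + processing) of each task under
    fixed-priority preemptive scheduling.

    Assumes independent periodic tasks, deadline = period, and that all
    tasks are released at the same instant (the "critical instant").

    tasks: list of (name, C, T) in priority order, highest priority first,
           where C is the worst-case execution time and T the period.
    Returns {name: R}, with R = None when the task can miss its deadline.
    """
    result = {}
    for i, (name, c_i, t_i) in enumerate(tasks):
        higher = tasks[:i]
        r = c_i
        while r <= t_i:
            # Own work plus every higher-priority release that can
            # preempt it within a window of length r.
            r_next = c_i + sum(math.ceil(r / t_j) * c_j for _, c_j, t_j in higher)
            if r_next == r:
                break  # fixed point reached: this is the response time
            r = r_next
        result[name] = r if r <= t_i else None
    return result


# Hypothetical task set on one processor, times in ms.
tasks = [
    ("decoder", 3, 10),
    ("encoder", 4, 20),
    ("echo_canceller", 5, 20),
]
print(response_times(tasks))  # {'decoder': 3, 'encoder': 7, 'echo_canceller': 15}
```

 The lowest-priority task needs 15 ms of its 20 ms period even though its
 own processing takes only 5 ms; the other 10 ms is spent waiting for
 higher-priority work.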

 If there are a large number of real-time tasks in the system, then a
 prioritized interrupt-driven RTS system will become very complex and
 messy, and the associated context switching for all the interrupts will
 reduce the system efficiency.  Therefore, in IP phones, VoIP gateways, and
 cable/satellite set-top boxes, usually a different kind of real-time
 scheduling scheme is used, where each real-time task is allowed to run to
 completion, but to simplify RT scheduling, all real-time tasks are
 requested in a periodic manner, or with similar assumptions such as a
 minimum interval.
 In many of these designs, all real-time tasks on any one processor have
 the same period (or "thread interval") for maximum real-time efficiency.
 In the case of real-time voice communication systems, the most convenient
 and common thread interval is the codec frame size.  Thus, the codec frame
 size determines how much RTS delay the system has.  I have consulted my
 Broadcom colleague Sandy MacInnis, a senior architect who specializes in
 video and system design and who is knowledgeable about real-time
 scheduling.  He was the chair of the MPEG Systems committee for MPEG-1 and
 MPEG-2 (i.e. MPEG Transport, MPEG Program Streams, and MPEG-1 Systems).
 I will quote him below:

 "For most efficient scheduling, all tasks should have the same period, and
 in the general case, each task may be served any time from immediately
 after the request to the last instant before the next request. So, for
 such efficient, general and robust systems, the RTS (real time scheduling)
 latency is up to one request period, which in this case is a frame
 duration. When the request is serviced earlier, the data has to be
 buffered up because the end-end delay needs to be constant. While someone
 might say that they think an RTS scheme can service requests with
 consistently less latency than a frame time, I would challenge them for a
 theoretical basis that shows they can do so reliably. What happens when
 all the requests happen at the same time? That can certainly happen, in
 general.  ...  An extremely standard basic assumption of RTS, and in
 particular Rate Monotonic Scheduling (RMS), is that for each task, the
 deadline equals the period. That means that from the time a requester
 makes a request, the RTS system needs to ensure that the request is
 completely serviced (finished, not just started) before the period from
 that request to the next request expires. Other assumptions are possible,
 but longer deadlines don't usually help much and they make the system more
 complex, and shorter deadlines make scheduling harder.  If there is a set
 of tasks with exactly the same period, i.e. synchronous, then it's
 possible to schedule the shared resource to 100% of capacity while
 ensuring RT performance. However, in the more typical case, the various
 tasks do not have the same period, in which case in general the maximum
 utilization of the shared resource that can be scheduled for real time
 tasks is significantly less than 100%. Whether the system is real-time
 schedulable or not can be determined in various ways, including critical
 instant analysis.  In either case, in general the latency of any given
 request can be anywhere from zero plus processing time, to exactly the
 period = deadline."
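 The point that "the latency of any given request can be anywhere from
 zero plus processing time, to exactly the period" can be illustrated with
 a small sketch of the same-period, run-to-completion scheme: every task is
 requested at the frame boundary and the tasks are served one after
 another. The task mix, execution times, and the fixed service order are
 assumptions made up for illustration; a real scheduler may serve the
 tasks in a different order from frame to frame, which is why each task has
 to be budgeted for up to a full period.

```python
def run_to_completion_latencies(exec_times, period):
    """Request-to-completion latency of each task when all tasks share one
    period, are requested together at the start of that period, and are
    served back-to-back in list order without preemption.

    exec_times: worst-case execution time of each task (same unit as period).
    Raises ValueError if the total work does not fit within one period,
    i.e. if deadline = period cannot be met.
    """
    if sum(exec_times) > period:
        raise ValueError("task set does not fit in one period")
    latencies = []
    finish = 0
    for c in exec_times:
        finish += c  # each task runs to completion before the next starts
        latencies.append(finish)
    return latencies


# Hypothetical 20 ms frame with three real-time tasks (ms), 95% load.
print(run_to_completion_latencies([8, 5, 6], period=20))  # [8, 13, 19]
```

 Whichever task happens to be served last waits almost the whole frame once
 the load approaches 100%.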

 For a PC with a very powerful processor and a very light real-time load,
 it may be reasonable to expect the processor to perform the encoding and
 decoding tasks very shortly after they are requested, with the requests
 being driven by interrupts, and the processing time of each task may be
 very short relative to the interval between requests. The resulting RTS
 delay may be as low as a few percent of the frame interval.  This is
 possible because a typical PC has much higher processing power than is
 required by a speech coder.

 The same is not true for VoIP gateways or IP phones, where the processor
 is heavily loaded with real-time tasks and is often just barely fast
 enough to handle the designated number of voice channels (many for
 gateways and one for IP phones).  For example, rather than having a 2 to 3
 GHz processor as in a PC, the processor used to do speech coding in a low-
 end IP phone may only have a clock rate of slightly more than 100 MHz.  In
 this case, it is reasonable to expect that the time required to service
 each request, including processing time, may be as much as the full frame
 interval.

 OK, now that the RTS delay has been discussed, let me proceed with my
 delay analysis.  I will break down the delay into many components, with
 each component occurring after the components listed earlier.  Let the
 codec frame size be F ms and the packet size be P ms.  Let each packet
 contain N codec frames, so P = N*F.  For simplicity, we will not consider
 the codec look-ahead L ms and codec filtering delay R ms in this analysis
 and will just add them at the end because we know their multiplier is 1X.

 The one-way mouth-to-ear delay includes the following codec-dependent
 delay components:

 (1) Encoder buffering delay: d1 = a1*F, where a1 = 1.
 This is the time it takes to buffer all input samples of a codec frame.

 (2) Encoder RTS delay: d2 = a2*F, where 0 < a2 <= 1.
 This includes the encoder processing delay; see the discussion above.

 (3) Packetization delay: d3 = a3*F, where a3 = (N-1).
 This is the amount of time the first frame in the packet needs to wait
 until the last frame of encoded bits in the packet is ready.

 (4) Packet transmission delay: d4 = a4*F, where 0 < a4 <= N.
 This is the time it takes to ship all bits in the packet out of the
 transmitter; this can also be considered the decoder bit buffering delay,
 since it is the time the decoder needs to wait to get all bits in the
 packet.
 If the speed of the communication channel is very high, then d4 can be a
 very small fraction of the packet size P = N*F ms, but it will not be
 zero.  If the channel speed is exactly the same as the bit-rate of the
 packet (including the packet header), then d4 = P = N*F ms.  Even for the
 case of high-speed channel, if we view the bit transmission task as a
 real-time scheduling problem for the micro-controller (which may run at a
 different thread rate than the DSP), then the scheduling wait time plus
 the processing time (i.e. the time to actually transmit bits) may still
 take up to one thread interval, which is P = N*F ms in this case.

 (5) Decoder RTS delay: d5 = a5*F, where 0 < a5 <= 1.
 This includes the decoder processing delay; see the discussion above.
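 The bit-clocking part of component (4) can be sketched as packet bits
 divided by link rate. The numbers below are assumptions for illustration:
 G.711 at 64 kbit/s with P = 20 ms (a 160-byte payload) and a 40-byte
 IPv4 + UDP + RTP header, ignoring link-layer overhead. The sketch does not
 model the micro-controller scheduling wait mentioned above.

```python
def serialization_delay_ms(payload_bytes, header_bytes, channel_bps):
    """Time (ms) to clock one packet onto a link of channel_bps bits/s."""
    bits = (payload_bytes + header_bytes) * 8
    return bits * 1000 / channel_bps


# 160-byte G.711 payload (20 ms) + 40-byte IPv4 (20) + UDP (8) + RTP (12) header.
print(serialization_delay_ms(160, 40, 80_000))       # 20.0: link rate equals packet bit-rate, d4 = P
print(serialization_delay_ms(160, 40, 100_000_000))  # 0.016: fast link, d4 << P but not zero
```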

 There may be other delay components that may depend on the codec frame
 size.  For example, in gateways where a few layers of processors are used,
 each processor may have its own real-time scheduling delays for all tasks
 that it handles.  However, at least the delay components listed above are
 the major ones that are commonly encountered.  If we omit the other
 possible codec-dependent components for the moment but add back the codec
 look-ahead L and codec filtering delay R (if any), the total
 codec-dependent one-way delay is then

 D = d1 + d2 + ... + d5 + L + R = {1 + (0,1] + (N-1) + (0,N] + (0,1]}*F + L + R

 Hence, the one-way delay D has a possible range of

 N*F + L + R < D <= (2*N + 2)*F + L + R, or

 P + L + R < D <= 2*P + 2*F + L + R             Eq. (1)

 For heavily loaded real-time systems such as VoIP gateways or IP phones,
 if we assume the worst case of one full frame of encoder RTS delay and
 decoder RTS delay, then a2 = 1 and a5 = 1, and we get a tighter range for
 the one-way delay:

 P + 2*F + L + R < D <= 2*P + 2*F + L + R       Eq. (2)
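 Eq. (1) and Eq. (2) can be checked mechanically by summing the
 per-component bounds d1 through d5 plus L and R. This is only a sketch of
 the arithmetic in the analysis above; the function name and parameters
 are illustrative. The returned lower bound is exclusive and the upper
 bound inclusive.

```python
def one_way_delay_bounds(F, N, L=0.0, R=0.0, full_rts=False):
    """Bounds on the codec-dependent one-way delay D (ms), per Eq. (1)/(2).

    F: codec frame size, N: frames per packet (P = N*F), L: look-ahead,
    R: codec filtering delay.  Returns (lower, upper) meaning
    lower < D <= upper.  full_rts=True pins the encoder and decoder RTS
    delays at one full frame (a2 = a5 = 1), which gives Eq. (2).
    """
    P = N * F
    d1 = F                            # (1) encoder buffering, a1 = 1
    d3 = (N - 1) * F                  # (3) packetization, a3 = N - 1
    rts_min = F if full_rts else 0.0  # (2)/(5): a2, a5 in (0, 1], or exactly 1
    lower = d1 + rts_min + d3 + rts_min + L + R  # d4 -> 0 (exclusive)
    upper = d1 + F + d3 + P + F + L + R          # a2 = a5 = 1, a4 = N
    return lower, upper


print(one_way_delay_bounds(20, 1))                 # (20.0, 80.0)
print(one_way_delay_bounds(20, 1, full_rts=True))  # (60.0, 80.0)
print(one_way_delay_bounds(10, 2))                 # (20.0, 60.0)
```

 The last call keeps P = 20 ms but halves F; only the upper bound moves,
 through its 2*F term.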

 In the special case of N = 1 (each packet contains only one codec frame),
 then we get

 3*F + L + R < D <= 4*F + L + R                 Eq. (3)
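 Substituting N = 1 (so P = F) into Eq. (2) shows where the "3X" and "4X"
 frame-size multipliers come from. Holding L and R fixed (an assumption,
 since look-ahead may itself change with frame size), the change in each
 bound when the frame size moves from F_a to F_b follows directly:

```latex
\begin{aligned}
P + 2F + L + R\,\big|_{P=F} &= 3F + L + R,\\
2P + 2F + L + R\,\big|_{P=F} &= 4F + L + R,\\
\Delta D_{\text{lower}} = 3\,(F_a - F_b), &\qquad \Delta D_{\text{upper}} = 4\,(F_a - F_b).
\end{aligned}
```

 For example, going from 20 ms to 10 ms frames lowers the two bounds by
 30 ms and 40 ms, respectively.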

 The delay lower bounds in Eq. (1) through Eq. (3) above (under their
 individual assumptions) are consistent with what I have been saying.
 If the other omitted codec-dependent delay components are significant, or
 if the system implementers have not been careful about minimizing the
 delay, then the delay upper bounds can be even higher than those shown
 in Eq. (1) through Eq. (3).

 In your Cisco 7960 IP phone delay measurements, P = 20 ms or 30 ms, L = 0,
 R = 0, and theoretically F = 0.125 ms.  From Eq. (2) above, it is clear
 that you would not expect the delay difference to be 3 times the
 packet-size difference.  However, here the codec frame size is 0.125 ms,
 not 20 or 30 ms, so this result doesn't conflict with what I have been
 saying (i.e. 3X codec frame size).

 Of course, in reality it is unlikely that an IP phone will use 0.125 ms as
 the thread interval.  A more likely thread interval is P.  Then, my delay
 analysis above does not apply directly.  However, it is not difficult to
 follow the same logic and procedure to see what will happen in this case.
 If G.711 encoding and decoding is built right into the A/D and D/A, then
 the 8-bit G.711 codewords arrive directly at the input buffer or leave
 directly from the output buffer, and the RTS system does not need to
 schedule G.711 encoding and decoding tasks, so d2 = d5 = 0. Also, in this
 case d1 = P, d3 = 0, and 0 < d4 <= P.  Thus, the total one-way delay is
 P < D <= 2*P.
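 The same bookkeeping can be redone with a thread interval of P. The sketch
 below adds a hypothetical rts_fraction parameter for the encoder and
 decoder RTS delays, expressed as a fraction of P: 0 reproduces the
 hardware G.711 case (P < D <= 2*P), and 1 gives the heavily loaded worst
 case (3*P < D <= 4*P).

```python
def g711_thread_bounds(P, rts_fraction):
    """One-way delay bounds (ms) when the thread interval is the packet size P.

    rts_fraction: encoder/decoder RTS delay as a fraction of P
    (0 when G.711 is done in the A/D and D/A hardware).
    Returns (lower, upper) meaning lower < D <= upper.
    """
    d1 = P                       # buffer one packet's worth of samples
    d2 = d5 = rts_fraction * P   # encoder / decoder RTS delay
    d3 = 0                       # one thread interval = one packet, no extra wait
    d4_max = P                   # transmission delay: 0 < d4 <= P
    lower = d1 + d2 + d3 + d5            # d4 -> 0 (exclusive)
    upper = d1 + d2 + d3 + d4_max + d5
    return lower, upper


print(g711_thread_bounds(20, 0.0))  # (20.0, 40.0): P < D <= 2P
print(g711_thread_bounds(20, 1.0))  # (60.0, 80.0): 3P < D <= 4P
```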

 Even if the G.711 encoding and decoding operations are done in
 software/firmware, the G.711 complexity is so low that it takes the
 processor almost no time to do encoding and decoding.  In this case, the
 IP phone is closer to the case of a PC that has much more processing power
 than is required for speech coding, and if the Cisco engineers did a good
 job of optimizing RTS to minimize d2 and d5, then d2 and d5 would be
 closer to 0 than to P.  Then, the total one-way codec-dependent delay
 would be closer to P than to 3*P.  This is probably what you have
 observed.

 [Koen]:
 Thanks for the detailed explanation; this clarifies your earlier
 statements about the 3x multiplier.

 The essence, if I understand you correctly, is that there still exist low-
 end platforms with barely enough processing power to run a VoIP call.  If
 such platforms use a naive FIFO scheduler, they'll create up to one frame
 of processing delay for encoder and decoder each, on top of the frame of
 buffering delay.

 The good news is that Moore's law will continue to drive down the fraction
 of platforms with such processing delay problems.

 I'm a bit surprised by your analysis of "packet transmission delay", as it
 has little bearing on our multiplier (ie the change in delay as a function
 of frame size). See old posts.

 [Raymond]: It doesn't have to be low-end platforms.  I wouldn't consider
 high-density VoIP gateways "low-end".  What matters is whether the
 processor is heavily loaded (i.e. busy at a high percentage of time) with
 real-time tasks (and thus is just fast enough). I think this is true for
 typical implementations of IP phones and VoIP gateways.

 I also wouldn't use the term "a naïve FIFO scheduler" to describe the "run
 to completion" real-time scheduler that I talked about in my last email,
 because that term seems to imply that it is a very simple-minded and
 inferior approach used by an inexperienced person who doesn't know
 anything better.  My understanding from talking to the three senior
 technical leads of Broadcom is that the reality is when you have many
 real-time tasks that you need to handle concurrently, using a prioritized
 interrupt-driven scheduler is just way too complex and messy, and it
 doesn't even guarantee that you will get a lower delay if you do go
 through the trouble.  In contrast, the kind of "run to completion" real-
 time scheduler that I talked about is a more elegant solution as it
 simplifies the scheduling problem substantially and also allows you to
 have more efficient utilization of the processor.

 Other than these two points, your understanding of my main point is
 correct.

 > The good news is that Moore's law will continue to drive down the
 > fraction of platforms with such processing delay problems.

 [Raymond]: This may be true for PCs but probably not true in general.
 A PC is a general-purpose computing device that has to handle numerous
 possible tasks, and a voice phone call takes only a very small fraction of
 the worst-case computational power requirement of a PC.  In contrast, for
 special-purpose dedicated hardware devices such as IP phones or VoIP
 gateways, it would make no sense to use a processor that is many times
 faster than the worst-case computational power requirement.  For the sake
 of cost and power efficiency, the designers of such special-purpose
 devices will want to use a processor that's just slightly faster than
 required, because then they can use the cheapest and/or lowest power-
 consuming processor that's fast enough to get the job done.
 If they choose to use a processor much faster than is required, then
 competitors using processors just fast enough can have lower costs and
 power consumption and can take market share away from them.

 A case in point: several decades after their first appearance, 8-bit
 microprocessors are still widely used in many devices today despite the
 several orders of magnitude of speed improvement provided by Moore's Law,
 because those devices just don't need anything faster, so using anything
 faster would be a waste of money and power.

 My point is that we should not expect future IP phones or gateways to
 operate at a very low processor load just because Moore's Law can improve
 processor speed over time. Therefore, don't expect the 3X multiplier for
 codec frame size to go down much below where it is now.

 In fact, if in addition to a VoIP call, a PC is heavily loaded with a lot
 of other concurrent tasks, many of which may be real-time tasks (e.g.
 video, playing/burning CD/DVD, networking, etc.), then it will be
 difficult for the PC to have small encoding and decoding RTS delays (d2
 and d5 in my delay analysis).  In this case, the codec frame size
 multiplier will be closer to 3X than to 1X, unless you are willing to let
 the voice stream occasionally run out of real time and produce an audible
 glitch (which is not acceptable from the voice quality perspective).  If
 you agree with this and agree that a PC sometimes does get very heavily
 loaded, then if you don't want the voice stream to run out of real time,
 the worst-case codec-dependent delay for PC can still be around 3X the
 codec frame size.

 > I'm a bit surprised by your analysis of "packet transmission delay",
 > as it has little bearing on our multiplier (ie the change in delay as
 > a function of frame size). See old posts.

 [Raymond]: I am not sure I understand what you are saying.  You probably
 misunderstood the goal of my analysis. I mentioned in my last email that
 my delay analysis aimed to derive the lower and upper bounds of the codec-
 dependent one-way delay as functions of both the codec frame size AND the
 packet size.  That "packet transmission delay" does depend on the packet
 size, so it should be included.  Also, including it doesn't increase the
 lower bound of the delay (and the codec frame size multiplier there); it
 only affects the upper bound.

 Or, are you saying the "packet transmission delay" depends on the packet
 size, not the codec frame size, and therefore is not codec-dependent?
 Well, we know the packet size should be a positive integer multiple of the
 codec frame size.  Once the codec frame size is determined, there are only
 limited choices of packet sizes you can use, so in this sense the packet
 size does depend on the codec frame size.  Therefore, the "packet
 transmission delay" indirectly depends on the choice of the codec.
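 The argument that the packet size, and hence the packet transmission
 delay, is tied to the codec frame size can be illustrated by enumerating
 the packet sizes a given frame size allows, together with the Eq. (1)
 upper bound 2*P + 2*F for each (L and R left out). The 60 ms cap on packet
 size is an arbitrary assumption for the example.

```python
def packet_size_options(F, max_packet_ms):
    """Packet sizes P = N*F (N = 1, 2, ...) up to max_packet_ms, each paired
    with the Eq. (1) upper bound 2P + 2F on the codec-dependent delay.
    The transmission delay itself satisfies 0 < d4 <= P for each option."""
    options = []
    n = 1
    while n * F <= max_packet_ms:
        P = n * F
        options.append((P, 2 * P + 2 * F))
        n += 1
    return options


print(packet_size_options(20, 60))  # [(20, 80), (40, 120), (60, 160)]
print(packet_size_options(30, 60))  # [(30, 120), (60, 180)]
```

 With 30 ms frames a 40 ms packet is simply not available, so the choice
 of frame size narrows the packet sizes, and thus the transmission delays,
 that an implementation can use.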

 [Koen]:
 In other words, future manufacturers won't spend a few dimes on reducing
 delay, even though today they're happy to add several dollars to the price
 just to enable wideband?  That's a statement about the relative importance
 of delay.

 For the discussion about transmission delay vs. frame size, see e.g.
 http://www.ietf.org/mail-archive/web/codec/current/msg01477.html

 [Hoene]:
 Yesterday, I had a brief look at ITU-T G.114
 http://www1.cs.columbia.edu/~andreaf/new/documents/other/T-REC-G.114-200305.pdf
 It might help in your discussion...

 [Sandy MacInnis]:
 Sorry for stepping in here... full disclosure: I'm not a speech coding
 expert, and I work at Broadcom, where Raymond works.

 I too would like to end this discussion; it seems to have diverged from a
 discussion of the requirements for the CODEC algorithm to have a mode with
 low algorithmic delay, which AFAIK is already agreed anyway, to some
 rather tangential discussions related to, but not really addressing, real
 time scheduling of the algorithm on a processor.

 The point from Raymond that started this particular discussion thread is
 RTS, i.e. real-time scheduling. I know his note about that is long; it
 might be worth reading it again.

 It's not a fair assumption that 100% of a shared resource - in this
 instance, a processor - can be spent performing real-time-scheduled tasks.
 If there is a set of RT (real time) tasks that have different periods, and
 periods = deadlines, all being scheduled on the same processor, the best
 you can do is less than 100%. How close you can get depends on the
 details; it might be e.g. 68%, or it could be significantly less; there's
 a lot of literature on this. If the system is optimally designed for the
 purposes of RTS, i.e. all other tasks are treated as non-real time and
 have lower priority than all real time tasks, there are no priority
 inversions, task switching is very efficient, etc., the RTS performance
 can come close to theory, but if any of these assumptions are not true, it
 can be significantly worse.
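 The "less than 100%" point can be made concrete with the classic Liu and
 Layland utilization bound for rate-monotonic scheduling, n*(2^(1/n) - 1)
 for n tasks, which falls toward ln 2 (about 69%) as n grows. This textbook
 result is offered for illustration and is not necessarily the analysis
 behind the 68% example above. It is only a sufficient test: a task set
 above the bound may still be schedulable and needs an exact check such as
 critical-instant (response-time) analysis. The task set below is
 hypothetical.

```python
def liu_layland_bound(n):
    """Rate-monotonic utilization bound n * (2**(1/n) - 1) for n tasks."""
    return n * (2 ** (1.0 / n) - 1)


def rm_sufficient_test(tasks):
    """Sufficient (not necessary) schedulability test for rate-monotonic
    scheduling of independent, preemptible periodic tasks with
    deadline = period.

    tasks: list of (C, T) pairs (execution time, period).
    Returns (total utilization, bound, passes).
    """
    u = sum(c / t for c, t in tasks)
    bound = liu_layland_bound(len(tasks))
    return u, bound, u <= bound


for n in (1, 2, 3, 10, 1000):
    print(n, round(liu_layland_bound(n), 4))  # tends toward ln 2 ~ 0.693

# Hypothetical task set (ms): 60% utilization, under the 3-task bound.
print(rm_sufficient_test([(2, 10), (4, 20), (6, 30)]))
```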

 If the total RT demands are only a very small fraction of the total shared
 resource, i.e. processor cycles, it tends to be easier to perform the
 scheduling and ensure that it works correctly. Such a scenario may be more
 important than RTS indicates if the system is not well designed for real
 time operation, i.e. a PC. And, such systems draw MUCH more power than
 well-designed embedded products. Conversely, low power and modest clock
 rates are good design principles for embedded products, even those that
 are wall (mains) powered. E.g. someone noted leakage power at 65nm - have you
 looked at 40nm? It just keeps getting worse. Designing for slower max
 clock rate saves substantial power.

 There are good reasons why a common convention of real time scheduling is
 the assumption that period = deadline. As Raymond noted, other design
 assumptions are possible, but they have their own problems.

 Note also, as Raymond pointed out, that RTS also applies to intermediate
 points in the end-end system, such as gateways. Such a device may have
 very powerful processors, and if so, it should be for the specific purpose
 of performing a large number of RT tasks, loading the processors as much
 as can be guaranteed.

 I would hope that this committee is not planning to be in the position of
 dictating that all implementations of the algorithm require a processor
 so fast that the system can guarantee a service latency much less than
 the period of an audio frame. And if not, then a reasonable assumption is
 that, in general, the deadline for service latency does equal the period
 of an audio frame. That assumption is part of one of Raymond's
 upper-limit calculations.

 [Raymond]: I too don't want to see this discussion drag on, but some of
 your comments seem misleading to me, so I would like to respond with some
 quick comments.

 Wideband is a new feature in some devices and is a check box that a
 product manager needs to check off to remain competitive.  That doesn't
 mean wideband is more important than existing features in a device.  Also,
 I am not sure the cost difference is a few dimes versus several dollars.
 In some devices the extra cost of adding wideband is minimal. Furthermore,
 it is not only a cost issue but also a power consumption issue.  No one in
 his or her right mind will use a processor that's 5X to 10X faster than
 necessary just in order to reduce the encoder and decoder RTS delays to a
 small fraction of the codec frame size; this is just the way it is and has
 nothing to do with the relative importance of delay or anything else.

 You were presenting it as if this were a reasonable choice that device
 designers could easily make but chose not to make, but that's just not
 true.  It has always been the case that the designers will use processors
 just fast enough for the job, perhaps with a little margin for the
 unexpected, but not 5X or 10X.  Given this, the bottom line is that ~3X
 codec frame size is the "norm" or a "necessary result" for special-purpose
 hardware devices rather than a design choice, and you are just lucky to
 get < 2X in PC-based VoIP calls because PCs were not designed for voice
 calls but for other, much more computationally demanding tasks.  (Even
 there you can't guarantee that PCs will always give you a multiplier of
 < 2X.  What if the PC is heavily loaded with other tasks?  Then you are
 more likely to get 3X if you don't want your voice stream to run out of
 real time.)

-- 
Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/19#comment:5>
codec <http://tools.ietf.org/codec/>