Re: [codec] #19: How large is the frame size dependent delay / the serialization delay / frame size dependent processing delay? (was: How large is the frame size dependent delay / the serialization delay?)

"codec issue tracker" <trac@tools.ietf.org> Sun, 09 May 2010 17:21 UTC

From: codec issue tracker <trac@tools.ietf.org>
To: hoene@uni-tuebingen.de
Date: Sun, 09 May 2010 17:20:54 -0000
Cc: codec@ietf.org
Subject: Re: [codec] #19: How large is the frame size dependent delay / the serialization delay / frame size dependent processing delay? (was: How large is the frame size dependent delay / the serialization delay?)

#19: How large is the frame size dependent delay / the serialization delay /
frame size dependent processing delay?
------------------------------------+---------------------------------------
 Reporter:  hoene@…                 |       Owner:     
     Type:  enhancement             |      Status:  new
 Priority:  minor                   |   Milestone:     
Component:  requirements            |     Version:     
 Severity:  -                       |    Keywords:     
------------------------------------+---------------------------------------

Comment(by hoene@…):

 [Stephen]: There's algorithmic delay (including framing) + flight time +
 dejittering.  Flight time depends on the network path, not on the frame
 size.  And the amount of jitter is due principally to congestion from
 cross traffic.
 [Raymond]: My main point was not the absolute one-way delay value for the
 5 ms frame size but the relative delay between 5 ms and 20 ms frame sizes.
 I agree that the 5X delay might be too simplistic.  I tried to use a
 simple formula to make it easier for people to follow, but I did realize
 its limitations, especially at small frame sizes, so I added that “Even if
 you use a longer jitter buffer, …” sentence.
 Regardless of whether 5X frame size is overly simplistic, the fact remains
 that cellular codecs have a 20 ms frame size and a typical one-way delay
 around 80 to 110 ms, and cellular networks probably don’t have the kind of
 jitter that the Internet has.  What would make us believe that an IETF
 codec with a 20 ms frame size will get a one-way delay much below 80 ms?
 Chances are that an Internet call using an IETF codec with a 20 ms frame
 size will have a one-way delay at least as long as that of a cell phone
 call, and more likely longer, because PC audio driver software tends to
 add quite a bit of delay, and an Internet call incurs additional jitter
 buffer delay compared with cell phone calls.
 Therefore, regardless of the accuracy of the 5X frame size formula, the
 conclusion remains the same: for the conference bridge application, a 20
 ms codec frame size will result in a total one-way delay far exceeding
 the ITU-T’s 150 ms guideline, thus substantially degrading the perceived
 quality of the communication links, and with one or even two cell phone
 callers joining the conference, the long latency and the associated
 problems will only get worse.  Hence, it is necessary for the IETF codec
 to have a low-delay mode using a small codec frame size such as 5 ms to
 address delay-sensitive applications such as bridge-based conference
 calls.
 [Mikhael]: The light-speed-in-fiber RTT is about 1 ms per 100 km.  Europe
 to the US West coast is ~150 ms RTT.  I'm in Thailand at the moment, and I
 currently have 350 ms RTT to Sweden (because the path goes Sweden -> US ->
 Singapore -> Thailand), just to give some data points.  Then add the ~25 ms
 RTT of ADSL2+ just over the access layer, and I'd say that 200 ms network
 RTT (100 ms one-way) might be a low percentage of the calls, but it's
 still definitely going to happen.
 Considering the prices of international calls out of a place like this, a
 lot of people are going to want to use VoIP to get around them.

 [Raymond]: First, I agree that codec algorithmic buffering delay is more
 accurate than frame size since it can also include the “look-ahead” delay
 and filtering delay if sub-band analysis/synthesis is used.  However, your
 formula implies that for the codec-related delay, the “multiplier” to be
 used for the codec frame size is only 1.  That’s unrealistic and
 theoretically impossible.  For that to happen, after you wait one frame of
 time for the current frame of input audio samples to arrive at your input
 signal buffer (that’s one frame of codec-related delay already), you need
 an infinitely fast processor to finish the encoding operation instantly,
 then you need an infinitely fast communication link to ship all the bits
 in the compressed frame to the decoder instantly, and then you need an
 infinitely fast processor to finish decoding the frame instantly and start
 playing back the current frame of audio without any delay.  That’s just
 impossible.
 In reality, if the processor is just barely fast enough to implement the
 codec in real time, then you need nearly a full frame of time to finish
 the encoding and decoding operations.  That makes the multiplier 2
 already.  If your communication link is just barely fast enough to
 transmit your packets at the same speed they are generated without piling
 up unsent packets, then it takes another frame of time to finish
 transmitting the compressed bits in a frame to the decoder.  That brings
 the multiplier to 3 already.
 Granted, in practice the processor and the communication link are usually
 faster than just barely enough, so the processing delay and the
 transmission delay can each be less than 1 frame.  However, there are
 other miscellaneous uncounted delays that tend to depend on the codec
 frame size in various ways.  Thus, a typical IP phone implementation would
 have
   One-way delay = codec-independent delay + 3*(codec frame size) + (codec
 look-ahead) + (codec filtering delay if any).
 Hence, the one-way delay difference between a 20 ms and a 5 ms codec frame
 size would be 45 ms + (codec look-ahead difference) + (codec filtering
 delay difference).
 Consequently, for the conference bridge application, which adds a second
 encode/decode leg, the total difference in one-way delay can easily be in
 the 90 to 100 ms range.  Added on top of all the other codec-independent
 delay components, that is still a huge difference that users can easily
 notice, especially since it will most likely push the total one-way delay
 significantly beyond the 150 ms limit.

 [Stephen]:
 There is a floor transmission delay when you send the minimum size packets
 the network path allows.
 There is an incremental delay due to serialization when you send packets
 larger than the minimum.  (At each hop you wait until you receive the last
 bit of the packet before you forward the first bit.)  I'd agree that a
 reasonable model for the incremental delay is that it scales linearly with
 the increase in packet size.  But the floor delay is usually too large for
 this to dominate.
 And on top of that is the variable delay (jitter) due to congestion, layer
 2 retransmission, and the like.  That also will not scale linearly with
 frame size or packet size.

 [Koen]: Processing time matters on low-end hardware - a small fraction of
 today's VoIP end points.  Even then, the higher coding efficiency of
 longer frames can be translated into lower complexity.

 [Raymond]: Processing time certainly matters for IP phones, and there are
 a lot of enterprise IP phones deployed today.  I heard that it is actually
 significantly cheaper for enterprises to make their entire phone systems
 IP-phone-based rather than analog-phone-based.  I won’t be surprised if,
 before too long, the vast majority of enterprises use only IP phones.
 Even consumer phones and cell phones are moving toward being IP-based.
 Eventually that would be a very large percentage of VoIP end points.

 [Koen]: And transmission delay increases (perhaps) linearly with the
 *packet size*, not with the *frame size*.  For a 32 kbps codec with 5 ms
 frames, packets are just 30% smaller than with a 16 kbps codec with 20 ms
 frames.
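 (A minimal sketch of this comparison, assuming an uncompressed 40-byte
 IP/UDP/RTP header; header compression would change the numbers:)

     # Packet size = codec payload + per-packet header overhead.
     HEADER_BYTES = 40  # IPv4 (20) + UDP (8) + RTP (12)

     def packet_bytes(bitrate_bps: float, frame_ms: float) -> float:
         payload = bitrate_bps * frame_ms / 1000 / 8
         return payload + HEADER_BYTES

     p5 = packet_bytes(32000, 5)    # 60 bytes -> 96 kbps on the wire
     p20 = packet_bytes(16000, 20)  # 80 bytes -> 32 kbps on the wire
     print(1 - p5 / p20)            # packets ~25% smaller, but at 3x the
                                    # on-the-wire bit-rate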

 [Raymond]: Agreed.  My previous comments on transmission delay were based
 on the TDM rather than the packet scenario, but I was just using that
 simplified TDM example to make the point that transmission delay cannot be
 zero, as your 1X frame size multiplier would imply.  Even with your
 statement above, a larger codec frame size still makes a larger packet
 size, which then increases the transmission delay, so you can’t say
 transmission delay is zero or is independent of the codec frame size.
 In any case, these are really minor details.  My key point is that your 1X
 multiplier for the codec frame size is simply theoretically impossible.
 The rule of thumb used by IP phone engineers is around 3X the codec frame
 size.

 [Koen]: Let me ask you something: how often is G.729 used with 10 ms
 packets, or BroadVoice with 5 ms packets?

 [Raymond]: Not very often, but that’s because network routers/switches
 previously didn’t like to handle too many packets per second, and the
 higher packet header overhead associated with a smaller packet size means
 the overall bit-rate would be higher than desired or allowed, so the time
 of small packet sizes for low-delay VoIP hasn’t really come yet.
 However, with the help of Moore’s Law, network routers/switches are
 becoming much faster now, and I was told that they can handle a 5 ms
 packet size without problems; furthermore, the speeds of backbone networks
 and access networks keep increasing with time, so the bit-rate concern
 will also diminish with time.
 Unlike processing speed and communication speed, which have kept improving
 for decades, delay is one thing that will NOT improve with time, and
 Moore’s Law cannot do anything about that!
 If the IETF codec has a minimum frame size of 20 ms, we will be stuck with
 the longer overall delay associated with that, and Moore’s Law will not
 help us reduce that delay in the future.  On the other hand, in addition
 to using a 20 ms frame size for bit-rate-sensitive applications, if the
 IETF codec also has a low-delay mode that uses a 5 ms frame size, then at
 least for delay-sensitive applications people have a choice to achieve a
 lower delay by paying the price of a higher overall bit-rate (i.e. with
 packet headers counted), and this higher bit-rate will be less and less of
 a concern as network speeds keep increasing with time.
 Therefore, recognizing that delay cannot be helped by Moore’s Law but bit-
 rate can, it would be wise for the IETF codec WG to adopt a low-delay mode
 for the codec in order to be future-proof.

 [Raymond]: […] one-way delay = codec-independent delay + 3*(codec frame
 size) + (codec look-ahead) + (codec filtering delay if any).  The main
 debate now is centered on whether the multiplier of the codec frame size
 should be 1, as Koen said, or 3, as I was told by experienced IP phone
 engineers.  I argue that 1X is theoretically impossible.  It is
 interesting to note that the ITU-T uses a multiplier of 2X.  I think 2X is
 probably achievable in the idealized situation.  In practice, however,
 many nitty-gritty details get in the way of achieving that idealized
 situation, and little additional delays just keep getting added, resulting
 in a realistic real-world 3X multiplier.  With a 3X multiplier, the one-
 way delay difference between a 20 ms and a 5 ms codec frame size would be
 45 ms + (codec look-ahead difference) + (codec filtering delay
 difference).  For the conference bridge application, the total difference
 in one-way delay will double to the 90 to 100 ms range.  That’s a VERY
 significant difference that typical users will notice (it’s like adding
 another cell phone call’s worth of delay), especially if it pushes the
 total one-way delay significantly beyond the 150 ms guideline.  Therefore,
 I argue that for the best user experience in conference bridge calls, the
 IETF codec should have a low-delay mode with a small codec frame size such
 as 5 ms, and let the continually increasing speed of communication links
 make the header-overhead bit-rate less and less of an issue in the future.
 (Even now, for those people who have high-speed connections to their
 computers, it is already not an issue.  It is better for them to get low
 delay than to worry about bit-rate or packet header overhead.)

 [Stephen]:
 Serialization delay is defined as being
 Serialization Delay = Size of Packet (bits) / Transmission Rate (bps)
 This is summed over all hops.

 For a typical low-speed backbone hop (OC-3), the serialization delay at
 the bit-rates we are talking about is on the order of 10 microseconds for
 20 ms frame sizes.  For a fast backbone hop (OC-48) it is on the order of
 1 microsecond.

 There is a handy calculator here: http://kt2t.us/serialization_delay.html
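 (A minimal sketch of this formula; the 120-byte packet and the standard
 OC-3/OC-48 line rates are illustrative assumptions:)

     # Per-hop serialization delay = packet size (bits) / link rate (bps).
     OC3_BPS = 155.52e6     # OC-3 line rate
     OC48_BPS = 2.48832e9   # OC-48 line rate

     def serialization_us(packet_bytes: int, link_bps: float) -> float:
         """Serialization delay of one packet on one hop, in microseconds."""
         return packet_bytes * 8 / link_bps * 1e6

     pkt = 120  # e.g. 20 ms at 32 kbps (80 B payload) + 40 B of headers
     print(serialization_us(pkt, OC3_BPS))   # ~6 us per OC-3 hop
     print(serialization_us(pkt, OC48_BPS))  # ~0.4 us per OC-48 hop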


 [Stephen]:
 If the frame-size multiplier is due to serialization, then I agree with
 Koen's assessment.  In fact on many connections the multiplier would be
 less than 1. Dial-up is of course the worst case here, and on those links
 the multiplier ought to be close to 2.  Variations due to congestion (and
 on some links, polling) are (IMHO) better modeled as jitter.

 Gateways are another matter, with the delays being highly dependent on the
 product architecture.  Interrupt latencies, context switching, bus
 architectures, etc. can dominate, so it is totally possible that reducing
 the frame size might actually increase the latency (since it increases the
 packets per second load on the gateway).  So I agree with Koen on this as
 well.

 Anecdotal models based on industry experience can be useful guides -
 though if we are going to use these models to drive requirements, I'd
 prefer something more analytical.

 [Christian]:
 I agree that serialization, processing, and implementation delay should be
 distinguished.

 Assume a low-cost VoIP phone with its processing power fully utilized by
 one call: then the DSP/CPU needs an entire frame duration to encode and
 decode each frame.  Thus, the latency is increased by one frame length in
 addition to the serialization delay, propagation delay, algorithmic delay,
 dejittering delay, echo cancelling delay, ...  Running the chips at 100%
 load of course saves cost compared to adding more computational resources.
 But is this still a relevant issue today?

 I am not sure whether it always makes sense for a mobile device to run at
 100% load.  Of course, from an energy consumption perspective it makes
 sense to run a CMOS circuit at the lowest possible frequency, as power
 consumption drops quadratically with the lower supply voltage that a
 lower clock permits.  But maybe running the CPU/DSP at a higher speed and
 switching to a power-save mode after each frame has been decoded/encoded
 is equally energy efficient…
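 (A back-of-the-envelope sketch of these two strategies; the power model,
 clock rates, and cycle counts are all made-up assumptions:)

     # Energy per 20 ms frame: run slowly at ~100% load vs. run fast and
     # idle ("race to idle").  Dynamic power modeled as ~ f * V^2 with the
     # supply voltage V scaled proportionally to the clock f, plus a fixed
     # idle power; all constants are illustrative only.
     def energy_per_frame_mj(freq_ghz: float, frame_ms: float = 20.0,
                             cycles_per_frame: float = 1e7,
                             idle_mw: float = 5.0, k: float = 100.0) -> float:
         busy_ms = cycles_per_frame / (freq_ghz * 1e9) * 1e3
         active_mw = k * freq_ghz ** 3   # f * V^2 with V proportional to f
         idle_ms = max(frame_ms - busy_ms, 0.0)
         return (active_mw * busy_ms + idle_mw * idle_ms) / 1e3

     print(energy_per_frame_mj(0.5))  # ~100% load at half clock: ~0.25 mJ
     print(energy_per_frame_mj(1.0))  # race to idle at full clock: ~1.05 mJ

 Under these made-up constants the slow, fully loaded run wins on energy
 but costs an extra frame of latency; a larger idle/leakage power would
 shift the balance toward racing to idle.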

 Also, I do not think we should consider implementation delay, which occurs
 due to suboptimal implementation.  For example, some years ago we tested
 the RTT of two Linux softphones linked directly together using G.711.  It
 was 400 ms.  The implementation delay could be a good performance metric
 to differentiate two otherwise equal products.  Also, the algorithmic
 processing delay might be subject to similar market optimization.

 Having said this, I would nevertheless suggest including the processing
 delay in the measurement of the end-to-end (acoustic) round-trip time.
 Those measurements should be part of the control loop that optimizes the
 overall conversational call quality.


 [Koen]:
 > Gateways are another matter, with the delays being highly dependent on
 > the product architecture.  Interrupt latencies, context switching, bus
 > architectures, etc. can dominate, so it is totally possible that
 > reducing the frame size might actually increase the latency (since it
 > increases the packets per second load on the gateway).  So I agree
 > with Koen on this as well.

 [Raymond]: I agree with the first half of your paragraph above but not the
 second half, because the second half contradicts real-world observed
 behavior: G.711 with a 5 ms frame/packet size gets 12 to 17 ms of codec-
 dependent delay, while G.711 with a 10 ms frame/packet size gets 50 to 60
 ms of worst-case codec-dependent delay before delay optimization, and 30
 ms after delay optimization.  In these two actual real-world VoIP gateway
 implementations, the codec-dependent delay grows with the codec frame size
 with roughly a 3X multiplier.

 [Raymond]: The frame-size multiplier has many more components than just
 serialization as I have discussed in my previous emails.  How can the
 total multiplier be less than 1 when just buffering the current frame of
 input speech samples will take one frame?  Perhaps you are only talking
 about the serialization delay component?  I agree that delay variations
 due to congestion are better modeled as jitter.  That's always the case,
 and my previous discussions did not include jitter in the codec-dependent
 delay.

 [Raymond]: We broke down the one-way delay into codec-independent delay
 and codec-dependent delay, and then further broke down the codec-dependent
 delay into the components of codec buffering delay, processing delay,
 transmission delay (I guess what you called serialization delay), and the
 scheduling and buffering delays of the micro-controllers and DSPs due to
 many tasks and many channels competing for the processors, etc.  We also
 analyzed which parts don't change with improving technology (e.g. codec
 buffering delay) and how certain delay components may change with
 increasing processor speed and transmission speed.  Isn't that analytical?
 How much more analytical can you get than that?  We didn't just throw out
 a few real-world observed codec-dependent delay values and ask everyone to
 believe the 3X multiplier without any explanation or analysis; we used
 these real-world values to support our analysis.

 [Stephen]: Generally the need to buffer the current frame is treated as
 part of the algorithmic delay.  At least I believe that is what the ITU-T
 does.

-- 
Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/19#comment:2>
codec <http://tools.ietf.org/codec/>