Re: [codec] #19: How large is the frame-size-dependent delay / the serialization delay / frame-size-dependent processing delay? (was: How large is the frame-size-dependent delay / the serialization delay?)
"codec issue tracker" <trac@tools.ietf.org> Sun, 09 May 2010 17:21 UTC
Return-Path: <trac@tools.ietf.org>
X-Original-To: codec@core3.amsl.com
Delivered-To: codec@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id DA4403A6A76 for <codec@core3.amsl.com>; Sun, 9 May 2010 10:21:19 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -101.13
X-Spam-Level:
X-Spam-Status: No, score=-101.13 tagged_above=-999 required=5 tests=[AWL=-1.131, BAYES_50=0.001, NO_RELAYS=-0.001, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id PKzHLhiJvM26 for <codec@core3.amsl.com>; Sun, 9 May 2010 10:21:16 -0700 (PDT)
Received: from zinfandel.tools.ietf.org (unknown [IPv6:2001:1890:1112:1::2a]) by core3.amsl.com (Postfix) with ESMTP id D21203A6A89 for <codec@ietf.org>; Sun, 9 May 2010 10:21:08 -0700 (PDT)
Received: from localhost ([::1] helo=zinfandel.tools.ietf.org) by zinfandel.tools.ietf.org with esmtp (Exim 4.69) (envelope-from <trac@tools.ietf.org>) id 1OBABS-0003ds-DF; Sun, 09 May 2010 10:20:56 -0700
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: codec issue tracker <trac@tools.ietf.org>
X-Trac-Version: 0.11.6
Precedence: bulk
Auto-Submitted: auto-generated
X-Mailer: Trac 0.11.6, by Edgewall Software
To: hoene@uni-tuebingen.de
X-Trac-Project: codec
Date: Sun, 09 May 2010 17:20:54 -0000
X-URL: http://tools.ietf.org/codec/
X-Trac-Ticket-URL: http://trac.tools.ietf.org/wg/codec/trac/ticket/19#comment:2
Message-ID: <071.d8b4cc2e3960f92569ae8bff500e48e0@tools.ietf.org>
References: <062.f8b0d2abf056a9655a81ee25366bb354@tools.ietf.org>
X-Trac-Ticket-ID: 19
In-Reply-To: <062.f8b0d2abf056a9655a81ee25366bb354@tools.ietf.org>
X-SA-Exim-Connect-IP: ::1
X-SA-Exim-Rcpt-To: hoene@uni-tuebingen.de, codec@ietf.org
X-SA-Exim-Mail-From: trac@tools.ietf.org
X-SA-Exim-Scanned: No (on zinfandel.tools.ietf.org); SAEximRunCond expanded to false
Cc: codec@ietf.org
Subject: Re: [codec] #19: How large is the frame-size-dependent delay / the serialization delay / frame-size-dependent processing delay? (was: How large is the frame-size-dependent delay / the serialization delay?)
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Reply-To: codec@ietf.org
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 09 May 2010 17:21:20 -0000
#19: How large is the frame-size-dependent delay / the serialization delay / frame-size-dependent processing delay?
------------------------------------+---------------------------------------
 Reporter:  hoene@…                 |      Owner:
     Type:  enhancement             |     Status:  new
 Priority:  minor                   |  Milestone:
Component:  requirements            |    Version:
 Severity:  -                       |   Keywords:
------------------------------------+---------------------------------------

Comment (by hoene@…):

[Stephen]: There's algorithmic delay (including framing) + flight time + dejittering. Flight time depends on the network path, not on the frame size. And the amount of jitter is due principally to cross-traffic congestion.

[Raymond]: My main point was not the absolute one-way delay value for the 5 ms frame size but the relative delay between the 5 ms and 20 ms frame sizes. I agree that the 5X delay formula might be too simplistic. I tried to use a simple formula to make it easier for people to follow, but I did realize its limitations, especially at small frame sizes, so I added that "Even if you use a longer jitter buffer, …" sentence. Regardless of whether 5X frame size is overly simplistic, the fact remains that cellular codecs have a 20 ms frame size and a typical one-way delay of around 80 to 110 ms, and cellular networks probably don't have the kind of jitter that the Internet has. What would make us believe that an IETF codec with a 20 ms frame size will get a one-way delay much below 80 ms? Chances are an Internet call using an IETF codec with a 20 ms frame size will have a one-way delay at least as long as that of a cell phone call, and more likely longer, because PC audio driver software tends to add quite a bit of delay, and an Internet call incurs additional jitter buffer delay compared with cell phone calls.
Therefore, regardless of the accuracy of the 5X frame size formula, the conclusion remains the same: for the conference bridge application, a 20 ms codec frame size will result in a total one-way delay far exceeding the ITU-T's 150 ms guideline, thus substantially degrading the perceived quality of the communication links, and with one or even two cell phone callers joining the conference, the long latency and its associated problems will get much worse. Therefore, it is necessary for the IETF codec to have a low-delay mode using a small codec frame size, such as 5 ms, to address delay-sensitive applications such as bridge-based conference calls.

[Mikhael]: The speed-of-light-in-fiber delay is about 1 ms of RTT per 100 km. Europe - US West coast is ~150 ms RTT. I'm in Thailand at the moment and currently have a 350 ms RTT to Sweden (because the path goes Sweden -> US -> Singapore -> Thailand), just to give some data points. Then add the ~25 ms RTT of ADSL2+ just over the access layer, and I'd say that a 200 ms network RTT (100 ms one-way) might occur in a low percentage of calls, but it is still definitely going to happen. Considering the prices of international calls out of a place like this, a lot of people are going to want to use VoIP to get around them.

[Raymond]: First, I agree that codec algorithmic buffering delay is more accurate than frame size, since it can also include the "look-ahead" delay and the filtering delay if sub-band analysis/synthesis is used. However, your formula implies that the "multiplier" to be applied to the codec frame size in the codec-related delay is only 1. That is unrealistic and theoretically impossible.
For that to happen, after you wait one frame of time for the current frame of input audio samples to arrive in your input signal buffer (that is one frame of codec-related delay already), you would need an infinitely fast processor to finish the encoding operation instantly, then an infinitely fast communication link to ship all the bits of the compressed frame to the decoder instantly, and then an infinitely fast processor to finish decoding the frame instantly and start playing back the current frame of audio without any delay. That is simply impossible. In reality, if the processor is just barely fast enough to implement the codec in real time, then you need nearly a full frame of time to finish the encoding and decoding operations. That makes the multiplier 2 already. If your communication link is just barely fast enough to transmit your packets at the same speed they are generated, without unsent packets piling up, then it takes another frame of time to finish transmitting the compressed bits of a frame to the decoder. That makes the multiplier 3. Granted, in practice the processor and the communication link are usually faster than just barely fast enough, so the processing delay and the transmission delay can each be less than one frame. However, there are other miscellaneous uncounted delays that tend to depend on the codec frame size in various ways. Thus, a typical IP phone implementation would have

   one-way delay = codec-independent delay + 3 * (codec frame size) + (codec look-ahead) + (codec filtering delay, if any).

Hence, the one-way delay difference between a 20 ms and a 5 ms codec frame size would be 45 ms + (codec look-ahead difference) + (codec filtering delay difference). Consequently, for the conference bridge application, the total difference in one-way delay can easily be in the 90 to 100 ms range.
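The rule-of-thumb formula above can be sketched numerically (an illustrative sketch only; the function name and its defaults are hypothetical, and the 3X multiplier is Raymond's claim from this discussion, which Koen and Stephen dispute below):

```python
# Sketch of the rule-of-thumb delay model quoted in the discussion:
#   one-way delay = codec-independent delay + multiplier * frame size
#                   + look-ahead + filtering delay
# The 3X multiplier (input buffering + processing + transmission) is the
# claim under debate; Koen argues for ~1X and the ITU-T reportedly uses 2X.

def one_way_delay_ms(frame_ms, codec_independent_ms=0.0,
                     look_ahead_ms=0.0, filtering_ms=0.0, multiplier=3):
    """Estimated one-way delay in milliseconds under the rule of thumb."""
    return (codec_independent_ms + multiplier * frame_ms
            + look_ahead_ms + filtering_ms)

# Difference between 20 ms and 5 ms frames with a 3X multiplier, ignoring
# look-ahead and filtering differences: 3 * (20 - 5) = 45 ms one way,
# i.e. ~90 ms extra round trip through a conference bridge.
delta = one_way_delay_ms(20) - one_way_delay_ms(5)
print(delta)  # 45
```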
When this delay difference is added to all the other codec-independent delay components, it is still a huge difference that users can easily notice, especially since it will most likely push the total one-way delay significantly beyond the 150 ms limit.

[Stephen]: There is a floor transmission delay when you send the minimum-size packets the network path allows. There is an incremental delay due to serialization when you send packets larger than the minimum. (At each hop you wait until you receive the last bit of the packet before you forward the first bit.) I'd agree that a reasonable model for the incremental delay is that it scales linearly with the increase in packet size. But the floor delay is usually too large for this to dominate. And on top of that is the variable delay (jitter) due to congestion, layer 2 retransmission, and the like. That also will not scale linearly with frame size or packet size.

[Koen]: Processing time matters on low-end hardware - a small fraction of today's VoIP end points. Even then, the higher coding efficiency of longer frames can be translated into lower complexity.

[Raymond]: Processing time certainly matters for IP phones, and there are a lot of enterprise IP phones deployed today. I have heard that it is actually significantly cheaper for enterprises to make their entire phone systems IP-phone-based rather than analog-phone-based. I would not be surprised if, before too long, the vast majority of enterprises use only IP phones. Even consumer phones and cell phones are moving toward being IP-based. Eventually that will be a very large percentage of VoIP end points.

[Koen]: And transmission delay increases (perhaps) linearly with the *packet size*, not with the *frame size*. For a 32 kbps codec with 5 ms frames, packets are just 30% smaller than with a 16 kbps codec with 20 ms frames.

[Raymond]: Agreed.
My previous comments on transmission delay were based on the TDM rather than the packet scenario, but I was just using that simplified TDM example to make the point that transmission delay cannot be zero, as your 1X frame size multiplier would imply. Even by your statement above, a larger codec frame size still makes a larger packet size, which in turn increases the transmission delay, so you cannot say that transmission delay is zero or independent of the codec frame size. In any case, these are really minor details. My key point is that your 1X multiplier for the codec frame size is simply theoretically impossible. The rule of thumb used by IP phone engineers is around 3X the codec frame size.

[Koen]: Let me ask you something: how often is G.729 used with 10 ms packets, or BroadVoice with 5 ms packets?

[Raymond]: Not very often, but that is because network routers/switches previously could not handle too many packets per second, and the higher packet header overhead associated with a smaller packet size means the overall bit-rate would be higher than desired or allowed, so the time of small packet sizes for low-delay VoIP has not really come yet. However, with the help of Moore's Law, network routers/switches are becoming much faster now, and I was told that they can handle a 5 ms packet size without problems; furthermore, the speeds of backbone networks and access networks keep increasing with time, so the bit-rate concern will also decrease with time. Unlike processing speed and communication speed, which have been improving continuously for decades, delay is one thing that will NOT improve with time, and Moore's Law cannot do anything about it! If the IETF codec has a minimum frame size of 20 ms, we will be stuck with the longer overall delay associated with it, and Moore's Law will not help us reduce that delay in the future.
On the other hand, if in addition to a 20 ms frame size for bit-rate-sensitive applications the IETF codec also has a low-delay mode that uses a 5 ms frame size, then at least for delay-sensitive applications people have the choice to achieve a lower delay by paying the price of a higher overall bit-rate (i.e., with the packet headers counted), and this higher bit-rate will become less and less of a concern as network speeds keep increasing with time. Therefore, recognizing that delay cannot be helped by Moore's Law but bit-rate can, it would be wise for the IETF codec WG to adopt a low-delay mode for the codec in order to be future-proof.

[Raymond]: […] one-way delay = codec-independent delay + 3 * (codec frame size) + (codec look-ahead) + (codec filtering delay, if any). The main debate is now centered on whether the multiplier of the codec frame size should be 1, as Koen said, or 3, as I was told by experienced IP phone engineers. I argue that 1X is theoretically impossible. It is interesting to note that the ITU-T uses a multiplier of 2X. I think 2X is probably achievable in the idealized situation. In practice, however, many nitty-gritty details get in the way of that idealized situation, and little additional delays just keep getting added, resulting in a realistic real-world 3X multiplier. With a 3X multiplier, the one-way delay difference between a 20 ms and a 5 ms codec frame size would be 45 ms + (codec look-ahead difference) + (codec filtering delay difference). For the conference bridge application, the total difference in one-way delay will double to the 90 to 100 ms range. That is a VERY significant difference that typical users will notice (it is like adding another cell phone call's delay), especially if it pushes the total one-way delay significantly beyond the 150 ms guideline.
Therefore, I argue that for the best user experience in conference bridge calls, the IETF codec should have a low-delay mode with a small codec frame size, such as 5 ms, and let the continually increasing speed of communication links make the header overhead bit-rate less and less of an issue in the future. (Even now, for people who have a high-speed connection to their computers, it is already not an issue. It is better for them to get low delay than to worry about bit-rate or packet header overhead.)

[Stephen]: Serialization delay is defined as

   serialization delay = size of packet (bits) / transmission rate (bps),

summed over all hops. For a typical low-speed backbone hop (OC-3), the serialization delay for the bitrates we are talking about is on the order of 10 microseconds for 20 ms frame sizes. For a fast backbone hop (OC-48) it is on the order of 1 microsecond. There is a handy calculator here: http://kt2t.us/serialization_delay.html

[Stephen]: If the frame-size multiplier is due to serialization, then I agree with Koen's assessment. In fact, on many connections the multiplier would be less than 1. Dial-up is of course the worst case here, and on those links the multiplier ought to be close to 2. Variations due to congestion (and on some links, polling) are (IMHO) better modeled as jitter. Gateways are another matter, with the delays being highly dependent on the product architecture. Interrupt latencies, context switching, bus architectures, etc. can dominate, so it is entirely possible that reducing the frame size might actually increase the latency (since it increases the packets-per-second load on the gateway). So I agree with Koen on this as well. Anecdotal models based on industry experience can be useful guides - though if we are going to use these models to drive requirements, I'd prefer something more analytical.
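Stephen's serialization formula can be checked with a short sketch (illustrative only; the function name is hypothetical, the OC-3/OC-48 line rates are the nominal SONET rates, and the 40-byte RTP/UDP/IPv4 header is an assumed typical overhead):

```python
# Per-hop serialization delay = packet size (bits) / link rate (bps),
# summed over store-and-forward hops, as defined in the message above.

def serialization_delay_s(packet_bytes, link_bps, hops=1):
    """Total serialization delay in seconds over `hops` identical links."""
    return hops * (packet_bytes * 8) / link_bps

OC3 = 155.52e6    # nominal OC-3 line rate, bps
OC48 = 2488.32e6  # nominal OC-48 line rate, bps

# A 20 ms frame at 32 kbps carries 80 payload bytes; assume 40 bytes of
# RTP/UDP/IPv4 headers for a 120-byte packet.
pkt = 80 + 40
print(serialization_delay_s(pkt, OC3))   # ~6.2e-06 s: order of 10 microseconds
print(serialization_delay_s(pkt, OC48))  # ~3.9e-07 s: order of 1 microsecond
```

The numbers are consistent with Stephen's point: per-hop serialization on backbone links is microseconds, dwarfed by the millisecond-scale frame buffering under discussion.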
[Christian]: I agree that serialization, processing, and implementation delay should be distinguished. Assume a low-cost VoIP phone whose processing power is fully utilized by one call: then the DSP/CPU needs an entire frame duration to encode and decode each frame. Thus, the latency is increased by one frame length in addition to the serialization delay, propagation delay, algorithmic delay, dejittering delay, echo cancelling delay, and so on. Running the chips at 100% load is of course cost-saving compared to adding more computational resources. But is this still a relevant issue today? I am not sure whether it always makes sense for a mobile device to run at 100% load. Of course, from an energy-consumption perspective it makes sense to run a CMOS circuit at the lowest possible frequency, as power consumption drops quadratically. But maybe running the CPU/DSP at a higher speed and switching to power-save mode after a frame has been decoded/encoded would be equally energy efficient… Also, I do not think we should consider implementation delay, which occurs due to suboptimal implementations. For example, some years ago we tested the RTT of two Linux softphones linked directly together using G.711. It was 400 ms. The implementation delay could be a good performance metric to differentiate two otherwise equal products. Also, the algorithmic processing delay might be subject to similar market optimization. Having said this, I would nevertheless suggest including the processing delay in the measurement of the end-to-end (acoustic) round-trip time. Those measurements should be part of the control loop that optimizes the overall conversational call quality.

[Koen]:
> Gateways are another matter, with the delays being highly dependent on
> the product architecture. Interrupt latencies, context switching, bus
> architectures, etc. can dominate, so it is totally possible that
> reducing the frame size might actually increase the latency (since it
> increases the packets per second load on the gateway).
> So I agree with Koen on this as well.

[Raymond]: I agree with the first half of the quoted paragraph above but not the second half, because the second half contradicts real-world observed behavior: G.711 with a 5 ms frame/packet size gets 12 to 17 ms of codec-dependent delay, while G.711 with a 10 ms frame/packet size gets 50 to 60 ms of worst-case codec-dependent delay before delay optimization, and 30 ms after delay optimization. In these two actual real-world VoIP gateway implementations, the codec-dependent delay grows with the codec frame size with roughly a 3X multiplier.

[Raymond]: The frame-size multiplier has many more components than just serialization, as I have discussed in my previous emails. How can the total multiplier be less than 1 when just buffering the current frame of input speech samples takes one frame? Perhaps you are only talking about the serialization delay component? I agree that delay variations due to congestion are better modeled as jitter. That is always the case, and my previous discussions did not include jitter in the codec-dependent delay.

[Raymond]: We broke down the one-way delay into codec-independent delay and codec-dependent delay, and then further broke down the codec-dependent delay into the components of codec buffering delay, processing delay, transmission delay (I guess what you called serialization delay), and the scheduling and buffering delays of the micro-controllers and DSPs due to many tasks and many channels competing for the processors, etc. We also analyzed which parts do not change with improving technology (e.g., codec buffering delay) and how certain delay components may change with increasing processor speed and transmission speed. Isn't that analytical? How much more analytical can you get than that? We didn't just throw out a few real-world observed codec-dependent delay values and ask everyone to believe the 3X multiplier without any explanation or analysis.
No, we just used those real-world values to support our analysis.

[Stephen]: Generally, the need to buffer the current frame is treated as part of the algorithmic delay. At least I believe that is what the ITU-T does.

--
Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/19#comment:2>
codec <http://tools.ietf.org/codec/>