[codec] #33: Impact of transmission delay
"codec issue tracker" <trac@tools.ietf.org> Mon, 24 May 2010 14:16 UTC
#33: Impact of transmission delay
------------------------------------+---------------------------------------
 Reporter:  hoene@…                 |       Owner:
     Type:  defect                  |      Status:  new
 Priority:  major                   |   Milestone:
Component:  requirements            |     Version:
 Severity:  -                       |    Keywords:
------------------------------------+---------------------------------------

[Koen]: For typical VoIP applications, Moore's law has lessened the pressure to reduce bitrates, delay and complexity, and has shifted the focus to fidelity instead.

[Benjamin]: I think this is a typo, and you mean "lessened the pressure to reduce bitrates and complexity, and has shifted the focus to fidelity and delay instead".

[Koen]: Not a typo: codecs have become more wasteful with delay, while delivering better fidelity. G.718 evolved out of AMR-WB and has more than twice the delay. Same for G.729.1 versus G.729. This is not by accident.

The main rationale for codec delay being less important today is that faster hardware has reduced end-to-end delay in every step along the way. As a result, a typical VoIP connection now operates at a flatter part of the "impairment-vs-delay" curve, meaning that reducing delay by N ms at a given fidelity gives a smaller improvement to end users today than it did some years ago. Therefore, the weight on minimizing delay in the "codec design problem" has gone down, and the optimum codec operating point has naturally shifted towards higher delay, in favor of fidelity.

I've mentioned before that average delay on Internet connections seems to be 40% to 50% lower now than just 5 years ago, which is just one contributor to lower end-to-end delay. That doesn't mean high-delay connections don't exist - they do, for instance over dial-up or 3G. But in those cases it's still better to use a moderate packet rate (and bitrate), to minimize congestion risk.

The confusion may come from the fact that the trade-off between fidelity and delay changes towards high quality levels: once fidelity saturates, delay gets priority. Even more so because such high fidelity enables new, delay-sensitive applications like distributed music performances. This is reflected in the ultra-low delay requirements in the requirements document.

To summarize, the case for using sub-20 ms frame sizes with medium-fidelity quality is now weaker than ever, because the relative importance of fidelity has gone up.

[Christian]: May I present some results of ITU-T SG12 on the perceptual effects of delay?

For many years, it was assumed that 150 ms is the boundary for interactive voice conversations (see Nobuhiko Kitawaki and Kenzo Itoh: Pure Delay Effects on Speech Quality in Telecommunications, IEEE J. on Selected Areas in Commun., Vol. 9, No. 4, pp. 586-593, May 1991). Up to 400 ms, quality is still acceptable (about toll quality). The ITU-T G.107 quality model reflects this opinion.

However, in recent years, new results have shown that the impact of delay on conversation quality is NOT as strong as assumed. At the ITU-T, numerous contributions have been made on this issue:

Contribution of BT, “Comparison of E-Model and subjective test data for pure-delay conditions”, from 2007-01-08: http://www.itu.int/md/T05-SG12-C-0030/en

The conversational tests were done in controlled environments with nine pairs of subjects. Each pair had the common task of sorting their sets of pictures into the same order. Other conditions: no echoes, G.711, no frame loss.

[PICTURE at http://www.ietf.org/mail-archive/web/codec/current/msg01588.html]

Legend:
MOS-CQS: subjective conversational tests
MOS-CQE: the E-model (G.107)
MOS-LQO: results from PESQ

The delay is a one-way delay.

Besides MOS values, they also studied the subjective rating of percentage difficulty (%D). Starting at about 150 ms, it goes up and reaches 35% at 900 ms. After that it remains constant.
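
As a rough illustration of the G.107 delay term referred to above, the pure-delay impairment Idd can be sketched in a few lines of Python. This is a minimal sketch, assuming the classic formulation of Idd from ITU-T G.107 (zero up to 100 ms of absolute one-way delay, then a smooth rise on the R scale); the function name `delay_impairment_idd` is made up for this example, and the version of the E-model used in the cited contributions may differ in detail.

```python
import math


def delay_impairment_idd(one_way_delay_ms: float) -> float:
    """Pure-delay impairment Idd on the E-model R scale.

    Assumption: classic G.107 form, echo-free connection, where the
    argument is the absolute one-way delay Ta in milliseconds.
    """
    if one_way_delay_ms <= 100.0:
        return 0.0  # no pure-delay impairment up to 100 ms
    x = math.log2(one_way_delay_ms / 100.0)
    return 25.0 * (
        (1.0 + x ** 6) ** (1.0 / 6.0)
        - 3.0 * (1.0 + (x / 3.0) ** 6) ** (1.0 / 6.0)
        + 2.0
    )


if __name__ == "__main__":
    # Print the impairment at a few one-way delays discussed in this ticket.
    for delay_ms in (100, 150, 250, 400, 600, 900):
        idd = delay_impairment_idd(delay_ms)
        print(f"{delay_ms:4d} ms one-way -> Idd = {idd:5.1f}")
```

The output is in R-scale points subtracted from the overall rating, not in MOS, so it is only comparable to the MOS-CQE curve after the usual R-to-MOS mapping.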
Also, LM Ericsson described very interesting results in “Investigation of the influence of pure delay, packet loss and audio-video synchronization for different conversation tasks” from 2007-09-24: http://www.itu.int/md/T05-SG12-C-0119/en

For example: They conducted conversational tests similar to ITU-T P.805. Each conversation lasted about 3 to 5 minutes. 11 pairs of experts took part.

[PICTURE at http://www.ietf.org/mail-archive/web/codec/current/msg01588.html]

The tasks at 160 ms were completed about 50 s faster than the same tasks at 600 ms.

In the second set of tests, about 60 naïve subjects and experts took part in solving a conversational task. When they were asked about interactivity, the ratings looked worse.

Overall, it seems that the importance of the 150 ms limit is greatly overestimated. Much more relaxed timing is allowed.

[Benjamin]: (1) The results conflict with common sense. A round-trip delay of 800 ms makes normal conversation extremely irritating in practice. I'm not surprised these results don't show up in laboratory tests, because fast conversations with interjections and rapid responses typically require a social context not available in a lab test. It's possible that the ITU regards "extremely irritating" as "acceptable", since effective conversation is still possible. In that case, I would say that the working group intends to enable applications with much better than "acceptable" quality.

(2) Tests may have been done in G.711 narrowband, which introduces its own intelligibility problems and reduces quality expectation. Higher fidelity makes latency more apparent. Similarly, the equipment used may have introduced quality impairments that make the delay merely one problem among many.

(3) I presume the tests were done with careful equipment setup to avoid echo. The perceived quality impact of echo at 200 ms one-way delay is enormous, as shown in http://downloads.hindawi.com/journals/asp/2008/185248.pdf. Using an echo canceller impairs quality significantly.
Imperfect echo cancellation leaves some residual artifact, which is also irritating at long delays. The tests (even in the paper above) were performed using a telephone handset and earpiece. High-quality telephony with a freestanding speaker instead of an earpiece demands especially low delay due to the difficulties with echo cancellation.

[Marshall]: This depends a lot on what sort of discussion is at issue (and also on the culture of the participants). For example, in my experience telepresence sessions tend to be structured meetings and can typically tolerate even half-second delays without too much disruption, while for a one-on-one conversation on the same equipment the same delay can be pretty objectionable.

Having said that, I myself also find the previously attached graphs a little odd, and want to see a written description of just what sort of experiments they describe.

[Brian]: I agree with this. I was in a group that did some research on this (unpublished, unfortunately) and we confirmed that there is a cliff, around 500 ms round trip, after which conversation is impaired. It is remarkably consistent, is more or less independent of culture (with one interesting exception), and is really a cliff: under it and further improvement is hard to notice, over it and conversation is impaired, and the difference between say 750 and 1500 ms isn't all that significant.

Engineers who believe delay is a "less is better" quantity need to be educated that it is not. It is a threshold.

[JM]: Considering that the network delay is not a constant, you no longer have an absolute cliff. So reducing the delay means you can increase the distance without falling off the cliff.

[Benjamin]: One test in that paper told trained subjects to "Take turns reading random numbers aloud as fast as possible", on a pair of handsets with narrowband uncompressed audio and no echo. Subjects were able to detect round-trip delays down to 90 ms.
Conversational efficiency was impaired even with a round-trip delay of 100 ms. Let me emphasize again that these delays are round-trip, not one-way, there is no echo, and the task, while designed to expose latency, is probably less demanding than musical performance.

...

I accept Brian Rosen's claim that a slow conversation doesn't normally suffer greatly from round-trip latencies up to 500 ms, but under some circumstances much lower latencies are valuable. Let's make sure they're achievable for those who can use them.

[Raymond]: Other than potential echo issues, the biggest problem with a one-way delay longer than a few hundred ms is that such a long delay makes it very difficult to interrupt each other, resulting in the start-stop-start-stop cycles I previously talked about. Therefore, I agree with Ben that if the lab test did not have echoes and did not involve the test subjects trying to interrupt each other, then the test results may appear more benign than what one would experience in the real world.

Note that the top curve in the first figure below is for “listening-only tests”. Well, in that case there was no interaction/interruption at all, so if there were no echoes, either, then it is no wonder that the curve stayed essentially flat. I do wonder what made the curve go down at 1300 ms; I guess to understand this we need to know what the lab setup was for this test. Thus, I echo Marshall’s opinion that we need the original paper/contribution.

My personal experience with the delay impairment is much worse than the middle curve (MOS-CQS) would suggest and is close to the bottom curve (MOS-CQE). Back in the early 1980s the phone calls I made from southern California to East Asia were carried through geosynchronous satellites with a one-way delay slightly more than 500 ms (see http://en.wikipedia.org/wiki/Geostationary_orbit).
I absolutely hated it, because turn-taking was severely impaired and the only way to interrupt the person at the other side was to keep talking (rudely, I may say) until the other person finally stopped. Then, starting in the late 1980s, undersea cables were used to carry my traditional circuit-switched calls to the same person in East Asia, and all of a sudden the delay was much shorter and interrupting each other felt as easy as face-to-face conversation. It’s a night-and-day difference! Even in the early 2000s, when I used my cell phone to call my son’s cell phone in another cellular network, I could tell that there was a significant delay that noticeably impaired our turn-taking and our ability to interrupt each other, and I didn’t like it at all.

Now you know why I advocate low-delay voice communications, have been working on low-delay speech coding for two decades, and have even published a book chapter on low-delay speech coding :o)

[Stephen]: From my own experience (not testing) I agree with Brian's claim that 500 ms round trip is acceptable for most conversation. It does depend on what you are doing, and there are certainly tasks where much lower delays are needed.

[Mike]: Agreed that achieving low enough latencies for sidetone perception should not be a goal of the wg, but we should be aiming if at all possible for better than 250 ms one-way delay in typical (and non-tandemed) deployments. The knee of the one-way delay impairment factor begins rising non-linearly somewhere between 150 and 250 ms.

[Raymond]: If you read the published technical papers on G.718 and G.729.1 carefully, I think you will find that the real reason for the increased delay is not that they needed a longer delay to achieve better fidelity for speech, but that they wanted to extend speech codecs to also get good performance when coding general audio (music, etc.).
To get good music coding performance, most audio codecs use the Modified Discrete Cosine Transform (MDCT) with a fairly large transform window, so most of the audio codecs have longer coding delays than speech codecs. To code music well, G.718 and G.729.1 developers naturally had to use long MDCT transform windows on top of the codec delay already in AMR-WB and G.729. Even so, the resulting longer delays of G.718 and G.729.1 are still not any longer than typical delays of audio codecs; in fact, they are probably somewhat shorter.

My point is that the increased delays of G.718 and G.729.1 are purely a result of changing from "speech-only" to "speech and music". It's not because the G.718 and G.729.1 developers knew the network delay was getting shorter so they could be more wasteful with delay. Furthermore, even after they changed the codecs to handle music as well as speech, they still chose to make their codec delays shorter than the delays of most audio codecs. Why? They wanted to make their codec delays as short as they could. In fact, they even made an effort to introduce a "low-delay mode" into both G.718 and G.729.1. That shows they were pretty concerned about the higher delays they needed to have in order to code music well.

By the way, G.718 does NOT have "more than twice the delay" of AMR-WB as you said. AMR-WB has a 20 ms frame size, 5 ms look-ahead, and 1.875 ms of filtering delay, for a total algorithmic buffering delay of 26.875 ms. The "normal mode" of G.718 has a buffering delay of 42.875 ms for 16 kHz wideband input/output. That's only 59.5% higher than AMR-WB. For Layers 1 and 2 coding of speech, the "low-delay mode" shaves 10 ms off to give a delay of 32.875 ms, or only 22.3% higher than AMR-WB.

When G.729.1 was first standardized in May 2006, there was already a low-delay mode for narrowband speech at 8 and 12 kb/s with an algorithmic buffering delay of 25 ms.
Later in August 2007, the developers made an effort to add another low-delay mode for wideband at 14 kb/s that has a buffering delay of 28.94 ms. If they wanted to sacrifice delay to get higher fidelity as you suggested, then why would they bother to go back and add another low-delay mode for wideband?

In fact, only a few months ago in their G.729.1 paper in IEEE Communications Magazine, October 2009, Varga, Proust, and Taddei still emphasized in multiple instances the importance of achieving a low coding delay. I will quote two of the instances:

"The low-delay mode... was added to the first wideband layer at 14 kb/s of G.729.1 (August 2007). The motivation was to address applications such as VoIP in enterprise networks where low end-to-end delay is crucial"

and

"Indeed, delay is an important performance parameter, and transmitting speech with low end-to-end delay is also required in several applications making use of wideband signals".

In summary, I do not see a clear trend where codec developers are becoming more wasteful with delay in order to get higher fidelity. If anything, in recent years I saw a trend of low-delay audio coding, such as low-delay AAC and the CELT codec, and I saw the effort by G.718 and G.729.1 developers to introduce low-delay modes.

In any case, I thought a few days ago a consensus was already reached in the WG email reflector that the IETF codec needs to have a low-delay mode with a 5 to 10 ms codec frame size so that it can handle delay-sensitive applications (that is 5 out of 6 applications listed in the charter and codec requirement document). Therefore, I think the discussion in your last email and my current email is mostly of academic interest only and doesn't and shouldn't affect how the IETF codec is to be designed.
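
The delay comparison Raymond gives for AMR-WB and G.718 can be checked with a short calculation. The sketch below only reproduces the arithmetic from the figures quoted in his message; the constant names and the helper `percent_higher` are made up for this example and do not come from any codec specification.

```python
def percent_higher(delay_ms: float, reference_ms: float) -> float:
    """Return how much larger delay_ms is than reference_ms, in percent."""
    return 100.0 * (delay_ms - reference_ms) / reference_ms


# AMR-WB components as quoted in the message (all in milliseconds).
AMR_WB_FRAME_MS = 20.0
AMR_WB_LOOKAHEAD_MS = 5.0
AMR_WB_FILTER_MS = 1.875
AMR_WB_TOTAL_MS = AMR_WB_FRAME_MS + AMR_WB_LOOKAHEAD_MS + AMR_WB_FILTER_MS

# G.718 buffering delays as quoted in the message.
G718_NORMAL_MS = 42.875
G718_LOW_DELAY_MS = G718_NORMAL_MS - 10.0  # low-delay mode shaves off 10 ms

if __name__ == "__main__":
    print(f"AMR-WB total:     {AMR_WB_TOTAL_MS:.3f} ms")
    print(f"G.718 normal:     {G718_NORMAL_MS:.3f} ms "
          f"(+{percent_higher(G718_NORMAL_MS, AMR_WB_TOTAL_MS):.1f}%)")
    print(f"G.718 low-delay:  {G718_LOW_DELAY_MS:.3f} ms "
          f"(+{percent_higher(G718_LOW_DELAY_MS, AMR_WB_TOTAL_MS):.1f}%)")
    # "More than twice the delay" would need G.718 to exceed 2 x AMR-WB.
    print("G.718 more than twice AMR-WB?", G718_NORMAL_MS > 2 * AMR_WB_TOTAL_MS)
```

These are algorithmic buffering delays only; packetization, jitter buffering and network delay come on top in any real deployment.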

[Mike]: Agreed that achieving low enough latencies for sidetone perception should not be a goal of the wg, but we should be aiming if at all possible for better than 250 ms one-way delay in typical (and non-tandemed) deployments. The knee of the one-way delay impairment factor begins rising non-linearly somewhere between 150 and 250 ms.

CONSENSUS: Impairments start somewhere between 150 and 250 ms one-way delay.

--
Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/33>
codec <http://tools.ietf.org/codec/>