Re: [codec] #31: Requirements of high-density VoIP gateways (and low cost VoIP phone)?
"codec issue tracker" <trac@tools.ietf.org> Sun, 09 May 2010 17:30 UTC
From: codec issue tracker <trac@tools.ietf.org>
X-Trac-Version: 0.11.6
Precedence: bulk
Auto-Submitted: auto-generated
X-Mailer: Trac 0.11.6, by Edgewall Software
To: hoene@uni-tuebingen.de
X-Trac-Project: codec
Date: Sun, 09 May 2010 17:30:26 -0000
X-URL: http://tools.ietf.org/codec/
X-Trac-Ticket-URL: http://trac.tools.ietf.org/wg/codec/trac/ticket/31#comment:1
Message-ID: <071.ed8f4214802ed6fa1cfed32aff870c5a@tools.ietf.org>
References: <062.033384855453e54a2a3d58ff06d7ccb1@tools.ietf.org>
X-Trac-Ticket-ID: 31
In-Reply-To: <062.033384855453e54a2a3d58ff06d7ccb1@tools.ietf.org>
Cc: codec@ietf.org
Subject: Re: [codec] #31: Requirements of high-density VoIP gateways (and low cost VoIP phone)?
#31: Requirements of high-density VoIP gateways (and low cost VoIP phone)?
------------------------------------+---------------------------------------
 Reporter:  hoene@…                 |       Owner:
     Type:  enhancement             |      Status:  new
 Priority:  major                   |   Milestone:
Component:  requirements            |     Version:
 Severity:  Active WG Document      |    Keywords:
------------------------------------+---------------------------------------

Comment(by hoene@…):

[Christian]:
> Does DSP take over all codec processing? May the CPU do some parts of the computation before, during or after DSP does the signal processing?

[Raymond]: I asked an engineering manager who was deeply involved in the design of high-density VoIP gateways. He said that in such gateways, due to the high number of voice channels (thousands) per box, a large number of DSPs and micro-controllers are used, and they are usually structured in a hierarchical way.

The DSPs typically take care of all speech codec processing, echo cancellation, DTMF tone detection, fax, etc. The DSPs are usually divided into groups, with each group of DSPs controlled by a single micro-controller, which handles things like RTP, jitter buffering, packetization, QoS statistics, and moving the voice traffic to and from the DSPs in the group. Then, on top of that there may be higher-performance controllers, each connected to many such groups of micro-controller + DSPs. These higher-performance controllers may handle things like call setup, UDP/IP/RTP, routing to and from internal processor groups, and routing to and from external networks/devices.

[Christian]:
> How do you count number of channels? Do all voice channels have the same weight regardless their sampling rate?
> Say suppose, if the mixing is done for 48kHz instead of 8kHz, how many resource are we allowed to consume more?

[Raymond]: I am not sure what you meant. The channel count is just counting the actual physical voice channels that the gateway can handle simultaneously; it is not a weighted sum.
Are you thinking that a 48 kHz channel should be counted more than an 8 kHz channel because it requires more computational resources? Typical VoIP gateways only support 8 kHz telephone-bandwidth speech, so 48 kHz is out of the picture.

With that said, the complexity difference between speech codecs can make a big difference in the channel density. Let's say a VoIP gateway supports X simultaneous voice channels running the G.711 codec. Since the complexity of G.711 PCM is next to nothing, the complexity of each voice channel is dominated by the echo canceller (EC). Now if you replace the G.711 codec by the G.729A codec, which takes about 10 MIPS of computational complexity for a full-duplex codec, that can easily decrease the channel density to X/2.5 per gateway, depending on the EC and other things. If you replace the G.711 codec by the G.728 codec, which takes 30+ MIPS, the channel density can easily go down to X/4 ~ X/5 or worse. Thus, if you choose a high-complexity codec, you would need to buy a lot more VoIP gateways to support the same number of voice channels than if you use a low-complexity codec. The cost difference is very real and can be very big.

The engineering manager I mentioned in my last email (who is different from the IP phone expert I previously mentioned) told me that “the devil is in the details” and that, excluding the jitter buffer delay and other codec-independent delays, a straightforward VoIP gateway implementation that pays no attention to minimizing delay may have a codec-dependent one-way delay of 5X to 6X the codec frame size because of all of the various delays of (2) above due to complex timing issues that come with supporting so many channels simultaneously. Even after analyzing all delay components carefully and “optimizing the delay to death” until there is no more room for delay reduction, the worst-case one-way codec-dependent delay is still about 3X the codec frame size, excluding jitter and other codec-independent delay.
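
As a rough illustration of the multipliers quoted above, the sketch below tabulates the codec-dependent one-way delay for a few frame sizes. The 5.5X value is simply the midpoint of the quoted 5X to 6X range, the frame sizes are illustrative, and the function names are hypothetical. Jitter buffering and other codec-independent delays are left out, as in the text.

```python
# Rule-of-thumb sketch: codec-dependent one-way delay as a multiple of the
# codec frame size. The multipliers are the figures quoted in the discussion;
# the function names and the chosen frame sizes are illustrative assumptions.

NAIVE = 5.5      # midpoint of the quoted 5X to 6X for a straightforward design
OPTIMIZED = 3.0  # quoted worst case after careful delay optimization


def codec_dependent_delay_ms(frame_ms: float, multiplier: float) -> float:
    """Delay that scales with frame size (excludes jitter buffer, network, etc.)."""
    if frame_ms <= 0 or multiplier <= 0:
        raise ValueError("frame size and multiplier must be positive")
    return multiplier * frame_ms


def delay_table(frame_sizes_ms=(5, 10, 20)):
    """Return (frame_ms, naive_delay_ms, optimized_delay_ms) for each frame size."""
    return [
        (f, codec_dependent_delay_ms(f, NAIVE), codec_dependent_delay_ms(f, OPTIMIZED))
        for f in frame_sizes_ms
    ]


if __name__ == "__main__":
    for frame, naive, optimized in delay_table():
        print(f"{frame:>2} ms frame: naive ~{naive:.1f} ms, optimized ~{optimized:.1f} ms")
```

Under these assumptions a 20 ms frame already implies roughly 60 ms of codec-dependent delay even in the optimized case, while a 5 ms frame implies roughly 15 ms.
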
This 3X figure is an independent corroboration of what the other IP phone expert said about the codec-dependent one-way delay of 3X codec frame size for VoIP gateways. (The two of them worked on different projects in different companies.)

My conclusion: while I am less familiar with VoIP soft client implementations on computers, at least for IP phones and VoIP gateways, the rule of thumb that many engineers found to work well for codec-dependent one-way delay is 3X (codec frame size) + other codec buffering delays (e.g. look-ahead and/or filtering delay).

[Raymond]: Regarding the 3X multiplier for VoIP gateways, I already stated clearly in my original text that the 12 to 17 ms was the codec-dependent one-way delay. There is no "constant delay of 7 ms" in that (if it were constant, it would not be codec-dependent). The whole 12 to 17 ms delay was proportional to the codec frame size. As I said in my last email to Christian, there is another independent corroboration by another person (who was deeply involved in VoIP gateway designs) that this 3*(codec frame size) worst-case codec-dependent one-way delay was about the lowest that can be achieved after they "optimized the delay to death". What I didn't say is that this was actually for G.711 channels with a 10 ms frame/packet size, where the actual processing time spent on encoding and decoding the 10 ms G.711 codec frame was next to nothing, and yet the complex scheduling and buffering delays throughout the system, which are proportional to the 10 ms processing intervals, still added up to 3X frame size.

Currently, 70% to 80% of the phone shipments to large enterprises are IP phones. With small enterprises also counted, the overall average is about 60% IP phones. The current industry projection is that within 5 years, the overall average will be 80% to 90% IP phones. (The large enterprises will probably be close to 100% IP phones by then.)
Hence, there are already a huge number of IP phones deployed, and in the future it will be almost all IP phones in the workplace, especially in medium to large companies. I think it would be a mistake for the IETF Internet codec to completely ignore such IP phone applications. If we want to address such a huge installed base of IP phones, the 3*(codec frame size) delay is very real for IP phones, and it is desirable to have a low-delay mode for the IETF codec to enhance the user experience when using the IETF codec in such IP phones.

[Stephen]: I've worked with gateways/MCUs where the packet size had to be increased because packet loading in the product became too high. Also, if you have QoS features enabled in many routers, the routers themselves have to start using a "software path", which creates a similar throughput problem in the routers. Too many packets per second can overwhelm these devices, creating both capacity issues and excessive queuing delays.

[Raymond]: OK, now I see what you meant when you said "it is totally possible that reducing the frame size might actually increase the latency". This was probably more likely to happen many years ago but is less of a problem now, as I was told by networking guys that nowadays networking gear can handle 5 ms packets without problems. In fact, the VoIP gateway I talked about, which has a 12 to 17 ms codec-dependent one-way delay for a 5 ms frame/packet size, was done 6 or 7 years ago. Even back then the gateway could handle it without problems.

…

Yes, higher packet rates mean higher packet header overhead bit-rates, more burden on networking gear in I/O bandwidth and throughput, etc. However, that's the price to pay if we need low latency, just as the price to pay for avoiding all these is higher latency. It's all a matter of trade-off, and the best choice depends on the application at hand.
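
To put rough numbers on the packet-rate cost, the sketch below computes packets per second and header overhead for one voice stream in one direction. It assumes one codec frame per packet and uncompressed IPv4 (20 bytes) + UDP (8 bytes) + RTP (12 bytes) headers, with no IP options, no CSRC entries and no link-layer framing. Those header sizes and the function name are assumptions made for illustration, not figures from the discussion.

```python
# Sketch: per-stream packet rate and header overhead versus packet size.
# Assumes one codec frame per packet and minimal IPv4/UDP/RTP headers;
# link-layer framing and header compression are deliberately ignored.

IPV4_HEADER_BYTES = 20  # IPv4 header without options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12   # fixed RTP header without CSRC list or extensions
HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def packet_overhead(frame_ms: float):
    """Return (packets_per_second, header_overhead_kbps) for one direction."""
    if frame_ms <= 0:
        raise ValueError("frame size must be positive")
    packets_per_second = 1000.0 / frame_ms
    overhead_kbps = packets_per_second * HEADER_BYTES * 8 / 1000.0
    return packets_per_second, overhead_kbps


if __name__ == "__main__":
    for frame_ms in (5, 10, 20):
        pps, kbps = packet_overhead(frame_ms)
        print(f"{frame_ms:>2} ms packets: {pps:.0f} packets/s, {kbps:.0f} kbit/s of headers")
```

With these assumptions, 5 ms packets cost 64 kbit/s of headers per direction, as much as a G.711 payload itself, versus 16 kbit/s at 20 ms.
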
In Section 2 of Jean-Marc's Internet Draft draft-ietf-codec-requirements-00, 6 specific applications for the IETF codec were listed. Fully 5 of these 6 applications list less than 10 ms of codec delay as either a requirement or a desirable feature. (The only exception is point-to-point calls.) The only way to achieve this less than 10 ms codec delay is with a codec frame size of less than 10 ms, and to get the kind of low latency that these 5 applications desire, each packet had better contain only one codec frame as payload (rather than multiple frames). So, yeah, there are negative consequences of the resulting higher packet rates, but hey, if we want to get low latency as desired or required by these 5 applications, that's the price we will need to be prepared to pay. There is no free lunch. If we want to use a 20 ms frame/packet size to avoid those consequences, then we need to pay the price of not achieving the low latency that these 5 applications desire or require.

[Raymond]: All I have been arguing in the last couple of weeks was that there are also application scenarios where a low-delay mode is needed, and there are applications where low codec complexity is desirable or even important. Even draft-ietf-codec-requirements-00 talks about a low-delay mode. Although the codec WG charter says that “it is not the goal of working group to produce more than one codec”, it does acknowledge that “based on the working group's analysis of the design space, the working group might determine that it needs to produce more than one codec, or a codec with multiple modes”. Thus, I believe that my proposal to have multiple coding modes in the IETF codec (to address the needs of low bit-rate, low delay, or low complexity in different applications) is completely within the scope of the codec WG’s charter.

One more comment about the coding delay issue.
When we compare VoIP with traditional circuit-switched PSTN telephony, VoIP is better in most aspects except one: it has substantially longer one-way delay than PSTN telephony. In this area of delay, PSTN still beats VoIP by far. As Moore’s Law improves technologies over time, processing speed and communication speed improve, so codec complexity and encoding bit-rate are going to be less and less of an issue as time goes on. However, delay is one thing that doesn’t get improved with Moore’s Law once a codec frame size is chosen and fixed. Therefore, if we take a long-term view and attempt to make VoIP better than or at least not significantly worse than PSTN in all aspects, then I believe that we should address VoIP’s long-delay issue head-on with a low-delay mode in the IETF codec.

[Koen]: Ultra-low delay is important and has been part of the requirements from day one. [...] Personally I'm convinced that people want super-wideband and probably even full-band audio before they want a < 20 ms codec, if they have the bitrate to support either. Yes, even for interactive voice. Audio bandwidth just has a bigger impact on user experience. The analysis we've done within our Skype network supports this conclusion. But maybe it's different with IP phones which apparently have problems with delay, dunno..

[Hoene]: I have been told that similar statements are valid also for other gateway manufacturers and that the design of high-density gateways is much more demanding than that of softphones: data and code memory is limited and code cannot be loaded on demand, costs are already high, power consumption is a problem, execution is highly parallel, etc. Thus, it makes sense to have a codec (profile) optimized for this use case. […] And, are these requirements unique or are they covered by existing codecs like G.711 and G.729 already? Is it likely that gateways, which already operate at their limits, can support yet another codec?

[Gregory]: There are a number of excellent pre-existing codecs out there; during the formation of this working group we concluded that there was a significant non-addressed application space which a new codec could satisfy, but I've seen a number of requirements raised here which may be specific to applications for which existing codecs are already well suited. In particular, while computational burden is an essential concern, I don't think it is reasonable to subject a full-band / super-band / wideband codec to the same criteria which would be reasonable for a narrow band codec. If your gateway can't scale to acceptable size except with very computationally cheap codecs, then you probably ought to be using one of the already established narrow-band codecs. I don't think it's a good idea to design for very high levels of complexity, but we ought to keep in mind that the working group is already targeted at something higher quality (and thus more complex) than narrow band. Together with the normal "Moore's Law" progress in transistor density, I think these factors may suggest a slight bias towards additional computational complexity, at least where the increase in complexity can be effectively used.

[Christian]:
> What are those specific codec requirements, then?
> - narrow-band?
> - 5ms or 10ms frame size?
> - low complexity
> - low memory footprint
> - transcoding robust
...

[Raymond]:
- For the foreseeable future, I think most of the VoIP gateway voice channels will still be narrowband. We may start to see some wideband (16 kHz sampling).
- There are different VoIP gateway customers. Some just want the lowest possible cost of deployment and don't care too much about call quality or latency; they will probably use 20 ms packets. However, some want to compete seriously with the PSTN telephony offered by incumbent telcos. There they have to have good quality and low latency since PSTN latency is very low.
  These customers will want to use 10 ms packets or even 5 ms packets if their hardware can handle it.
- Yes, relatively low complexity and low memory footprint are both important for VoIP gateway implementation of codecs.
- Good transcoding performance is also a plus, although generally if a codec's single encoding performance is already high, then its transcoding performance is usually good as well.

[Raymond]: I never proposed that we should subject full-/super-/wide-band codecs to the same complexity constraints as narrowband codecs. What I am proposing, though, is that for a particular sampling rate, be it 16, 32, 44.1, or 48 kHz, when we consider different codec options, we should not ignore the codec complexity, because high codec complexity has negative consequences in low-end devices and gateways.

My other point is that although the WG does want a wideband/super-wideband/full-band audio codec, we shouldn't completely ignore narrowband, because that's still how most of the point-to-point voice calls and multi-party voice conferencing (the first two of the six applications listed in the charter) are conducted today and in the foreseeable future. North American cable operators alone have tens of millions of VoIP telephony subscribers. If you add other countries, telcos, independent VoIP operators like Vonage, and enterprise IP phone users, although I don't have the stats, I wouldn't be surprised if the total number of VoIP users worldwide exceeds 100M, probably significantly. Most of these are still narrowband. Therefore, it makes the most sense for the IETF codec to be able to also address narrowband speech coding so it has a chance to be used by these VoIP users. If a call goes through the IETF codec and reaches out to a conventional phone through a VoIP gateway, then it is better if the call doesn't have to be transcoded to another medium- or low-bit-rate codec, so the additional codec distortion and coding delay can be avoided.
This is where the recent discussion of VoIP gateways comes in.

[Raymond]: Devices like VoIP gateways are already using very fast processors and are connected to very fast networks, and yet the codec-dependent one-way delay is still around 3X codec frame size because of complicated timing issues and processor buffering needs due to the large number of voice channels competing for resources. As Moore's Law makes processors even faster, chances are each processor will handle even more voice channels, so although the time spent on processing each codec frame will decrease (it is already fairly small), the scheduling/timing issue and the associated buffering needs will probably get even worse. I am therefore not convinced that the net result will be a codec-dependent delay much smaller than 3X codec frame size in the future.

--
Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/31#comment:1>
codec <http://tools.ietf.org/codec/>