Re: [codec] #31: Requirements of high-density VoIP gateways (and low cost VoIP phone)?

"codec issue tracker" <trac@tools.ietf.org> Sun, 09 May 2010 17:30 UTC


#31: Requirements of high-density VoIP gateways (and low cost VoIP phone)?
------------------------------------+---------------------------------------
 Reporter:  hoene@…                 |       Owner:     
     Type:  enhancement             |      Status:  new
 Priority:  major                   |   Milestone:     
Component:  requirements            |     Version:     
 Severity:  Active WG Document      |    Keywords:     
------------------------------------+---------------------------------------

Comment(by hoene@…):

 [Christian]:
 > Does DSP take over all codec processing? May the CPU do some parts of
 > the computation before, during or after DSP does the signal
 > processing?

 [Raymond]: I asked an engineering manager who was deeply involved in the
 design of high-density VoIP gateways. He said that in such gateways, due
 to the high number of voice channels (thousands) per box, a large number
 of DSPs and micro-controllers are used, and they are usually structured in
 a hierarchical way.  The DSPs typically take care of all speech codec
 processing, echo cancellation, DTMF tone detection, and fax, etc.  The
 DSPs are usually divided into groups, with each group of DSPs controlled
 by a single micro-controller, which handles things like RTP, jitter
 buffering, packetization, QoS statistics, and moving the voice traffic to
 and from the DSPs in the group.  Then, on top of that there may be higher-
 performance controllers, each connected to many such groups of micro-
 controller + DSPs.  These higher-performance controllers may handle things
 like call setup, UDP/IP/RTP, routing to and from internal processor
 groups, and routing to and from external networks/devices.
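
 The hierarchy described above can be pictured with a small data model.
 This is only a sketch: the class names and every count below (controllers,
 groups, DSPs, channels per DSP) are made-up placeholders, not figures from
 any real gateway.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dsp:
    """Runs speech codecs, echo cancellation, DTMF detection, fax."""
    channels: int  # hypothetical number of voice channels on this DSP


@dataclass
class DspGroup:
    """One micro-controller (RTP, jitter buffering, packetization,
    QoS statistics) plus the DSPs it feeds."""
    dsps: List[Dsp]

    def total_channels(self) -> int:
        return sum(d.channels for d in self.dsps)


@dataclass
class Controller:
    """Higher-performance controller: call setup, routing to and from groups."""
    groups: List[DspGroup]

    def total_channels(self) -> int:
        return sum(g.total_channels() for g in self.groups)


@dataclass
class Gateway:
    controllers: List[Controller]

    def total_channels(self) -> int:
        return sum(c.total_channels() for c in self.controllers)


def build_gateway(controllers: int, groups: int, dsps: int, channels: int) -> Gateway:
    """Build a uniform hierarchy; all four counts are illustrative only."""
    return Gateway([
        Controller([
            DspGroup([Dsp(channels) for _ in range(dsps)])
            for _ in range(groups)
        ])
        for _ in range(controllers)
    ])


gw = build_gateway(controllers=2, groups=4, dsps=8, channels=16)
print(gw.total_channels())  # 2 * 4 * 8 * 16 = 1024
```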

 [Christian]:
 > How do you count number of channels? Do all voice channels have the
 > same weight regardless of their sampling rate?
 > Say, if the mixing is done for 48kHz instead of 8kHz, how many
 > more resources are we allowed to consume?

 [Raymond]: I am not sure what you meant. The channel count is just
 counting the actual physical voice channels that the gateway can handle
 simultaneously; it is not a weighted sum. Are you thinking that a 48 kHz
 channel should be counted more than an 8 kHz channel because it requires
 more computational resources? Typical VoIP gateways only support 8 kHz
 telephone-bandwidth speech, so 48 kHz is out of the picture.
 With that said, the complexity difference between speech codecs can make a
 big difference in the channel density.  Let's say a VoIP gateway supports
 X simultaneous voice channels running the G.711 codec.  Since the
 complexity of G.711 PCM is next to nothing, the complexity of each voice
 channel is dominated by the echo canceller (EC).  Now if you replace the
 G.711 codec by the G.729A codec which takes about 10 MIPS of computational
 complexity for a full-duplex codec, that can easily decrease the channel
 density to X/2.5 per gateway, depending on the EC and other things.  If
 you replace the G.711 codec by the G.728 codec that takes 30+ MIPS, the
 channel density can easily go down to X/4 ~ X/5 or worse.
 Thus, if you choose a high-complexity codec, you would need to buy a lot
 more VoIP gateways to support the same number of voice channels than if
 you use a low-complexity codec. The cost difference is very real and can
 be very big.
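
 The density figures above can be reproduced with simple arithmetic.  The
 sketch below assumes that per-channel cost is just EC plus codec, that
 G.711 costs roughly 0 MIPS, and that the EC cost is whatever value makes
 G.729A (10 MIPS) land at X/2.5.  That EC figure is back-derived for
 illustration and is not a number given in this thread.

```python
def density_divisor(ec_mips: float, codec_mips: float,
                    baseline_codec_mips: float = 0.0) -> float:
    """Factor by which channel density drops relative to the baseline codec.

    Assumes each channel costs (echo canceller + codec) MIPS and that the
    DSP's MIPS budget is the only limiting resource.
    """
    return (ec_mips + codec_mips) / (ec_mips + baseline_codec_mips)


# Assumption: EC cost chosen so that a 10 MIPS codec gives exactly X/2.5,
# i.e. (ec + 10) / ec = 2.5  ->  ec = 10 / 1.5 (about 6.7 MIPS).
EC_MIPS = 10 / 1.5

g729a = density_divisor(EC_MIPS, 10)  # G.729A, about 10 MIPS (from the thread)
g728 = density_divisor(EC_MIPS, 30)   # G.728, 30+ MIPS (from the thread)

print(f"G.729A: X/{g729a:.1f}")  # X/2.5
print(f"G.728:  X/{g728:.1f}")   # X/5.5
```

 Under these assumptions a 30 MIPS codec comes out at about X/5.5, which is
 in line with the "X/4 ~ X/5 or worse" estimate; a cheaper or more expensive
 EC shifts both divisors.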

 The engineering manager I mentioned in my last email (who is a different
 person from the IP phone expert I previously mentioned) told me that “the devil
 is in the details” and that excluding the jitter buffer delay and other
 codec-independent delays, a straightforward VoIP gateway implementation
 without paying attention to minimizing delay may have a codec-dependent
 one-way delay of 5X to 6X codec frame size because of all of the various
 delays of (2) above due to complex timing issues that come with supporting
 so many channels simultaneously.  Even after analyzing all delay
 components carefully and “optimizing the delay to death” until there is no
 more room for delay reduction, the worst-case one-way codec-dependent
 delay is still about 3X codec frame size, excluding jitter and other
 codec-independent delay.  This is an independent corroboration of what the
 other IP phone expert said about the codec-dependent one-way delay of 3X
 codec frame size for VoIP gateways. (The two of them worked on different
 projects in different companies.)
 My conclusion: while I am less familiar with VoIP soft client
 implementations on computers, at least for IP phones and VoIP gateways,
 the rule of thumb that many engineers found to work well for codec-
 dependent one-way delay is 3X (codec frame size) + other codec buffering
 delays (e.g. look-ahead and/or filtering delay).
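
 The rule of thumb above is easy to express as a function.  This is a
 minimal sketch: the function name and parameters are illustrative, the 3X
 multiplier is the worst-case figure reported in this thread, and the
 G.729 look-ahead value (5 ms) comes from the G.729 specification rather
 than from this discussion.

```python
def codec_dependent_delay_ms(frame_ms: float,
                             lookahead_ms: float = 0.0,
                             multiplier: float = 3.0) -> float:
    """Rule-of-thumb one-way codec-dependent delay.

    delay = multiplier * frame size + other codec buffering delay.
    Jitter buffering and other codec-independent delays are excluded.
    """
    return multiplier * frame_ms + lookahead_ms


examples = [
    ("G.711, 5 ms frames", 5.0, 0.0),
    ("G.711, 10 ms frames", 10.0, 0.0),
    ("G.729, 10 ms frames + 5 ms look-ahead", 10.0, 5.0),
    ("20 ms frames, no look-ahead", 20.0, 0.0),
]

for name, frame_ms, lookahead_ms in examples:
    delay = codec_dependent_delay_ms(frame_ms, lookahead_ms)
    print(f"{name}: {delay:.0f} ms")
# G.711, 5 ms frames: 15 ms
# G.711, 10 ms frames: 30 ms
# G.729, 10 ms frames + 5 ms look-ahead: 35 ms
# 20 ms frames, no look-ahead: 60 ms
```

 The 15 ms result for 5 ms frames sits inside the 12 to 17 ms range quoted
 for the gateway discussed in this thread.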
  [Raymond]:
 Regarding the 3X multiplier for VoIP gateways, I already stated clearly in
 my original text that the 12 to 17 ms was the codec-dependent one-way
 delay.  There is no "constant delay of 7 ms" in that (if it were constant,
 it would not be codec-dependent).  The whole 12 to 17 ms delay was
 proportional to the codec frame size.

 As I said in my last email to Christian, there is another independent
 corroboration by another person (who was deeply involved in VoIP gateway
 designs) that this 3*(codec frame size) worst-case codec-dependent one-way
 delay was about the lowest that can be achieved after they "optimized the
 delay to death".  What I didn't say is that this was actually for G.711
 channels with a 10 ms frame/packet size, where the actual processing time
 spent on encoding and decoding the 10 ms G.711 codec frame was next to
 nothing, and yet the complex scheduling and buffering delays throughout
 the system, which are proportional to the 10 ms processing intervals,
 still added up to 3X frame size.

 Currently, 70% to 80% of the phone shipments to large enterprises are IP
 phones.  With small enterprises also counted, the overall average is about
 60% IP phones.  The current industry projection is that within 5 years,
 the overall average would be 80% to 90% IP phones. (The large enterprises
 will probably be close to 100% IP phones by then.)  Hence, there are
 already a huge number of IP phones deployed, and in the future it would be
 almost all IP phones in the workplace, especially in medium to large
 companies.  I think it would be a mistake for the IETF Internet codec to
 completely ignore such IP phone applications, but if we want to address
 such a huge installed base of IP phones, the 3*(codec frame size) delay is
 very real for IP phones and it is desirable to have a low-delay mode for
 the IETF codec to enhance the user experience when using the IETF codec in
 such IP phones.


  [Stephen]: I've worked with Gateways\MCUs where the packet size had to be
 increased because packet loading in the product became too high.  Also, if
 you have QOS features enabled in many routers, the routers themselves have
 to start using a "software path", which creates a similar throughput
 problem in the routers.  Too many packets per second can overwhelm these
 devices, creating both capacity issues and excessive queuing delays.

 [Raymond]: OK, now I see what you meant when you said "it is totally
 possible that reducing the frame size might actually increase the
 latency". This was probably more likely to happen many years ago but is
 less of a problem now, as I was told by networking guys that nowadays
 networking gear can handle 5 ms packets without problems.  In fact, the
 VoIP gateway I talked about, which has a 12 to 17 ms codec-dependent
 one-way delay for a 5 ms frame/packet size, was done 6 or 7 years ago.
 Even back then the gateway could handle it without problems.
 …
 Yes, higher packet rates mean higher packet header overhead bit-rates,
 more burden on networking gear in I/O bandwidth and throughput, etc.
 However, that's the price to pay if we need low latency, just like if we
 want to avoid all these, the price to pay is higher latency.  It's all a
 matter of trade-off and the best choice depends on the application at
 hand.
 In Section 2 of Jean-Marc's Internet Draft draft-ietf-codec-
 requirements-00, 6 specific applications for the IETF codec were listed.
 Fully 5 of these 6 applications list less than 10 ms of codec delay as
 either a requirement or a desirable feature. (The only exception is point-
 to-point calls.)  The only way to achieve this less than 10 ms codec delay
 is with a codec frame size of less than 10 ms, and to get the kind of low
 latency that these 5 applications desire, each packet had better contain
 only one codec frame as payload (rather than multiple frames).
 So, yeah, there are negative consequences of the resulting higher packet
 rates, but hey, if we want to get low latency as desired or required by
 these 5 applications, that's the price we will need to be prepared to pay.
 There is no free lunch.  If we want to use a 20 ms frame/packet size to
 avoid those consequences, then we need to pay the price of not achieving
 the low latency that these 5 applications desire or require.
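
 To put rough numbers on that trade-off, the sketch below computes the
 packet rate and header overhead per direction for different frame sizes.
 It assumes uncompressed IPv4 (20 bytes) + UDP (8 bytes) + RTP (12 bytes)
 headers with no options or extensions, and ignores link-layer framing;
 the function names are illustrative.

```python
IPV4_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # assumption: no header compression


def packets_per_second(frame_ms: float, frames_per_packet: int = 1) -> float:
    """Packets sent per second in one direction of one call."""
    return 1000.0 / (frame_ms * frames_per_packet)


def header_overhead_kbps(frame_ms: float, frames_per_packet: int = 1,
                         header_bytes: int = IPV4_UDP_RTP_HEADER_BYTES) -> float:
    """Bit-rate spent on packet headers alone, in kbit/s, one direction."""
    pps = packets_per_second(frame_ms, frames_per_packet)
    return pps * header_bytes * 8 / 1000.0


for frame_ms in (5.0, 10.0, 20.0):
    pps = packets_per_second(frame_ms)
    kbps = header_overhead_kbps(frame_ms)
    print(f"{frame_ms:.0f} ms frames: {pps:.0f} packets/s, "
          f"{kbps:.0f} kbit/s of headers")
# 5 ms frames: 200 packets/s, 64 kbit/s of headers
# 10 ms frames: 100 packets/s, 32 kbit/s of headers
# 20 ms frames: 50 packets/s, 16 kbit/s of headers
```

 Going from 20 ms to 5 ms packets quadruples both the packet rate and the
 header overhead, which is the price referred to above.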
 [Raymond]: All I have been arguing in the last couple of weeks was that
 there are also application scenarios where a low-delay mode is needed, and
 there are applications where low codec complexity is desirable or even
 important.
 Even draft-ietf-codec-requirements-00 talks about a low-delay mode.
 Although the codec WG charter says that “it is not the goal of working
 group to produce more than one codec”, it does acknowledge that “based on
 the working group's analysis of the design space, the working group might
 determine that it needs to produce more than one codec, or a codec with
 multiple modes”.  Thus, I believe that my proposal to have multiple coding
 modes in the IETF codec (to address the needs of low bit-rate, low delay,
 or low complexity in different applications) is completely within the
 scope of the codec WG’s charter.
 One more comment about the coding delay issue.  When we compare VoIP with
 traditional circuit-switched PSTN telephony, VoIP is better in most
 aspects except one: it has substantially longer one-way delay than PSTN
 telephony.  In this area of delay, PSTN still beats VoIP by far.  As
 Moore’s Law improves technologies over time, processing speed and
 communication speed improve, so codec complexity and encoding
 bit-rate are going to be less and less of an issue as time goes on.
 However, delay is one thing that doesn’t get improved with Moore’s Law
 once a codec frame size is chosen and fixed.
 Therefore, if we take a long-term view and attempt to make VoIP better
 than or at least not significantly worse than PSTN in all aspects, then I
 believe that we should address VoIP’s long-delay issue head-on with a
 low-delay mode in the IETF codec.
 [Koen]:
 Ultra-low delay is important and has been part of the requirements from
 day one. [...] Personally I'm convinced that people want super-wideband
 and probably even full-band audio before they want a < 20 ms codec, if
 they have the bitrate to support either.  Yes, even for interactive voice.
 Audio bandwidth just has a bigger impact on user experience.  The analysis
 we've done within our Skype network supports this conclusion.

 But maybe it's different with IP phones which apparently have problems
 with delay, dunno..


 [Hoene]:
 I have been told that similar statements are also valid for other gateway
 manufacturers and that the design of high-density gateways is much more
 demanding than that of softphones: data and code memory is limited,
 code cannot be loaded on demand, costs are already high, power consumption
 is a problem, execution is highly parallel, etc. Thus, it makes sense
 to have a codec (profile) optimized for this use case.
 […]
 And, are these requirements unique or are they covered by existing codecs
 like G.711 and G.729 already? Is it likely that gateways, which operate
 already on their limits, can support yet another codec?

 [Gregory]:
 There are a number of excellent pre-existing codecs out there. During the
 formation of this working group we concluded that there was a significant
 non-addressed application space which a new codec could satisfy, but I've
 seen a number of requirements raised here which may be specific to
 applications for which existing codecs are already well suited.

 In particular while computational burden is an essential concern, I don't
 think it is reasonable to subject a full-band / super-band / wideband
 codec to the same criteria which would be reasonable for a narrow band
 codec.

 If your gateway can't scale to acceptable size except with very
 computationally cheap codecs, then you probably ought to be using one of
 the already established narrow-band codecs.

 I don't think it's a good idea to design for very high levels of
 complexity but we ought to keep in mind that the working group is already
 targeted at something higher quality (and thus more complex) than narrow
 band.  Together with the normal "Moore's law" progress in transistor
 density, I think these factors may suggest a slight bias towards
 additional computational complexity, at least where the increase in
 complexity can be effectively used.

 [Christian]:
 > What are those specific codec requirements, then?
 > - narrow-band?
 > - 5ms or 10ms frame size?
 > - low complexity
 > - low memory footprint
 > - transcoding robust ...

 [Raymond]:
 - For the foreseeable future, I think most of the VoIP gateway voice
 channels will still be narrowband.  We may start to see some wideband (16
 kHz sampling).
 - There are different VoIP gateway customers.  Some just want the lowest
 possible cost of deployment and don't care too much about call quality or
 latency; they will probably use 20 ms packets. However, some want to
 compete seriously with the PSTN telephony offered by incumbent telcos.
 There they have to have good quality and low latency since PSTN latency is
 very low. These customers will want to use 10 ms packets or even 5 ms
 packets if their hardware can handle it.
 - Yes, relatively low complexity and low memory footprint are both
 important for VoIP gateway implementation of codecs.
 - Good transcoding performance is also a plus, although generally if a
 codec's single encoding performance is already high, then its transcoding
 performance is usually good as well.


 [Raymond]: I never proposed that we should subject full-/super-/wide-band
 codecs to the same complexity constraints as for narrowband codecs.  What
 I am proposing, though, is that for a particular sampling rate, be it 16,
 32, 44.1, or 48 kHz, when we consider different codec options, we should
 not ignore the codec complexity, because high codec complexity has
 negative consequences in low-end devices and gateways.

 My other point is that although the WG does want a wideband/super-
 wideband/full-band audio codec, we shouldn't completely ignore narrowband,
 because that's still how most of the point-to-point voice calls and multi-
 party voice conferencing (the first two of the six applications listed in
 the charter) are conducted today and in the foreseeable future.  Just
 North American cable operators alone have tens of millions of VoIP
 telephony subscribers. If you add other countries, telcos, independent
 VoIP operators like Vonage, and enterprise IP phone users, although I
 don't have the stats, I wouldn't be surprised if the total number of VoIP
 users worldwide exceeds 100M, probably significantly.  Most of these are still
 narrowband.
 Therefore, it makes the most sense for the IETF codec to be able to also
 address narrowband speech coding so it has a chance to be used by these
 VoIP users.  If a call goes through the IETF codec and reaches out to a
 conventional phone through a VoIP gateway, then it is better if the call
 doesn't have to be transcoded to another medium- or low-bit-rate codec so
 the additional codec distortion and coding delay can be avoided.  This is
 where the recent discussion of VoIP gateways comes in.

 [Raymond]: Devices like VoIP gateways are already using very fast
 processors and are connected to very fast networks, and yet the
 codec-dependent one-way delay is still around 3X codec frame size because of
 complicated timing issues and processor buffering needs due to the large
 number of voice channels competing for resources.  As Moore's Law makes
 the processor even faster, chances are each processor will handle even
 more voice channels, so although the time spent on processing each codec
 frame will decrease (it is already fairly small), the
 scheduling/timing issue and the associated buffering needs probably will
 get even worse, so I am not convinced that the net result is that the
 codec-dependent delay will get much smaller than 3X codec frame size in
 the future.

-- 
Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/31#comment:1>
codec <http://tools.ietf.org/codec/>