Re: [codec] Suggested summary...

Hi Christian,

Comments in-line, with my new comments preceded by ">>>[Raymond]:" to make them easier to identify.

Raymond

-----Original Message-----
From: Christian Hoene [mailto:hoene@uni-tuebingen.de] 
Sent: Sunday, July 04, 2010 6:34 AM
To: Raymond (Juin-Hwey) Chen
Cc: codec@ietf.org
Subject: RE: [codec] Suggested summary...

Hi Raymond,

Some comments of mine are inline.

Christian

-----Original Message-----
From: Raymond (Juin-Hwey) Chen [mailto:rchen@broadcom.com] 

Hi Christian,

Thanks for your summary.  Some comments on 2) and 3) below.

2) low complexity mode: 
All else being equal or comparable, the lower the complexity, the 
better.  Besides MIPS or WMOPS, the memory footprint is another 
important aspect of complexity that should be considered.  

[Christian Hoene] Any suggested minimal and median values? 

>>>[Raymond]: I would suggest taking a look at the memory footprint 
numbers of many contemporary codecs in some codec comparison 
tables such as those in Annexes C and D of PacketCable 2.0 codec 
and media spec for narrowband and wideband codecs, respectively, at  
http://www.cablelabs.com/specifications/PKT-SP-CODEC-MEDIA-I09-100527.pdf
Whether the complexity is "low" is in a relative sense.  I think it 
makes sense to call a codec's complexity low if it is below the 
median value of the complexity of contemporary codecs at the same 
sampling rate.  Some may even argue that it has to be far below the 
median to be called "low".  Using the Annex C above and excluding 
very old codecs such as G.711 and G.726, the median RAM size is about 
2.6 kwords and the median ROM size is about 12 to 14 kwords for 
narrowband (8 kHz sampling) codecs.  Similarly, using Annex D gives
a median RAM size of 4.6 to 5.3 kwords and a median total memory 
footprint of 19.4 to 23 kwords for wideband (16 kHz sampling) codecs.

Furthermore, if we want to specify a complexity target, then in 
addition to a target for the full-band 48 kHz sampling rate, it would 
be useful to specify also the complexity targets for lower sampling 
rates such as 16 and 8 kHz, since the codec may operate at these 
lower sampling rates in some important voice-centric applications.

[Christian Hoene] Oops, the minimal complexity values were meant regardless of the sampling rate.

>>>[Raymond]: Using a single complexity target for all sampling rates 
doesn't make sense to me, as it could be too tight a constraint for high
sampling rates and too loose a constraint for low sampling rates.  It makes 
more sense to have separate targets for at least 8, 16, and 48 kHz sampling 
rates.  If we follow the same logic as in memory footprint above, then the 
median complexity is 20 MIPS for 8 kHz sampling and 17.5 MIPS to 38 WMOPS 
for 16 kHz sampling.  I know 17.5 to 38 is a wide range. That's an inherent
problem with taking the median of an even number of entries.  If we take 
the average in this case, it is 25.3 MIPS/WMOPS.  I know MIPS is not the 
same as WMOPS (usually slightly higher than WMOPS), but that was how the 
complexity data were collected, and since we don't have access to all 
complexity data using only MIPS or only WMOPS, I was forced to treat the 
two as roughly equivalent for the purpose of deriving median or average.
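
As a small illustration of the even-count median issue described above, here is a minimal Python sketch. Only the two middle values (17.5 and 38) come from the discussion; the two outer values are hypothetical placeholders, not entries from the PacketCable tables.

```python
from statistics import median, median_high, median_low

# Complexity figures with MIPS and WMOPS treated as roughly equivalent.
# ASSUMPTION: 10.0 and 40.0 are placeholders for illustration only;
# 17.5 and 38.0 are the two middle values quoted in the discussion.
complexities = [10.0, 17.5, 38.0, 40.0]

low = median_low(complexities)    # lower of the two middle entries
high = median_high(complexities)  # upper of the two middle entries
mid = median(complexities)        # midpoint of the two middle entries

print(f"median range: {low} to {high}, midpoint: {mid}")
```

With an even number of entries, the "median" is either reported as the range between the two middle entries or collapsed to their midpoint, which is why the range can be wide when the middle entries are far apart.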

3) How latency sums up: 
Thanks for mentioning the ITU-T G.114, which has a good discussion of 
the codec-related one-way delay along the line of what we discussed 
in the emails. 

[Christian Hoene] Yes, but I forgot to mention that it was industry consensus ten years ago. Things have changed. 

>>>[Raymond]: Things certainly have changed, especially the processor 
speeds, but I am not sure that the industry consensus has changed much.  
If it did, you would expect that the ITU-T would probably update G.114.  
I think the reality is that most of the voice communications today (all 
the conventional land-line phones, cell phones, IP phones, etc.) are 
still handled by embedded processors in special-purpose hardware 
devices, which don't have the luxury of having a DSP that is 100 times
or even 10 times faster than the speed required by all real-time tasks 
of the system combined.  Thus, I would say the delay analysis in G.114 
is still relevant today.  In fact, one could even argue that some 
values of the delay components in G.114 are on the low side, as the wait 
time in RTS scheduling does not seem to be taken into account there. 
Hence, it seems to me that the delay values presented in G.114 are 
more like lower bounds of the delay values likely to be encountered 
in the real world.

However, I disagree with the statement that "Typical 
values are range from a factor faster of 100 (smart phones) to 1000 
(PCs). A device working at full load is a rare case." It is not at 
all a rare case to find devices running at or close to full load.  A 
good example is VoIP gateways.  

[Christian Hoene] Sorry, a telecommunication network operating at full load is a rare case. For example, around New Year's Eve networks tend to become overloaded: 2 h per year. Typically, the network shall be loaded somewhere between 5 and 20% or less. Even if you assume that all traffic is between end device and gateway (which is not the case - end to end IP is a more reasonable scenario), then the gateway is fully loaded less than 1% of the time. 

>>>[Raymond]: Even if the overall network traffic load is relatively 
light, you still cannot rule out the possibility that some gateways may 
be significantly more loaded than others. More importantly, the delay 
depends on how the real-time scheduling system is designed and may not 
necessarily depend on the traffic load.  If you take a conservative 
approach and design your RTS system for the worst-case processing load, 
then even when the traffic load is lighter, the delay through the system 
remains the same.  If you try to design the RTS system to reduce the 
delay aggressively when the traffic load is lighter, then certain 
real-time tasks may run out of real time if you are not careful, 
resulting in audio quality degradation.  In any case, VoIP gateways 
were just given as an example. How about other devices with embedded 
processors such as IP phones? Their processor load doesn't depend on the 
traffic load and can be nearly fully loaded, so I would say a device 
working at (nearly) full load is not a rare case.

Also, on numerous occasions I have 
seen engineers trying very hard to cut the complexity of algorithms 
to make them fit the processing power of existing DSPs or host 
processors. That means the resulting implementations would have the 
processor essentially fully loaded.  

The fastest processors in current smart phones and PCs are about 1 
GHz and slightly more than 3 GHz, respectively.  A factor of 100 and 
1000 faster would require that the codec complexity be less than 
about 10 MHz and 3 MHz, respectively.  Most contemporary codecs today 
have higher complexity than that.  
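
The arithmetic behind these two figures can be sketched as follows. The function name is hypothetical, and the clock rates are the rough 1 GHz and 3 GHz values quoted above.

```python
def max_codec_load_mhz(clock_mhz: float, speed_factor: float) -> float:
    """Largest codec complexity (in MHz) that still leaves the processor
    `speed_factor` times faster than the codec requires."""
    return clock_mhz / speed_factor


# Smart phone: about 1 GHz clock, claimed factor of 100.
smartphone_budget = max_codec_load_mhz(1000.0, 100.0)

# PC: about 3 GHz clock, claimed factor of 1000.
pc_budget = max_codec_load_mhz(3000.0, 1000.0)

print(f"smart phone budget: {smartphone_budget} MHz, PC budget: {pc_budget} MHz")
```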

[Christian Hoene] Then take the value 10 and 100. It still does not matter. 

>>>[Raymond]: My real point is that even if the processor MHz rating 
is 10 or 100 times the codec complexity MHz requirement, it doesn't 
mean that your codec processing delay will be 1/10 or 1/100 of the 
frame size.  You can achieve such a processing delay of 1/10 or 1/100 
of the frame size only if the processor is immediately available to 
do the encoder task and decoder task as soon as the last audio sample 
of the frame and the last bit of the compressed frame arrive, 
respectively.  We know that this is generally not true because there 
are almost always some other real-time tasks in the system which can 
be running at those particular time instants and the encoder and 
decoder tasks just have to wait for their turns.  My original 
paragraph below in my last email discussed this point.
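
A minimal sketch of this point, under simplifying assumptions: tasks run to completion without preemption, and the codec task starts only after the tasks already ahead of it have finished. The function name, the 20 ms frame size, and the task run times are all hypothetical illustration values.

```python
import math


def codec_delay_ms(frame_ms: float, speed_factor: float,
                   tasks_ahead_ms: list[float]) -> float:
    """Time from the arrival of the last sample (or bit) of a frame until
    the codec task finishes, in a simple non-preemptive model."""
    wait_ms = sum(tasks_ahead_ms)          # time spent waiting for its turn
    compute_ms = frame_ms / speed_factor   # pure codec processing time
    return wait_ms + compute_ms


# ASSUMPTION: 20 ms frames and a processor 100 times faster than required.
idle_case = codec_delay_ms(20.0, 100.0, [])
# Same processor, but three other real-time tasks happen to be ahead in line.
busy_case = codec_delay_ms(20.0, 100.0, [4.0, 6.0, 5.0])

print(f"processor free: {idle_case:.1f} ms, tasks ahead: {busy_case:.1f} ms")
```

In this model the factor of 100 only shrinks the compute term; the wait term is set by whatever else is scheduled at that instant.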

Even if the codec complexity were 
lower than these numbers, to achieve a delay reduction factor of 100 
and 1000, the encoding and decoding operations would each have to be 
the only real-time task or the highest-priority task so the processor 
will get to it right away without any delay.  This is not the case in 
general, since having a full-duplex channel requires at least a 
real-time encoding task and a real-time decoding task running at the 
same time. Both cannot be the only real-time task or the highest-
priority real-time task at the same time; furthermore, there are 
almost always some other real-time tasks in the system.  As I 
discussed at length in my previous email about Real-Time Scheduling 
(RTS) delay, most audio/video communication systems have many 
real-time tasks running at the same time; substantial delay needs 
to be added to ensure that none of such real-time tasks will run out 
of real time.  This often leads to an RTS delay of one frame.

[Christian Hoene] Let me cite from the TI specs of the TNETV3020 carrier grade voice gateway: "Processing delay in the Telogy Software solution is minimized by a staggered processing schedule". TI engineers do not have the same technical problems as Broadcom engineers. This paper (or any other) on the topic of staggered scheduling might be helpful: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.143.7213&rep=rep1&type=pdf 

>>>[Raymond]: First, I checked the product bulletin of TNETV3020, and other than the sentence "Processing delay in the Telogy Software solution is minimized by a staggered processing schedule" that you quoted above, it doesn't say anything about how low the latency is or the relationship between the processing delay and the codec frame size.  This generic statement does not prove or disprove anything related to what we are discussing, and it does not say anything about the technical problems facing TI engineers and Broadcom engineers.

Second, I read the Holman and Anderson staggered model paper that your link points to, but I am not sure it is relevant to what we are discussing here.  If I understand it correctly, that paper is talking about scheduling the processors of a symmetric multiprocessor (SMP) system in a staggered manner to reduce the wait time due to bus contention.  That's very different from what we are talking about here, which is how to schedule multiple real-time tasks on a single-core processor without letting any of the tasks run out of real time.  

Best Regards,

Raymond

-----Original Message-----
From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On Behalf Of Christian Hoene
Sent: Friday, July 02, 2010 3:09 AM
To: 'Cullen Jennings'
Cc: codec@ietf.org
Subject: [codec] Suggested summary...

Hello,

Taking Cullen's advice, I would like to suggest the following summary.

> 1) low delay mode

The codec shall be able to operate in a mode having an algorithmic delay of 8 ms or less while having a frame duration of 5 1/3 ms or less. This is required to support ensemble performances over the Internet and other highly interactive conversational tasks.
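
To make these limits concrete, here is a minimal sketch that converts them to sample counts. Two assumptions are not stated in the requirement itself: a 48 kHz sampling rate, and the common decomposition of algorithmic delay into frame duration plus look-ahead.

```python
from fractions import Fraction

# ASSUMPTION: full-band 48 kHz sampling; the requirement names no rate.
SAMPLE_RATE_KHZ = 48

frame_ms = Fraction(16, 3)           # 5 1/3 ms maximum frame duration
max_algorithmic_delay_ms = Fraction(8)

# Samples per frame at the assumed rate.
frame_samples = frame_ms * SAMPLE_RATE_KHZ

# ASSUMPTION: algorithmic delay = frame duration + look-ahead, so the
# remainder of the 8 ms budget is what is left for look-ahead.
lookahead_ms = max_algorithmic_delay_ms - frame_ms
lookahead_samples = lookahead_ms * SAMPLE_RATE_KHZ

print(f"frame: {frame_samples} samples, look-ahead budget: "
      f"{lookahead_ms} ms = {lookahead_samples} samples")
```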

> 2) low complexity mode (whatever this means)

The codec shall be able to operate in a low complexity mode while requiring fewer computational resources than an AMR-WB codec 
(< 38 WMOPS if measured with ITU-T STL2005 BaseOP (ITU-T G.191)).

> 3) technical understanding on how latency sums up on different platforms

Standard ITU-T G.114 (05/00 and 05/03) describes how different system components contribute to the one-way transmission delay. It states that the processing time of the codec contributes with an additional delay as large as the frame duration.

However, it is common consensus that plenty of computational resources will be available most of the time. Then, the codec processing will be much faster than one frame duration. Typical values are range from a factor faster of 100 (smart phones) to 1000 (PCs). A device working at full load is a rare case.

Any suggestion to improve it?

With best regards,

 Christian 

---------------------------------------------------------------
Dr.-Ing. Christian Hoene
Interactive Communication Systems (ICS), University of Tübingen 
Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532 
http://www.net.uni-tuebingen.de/

-----Original Message-----
From: Cullen Jennings [mailto:fluffy@cisco.com] 
Sent: Monday, June 21, 2010 7:21 PM
To: Christian Hoene
Cc: codec@ietf.org
Subject: Re: [codec] #16: Multicast?

On May 27, 2010, at 9:48 AM, Christian Hoene wrote:

> So, we have consensus on 
> 1) low delay mode
> 2) low complexity mode (whatever this means)
> 3) technical understanding on how latency sums up on different platforms

From a Chair point of view, I don't think the Chairs could summarize or call consensus on these three - however, I'm not sure that matters. If you think a key piece of consensus has come out of this conversation and that it needs to be captured in the archive, can you summarize what you think it is folks agree with and then the chairs can make some sort of consensus call.

Thanks, Cullen <with my chair hat on>

_______________________________________________
codec mailing list
codec@ietf.org
https://www.ietf.org/mailman/listinfo/codec