[codec] Measuring the algorithmic efficiency of the IWAC

"Christian Hoene" <hoene@uni-tuebingen.de> Mon, 31 May 2010 09:28 UTC

Return-Path: <hoene@uni-tuebingen.de>
X-Original-To: codec@core3.amsl.com
Delivered-To: codec@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 98CAA3A6A02 for <codec@core3.amsl.com>; Mon, 31 May 2010 02:28:29 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.83
X-Spam-Level:
X-Spam-Status: No, score=-1.83 tagged_above=-999 required=5 tests=[AWL=-2.281, BAYES_99=3.5, HELO_EQ_DE=0.35, HTML_MESSAGE=0.001, J_CHICKENPOX_14=0.6, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id IwrGf32IyqiW for <codec@core3.amsl.com>; Mon, 31 May 2010 02:28:18 -0700 (PDT)
Received: from mx05.uni-tuebingen.de (mx05.uni-tuebingen.de [134.2.3.4]) by core3.amsl.com (Postfix) with ESMTP id 9A5323A69EC for <codec@ietf.org>; Mon, 31 May 2010 02:28:13 -0700 (PDT)
Received: from hoeneT60 (eap111066.extern.uni-tuebingen.de [134.2.111.66]) (authenticated bits=0) by mx05.uni-tuebingen.de (8.13.6/8.13.6) with ESMTP id o4V9Ro85015299 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO) for <codec@ietf.org>; Mon, 31 May 2010 11:27:51 +0200
From: Christian Hoene <hoene@uni-tuebingen.de>
To: codec@ietf.org
Date: Mon, 31 May 2010 11:27:48 +0200
Message-ID: <000b01cb00a3$8c9ab710$a5d02530$@de>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_000C_01CB00B4.50238710"
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: AcsAo4RifaTQhJwsT5arhJbNjrLEKw==
Content-Language: de
X-AntiVirus-Spam-Check: clean (checked by Avira MailGate: version: 3.0.0-4; spam filter version: 3.0.0/2.0; host: mx05)
X-AntiVirus: checked by Avira MailGate (version: 3.0.0-4; AVE: 8.2.1.242; VDF: 7.10.7.197; host: mx05); id=10533-LrXbFn
Subject: [codec] Measuring the algorithmic efficiency of the IWAC
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 31 May 2010 09:28:29 -0000

Hi,

 

In the last few weeks we had a couple of discussions on the complexity or algorithmic efficiency <http://en.wikipedia.org/wiki/Algorithmic_efficiency> of the codec. Knowing the algorithmic efficiency is important because:

1.      the complexity has an impact on power consumption and system costs, 

2.      the hardware can be selected to fit pre-known complexity requirements, and

3.      different codec proposals can be compared if they show similar performance in other aspects.

Before any complexity comparisons can be made, we need an objective, precise, reliable, and repeatable metric for measuring the algorithmic efficiency. Let me describe three different approaches:

 

Classic Approach used in Codec Standardization

 

For the last 17 years, ITU-T Study Group 16 has measured the complexity of codecs using a library called the ITU-T Basic Operators (described in ITU-T G.191 <http://www.itu.int/rec/T-REC-G.191-200508-I/en>), which counts the kind and number of operations and the amount of memory. The latest version of the standard (03/10), not yet publicly available, supports - besides fixed-point operations of different widths - also floating-point operations. Each operation can be counted automatically and weighted accordingly. The following source code is an excerpt from baseop32.h:

 

/*___________________________________________________________________________
 |                                                                           |
 |   Prototypes for basic arithmetic operators                               |
 |___________________________________________________________________________|
*/

Word16 add (Word16 var1, Word16 var2);    /* Short add,           1   */
Word16 sub (Word16 var1, Word16 var2);    /* Short sub,           1   */
Word16 abs_s (Word16 var1);               /* Short abs,           1   */
Word16 shl (Word16 var1, Word16 var2);    /* Short shift left,    1   */
Word16 shr (Word16 var1, Word16 var2);    /* Short shift right,   1   */
…
Word16 div_s (Word16 var1, Word16 var2); /* Short division,       18  */
Word16 norm_l (Word32 L_var1);           /* Long norm,             1  */
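
To give an impression of how these operators are used, here is a minimal sketch (my own illustration, not taken from G.191) of a 16-bit fixed-point inner product written with the basic operators, so that every arithmetic operation becomes one countable, weighted call. It assumes the Word16/Word32 types and the L_mac() prototype declared elsewhere in the library:

/* Sketch only: an inner product expressed with the ITU-T basic operators. */
/* Each L_mac() call is counted as one weighted multiply-accumulate.       */
Word32 dot_product (const Word16 *x, const Word16 *y, Word16 n)
{
    Word16 i;
    Word32 acc = 0;

    for (i = 0; i < n; i++)
    {
        acc = L_mac (acc, x[i], y[i]);  /* saturating multiply-accumulate */
    }
    return acc;
}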

 

In the upcoming ITU-T G.GSAD standard, another approach has been used, as shown in the following code example. For each operation, WMOPS counting calls have been added, which count the number of operations. To measure the algorithmic efficiency of an algorithm, the program is run and the operations are counted.

 

    for (i=0; i<NUM_BAND; i++)
    {
#ifdef WMOPS_FX
        move32();move32();
        move32();move32();
#endif

        state_fx->band_enrg_long_fx[i] = 30;
        state_fx->band_enrg_fx[i] = 30;
        state_fx->band_enrg_bgd_fx[i] = 30;
        state_fx->min_band_enrg_fx[i] = 30;
    }
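
As a rough illustration of what such counting hooks could look like - this is a simplified sketch, not the actual ITU-T counter library - move32() could be a macro that increments a global counter when WMOPS_FX is defined and expands to nothing otherwise:

/* Simplified sketch of a counting hook (not the ITU-T counter code):   */
/* with WMOPS_FX defined, every 32-bit data move bumps a global counter */
/* that is reported after the run; otherwise the macro costs nothing.   */
#ifdef WMOPS_FX
static long wmops_move32_count = 0;
#define move32()  (wmops_move32_count++)
#else
#define move32()
#endif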

 

These two similar approaches have worked well over the years and are an established procedure for counting operations and complexity. Still, they have some drawbacks:

a)      Existing algorithms must be modified manually to add the counting of operations. This is time consuming.

b)      The CPU model is simplistic, as it does not consider memory accesses (e.g. caches), parallel execution, or other kinds of optimization done by modern microprocessors and compilers. Thus, the number of instructions might not correlate well with the actual execution time on modern CPUs.

Profiling: The Classic Approach used in Software Engineering

 

Citing Wikipedia <http://en.wikipedia.org/wiki/Profiling_(computer_programming)>: "In software engineering <http://en.wikipedia.org/wiki/Software_engineering>, program profiling, software profiling or simply profiling, a form of dynamic program analysis <http://en.wikipedia.org/wiki/Dynamic_program_analysis> (as opposed to static code analysis <http://en.wikipedia.org/wiki/Static_code_analysis>), is the investigation of a program's behavior using information gathered as the program executes. The usual purpose of this analysis is to determine which sections of a program to optimize <http://en.wikipedia.org/wiki/Optimization_(computer_science)> - to increase its overall speed, decrease its memory requirement or sometimes both.

*         A (code) profiler is a performance analysis tool that, most commonly, measures only the frequency and duration of function
calls, but there are other specific types of profilers (e.g. memory profilers
<http://en.wikipedia.org/w/index.php?title=Memory_profiler&action=edit&redlink=1> ) in addition to more comprehensive profilers,
capable of gathering extensive performance data.

*         An instruction set simulator <http://en.wikipedia.org/wiki/Instruction_set_simulator>, which is also — by necessity — a profiler, can measure the totality of a program's behaviour from invocation to termination."

Thus, a typical profiler such as gprof can be used to measure and understand the complexity of a codec implementation. It is precise because it measures actual execution on modern computers. However, if used for standardization, it has the following drawback:

a)      The execution times depend on the CPU architecture, the PC in general, the OS, and programs running in parallel. As such, it cannot produce reliable and repeatable results; the results are not comparable if obtained under slightly changed conditions (see the timing sketch below).
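
For concreteness, here is a minimal, self-contained timing sketch of the kind of measurement a profiler automates. The encode_frame() function is only a hypothetical stand-in for a real codec entry point; the point is that the reported number is host CPU time and therefore machine-dependent:

/* Sketch only: timing a (dummy) encode loop directly, as a profiler would. */
/* The result is host CPU time, so it varies with CPU, OS and system load.  */
#include <stdio.h>
#include <time.h>

static volatile long sink;

static void encode_frame(void)          /* hypothetical codec workload */
{
    long k;
    for (k = 0; k < 100000; k++)
        sink += k;
}

int main(void)
{
    struct timespec t0, t1;
    const int frames = 1000;
    int i;
    double sec;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
    for (i = 0; i < frames; i++)
        encode_frame();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);

    sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("CPU time per frame: %.1f us\n", sec / frames * 1e6);
    return 0;
}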

Suggestion: Use a Cycle Accurate Simulator

Wikipedia: "A Cycle Accurate Simulator (CAS) is a computer program that simulates a microarchitecture <http://en.wikipedia.org/wiki/Microarchitecture> in a cycle-accurate manner. In contrast, an instruction set simulator <http://en.wikipedia.org/wiki/Instruction_set_simulator> simulates an Instruction Set Architecture <http://en.wikipedia.org/wiki/Instruction_Set_Architecture>, usually faster but not cycle-accurate with respect to a specific implementation of this architecture."

Then, the execution times are precise and repeatable. If two parties make measurements using different CPUs, they still get the same results. Of course, a cycle-accurate simulator is slower than the real CPU by a factor of about 100.

 

Actually, there is an open-source cycle-accurate simulator called PTLsim <http://www.ptlsim.org/> available that simulates a Pentium IV. "PTLsim is a cycle accurate x86 microprocessor simulator and virtual machine for the x86 and x86-64 instruction sets. PTLsim models a modern superscalar out of order x86-64 compatible processor core at a configurable level of detail ranging from full-speed native execution on the host CPU all the way down to RTL level models of all key pipeline structures. In addition, all microcode, the complete cache hierarchy, memory subsystem and supporting hardware devices are modeled with true cycle accuracy. PTLsim supports the full x86-64 instruction set of the Pentium 4+, Athlon 64 and similar machines with all extensions (x86-64, SSE/SSE2/SSE3, MMX, x87). It is currently the only tool available to the public to support true cycle accurate modeling of real x86 microarchitectures. […] PTLsim is used extensively at hundreds of major universities, industry research labs and the well known x86 microprocessor vendors Intel and AMD."

Thus, I would suggest using the cycle-accurate simulator PTLsim to measure the complexity of the IETF audio codec contributions because

a)        it makes precise performance measurements on a modern, widely deployed CPU platform,

b)        the measurements are repeatable and can be made by everybody,

c)        existing source code can be used without significant changes, and

d)        it is time-saving and easy to use.

With best regards,

 Christian Hoene

 

---------------------------------------------------------------

Dr.-Ing. Christian Hoene

Interactive Communication Systems (ICS), University of Tübingen 

Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532 

http://www.net.uni-tuebingen.de/