Re: [codec] Measuring the algorithmic efficiency of the IWAC

stephen botzko <stephen.botzko@gmail.com> Mon, 31 May 2010 10:54 UTC

MIME-Version: 1.0
In-Reply-To: <000b01cb00a3$8c9ab710$a5d02530$@de>
References: <000b01cb00a3$8c9ab710$a5d02530$@de>
Date: Mon, 31 May 2010 06:54:18 -0400
Message-ID: <AANLkTimk-0hXkfyFqyr8_C7KdiP2Q4QBc9BdDy3Qz0e1@mail.gmail.com>
From: stephen botzko <stephen.botzko@gmail.com>
To: Christian Hoene <hoene@uni-tuebingen.de>
Content-Type: multipart/alternative; boundary="0016364165efb5a48f0487e1add1"
Cc: codec@ietf.org
Subject: Re: [codec] Measuring the algorithmic efficiency of the IWAC
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 31 May 2010 10:54:39 -0000

Hi Christian

Are you implying that the complexity requirement applies only to x86
machines?

>>>
different codec proposals can be compared if they show similar
performance in other aspects.
>>>
There's been no discussion on this point.  In my view, at some point in the
process we will set a complexity bound which all codec proposals will have to
meet.  That is quite different from the above statement.

Regards
Stephen Botzko



On Mon, May 31, 2010 at 5:27 AM, Christian Hoene <hoene@uni-tuebingen.de> wrote:

>  Hi,
>
>
>
> in the last weeks we had a couple of discussions on the complexity or algorithmic
> efficiency <http://en.wikipedia.org/wiki/Algorithmic_efficiency> of the
> codec. Knowing the algorithmic efficiency is important because:
>
> 1.      the complexity has an impact on power consumption and system
> costs,
>
> 2.      the hardware can be selected to fit pre-known complexity
> requirement, and
>
> 3.      different codec proposals can be compared if they show similar
> performance in other aspects.
>
> Before any complexity comparisons can be made, we need an objective,
> precise, reliable, and repeatable metric on how to measure the algorithmic
> efficiency. Let me describe three different approaches:
>
>
>
> *Classic Approach used in Codec Standardization*
>
>
>
> In the last 17 years, ITU-T Study Group 16 has measured the complexity of
> codecs using a library called ITU-T Basic Operators (described in ITU-T
> G.191) <http://www.itu.int/rec/T-REC-G.191-200508-I/en>, which counts the
> kind and number of operations and the amount of memory. The latest version
> of the standard (03/10), not yet publicly available, supports, besides
> fixed-point operations of different widths, floating-point operations as
> well.  Each operation can be counted automatically and weighted
> accordingly. The following source code is an excerpt from baseop32.h:
>
>
>
>
> /*___________________________________________________________________________
>  |                                                                           |
>  |   Prototypes for basic arithmetic operators                               |
>  |___________________________________________________________________________|
> */
>
>
>
> Word16 add (Word16 var1, Word16 var2);    /* Short add,           1   */
>
> Word16 sub (Word16 var1, Word16 var2);    /* Short sub,           1   */
>
> Word16 abs_s (Word16 var1);               /* Short abs,           1   */
>
> Word16 shl (Word16 var1, Word16 var2);    /* Short shift left,    1   */
>
> Word16 shr (Word16 var1, Word16 var2);    /* Short shift right,   1   */
>
> …
>
> Word16 div_s (Word16 var1, Word16 var2); /* Short division,       18  */
>
> Word16 norm_l (Word32 L_var1);           /* Long norm,             1  */
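
To make the weighting concrete: here is a minimal, hypothetical sketch of how
such counted operators could be implemented (simplified illustration only, not
the actual G.191 library; the weights match the excerpt above, 1 for add and
18 for div_s, and div_s is reduced to an integer divide).

```c
#include <stdint.h>

typedef int16_t Word16;

/* Hypothetical global counter: each basic operator adds its documented
 * weight, so the total approximates weighted operation cost. */
static long total_weight = 0;

static Word16 saturate16(int32_t x)        /* clamp to the 16-bit range */
{
    if (x >  32767) return (Word16)32767;
    if (x < -32768) return (Word16)-32768;
    return (Word16)x;
}

Word16 add(Word16 var1, Word16 var2)       /* Short add, weight 1 */
{
    total_weight += 1;
    return saturate16((int32_t)var1 + (int32_t)var2);
}

Word16 div_s(Word16 var1, Word16 var2)     /* Short division, weight 18 */
{
    total_weight += 18;                    /* a divide costs 18x an add */
    return (Word16)(var1 / var2);          /* simplified: the real div_s
                                              is a fractional divide */
}

long weighted_ops(void) { return total_weight; }
```

Running an algorithm written entirely in such operators then yields its
weighted operation count as a side effect, with no separate analysis pass.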
>
>
>
> In the upcoming ITU-T G.GSAD standard, another approach has been used, as
> shown in the following code example. For each operation, WMOPS functions
> have been added which count the number of operations. To measure the
> algorithmic efficiency of an algorithm, the program is run and the
> operations are counted.
>
>
>
>     for (i=0; i<NUM_BAND; i++)
>
>     {
>
> #ifdef WMOPS_FX
>
>         move32();move32();
>
>         move32();move32();
>
> #endif
>
>
>
>         state_fx->band_enrg_long_fx[i] = 30;
>
>         state_fx->band_enrg_fx[i] = 30;
>
>         state_fx->band_enrg_bgd_fx[i] = 30;
>
>         state_fx->min_band_enrg_fx[i] = 30;
>
>     }
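
The counting hooks themselves might look roughly like this (a hypothetical
sketch of the WMOPS_FX style above; names like move32_ops and NUM_BAND = 4
are placeholders): when counting is enabled each move32() becomes a counter
increment, otherwise it compiles to nothing.

```c
/* Hypothetical sketch: with WMOPS_FX defined, move32() counts one 32-bit
 * store; without it, the macro vanishes and the code runs at full speed. */
#define WMOPS_FX

static long move32_ops = 0;

#ifdef WMOPS_FX
#define move32() (move32_ops++)
#else
#define move32() ((void)0)
#endif

#define NUM_BAND 4                      /* placeholder band count */

static int band_enrg[NUM_BAND];

void init_band_energies(void)
{
    int i;
    for (i = 0; i < NUM_BAND; i++) {
#ifdef WMOPS_FX
        move32();                       /* count the store below */
#endif
        band_enrg[i] = 30;
    }
}

long move32_count(void) { return move32_ops; }
```

After one call to init_band_energies(), the counter reports NUM_BAND moves,
mirroring how the instrumented loop above tallies its stores.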
>
>
>
> These two similar examples have worked well over the years and are an
> established procedure for counting operations and complexity. Still, the
> approach has some drawbacks:
>
> a)      Existing algorithms must be modified manually to add the
> counting of operations. This is time-consuming.
>
> b)      The CPU model is simplistic, as it does not consider memory access
> (e.g. caches), parallel execution, or other kinds of optimizations done in
> modern microprocessors and compilers. Thus, the number of instructions
> might not correlate well with the actual execution time on modern CPUs.
>
> *Profiling: The Classic Approach used in Software Engineering*
>
>
>
> Citing Wikipedia
> <http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29>:
> "In software engineering
> <http://en.wikipedia.org/wiki/Software_engineering>, program profiling,
> software profiling or simply profiling, a form of dynamic program analysis
> <http://en.wikipedia.org/wiki/Dynamic_program_analysis> (as opposed to
> static code analysis <http://en.wikipedia.org/wiki/Static_code_analysis>),
> is the investigation of a program's behavior using information gathered as
> the program executes. The usual purpose of this analysis is to determine
> which sections of a program to optimize
> <http://en.wikipedia.org/wiki/Optimization_%28computer_science%29> - to
> increase its overall speed, decrease its memory requirement, or sometimes
> both.
>
> ·         A (code) profiler is a performance analysis tool that, most
> commonly, measures only the frequency and duration of function calls, but
> there are other specific types of profilers (e.g. memory profilers
> <http://en.wikipedia.org/w/index.php?title=Memory_profiler&action=edit&redlink=1>)
> in addition to more comprehensive profilers, capable of gathering extensive
> performance data.
>
> ·         An instruction set simulator
> <http://en.wikipedia.org/wiki/Instruction_set_simulator>, which is also -
> by necessity - a profiler, can measure the totality of a program's
> behaviour from invocation to termination."
>
> Thus, a typical profiler such as gprof can be used to measure and
> understand the complexity of a codec implementation. It is precise because
> it measures actual execution on modern computers. However, if used for
> standardization, it has the following drawback:
>
> a)      The execution times depend on the CPU architecture, the PC in
> general, the OS, and other programs running in parallel. As such, it
> cannot produce reliable and repeatable results. The results are not
> comparable if obtained under slightly changed conditions.
>
> *Suggestion: Use a Cycle Accurate Simulator*
>
> Wikipedia: "A Cycle Accurate Simulator (CAS) is a computer program that
> simulates a microarchitecture
> <http://en.wikipedia.org/wiki/Microarchitecture> cycle-accurately. In
> contrast, an instruction set simulator
> <http://en.wikipedia.org/wiki/Instruction_set_simulator> simulates an
> Instruction Set Architecture
> <http://en.wikipedia.org/wiki/Instruction_Set_Architecture>, usually
> faster but not cycle-accurate to a specific implementation of this
> architecture."
>
> With such a simulator, the execution times are precise and repeatable. If
> two parties make measurements using different CPUs, they still get the same
> results. Of course, a cycle accurate simulator is slower than the real CPU
> by a factor of about 100.
>
>
>
> Actually, there is an open-source cycle accurate simulator called PTLsim
> <http://www.ptlsim.org/> available that simulates a Pentium 4. "PTLsim is
> a cycle accurate x86 microprocessor simulator and virtual machine for the
> x86 and x86-64 instruction sets. PTLsim models a modern superscalar out of
> order x86-64 compatible processor core at a configurable level of detail
> ranging from full-speed native execution on the host CPU all the way down
> to RTL level models of all key pipeline structures. In addition, all
> microcode, the complete cache hierarchy, memory subsystem and supporting
> hardware devices are modeled with true cycle accuracy. PTLsim supports the
> full x86-64 instruction set of the Pentium 4+, Athlon 64 and similar
> machines with all extensions (x86-64, SSE/SSE2/SSE3, MMX, x87). It is
> currently the only tool available to the public to support true cycle
> accurate modeling of real x86 microarchitectures. […] PTLsim is used
> extensively at hundreds of major universities, industry research labs and
> the well known x86 microprocessor vendors Intel and AMD."
>
> Thus, I would suggest using the Cycle Accurate Simulator PTLsim to
> measure the complexity of the IETF audio codec contributions, because
>
> a)        it makes precise performance measurements using a modern, widely
> deployed CPU platform,
>
> b)        the measurements are repeatable and can be made by everybody,
>
> c)        existing source code can be used without significant changes,
> and
>
> d)        it is time-saving and easy to use.
>
> With best regards,
>
>  Christian Hoene
>
>
>
> ---------------------------------------------------------------
>
> Dr.-Ing. Christian Hoene
>
> Interactive Communication Systems (ICS), University of Tübingen
>
> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
>
> http://www.net.uni-tuebingen.de/
>
>
>
> _______________________________________________
> codec mailing list
> codec@ietf.org
> https://www.ietf.org/mailman/listinfo/codec
>
>