Re: [codec] Measuring the algorithmic efficiency of the IWAC

stephen botzko <stephen.botzko@gmail.com> Mon, 31 May 2010 13:38 UTC

Return-Path: <stephen.botzko@gmail.com>
X-Original-To: codec@core3.amsl.com
Delivered-To: codec@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 973003A6A5A for <codec@core3.amsl.com>; Mon, 31 May 2010 06:38:15 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.602
X-Spam-Level:
X-Spam-Status: No, score=0.602 tagged_above=-999 required=5 tests=[BAYES_50=0.001, HTML_MESSAGE=0.001, J_CHICKENPOX_14=0.6]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yH7FhB82ehle for <codec@core3.amsl.com>; Mon, 31 May 2010 06:38:13 -0700 (PDT)
Received: from mail-ww0-f44.google.com (mail-ww0-f44.google.com [74.125.82.44]) by core3.amsl.com (Postfix) with ESMTP id 6D01B3A68F2 for <codec@ietf.org>; Mon, 31 May 2010 06:37:44 -0700 (PDT)
Received: by wwb39 with SMTP id 39so1059785wwb.31 for <codec@ietf.org>; Mon, 31 May 2010 06:37:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=H7xtzpDWFfvphQHJIUZWBO60fHcpz7C8mhoWYFGELzU=; b=I8ibsEmwHdJ1mLSsTxhTO/Vvc7XgcopRCrUwlZXkqcCmdmtxoCLon8AtSQD55ND5MU tO/v/KRHbRajDB2Ugx5e823tUPmsg5CxuiupRvf6eZKLsuNh1dxfLIneCCqOe9mMrGIO EovhClJdlnsWR/762ZQfPLtyiEUnueVfS/8fU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=Yvg/eSdE/zp0DEE1nPICrWignV+bt+LZL5+4dgshh/tYONsH5w1Gr4iYoypMLFBq35 TFwZbsmr+1kyeEd267+Tgo0f0EpKzYQBrjHrODILDntOH28stUJXA6a1jEnB1PQ4lRi3 spwe7XqGzr8fBTU3fCVMGqa+ao0TBaTdjQvwA=
MIME-Version: 1.0
Received: by 10.227.134.194 with SMTP id k2mr4429657wbt.118.1275312697801; Mon, 31 May 2010 06:31:37 -0700 (PDT)
Received: by 10.216.23.5 with HTTP; Mon, 31 May 2010 06:31:29 -0700 (PDT)
In-Reply-To: <000f01cb00b7$5f7ddb90$1e7992b0$@de>
References: <000b01cb00a3$8c9ab710$a5d02530$@de> <AANLkTimk-0hXkfyFqyr8_C7KdiP2Q4QBc9BdDy3Qz0e1@mail.gmail.com> <000f01cb00b7$5f7ddb90$1e7992b0$@de>
Date: Mon, 31 May 2010 09:31:29 -0400
Message-ID: <AANLkTimU_9XwRXu9N2cmHMN-AJN55dOqV-WxwU80TJlj@mail.gmail.com>
From: stephen botzko <stephen.botzko@gmail.com>
To: Christian Hoene <hoene@uni-tuebingen.de>
Content-Type: multipart/alternative; boundary="001636831c62547d720487e3e011"
Cc: codec@ietf.org
Subject: Re: [codec] Measuring the algorithmic efficiency of the IWAC
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 31 May 2010 13:38:15 -0000

One significant disadvantage of profiling is that it gives an unfair
advantage to code which is optimized for the x86 platform.

This may not be a problem at the end of the development process, however in
the beginning it is best to stay focused on the core algorithm.

Even at the end, more generic code may be the best reference, since the x86
optimizations don't have to be undone if you are re-optimizing for a non-x86
platform.

>>>
Also, the contributed source code is only available for PCs like system not
for DSPs.
>>>
Surely NOT!  I think the contributed source code has to be portable, and
need to compile (and deliver the right result) for a wide variety of
platforms.

Regards
Stephen Botzko

On Mon, May 31, 2010 at 7:49 AM, Christian Hoene <hoene@uni-tuebingen.de>wrote:

>  Hi Stephen,
>
>
>
> comments inline
>
>
>
> Are you implying that the complexity is only a requirement for x86
> machines?
>
>  Hoene: No, complexity is a requirement for all machines. However, x86
> machines are distributed and used widely on PC, notebook and also sometimes
> also in embedded systems. Also, the contributed source code is only
> available for PCs like system not for DSPs. Thus, any testing on DSP will
> not be possible.
>
> However, you are free to suggest other reference platforms such as
>
> ·         ARM: http://facsim.snu.ac.kr/
>
> ·         C64x+ CPU Cycle Accurate Simulator:
> http://processors.wiki.ti.com/index.php/C64x%2B_CPU_Cycle_Accurate_Simulator
>
> Just a matter of choice (and availability).
>
> >>>
> different codec proposals can be compared if they are show similar
> performance in other aspects.
> >>>
> There's been no discussion on this point.  In my view, at some point in the
> process we set a complexity bound which all codecs proposals will have to
> meet.
>
> Hoene: Agreed. However, before agreeing on bound, we have to agree on a
> measure and metric. My last email is on suggestion on how to measure precise
> and time saving while covering the most typical usage scenarios.
>
> Christian
>
>
>
> That is quite different from the above statement.
>
>
>
>
> Regards
> Stephen Botzko
>
>
>  On Mon, May 31, 2010 at 5:27 AM, Christian Hoene <hoene@uni-tuebingen.de>
> wrote:
>
> Hi,
>
>
>
> in the last weeks we had a couple of discussion on the complexity or algorithmic
> efficiency <http://en.wikipedia.org/wiki/Algorithmic_efficiency> of the
> codec. Knowing the algorithmic efficiency is important because:
>
> 1.      the complexity has an impact on power consumption and system
> costs,
>
> 2.      the hardware can be selected to fit pre-known complexity
> requirement, and
>
> 3.      different codec proposals can be compared if they are show similar
> performance in other aspects.
>
> Before any complexity comparisons can be made, we need an objective,
> precise, reliable, and repeatable metric on how to measure the algorithmic
> efficiency. Let me describe three different approaches:
>
>
>
> *Classic Approach used in Codec Standardization*
>
>
>
> In the last 17 years, the ITU-T Study Group 16 measure the complexity of
> codecs using a library called ITU-T Basic Operators (described in ITU-T
> G.191), <http://www.itu.int/rec/T-REC-G.191-200508-I/en> that counts the
> kind and number of operations and the amount of memory. The last version of
> the standard (03/10), not yet publically available, supports - beside
> fix-point operations of different widths - also floating operations.  Each
> operation can be counted automatically and weighted accordingly. The
> following source code is an excerpt from baseop32.h
>
>
>
>
> /*___________________________________________________________________________
>
>  |
>         |
>
>  |   Prototypes for basic arithmetic
> operators                               |
>
>
>  |___________________________________________________________________________|
>
> */
>
>
>
> Word16 add (Word16 var1, Word16 var2);    /* Short add,           1   */
>
> Word16 sub (Word16 var1, Word16 var2);    /* Short sub,           1   */
>
> Word16 abs_s (Word16 var1);               /* Short abs,           1   */
>
> Word16 shl (Word16 var1, Word16 var2);    /* Short shift left,    1   */
>
> Word16 shr (Word16 var1, Word16 var2);    /* Short shift right,   1   */
>
> …
>
> Word16 div_s (Word16 var1, Word16 var2); /* Short division,       18  */
>
> Word16 norm_l (Word32 L_var1);           /* Long norm,             1  */
>
>
>
> In the upcoming ITU-T G.GSAD standard another approach has been used as
> shown in the following code example. For each operation, WMPOS function
> have been added, which count the number of operations. If the algorithmic
> efficiency of an algorithm has to be measured, the program is run and the
> operations are counted.
>
>
>
>     for (i=0; i<NUM_BAND; i++)
>
>     {
>
> #ifdef WMOPS_FX
>
>         move32();move32();
>
>         move32();move32();
>
> #endif
>
>
>
>         state_fx->band_enrg_long_fx[i] = 30;
>
>         state_fx->band_enrg_fx[i] = 30;
>
>         state_fx->band_enrg_bgd_fx[i] = 30;
>
>         state_fx->min_band_enrg_fx[i] = 30;
>
>     }
>
>
>
> These two similar examples worked well in the years and are an established
> procedure to count operations and complexity. Still, it has some drawbacks
>
> a)      Existing algorithm must be modified manually to address the
> counting of operations. This is time consuming.
>
> b)      The CPU model is simplistic as it does not consider memory access
> (e.g. cache), parallel executions, or other kind of optimization done in
> modern microprocessor and compilers. Thus, the number of instruction might
> not correlated well with the actual execution time on modern CPUs.
>
> *Profiling: The Classic Approach used in Software Engineering*
>
> * *
>
> Citing Wikipedia<http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29>:
> „In software engineering<http://en.wikipedia.org/wiki/Software_engineering>, program
> profiling, software profiling or simply profiling, a form of dynamic
> program analysis <http://en.wikipedia.org/wiki/Dynamic_program_analysis> (as
> opposed tostatic code analysis<http://en.wikipedia.org/wiki/Static_code_analysis>),
> is the investigation of a program's behavior using information gathered as
> the program executes. The usual purpose of this analysis is to determine
> which sections of a program to optimize<http://en.wikipedia.org/wiki/Optimization_%28computer_science%29> -
> to increase its overall speed, decrease its memory requirement or sometimes
> both.
>
> ·         A (code) profiler is a performance analysis tool that, most
> commonly, measures only the frequency and duration of function calls, but
> there are other specific types of profilers (e.g. memory profilers<http://en.wikipedia.org/w/index.php?title=Memory_profiler&action=edit&redlink=1>)
> in addition to more comprehensive profilers, capable of gathering extensive
> performance data.
>
> ·         An instruction set simulator<http://en.wikipedia.org/wiki/Instruction_set_simulator> which
> is also — by necessity — a profiler, can measure the totality of a program's
> behaviour from invocation to termination.“
>
> Thus, a typical profiler such as gprof can be used to measure and
> understand the complexity of a codec implementation. It is precise because
> it is uses on modern computers. However, if used for standardization, it has
> the following drawback.
>
> a)      The execution times depend on the CPU architecture, the PC in
> general, the OS and parallel running programs. As such it cannot produce
> reliable and repeatable results. The results are not comparable if done
> under slightly changed conditions.
>
> *Suggestion: Use a Cycle Accurate Simulator*
>
> Wikipedia: „A Cycle Accurate Simulator (CAS) is a computer program that
> simulates a microarchitecture<http://en.wikipedia.org/wiki/Microarchitecture> cycle-accurate.
> In contrast an instruction set simulator<http://en.wikipedia.org/wiki/Instruction_set_simulator> simulates
> an Instruction Set Architecture<http://en.wikipedia.org/wiki/Instruction_Set_Architecture> usually
> faster but not cycle-accurate to a specific implementation of this
> architecture.“
>
> Then, the execution times are precise and they are repeatable. If two
> parties make measurements using different CPUs, they still get the same
> results. Of course, a cycle accurate simulator is slower than the real CPU
> at a factor of about 100.
>
>
>
> Actually, there is an open-source Cycle accurate simulator called PTLsim<http://www.ptlsim.org/>avaible that simulates an Pentium IV. „PTLsim is a cycle accurate x86
> microprocessor simulator and virtual machine for the x86 and x86-64
> instruction sets. PTLsim models a modern superscalar out of order
> x86-64 compatible processor core at a configurable level of detail ranging
> from full-speed native execution on the host CPU all the way down to RTL
> level models of all key pipeline structures. In addition, all microcode, the
> complete cache hierarchy, memory subsystem and supporting hardware devices
> are modeled with true cycle accuracy. PTLsim supports the full x86-64
> instruction set of the Pentium 4+, Athlon 64 and similar machines with all
> extensions (x86-64, SSE/SSE2/SSE3, MMX, x87). It is currently the only tool
> available to the public to support true cycle accurate modeling of real x86
> microarchitectures. […] PTLsim is used extensively at hundreds of major
> universities, industry research labs and the well known x86 microprocessor
> vendors Intel and AMD.“
>
> Thus, I would suggest to use the Cycle Accurate Simulator PTLsim to measure
> the complexity of the IETF audio codec contributions because
>
> a)        it makes precise performance measurements using a modern, widely
> deployed CPU platform,
>
> b)        the measurement are repeatable and can be made by everybody,
>
> c)        existing source code can be used without significant changes,
> and
>
> d)        it is time-saving and easy to use.
>
> With best regards,
>
>  Christian Hoene
>
>
>
> ---------------------------------------------------------------
>
> Dr.-Ing. Christian Hoene
>
> Interactive Communication Systems (ICS), University of Tübingen
>
> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
>
> http://www.net.uni-tuebingen.de/
>
>
>
>
> _______________________________________________
> codec mailing list
> codec@ietf.org
> https://www.ietf.org/mailman/listinfo/codec
>
>
>