Re: [codec] Measuring the algorithmic efficiency of the IWAC

stephen botzko <stephen.botzko@gmail.com> Mon, 31 May 2010 10:54 UTC

MIME-Version: 1.0
In-Reply-To: <000b01cb00a3$8c9ab710$a5d02530$@de>
References: <000b01cb00a3$8c9ab710$a5d02530$@de>
Date: Mon, 31 May 2010 06:54:18 -0400
Message-ID: <AANLkTimk-0hXkfyFqyr8_C7KdiP2Q4QBc9BdDy3Qz0e1@mail.gmail.com>
From: stephen botzko <stephen.botzko@gmail.com>
To: Christian Hoene <hoene@uni-tuebingen.de>
Content-Type: multipart/alternative; boundary="0016364165efb5a48f0487e1add1"
Cc: codec@ietf.org
Subject: Re: [codec] Measuring the algorithmic efficiency of the IWAC
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 31 May 2010 10:54:39 -0000

Hi Christian

Are you implying that the complexity requirement applies only to x86
machines?

>>>
different codec proposals can be compared if they show similar
performance in other aspects.
>>>
There's been no discussion on this point.  In my view, at some point in the
process we will set a complexity bound which all codec proposals will have to
meet.  That is quite different from the above statement.

Regards
Stephen Botzko



On Mon, May 31, 2010 at 5:27 AM, Christian Hoene <hoene@uni-tuebingen.de> wrote:

>  Hi,
>
>
>
> in the last weeks we had a couple of discussions on the complexity or algorithmic
> efficiency <http://en.wikipedia.org/wiki/Algorithmic_efficiency> of the
> codec. Knowing the algorithmic efficiency is important because:
>
> 1.      the complexity has an impact on power consumption and system
> costs,
>
> 2.      the hardware can be selected to fit pre-known complexity
> requirement, and
>
> 3.      different codec proposals can be compared if they show similar
> performance in other aspects.
>
> Before any complexity comparisons can be made, we need an objective,
> precise, reliable, and repeatable metric on how to measure the algorithmic
> efficiency. Let me describe three different approaches:
>
>
>
> *Classic Approach used in Codec Standardization*
>
>
>
> In the last 17 years, ITU-T Study Group 16 has measured the complexity of
> codecs using a library called ITU-T Basic Operators (described in ITU-T
> G.191) <http://www.itu.int/rec/T-REC-G.191-200508-I/en>, which counts the
> kind and number of operations and the amount of memory. The latest version
> of the standard (03/10), not yet publicly available, supports, besides
> fixed-point operations of different widths, floating-point operations as
> well.  Each operation can be counted automatically and weighted
> accordingly. The following source code is an excerpt from baseop32.h:
>
>
>
>
> /*___________________________________________________________________________
>  |                                                                           |
>  |   Prototypes for basic arithmetic operators                               |
>  |___________________________________________________________________________|
> */
>
>
>
> Word16 add (Word16 var1, Word16 var2);    /* Short add,           1   */
>
> Word16 sub (Word16 var1, Word16 var2);    /* Short sub,           1   */
>
> Word16 abs_s (Word16 var1);               /* Short abs,           1   */
>
> Word16 shl (Word16 var1, Word16 var2);    /* Short shift left,    1   */
>
> Word16 shr (Word16 var1, Word16 var2);    /* Short shift right,   1   */
>
> …
>
> Word16 div_s (Word16 var1, Word16 var2); /* Short division,       18  */
>
> Word16 norm_l (Word32 L_var1);           /* Long norm,             1  */
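
To make the weighting concrete: here is a minimal, hypothetical sketch of how
such counted operators could be implemented (simplified illustration only, not
the actual G.191 library; the weights match the excerpt above, 1 for add and
18 for div_s, and div_s is reduced to an integer divide).

```c
#include <stdint.h>

typedef int16_t Word16;

/* Hypothetical global counter: each basic operator adds its documented
 * weight, so the total approximates weighted operation cost. */
static long total_weight = 0;

static Word16 saturate16(int32_t x)        /* clamp to the 16-bit range */
{
    if (x >  32767) return (Word16)32767;
    if (x < -32768) return (Word16)-32768;
    return (Word16)x;
}

Word16 add(Word16 var1, Word16 var2)       /* Short add, weight 1 */
{
    total_weight += 1;
    return saturate16((int32_t)var1 + (int32_t)var2);
}

Word16 div_s(Word16 var1, Word16 var2)     /* Short division, weight 18 */
{
    total_weight += 18;                    /* a divide costs 18x an add */
    return (Word16)(var1 / var2);          /* simplified: the real div_s
                                              is a fractional divide */
}

long weighted_ops(void) { return total_weight; }
```

Running an algorithm written entirely in such operators then yields its
weighted operation count as a side effect, with no separate analysis pass.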
>
>
>
> In the upcoming ITU-T G.GSAD standard, another approach has been used, as
> shown in the following code example. For each operation, WMOPS functions
> have been added which count the number of operations. To measure the
> algorithmic efficiency of an algorithm, the program is run and the
> operations are counted.
>
>
>
>     for (i=0; i<NUM_BAND; i++)
>
>     {
>
> #ifdef WMOPS_FX
>
>         move32();move32();
>
>         move32();move32();
>
> #endif
>
>
>
>         state_fx->band_enrg_long_fx[i] = 30;
>
>         state_fx->band_enrg_fx[i] = 30;
>
>         state_fx->band_enrg_bgd_fx[i] = 30;
>
>         state_fx->min_band_enrg_fx[i] = 30;
>
>     }
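
The counting hooks themselves might look roughly like this (a hypothetical
sketch of the WMOPS_FX style above; names like move32_ops and NUM_BAND = 4
are placeholders): when counting is enabled each move32() becomes a counter
increment, otherwise it compiles to nothing.

```c
/* Hypothetical sketch: with WMOPS_FX defined, move32() counts one 32-bit
 * store; without it, the macro vanishes and the code runs at full speed. */
#define WMOPS_FX

static long move32_ops = 0;

#ifdef WMOPS_FX
#define move32() (move32_ops++)
#else
#define move32() ((void)0)
#endif

#define NUM_BAND 4                      /* placeholder band count */

static int band_enrg[NUM_BAND];

void init_band_energies(void)
{
    int i;
    for (i = 0; i < NUM_BAND; i++) {
#ifdef WMOPS_FX
        move32();                       /* count the store below */
#endif
        band_enrg[i] = 30;
    }
}

long move32_count(void) { return move32_ops; }
```

After one call to init_band_energies(), the counter reports NUM_BAND moves,
mirroring how the instrumented loop above tallies its stores.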
>
>
>
> These two similar examples have worked well over the years and are an
> established procedure for counting operations and complexity. Still, the
> approach has some drawbacks:
>
> a)      Existing algorithms must be modified manually to add the
> counting of operations. This is time-consuming.
>
> b)      The CPU model is simplistic, as it does not consider memory access
> (e.g. caches), parallel execution, or other kinds of optimizations done in
> modern microprocessors and compilers. Thus, the number of instructions
> might not correlate well with the actual execution time on modern CPUs.
>
> *Profiling: The Classic Approach used in Software Engineering*
>
>
>
> Citing Wikipedia
> <http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29>:
> "In software engineering
> <http://en.wikipedia.org/wiki/Software_engineering>, program profiling,
> software profiling or simply profiling, a form of dynamic program analysis
> <http://en.wikipedia.org/wiki/Dynamic_program_analysis> (as opposed to
> static code analysis <http://en.wikipedia.org/wiki/Static_code_analysis>),
> is the investigation of a program's behavior using information gathered as
> the program executes. The usual purpose of this analysis is to determine
> which sections of a program to optimize
> <http://en.wikipedia.org/wiki/Optimization_%28computer_science%29> - to
> increase its overall speed, decrease its memory requirement, or sometimes
> both.
>
> ·         A (code) profiler is a performance analysis tool that, most
> commonly, measures only the frequency and duration of function calls, but
> there are other specific types of profilers (e.g. memory profilers
> <http://en.wikipedia.org/w/index.php?title=Memory_profiler&action=edit&redlink=1>)
> in addition to more comprehensive profilers, capable of gathering extensive
> performance data.
>
> ·         An instruction set simulator
> <http://en.wikipedia.org/wiki/Instruction_set_simulator>, which is also -
> by necessity - a profiler, can measure the totality of a program's
> behaviour from invocation to termination."
>
> Thus, a typical profiler such as gprof can be used to measure and
> understand the complexity of a codec implementation. It is precise because
> it measures actual execution on modern computers. However, if used for
> standardization, it has the following drawback:
>
> a)      The execution times depend on the CPU architecture, the PC in
> general, the OS, and other programs running in parallel. As such, it
> cannot produce reliable and repeatable results. The results are not
> comparable if obtained under slightly changed conditions.
>
> *Suggestion: Use a Cycle Accurate Simulator*
>
> Wikipedia: "A Cycle Accurate Simulator (CAS) is a computer program that
> simulates a microarchitecture
> <http://en.wikipedia.org/wiki/Microarchitecture> cycle-accurately. In
> contrast, an instruction set simulator
> <http://en.wikipedia.org/wiki/Instruction_set_simulator> simulates an
> Instruction Set Architecture
> <http://en.wikipedia.org/wiki/Instruction_Set_Architecture>, usually
> faster but not cycle-accurate to a specific implementation of this
> architecture."
>
> With such a simulator, the execution times are precise and repeatable. If
> two parties make measurements using different CPUs, they still get the same
> results. Of course, a cycle accurate simulator is slower than the real CPU
> by a factor of about 100.
>
>
>
> Actually, there is an open-source cycle accurate simulator called PTLsim
> <http://www.ptlsim.org/> available that simulates a Pentium 4. "PTLsim is
> a cycle accurate x86 microprocessor simulator and virtual machine for the
> x86 and x86-64 instruction sets. PTLsim models a modern superscalar out of
> order x86-64 compatible processor core at a configurable level of detail
> ranging from full-speed native execution on the host CPU all the way down
> to RTL level models of all key pipeline structures. In addition, all
> microcode, the complete cache hierarchy, memory subsystem and supporting
> hardware devices are modeled with true cycle accuracy. PTLsim supports the
> full x86-64 instruction set of the Pentium 4+, Athlon 64 and similar
> machines with all extensions (x86-64, SSE/SSE2/SSE3, MMX, x87). It is
> currently the only tool available to the public to support true cycle
> accurate modeling of real x86 microarchitectures. […] PTLsim is used
> extensively at hundreds of major universities, industry research labs and
> the well known x86 microprocessor vendors Intel and AMD."
>
> Thus, I would suggest using the Cycle Accurate Simulator PTLsim to
> measure the complexity of the IETF audio codec contributions, because
>
> a)        it makes precise performance measurements using a modern, widely
> deployed CPU platform,
>
> b)        the measurements are repeatable and can be made by everybody,
>
> c)        existing source code can be used without significant changes,
> and
>
> d)        it is time-saving and easy to use.
>
> With best regards,
>
>  Christian Hoene
>
>
>
> ---------------------------------------------------------------
>
> Dr.-Ing. Christian Hoene
>
> Interactive Communication Systems (ICS), University of Tübingen
>
> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
>
> http://www.net.uni-tuebingen.de/
>
>
>
> _______________________________________________
> codec mailing list
> codec@ietf.org
> https://www.ietf.org/mailman/listinfo/codec
>
>