Re: [codec] Measuring the algorithmic efficiency of the IWAC
stephen botzko <stephen.botzko@gmail.com> Mon, 31 May 2010 10:54 UTC
Return-Path: <stephen.botzko@gmail.com>
X-Original-To: codec@core3.amsl.com
Delivered-To: codec@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 9B15A3A67D7 for <codec@core3.amsl.com>; Mon, 31 May 2010 03:54:39 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.602
X-Spam-Level:
X-Spam-Status: No, score=0.602 tagged_above=-999 required=5 tests=[BAYES_50=0.001, HTML_MESSAGE=0.001, J_CHICKENPOX_14=0.6]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xSTixGGJ3TZf for <codec@core3.amsl.com>; Mon, 31 May 2010 03:54:37 -0700 (PDT)
Received: from mail-wy0-f172.google.com (mail-wy0-f172.google.com [74.125.82.172]) by core3.amsl.com (Postfix) with ESMTP id F328E3A67D1 for <codec@ietf.org>; Mon, 31 May 2010 03:54:36 -0700 (PDT)
Received: by wye20 with SMTP id 20so2437025wye.31 for <codec@ietf.org>; Mon, 31 May 2010 03:54:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=nDZab43UGjOgo14/wQipsXudsnDp0ZcnDr457ZPDeZw=; b=JdaVKDebAOjoVKLxkQOo0/HJs5fHDGV3nACNBwQEshHeHAXalAJqnCsJJuSdyk5hxS x6/mLhJnBNcJHdSrZUbLxhNxg513CVFaRT31mpTCpr6uFB50l4rk/aX+3tbsxJLtA9su e5Z1GK5DHaTRJKDgtqWnMBHj2gkyw58BTXaPU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=g38Z15o6oJNGbgHijhUqVemGFgMIC36an0yOoaJrTbTezhqWkxc54d9cFpXgxXXRyW kyTArTzh7KlPcMfFiV78bb14vyQ4Xcovyv5zNlCtg27Bh5nkuHJF21tEs+vNkoN2z4Gq iqurhXQ0Nxf8dQUERu4pQOGKkg1NYZ50VDk4g=
MIME-Version: 1.0
Received: by 10.227.154.204 with SMTP id p12mr4108731wbw.141.1275303258595; Mon, 31 May 2010 03:54:18 -0700 (PDT)
Received: by 10.216.23.5 with HTTP; Mon, 31 May 2010 03:54:18 -0700 (PDT)
In-Reply-To: <000b01cb00a3$8c9ab710$a5d02530$@de>
References: <000b01cb00a3$8c9ab710$a5d02530$@de>
Date: Mon, 31 May 2010 06:54:18 -0400
Message-ID: <AANLkTimk-0hXkfyFqyr8_C7KdiP2Q4QBc9BdDy3Qz0e1@mail.gmail.com>
From: stephen botzko <stephen.botzko@gmail.com>
To: Christian Hoene <hoene@uni-tuebingen.de>
Content-Type: multipart/alternative; boundary="0016364165efb5a48f0487e1add1"
Cc: codec@ietf.org
Subject: Re: [codec] Measuring the algorithmic efficiency of the IWAC
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 31 May 2010 10:54:39 -0000
Hi Christian Are you implying that the complexity is only a requirement for x86 machines? >>> different codec proposals can be compared if they are show similar performance in other aspects. >>> There's been no discussion on this point. In my view, at some point in the process we set a complexity bound which all codecs proposals will have to meet. That is quite different from the above statement. Regards Stephen Botzko On Mon, May 31, 2010 at 5:27 AM, Christian Hoene <hoene@uni-tuebingen.de>wrote: > Hi, > > > > in the last weeks we had a couple of discussion on the complexity or algorithmic > efficiency <http://en.wikipedia.org/wiki/Algorithmic_efficiency> of the > codec. Knowing the algorithmic efficiency is important because: > > 1. the complexity has an impact on power consumption and system > costs, > > 2. the hardware can be selected to fit pre-known complexity > requirement, and > > 3. different codec proposals can be compared if they are show similar > performance in other aspects. > > Before any complexity comparisons can be made, we need an objective, > precise, reliable, and repeatable metric on how to measure the algorithmic > efficiency. Let me describe three different approaches: > > > > *Classic Approach used in Codec Standardization* > > > > In the last 17 years, the ITU-T Study Group 16 measure the complexity of > codecs using a library called ITU-T Basic Operators (described in ITU-T > G.191), <http://www.itu.int/rec/T-REC-G.191-200508-I/en> that counts the > kind and number of operations and the amount of memory. The last version of > the standard (03/10), not yet publically available, supports - beside > fix-point operations of different widths - also floating operations. Each > operation can be counted automatically and weighted accordingly. The > following source code is an excerpt from baseop32.h > > > > > /*___________________________________________________________________________ > > | > | > > | Prototypes for basic arithmetic > operators | > > > |___________________________________________________________________________| > > */ > > > > Word16 add (Word16 var1, Word16 var2); /* Short add, 1 */ > > Word16 sub (Word16 var1, Word16 var2); /* Short sub, 1 */ > > Word16 abs_s (Word16 var1); /* Short abs, 1 */ > > Word16 shl (Word16 var1, Word16 var2); /* Short shift left, 1 */ > > Word16 shr (Word16 var1, Word16 var2); /* Short shift right, 1 */ > > … > > Word16 div_s (Word16 var1, Word16 var2); /* Short division, 18 */ > > Word16 norm_l (Word32 L_var1); /* Long norm, 1 */ > > > > In the upcoming ITU-T G.GSAD standard another approach has been used as > shown in the following code example. For each operation, WMPOS function > have been added, which count the number of operations. If the algorithmic > efficiency of an algorithm has to be measured, the program is run and the > operations are counted. > > > > for (i=0; i<NUM_BAND; i++) > > { > > #ifdef WMOPS_FX > > move32();move32(); > > move32();move32(); > > #endif > > > > state_fx->band_enrg_long_fx[i] = 30; > > state_fx->band_enrg_fx[i] = 30; > > state_fx->band_enrg_bgd_fx[i] = 30; > > state_fx->min_band_enrg_fx[i] = 30; > > } > > > > These two similar examples worked well in the years and are an established > procedure to count operations and complexity. Still, it has some drawbacks > > a) Existing algorithm must be modified manually to address the > counting of operations. This is time consuming. > > b) The CPU model is simplistic as it does not consider memory access > (e.g. cache), parallel executions, or other kind of optimization done in > modern microprocessor and compilers. Thus, the number of instruction might > not correlated well with the actual execution time on modern CPUs. > > *Profiling: The Classic Approach used in Software Engineering* > > * * > > Citing Wikipedia<http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29>: > „In software engineering<http://en.wikipedia.org/wiki/Software_engineering>, program > profiling, software profiling or simply profiling, a form of dynamic > program analysis <http://en.wikipedia.org/wiki/Dynamic_program_analysis> (as > opposed tostatic code analysis<http://en.wikipedia.org/wiki/Static_code_analysis>), > is the investigation of a program's behavior using information gathered as > the program executes. The usual purpose of this analysis is to determine > which sections of a program to optimize<http://en.wikipedia.org/wiki/Optimization_%28computer_science%29> - > to increase its overall speed, decrease its memory requirement or sometimes > both. > > · A (code) profiler is a performance analysis tool that, most > commonly, measures only the frequency and duration of function calls, but > there are other specific types of profilers (e.g. memory profilers<http://en.wikipedia.org/w/index.php?title=Memory_profiler&action=edit&redlink=1>) > in addition to more comprehensive profilers, capable of gathering extensive > performance data. > > · An instruction set simulator<http://en.wikipedia.org/wiki/Instruction_set_simulator> which > is also — by necessity — a profiler, can measure the totality of a program's > behaviour from invocation to termination.“ > > Thus, a typical profiler such as gprof can be used to measure and > understand the complexity of a codec implementation. It is precise because > it is uses on modern computers. However, if used for standardization, it has > the following drawback. > > a) The execution times depend on the CPU architecture, the PC in > general, the OS and parallel running programs. As such it cannot produce > reliable and repeatable results. The results are not comparable if done > under slightly changed conditions. > > *Suggestion: Use a Cycle Accurate Simulator* > > Wikipedia: „A Cycle Accurate Simulator (CAS) is a computer program that > simulates a microarchitecture<http://en.wikipedia.org/wiki/Microarchitecture> > cycle-accurate. In contrast an instruction set simulator<http://en.wikipedia.org/wiki/Instruction_set_simulator> > simulates an Instruction Set Architecture<http://en.wikipedia.org/wiki/Instruction_Set_Architecture> > usually faster but not cycle-accurate to a specific implementation of > this architecture.“ > > Then, the execution times are precise and they are repeatable. If two > parties make measurements using different CPUs, they still get the same > results. Of course, a cycle accurate simulator is slower than the real CPU > at a factor of about 100. > > > > Actually, there is an open-source Cycle accurate simulator called PTLsim<http://www.ptlsim.org/>avaible that simulates an Pentium IV. „PTLsim is a cycle accurate x86 > microprocessor simulator and virtual machine for the x86 and x86-64 > instruction sets. PTLsim models a modern superscalar out of order > x86-64 compatible processor core at a configurable level of detail ranging > from full-speed native execution on the host CPU all the way down to RTL > level models of all key pipeline structures. In addition, all microcode, the > complete cache hierarchy, memory subsystem and supporting hardware devices > are modeled with true cycle accuracy. PTLsim supports the full x86-64 > instruction set of the Pentium 4+, Athlon 64 and similar machines with all > extensions (x86-64, SSE/SSE2/SSE3, MMX, x87). It is currently the only tool > available to the public to support true cycle accurate modeling of real x86 > microarchitectures. […] PTLsim is used extensively at hundreds of major > universities, industry research labs and the well known x86 microprocessor > vendors Intel and AMD.“ > > Thus, I would suggest to use the Cycle Accurate Simulator PTLsim to > measure the complexity of the IETF audio codec contributions because > > a) it makes precise performance measurements using a modern, widely > deployed CPU platform, > > b) the measurement are repeatable and can be made by everybody, > > c) existing source code can be used without significant changes, > and > > d) it is time-saving and easy to use. > > With best regards, > > Christian Hoene > > > > --------------------------------------------------------------- > > Dr.-Ing. Christian Hoene > > Interactive Communication Systems (ICS), University of Tübingen > > Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532 > > http://www.net.uni-tuebingen.de/ > > > > _______________________________________________ > codec mailing list > codec@ietf.org > https://www.ietf.org/mailman/listinfo/codec > >
- [codec] Measuring the algorithmic efficiency of t… Christian Hoene
- Re: [codec] Measuring the algorithmic efficiency … stephen botzko
- Re: [codec] Measuring the algorithmic efficiency … Christian Hoene
- Re: [codec] Measuring the algorithmic efficiency … stephen botzko