Re: [codec] #20: Computational complexity?
"codec issue tracker" <trac@tools.ietf.org> Thu, 24 June 2010 15:33 UTC
Return-Path: <trac@tools.ietf.org>
X-Original-To: codec@core3.amsl.com
Delivered-To: codec@core3.amsl.com
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: codec issue tracker <trac@tools.ietf.org>
X-Trac-Version: 0.11.7
Precedence: bulk
Auto-Submitted: auto-generated
X-Mailer: Trac 0.11.7, by Edgewall Software
To: hoene@uni-tuebingen.de
X-Trac-Project: codec
Date: Thu, 24 Jun 2010 15:33:38 -0000
X-URL: http://tools.ietf.org/codec/
X-Trac-Ticket-URL: https://svn.tools.ietf.org/wg/codec/trac/ticket/20#comment:3
Message-ID: <071.5e6ef09350c42a52fe51191cc3cf486e@tools.ietf.org>
References: <062.8524135614c0f45c18915362cc459235@tools.ietf.org>
X-Trac-Ticket-ID: 20
In-Reply-To: <062.8524135614c0f45c18915362cc459235@tools.ietf.org>
X-SA-Exim-Connect-IP: ::1
X-SA-Exim-Rcpt-To: hoene@uni-tuebingen.de, codec@ietf.org
X-SA-Exim-Mail-From: trac@tools.ietf.org
X-SA-Exim-Scanned: No (on zinfandel.tools.ietf.org); SAEximRunCond expanded to false
Cc: codec@ietf.org
Subject: Re: [codec] #20: Computational complexity?
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Reply-To: codec@ietf.org
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 24 Jun 2010 15:33:34 -0000
#20: Computational complexity?
------------------------------------+---------------------------------------
 Reporter:  hoene@…                 |       Owner:
     Type:  defect                  |      Status:  new
 Priority:  major                   |   Milestone:
Component:  requirements            |     Version:
 Severity:  -                       |    Keywords:
------------------------------------+---------------------------------------

Comment (by hoene@…):

[Hoene]: In the last weeks we have had a couple of discussions on the complexity, or algorithmic efficiency, of the codec. Knowing the algorithmic efficiency is important because:

1. the complexity has an impact on power consumption and system costs,
2. the hardware can be selected to fit a pre-known complexity requirement, and
3. different codec proposals can be compared if they show similar performance in other respects.

Before any complexity comparisons can be made, we need an objective, precise, reliable, and repeatable metric for measuring algorithmic efficiency. Let me describe three different approaches.

Classic Approach Used in Codec Standardization

For the last 17 years, ITU-T Study Group 16 has measured the complexity of codecs using a library called the ITU-T Basic Operators (described in ITU-T G.191), which counts the kind and number of operations and the amount of memory used. The latest version of the standard (03/10), not yet publicly available, supports - besides fixed-point operations of different widths - floating-point operations as well. Each operation can be counted automatically and weighted accordingly.
The following source code is an excerpt from basop32.h:

/*___________________________________________________________________________
 |                                                                           |
 |                 Prototypes for basic arithmetic operators                 |
 |___________________________________________________________________________|
*/

Word16 add (Word16 var1, Word16 var2);    /* Short add,          1 */
Word16 sub (Word16 var1, Word16 var2);    /* Short sub,          1 */
Word16 abs_s (Word16 var1);               /* Short abs,          1 */
Word16 shl (Word16 var1, Word16 var2);    /* Short shift left,   1 */
Word16 shr (Word16 var1, Word16 var2);    /* Short shift right,  1 */
…
Word16 div_s (Word16 var1, Word16 var2);  /* Short division,    18 */
Word16 norm_l (Word32 L_var1);            /* Long norm,          1 */

In the upcoming ITU-T G.GSAD standard, another approach has been used, as shown in the following code example. For each operation, WMOPS functions have been added, which count the number of operations. To measure the algorithmic efficiency of an algorithm, the program is run and the operations are counted.

for (i = 0; i < NUM_BAND; i++)
{
#ifdef WMOPS_FX
    move32(); move32();
    move32(); move32();
#endif
    state_fx->band_enrg_long_fx[i] = 30;
    state_fx->band_enrg_fx[i] = 30;
    state_fx->band_enrg_bgd_fx[i] = 30;
    state_fx->min_band_enrg_fx[i] = 30;
}

These two similar approaches have worked well over the years and are an established procedure for counting operations and measuring complexity. Still, they have some drawbacks:

a) Existing algorithms must be modified manually to add the counting of operations. This is time consuming.
b) The CPU model is simplistic, as it does not consider memory access (e.g. caches), parallel execution, or other kinds of optimization done by modern microprocessors and compilers. Thus, the number of instructions might not correlate well with the actual execution time on modern CPUs.
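The counting idea behind both examples can be sketched in a few lines of C: each basic operator increments a global counter by its weight, so running the algorithm on test data yields its weighted operation count. The names below (op_count, counted_add, counted_div, run_toy_filter) are illustrative only, not the real G.191 API, and saturation handling is omitted.

```c
/* Sketch of weighted operation counting, loosely modeled on the
 * ITU-T basic-operator approach. Names are hypothetical. */
#include <stdint.h>

typedef int16_t Word16;

static long op_count = 0;          /* weighted operation total */

static Word16 counted_add(Word16 a, Word16 b)
{
    op_count += 1;                 /* add is weighted 1 */
    return (Word16)(a + b);        /* saturation omitted for brevity */
}

static Word16 counted_div(Word16 a, Word16 b)
{
    op_count += 18;                /* division is weighted 18 */
    return (Word16)(a / b);
}

long run_toy_filter(void)
{
    op_count = 0;
    Word16 acc = 0;
    for (int i = 0; i < 4; i++)
        acc = counted_add(acc, 3); /* 4 adds -> weight 4 */
    acc = counted_div(acc, 2);     /* 1 div  -> weight 18 */
    return op_count;               /* 4*1 + 18 = 22 */
}
```

This also makes drawback (a) above concrete: every arithmetic operation in existing code must be rewritten to go through such wrappers.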
Profiling: The Classic Approach Used in Software Engineering

Citing Wikipedia: "In software engineering, program profiling, software profiling or simply profiling, a form of dynamic program analysis (as opposed to static code analysis), is the investigation of a program's behavior using information gathered as the program executes. The usual purpose of this analysis is to determine which sections of a program to optimize - to increase its overall speed, decrease its memory requirement, or sometimes both.

• A (code) profiler is a performance analysis tool that, most commonly, measures only the frequency and duration of function calls, but there are other specific types of profilers (e.g. memory profilers) in addition to more comprehensive profilers, capable of gathering extensive performance data.
• An instruction set simulator, which is also - by necessity - a profiler, can measure the totality of a program's behaviour from invocation to termination."

Thus, a typical profiler such as gprof can be used to measure and understand the complexity of a codec implementation. It is precise because it measures real execution on a modern computer. However, if used for standardization, it has the following drawback:

a) The execution times depend on the CPU architecture, the PC in general, the OS, and any programs running in parallel. As such, profiling cannot produce reliable and repeatable results; the results are not comparable if obtained under slightly changed conditions.

Suggestion: Use a Cycle Accurate Simulator

Wikipedia: "A Cycle Accurate Simulator (CAS) is a computer program that simulates a microarchitecture cycle-accurately. In contrast, an instruction set simulator simulates an Instruction Set Architecture, usually faster but not cycle-accurate with respect to a specific implementation of this architecture."

Then the execution times are precise and repeatable. If two parties make measurements using different host CPUs, they still get the same results.
Of course, a cycle accurate simulator is slower than the real CPU by a factor of about 100. There is an open-source cycle accurate simulator called PTLsim available that simulates a Pentium 4:

"PTLsim is a cycle accurate x86 microprocessor simulator and virtual machine for the x86 and x86-64 instruction sets. PTLsim models a modern superscalar out of order x86-64 compatible processor core at a configurable level of detail ranging from full-speed native execution on the host CPU all the way down to RTL level models of all key pipeline structures. In addition, all microcode, the complete cache hierarchy, memory subsystem and supporting hardware devices are modeled with true cycle accuracy. PTLsim supports the full x86-64 instruction set of the Pentium 4+, Athlon 64 and similar machines with all extensions (x86-64, SSE/SSE2/SSE3, MMX, x87). It is currently the only tool available to the public to support true cycle accurate modeling of real x86 microarchitectures. […] PTLsim is used extensively at hundreds of major universities, industry research labs and the well known x86 microprocessor vendors Intel and AMD."

Thus, I would suggest using the cycle accurate simulator PTLsim to measure the complexity of the IETF audio codec contributions, because:

a) it makes precise performance measurements using a modern, widely deployed CPU platform,
b) the measurements are repeatable and can be made by everybody,
c) existing source code can be used without significant changes, and
d) it is time-saving and easy to use.

[Botzko]: Are you implying that the complexity is only a requirement for x86 machines?

[Hoene]: No, complexity is a requirement for all machines. However, x86 machines are distributed and used widely in PCs and notebooks, and sometimes also in embedded systems. Also, the contributed source code is only available for PC-like systems, not for DSPs; thus, any testing on a DSP will not be possible.
However, you are free to suggest other reference platforms, such as:

• ARM: http://facsim.snu.ac.kr/
• C64x+ CPU Cycle Accurate Simulator: http://processors.wiki.ti.com/index.php/C64x%2B_CPU_Cycle_Accurate_Simulator

It is just a matter of choice (and availability).

[Botzko]: One significant disadvantage of profiling is that it gives an unfair advantage to code which is optimized for the x86 platform. This may not be a problem at the end of the development process; in the beginning, however, it is best to stay focused on the core algorithm. Even at the end, more generic code may be the best reference, since the x86 optimizations do not have to be undone if you are re-optimizing for a non-x86 platform.

--
Ticket URL: <https://svn.tools.ietf.org/wg/codec/trac/ticket/20#comment:3>
codec <http://tools.ietf.org/codec/>