Re: [video-codec] Lessons Learned from Audio Codec Group - Normative Text not Code

"Timothy B. Terriberry" <tterribe@xiph.org> Fri, 05 October 2012 15:18 UTC

Gregory Maxwell wrote:
>> I prefer a bit-exact decoder operation spec
>> over the performance based decoder spec we had in Opus.
>
> For video there is no question in my mind for that. Even ignoring the norms and the conformance testing simplifications, the signal characteristics ('low' bit depth and low sampling frequency) in video mostly preclude auto-regressive prediction as we used in Opus and benefit significantly from unity-gain prediction (which requires bit accuracy).

More to the point, we were using bit-exact decoder definitions in
VP3/Theora back when MPEG-1, MPEG-2, and MPEG-4 Part 2 were still using
transforms specified only up to a set of tolerances, with metrics
designed to avoid drift (and macroblock refresh requirements in the
encoder) [1]. I consider the latter approach to be moderately crazy,
even without the rapid accumulation of error that intra prediction
would produce within a single frame (which is why H.264 was forced to
go the bit-exact route), but that is just my personal opinion.
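
As a rough illustration of the drift concern, here is a minimal Python
sketch. It is not taken from Theora or from any MPEG specification: the
function names, the 3/4 scaling, and the input data are all made up for
the example. It compares a decoder that uses the same integer inverse as
the encoder with one whose inverse is merely "within tolerance" (off by
at most 1 per sample), under unity-gain prediction from the previous
reconstruction.

```python
def inverse_exact(coeff: int) -> int:
    """Hypothetical normative inverse: scale by 3/4, rounding half up."""
    return (3 * coeff + 2) >> 2


def inverse_tolerant(coeff: int) -> int:
    """Hypothetical 'within tolerance' inverse: same scaling, but truncating.

    For non-negative input it differs from inverse_exact by at most 1.
    """
    return (3 * coeff) >> 2


def decode(residuals, inverse):
    """Unity-gain prediction: predict each sample from the previous
    reconstruction and add the inverse-transformed residual."""
    recon = 0
    out = []
    for r in residuals:
        recon = recon + inverse(r)
        out.append(recon)
    return out


# Made-up, deliberately worst-case input: every residual hits the
# rounding difference between the two inverses.
residuals = [2] * 100

encoder_view = decode(residuals, inverse_exact)      # what the encoder assumes
exact_decoder = decode(residuals, inverse_exact)     # bit-exact decoder
tolerant_decoder = decode(residuals, inverse_tolerant)

drift = [abs(a - b) for a, b in zip(encoder_view, tolerant_decoder)]

print("bit-exact decoder matches encoder:", exact_decoder == encoder_view)
print("drift after first sample:", drift[0])   # 1
print("drift after last sample:", drift[-1])   # 100
```

Each individual mismatch is within a tolerance of 1, but because the
prediction has unity gain nothing ever attenuates it, so the error grows
with every predicted sample until something like an intra refresh resets
it. The bit-exact decoder stays identical to the encoder's view.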

> But if test implementations are separate from the draft then I don't really
> see that it merits any real discussion. People can create whatever they

Even if the code is non-normative, that doesn't mean we can't include it 
in the draft. Some in the IETF seemed to strongly support the idea of 
providing reference code in the RFC [2], even if there were objections 
to making it normative.

As for speed, this is a trade-off, like anything else. There are 
definite benefits to having a fast implementation, but this requires 
more work than it does for audio. For Opus, the entire codec can almost 
fit in L1 cache, and the reference includes no SIMD assembly. For video, 
cache considerations are extremely important (and complicate the code), 
and SIMD assembly is a basic necessity for adequate speed on modern
general-purpose CPUs. That assembly is also time-consuming to develop,
and has compiler/toolchain dependency issues we did not have for Opus. Time
spent working on those things is time that isn't being spent improving 
basic coding performance, testing reliability, or a host of other 
things. Particular individuals may be much more effective at some of 
those things than others, and slow code may make the other things more 
difficult, but I don't believe that the best approach is to have no 
attempt at optimization, nor is it to invest all the effort needed to 
make a commercial-grade optimized implementation. Though we will do the 
latter, and release it as open-source (because that is what we do), I'm 
not sure that needs to be gating for standardization.

[1] We were, of course, not the first to use bit-exact integer 
transforms. That dates back to at _least_ W.K. Pratt, J. Kane, and H.C. 
Andrews, "Hadamard transform image coding," in Proc. IEEE 57(1):58--68, 
Jan 1969.

[2] https://www.ietf.org/ibin/c5i?mid=6&rid=49&gid=0&k1=933&k2=64944&tid=1349448928