Re: [rtcweb] New VP8 vs H.264 tests uploaded

Harald Alvestrand <harald@alvestrand.no> Fri, 05 April 2013 07:19 UTC

Message-ID: <515E7AD6.7040001@alvestrand.no>
Date: Fri, 05 Apr 2013 09:18:46 +0200
From: Harald Alvestrand <harald@alvestrand.no>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4
MIME-Version: 1.0
To: "Mo Zanaty (mzanaty)" <mzanaty@cisco.com>
References: <CAPVCLWbajJNS-DbXS-AJjakwovBKhhpXAmBaR_LYKjCyk7UnYg@mail.gmail.com> <515D3FA1.6050305@gmail.com> <515D96A2.1000602@cisco.com> <CAGgHUiRLAmGz7H5iY_cpiiKPPN6JXo1jc2-U7TZLe6k-qETo9Q@mail.gmail.com> <3879D71E758A7E4AA99A35DD8D41D3D90F69B243@xmb-rcd-x14.cisco.com>
In-Reply-To: <3879D71E758A7E4AA99A35DD8D41D3D90F69B243@xmb-rcd-x14.cisco.com>
Content-Type: multipart/alternative; boundary="------------050806080901090103050000"
Cc: "Cullen Jennings (fluffy)" <fluffy@cisco.com>, "rtcweb@ietf.org" <rtcweb@ietf.org>
Subject: Re: [rtcweb] New VP8 vs H.264 tests uploaded

On 04/05/2013 01:28 AM, Mo Zanaty (mzanaty) wrote:
>
> Realtime/low latency and constrained bitrate are obviously important 
> for the actual implementation used. Thomas was pointing out that these 
> factors have nothing to do with the codec technology itself, since 
> they are purely encoder implementation optimizations. There is nothing 
> in the VP8 or H.264 standard that uniquely provides realtime/low 
> latency or constrained bitrate. Those are attributes of encoder 
> implementations which are not part of the standard.
>

I'd challenge that assumption.
If one technology behaves much better under constrained bitrate and/or 
low latency than the other, that might just possibly be linked to the 
technology.

As an obvious example, consider B-frames; due to their basic "predict 
from the future" nature, they will incur a latency penalty if used.
A test that depends on B-frames will therefore be an invalid test for 
real-time operation.
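
As a back-of-the-envelope illustration of the B-frame point (a sketch under stated assumptions, not a measurement of any encoder): with n consecutive B-frames between two reference frames, the first B-frame cannot be encoded until the following reference frame has been captured, which is n frame intervals later. The function name and the 30 fps figure are assumptions chosen purely for illustration.

```python
def b_frame_reorder_delay_ms(consecutive_b_frames: int, fps: float) -> float:
    """Lower bound on the capture-to-encode delay added by B-frame reordering.

    Assumption: the encoder only waits for the next reference frame to be
    captured. Encode time, network delay and decoder-side reordering are
    ignored, so real-world latency will be higher.
    """
    if consecutive_b_frames < 0 or fps <= 0:
        raise ValueError("need consecutive_b_frames >= 0 and fps > 0")
    return consecutive_b_frames * 1000.0 / fps


if __name__ == "__main__":
    # Illustrative values only: 30 fps, 0 to 3 consecutive B-frames.
    for n in (0, 1, 2, 3):
        print(n, "B-frames:", round(b_frame_reorder_delay_ms(n, 30.0), 1), "ms")
```

With no B-frames the added delay is zero, which is why a low-latency test configuration would normally avoid them.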

What we can always be sure of is that when a certain quality is 
demonstrated under certain conditions, that quality is achievable under 
those conditions. We should be careful about drawing larger conclusions 
than that.

> So the question was whether we care about evaluating codec technology 
> or specific implementations. If the former, then tests should be 
> staged in the same way codec experts evaluate codec technology/tools.
>

By "the same way codec experts evaluate codec technology/tools", do you 
mean the way MPEG does it?

Having just started working in MPEG, and with their procedures for 
quality evaluations, I can't say I'm terribly impressed by the evaluation 
methods used. I'm also very unimpressed with the openness of the process.

> If the latter, then tests should be staged using the target 
> implementations.
>
> I'm not aware of conferencing applications which use x264, because it 
> was designed and optimized for transcoding (dvd rips to blu-ray) not 
> conferencing. Most importantly, x264 cbr mode is inappropriate for 
> conferencing since it is for broadcast MPEG transport streams that 
> must be absolutely CBR to avoid M2TS-mux overflow or underflow, and it 
> will actually insert filler data instead of real frame data to hit the 
> CBR rate exactly. Looking at the results which show the worst H.264 
> bitrate (62% above VP8) in gipsrecstat_1280_720_50_1485kbps.mkv, there 
> is almost as much filler data as real frame data, meaning the true 
> bitrate of real frame data is almost half what is reported in the 
> results. (See attached if it makes it through.)
>
It made it through, but I can't make much of it. It certainly sounds like 
CBR is a setting to avoid.
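
The filler-data argument above comes down to simple arithmetic, sketched here under stated assumptions: if a stream's reported bitrate counts filler bytes, the bitrate of the real frame data is the reported figure scaled by the payload share of all bytes. The function name and the byte counts are hypothetical; they are not taken from the attached analysis.

```python
def payload_bitrate_kbps(reported_kbps: float,
                         payload_bytes: int,
                         filler_bytes: int) -> float:
    """Bitrate of real frame data when the reported rate includes filler.

    Assumption: reported_kbps covers payload_bytes + filler_bytes over the
    same duration, so the payload rate is a simple proportion of it.
    """
    total = payload_bytes + filler_bytes
    if payload_bytes < 0 or filler_bytes < 0 or total == 0:
        raise ValueError("byte counts must be non-negative and not both zero")
    return reported_kbps * payload_bytes / total


if __name__ == "__main__":
    # Hypothetical stream: equal amounts of payload and filler at 1000 kbit/s.
    print(payload_bitrate_kbps(1000.0, 500_000, 500_000))
```

When filler is about as large as the payload, the payload bitrate comes out at roughly half the reported value, which matches the "almost half" estimate in the quoted text.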

> While the results are bad, the methodology, effort and transparency 
> are very good (if we want to compare implementations not standards). I 
> can rerun without the bogus fillers and post the results next week, 
> unless someone else can do it faster. But as Thomas pointed out, the 
> technologies themselves are comparable as far as coding tools, so any 
> results which show significant differences are either suspect or 
> explained by differences in encoder implementations or settings not 
> the codec technology itself.
>

I've pushed some changes (my first contribution to the actual code; the 
tests themselves were done by other Googlers) that should make it a 
little easier for people running the published tests for the first time 
to get predictable results.

Looking forward to more contributions!


> Mo
>
> From: rtcweb-bounces@ietf.org On Behalf Of Leon Geyser
> Sent: Thursday, April 04, 2013 12:56 PM
> To: Thomas Davies (thdavies)
> Cc: rtcweb@ietf.org
> Subject: Re: [rtcweb] New VP8 vs H.264 tests uploaded
>
>     If the purpose is to show whether vp8 is superior as a
>     *technology* to h264 CBP, then I think the comparison should use
>     the best settings you have (ideally with a special full-on
>     non-real time implementation) and test against the JM reference
>     encoder. Ideally you would use the same or similar GOP structures,
>     number of references, prediction and QP hierarchies.
>
> I thought WebRTC was meant for real-time communication. What would it 
> benefit us if we test settings that won't be used or can't be used in 
> practice?
>
> The tests need to test the encoders at realtime/low latency and at a 
> constrained bitrate mode like CBR. We aren't archiving videos here :)
>
> A graph that shows the bitrate over time for each clip could be 
> useful to make sure that no encoder spikes the bitrate too high at 
> certain moments.
> I welcome changes to the encoder settings as long as they stay 
> realtime/low latency and constrained bitrate.
>
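
The bitrate-over-time check suggested above could be sketched as follows. This is a minimal illustration, assuming per-frame sizes in bytes have already been extracted from each encoded clip by some other tool; the function names and the 15% tolerance are assumptions, not part of the published test scripts.

```python
def per_second_kbps(frame_sizes_bytes, fps):
    """Group frame sizes into one-second windows and return kbit/s per window.

    A short final window is scaled by its actual duration.
    """
    if fps <= 0:
        raise ValueError("fps must be a positive integer")
    rates = []
    for start in range(0, len(frame_sizes_bytes), fps):
        window = frame_sizes_bytes[start:start + fps]
        bits = sum(window) * 8
        seconds = len(window) / fps
        rates.append(bits / seconds / 1000)
    return rates


def spikes(rates_kbps, target_kbps, tolerance=0.15):
    """Return indices of windows exceeding the target by more than tolerance."""
    limit = target_kbps * (1 + tolerance)
    return [i for i, rate in enumerate(rates_kbps) if rate > limit]


if __name__ == "__main__":
    # Hypothetical clip: one quiet second followed by one busy second at 30 fps.
    sizes = [1000] * 30 + [2000] * 30
    rates = per_second_kbps(sizes, 30)
    print(rates, spikes(rates, target_kbps=300.0))
```

Plotting the returned series per clip would give the graph described in the quoted text; the `spikes` helper only flags windows that would stand out on such a graph.
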
> On 4 April 2013 17:05, Thomas Davies <thdavies@cisco.com> wrote:
>
> Harald,
>
> I think there are quite a few problems with the comparison you have 
> posted.
>
> 1. Looking at the sequences there is a very major difference between 
> the initial intra frame qualities. When I encode just one frame of 
> sequence gipsrecomotion using the parameters in the script at 1Mb/s 
> then the intra frame is 3 times larger with vp8 than with x264.
>
> With video conferencing content, the quality of the initial I frame 
> has a big impact that can last for many seconds - certainly the length 
> of these clips. You can easily get gains by increasing the quality 
> difference between an I frame and subsequent frames.
>
> x264 seems to have a policy of initially undershooting the bitrate 
> substantially and ramping up, whereas vpxenc has a different approach. 
> During this ramp up period the quality is very much worse. I can't 
> find a way to persuade x264 to behave differently.
>
> This is a good illustration of why including rate control in 
> comparisons is a bad idea.
>
> 2. Likewise, looking at the individual frame sizes, it seems vpxenc is 
> using a quality hierarchy with a length of 8 ("hierarchical-P") 
> where every 8th frame is about 4x bigger than the others. x264 has a 
> constant target per frame.
>
> Hierarchical P frames are a really good idea, and can easily get you 
> 10-20% gain with a big separation like this, at a cost in latency. 
> Again I don't know how to make x264 do this, but the technique is 
> applicable to any codec and is used in the JM reference.
>
> 3. The x264 settings are a bit of a black art, but appear not to be 
> ideal after all. I am definitely no expert but I found that when 
> encoding gipsrecomotion at 1Mb/s:
>
> - setting --threads 1 improves quality by a full 1dB (vpxenc seems to 
> run single threaded by default)
> - reducing the number of references from 3 to 2 (--ref 2) reduces the 
> load very substantially at very little loss (0.2dB or so).
>
> So with --threads 1 --ref 2, I found x264 ran more than 2x faster than 
> vpxenc for this data point and had much better quality than before. 
> vpxenc is still better (about 1dB), but very possibly within the range 
> of hierarchical P coding improvements.
>
> Incidentally, I don't think that x264 performs particularly well at 
> these high complexity settings, at least for video conferencing, no 
> doubt as other more practical settings have been targeted. x264 
> appears to have a quality ceiling that the JM does not have.
>
> 4. Another (smaller) issue is that the reported PSNR is combined luma 
> and chroma over all frames. It's relatively easy to improve chroma 
> PSNR at a small cost in bits, and usually it is best to ignore chroma 
> PSNR or (possibly) give it a small weight. The arithmetic mean of 
> frame PSNRs is generally used rather than the PSNR of the whole 
> sequence, also. I would very much like separate component PSNRs in 
> tests. The figures I quote above are luma PSNR.
>
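
The distinction in point 4 between the arithmetic mean of frame PSNRs and the PSNR of the whole sequence can be made concrete with a small sketch. It assumes 8-bit luma samples and equally sized frames, each given as a flat list of integers; all function names are illustrative and the sample values are made up.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized lists of samples."""
    if len(ref) != len(test) or not ref:
        raise ValueError("frames must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def psnr_from_mse(error, peak=255):
    """PSNR in dB for a given MSE; identical frames give infinity."""
    if error == 0:
        return float("inf")
    return 10 * math.log10(peak * peak / error)


def mean_frame_psnr(ref_frames, test_frames):
    """Arithmetic mean of per-frame PSNR values."""
    values = [psnr_from_mse(mse(r, t)) for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)


def sequence_psnr(ref_frames, test_frames):
    """PSNR of the MSE averaged over all frames (frames of equal size)."""
    errors = [mse(r, t) for r, t in zip(ref_frames, test_frames)]
    return psnr_from_mse(sum(errors) / len(errors))


if __name__ == "__main__":
    # Made-up luma data: one nearly perfect frame and one noticeably worse one.
    ref = [[100, 100, 100, 100], [100, 100, 100, 100]]
    test = [[101, 101, 101, 101], [104, 104, 104, 104]]
    print(round(mean_frame_psnr(ref, test), 2), round(sequence_psnr(ref, test), 2))
```

The two figures differ whenever frame quality varies, and the mean of frame PSNRs is never lower than the sequence PSNR, so results are only comparable when both sides state which one they report.
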
> If the purpose is to show whether vp8 is superior as a *technology* to 
> h264 CBP, then I think the comparison should use the best settings you 
> have (ideally with a special full-on non-real time implementation) and 
> test against the JM reference encoder. Ideally you would use the same 
> or similar GOP structures, number of references, prediction and QP 
> hierarchies.
>
> Comparing different real-time implementations of different codecs 
> trying to do high quality coding with different GOP structures and 
> using rate control with different strategies is just a waste of time. 
> The first two elements in the list above are alone worth a very 
> significant amount of bit rate.
>
> On the other hand, a quick perusal of the actual tools would suggest 
> that vp8 and h264 CBP are likely "comparable" and the variation 
> between implementations of the same technology would be bigger than 
> the variation between the technologies. If we could agree on that, 
> then a lot of time could be saved.
>
> best regards
>
> Thomas
>
>
>
>
>
> On 04/04/13 09:53, Sergio Garcia Murillo wrote:
>
> Hi Adrian,
>
> Could you explain how the encoding parametrization is comparable?
>
> x264 --nal-hrd cbr --vbv-maxrate ${rate} --vbv-bufsize ${rate} \
>       --vbv-init 0.8 --bitrate ${rate} --fps ${frame_rate} \
>       --profile baseline --no-scenecut --keyint infinite \
>       --preset veryslow \
>       --input-res ${width}x${height} \
>       --tune psnr \
>       -o ./encoded_clips/h264/${clip_stem}_${rate}kbps.mkv ${filename} \
>       2> ./logs/h264/${clip_stem}_${rate}kbps.txt
>
> vs:
>
>  ./bin/vpxenc --lag-in-frames=0 --target-bitrate=${rate} \
>       --kf-min-dist=3000 --kf-max-dist=3000 --cpu-used=0 \
>       --fps=${frame_rate}/1 --static-thresh=0 \
>       --token-parts=1 --drop-frame=0 --end-usage=cbr --min-q=2 \
>       --max-q=56 \
>       --undershoot-pct=100 --overshoot-pct=15 --buf-sz=1000 \
>       --buf-initial-sz=800 --buf-optimal-sz=1000 --max-intra-rate=1200 \
>       --resize-allowed=0 --drop-frame=0 --passes=1 --good \
>       --noise-sensitivity=0 \
>       -w ${width} -h ${height} ${filename} --codec=vp8 \
>       -o ./encoded_clips/vp8/${clip_stem}_${rate}kbps.webm \
>       &>./logs/vp8/${clip_stem}_${rate}kbps.txt
>
> Best regards
> Sergio
>
> On 03/04/2013 18:20, Adrian Grange wrote:
>
>     We have uploaded a new set of test results comparing VP8 to H.264.
>     This latest set contains fixes for some of the problems in the
>     previous set. We would like to extend our thanks to those who made
>     suggestions as to how we could improve our methodology and
>     encourage suggestions as to how we can make further improvements.
>
>     In these tests we run x264 with the "veryslow" preset and VP8 with
>     the "good, speed 0" setting in an attempt to produce comparable
>     results.
>
>     An overview of our results is available as follows:
>
>     - A Quality comparison (psnr):
>     http://downloads.webmproject.org/ietf_tests/vp8_vs_h264_quality.html
>
>     - An Encode Speed comparison:
>     http://downloads.webmproject.org/ietf_tests/vp8_vs_h264_speed.html
>
>     - A comparison of the aggregate time required to decode all of the
>     clips in the test:
>     http://downloads.webmproject.org/ietf_tests/vp8vsh264-decodetime.txt
>
>     All of our test scripts can either be downloaded from:
>
>     http://downloads.webmproject.org/ietf_tests/vp8_vs_h264.tar.xz
>
>     or checked out of our git/gerrit repository:
>
>     git clone http://git.chromium.org/webm/vpx_codec_comparison.git
>
>     The file README.txt, contained within, presents details of how to
>     build and run the tests.
>
>     The compressed video files--the output from the quality tests--can
>     also be downloaded:
>
>     VP8:
>
>     http://downloads.webmproject.org/ietf_tests/vp8_videos/index.html
>
>     H.264:
>
>     http://downloads.webmproject.org/ietf_tests/h264_videos/index.html
>
>     Adrian Grange
>
>     _______________________________________________
>
>     rtcweb mailing list
>
>     rtcweb@ietf.org
>
>     https://www.ietf.org/mailman/listinfo/rtcweb
>