Re: [rtcweb] New VP8 vs H.264 tests uploaded

"Mo Zanaty (mzanaty)" <mzanaty@cisco.com> Thu, 04 April 2013 23:28 UTC

X-Files: x264cbr.png : 315058
From: "Mo Zanaty (mzanaty)" <mzanaty@cisco.com>
To: Leon Geyser <lgeyser@gmail.com>, "Thomas Davies (thdavies)" <thdavies@cisco.com>, Adrian Grange <agrange@google.com>, "Cullen Jennings (fluffy)" <fluffy@cisco.com>, "Harald Alvestrand (harald@alvestrand.no)" <harald@alvestrand.no>
Thread-Topic: [rtcweb] New VP8 vs H.264 tests uploaded
Thread-Index: AQHOMVU+Hnjec1L9ZkOY54Vzu17WCpjGSHWg
Date: Thu, 04 Apr 2013 23:28:32 +0000
Message-ID: <3879D71E758A7E4AA99A35DD8D41D3D90F69B243@xmb-rcd-x14.cisco.com>
References: <CAPVCLWbajJNS-DbXS-AJjakwovBKhhpXAmBaR_LYKjCyk7UnYg@mail.gmail.com> <515D3FA1.6050305@gmail.com> <515D96A2.1000602@cisco.com> <CAGgHUiRLAmGz7H5iY_cpiiKPPN6JXo1jc2-U7TZLe6k-qETo9Q@mail.gmail.com>
In-Reply-To: <CAGgHUiRLAmGz7H5iY_cpiiKPPN6JXo1jc2-U7TZLe6k-qETo9Q@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator:
x-originating-ip: [10.150.30.39]
Content-Type: multipart/mixed; boundary="_004_3879D71E758A7E4AA99A35DD8D41D3D90F69B243xmbrcdx14ciscoc_"
MIME-Version: 1.0
Cc: "rtcweb@ietf.org" <rtcweb@ietf.org>
Subject: Re: [rtcweb] New VP8 vs H.264 tests uploaded

Realtime/low latency and constrained bitrate are obviously important for the actual implementation used. Thomas was pointing out that these factors have nothing to do with the codec technology itself, since they are purely encoder implementation optimizations. There is nothing in the VP8 or H.264 standard that uniquely provides realtime/low latency or constrained bitrate. Those are attributes of encoder implementations which are not part of the standard.

So the question was whether we care about evaluating codec technology or specific implementations. If the former, then tests should be staged in the same way codec experts evaluate codec technology/tools. If the latter, then tests should be staged using the target implementations.

I'm not aware of conferencing applications which use x264, because it was designed and optimized for transcoding (DVD rips to Blu-ray), not conferencing. Most importantly, x264's CBR mode is inappropriate for conferencing: it targets broadcast MPEG transport streams that must be strictly CBR to avoid M2TS mux overflow or underflow, and it will actually insert filler data instead of real frame data to hit the CBR rate exactly. Looking at the result which shows the worst H.264 bitrate (62% above VP8), gipsrecstat_1280_720_50_1485kbps.mkv, there is almost as much filler data as real frame data, meaning the true bitrate of the real frame data is almost half of what is reported in the results. (See attached if it makes it through.)
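
The filler accounting described above can be sketched roughly as follows. This is a minimal illustration, not the tool used to produce the attached graph. It assumes a raw Annex B H.264 elementary stream has already been extracted from the .mkv (the extraction step is not shown), and the function names are hypothetical. It relies on filler data being carried in NAL units of type 12.

```python
FILLER_NAL_TYPE = 12  # H.264 nal_unit_type for filler data


def split_nal_units(stream: bytes):
    """Yield NAL units (without start codes) from an Annex B byte stream."""
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        starts.append(pos + 3)
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    for n, begin in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(stream)
        # A NAL unit never ends in 0x00, so trailing zeros are padding or
        # the leading zero of the next 4-byte start code.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            yield nal


def count_bytes(stream: bytes):
    """Return (real_bytes, filler_bytes) summed over all NAL units."""
    real = filler = 0
    for nal in split_nal_units(stream):
        if nal[0] & 0x1F == FILLER_NAL_TYPE:
            filler += len(nal)
        else:
            real += len(nal)
    return real, filler


def effective_kbps(reported_kbps: float, real: int, filler: int) -> float:
    """Scale the reported bitrate by the share of non-filler bytes."""
    total = real + filler
    return reported_kbps * real / total if total else 0.0


# Synthetic example: one 100-byte slice NAL and one 100-byte filler NAL.
demo = (b"\x00\x00\x00\x01" + bytes([0x65]) + b"\xAA" * 99
        + b"\x00\x00\x01" + bytes([0x0C]) + b"\xFF" * 98 + b"\x80")
real, filler = count_bytes(demo)
print(real, filler, effective_kbps(1485, real, filler))
```

With equal amounts of filler and real data, as in the synthetic stream, the effective rate comes out at half the reported rate, which is the situation described for the 1485 kbps clip.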

While the results are bad, the methodology, effort and transparency are very good (if we want to compare implementations, not standards). I can rerun without the bogus fillers and post the results next week, unless someone else can do it faster. But as Thomas pointed out, the technologies themselves are comparable as far as coding tools go, so any results which show significant differences are either suspect or explained by differences in encoder implementations or settings, not the codec technology itself.

Mo


From: rtcweb-bounces@ietf.org [mailto:rtcweb-bounces@ietf.org] On Behalf Of Leon Geyser
Sent: Thursday, April 04, 2013 12:56 PM
To: Thomas Davies (thdavies)
Cc: rtcweb@ietf.org
Subject: Re: [rtcweb] New VP8 vs H.264 tests uploaded

If the purpose is to show whether vp8 is superior as a *technology* to h264 CBP, then I think the comparison should use the best settings you have (ideally with a special full-on non-real time implementation) and test against the JM reference encoder. Ideally you would use the same or similar GOP structures, number of references, prediction and QP hierarchies.
I thought WebRTC was meant for real-time communication. What would it benefit us if we test settings that won't be used or can't be used in practice?

The tests need to test the encoders at realtime/low latency and at a constrained bitrate mode like CBR. We aren't archiving videos here :)

A graph that shows the bitrate over time for each clip could be useful to make sure that no encoder spikes the bitrate too high at certain moments.
I welcome changes to the encoder settings as long as they stay realtime/low latency and constrained bitrate.
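
One way to get the data for such a bitrate-over-time graph is a sliding-window rate computed from per-frame sizes. The sketch below assumes the frame sizes in bytes have already been obtained from the encoded clip (how they are extracted is not shown), and the function names and the one-second window are illustrative choices, not part of the posted test scripts.

```python
def windowed_kbps(frame_bytes, fps, window_s=1.0):
    """Return the bitrate in kbps over a sliding window, one value per
    frame position once the window is full."""
    n = max(1, round(fps * window_s))  # frames per window
    rates = []
    running = 0
    for i, size in enumerate(frame_bytes):
        running += size
        if i >= n:
            running -= frame_bytes[i - n]  # drop the frame leaving the window
        if i >= n - 1:
            rates.append(running * 8 / (n / fps) / 1000)
    return rates


def peak_overshoot(rates, target_kbps):
    """Ratio of the highest windowed rate to the target rate."""
    return max(rates) / target_kbps


# Toy input: 2 fps, one large frame in an otherwise flat sequence.
sizes = [1000, 1000, 5000, 1000]
rates = windowed_kbps(sizes, fps=2)
print(rates, peak_overshoot(rates, target_kbps=16))
```

Plotting the returned list against time for each encoder would show whether either one exceeds its target for sustained periods.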

On 4 April 2013 17:05, Thomas Davies <thdavies@cisco.com> wrote:
Harald,

I think there are quite a few problems with the comparison you have posted.

1. Looking at the sequences there is a very major difference between the initial intra frame qualities. When I encode just one frame of sequence gipsrecomotion using the parameters in the script at 1Mb/s then the intra frame is 3 times larger with vp8 than with x264.

With video conferencing content, the quality of the initial I frame has a big impact that can last for many seconds - certainly the length of these clips. You can easily get gains by increasing the quality difference between an I frame and subsequent frames.

x264 seems to have a policy of initially undershooting the bitrate substantially and ramping up, whereas vpxenc has a different approach. During this ramp up period the quality is very much worse. I can't find a way to persuade x264 to behave differently.

This is a good illustration of why including rate control in comparisons is a bad idea.

2. Likewise, looking at the individual frame sizes, it seems vpxenc is using a quality hierarchy with a length of 8 ("hierarchical-P") where every 8th frame is about 4x bigger than the others. x264 has a constant target per frame.

Hierarchical P frames are a really good idea, and can easily get you 10-20% gain with a big separation like this, at a cost in latency. Again I don't know how to make x264 do this, but the technique is applicable to any codec and is used in the JM reference.
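
To make the frame-size pattern in point 2 concrete, the sketch below splits a bit budget over a period of 8 frames with one frame 4x larger than the rest, and compares it with a constant per-frame target. It only models the size pattern described above, not how either encoder actually assigns quantizers; the function names and the example rate and frame rate are assumptions for illustration.

```python
def hierarchical_budgets(bitrate_bps, fps, period=8, boost=4.0):
    """Per-frame bit targets for one period: the first frame gets `boost`
    times the size of each of the remaining frames."""
    bits_per_period = bitrate_bps * period / fps
    small = bits_per_period / (boost + period - 1)
    return [small * boost] + [small] * (period - 1)


def flat_budget(bitrate_bps, fps):
    """Constant per-frame bit target."""
    return bitrate_bps / fps


budgets = hierarchical_budgets(1_100_000, 25)
print(budgets[0], budgets[1], flat_budget(1_100_000, 25))
```

Both schemes spend the same number of bits per period, so the average bitrate is unchanged; only the distribution across frames differs.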

3. The x264 settings are a bit of a black art, but appear not to be ideal after all. I am definitely no expert but I found that when encoding gipsrecomotion at 1Mb/s:

- setting --threads 1 improves quality by a full 1dB (vpxenc seems to run single threaded by default)
- reducing the number of references from 3 to 2 (--ref 2) reduces the load very substantially at very little loss (0.2dB or so).

So with --threads 1 --ref 2, I found x264 ran more than 2x faster than vpxenc for this data point and had much better quality than before. vpxenc is still better (about 1dB), but very possibly within the range of hierarchical P coding improvements.

Incidentally, I don't think that x264 performs particularly well at these high complexity settings, at least for video conferencing, no doubt as other more practical settings have been targeted. x264 appears to have a quality ceiling that the JM does not have.

4. Another (smaller) issue is that the reported PSNR is combined luma and chroma over all frames. It's relatively easy to improve chroma PSNR at a small cost in bits, and usually it is best to ignore chroma PSNR or (possibly) give it a small weight. Also, the arithmetic mean of frame PSNRs is generally used rather than the PSNR of the whole sequence. I would very much like separate component PSNRs in tests. The figures I quote above are luma PSNR.
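
The difference between the two averaging conventions in point 4 can be sketched as follows. This is an illustration only: frames are taken to be flat lists of 8-bit luma samples of equal size, "sequence PSNR" is assumed to mean the PSNR of the MSE averaged over all frames, and the function names are hypothetical.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized sample lists."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def psnr_from_mse(m, peak=255):
    """PSNR in dB for a given MSE; infinite for identical frames."""
    return math.inf if m == 0 else 10 * math.log10(peak * peak / m)


def mean_frame_psnr(ref_frames, test_frames):
    """Arithmetic mean of the per-frame PSNR values."""
    values = [psnr_from_mse(mse(r, t)) for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)


def sequence_psnr(ref_frames, test_frames):
    """PSNR computed from the MSE averaged over the whole sequence."""
    mses = [mse(r, t) for r, t in zip(ref_frames, test_frames)]
    return psnr_from_mse(sum(mses) / len(mses))


# Two toy luma frames: one with small error, one with large error.
ref = [[0, 0, 0, 0], [0, 0, 0, 0]]
test = [[1, 1, 1, 1], [10, 10, 10, 10]]
print(mean_frame_psnr(ref, test), sequence_psnr(ref, test))
```

When frame quality varies, the mean of frame PSNRs is higher than the sequence PSNR, since the latter is dominated by the worst frames. The two numbers are therefore not interchangeable when comparing encoders with different frame-quality patterns.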

If the purpose is to show whether vp8 is superior as a *technology* to h264 CBP, then I think the comparison should use the best settings you have (ideally with a special full-on non-real time implementation) and test against the JM reference encoder. Ideally you would use the same or similar GOP structures, number of references, prediction and QP hierarchies.

Comparing different real-time implementations of different codecs trying to do high quality coding with different GOP structures and using rate control with different strategies is just a waste of time. The first two elements in the list above are alone worth a very significant amount of bit rate.

On the other hand, a quick perusal of the actual tools would suggest that vp8 and h264 CBP are likely "comparable" and that the variation between implementations of the same technology would be bigger than the variation between the technologies. If we could agree on that, then a lot of time could be saved.

best regards

Thomas




On 04/04/13 09:53, Sergio Garcia Murillo wrote:
Hi Adrian,

Could you explain how the encoding parametrization is comparable?

x264 --nal-hrd cbr --vbv-maxrate ${rate} --vbv-bufsize ${rate} \
      --vbv-init 0.8 --bitrate ${rate} --fps ${frame_rate} \
      --profile baseline --no-scenecut --keyint infinite --preset veryslow \
      --input-res ${width}x${height} \
      --tune psnr \
      -o ./encoded_clips/h264/${clip_stem}_${rate}kbps.mkv ${filename} \
      2> ./logs/h264/${clip_stem}_${rate}kbps.txt

vs:

 ./bin/vpxenc --lag-in-frames=0 --target-bitrate=${rate} --kf-min-dist=3000 \
      --kf-max-dist=3000 --cpu-used=0 --fps=${frame_rate}/1 --static-thresh=0 \
      --token-parts=1 --drop-frame=0 --end-usage=cbr --min-q=2 --max-q=56 \
      --undershoot-pct=100 --overshoot-pct=15 --buf-sz=1000 \
      --buf-initial-sz=800 --buf-optimal-sz=1000 --max-intra-rate=1200 \
      --resize-allowed=0 --drop-frame=0 --passes=1 --good --noise-sensitivity=0 \
      -w ${width} -h ${height} ${filename} --codec=vp8 \
      -o ./encoded_clips/vp8/${clip_stem}_${rate}kbps.webm \
      &>./logs/vp8/${clip_stem}_${rate}kbps.txt

Best regards
Sergio

On 03/04/2013 18:20, Adrian Grange wrote:
We have uploaded a new set of test results comparing VP8 to H.264. This latest set contains fixes for some of the problems in the previous set. We would like to extend our thanks to those who made suggestions as to how we could improve our methodology and encourage suggestions as to how we can make further improvements.

In these tests we run x264 with the "veryslow" preset and VP8 with the "good, speed 0" setting in an attempt to produce comparable results.

An overview of our results is available as follows:

- A Quality comparison (psnr): http://downloads.webmproject.org/ietf_tests/vp8_vs_h264_quality.html

- An Encode Speed comparison: http://downloads.webmproject.org/ietf_tests/vp8_vs_h264_speed.html

- A comparison of the aggregate time required to decode all of the clips in the test: http://downloads.webmproject.org/ietf_tests/vp8vsh264-decodetime.txt

All of our test scripts can either be downloaded from:
http://downloads.webmproject.org/ietf_tests/vp8_vs_h264.tar.xz
or checked out of our git/gerrit repository:
git clone http://git.chromium.org/webm/vpx_codec_comparison.git

The file README.txt, contained within, presents details of how to build and run the tests.

The compressed video files--the output from the quality tests--can also be downloaded:

VP8:
http://downloads.webmproject.org/ietf_tests/vp8_videos/index.html

H.264:
http://downloads.webmproject.org/ietf_tests/h264_videos/index.html

Adrian Grange







_______________________________________________
rtcweb mailing list
rtcweb@ietf.org
https://www.ietf.org/mailman/listinfo/rtcweb


