Re: [rtcweb] Comments on H.264 and VP8 performance comparisons

Thanks for the details, Bo!

We had the discussion about locking QP at MPEG too. The H.264 reference 
model is usually run with a fixed QP per frame type: the QP for I frames 
differs from that for P frames, which in turn differs from that for B 
frames, if used. When encoding to a specific bitrate, they change the QP 
once per run so that the data rate comes out right. That is fine for 
comparing two minor versions of a codec, but it doesn't seem equally 
obvious when comparing two codecs that have differing histories and are 
expected to behave differently on a large number of points.

The VP8 codec was designed and tested in a real-world environment where 
rate controls have to be used, and where we can't make repeated passes 
over the clips to find the best QP values to use for a given clip. This 
colors the way it behaves; there's no reason to believe it would be 
optimal in a fixed-QP environment.
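
To make the "change the QP once per run until the data rate comes out 
right" procedure concrete, here is a minimal sketch of the search it 
implies. The toy_bitrate_kbps model (bitrate halving for every 6 QP 
steps) and every number in it are illustrative assumptions, not 
measurements from either codec; a real run would replace it with an 
actual fixed-QP encode of the clip.

```python
def toy_bitrate_kbps(qp, rate_at_qp0=20000.0):
    """Hypothetical stand-in for one fixed-QP encode of a clip.

    Assumes bitrate halves for every 6 QP steps: a rough rule of thumb,
    not a measurement.
    """
    return rate_at_qp0 * 2.0 ** (-qp / 6.0)


def lowest_qp_within_budget(target_kbps, encode=toy_bitrate_kbps,
                            qp_min=0, qp_max=51):
    """Bisect for the lowest QP whose bitrate fits within target_kbps.

    Relies on bitrate decreasing as QP increases. Each probe stands for
    a full pass over the clip. If nothing fits, qp_max is returned.
    Returns (qp, number_of_passes).
    """
    passes = 0
    lo, hi = qp_min, qp_max
    while lo < hi:
        mid = (lo + hi) // 2
        passes += 1
        if encode(mid) <= target_kbps:
            hi = mid        # fits the budget; try a lower QP
        else:
            lo = mid + 1    # too many bits; QP must go up
    return lo, passes


qp, passes = lowest_qp_within_budget(500.0)
print(qp, passes, round(toy_bitrate_kbps(qp), 1))
```

Every probe in the loop is a complete encode of the clip, which is 
exactly the kind of repeated pass a live encoder cannot make.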

When we debated this at MPEG, the MPEG conclusion was that they accepted 
that we would encode the VP8 clips with our normal rate control despite 
the strenuous objections of some members - I don't want to second-guess 
MPEG's decisions in this forum.

I'll look at the changes you're proposing. We did do one set of tweaks 
"as suggested online" to the H.264 parameters; the diff (from the public 
repo of our testing code) is:

Date:   Tue Mar 19 10:22:28 2013 -0700

     Used settings suggested by x264 devs and updated vp8 settings too

     We didn't go down to the vp8 equivalent of very_slow for quality tests.
     Now we do also adjusted the x264 settings to match what was suggested on
     line.

     Change-Id: I2270d156cbe5893ddb7d37633f5fe61f93fae091

diff --git a/run_h264_tests.sh b/run_h264_tests.sh
index a8a1db0..6d75136 100755
--- a/run_h264_tests.sh
+++ b/run_h264_tests.sh
@@ -50,9 +50,11 @@ do
    for (( rate=rate_start; rate<=rate_end; rate+=rate_step ))
    do
      # Encode into ./<clip_name>_<width>_<height>_<frame_rate>_<rate>kbps.yuv
-    x264 --vbv-bufsize ${rate} --bitrate ${rate} --fps ${frame_rate} \
+    x264 --nal-hrd cbr --vbv-maxrate ${rate} --vbv-bufsize ${rate} \
+      --vbv-init 0.8 --bitrate ${rate} --fps ${frame_rate} \
        --profile baseline --no-scenecut --keyint infinite --preset veryslow \
        --input-res ${width}x${height} \
+      --tune psnr \
        -o ./encoded_clips/h264/${clip_stem}_${rate}kbps.mkv ${filename} \
        2> ./logs/h264/${clip_stem}_${rate}kbps.txt

We don't seem to have made any changes since then; I'll experiment with 
the settings you're suggesting.

FWIW, when I run the head of our repository (which is locked to a 
specific software rev for each codec; I'll play with those too), the 
summary report says that the difference is 71%. Feel free to check out 
the latest version!

On 10/14/2013 11:12 PM, Bo Burman wrote:
> Hi all,
>
> We would like to counter Google's suggestion that our test has only "demonstrated that it is possible to reduce VP8's performance" (updated draft on VP8 http://datatracker.ietf.org/doc/draft-alvestrand-rtcweb-vp8/).
>
> In fact, what we did in our test was mostly undoing some very peculiar x264 settings made by Google in their test from April 3. By instead using the x264 settings Google themselves proposed in their earlier test (from March 12), and removing threading, the difference went down from 41% to 16%. (This is without touching the VP8 parameters.)
>
> The last change we made was to remove the rate control from the comparison, something that is standard practice in the world of video standardization. This involved changing both the x264 and VP8 parameters. After that, the difference went down to -1%.
>
> In summary, the following steps were taken in our comparison:
>
> 1) Downloading the latest software: 41% became 36%
> 2) Removing threading: 26%
> 3) Removing bit padding: 18%
> 4) Removing other differences between Google's March 12 and April 3rd tests: 16%
> 5) Removing rate controller: -1%
>
> Contrary to Google's note on our test, the purpose was not to "reduce the VP8 performance" but rather to present a technically correct codec comparison. Below follows a detailed description of what we found:
>
> ---
>
> When we first saw Google's test which was posted on April 3rd, we were surprised to find that their results differed so much from our own. Whereas we got a negligible difference between VP8 and H.264 constrained baseline, Google reported that H.264 constrained baseline needed 41% more bits than VP8 for the same quality. This made us look deeper into Google's test to see how this could be explained.
>
> The first thing we did was to download the latest versions of all software packages. By using the newest version of x264, the difference went down from 41% to 36%.
>
> The second thing that caught our attention was that Google's x264 test used auto threading. By omitting the parameter "--threads 1", the x264 encoder defaults to "--threads auto", which means that a large number of threads will be used to compress the image (see for instance http://mewiki.project357.com/wiki/X264_Settings). This will have the effect of decreasing the compression time at the cost of quality. The VP8 codec, on the other hand, defaults to a single thread with no quality degradation. Even when we changed the x264 configuration to use the proper threading value ("--threads 1"), the x264 codec was twice as fast as VP8, and the difference in bit rate went down from 36% to 26%.
>
> At this time we proceeded to only test the first 10 seconds of each sequence in order to get reasonable running times. This actually increased the difference again to 29%.
>
> Our attention now turned to the "--nal-hrd cbr" parameter in Google's x264 April 3rd parameter list. As described at http://mewiki.project357.com/wiki/X264_Settings#nal-hrd, this setting will pack the bitstream with padding bits in order to exactly reach a particular bit rate. This is useful in some circumstances such as in Blu-ray or ISDN video telephony which has to be exactly, say, 64000 bits per second, but it is undesirable in a codec comparison since it will only add bits and not increase quality. The VP8 encoder does not do such bit padding and was thus at an advantage. Removing the "--nal-hrd cbr" parameter for x264 avoided the unnecessary bit padding and the difference now went down from 29% to 18%.
>
> At this point in time we tried removing the remaining variables that differed between the x264 parameter set of Google's first test (from March 12) and the x264 parameter set of Google's second test (from April 3rd), and this resulted in a further decrease from 18% to 16%.
>
> Finally we removed the rate controller from the test and instead used fixed QP. As we have argued previously, having a rate controller in the loop just adds noise to the test and also risks measuring the performance of the rate controller rather than the codec. Using fixed QP (no rate control) is therefore established practice in the video codec community. As an example, the Moving Picture Experts Group (MPEG) recommends using fixed QP for video comparisons, see for instance (http://mpeg.chiariglione.org/standards/exploration/internet-video-coding/call-proposals-internet-video-coding-technology). The reworked test without rate control (which we published on June 22) then got the result of -1%. This means H.264 constrained baseline (in the x264 implementation) outperformed VP8. We also found H.264 constrained high, again using the x264 implementation, to be 24% better than VP8.
>
> Google's comparison on April 3rd also included a speed test. In this, x264 is set to use only one thread: the parameter "--threads 1" is set. This means that x264 cannot enjoy the faster speed of a parallel implementation. We do not quite understand this choice of parameters from Google: When threading should be avoided (in the case of measuring quality), it is used, whereas when it would be helpful (in the case of measuring speed), it is avoided. In both cases this is unfavorable to x264. This does not seem entirely fair.
>
> Note that the biggest differences in performance were not due to the changes made to the VP8 settings but to those of x264: The threading and the bit padding alone accounted for 21 of the 41 percentage points. Thus, we do not think that our test is about trying to "reduce the VP8 performance". Instead, the main points have been to avoid what we believe are unfortunate parameter choices for x264 on Google's part, and to create a technically correct test.
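
For reference, the percentage-point arithmetic behind the paragraph 
above can be sketched as follows. The figures are the ones given in the 
quoted message, including the switch to 10-second clips that moved the 
gap from 26% to 29%; the step labels and variable names are illustrative 
only.

```python
# (step, gap before, gap after): percent extra bits needed by x264
# relative to VP8, as reported in the quoted message.
steps = [
    ("latest software", 41, 36),
    ("threads 1", 36, 26),
    ("first 10 seconds only", 26, 29),
    ("no nal-hrd cbr padding", 29, 18),
    ("other March 12 vs April 3 differences", 18, 16),
    ("fixed QP instead of rate control", 16, -1),
]

# Percentage points removed by each step (negative means the gap grew).
reductions = {name: before - after for name, before, after in steps}

# Threading plus bit padding, the two steps singled out above.
x264_only = reductions["threads 1"] + reductions["no nal-hrd cbr padding"]

for name, points in reductions.items():
    print(f"{name:40s} {points:+d}")
print("threading + padding:", x264_only)
print("total change:", sum(reductions.values()))
```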
>
> Best Regards,
>
> Bo Burman