[video-codec] Benjamin Kaduk's Discuss on draft-ietf-netvc-testing-08: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 13 June 2019 05:47 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-netvc-testing@ietf.org, Matthew Miller <linuxwolf+ietf@outer-planes.net>, netvc-chairs@ietf.org, linuxwolf+ietf@outer-planes.net, video-codec@ietf.org
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Date: Wed, 12 Jun 2019 22:47:29 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/video-codec/S4HGKlm3OJLD-Nhgeo7R2p8GtLA>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-netvc-testing-08: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-netvc-testing/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I suspect I will end up balloting Abstain on this document, given how
far it is from something I could support publishing (e.g., a
freestanding clear description of test procedures), but I do think
there are some key issues that need to be resolved before publication.
Perhaps some of them stem from a misunderstanding of the intended goal
of the document -- I am reading this document as attempting to lay out
procedures that are of general utility in evaluating a codec or codecs,
but it is possible that (e.g.) it is intended as an informal summary of
some choices made in a specific operating environment to make a
specific decision.  Additional text to set the scope of the discussion
could go a long way.

Section 2

There are a lot of assertions here without any supporting evidence or
reasoning.  Why is subjective better than objective?  What if objective
metrics get much better in the future?  What if a test is important but
the interested people lack the qualifications, and the qualified people
are too busy doing other things?

Section 2.1

Why is p<0.5 an appropriate criterion?  Even where p-values are still
used in the scientific literature (a practice that is declining), the
threshold is more often 0.05, or even 0.00001 (e.g., for high-energy
physics).
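
For concreteness, a minimal sketch of the statistic I assume is
intended (a two-sided binomial sign test on viewer preferences; the
draft does not actually name a test): with 8 of 10 viewers preferring
one codec, the p-value is about 0.11, which passes a 0.5 threshold
comfortably but fails the conventional 0.05.

   # Two-sided binomial sign test for a pairwise preference tally.
   # Assumption: this is the kind of test the draft has in mind.
   from scipy.stats import binomtest

   result = binomtest(k=8, n=10, p=0.5, alternative='two-sided')
   print(result.pvalue)  # ~0.109: passes p<0.5, fails p<0.05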

Section 3

Normative C code contained outside of the RFC being published is hardly
an archival way to describe an algorithm.  There isn't even a git commit
hash listed to ensure that the referenced material doesn't change!

Section 3.5, 3.6, 3.7

I don't see how MSSSIM, CIEDE2000, VMAF, etc. are not normative
references.  If you want to use the indicated metric, you have to follow
the reference.

Section 4.2

There is a dearth of references here.  This document alone is far from
sufficient to perform these calculations.

Section 4.3

There is a dearth of references here as well.  What are libaom and
libvpx?  What is the "overlap BD-Rate" method, and where is it
specified?

Section 5.2

This mention of "[a]ll current test sets" seems to imply that this
document is part of a broader set of work.  The Introduction should make
clear what broader context this document is to be interpreted within.
(I only note this once in the Discuss portion, but noted some other
examples in the Comment section.)


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Section 1

Please give the reader a background reading list to get up to speed with
the general concepts, terminology, etc.  (E.g., I happen to know what
the "luma plane" is, but that's not the case for all consumers of the
RFC series.)

Section 2.1

It seems likely that we should note that the ordering of the algorithms
in question should be randomized (presented as left vs. right,
first vs. second, etc.).
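
Concretely, a sketch of the randomization I have in mind (illustrative
names, not from the draft):

   # Randomize which codec's output is shown left/first on each trial,
   # so subjects cannot learn a fixed A/B ordering.
   import random

   def present_pair(clip_a, clip_b):
       pair = [clip_a, clip_b]
       random.shuffle(pair)  # left vs. right, first vs. second
       return pair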

Section 2.3

   A Mean Opinion Score (MOS) viewing test is the preferred method of
   evaluating the quality.  The subjective test should be performed as
   either consecutively showing the video sequences on one screen or on
   two screens located side-by-side.  The testing procedure should

When would it be appropriate to perform the test differently?

   normally follow rules described in [BT500] and be performed with non-
   expert test subjects.  The result of the test will be (depending on

(I couldn't follow the [BT500] link to check; is this a
restricted-distribution document?)

Section 3.4

A forward reference or other expansion for BD-Rate would be helpful.

Section 3.7

   perception of video quality [VMAF].  This metric is focused on
   quality degradation due compression and rescaling.  VMAF estimates

nit: "due to"

Section 4.1

Decibel is a logarithmic scale that requires a fixed reference value in
order for numerical values to be defined (i.e., to "cancel out the
units" before the transcendental logarithmic function is applied).  I
assume this is intended to take the reference as the full-fidelity
unprocessed original signal, but it may be worth making that explicit.
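
A sketch of the reading I'm assuming (the peak sample value of the
original signal as the reference, e.g. 255 for 8-bit video):

   # PSNR in dB; 'peak' is the reference value that cancels the units
   # inside the logarithm.  Assumes 8-bit samples by default.
   import numpy as np

   def psnr(reference, distorted, peak=255.0):
       mse = np.mean((reference.astype(np.float64) -
                      distorted.astype(np.float64)) ** 2)
       return 10.0 * np.log10(peak ** 2 / mse)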

Section 4.2

Why is it necessary to mandate trapezoidal integration for the numerical
integration?  There are fairly cheap numerical methods available that
have superior performance and are well-known.
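
To illustrate, a sketch of the integration step under the usual
Bjontegaard construction (interpolate log-rate as a function of
quality for each codec, then average the difference over the
overlapping quality range; the anchor values below are made up):

   # Trapezoidal rule (as mandated) vs. Simpson's rule, one of the
   # cheap higher-order alternatives.  Data points are illustrative.
   import numpy as np
   from scipy.integrate import simpson

   quality = np.array([30.0, 34.0, 38.0, 42.0])            # e.g. PSNR
   log_rate_ref = np.log(np.array([100.0, 220.0, 470.0, 1000.0]))
   log_rate_test = np.log(np.array([90.0, 200.0, 430.0, 950.0]))

   diff = log_rate_test - log_rate_ref
   span = quality[-1] - quality[0]
   bd_trap = (np.exp(np.trapz(diff, quality) / span) - 1) * 100
   bd_simp = (np.exp(simpson(diff, x=quality) / span) - 1) * 100
   print(bd_trap, bd_simp)  # average rate difference, percent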

Section 5.2.x

How important is it to have what is effectively a directory listing in
the final RFC?

Section 5.2.2, 5.2.3

              This test set requires compiling with high bit depth
   support.

Compiling?  Compiling what?  Again, this needs to be set in the broader
context.

Section 5.3

Please expand CQP on first usage.  I don't think the broader scope in
which the "operating modes" are defined has been made clear.

Section 5.3.4, 5.3.5

   supported.  One parameter is provided to adjust bitrate, but the
   units are arbitrary.  Example configurations follow:

Example configurations *of what*?

Section 6.2

   Normally, the encoder should always be run at the slowest, highest
   quality speed setting (cpu-used=0 in the case of AV1 and VP9).
   However, in the case of computation time, both the reference and

What is "the case of computation time"?

   changed encoder can be built with some options disabled.  For AV1,
   -disable-ext_partition and -disable-ext_partition_types can be
   passed to the configure script to substantially speed up encoding,
   but the usage of these options must be reported in the test results.

Again, this is assuming some context of command-line tools that is not
clear from the document.