[video-codec] Charter issues from BoF

"Timothy B. Terriberry" <tterribe@xiph.org> Tue, 06 November 2012 16:26 UTC

I wanted to follow up on the "lessons learned" that Cullen presented  
during the BoF,  
<http://www.ietf.org/proceedings/85/slides/slides-85-videocodec-7.pdf>, which  
he said he wished had been addressed in the charter.

> 1. Don't use source code as a normative spec

I think there's broad agreement that this is a good idea this time  
around, but in the codec WG we specified this in a Guidelines document  
(RFC 6569). The proposed charter says that we'll use that document as  
the starting point for the guidelines for this effort, making changes  
as appropriate. We can certainly list this as one such change right now.

> 2. Be clear about "optimized for the internet"

This was left somewhat vague because we're still getting feedback on  
precisely what "the internet" (i.e., the IETF) thinks is important,  
but there are certainly a number of specific examples of ways we could  
improve on existing codecs:

1) Fast/flexible congestion control (e.g., resolution changes without  
keyframes, etc.)
2) Simplify the interaction of packet loss recovery with reference  
frame re-ordering (should we even use such re-ordering... if we don't,  
the interaction is _very_ simple).
3) "Fast channel switching", i.e., the ability to join broadcast  
streams without waiting for a keyframe.
4) Support for screen captures (remote services, desktop sharing,  
etc.), which may require special coding tools, non-subsampled chroma  
(a feature notably lacking from VP8), etc.

These are just some of the things that immediately spring to mind  
given the <video> tag and WebRTC use-cases. I'm sure there are many  
more. I'm planning to meet with Janardhan Iyengar to discuss the  
interaction of video and transport on Thursday during the 15:10  
session (others are welcome to join us: location currently TBD, but  
send e-mail and I'll make sure to let you know).

We can certainly list some of these examples in the charter, but I  
think it's somewhat premature to say those are the only things we're  
going to do, or even that we're going to do all of them. There was  
some criticism that "optimized for the internet" was ill-defined for  
Opus, but if you look at the actual result, you can see that it  
informed a lot of decisions, resulting in a codec that operates quite  
a bit differently from most audio codecs:  
<http://www.ietf.org/mail-archive/web/rtcweb/current/msg05205.html>.

> 3. Better plan for Liaisons with other SDOs - particularly existing Joint *

So, I agree with Stephan's comments during the BoF that trying to set  
up something like the JCTVC with _both_ the ITU and ISO would be a  
multi-year effort in and of itself, making it somewhat impractical.  
But I certainly agree the current language in the charter on this  
subject could be improved. What text would people _like_ to see here?

> 4. Sort out preferred licensing terms early.

We (Xiph.Org/Mozilla) obviously have some strong preferences here. You  
can look at our Opus disclosures to see what they are. Given the  
strong reactions against discussing such issues on the list with Opus,  
I'm hesitant to specify what those terms should be in the charter.

> 5. Be clear about targets for coding efficiency.

Speaking personally, as long as we have a significant advantage over  
existing royalty-free options (e.g., Theora and VP8), then it is  
worthwhile publishing the result. Some people think we should strive  
for much more (i.e., significantly better than HEVC), and I think  
that's great, but if we were merely "competitive" with HEVC, I  
wouldn't count this as a failure. Greg Maxwell has language to this  
effect in his requirements draft. Should there be similar language in  
the charter as well?

> 6. Decide if you are doing one codec or many.

I think we should do one codec.

> 7. Have a strategy for achieving RF.

The most important part of this strategy is already specified in the  
charter, namely that the codec be developed "Under the IPR rules of  
the IETF." I discussed that in a little bit more detail at the BoF.

> 8. Be clear if the WG is creating new technology or selecting existing
> technology.

Given the existing technology I'm aware of, I have no problem saying  
we're going to be creating something new here. That might preclude the  
possibility of the JCTVC offering the world HEVC royalty-free via the  
IETF, but that proposition seems so vanishingly unlikely that it won't  
keep me up at night.

> 9. Use signaling to have fine grained enablement of features.

Regardless of any IPR implications, this feels like a technical  
discussion. I.e., there are implications about interoperability,  
profiling, and testing here. You don't want more than 8...10 of these,  
or you add an enormous burden if you want to test all combinations  
exhaustively. I'm not saying this is a bad idea, just that there are a  
lot of details to work through (beyond the obvious "what features  
should the flags affect?"). If someone has an idea of something useful  
we can say in the charter on this subject, I'm all ears.
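
To put rough numbers on the testing burden: if each feature is an independent on/off flag, exhaustive testing has to cover 2^n configurations. The following is only a sketch under that assumption; the flag names are hypothetical and do not come from any proposal.

```python
from itertools import product


def combination_count(num_flags):
    """Number of configurations when every flag is an independent on/off switch."""
    return 2 ** num_flags


def flag_combinations(flags):
    """Enumerate every on/off assignment for the given (hypothetical) flag names."""
    return [
        dict(zip(flags, values))
        for values in product((False, True), repeat=len(flags))
    ]


# Hypothetical flag names, for illustration only.
EXAMPLE_FLAGS = ["frame_reordering", "full_res_chroma", "screen_content_tools"]
```

With 8 flags, combination_count(8) gives 256 configurations; with 10 it gives 1024, and each additional flag doubles the total.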

> 10. Have a clear idea how to get test results to inform WG decisions.

Fortunately, we're in a much better position with video than we were  
with audio. The objective metrics are more useful... everyone knows  
they're still flawed in various ways, but you can still make a lot of  
progress relying on such metrics (see the various ITU/MPEG efforts  
that rely on them exclusively). PSNR measured on a single, short clip  
comparing mostly similar algorithms actually correlates with human  
observer ratings pretty well [1]. Most comparisons between different  
codecs rely on them, too.
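
For anyone who hasn't computed it by hand, PSNR is just the mean squared error between the reference and the decoded samples, expressed in dB relative to the peak sample value. A minimal sketch, assuming 8-bit samples (peak of 255) passed as flat sequences of a single plane; real comparisons typically average over the frames of a clip, which is not shown here.

```python
import math


def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences.

    Assumes one plane (e.g., luma) of 8-bit samples unless `peak` says otherwise.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no noise at all
    return 10 * math.log10(peak ** 2 / mse)
```

An error of exactly one code value on every sample gives an MSE of 1, which works out to about 48.13 dB for 8-bit video.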

By contrast, in audio, optimizing for SNR is actively harmful, and even  
more advanced metrics like PEAQ are essentially blind to things like  
transients, which are one of the most important sources of artifacts  
for a transform codec. So if we wanted useful results, we didn't have  
a lot of options other than relying on human listening tests.

Humans are still the ultimate gold standard for video, of course. At  
least for the purposes of getting a technique adopted, I think we can  
just say the burden of proof lies on the person proposing the  
technique. If the visual improvement is large enough, even if the  
objective metrics say it looks worse, it shouldn't actually take much  
to convince people it's a good idea.

[1] http://ieeexplore.ieee.org/iel5/2220/4550681/04550695.pdf?arnumber=4550695