Re: [codec] draft-ietf-codec-oggopus: R128_TRACK_GAIN units

"Timothy B. Terriberry" <> Mon, 08 September 2014 20:11 UTC

As an individual...

Mark Harris wrote:
> output gain field of the Ogg Opus header.  Is it expected that these
> units would be adopted by other codecs?

I can't speak for anyone else, but if _I_ were making a new codec, I 
wouldn't see a need to innovate here. This design has the lessons of 
doing this for 15+ years baked into it.

> concerned that if Ogg Opus uses 1/256 LU units, and another format or
> codec uses the much more obvious LU (dB) units, then someone may be in

Then they should pick a different tag name.

> Because there is an existing nearly universally used standard unit, it
> would seem to be much better to use that unit in tag values,

This was tried for ReplayGain tags. 1 dB units are far too coarse, and 
once you add a decimal point, people almost universally implement it 
incorrectly (due to various issues like atof and strtod having 
locale-dependent semantics, locale changes not being thread-safe in C, 
etc.). The choice here was a conscious one designed to avoid the 
failures of past mistakes.
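
As a rough illustration of why an integer tag value sidesteps the locale problem, here is a minimal sketch of a parser that looks only at ASCII signs and digits. The function name parse_r128_gain_q8 is hypothetical (it is not part of libopus or libopusfile), and the accepted range of a signed 16-bit value in 1/256 dB units is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not from any real library: parse a gain tag value
   written as an ASCII decimal integer in 1/256 dB units. Only '+', '-'
   and '0'..'9' are examined, so the result cannot depend on the C locale
   (there is no decimal separator to misinterpret). */
static bool parse_r128_gain_q8(const char *s, int *gain_q8)
{
    int sign = 1;
    long value = 0;
    size_t digits = 0;

    if (s == NULL || gain_q8 == NULL) return false;
    if (*s == '+' || *s == '-') {
        if (*s == '-') sign = -1;
        s++;
    }
    for (; *s >= '0' && *s <= '9'; s++, digits++) {
        value = value * 10 + (*s - '0');
        /* Assumed limit: signed 16-bit. Rejecting early also keeps the
           accumulator from overflowing on very long inputs. */
        if (value > 32768) return false;
    }
    /* Require at least one digit and no trailing characters ("1.5", "1,5"). */
    if (digits == 0 || *s != '\0') return false;
    value *= sign;
    if (value > 32767) return false;
    *gain_q8 = (int)value;
    return true;
}
```

With this sketch, "-1280" parses as -1280 (that is, -5 dB), while "1.5" and "1,5" are both rejected outright instead of being silently read as 1 in one locale and 1.5 in another.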

It's also got two years of deployment at this point, so there are now 
non-trivial costs in changing it. If we had a good reason, we could 
change it. But "reduced confusion" is not a good reason. Changing now 
will only increase confusion (another lesson I've learned from letting 
people talk me into such changes in the past): people will be required 
to support both formats forever, despite the fact that one would be 
"standard" and the other wouldn't be.

> containers, and reference level standards, it would seem more sensible
> to put the actual loudness measurement in the tag, and let the player
> adjust it as needed to match the target loudness that it is using for

In a system with infinite dynamic range, sure. In a system without that 
(like, say, the fixed-point decoder provided in the reference 
implementation), starting from a 0 LUFS reference means naive 
implementers will introduce substantial clipping distortion, and then 
maybe later in the pipeline they will turn it down into quieter 
distortion. A good 
implementation will adjust downward during decoding, before clipping 
occurs. Baking a good default adjustment into the reference (as _all_ 
loudness measurements have some reference... there's no concept of a 
loudness measurement that doesn't) makes the naive implementation into a 
good implementation. Baking in anything other than the R128 recommended 
reference level would, as you like to say, cause confusion.
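
The ordering problem can be sketched in a few lines. This is a toy model under stated assumptions, not code from the reference decoder: the helper names are hypothetical, gains are plain linear factors, and the "pipeline" is just two multiplications on a single 16-bit sample.

```c
#include <stdint.h>

/* Scale a 16-bit sample by a linear gain, round to nearest, and saturate
   to the 16-bit range, as a fixed-point output stage has to do. */
static int16_t apply_gain_sat16(int16_t x, float gain)
{
    float v = (float)x * gain;
    v += (v >= 0.0f) ? 0.5f : -0.5f;
    if (v >= 32767.0f) return 32767;
    if (v <= -32768.0f) return -32768;
    return (int16_t)v;
}

/* Naive pipeline: boost toward a very loud reference first (clipping in
   16 bits), then attenuate to the real target in a later stage. */
static int16_t two_stage_gain(int16_t x, float up, float down)
{
    return apply_gain_sat16(apply_gain_sat16(x, up), down);
}

/* Better pipeline: fold both adjustments into one gain so the
   intermediate value never has to fit in 16 bits. */
static int16_t one_stage_gain(int16_t x, float up, float down)
{
    return apply_gain_sat16(x, up * down);
}
```

For example, with up = 8.0f and down = 0.125f, the two-stage path maps both 10000 and 20000 to 4096, because both saturate at 32767 before being turned down, whereas the one-stage path returns them unchanged. That flattening is the "quieter distortion" described above.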

> all audio sources.  In fact EBU R128 specifically recommends that
> "Loudness Metadata shall correctly indicate the actual Programme
> Loudness", not a gain adjustment.

And that is what the current values already represent. If it makes you 
feel better to think of the offset of 23*256 as a detail of the encoding 
in this file format, then go ahead.
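
To make the arithmetic concrete, here is a small sketch of converting between the stored gain and a loudness value, taking the R128 reference level to be -23 LUFS. The function and macro names are hypothetical, everything stays in 1/256 dB (Q8) integer units, and the sketch ignores any interaction with the header output gain.

```c
/* R128 reference level, -23 LUFS, expressed in 1/256 LU units. */
#define R128_REFERENCE_Q8 (-23 * 256)

/* The tag stores the gain needed to reach the reference level, so the
   measured loudness is the reference minus that gain. */
static int r128_gain_to_loudness_q8(int gain_q8)
{
    return R128_REFERENCE_Q8 - gain_q8;
}

/* A player that prefers a different target simply adds the difference
   between its target and the reference to the stored gain. */
static int r128_gain_for_target_q8(int gain_q8, int target_q8)
{
    return gain_q8 + (target_q8 - R128_REFERENCE_Q8);
}
```

For instance, a stored gain of -1280 (-5 dB) corresponds to a measured loudness of -18 LUFS, and a player targeting -16 LUFS would apply 512 (+2 dB) instead.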

> In light of these issues, would it not be better to record a
> TRACK_LOUDNESS tag in LUFS and an ALBUM_LOUDNESS tag (which would have
> the same value for all tracks that are part of the album) also in

I think it would be worse, for all of the reasons stated above.

> In order to allow a player to adjust the loudness to any desired
> target loudness and dynamic range, ideally the loudness range and
> maximum true peak level would also optionally be available in a

I can't remember if it's been said on the list before, but "true peak" 
is a meaningless concept for a lossy format with a non-bit-exact decoder 
specification, like Opus. The variation in the true peak calculation 
depending on your decoder implementation is theoretically unbounded. So 
you're guaranteed the value going into this tag is garbage.

When we first drafted this spec, we also conducted a survey of all 
open-source software we could find that supported ReplayGain tags. 
Exactly 0 programs did anything sensible with the peak tags. The most 
sensible thing anyone did was ignore them, which fortunately was the 
majority of them. By leaving them out entirely, we're making it easy for 
people to do the right thing.

The reference implementation includes a 0-delay declipping function, 
opus_pcm_soft_clip(). It's run by default when decoding to 16-bit 
integer output. You can call it yourself if you decode to float and 
convert to 16-bit later (as libopusfile does by default). If the signal 
doesn't clip, it has zero impact on the decoded output.

It's an excellent choice for those who want to handle clipping without 
adding any more complexity to their decoding pipeline. Maybe it's 
possible to do better by adaptively enabling DRC and using a 
sophisticated reconstruction declipper like the one Monty wrote for 
Postfish, but anyone who can set that up properly would have been one of 
the smart ones who ignored the peak tags to begin with.