Re: [codec] draft-ietf-codec-oggopus and "album" gain

Ron <> Sun, 07 September 2014 20:34 UTC

Date: Mon, 8 Sep 2014 06:03:45 +0930
From: Ron <>
Message-ID: <20140907203345.GA326@hex.shelbyville.oz>
References: <20140813222201.54fe7910@crunchbang> <> <20140816040140.GA31682@hex.shelbyville.oz> <> <20140827153043.2ff5e031@crunchbang> <> <20140827212655.GW326@hex.shelbyville.oz> <20140907163126.GZ326@hex.shelbyville.oz> <20140907180607.290f13ba@crunchbang>
In-Reply-To: <20140907180607.290f13ba@crunchbang>

On Sun, Sep 07, 2014 at 06:06:07PM +0100, Ian Nartowicz wrote:
> On Mon, 8 Sep 2014 02:01:26 +0930
> Ron <> wrote:
> >On Thu, Aug 28, 2014 at 06:56:55AM +0930, Ron wrote:
> >> On Wed, Aug 27, 2014 at 08:59:16AM -0700, Ralph Giles wrote:
> >> > 
> >> > "Implementations of this specification MUST respect the 'output gain'
> >> > field, but MAY NOT respect the comments. Encoder authors are advised to
> >> > take this into account. For example, it is more robust for a
> >> > post-processing application to performing track normalization to update
> >> > the 'output gain' field and write a comment 'R128_TRACK_GAIN=0' than to
> >> > put the normalization value directly in the comment."
> >> 
> >> s/to performing/performing/ ?  Or maybe "that is performing"?
> >> 
> >> But yeah, modulo whatever still seems to be misleading Ian,
> >> that seems about right to me.
> >
> >So I believe this is now the last outstanding question from the last
> >call comments that we still have open.
> >
> >There were some concerns raised about this wording, but my questions
> >about exactly what problem Ian had with it have so far gone
> >unanswered, which makes it a bit tricky to address whatever
> >those concerns really were, and I haven't seen a proposed alternative
> >wording that covers the problem Ralph and I saw with the initial proposed
> >edit leaving it without a rationale (which this version tried to fix).
> >
> >So if someone still thinks this says something that it shouldn't, or
> >isn't properly clear about what it does say, could you please give
> >us a sufficient explanation of that so we can understand and fix the
> >problem, or propose an alternative wording that you think does, which
> >we can then discuss.
> >
> >Otherwise, this is so far the closest to what I think was originally
> >intended that we have on the table to accept.
> >
> >If we get this one out of the way now, I think we're nearly done.
> >
> >  Thanks,
> >  Ron
> Sorry, I didn't realise there were outstanding questions.  Perhaps if I give a
> concrete example, it will show where I'm coming from:
> In this context, a common use will be to encode a given (probably lossless)
> source into Opus.  The requirement here is to (optionally) be able to play back
> the Opus at the same level as the original source, even if there are R128 gain
> tags present.

I'm not sure what distinction you're making about 'level' here,
or why it's important for a general purpose player?

[ed: Ian, you might want to skip commenting on this first lot unless
 there is something you *really* disagree with.  I've left it here
 because it is important that we are on the same page about this,
 but if you make it past this to the bottom, I *think* I start to
 see what it is that you're *actually* concerned about ... ]

The R128 tags are indeed a mechanism that people can use to normalise
all of their recordings to some consistent level, so they'll all play
back at about the same volume for a given setting of the volume knob
on their player.

They are completely optional, and we don't mandate that a player must
support them (or use them even if it does), but enough people have
said they wanted that for us to define a standard way to do it if you
want to.

My understanding is that having both ALBUM_GAIN and TRACK_GAIN meets
the varied requirements that people wanted for that function.

But the output_gain has no such constraint or meaning attached to it.
It's mandatory to apply precisely because it is part of what defines
the "level of the original source" as defined by whoever mastered
that recording.

The *only* reason it exists is where there is no "lossless source",
since it's quite conceivable that people will make devices which
record directly to Opus, and no "lossless master" will ever exist.
We expect most people will never need to use it, but for the case
where you've accidentally recorded something at a ridiculous level
(whether that is loud or quiet), this gives you a way to losslessly
'fix' that, without having to destructively do a lossy to lossy
re-encoding - which would otherwise be your only option.
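
To make the "lossless fix" concrete, here is a minimal sketch in Python
of patching the output gain in an ID header packet.  It assumes the
layout in the current oggopus draft (an 'OpusHead' packet with the gain
stored as a signed 16-bit little-endian Q7.8 dB value at byte offset 16).
The function names are made up for illustration, and a real tool would
also have to rewrite the enclosing Ogg page and its checksum, which this
does not attempt.

```python
import struct

# Assumed offset: 8 (magic) + 1 (version) + 1 (channels)
#               + 2 (pre-skip) + 4 (input sample rate) = 16
OUTPUT_GAIN_OFFSET = 16


def read_output_gain_db(opus_head: bytes) -> float:
    """Return the header output gain in dB (stored as Q7.8)."""
    if opus_head[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    (q78,) = struct.unpack_from("<h", opus_head, OUTPUT_GAIN_OFFSET)
    return q78 / 256.0


def with_output_gain_db(opus_head: bytes, gain_db: float) -> bytes:
    """Return a copy of the packet with a new output gain.

    Only the two gain bytes change; the audio packets are untouched,
    which is what makes this a lossless adjustment.
    """
    if opus_head[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    q78 = round(gain_db * 256)
    if not -32768 <= q78 <= 32767:
        raise ValueError("gain does not fit in Q7.8")
    patched = bytearray(opus_head)
    struct.pack_into("<h", patched, OUTPUT_GAIN_OFFSET, q78)
    return bytes(patched)


def linear_scale(gain_db: float) -> float:
    """Factor a decoder would multiply its output samples by."""
    return 10 ** (gain_db / 20)
```

Nothing here says what level to choose; the caller picks gain_db, just
as whoever is salvaging the recording would.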

We don't say anything about what 'level' you should fix this to
if you do that.  It's entirely at the discretion of whoever goes
"damn, I totally screwed this recording up", and whatever they
decide would be a less screwed up level.

The "playback level" of that file, like any other unnormalised
file, is going to be entirely under the control of the user
adjusting their volume control.  This just lets the person who
mastered it get a file that's way out there, back in the ballpark.

There should be no reason for an 'end user' to ever need or want
to "turn that off" - since at best it will do almost nothing for
them, and at worst it could do Something Terrible.  If they want
a "different" level, they turn their volume knob.  If they want
a "standard" level, they put the file through a tool that adds
R128 tags.

This isn't a "normalisation mechanism" so much as a "tool of last
resort" for salvaging a botched recording that there is no other
good way to save.  If people can't rely on that tool working,
then it's useless to them and they'll just destructively re-encode
in the cases where they really would have needed it and could have
used it.

Does that make sense?  Because if it doesn't, we need to talk
about that more or explain it better in the draft - and if you
think it means something different to that, then we're going to
have a really hard time converging on understanding for anything
else here.

> This requirement will be met, one way or another for good or
> for bad, by me and probably by other authors, but I'd prefer
> to do it in a way that complies with the specification.

Sorry if I'm appearing to be thick here, but I still don't understand
what "This requirement" is?  You've told me "what" you want, but not
*why* that is actually needed or can't already be done with what we
have.

Your example above seems to hinge on there being some prior lossless
source, but that's not the case where the output_gain is useful.

If you have some other lossless source, then generally you can simply
re-encode the file from scratch if you somehow botched the gain on
the transcode; you don't need the output_gain knob to fix that.

> Clearly this is not possible in all cases because there is no explicit
> identification of the original source level or the total amount of gain
> attributable to the R128 tags, but so long as the spec has enough flexibility
> to allow a use case then I can work with it.  Applying normative language to
> placing one of the R128 values into the output gain and then requiring that it
> be applied in all cases did not seem sufficiently flexible.

Hmm.  So your concern is actually with the part that suggests people
MAY wish to put the base normalisation in output_gain and set R128=0?

Not actually with the fact we say you MUST always apply output_gain?

It's not a normative requirement that they do it that way (as opposed
to instead setting output_gain=0 and R128=X) -- it was really just a
clarification that the R128 tags are Optional, and that if you *do*
want the "original level" to be normalised, then using output_gain
would guarantee that "robustly".
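
As a rough illustration of why the two layouts differ in robustness,
here is a small Python sketch.  It assumes, per the draft, that the
R128 tag values are ASCII integers in Q7.8 dB and are applied on top of
the header gain; the function names and the honour_tags switch are
hypothetical, standing in for "a player that does or does not support
the tags".

```python
def parse_r128_tag_db(value: str) -> float:
    """Convert an R128_* tag value (ASCII Q7.8 integer) to dB."""
    return int(value) / 256.0


def playback_gain_db(output_gain_db: float, tags: dict,
                     honour_tags: bool, mode: str = "track") -> float:
    """Total gain a player would apply before the user's volume knob."""
    gain = output_gain_db  # the header gain is always applied
    if honour_tags:
        key = "R128_TRACK_GAIN" if mode == "track" else "R128_ALBUM_GAIN"
        if key in tags:
            gain += parse_r128_tag_db(tags[key])
    return gain


# Layout A: normalisation in the header, tag set to 0.
layout_a = (-5.0, {"R128_TRACK_GAIN": "0"})
# Layout B: header left at 0, normalisation only in the tag (-5 dB).
layout_b = (0.0, {"R128_TRACK_GAIN": "-1280"})

for name, (header_db, tags) in (("A", layout_a), ("B", layout_b)):
    print(name,
          playback_gain_db(header_db, tags, honour_tags=True),
          playback_gain_db(header_db, tags, honour_tags=False))
```

A tag-aware player ends up at -5 dB for both layouts; a player that
ignores tags plays layout A at -5 dB and layout B at 0 dB.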

But yes, I think I can see how that advice may steer people toward
creating files where a lossless master existed, and wasn't originally
normalised but the Opus encoding is, with no certain way for people
to be able to reliably "unnormalise it" again.

You might for some files be able to deduce that moving the output_gain
value to an R128 tag and setting output_gain=0 *could* do that, but
there is no way to be certain that's true for every file (purely because
the file might be of the sort I described above, but rather than pick
some random level that makes it "not ridiculous" anymore, they did
actually normalise it to R128 when they fixed it, and the original
might still be 'dangerously' useless - though you could again apply
some heuristic based on the amount of gain adjustment seen ...)
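
For what it's worth, the "move it to a tag" operation itself is simple;
a minimal Python sketch follows.  It works directly on Q7.8 integers to
avoid rounding, and the function name is hypothetical.  The big
assumption, which the file cannot confirm, is that the header gain was
*only* ever normalisation and not a fix for a botched recording.

```python
def move_output_gain_to_tag(output_gain_q78: int, tags: dict,
                            key: str = "R128_TRACK_GAIN"):
    """Fold the header gain into an R128 tag and zero the header.

    ASSUMPTION: the header gain was pure normalisation.  If it was
    salvaging a badly recorded level, zeroing it restores that level.
    Returns (new_output_gain_q78, new_tags); the input is not modified.
    """
    existing = int(tags.get(key, "0"))
    total = existing + output_gain_q78
    if not -32768 <= total <= 32767:
        raise ValueError("combined gain does not fit in Q7.8")
    new_tags = dict(tags)
    new_tags[key] = str(total)
    return 0, new_tags
```

A tag-aware player hears no difference after this, since the sum of
header gain and tag gain is unchanged; only players that ignore the
tags see the "unnormalised" level.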

That does seem to be an unintended consequence of the "For example"
advice, which, although not normative in itself, might not actually
be the preferred default for people to use.

> The original
> language of my patch left it entirely up to the encoder and subsequent tools
> where to place the R128 gain values, allowing for them to be placed entirely in
> the tags and therefore identified.  Or not, but that's at least under the
> control of the user.

My main problem with that one was it removed all the context explaining
when and why people might use this and left it too vague for there to
be some reasonable consistency in how people interpret it (since if
everyone does something different, we'll just have a complete mess,
worse than if we allowed no gain tags at all).

But yes, the intent was certainly not for there to be information loss
somewhere as a result of doing this.

So in that light, I think this part should stand:

> "Implementations of this specification MUST respect the 'output gain'
> field, but MAY NOT respect the comments.

Because it's a simple statement that reaffirms tags are always optional
and the header gain is an intrinsic part of the "original recording".

But this part:

> Encoder authors are advised to
> take this into account. For example, it is more robust for a
> post-processing application to performing track normalization to update
> the 'output gain' field and write a comment 'R128_TRACK_GAIN=0' than to
> put the normalization value directly in the comment."

Possibly needs to give some slightly different advice, at least for the
case where R128 normalisation is something that someone retrofits to
a recording that was previously mastered to some acceptable level, to
avoid actually losing useful information about the original in that case.

Does that seem like it would cover what you're worried about?

Do you have some proposed language for what you'd like us to recommend
people do if they want to retrofit R128 normalisation to recordings?