Re: [codec] Suggestions for OggOpus draft

Ralph Giles <giles@thaumas.net> Mon, 24 September 2012 18:03 UTC

Message-ID: <5060A056.4000307@thaumas.net>
Date: Mon, 24 Sep 2012 11:03:02 -0700
From: Ralph Giles <giles@thaumas.net>
To: Jean-Marc Valin <jmvalin@jmvalin.ca>
References: <50609935.3080900@jmvalin.ca>
In-Reply-To: <50609935.3080900@jmvalin.ca>
Cc: "codec@ietf.org" <codec@ietf.org>
Subject: Re: [codec] Suggestions for OggOpus draft

On 12-09-24 10:32 AM, Jean-Marc Valin wrote:

> I suggest adding these two sections to the OggOpus draft, one that
> describes how to handle chaining and one that has recommendations for
> encoders. I'm sure there's still more things an encoder can do that I
> haven't listed.

Thanks for writing this up.  A couple of questions:

By 'linear prediction' do you mean the LPC-based lapping the opusenc
utility already uses on the final packet?

Is the point of having the encoder supply predicted audio only to help
decoders that don't cross-fade, or does it also help preserve fidelity
across the boundary even when the decoder does something to mitigate
the discontinuity?
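
To make the decoder-side case concrete, here is a minimal sketch of a
linear cross-fade over a link boundary. It assumes the decoder has the
same number of overlapping samples from each side (for instance, audio
decoded past the end of the previous link and the start of the next
one). The function name and the linear weighting are illustrative
assumptions, not something the draft specifies.

```python
def crossfade(prev_tail, next_head):
    """Linearly cross-fade two equal-length blocks of samples.

    prev_tail: samples continuing the outgoing link (hypothetical input)
    next_head: samples starting the incoming link (hypothetical input)
    """
    if len(prev_tail) != len(next_head):
        raise ValueError("blocks must have the same length")
    n = len(prev_tail)
    out = []
    for i, (a, b) in enumerate(zip(prev_tail, next_head)):
        # Weight of the incoming link; stays strictly between 0 and 1.
        w = (i + 1) / (n + 1)
        out.append((1.0 - w) * a + w * b)
    return out
```

A decoder without any overlapping audio has nothing to feed such a
fade, which is where encoder-supplied predicted samples could matter.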

I guess the issue is that if the encoder is processing a section of a
larger stream, with the intention that the output will generally be
concatenated, like tracks within an album with continuous audio, it
should pad with the actual earlier and later samples. Should prediction
therefore be used to smooth the transition only if adjacent audio isn't
available? Or does that decision depend on the quality of the adjacent
audio? E.g. a large transient would disturb the transition when the next
segment *isn't* the expected one, as in shuffle play or a custom
playlist.
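
As a rough sketch of the choice I'm describing: use the real adjacent
samples when the encoder has them, and fall back to linear-prediction
extrapolation otherwise. This is not the opusenc implementation; the
function names, the default order of 16, and the use of the
autocorrelation method with Levinson-Durbin are all assumptions made
for illustration.

```python
def lpc_coefficients(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns a list a where a[j] multiplies x[n - 1 - j].
    """
    r = [sum(x[n] * x[n - k] for n in range(k, len(x)))
         for k in range(order + 1)]
    if r[0] == 0.0:
        return [0.0] * order  # silent input: predict silence
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
        if err <= 0.0:
            break  # signal is already perfectly predicted
    return a


def extrapolate(history, n, order):
    """Predict n samples following history with an LPC model."""
    if len(history) <= order:
        raise ValueError("need more history samples than the LPC order")
    a = lpc_coefficients(history, order)
    buf = list(history)
    for _ in range(n):
        buf.append(sum(a[j] * buf[-1 - j] for j in range(order)))
    return buf[len(history):]


def padding_samples(history, n, adjacent=None, order=16):
    """Prefer real adjacent audio; otherwise extrapolate from history."""
    if adjacent is not None and len(adjacent) >= n:
        return list(adjacent[:n])
    return extrapolate(history, n, order)
```

The autocorrelation method tends to give an extension that decays
rather than one that holds its level, so the predicted padding fades
out on its own. Whether a transient in the real adjacent audio should
override the first branch is exactly the open question.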

These don't seem like practices decoders can rely on, so they need to
handle discontinuities regardless. I think it is useful to give some
suggestions in the draft, though perhaps we can reduce the emphasis a
bit.

 -r