Re: [codec] I-D Action: draft-ietf-codec-oggopus-00.txt

Ron <ron@debian.org> Sat, 30 March 2013 07:28 UTC

Date: Sat, 30 Mar 2013 17:58:00 +1030
From: Ron <ron@debian.org>
To: codec@ietf.org
Message-ID: <20130330072800.GU19099@audi.shelbyville.oz>
References: <20121119225213.13225.30835.idtracker@ietfa.amsl.com> <50AABA00.6030102@xiph.org> <515609F7.7090402@jmvalin.ca>
In-Reply-To: <515609F7.7090402@jmvalin.ca>
Subject: Re: [codec] I-D Action: draft-ietf-codec-oggopus-00.txt

Hi Jean-Marc,

A few quick "first-reading" comments on this ...

On Fri, Mar 29, 2013 at 05:39:03PM -0400, Jean-Marc Valin wrote:
> Hi Tim,
> 
> Here's a revised patch that describes guidelines for encoders. These
> guidelines are related to gapless support, but I'm sure there are other
> recommendations we should give encoder implementers. The general idea is
> to minimize the number of broken or suboptimal encoders.


> diff --git a/doc/draft-ietf-codec-oggopus.xml b/doc/draft-ietf-codec-oggopus.xml
> index ab20be5..9c23cbf 100644
> --- a/doc/draft-ietf-codec-oggopus.xml
> +++ b/doc/draft-ietf-codec-oggopus.xml
> @@ -1138,6 +1138,74 @@ An implementation could reasonably choose any of these numbers for its internal
>  </t>
>  </section>
>  
> +<section anchor="encoder" title="Encoder Guidelines">
> +<t>
> +When encoding Opus files, Ogg encoders should take into account the
> +algorithmic delay of the Opus encoder. In encoders derived from the reference
> +implementation, the number of samples can be queried with:
> +opus_encoder_ctl(encoder_state, OPUS_GET_LOOKAHEAD, &amp;samples_delay);
> +To achieve good quality in the very first samples of a stream, the Ogg encoder
> +MAY use LPC extrapolation to generate at least 120 extra samples
> +(extra_samples) at the beginning to avoid the Opus encoder having to encode
> +a discontinuous signal.
> +For an input file containing length samples, the Ogg encoder SHOULD set the
> +preskip header field to samples_delay+extra_samples, encode at least
> +length+samples_delay+extra_samples samples, and set the granulepos of the last
> +page to length+samples_delay+extra_samples.
> +This ensures that the encoded file has the same duration as the original, with
> +no time offset. The end of the stream is best padded with LPC extrapolation as
> +well, but zero-padding is acceptable.
> +</t>
> +
> +<section anchor="lpc" title="LPC Extrapolation">
> +<t>
> +The first step in LPC extrapolation is to compute linear prediction coefficients.
> +When extending the end of the signal, order-N (typically with N ranging from 8
> +to 40) LPC analysis is performed on a window near the end of the signal. The last
> +N samples are used as memory to an infinite impulse response (IIR) filter. The
> +filter is then applied to a zero input to extrapolate the end of the signal. Let
> +a(k) be the kth LPC coefficient and x(n) be the nth sample of the signal; each
> +new sample past the end of the signal is computed as:
> +<artwork align="center"><![CDATA[
> +        N
> +       ---
> +x(n) = \   a(k)*x(n-k)
> +       /
> +       ---
> +       k=1
> +]]></artwork>
> +The process is repeated independently for each channel. It is possible to extend
> +the beginning of the signal by applying the same process backward in time. When
> +extending the beginning of the signal, it is best to apply a "fade in" to the
> +extrapolated signal, e.g. by multiplying it by a half-Hanning window.
> +</t>

If this is going to be generally recommended, I wonder if we shouldn't provide
a 'reference implementation' of it in libopus, and also make some reference to
that here?


In the following section, I wonder if we need to provide a few more specifics
(and also possibly some API support in libopus), since this draft primarily
is focussed on an encapsulation format, and many people implementing it will
likely be using a separate encoder library, quite possibly without having, or
otherwise needing, a very detailed knowledge of the Opus spec itself or the
internals of the encoder they are using.

The issues you raise here, while important, do kind of blur the lines between
"encapsulating the output of an encoder" and "doing additional encoding above
and beyond what the encoder you may be using does".  And they aren't specific
to the Ogg encapsulation; other mechanisms would surely get the same benefits
from them too.
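By contrast, the preskip and granulepos bookkeeping in the Encoder Guidelines paragraph is simple enough that a worked example would probably be all someone using a separate encoder library needs. A minimal sketch, assuming 48 kHz input (Ogg Opus counts preskip and granulepos in 48 kHz samples) and a fixed frame size; the struct and function names are made up for illustration, and samples_delay stands for whatever OPUS_GET_LOOKAHEAD reports:

```c
#include <stdint.h>

/* Hypothetical: the numbers the Encoder Guidelines paragraph asks for.
   All counts are per channel, in 48 kHz samples. */
typedef struct {
    uint32_t pre_skip;          /* samples_delay + extra_samples */
    uint64_t samples_to_encode; /* at least length + pre_skip, in whole frames */
    uint64_t trailing_pad;      /* padding appended after the real input */
    uint64_t final_granulepos;  /* granulepos of the last page */
} oggopus_plan;

static oggopus_plan plan_stream(uint64_t length, uint32_t samples_delay,
                                uint32_t extra_samples, uint32_t frame_size)
{
    oggopus_plan p;
    uint64_t needed = length + samples_delay + extra_samples;

    p.pre_skip = samples_delay + extra_samples;
    /* The encoder consumes whole frames, so round up to a frame boundary. */
    p.samples_to_encode = (needed + frame_size - 1) / frame_size * frame_size;
    /* Input layout: extra_samples (extrapolated) + length (real) + padding. */
    p.trailing_pad = p.samples_to_encode - extra_samples - length;
    /* The decoder drops everything past this point, so what remains after
       preskip is exactly `length` samples. */
    p.final_granulepos = needed;
    return p;
}
```

For example, one second of 48 kHz audio with a lookahead of 312 samples (an example value only), 120 extrapolated samples and 960-sample (20 ms) frames gives a preskip of 432, 48960 samples to encode (840 of them trailing padding) and a final granulepos of 48432.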

> +</section>
> +
> +<section anchor="continuous_chaining" title="Continuous Chaining">
> +<t>
> +In some applications, such as Internet radio, it is desirable to cut a long
> +stream into smaller chains, e.g. so the comment header can be updated.
> +This can be done simply by separating the input stream into segments and
> +encoding each segment independenty.

s/independenty/independently/

> +The only drawback of this approach is that it creates a small discontinuity
> +at the boundary due to the lossy nature of Opus.
> +An encoder MAY avoid this discontinuity by using the following procedure:
> +<list style="numbers">
> +<t>Encoding the last frame of the first stream as an independent frame by
> +turning off all forms of inter-frame prediction (de-emphasis is allowed).</t>

How do people turn that off?

> +<t>Setting the granulepos of the last page to a point near the end of the last
> +frame.</t>

How do they determine where that point should be?

> +<t>Beginning the new stream with a copy of the last frame of the first
> +stream.</t>
> +<t>Setting the preskip field of the second stream in such a way as to properly
> +join the two streams.</t>

How should it be set to ensure this is done properly?

> +<t>Continuing the encoding process normally from there, without any reset to
> +the encoder.</t>
> +</list>
> +</t>
> +</section>

The answers to these questions may be obvious to someone who has studied the
Opus spec and the encoder that they are using in detail, but in the context
of this draft I suspect we should probably offer slightly stronger guidance
about how to get this right, if we are going to include these concerns here.
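Even the relationship between steps 2 and 4 could be spelled out. Under one reading (an assumption, not something the patch states), if the first stream's final granulepos stops `keep` samples into the frame that gets duplicated, then the second stream, which begins with a copy of that frame, needs a preskip of exactly `keep` so playback resumes on the next sample. How to pick `keep`, and how to produce the independent frame, remain the open questions above. The names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical sketch of steps 2 and 4 of the chaining procedure.
   frame_start: granulepos (48 kHz) at the start of the duplicated last
                frame of the first stream.
   keep: samples of that frame the first stream should still play;
         how to choose it is the open question in step 2. */
typedef struct {
    uint64_t stream1_final_granulepos;
    uint32_t stream2_pre_skip;
} chain_join;

static chain_join plan_join(uint64_t frame_start, uint32_t frame_size,
                            uint32_t keep)
{
    chain_join j;
    if (keep > frame_size)
        keep = frame_size;        /* cannot play past the end of the frame */
    j.stream1_final_granulepos = frame_start + keep;
    /* The second stream starts with a copy of the same frame, so skipping
       `keep` samples of it resumes right where the first stream stopped. */
    j.stream2_pre_skip = keep;
    return j;
}
```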

 Cheers,
 Ron