Re: [codec] Ogg Opus zero-length frames

Mark Harris <> Fri, 23 August 2013 10:56 UTC

Date: Fri, 23 Aug 2013 03:56:23 -0700
From: Mark Harris <>
To: "Timothy B. Terriberry" <>
Subject: Re: [codec] Ogg Opus zero-length frames
List-Id: Codec WG <>

Timothy B. Terriberry <> wrote:
> I took a stab at drafting some text that answers these questions (XML diff
> attached):

Thanks; this is very helpful.  I have just a few comments:

> In order to support capturing a real-time stream that has lost
> packets, or that uses discontinuous transmission (DTX), a muxer
> SHOULD emit packets that explicitly request the use of Packet Loss
> Concealment (PLC) in place of the packets that were not transmitted.

lost or not transmitted.
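A minimal Python sketch of what "explicitly request PLC" could look like at the byte level. It assumes the RFC 6716 TOC layout (5-bit config, 1 stereo bit, 2-bit frame-count code) and that a zero-length frame is what tells the decoder to conceal. `plc_packet()` and `frame_samples()` are hypothetical helper names. Durations are in 48 kHz samples, as in Ogg Opus granule positions, and the sketch simply reuses the previous packet's configuration unchanged.

```python
# Hypothetical helpers (not part of any library); durations are in
# 48 kHz samples, so 2.5 ms = 120 samples.
MAX_PACKET_SAMPLES = 5760  # 120 ms, the longest duration one Opus packet may hold


def frame_samples(config: int) -> int:
    """Frame duration for a TOC config number (RFC 6716, Table 2)."""
    if config < 12:                          # SILK-only: 10, 20, 40, 60 ms
        return (480, 960, 1920, 2880)[config % 4]
    if config < 16:                          # Hybrid: 10, 20 ms
        return (480, 960)[config % 2]
    return (120, 240, 480, 960)[config % 4]  # CELT-only: 2.5, 5, 10, 20 ms


def plc_packet(prev_toc: int, n_frames: int) -> bytes:
    """Build a packet of n_frames zero-length frames that keeps the mode,
    bandwidth, channel count, and frame size of the previous packet."""
    config = prev_toc >> 3
    if not 1 <= n_frames <= 48 or n_frames * frame_samples(config) > MAX_PACKET_SAMPLES:
        raise ValueError("an Opus packet holds at most 48 frames and 120 ms")
    toc = prev_toc & 0xFC                 # keep config + stereo bit, clear code bits
    if n_frames == 1:
        return bytes([toc | 0])           # code 0: one frame, 0 bytes of payload
    if n_frames == 2:
        return bytes([toc | 1])           # code 1: two equal frames of 0 bytes each
    return bytes([toc | 3, n_frames])     # code 3 CBR, no padding: count byte is M


# Example: previous packet was SILK wideband 20 ms mono (config 9, code 0).
prev = 9 << 3
print(plc_packet(prev, 1).hex(), plc_packet(prev, 4).hex())  # prints: 48 4b04
```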

> If
> there is no previous packet, reasonable decoders will not emit
> anything other than silence regardless of the mode.  Using the CELT-
> only mode for this case (with any audio bandwidth) allows maximum
> flexibility, since a single packet can represent any duration up to
> 120 ms that is a multiple of 2.5 ms using at most two bytes, plus one byte of Ogg lacing.
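A sketch of the CELT-only packet described in the quoted paragraph, assuming the RFC 6716 config numbering (CELT-only configs 16 to 31: four bandwidths by four frame sizes) and a CBR code 3 packet whose frames are all zero bytes long. `celt_gap_packet()` and its `bandwidth` index (0 = NB, 1 = WB, 2 = SWB, 3 = FB) are hypothetical. Picking the largest frame size that divides the gap keeps the frame count M at or below 48 for any gap up to 120 ms, so the result never exceeds two bytes.

```python
# Hypothetical helper; durations are 48 kHz samples.
CELT_FRAME_SAMPLES = (120, 240, 480, 960)  # 2.5, 5, 10, 20 ms


def celt_gap_packet(gap_samples: int, bandwidth: int = 0, stereo: bool = False) -> bytes:
    """One CELT-only packet of zero-length frames covering the whole gap.

    bandwidth is a hypothetical index: 0 = NB, 1 = WB, 2 = SWB, 3 = FB.
    """
    if not 0 < gap_samples <= 5760 or gap_samples % 120 or bandwidth not in range(4):
        raise ValueError("gap must be a positive multiple of 2.5 ms, at most 120 ms")
    # Largest frame size that divides the gap evenly; this keeps M <= 48.
    size_index = max(i for i, n in enumerate(CELT_FRAME_SAMPLES) if gap_samples % n == 0)
    n_frames = gap_samples // CELT_FRAME_SAMPLES[size_index]
    config = 16 + 4 * bandwidth + size_index   # CELT-only configs are 16..31
    toc = (config << 3) | (int(stereo) << 2)
    if n_frames == 1:
        return bytes([toc])                    # code 0
    if n_frames == 2:
        return bytes([toc | 1])                # code 1
    return bytes([toc | 3, n_frames])          # code 3 CBR: TOC + frame count byte


for ms in (2.5, 20, 95, 120):
    pkt = celt_gap_packet(round(ms * 48))
    print(ms, len(pkt), pkt.hex())
```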

For initial zero-length frames, might it be better to prefer the
configuration of the first non-zero-length frame, when one is
available and to the extent possible?  That would help in any
situation where the configuration of the first packet might be used
to report information (such as the frame size) or to make an initial
estimate of bandwidth, required buffer sizes, etc.
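To make the concern concrete, a sketch of the kind of summary a demuxer might derive from a packet's TOC byte, following the config table in RFC 6716 (Table 2). `describe_toc()` is a hypothetical name. If a synthesized leading packet uses, say, CELT-only fullband 2.5 ms frames while the real stream is SILK wideband 20 ms, anything that reports from the first packet would show the former.

```python
def describe_toc(toc: int) -> dict:
    """Hypothetical helper: mode, audio bandwidth, frame size, and channel
    count decoded from an Opus TOC byte (RFC 6716, Table 2)."""
    config = toc >> 3
    if config < 12:
        mode, bandwidth = "SILK", ("NB", "MB", "WB")[config // 4]
        samples = (480, 960, 1920, 2880)[config % 4]
    elif config < 16:
        mode, bandwidth = "Hybrid", ("SWB", "FB")[(config - 12) // 2]
        samples = (480, 960)[config % 2]
    else:
        mode, bandwidth = "CELT", ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        samples = (120, 240, 480, 960)[config % 4]
    return {
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": samples / 48,           # 48 kHz samples -> milliseconds
        "channels": 2 if toc & 0x04 else 1,
    }


first_real = 9 << 3            # SILK WB 20 ms mono
synthesized = (28 << 3) | 3    # CELT FB 2.5 ms mono, code 3
print(describe_toc(synthesized))
print(describe_toc(first_real))
```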

Or perhaps the last sentence should just be omitted, since it
already effectively says that the mode, bandwidth, and channel
count are unlikely to matter to a decoder in this case.

> Delaying such
> changes as long as possible to simplifies things for PLC
> implementations.

s/to //

> A 95 ms gap could be encoded as nineteen 5 ms frames in
> two bytes with a single CBR code 3 packet.  If the previous frame
> size was 20 ms, using four 20 ms frames followed by three 5 ms
> frames requires 4 bytes (plus an extra byte of Ogg lacing overhead),
> but allows the PLC to use its well-tested steady state behavior for
> as long as possible.
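A sketch of the 95 ms arithmetic, assuming zero-length frames, at most 48 frames and 120 ms per packet, and one Ogg lacing byte per packet (which holds for packets shorter than 255 bytes). `plan_gap()` and `cost()` are hypothetical helpers: the plan keeps the previous frame size for as many frames as fit, then fills the remainder with the largest of the 2.5/5/10/20 ms sizes that divides it. It plans durations only and does not choose a mode.

```python
# Hypothetical helpers; durations are 48 kHz samples.
MAX_PACKET_SAMPLES = 5760  # 120 ms


def packet_bytes(n_frames: int) -> int:
    # With empty frames, code 0 (1 frame) and code 1 (2 frames) are TOC-only;
    # CBR code 3 adds one frame-count byte.
    return 1 if n_frames <= 2 else 2


def plan_gap(gap_samples: int, prev_frame_samples: int) -> list[tuple[int, int]]:
    """Return (frame_samples, n_frames) per packet, keeping the previous
    frame size as long as possible."""
    if gap_samples <= 0 or gap_samples % 120:
        raise ValueError("gap must be a positive multiple of 2.5 ms")
    packets: list[tuple[int, int]] = []

    def emit(frame: int, count: int) -> None:
        per_packet = min(48, MAX_PACKET_SAMPLES // frame)
        while count:
            n = min(count, per_packet)
            packets.append((frame, n))
            count -= n

    emit(prev_frame_samples, gap_samples // prev_frame_samples)
    rest = gap_samples % prev_frame_samples
    if rest:
        frame = next(n for n in (960, 480, 240, 120) if rest % n == 0)
        emit(frame, rest // frame)
    return packets


def cost(packets: list[tuple[int, int]]) -> tuple[int, int]:
    """(packet data bytes, Ogg lacing bytes)."""
    return sum(packet_bytes(n) for _, n in packets), len(packets)


print(plan_gap(4560, 960), cost(plan_gap(4560, 960)))  # [(960, 4), (240, 3)] (4, 2)
print(cost([(240, 19)]))                                # nineteen 5 ms frames: (2, 1)
```

The two lines reproduce the quoted comparison: 4 data bytes and 2 lacing bytes when keeping 20 ms frames, versus 2 and 1 for the single nineteen-frame packet.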

To clarify, if the previous frame was 20 ms SILK, is this
suggesting a 4 x 20 ms SILK packet followed by a 3 x 5 ms CELT
packet?  The next paragraph suggests keeping the mode as long as
possible, implying that it may be better to use 4 x 20 ms SILK +
10 ms SILK + 5 ms CELT.  Or is minimizing the number of frame size
changes more important than keeping the mode as long as possible?