Re: [codec] Ogg Opus zero-length frames

"Timothy B. Terriberry" <tterribe@xiph.org> Thu, 22 August 2013 01:00 UTC


Mark Harris wrote:
> It would be nice to have guidelines for encoding these gaps in the
> Ogg Opus draft, including:

I took a stab at drafting some text that answers these questions (XML 
diff attached):

4.1.  Repairing Gaps in Real-time Streams

    In order to support capturing a real-time stream that has lost
    packets, or that uses discontinuous transmission (DTX), a muxer
    SHOULD emit packets that explicitly request the use of Packet Loss
    Concealment (PLC) in place of the packets that were not transmitted.
    Only gaps that are a multiple of 2.5 ms are repairable, as these are
    the only durations that can be created by packet loss or DTX.  Muxers
    need not handle other gap sizes.  Creating the necessary packets
    involves synthesizing a TOC byte (defined in Section 3.1
    of [RFC6716])---and whatever additional internal framing is needed---
    to indicate the packet duration for each stream.  The actual length
    of each missing Opus frame inside the packet is zero bytes, as
    defined in Section 3.2.1 of [RFC6716].
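
To make the TOC synthesis concrete, here is a minimal Python sketch (an illustration, not part of the proposed text). It assumes the TOC layout from Section 3.1 of [RFC6716]: a 5-bit configuration number, a 1-bit stereo flag, and a 2-bit frame count code. The names toc_byte and plc_packet are hypothetical, and the sketch covers a single stream only; it ignores the extra framing a multistream packet would need.

```python
def toc_byte(config, stereo, code):
    """Build an Opus TOC byte: config (5 bits), stereo flag (1 bit), code (2 bits)."""
    if not 0 <= config <= 31:
        raise ValueError("config must be in 0..31")
    if code not in (0, 1, 2, 3):
        raise ValueError("code must be 0, 1, 2, or 3")
    return (config << 3) | (int(bool(stereo)) << 2) | code


def plc_packet(prev_toc):
    """Return a packet holding one zero-length frame (hypothetical helper).

    The configuration and stereo bits of the previous packet's TOC byte are
    kept, and the frame count code is forced to 0 (exactly one frame).
    With code 0 the frame occupies the rest of the packet, so a packet that
    is only the TOC byte carries a frame of zero bytes.
    """
    return bytes([prev_toc & 0xFC])


if __name__ == "__main__":
    # Previous packet: config 9 (SILK wideband, 20 ms), mono, code 3.
    previous = toc_byte(9, False, 3)
    print(plc_packet(previous).hex())
```

The resulting packet is a single byte, which a decoder should treat as a request to conceal one frame of the same duration as the previous packet's frames.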

    [RFC6716] does not impose any requirements on the PLC, but this
    section outlines choices that are expected to have a positive
    influence on most PLC implementations, including the reference
    implementation.  When possible, creating the TOC byte using the same
    mode, audio bandwidth, channel count, and frame size as the previous
    packet (if any) covers all losses that do not include a configuration
    switch, as defined in Section 4.5 of [RFC6716].  This is the simplest
    and usually the most well-tested case for the PLC to handle.  If
    there is no previous packet, reasonable decoders will not emit
    anything other than silence regardless of the mode.  Using the CELT-
    only mode for this case (with any audio bandwidth) allows maximum
    flexibility, since a single packet can represent any duration up to
    120 ms that is a multiple of 2.5 ms using at most two bytes.
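
The "at most two bytes" claim can be checked with a small Python sketch (again an illustration, not proposed text). It assumes the CELT-only configuration numbers 16..31 from Section 3.1 of [RFC6716] and the code 3 frame count byte from Section 3.2.5 (VBR flag in the top bit, padding flag next, frame count in the low six bits). Durations are counted in 2.5 ms "ticks" to avoid floating point; celt_gap_packet and bandwidth_index are hypothetical names.

```python
# CELT frame sizes in units of 2.5 ms, largest first: 20, 10, 5, 2.5 ms.
CELT_FRAME_TICKS = (8, 4, 2, 1)
# Position of each frame size within a CELT bandwidth's group of four configs.
SIZE_INDEX = {1: 0, 2: 1, 4: 2, 8: 3}


def celt_gap_packet(gap_ticks, bandwidth_index=0, stereo=False):
    """Encode a gap of gap_ticks * 2.5 ms as one CELT-only packet.

    bandwidth_index: 0 = NB, 1 = WB, 2 = SWB, 3 = FB (any choice works here).
    """
    if not 1 <= gap_ticks <= 48:
        raise ValueError("a single packet covers 2.5 ms to 120 ms")
    # Use the largest frame size that divides the gap evenly.
    frame_ticks = next(t for t in CELT_FRAME_TICKS if gap_ticks % t == 0)
    count = gap_ticks // frame_ticks
    config = 16 + 4 * bandwidth_index + SIZE_INDEX[frame_ticks]
    toc = (config << 3) | (int(bool(stereo)) << 2)
    if count == 1:
        return bytes([toc])          # code 0: one zero-length frame
    # Code 3, CBR (v = 0), no padding (p = 0), M = count zero-length frames.
    return bytes([toc | 3, count])


if __name__ == "__main__":
    for ms_times_10 in (25, 200, 950, 1200):
        packet = celt_gap_packet(ms_times_10 // 25)
        print(ms_times_10 / 10, "ms ->", packet.hex())
```

Since the smallest CELT frame is 2.5 ms and code 3 allows up to 48 frames per packet (within the 120 ms limit), every multiple of 2.5 ms up to 120 ms fits in a TOC byte plus, at most, one frame count byte.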

    When a previous packet is available, keeping the audio bandwidth and
    channel count the same allows the PLC to provide maximum continuity
    in the concealment data it generates.  However, if the size of the
    gap is not a multiple of the most recent frame size, then the frame
    size will have to change for at least some frames.  Delaying such
    changes as long as possible simplifies things for PLC
    implementations.  A 95 ms gap could be encoded as nineteen 5 ms
    frames in two bytes with a single CBR code 3 packet.  If the previous
    frame size was 20 ms, using four 20 ms frames followed by three 5 ms
    frames requires 4 bytes (plus an extra byte of Ogg lacing overhead),
    but allows the PLC to use its well-tested steady-state behavior for
    as long as possible.  The total bitrate of the latter approach,
    including Ogg overhead, is about 0.4 kbps, so the impact on file size
    is minimal.
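
For anyone who wants to check the arithmetic in this paragraph, here is a rough Python sketch of the "keep the old frame size as long as possible" strategy. It is only an illustration: it assumes CELT-only mode (so all of 2.5, 5, 10, and 20 ms frames are available), counts time in 2.5 ms ticks, and charges one byte for a code 0 packet and two bytes for a CBR code 3 packet. plan_gap and packet_bytes are hypothetical names.

```python
def plan_gap(gap_ticks, prev_frame_ticks):
    """Split a gap into packets, as (frame_ticks, frame_count) pairs.

    Frames of the previous size come first; any remainder is filled at the
    end with the largest smaller frame size that divides it evenly.
    """
    if prev_frame_ticks not in (1, 2, 4, 8):
        raise ValueError("expected a CELT frame size of 1, 2, 4, or 8 ticks")
    packets = []
    steady = gap_ticks // prev_frame_ticks
    per_packet = 48 // prev_frame_ticks      # a packet holds at most 120 ms
    while steady > 0:
        n = min(steady, per_packet)
        packets.append((prev_frame_ticks, n))
        steady -= n
    rest = gap_ticks % prev_frame_ticks
    if rest:
        ticks = next(t for t in (8, 4, 2, 1) if rest % t == 0)
        packets.append((ticks, rest // ticks))
    return packets


def packet_bytes(plan):
    """Total packet bytes: 1 for a single frame (code 0), else 2 (code 3)."""
    return sum(1 if count == 1 else 2 for _, count in plan)


if __name__ == "__main__":
    plan = plan_gap(38, 8)                   # 95 ms gap after 20 ms frames
    size = packet_bytes(plan)
    # One extra Ogg lacing byte, because there are two packets instead of one.
    bitrate = (size + 1) * 8 / 0.095
    print(plan, size, round(bitrate))
```

For the 95 ms example this gives four 20 ms frames followed by three 5 ms frames in 4 bytes; adding the one extra lacing byte gives 40 bits over 95 ms, roughly 420 bit/s, which is where the "about 0.4 kbps" figure comes from.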

    Changing modes is discouraged, since this causes some decoder
    implementations to reset their PLC state.  However, SILK and Hybrid
    modes cannot fill gaps that are not a multiple of 10 ms.  If
    switching to CELT mode is needed to match the gap size, doing so at
    the end of the gap allows the PLC to function for as long as
    possible.  Since CELT does not support medium-band audio, using
    wideband when switching from medium-band SILK ensures that any PLC
    implementation that does try to migrate state between the modes will
    not be forced to artificially reduce the bandwidth.
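
The bandwidth choice for a CELT tail can be sketched in Python as follows (an illustration under assumptions, not proposed text). It relies on the configuration table in Section 3.1 of [RFC6716]: configs 0..11 are SILK-only (NB, MB, WB), 12..15 are Hybrid (SWB, FB), and 16..31 are CELT-only (NB, WB, SWB, FB). The function names are hypothetical.

```python
CELT_BANDWIDTH_INDEX = {"NB": 0, "WB": 1, "SWB": 2, "FB": 3}
SIZE_INDEX = {1: 0, 2: 1, 4: 2, 8: 3}        # frame size in 2.5 ms ticks


def describe_config(config):
    """Return (mode, bandwidth) for a TOC configuration number 0..31."""
    if not 0 <= config <= 31:
        raise ValueError("config must be in 0..31")
    if config < 12:
        return "SILK", ("NB", "MB", "WB")[config // 4]
    if config < 16:
        return "Hybrid", ("SWB", "FB")[(config - 12) // 2]
    return "CELT", ("NB", "WB", "SWB", "FB")[(config - 16) // 4]


def celt_tail_config(prev_config, frame_ticks):
    """Pick a CELT-only config for frames at the end of a gap.

    The previous bandwidth is kept, except that medium-band (which CELT
    lacks) is widened to wideband rather than narrowed.
    """
    _, bandwidth = describe_config(prev_config)
    if bandwidth == "MB":
        bandwidth = "WB"
    return 16 + 4 * CELT_BANDWIDTH_INDEX[bandwidth] + SIZE_INDEX[frame_ticks]


if __name__ == "__main__":
    # Previous packet: config 5 (SILK medium-band, 20 ms); tail of 5 ms frames.
    tail = celt_tail_config(5, 2)
    print(tail, describe_config(tail))
```

Only the frames that SILK or Hybrid cannot represent (those that are not a multiple of 10 ms) would use the config returned here; the earlier part of the gap would keep the previous packet's mode.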

    The synthetic TOC byte MAY use any of codes 0, 1, 2, or 3 to pack the
    frame(s) into a packet.  If the TOC configuration matches, the muxer
    MAY combine the empty frames with previous or subsequent non-zero-
    length frames (using code 2 or VBR code 3).

It's a little bit long, but if people think it's useful (or have 
suggestions for shortening it), we should include it.