Re: [codec] Ogg Opus zero-length frames

"Timothy B. Terriberry" <tterribe@xiph.org> Fri, 23 August 2013 13:51 UTC

Mark Harris wrote:
> lost or not transmitted.

Done.

> ...plus one byte of Ogg lacing.

See below.

> For initial zero-length frames, might it be better to prefer the
> configuration of the first non-zero-length frame to the extent
> possible, when available, to help in any situation where the
> configuration of the first packet might be used to report
> information (such as frame size), or for an initial estimate of
> bandwidth, required buffer sizes, etc.?

I thought about this a little when writing the original text, and I'm 
not sure it helps much. It's not a bad thing to do, both here and when 
you're forced to change modes to match the gap duration, but the 
benefits are small (you can't really do a good job estimating initial 
bandwidth or buffer sizes from zero-length frames, for example), and it 
means you can't write out anything until you get the first packet after 
the gap. Most RTP stacks, on the other hand, are going to declare packets 
lost and generate PLC without necessarily waiting for that packet to 
arrive.

> Or perhaps the last sentence should just be omitted, since it
> already effectively says that the mode, bandwidth, and channel
> count are unlikely to matter to a decoder in this case.

In practice I only expect initial gaps to be useful in conjunction with 
other streams (e.g., video), since otherwise you would just take the 
arrival of the first Opus packet as the start of the stream. So perhaps 
it isn't worth spending much text on them. If you don't think that last 
sentence is necessary, let's just drop it.

> s/to //

Done.

> s/80/20/

Done.

>> frames requires 4 bytes (plus an extra byte of Ogg lacing overhead),
>> but allows the PLC to use its well-tested steady state behavior for
>> as long as possible.
>
> To clarify, if the previous frame was 20 ms SILK, is this
> suggesting a 4 x 20 ms SILK packet followed by a 3 x 5 ms CELT
> packet?  The next paragraph suggests keeping the mode as long as
> possible, implying that it may be better to use 4 x 20 ms SILK +
> 10 ms SILK + 5 ms CELT.  Or is minimizing the number of frame size
> changes more important than keeping the mode as long as possible?

No, the other way around: keeping the mode as long as possible matters 
more. I changed this example a few times and apparently didn't think the 
last version through very well. 4 x 20 ms SILK + 10 ms SILK + 5 ms CELT 
would be better. I'll rework the text to clarify.
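For illustration, here is a minimal sketch of the "keep the mode as long as possible" split for a 95 ms gap after a 20 ms SILK frame. It assumes the RFC 6716 frame sizes per mode (SILK 10/20/40/60 ms, CELT 2.5/5/10/20 ms, hybrid 10/20 ms), the 120 ms per-packet limit, that all frames in one packet share one TOC configuration, and that zero-length frames cost 1 byte as a code 0 or code 1 packet and 2 bytes as a code 3 CBR packet. The function names and the greedy policy are hypothetical, not text from the draft, and bandwidth and channel count are ignored.

```python
# Allowed frame sizes per Opus mode, in microseconds (per RFC 6716).
FRAME_SIZES_US = {
    "SILK": (60_000, 40_000, 20_000, 10_000),
    "HYBRID": (20_000, 10_000),
    "CELT": (20_000, 10_000, 5_000, 2_500),
}
MAX_PACKET_US = 120_000  # an Opus packet may carry at most 120 ms


def split_gap(gap_us, prev_mode, prev_frame_us):
    """Hypothetical policy: reuse the previous frame size, then smaller
    sizes in the same mode, and fall back to CELT only for the rest.
    Returns a list of (mode, frame_us, count) runs."""
    if gap_us % 2_500:
        raise ValueError("gap must be a multiple of 2.5 ms")
    same_mode = [prev_frame_us] + [
        s for s in FRAME_SIZES_US[prev_mode] if s < prev_frame_us
    ]
    candidates = [(prev_mode, s) for s in same_mode]
    candidates += [("CELT", s) for s in FRAME_SIZES_US["CELT"]]
    runs, remaining = [], gap_us
    for mode, size in candidates:
        count, remaining = divmod(remaining, size)
        if count:
            runs.append((mode, size, count))
    return runs


def to_packets(runs):
    """Split each run into packets of at most 120 ms (one TOC config each)."""
    packets = []
    for mode, size, count in runs:
        per_packet = MAX_PACKET_US // size
        while count:
            n = min(count, per_packet)
            packets.append((mode, size, n))
            count -= n
    return packets


def zero_length_packet_bytes(frames):
    """Bytes for a packet whose frames are all zero-length: code 0 (one
    frame) or code 1 (two equal frames) is just the TOC byte; code 3 CBR
    adds a frame-count byte."""
    return 1 if frames <= 2 else 2


def ogg_lacing_bytes(packet_len):
    """Ogg lacing values needed for one packet: one per full 255 bytes,
    plus a final value below 255."""
    return packet_len // 255 + 1


packets = to_packets(split_gap(95_000, "SILK", 20_000))
print(packets)
# [('SILK', 20000, 4), ('SILK', 10000, 1), ('CELT', 5000, 1)]
sizes = [zero_length_packet_bytes(n) for _, _, n in packets]
print(sum(sizes), sum(ogg_lacing_bytes(s) for s in sizes))
# 4 3
```

Under these assumptions the split still costs 4 bytes of packet data, but uses three packets (3 lacing bytes) instead of the two packets (2 lacing bytes) of the 4 x 20 ms SILK + 3 x 5 ms CELT split, trading one more byte of overhead for staying in SILK longer.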

Thanks for the review!