[codec] Multi-mode frame concatenation

"Benjamin M. Schwartz" <bmschwar@fas.harvard.edu> Fri, 29 July 2011 00:48 UTC


Given the hum in favor of frame concatenation within the codec, I would
like to propose the following addition for multi-mode concatenation in
Code 3 (circa page 14):

If M == 49, p == 0, and v == 1, then the packet is a "multi-mode packet".
The decoder MUST read the next byte to determine the true values of M, p,
and v.  Decoding shall then continue as normal, except that each frame
shall be interpreted as a Code 0 packet.  Decoders MUST NOT continue
decoding the packet if any frame indicates a code other than 0.
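
As an illustration only, the rule above might be sketched as follows.  The
bit layout of the frame-count byte (v in bit 7, p in bit 6, M in the low six
bits) is an assumption borrowed from the layout later published in RFC 6716,
and may not match the draft being discussed.  The function names are
hypothetical and do not come from any Opus implementation.

```python
# Escape value proposed in the text: M == 49, p == 0, v == 1.
ESCAPE = (49, 0, 1)


def split_count_byte(b):
    """Split a frame-count byte into (M, p, v).

    Assumed layout (not confirmed by the draft): bit 7 = v, bit 6 = p,
    bits 5..0 = M.
    """
    return b & 0x3F, (b >> 6) & 1, (b >> 7) & 1


def parse_code3_counts(payload):
    """Parse the byte(s) that follow the TOC byte of a Code 3 packet.

    Returns (multi_mode, M, p, v, consumed), where `consumed` is the number
    of bytes read from `payload`.
    """
    if len(payload) < 1:
        raise ValueError("missing frame-count byte")
    m, p, v = split_count_byte(payload[0])
    if (m, p, v) != ESCAPE:
        # Ordinary Code 3 packet: nothing changes, zero extra bytes.
        return False, m, p, v, 1
    if len(payload) < 2:
        raise ValueError("multi-mode packet is missing the true count byte")
    # Multi-mode packet: the next byte carries the true M, p, and v.
    m, p, v = split_count_byte(payload[1])
    return True, m, p, v, 2


def frame_is_code0(frame):
    """True if the frame's leading TOC byte indicates Code 0.

    Assumes the code number occupies the two low bits of the TOC byte.
    A decoder following the proposal would stop decoding the packet as
    soon as this returns False for any frame.
    """
    return len(frame) > 0 and (frame[0] & 0x03) == 0
```

Under the assumed layout, the escape byte is 0xB1, so a payload beginning
0xB1 0x83 would be read as a multi-mode packet with M == 3, p == 0, v == 1.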

This design has zero overhead when not used, and should be as simple as
possible to implement and test.  The implementation lives entirely inside
the codec, so its complexity is no more a problem for the user than is the
complexity of the CELT collapse-prevention system.  The described mode is
currently invalid, so it does not represent a significant change; all
previously valid bitstreams are unaffected.

This design improves security by ensuring that a gateway can always
convert any Opus stream into a CBR stream at a constant packet rate.
Currently, this is not true because mode switches force packet boundaries,
which leaks information (e.g. whether someone is talking or still listening
to the hold music).

This design improves interoperability by ensuring that anyone who requires
such functionality is not tempted to implement it in their own layer.

This design improves bitrate efficiency when using Code 1-3 by avoiding an
additional packet's overhead (e.g. RTP+UDP+IP) on every mode switch.
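
To give a rough sense of scale, the header cost of one forced extra packet
can be estimated as below.  This is a back-of-the-envelope sketch: it
assumes minimal IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes) headers
with no options, extensions, or header compression, and the rate of mode
switches is a hypothetical input, not a measured figure.

```python
def extra_header_bitrate(switches_per_second, ip=20, udp=8, rtp=12):
    """Bits per second spent on headers of packets forced by mode switches.

    Assumes each mode switch forces exactly one additional packet, and that
    each packet carries minimal IPv4 + UDP + RTP headers (sizes in bytes).
    """
    bytes_per_switch = ip + udp + rtp
    return switches_per_second * bytes_per_switch * 8


# One forced packet boundary per second costs 40 bytes, i.e. 320 bit/s.
print(extra_header_bitrate(1))
```

By comparison, the escape described above costs one additional byte per
multi-mode packet, plus whatever per-frame signaling the Code 0
interpretation requires.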

This design allows encoders to switch modes freely, with limited and
predictable overhead.  In the present system, each mode switch is
associated with the unpredictable and potentially large cost of forcing a
packet boundary.  Making this cost more predictable allows an encoder to
switch modes more frequently (i.e. employ RDO) in order to improve
compression, for example by varying the bandpass or framesize based on
signal content.

This design is particularly beneficial in certain advanced applications
such as a multichannel extension where a group of packets must all start
and stop on a shared time boundary.  Without this design, all mode changes
in all channels must occur on that boundary as well.