Re: [AVT] Submission and request for feedback on draft-valin-celt-rtp-profile-00.txt

Gregory Maxwell <gmaxwell@juniper.net> Mon, 09 March 2009 18:12 UTC

Date: Mon, 09 Mar 2009 10:26:44 -0700
From: Gregory Maxwell <gmaxwell@juniper.net>
To: Jean-Marc Valin <jean-marc.valin@octasic.com>, Randell Jesup <rjesup@wgate.com>
Cc: avt@ietf.org

Jean-Marc Valin wrote:
> Randell Jesup wrote:
[snip]

> Yes, the bit-rate information is actually the compressed frame length
> itself.

To clarify: CELT is designed assuming a length-preserving transport.
If a CELT frame of N bytes is received, the receiver assumes the
encoder encoded that frame at a rate of sample_rate/frame_samples*N
bytes per second.
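As a rough illustration of that relationship, here is a minimal sketch of how a receiver could derive the implied rate from nothing but the frame length. The function name and the example values (48 kHz, 256-sample frames) are assumptions for illustration only and are not taken from the draft or from the CELT API.

```python
def implied_byte_rate(frame_bytes, sample_rate, frame_samples):
    """Bytes per second implied by a received frame of frame_bytes bytes.

    Mirrors the expression sample_rate / frame_samples * N from the text.
    """
    frames_per_second = sample_rate / frame_samples
    return frames_per_second * frame_bytes


# Illustrative values only: 48 kHz audio, 256 samples per frame, 46-byte frame.
byte_rate = implied_byte_rate(46, 48000, 256)
bit_rate = byte_rate * 8
print(byte_rate, bit_rate)
```

No rate field is read anywhere: the only inputs are the stream parameters and the length of the frame as delivered by the transport.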

For naturally length-preserving transports (such as raw UDP, RTP, or
most other transports) no signalling of bitrate is required at all,
internally to the codec or externally, beyond the length preservation.

However, since we needed to support multiple frames per packet for RTP
(for multichannel audio, and to allow flexible trade-off of latency vs
overhead), we needed to provide an in-channel length encoding that the
decoder can use to differentiate the bitrates used by the component frames
of the packet.

This is accomplished by explicitly enumerating the lengths of the packed
frames at the start of the packet payload.  The transport still must be
length preserving since the total packet length must be known to determine
when to stop reading lengths.  This is perfectly robust against packet
loss, but should be protected against bit errors.  I suppose it might
be worthwhile to make a recommendation regarding checksum coverage (i.e.
for UDP-Lite).
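The stopping rule described above can be sketched as follows. This is only an illustration under a loud assumption: each length is taken to occupy exactly one byte, which limits frames to 255 bytes and may not match the encoding the draft actually specifies. The function names are hypothetical.

```python
def pack_frames(frames):
    """Prefix the frames with one length byte each (assumed encoding)."""
    if any(len(f) > 255 for f in frames):
        raise ValueError("this sketch only handles frames up to 255 bytes")
    header = bytes(len(f) for f in frames)
    return header + b"".join(frames)


def split_frames(payload):
    """Recover the frames using only the payload and its total length."""
    lengths = []
    pos = 0
    # Keep reading lengths until the header bytes consumed plus the
    # announced frame bytes account for the whole payload.
    while pos + sum(lengths) < len(payload):
        lengths.append(payload[pos])
        pos += 1
    if pos + sum(lengths) != len(payload):
        raise ValueError("length header inconsistent with payload size")
    frames = []
    for n in lengths:
        frames.append(payload[pos:pos + n])
        pos += n
    return frames


payload = pack_frames([b"abc", b"de"])
print(split_frames(payload))
```

Note that no frame count is transmitted: the decoder learns how many frames there are only because the transport tells it where the payload ends. A corrupted length byte is caught here only when the totals stop adding up, which is why protecting the header against bit errors matters.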

We also considered an alternative of encoding the number of frames
per packet in a byte, followed by N-1 lengths, but decided that simply
encoding N lengths would likely be slightly easier to implement and would
likely have somewhat better bit error robustness properties in practice.

(Or if someone has reason to believe that another scheme is more likely
to be correctly implemented…)

This length encoding process must be made sufficiently clear in the
draft. If it isn't currently, then that needs to be resolved.

>>> CELT allows for bitrate adjustment in one byte per frame increments
>>> without any signaling requirement or overhead. Applications SHOULD
>>> utilize congestion control to regulate the transmitted bitrate.  In 
>>> some applications it may make sense to increase the packetization 
>>> interval rather than decreasing the codec bitrate.  Congestion 
>>> control implementations should consider the users differential 
>>> tolerance for high latency and low quality.
>> That would be (I assume) really 1 byte per channel per frame, not 1 byte 
>> per frame.  
> Well, technically you can increase the bit-rate by one byte for only one 
> channel...

Perhaps the clarity of the draft would be improved with some standalone
definitions:

* Packet  — A unit of network transmission, containing IP/UDP/RTP
headers and one or more CELT frames.

* Frame — A unit of codec operation, representing frame_samples samples
for one or two channels of audio.

* Channel — A stream of audio, transported in groups of one or two in a
single frame by CELT.

The codec can adjust its instantaneous bitrate with one byte per frame
granularity. This means that for congestion control purposes one byte
per packet granularity is available regardless of the number of channels
encoded or frames per packet, so long as the encoder is willing to use
slightly unequal bitrates per frame.
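One way a sender might realise that granularity is sketched below. It is purely illustrative: the function name is hypothetical, the byte budget is assumed to cover frame data only (length-header overhead is ignored), and nothing here is prescribed by the draft.

```python
def distribute_bytes(total_bytes, n_frames):
    """Split a per-packet byte budget across frames as evenly as possible.

    Frame sizes differ by at most one byte, so the packet total can be
    tuned in one-byte steps regardless of how many frames it carries.
    """
    base, extra = divmod(total_bytes, n_frames)
    # The first `extra` frames each get one additional byte.
    return [base + 1 if i < extra else base for i in range(n_frames)]


print(distribute_bytes(103, 4))
print(distribute_bytes(104, 4))
```

Raising the budget from 103 to 104 bytes changes exactly one frame by one byte, which is the "slightly unequal bitrates per frame" case described above.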

The logic the sender uses in distributing the available capacity to frames
(and channels) isn't relevant for interoperability, so I don't see cause
to make a recommendation. However, if it would make the draft more clear
it would be possible to do so.

>> DTLS?

Quite reasonable.  Most of that section is from the boiler-plate RTP
profile security considerations in draft-ietf-avt-rtp-howto-06.txt. We
can add that to draft-valin-celt-rtp-profile-00.txt but the changes should
also be made to the rtp-howto unless there is a reason not to do so.

Our enhancements to that section consist of recommendations regarding
timing and length-information-leakage attacks, the latter of which is
somewhat CELT specific since CELT can fill any target byte-size.