[AVT] Problems with uRTR: draft-ietf-avt-variable-rate-audio-00.txt
Magnus Westerlund <magnus.westerlund@ericsson.com> Thu, 28 October 2004 12:37 UTC
Message-ID: <4180E65F.2060402@ericsson.com>
Date: Thu, 28 Oct 2004 14:30:23 +0200
From: Magnus Westerlund <magnus.westerlund@ericsson.com>
To: IETF AVT WG <avt@ietf.org>
Subject: [AVT] Problems with uRTR: draft-ietf-avt-variable-rate-audio-00.txt
Hi,

I have discussed the issue of RTP timestamps for variable sampling rate codecs with my colleagues, and also with Colin, to get as good an understanding of the issues as possible. We also went through the exercise of working out what using the uRTR (unified RTP timestamp rate) concept would mean for the RTP payload format for AMR-WB+. This revealed a number of issues.

Let's start with a brief explanation of the system view for a codec like AMR-WB+:

1. Input sampling (Input Sampling Rate)
2. Encoding into frames using a specific codec internal sampling frequency (ISF)
3. RTP packetization, assigning an RTP TS value to each frame (RTP TS rate)
4. Transmission
5. RTP reception
6. Buffering
7. Decoding to the Output Sampling Rate
8. Audio playout (Output Sampling Rate)

In such a codec system, the input sampling frequency need not match the output sampling frequency. The audio signal's bandwidth depends on the selected ISF, so both the input and the output sampling frequencies should be higher than the ISF.

The RTP payload and timestamp must provide the receiver with sufficient information for:
- Recovery of decoding and playout position and order
- Inter-media synchronization

First, a common consequence of using an RTP timestamp rate that does not match the input sampling rate is that a sampling instant may not be representable by an integer timestamp value; it may instead be fractional. Due to the rounding, this can lead to an initial offset error relative to another medium when decoding starts.

When one uses uRTR, or any timestamp rate under which the transport units (either samples or audio frames) do not last an integer number of timestamp ticks, one may also get in-stream jitter. A frame then has a fractional duration in the RTP timestamp domain, and the rounding error becomes an error in placing the data correctly on the timescale.
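The jitter effect can be illustrated numerically. The sketch below uses hypothetical parameters (1024-sample frames at a 44.1 kHz sampling rate, carried with a 48 kHz RTP clock standing in for a uRTR), so that one frame lasts a non-integer ~1114.56 ticks. It compares the exact frame positions with rounded per-frame timestamps, and also with the aggregation recovery rule (first TS + N * integer frame duration) discussed below. The function names are illustrative only.

```python
from fractions import Fraction
import math


def round_half_up(x: Fraction) -> int:
    """Round a non-negative Fraction to the nearest integer (halves round up)."""
    return math.floor(x + Fraction(1, 2))


def frame_timing(frame_samples: int, sample_rate: int, rtp_rate: int, n_frames: int):
    # Exact duration of one frame in RTP ticks; non-integer when the rates do not fit.
    duration = Fraction(frame_samples * rtp_rate, sample_rate)
    # Exact (fractional) position of frame k on the RTP timeline, relative to frame 0.
    exact = [k * duration for k in range(n_frames)]
    # Sender side: every frame gets its own integer RTP TS, rounded to the nearest tick.
    per_frame = [round_half_up(t) for t in exact]
    # Receiver of an aggregated packet: first TS + N * (integer frame duration).
    step = round_half_up(duration)
    aggregated = [per_frame[0] + k * step for k in range(n_frames)]
    return duration, exact, per_frame, aggregated


if __name__ == "__main__":
    # Hypothetical example: 1024-sample frames at 44.1 kHz, 48 kHz RTP clock.
    duration, exact, per_frame, aggregated = frame_timing(1024, 44100, 48000, 10)
    print("frame duration in ticks:", float(duration))
    # Spacing between consecutive rounded timestamps wobbles by one tick.
    print("intervals:", [b - a for a, b in zip(per_frame, per_frame[1:])])
    # Per-frame rounding error stays within half a tick...
    print("per-frame error:", [round(float(p - e), 3) for p, e in zip(per_frame, exact)])
    # ...but the aggregated recovery error grows linearly with N.
    print("aggregated error:", [round(float(a - e), 3) for a, e in zip(aggregated, exact)])
```

Per-frame rounding keeps each timestamp within half a tick of its true position, but the spacing between frames alternates between 1114 and 1115 ticks. The aggregated rule instead drifts by about 0.44 ticks per frame, i.e. roughly 4 ticks after nine frames.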
This may not be a serious problem for frame-based codecs as long as all data arrives, since one can then run a scheme that concatenates the data to be decoded into a correct stream, so the decoder output should be a correct, jitter-free stream. However, if losses occur, one may need to insert data without the help of prior data to determine what the fractional offset is, potentially introducing jitter in the placement. If I understand things correctly, the inter-media synchronization error is normally not a problem, as humans are quite tolerant of offset. However, we are very sensitive to jitter within an audio stream.

The error introduced by fractional frame lengths will also have an impact on the RTP payload design. When aggregating frames for frame-based codecs, the normal RTP timestamp recovery scheme is to calculate the RTP TS of a frame as:

   RTP TS value of the packet + N * <frame duration in RTP TS ticks>

where N is the number of frames preceding this frame. However, if the frame duration cannot be expressed as an integer number of RTP ticks, the rounding error is multiplied by N, so the error can grow to several timestamp ticks. The alternative is a scheme that provides absolute RTP TS offset values, which increases the overhead for aggregation.

For sample-based codecs, where the smallest unit is a sample, the fractional error may be even harder to handle, due to the need for greater precision in alignment and the potentially less regular borders between packets.

It also seems very hard to select a uRTR that will work well. First, audio uses two families of frequencies:
- 8000, 16000, 24000, 32000, 48000, 96000, 192000
- 11025, 22050, 44100, 88200

The frequency span is also quite large due to the different applications. The higher values of 192k and 88.2k are used in SACD and DVD-Audio and can be expected to occur. Thus, selecting a common rate that is practically feasible within RTP (lower than a few MHz due to the wraparound) does not seem possible.
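A rough check of the claim above: the smallest clock in which every rate of both families has an integer sample duration is their least common multiple. The sketch below computes it with Python's `math.lcm` (Python 3.9 or later) together with the wraparound time of the 32-bit RTP timestamp at that clock; the helper names are illustrative.

```python
import math

# Sampling-frequency families listed above (Hz).
FAMILY_48K = [8000, 16000, 24000, 32000, 48000, 96000, 192000]
FAMILY_44K = [11025, 22050, 44100, 88200]


def common_rate(rates: list[int]) -> int:
    """Smallest clock rate R such that every rate f divides R, i.e. one sample
    at rate f lasts a whole number (R / f) of ticks."""
    return math.lcm(*rates)


def wrap_seconds(clock_hz: int, bits: int = 32) -> float:
    """Time until a `bits`-wide RTP timestamp running at clock_hz wraps around."""
    return 2 ** bits / clock_hz


if __name__ == "__main__":
    for name, rates in (("48k family", FAMILY_48K),
                        ("44.1k family", FAMILY_44K),
                        ("both families", FAMILY_48K + FAMILY_44K)):
        rate = common_rate(rates)
        print(f"{name}: {rate} Hz, wraps after {wrap_seconds(rate):.1f} s")
```

Each family on its own only needs 192 kHz or 88.2 kHz, but covering both requires a 28.224 MHz clock, whose 32-bit timestamp wraps after only about 152 seconds.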
Thus any selected rate would most likely be a compromise, leading to bad conversion factors for one of the two families.

Due to these issues, I think AMR-WB+ should keep its 72 kHz RTP timestamp rate, as it provides the codec with the necessary audio frame location on a full-resolution clock without jitter. It also has quite good clock conversion factors for commonly used output frequencies:

   Frequency (Hz)   72 kHz ticks per sample at that frequency
   8000             9
   16000            4.5
   24000            3
   25600            2.8125
   32000            2.25
   44100            1.632653061
   48000            1.5

Another common problem with uRTR, and with a freer choice of RTP timestamp rate, is the impact on client implementations. Most multimedia clients are driven by the audio card clock: the client implementation uses the audio clock to know when it needs to decode, remove data from the receiver buffer, etc. RTP timestamps will therefore need to be converted to that clock, and errors may arise here as well. Allowing different rates for different codecs will result in the need to handle conversion for more than one rate, making codec plug-ins more difficult.

In conclusion, not using uRTR is more likely to preserve the precision of where the audio data belongs. The client implementation will be slightly more impacted than it would be with uRTR; however, I think this may be the price we need to accept.

I would also propose that the issues around RTP timestamp rates for audio be documented in an Informational RFC. This would include the recommendation to use the input sampling frequency when applicable. Variable-rate codecs do expose limitations in RTP, and these should also be documented. Further recommendations on how to select rates, and the point that these may need to be considered already during codec development, should also be part of it.
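A client driven by the audio card clock has to map each 72 kHz RTP timestamp onto its output sample grid. The sketch below, using exact rational arithmetic, reproduces the conversion factors in the table above and performs that mapping with round-to-nearest. It assumes the timestamps have already had the random initial RTP TS subtracted and have been unwrapped; the function names and the rounding policy are illustrative assumptions, not part of the AMR-WB+ payload format.

```python
from fractions import Fraction
import math

# AMR-WB+ RTP timestamp rate discussed above (Hz).
RTP_RATE = 72000


def ticks_per_sample(output_rate: int) -> Fraction:
    """Number of 72 kHz RTP ticks per sample at the given output rate."""
    return Fraction(RTP_RATE, output_rate)


def rtp_to_output_sample(rtp_ts: int, output_rate: int) -> tuple[int, Fraction]:
    """Map an RTP timestamp (assumed relative to stream start and unwrapped)
    to the nearest output sample index.

    Returns the integer index and the rounding error in output samples.
    """
    exact = Fraction(rtp_ts * output_rate, RTP_RATE)
    index = math.floor(exact + Fraction(1, 2))  # round half up
    return index, index - exact


if __name__ == "__main__":
    for rate in (8000, 16000, 24000, 25600, 32000, 44100, 48000):
        factor = ticks_per_sample(rate)
        kind = "integer" if factor.denominator == 1 else "fractional"
        print(rate, float(factor), kind)
    # A single RTP tick does not fall on a 44.1 kHz sample boundary.
    print(rtp_to_output_sample(1, 44100))
```

With round-to-nearest the placement error never exceeds half an output sample; a client supporting several RTP rates would need one such conversion per rate.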
Cheers

Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVA/A
----------------------------------------------------------------------
Ericsson AB                | Phone +46 8 4048287
Torshamsgatan 23           | Fax   +46 8 7575550
S-164 80 Stockholm, Sweden | mailto: magnus.westerlund@ericsson.com