Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate

Colin Perkins <csp@csperkins.org> Fri, 24 September 2004 08:48 UTC

Received: from ietf-mx.ietf.org (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id EAA21273 for <avt-archive@ietf.org>; Fri, 24 Sep 2004 04:48:14 -0400 (EDT)
Received: from megatron.ietf.org ([132.151.6.71]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1CAlrT-0005ND-EF for avt-archive@ietf.org; Fri, 24 Sep 2004 04:55:27 -0400
Received: from localhost.localdomain ([127.0.0.1] helo=megatron.ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1CAlc9-000778-7c; Fri, 24 Sep 2004 04:39:37 -0400
Received: from odin.ietf.org ([132.151.1.176] helo=ietf.org) by megatron.ietf.org with esmtp (Exim 4.32) id 1CAlbT-0006yT-Ej for avt@megatron.ietf.org; Fri, 24 Sep 2004 04:38:55 -0400
Received: from ietf-mx.ietf.org (ietf-mx.ietf.org [132.151.6.1]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id EAA20266 for <avt@ietf.org>; Fri, 24 Sep 2004 04:38:53 -0400 (EDT)
Received: from mr1.dcs.gla.ac.uk ([130.209.249.184]) by ietf-mx.ietf.org with esmtp (Exim 4.33) id 1CAliP-00058l-M7 for avt@ietf.org; Fri, 24 Sep 2004 04:46:06 -0400
Received: from vpn18.dcs.gla.ac.uk ([130.209.254.18]:54839) by mr1.dcs.gla.ac.uk with esmtpsa (TLSv1:RC4-SHA:128) (Exim 4.42) id 1CAlap-0005jg-LR; Fri, 24 Sep 2004 09:38:15 +0100
In-Reply-To: <4145B1B6.4090000@ericsson.com>
References: <0B08EA1BF5F6304992CDC985EE02209E02A74365@sdebe002.americas.nokia.com> <1225B53B-03EB-11D9-A048-000A957FC5F2@csperkins.org> <41433EE5.3030604@motorola.com> <4145B1B6.4090000@ericsson.com>
Mime-Version: 1.0 (Apple Message framework v619)
Content-Type: text/plain; charset="US-ASCII"; format="flowed"
Message-Id: <6654897A-0D36-11D9-A100-000A957FC5F2@csperkins.org>
Content-Transfer-Encoding: 7bit
From: Colin Perkins <csp@csperkins.org>
Subject: Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate
Date: Thu, 23 Sep 2004 09:58:48 +0200
To: Magnus Westerlund <magnus.westerlund@ericsson.com>
X-Mailer: Apple Mail (2.619)
X-Spam-Score: 0.0 (/)
X-Scan-Signature: a7d2e37451f7f22841e3b6f40c67db0f
Cc: sassan.ahmadi@nokia.com, avt@ietf.org, Qiaobing Xie <Qiaobing.Xie@motorola.com>
X-BeenThere: avt@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Audio/Video Transport Working Group <avt.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/avt>, <mailto:avt-request@ietf.org?subject=unsubscribe>
List-Post: <mailto:avt@ietf.org>
List-Help: <mailto:avt-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/avt>, <mailto:avt-request@ietf.org?subject=subscribe>
Sender: avt-bounces@ietf.org
Errors-To: avt-bounces@ietf.org
X-Scan-Signature: 5ebbf074524e58e662bc8209a6235027

Hi,

On 13 Sep 2004, at 16:41, Magnus Westerlund wrote:
> I think we have two issues:
>
> A. Is there any benefit in indicating or requesting the sampling
> frequency used at the sender?

Yes. This is why RTP has the "rate" parameter, and uses the sampling 
rate as the RTP timestamp rate.

> B. Is it necessary to use the sampling frequency as the RTP timestamp
> rate?

It's highly desirable.

> I will start with A, which I think is easier to explain and can also
> provide some information for issue B. If you find that any of my
> assumptions or statements are incorrect, please correct me.
>
> My understanding of VMR-WB, after a conversation with Jonas Svedberg,
> is that it provides a somewhat better encoding of 8kHz material if it
> is told that the input is 8kHz. However, neither compatibility nor
> decoder operation requires signalling that 8kHz input is fed to the
> encoder. As a result, the only case that needs to be signalled
> between encoder and decoder is the one where the decoder will produce
> output at 8kHz, because if the decoder can request that the encoder
> use 8kHz input, the 8kHz material is encoded somewhat better. In the
> other cases, where the receiver is capable of 16kHz, it doesn't
> matter to the receiver, from a decoding point of view, whether the
> original audio was 8 or 16kHz.
>
> Colin, looking at issue B: is it really necessary to use an RTP
> timestamp frequency equal to the sampling rate used? I would say NO to
> that question.

Yes, it is necessary to use an RTP timestamp rate equal to the sampling
rate.

> My reasoning is the following.
>
> - Much audio input is sampled from the source at a higher rate than
> the encoder can handle. A resampling and pre-processing stage is
> therefore employed to match the encoder's input frequency, rather
> than producing that rate directly in hardware. One reason is that
> the pre-processing may actually yield better results than the
> hardware can achieve at the given input rate. Another reason may be
> that one would like to avoid switching the hardware between rates
> when changing the encoding.
>
> - Frame-based decoders do not need to know the encoder's input
> rate. The encoder may in any case resample the input to other rates
> for internal processing and for band-limited signals. I would claim
> that VMR-WB, AMR-WB+ and AAC are all examples of codecs that perform
> this kind of trick. On the receiver side they produce an output
> signal at whatever sampling frequency the receiver finds most
> useful. This either clips the higher frequencies or, more commonly,
> uses a higher clock rate simply for ease of use, even though no
> more information is provided.
>
> - Frame-based codecs only need an RTP timestamp that allows the
> receiver to correctly reconstruct the timeline when the encoding is
> done with the most audio bandwidth. In the VMR-WB case this is 16kHz.
> AMR-WB+ is even stranger, as we have selected an RTP timestamp rate
> at which every internal sampling frequency results in an integer
> number of timestamp ticks. This allows one to calculate frame
> alignment correctly when the internal sampling frequency changes. We
> also considered that this rate can be converted into several common
> sampling frequencies with few partial-sample alignments.
>
> Thus I would use this to argue that indicating the actual sampling
> frequency is not necessary as long as the receiver is capable of
> correctly reconstructing the media stream, with its timing
> information, in full resolution.

True, but it greatly simplifies the system if all codecs use the 
sampling rate as the RTP clock rate. You can make things work if each 
codec uses a different rate, but it's desirable that RTP is consistent 
where possible. Why is this codec so special that it needs to break 
this rule?
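
To make the two options concrete, here is a rough Python sketch of the per-frame timestamp arithmetic under each scheme. It is only an illustration: the 20 ms frame duration is an assumption stated in the code, and the function names are made up for this example rather than taken from any draft or library.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds


def ts_increment(clock_rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """RTP timestamp ticks covered by one frame at the given clock rate."""
    ticks, remainder = divmod(clock_rate_hz * frame_ms, 1000)
    if remainder:
        raise ValueError("frame does not span an integer number of ticks")
    return ticks


def media_time_s(timestamp: int, first_timestamp: int, clock_rate_hz: int) -> float:
    """Media time in seconds since the first packet, allowing 32-bit wrap."""
    return ((timestamp - first_timestamp) % 2**32) / clock_rate_hz


# Scheme 1: the RTP clock rate follows the sampling rate ("rate" parameter).
per_rate = {rate: ts_increment(rate) for rate in (8000, 16000)}

# Scheme 2: one fixed 16 kHz clock, whatever the input sampling rate is.
fixed = ts_increment(16000)
```

Either way the receiver can rebuild the timeline by dividing by the clock rate it was told about; the difference is whether that rate also equals the number of samples per second the sender is handling.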

> In the VMR-WB case I would think that having only one timestamp rate
> of 16kHz does not affect codec operation and would simplify the
> handling when some senders use 8kHz, especially when gateways need to
> encode 8kHz material from pre-recorded responses in some cases and WB
> channel data in others. This avoids the need to perform RTP timestamp
> rate switches.

But in the process you make senders that support multiple codecs more 
complex, since they can't use the sampling rate to drive the media 
clock for all codecs.
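
As a sketch of what that extra complexity looks like, consider a sender-side timestamp generator. The `MediaClock` class below is hypothetical, written only for this illustration. When the RTP clock rate equals the sampling rate, the timestamp is simply the running sample count; when it does not, every codec with a mismatched clock needs its own scaling step.

```python
class MediaClock:
    """Hypothetical sender-side RTP timestamp generator (illustration only)."""

    def __init__(self, sampling_rate_hz: int, clock_rate_hz: int, initial_ts: int = 0):
        self.sampling_rate_hz = sampling_rate_hz
        self.clock_rate_hz = clock_rate_hz
        self.initial_ts = initial_ts
        self.samples = 0  # samples captured so far

    def advance(self, n_samples: int) -> None:
        """Record that n_samples more samples have been captured."""
        self.samples += n_samples

    @property
    def timestamp(self) -> int:
        if self.clock_rate_hz == self.sampling_rate_hz:
            # Simple case: the sample counter drives the media clock directly.
            ticks = self.samples
        else:
            # Mismatched case: rescale the sample count to clock ticks.
            ticks = self.samples * self.clock_rate_hz // self.sampling_rate_hz
        return (self.initial_ts + ticks) % 2**32


matched = MediaClock(sampling_rate_hz=8000, clock_rate_hz=8000)
matched.advance(160)   # one 20 ms frame of 8 kHz audio

scaled = MediaClock(sampling_rate_hz=8000, clock_rate_hz=16000)
scaled.advance(160)    # same audio, but a fixed 16 kHz RTP clock
```

Here `matched.timestamp` is the sample count itself, while `scaled.timestamp` has to be derived from it.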

> If it is desired that a receiver be able to request that the sender
> use 8kHz input, then one should introduce a MIME parameter for this.
> However, I would like to avoid using the "rate" parameter, as it
> creates unnecessary barriers in the form of signalling and RTP
> timestamp rate switching.

I disagree. Using the rate parameter is consistent with other payload 
formats, and so will simplify the system overall.
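
For illustration, this is roughly how the rate would appear in SDP if it is carried in the rtpmap clock-rate field, as it is for other audio payload formats. The payload type numbers and the "VMR-WB" encoding name below are assumptions made for this sketch, not values taken from the draft.

```python
def rtpmap(payload_type: int, encoding: str, clock_rate_hz: int) -> str:
    """Build an SDP a=rtpmap line for a dynamic payload type."""
    if not 96 <= payload_type <= 127:
        raise ValueError("dynamic payload types are in the range 96-127")
    return f"a=rtpmap:{payload_type} {encoding}/{clock_rate_hz}"


# Assumed encoding name and payload types, one mapping per sampling rate.
lines = [
    rtpmap(97, "VMR-WB", 16000),
    rtpmap(98, "VMR-WB", 8000),
]
```

A receiver that wants 8kHz operation would then offer or select the 8000 Hz mapping, in the same way it would for any other codec that supports more than one rate.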

Colin


-- 
Colin Perkins
http://csperkins.org/

