Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate

Qiaobing Xie <Qiaobing.Xie@motorola.com> Thu, 09 September 2004 07:56 UTC

Message-ID: <41400C56.5030000@motorola.com>
Date: Thu, 09 Sep 2004 15:55:02 +0800
From: Qiaobing Xie <Qiaobing.Xie@motorola.com>
To: Magnus Westerlund <magnus.westerlund@ericsson.com>
Subject: Re: [AVT] RE: <draft-ietf-avt-rtp-vmr-wb-03.txt>: sampling rate
References: <0B08EA1BF5F6304992CDC985EE02209E02A7435F@sdebe002.americas.nokia.com> <413EEED6.8070503@ericsson.com>
In-Reply-To: <413EEED6.8070503@ericsson.com>
Cc: csp@csperkins.org, avt@ietf.org, sassan.ahmadi@nokia.com

Hello, Magnus,

Magnus Westerlund wrote:

> Hi Sassan,
> 
> Based on what you wrote in the previous mail, it seems that the only 
> reason for using different RTP timestamp rate between 8000 and 16000 Hz 
> is to indicate the sampling rate of the source material. If the codec 
> does not need any indication at all if the source material is 8k or 16k 
> then, I think the usage of different RTP timestamp rates is creating 
> unnecessary interoperability barriers. The barrier is that one actually 
> needs to indicate the rate of the source material, and cope with RTP 
> timestamp switching.

Right on! You nailed the issue perfectly.

> 
> To avoid the unnecessary function I would propose that VMR-WB only 
> defines 16kHz as RTP timestamp rate. 

Agreed. This would effectively remove the interoperability barrier you pointed out above.

My only concern is that this may create some odd situations. Consider an example: 
original speech sampled at 8 kHz is passed to the VMR-WB encoder, and the decoder is set to 
output speech at 8 kHz.

Here, we would then have:

  - source sampling rate = 8k
  - actual sampling rate of the bit stream sent over RTP = 12.8k
  - sampling rate output from vmr-wb = 8k
  - RTP header timestamp rate = 16k!!!

I am not sure this will cause any problem, but it seems strange.
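
To make the numbers concrete, here is a rough sketch of the per-frame arithmetic for the example above. It assumes 20 ms frames (the frame length is not stated in this thread), and the helper names are made up for illustration only; nothing here comes from the draft.

```python
FRAME_MS = 20  # assumption: one coded frame covers 20 ms of speech


def samples_per_frame(rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Samples (or clock ticks) that span one frame at the given rate."""
    return rate_hz * frame_ms // 1000


# The four rates from the example above, in Hz.
rates = {
    "source": 8000,
    "internal": 12800,
    "decoder output": 8000,
    "RTP timestamp clock": 16000,
}
per_frame = {name: samples_per_frame(hz) for name, hz in rates.items()}


def rtp_timestamp(first_ts: int, frame_index: int, clock_hz: int = 16000) -> int:
    """Timestamp of the n-th frame; RTP timestamps are 32 bits and wrap."""
    return (first_ts + frame_index * samples_per_frame(clock_hz)) % 2**32


def media_time_ms(ts_delta: int, clock_hz: int = 16000) -> int:
    """Convert a timestamp difference back to milliseconds of media time."""
    return ts_delta * 1000 // clock_hz


for name, count in per_frame.items():
    print(f"{name}: {count} per {FRAME_MS} ms frame")
```

Under these assumptions the timestamp advances by the same amount per frame whatever the source or output rate is, and the receiver can recover media time from the fixed clock alone. That is why the mismatch looks strange but may well be harmless.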

> If there is desire to have
> knowledge about source sampling rate that will be used, then one should 
> define a parameter that indicates that. But I am not certain it really 
> is needed. Such a parameter is declarative and does not matter in 
> regards to any interoperability and can be ignored without consequence. 

I, too, would like to see a use case first. If we don't know how the information is 
going to be used, it makes no sense to specify a mechanism in RTP or even SDP to pass it around.

regards,
-Qiaobing

> Or is it something else about the codec that prevents this? I would not 
> think so as the file format can be fine without an explicit indication 
> of the source sampling rate.
> 
> Cheers
> 
> Magnus
> 
> 
> sassan.ahmadi@nokia.com wrote:
> 
>> Hi Qiaobing,
>>
>>
>>> Is it true that all the coded frames output from a VMR-WB __encoder__ 
>>> use the 12.8k sampling rate, independent of the original sampling 
>>> rate of the speech?
>>
>> The above statement is true. However, I want to make sure that it is 
>> not misinterpreted.
>>
>> The VMR-WB encoder converts the 8 or 16 kHz sampled input speech to 
>> 12.8 kHz prior to the encoding functions. This INTERNAL sampling 
>> frequency is transparent (hidden) to the user. The bit stream 
>> generated by the encoder is then transmitted to the VMR-WB decoder.
>>
>> The VMR-WB decoding functions are independent of the encoder input 
>> speech sampling frequency. By default, the VMR-WB decoder generates a 
>> wideband output, unless instructed otherwise. The internal sampling 
>> frequency must now be converted to 16 kHz (for wideband output) and 
>> the higher frequency band (6.4 to 7 kHz spectrum) must be 
>> reconstructed by the decoder. If a narrowband output is desired then 
>> 12.8 kHz sampling frequency must be converted to 8 kHz. Therefore, you 
>> CANNOT use the 12.8 kHz internal sampling frequency for any other 
>> purposes than the encoding-decoding functions.
>> Depending on the output audio interface (or the network interface), 
>> one may wish to instruct the decoder to generate a narrowband or 
>> wideband output.
>>
>> For proper operation, the RTP timestamp clock rate must be either 8000 
>> or 16000 depending on the narrowband or wideband operation, 
>> respectively. The 12800 Hz internal sampling rate CANNOT be used for 
>> the RTP timestamp clock rate. The correct timestamp or clock rate 
>> (8000 or 16000) is required for proper buffering and other functions 
>> in the transmitting and receiving sites.
>>
>> cdma2000 Service Option 62 (VMR-WB) also recognizes only 8000 or 16000 
>> Hz sampling frequencies.
>>
>> Since VMR-WB and AMR-WB codecs share the same core technology, the 
>> concept of 12800 Hz internal sampling frequency is used in both 
>> codecs. As you see in AMR-WB RFC and 3GPP specs, there is no external 
>> usage of the internal sampling frequency and the default RTP clock 
>> rate for the AMR-WB codec is 16000 Hz.
>>
>> Regards
>>
>> -Sassan Ahmadi
>>
> 
> 

