Re: [codec] #8: Sample rates?

I would agree with Roman that for speech the difference between wideband (16 kHz sampling) and super-wideband/full-band (32 ~ 48 kHz sampling) is there but very small, and many people cannot even distinguish between them, while for music the difference can be much more noticeable. From all the codec WG emails dating back to last year, it appears to me that most people want the IETF codec to handle music as well as speech.  Therefore, it seems appropriate to have a maximum sampling rate of up to 48 kHz if we want the codec to handle music.

Regarding the dynamic switching of the sampling rate or audio bandwidth, if this is to be done, I think we need to be careful not to change the audio bandwidth too frequently, otherwise the frequent change of the audio bandwidth can be quite disturbing to the listener.  It would certainly be disturbing to me personally if the audio bandwidth changes more frequently than once every few seconds, but if it changes once every few minutes, that's probably tolerable to me.
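One way to keep bandwidth changes infrequent is a minimum hold time between switches. The following is only a sketch under stated assumptions: the class name, the 60-second default, and the bandwidth values are illustrative and do not come from any codec proposal.

```python
class BandwidthSwitchLimiter:
    """Hold the current audio bandwidth for a minimum time before changing it."""

    def __init__(self, initial_hz, min_hold_s=60.0):
        self.current_hz = initial_hz
        self.min_hold_s = min_hold_s   # assumed value; tune by listening tests
        self._last_switch_s = None     # no switch has happened yet

    def request(self, wanted_hz, now_s):
        """Return the bandwidth to use at time now_s (in seconds)."""
        if wanted_hz == self.current_hz:
            return self.current_hz
        held_long_enough = (
            self._last_switch_s is None
            or now_s - self._last_switch_s >= self.min_hold_s
        )
        if held_long_enough:
            self.current_hz = wanted_hz
            self._last_switch_s = now_s
        # Otherwise the request is ignored and the old bandwidth is kept.
        return self.current_hz
```

With a 60-second hold, a request to switch back 5 seconds after a change is ignored, and the same request is honoured once the full minute has passed.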

Raymond

From: codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] On Behalf Of stephen botzko
Sent: Tuesday, April 13, 2010 11:43 AM
To: Roman Shpount
Cc: codec@ietf.org
Subject: Re: [codec] #8: Sample rates?

>>>
I will not argue superwideband vs wideband, even though we did some, not very scientific, blind tests, and in most cases people (especially men) cannot even distinguish wideband from superwideband when listening to voice samples. Only a very small percentage of voice energy is even present above 8 kHz.
>>>
Though there isn't much voice energy over 8 kHz, in our (equally unscientific) tests, sibilants and fricatives are easier to distinguish if you are using superwideband or better.  That was one reason we added Annex C to G.722.1.

Fullband is (IMHO) a specsmanship thing for speech (and probably for music also for most of us).  Though it may not be that hard to get it, if we are shooting for superwideband anyway.

Stephen Botzko

On Tue, Apr 13, 2010 at 2:11 PM, Roman Shpount <roman@telurix.com<mailto:roman@telurix.com>> wrote:
I will not argue superwideband vs wideband, even though we did some, not very scientific, blind tests, and in most cases people (especially men) cannot even distinguish wideband from superwideband when listening to voice samples. Only a very small percentage of voice energy is even present above 8 kHz.

Music, on the other hand, especially past a 24 kHz sampling rate, is affected more by the CODEC than by bandwidth. For instance, 8 kHz G.711-encoded music sounds poor but reasonable, while G.729-encoded music cannot be listened to. In most cases (apart from critical listening), music sampled at 16 kHz is acceptable, especially for the iPod generation.

My remark about RTCP was to try to develop a CODEC that will function properly with RTCP absent. If we require RTCP-based mechanisms in order for the CODEC to operate properly, this can impede the adoption of this CODEC. In no way do I propose to create new signaling mechanisms.

______________________________
Roman Shpount - www.telurix.com<http://www.telurix.com>

On Tue, Apr 13, 2010 at 1:29 PM, stephen botzko <stephen.botzko@gmail.com<mailto:stephen.botzko@gmail.com>> wrote:
Superwideband (and even fullband) do make speech somewhat more intelligible, and also reduce listener fatigue.  Telepresence and other videoconferencing equipment use those acoustic bandwidths today, so it would be nice if the CODEC supported at least superwideband also.

Personally I see some value in carriage of music.  Sometimes our equipment is used for music performance.  Distance learning is another use case where music has some value, since course and training materials frequently do include videos with music.  Though of course conversational speech is the dominant use case.

BTW, Videoconferencing devices do almost always support RTCP.  It is regrettable that so many VOIP devices do not.  Anyway, I do not think our charter scope includes invention of a new mechanism for signaling the network quality.

Stephen Botzko

On Tue, Apr 13, 2010 at 12:41 PM, Roman Shpount <roman@telurix.com<mailto:roman@telurix.com>> wrote:
I am not sure if this was decided, but should this new CODEC support music encoding? If we don't plan to support music, we should probably stick to a 16 kHz sampling rate. If we need music, I would suggest having a 24 kHz (or higher sampling rate) variant. I am not sure how many people here care about a non-voice CODEC. For all practical purposes I don't. I would argue, at least, for a fixed 16 kHz sampling rate CODEC variant.

P.S. On the same note, does anybody here care about using this CODEC with multicast? Is there a single commercial multicast voice deployment? From what I've seen, all multicast does is make IETF voice standards harder to understand or implement.

P.P.S. RTCP is almost universally not implemented. The biggest VoIP gateway on the market does not generate RTCP. If we rely on any RTCP functionality for bandwidth control, it will probably be ignored.
______________________________
Roman Shpount -  www.telurix.com<http://www.telurix.com>

On Tue, Apr 13, 2010 at 12:26 PM, stephen botzko <stephen.botzko@gmail.com<mailto:stephen.botzko@gmail.com>> wrote:
TCP is a different case, since for this we are using RTCP to signal our feedback, and I don't think it has the facility you are envisioning.

Also, I disagree with your presumption that multicast is out of scope.  I don't know of any other packetization RFCs that expressly rule out multicast, and multicast can be used for interactive applications.

This concept seems pretty theoretical to me.  If we need to manage complexity / quality tradeoffs, why not just use profiles (as AVC/H.264 does) or create a low complexity variant (like G.729A).  I really don't see the need for dynamic complexity management.

BTW, you seem to be assuming that a lower sample rate results in significantly less complexity.  The savings there might not be as great as you think, especially if the receiver needs to resample anyway (to prevent those sound card limitations you were talking about before).

Stephen Botzko
On Tue, Apr 13, 2010 at 11:18 AM, Christian Hoene <hoene@uni-tuebingen.de<mailto:hoene@uni-tuebingen.de>> wrote:
Hi,

comments inline:

From: stephen botzko [mailto:stephen.botzko@gmail.com<mailto:stephen.botzko@gmail.com>]
Sent: Tuesday, April 13, 2010 4:56 PM
To: Christian Hoene
Cc: codec@ietf.org<mailto:codec@ietf.org>

Subject: Re: [codec] #8: Sample rates?

This would make the signaling more complicated - personally I am not convinced it is worth it.
CH: It is a difficult tradeoff. However, signaling overload is done in Skype.  Such signaling might be very useful for mobile devices, which want to save power and thus lower their CPU clock. Or wireless IP-based headphones, which do not have large batteries. I am thinking of signaling the states: overloaded, fine, and low. That should be enough for most operational cases.

I think a better avenue is to bound overall complexity, and to focus on dynamically adapting to network conditions (as opposed to dynamic complexity management).
CH: I would just like to point out that good old TCP supports both: congestion control to adapt to network conditions, and flow control to take into account an overloaded (=full) receiver.
You can't dynamically negotiate complexity in many scenarios anyway - for instance it makes no sense if you are using multicast.
CH: Multicast is out of scope anyhow. We are considering an interactive codec.
CH: The conferencing scenario might be somewhat more difficult to handle but will not be a big problem.
Christian

Stephen Botzko
On Tue, Apr 13, 2010 at 10:42 AM, Christian Hoene <hoene@uni-tuebingen.de<mailto:hoene@uni-tuebingen.de>> wrote:
Hi,

It still might make sense to negotiate the maximal supported sampling rate via SDP or, if possible, to select one out of multiple sampling rates, if the audio receiver can cope with multiple rates well. The internal sampling frequency of the codec NEED NOT be affected by the external sampling frequency.

However, the decoder might want to signal to the encoder that the decoding is requiring too many computational resources and that a less complex coding mode (or a lower sampling frequency) should be taken.
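A minimal sketch of how an encoder might react to such a decoder report, using the three states (overloaded, fine, low) suggested elsewhere in this thread. The mode names and the one-step-at-a-time policy are assumptions made for illustration; no signaling format is implied.

```python
from enum import Enum


class DecoderLoad(Enum):
    """Load report sent by the decoder (states as named in this thread)."""
    OVERLOADED = "overloaded"
    FINE = "fine"
    LOW = "low"


# Hypothetical coding modes, ordered from cheapest to most expensive to decode.
MODES = ["low-complexity", "normal", "high-quality"]


def next_mode_index(current, load):
    """Move one step down on overload, one step up on low load, else stay."""
    if load is DecoderLoad.OVERLOADED:
        return max(current - 1, 0)
    if load is DecoderLoad.LOW:
        return min(current + 1, len(MODES) - 1)
    return current
```

Stepping one mode at a time keeps the reaction gentle; a real design would also need to rate-limit the reports themselves.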

Christian

---------------------------------------------------------------
Dr.-Ing. Christian Hoene
Interactive Communication Systems (ICS), University of Tübingen
Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
http://www.net.uni-tuebingen.de/

From: stephen botzko [mailto:stephen.botzko@gmail.com<mailto:stephen.botzko@gmail.com>]
Sent: Tuesday, April 13, 2010 3:21 PM
To: Kevin P. Fleming
Cc: Christian Hoene; codec@ietf.org<mailto:codec@ietf.org>
Subject: Re: [codec] #8: Sample rates?

Though I generally avoid MAY, this could be a case where it makes sense.

Something like:

CODEC MAY reduce the acoustic bandwidth at lower bit rates in order to optimize audio quality.

This is free of any technology assumption about how the acoustic bandwidth is reduced.  The MAY indicates that it is permissible.  But if the CODEC algorithm doesn't need to reduce the acoustic bandwidth, then we are making no statement that it SHOULD (or SHOULD NOT).

Kevin is distinguishing dynamic changes to the sample rate (for bandwidth management) from multiple fixed sample rates; and I agree that is a key distinction.

I have not heard any clear application requirement for more than one fixed sampling rate.  Though if there is such a requirement, IMHO we would have to negotiate the rate within SDP in the usual way, and it would affect the RTP timestamps, jitter buffers, etc.  G.722.1 / G.722.1C is one precedent - it is the same core codec, but can run at two different sample rates (negotiated by SDP).
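As a rough illustration of the G.722.1 / G.722.1C precedent, the two sample rates show up as two rtpmap clock rates in SDP. The sketch below parses such lines and picks the highest rate both sides support. The payload type numbers and the helper names are illustrative assumptions, and real offer/answer negotiation involves more than this selection.

```python
def parse_rtpmap(line):
    """Parse 'a=rtpmap:<pt> <name>/<clock>[/<channels>]' into (pt, name, clock)."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute: " + line)
    payload_type, encoding = line[len(prefix):].split(None, 1)
    parts = encoding.strip().split("/")
    return int(payload_type), parts[0], int(parts[1])


def pick_clock_rate(offer_lines, supported_rates):
    """Return the highest offered clock rate the receiver supports, or None."""
    offered = {parse_rtpmap(line)[2] for line in offer_lines}
    common = offered & set(supported_rates)
    return max(common) if common else None


# Same core codec offered at two sample rates (dynamic payload types assumed).
offer = [
    "a=rtpmap:121 G7221/16000",
    "a=rtpmap:115 G7221/32000",
]
```

A receiver that only handles 16 kHz would end up with the 16000 entry, while one that handles both would get 32000.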

Stephen Botzko
On Tue, Apr 13, 2010 at 7:41 AM, Kevin P. Fleming <kpfleming@digium.com<mailto:kpfleming@digium.com>> wrote:
stephen botzko wrote:

> Dynamically changing sample rates on the system level adds some
> complexity for RTP, since the timestamp granularity is supposed to be
> the sample rate.
And jitter buffers, and anything else that is based on timestamps and
sample rates/counts. If the desire is for the codec to be able to change
sample rates to adjust to network conditions, then I agree with
Stephen... the 'external' sample rate (input to the encoder and output
from the decoder) should be fixed, and this is what would be negotiated
in SDP and used for RTP timestamps. The codec can downsample in the
encoder and upsample in the decoder if it has decided to transmit fewer
bits across the network.
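
The point above can be sketched as follows: the RTP timestamp advances by a step derived only from the negotiated (external) clock rate, whatever internal rate the codec picks per frame. The function names, the 20 ms frame size, and the rates are assumptions for illustration.

```python
def rtp_timestamp_step(clock_rate_hz, frame_ms):
    """Timestamp increment per frame, in units of the negotiated clock rate."""
    return clock_rate_hz * frame_ms // 1000


def timestamps(clock_rate_hz, frame_ms, internal_rates_hz, start=0):
    """Return (rtp_timestamp, internal_rate) per frame.

    internal_rates_hz lists the rate the codec chose internally for each
    frame; it is carried along for display only and never changes the step.
    """
    step = rtp_timestamp_step(clock_rate_hz, frame_ms)
    ts = start
    out = []
    for internal_rate in internal_rates_hz:
        out.append((ts, internal_rate))
        ts = (ts + step) & 0xFFFFFFFF  # RTP timestamps are 32-bit and wrap
    return out
```

With a 48000 Hz negotiated clock and 20 ms frames the step is always 960, even while the internal rate drops to 16000 and then 8000, so jitter buffers and other timestamp-based logic are unaffected.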

--
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
skype: kpfleming | jabber: kfleming@digium.com<mailto:kfleming@digium.com>
Check us out at www.digium.com<http://www.digium.com> & www.asterisk.org<http://www.asterisk.org>

_______________________________________________
codec mailing list
codec@ietf.org<mailto:codec@ietf.org>
https://www.ietf.org/mailman/listinfo/codec