Re: [codec] #8: Sample rates?

My 2 cents on this thread:

- Going beyond WB quality matters a lot, even for speech.  See, for
example, slides 2 and 3 of
http://www.ietf.org/proceedings/10mar/slides/codec-3.pdf.  I agree with
Ben that we might as well go full-band.

- The codec has 3 sampling rates:
   1. Encoder API
   2. Codec internal (not strictly a sampling rate, more an audio
      bandwidth, as Stephen pointed out)
   3. Decoder API
I don't see a reason to require that some or all of these be identical.
For (conference) mixing of several streams it helps if all decoders
run at the same API sampling rate.  It would be unreasonable to require
that every encoder also run at that API sampling rate.  Some encoders
may for instance sit in a PSTN gateway, dealing strictly with
narrowband signals.
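
To illustrate the three rates, here is a minimal sketch in Python. All
names and numbers (StreamRates, mixable, the 4 kHz and 20 kHz bandwidths)
are assumptions made up for this example, not part of any proposal or
existing API. It only shows that encoder and decoder API rates can
differ per stream while a mixer still sees one common decoder rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamRates:
    """Hypothetical description of one stream's three rates."""
    encoder_api_hz: int      # rate of the PCM handed to the encoder
    coded_bandwidth_hz: int  # audio bandwidth coded internally (illustrative)
    decoder_api_hz: int      # rate of the PCM the decoder hands back


def samples_per_frame(rate_hz: int, frame_ms: int = 20) -> int:
    """Number of PCM samples in one frame at the given API rate."""
    return rate_hz * frame_ms // 1000


def mixable(streams) -> bool:
    """True if all decoders output at one API rate, so a mixer can sum them."""
    return len({s.decoder_api_hz for s in streams}) == 1


# A PSTN gateway encoder fed with narrowband audio, and a desktop client.
pstn_gateway = StreamRates(8000, 4000, 48000)
desktop = StreamRates(48000, 20000, 48000)
```

With these values the gateway encoder consumes 160 samples per 20 ms
frame and the desktop encoder 960, yet both streams decode at 48 kHz and
can be mixed without resampling in the mixer.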

- To keep RTP timestamps simple, they could always be based on the  
same sampling rate, irrespective of any of the ones above. Maybe 48  
kHz is a good choice?
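
A small sketch of what a fixed timestamp clock would mean in practice.
The 48 kHz value is the suggestion above; the function names are
hypothetical. The only outside fact used is that RTP timestamps are
32-bit and wrap around.

```python
RTP_CLOCK_HZ = 48000  # fixed timestamp clock, independent of any audio rate


def rtp_timestamp_increment(frame_ms: int) -> int:
    """Timestamp ticks per frame; depends only on the frame duration."""
    return RTP_CLOCK_HZ * frame_ms // 1000


def next_timestamp(timestamp: int, frame_ms: int) -> int:
    """Advance a 32-bit RTP timestamp by one frame, with wrap-around."""
    return (timestamp + rtp_timestamp_increment(frame_ms)) % 2**32
```

A 20 ms frame always advances the timestamp by 960 ticks, whether the
encoder input was narrowband or full-band, so jitter buffers never need
to know about a bandwidth switch.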

- Sometimes a decoder runs on hardware with limited audio bandwidth
playback capabilities (e.g. mobile devices).  In those cases it helps
if the decoder can request a maximum internal audio bandwidth from the
encoder during call setup.  Otherwise the encoder may be wasting bits
on unused audio spectrum.  So I agree with Christian on this.
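
As a sketch of the idea only: the encoder caps its internal audio
bandwidth at whatever the decoder asked for during call setup. The
bandwidth labels and the function name are assumptions for this
example; how the request would be carried in signaling is deliberately
left open.

```python
# Ordered from narrowest to widest; labels are illustrative.
BANDWIDTHS = ["narrowband", "wideband", "superwideband", "fullband"]


def effective_max_bandwidth(encoder_max, decoder_requested_max=None):
    """Return the widest bandwidth the encoder should code.

    If the decoder made no request, the encoder's own maximum applies.
    Otherwise the narrower of the two limits wins.
    """
    if decoder_requested_max is None:
        return encoder_max
    return min(encoder_max, decoder_requested_max, key=BANDWIDTHS.index)
```

For example, a full-band capable encoder talking to a mobile device
that requested wideband would code wideband only, and spend its bits
there.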

- Agree with Raymond that fast switching of audio bandwidth sounds  
unpleasant.  SILK has a hysteresis mechanism for this, and we rarely  
get more than one or two switches during a Skype call.
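
To make "hysteresis" concrete, here is a generic sketch. It is NOT the
SILK mechanism, whose details are not given here; the class name, the
thresholds and the hold time are all made-up values. It combines two
common ingredients: a dead band around the switching threshold and a
minimum hold time after each switch.

```python
class BandwidthSwitcher:
    """Generic two-state hysteresis switch (illustrative, not SILK's)."""

    def __init__(self, threshold_bps=20000, margin_bps=4000,
                 min_hold_frames=250):
        self.threshold_bps = threshold_bps
        self.margin_bps = margin_bps
        self.min_hold_frames = min_hold_frames
        self.wide = False                           # start in the narrower mode
        self.frames_since_switch = min_hold_frames  # first switch is not delayed
        self.switch_count = 0

    def update(self, available_bps):
        """Feed one frame's bitrate estimate; return True if in wide mode."""
        self.frames_since_switch += 1
        if self.frames_since_switch <= self.min_hold_frames:
            return self.wide  # still inside the hold period
        go_up = (not self.wide
                 and available_bps > self.threshold_bps + self.margin_bps)
        go_down = (self.wide
                   and available_bps < self.threshold_bps - self.margin_bps)
        if go_up or go_down:
            self.wide = not self.wide
            self.frames_since_switch = 0
            self.switch_count += 1
        return self.wide
```

A bitrate estimate that jitters around the threshold stays inside the
dead band and causes no switch at all, and after a real switch the mode
is held for min_hold_frames frames regardless of the input.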

best,
koen.

Quoting stephen botzko <stephen.botzko@gmail.com>:

> I agree.
>
> The changes in audio bandwidth will certainly be audible, and personally I
> would prefer a stable experience.
>
> Also, increasing the audio bandwidth might result in audible temporary echo
> (in the newly opened frequencies), since the acoustic echo cancellers will
> often need to re-adapt.  The acoustic feedback paths can change pretty
> quickly at higher frequencies.
>
> Stephen Botzko
>
> On Tue, Apr 13, 2010 at 3:23 PM, Raymond (Juin-Hwey) Chen <
> rchen@broadcom.com> wrote:
>
>>  I would agree with Roman that for speech the difference between wideband
>> (16 kHz sampling) and super-wideband/full-band (32 ~ 48 kHz sampling) is
>> there but very small and many people cannot even distinguish between them,
>> while for music the difference can be much more noticeable. From all the
>> codec WG emails dating back to last year, it appears to me that most people
>> want the IETF codec to handle music as well as speech.  Therefore, it seems
>> appropriate to have the maximum sampling rate up to 48 kHz if we want the
>> codec to handle music.
>>
>>
>>
>> Regarding the dynamic switching of the sampling rate or audio bandwidth, if
>> this is to be done, I think we need to be careful not to change the audio
>> bandwidth too frequently, otherwise the frequent change of the audio
>> bandwidth can be quite disturbing to the listener.  It would certainly be
>> disturbing to me personally if the audio bandwidth changes more frequently
>> than once every few seconds, but if it changes once every few minutes,
>> that's probably tolerable to me.
>>
>>
>>
>> Raymond
>>
>>
>>
>> *From:* codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] *On Behalf
>> Of *stephen botzko
>> *Sent:* Tuesday, April 13, 2010 11:43 AM
>> *To:* Roman Shpount
>>
>> *Cc:* codec@ietf.org
>> *Subject:* Re: [codec] #8: Sample rates?
>>
>>
>>
>> >>>
>>
>> I will not argue superwideband vs wideband, even though we did some, not
>> very scientific, blind tests, and in most cases people (especially men)
>> cannot even distinguish wideband from superwideband when listening to
>> voice samples. Only a very small percentage of voice energy is even present
>> above 8 kHz.
>>
>> >>>
>> Though there isn't much voice energy over 8 kHz, in our (equally
>> unscientific) tests, sibilants and fricatives are easier to distinguish if
>> you are using superwideband or better.  That was one reason we added Annex C
>> to G.722.1.
>>
>> Fullband is (IMHO) a specsmanship thing for speech (and probably for music
>> also for most of us).  Though it may not be that hard to get it, if we are
>> shooting for superwideband anyway.
>>
>> Stephen Botzko
>>
>>  On Tue, Apr 13, 2010 at 2:11 PM, Roman Shpount <roman@telurix.com> wrote:
>>
>> I will not argue superwideband vs wideband, even though we did some, not
>> very scientific, blind tests, and in most cases people (especially men)
>> cannot even distinguish wideband from superwideband when listening to voice
>> samples. Only a very small percentage of voice energy is even present above
>> 8 kHz.
>>
>> Music on the other hand, especially past 24 kHz sampling rate, gets affected
>> by the CODEC more than by bandwidth. For instance 8 kHz G.711 encoded music
>> sounds poor, but reasonable, and G.729 encoded music cannot be listened to.
>> In most cases (apart from critical listening), music sampled at 16 kHz is
>> acceptable, especially for generation iPod.
>>
>> My remark about RTCP was to try to develop a CODEC that will function
>> properly with RTCP absent. If we require RTCP based mechanisms in order for
>> the CODEC to operate properly, this can impede the adoption of this CODEC.
>> In no way do I propose to create new signaling mechanisms.
>>
>>
>> ______________________________
>> Roman Shpount - www.telurix.com
>>
>>    On Tue, Apr 13, 2010 at 1:29 PM, stephen botzko <
>> stephen.botzko@gmail.com> wrote:
>>
>> Superwideband (and even fullband) do make speech somewhat more
>> intelligible, and also reduce listener fatigue.  Telepresence and other
>> videoconferencing equipment use those acoustic bandwidths today, so it would
>> be nice if CODEC supported at least superwideband also.
>>
>> Personally I see some value in carriage of music.  Sometimes our equipment
>> is used for music performance.  Distance learning is another use case where
>> music has some value, since course and training materials frequently do
>> include videos with music.  Though of course conversational speech is the
>> dominant use case.
>>
>> BTW, Videoconferencing devices do almost always support RTCP.  It is
>> regrettable that so many VOIP devices do not.  Anyway, I do not think our
>> charter scope includes invention of a new mechanism for signaling the
>> network quality.
>>
>> Stephen Botzko
>>
>>
>>
>> On Tue, Apr 13, 2010 at 12:41 PM, Roman Shpount <roman@telurix.com> wrote:
>>
>> I am not sure if this was decided, but should this new CODEC support music
>> encoding? If we don't plan to support music, we should probably stick to a
>> 16 kHz sampling rate. If we need music, I would suggest having a 24 kHz (or
>> higher) sampling rate variant. I am not sure how many people here care about
>> a non-voice CODEC. For all practical purposes I don't. I would argue,
>> at least, for a fixed 16 kHz sampling rate CODEC variant.
>>
>> P.S. On the same note, does anybody here care about using this CODEC with
>> multicast? Is there a single commercial multicast voice deployment? From
>> what I've seen, all multicast does is make IETF voice standards harder to
>> understand or implement.
>>
>> P.P.S. RTCP is almost universally not implemented. The biggest VoIP gateway
>> on the market does not generate RTCP. If we rely on any RTCP
>> functionality for bandwidth control it will probably be ignored.
>> ______________________________
>> Roman Shpount -  www.telurix.com
>>
>>   On Tue, Apr 13, 2010 at 12:26 PM, stephen botzko <
>> stephen.botzko@gmail.com> wrote:
>>
>>  TCP is a different case, since for this we are using RTCP to signal our
>> feedback, and I don't think it has the facility you are envisioning.
>>
>> Also, I disagree with your presumption that multicast is out of scope.  I
>> don't know of any other packetization RFCs that expressly rule out
>> multicast, and multicast can be used for interactive applications.
>>
>> This concept seems pretty theoretical to me.  If we need to manage
>> complexity / quality tradeoffs, why not just use profiles (as AVC/H.264
>> does) or create a low complexity variant (like G.729A).  I really don't see
>> the need for *dynamic* complexity management.
>>
>> BTW, you seem to be assuming that a lower sample rate results in
>> significantly less complexity.  The savings there might not be as great as
>> you think, especially if the receiver needs to resample anyway (to prevent
>> those sound card limitations you were talking about before).
>>
>> Stephen Botzko
>>
>> On Tue, Apr 13, 2010 at 11:18 AM, Christian Hoene <hoene@uni-tuebingen.de>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> comments inline:
>>
>>
>>
>>
>>
>> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
>> *Sent:* Tuesday, April 13, 2010 4:56 PM
>> *To:* Christian Hoene
>> *Cc:* codec@ietf.org
>>
>>
>> *Subject:* Re: [codec] #8: Sample rates?
>>
>>
>>
>> This would make the signaling more complicated - personally I am not
>> convinced it is worth it.
>>
>> CH: It is a difficult tradeoff. However, signaling overload is done in
>> Skype.  Such signaling might be very useful for mobile devices, which
>> want to save power and thus lower their CPU clock, or for wireless IP based
>> headphones, which do not have large batteries. I am thinking of signaling the
>> states: overloaded, fine, and low. That should be enough for most
>> operational cases.
>>
>>
>> I think a better avenue is to bound overall complexity, and to focus on
>> dynamically adapting to network conditions (as opposed to dynamic complexity
>> management).
>>
>> CH: I would just like to remind you that good old TCP does support both:
>> congestion control to adapt to network conditions, and flow control to take
>> into account an overloaded (=full) receiver.
>>
>> You can't dynamically negotiate complexity in many scenarios anyway - for
>> instance it makes no sense if you are using multicast.
>>
>> CH: Multicast is out of scope anyhow. We are considering an interactive
>> codec.
>>
>> CH: The conferencing scenario might be somewhat more difficult to handle
>> but will not be a big problem.
>>
>> Christian
>>
>>
>>
>> Stephen Botzko
>>
>> On Tue, Apr 13, 2010 at 10:42 AM, Christian Hoene <hoene@uni-tuebingen.de>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> It still might make sense to negotiate the maximal supported sampling rate
>> via SDP or, if possible, to select one out of multiple sampling rates, if
>> the audio receiver can cope with multiple rates well. The internal sampling
>> frequency of the codec NEED NOT be affected by the external sampling
>> frequency.
>>
>>
>>
>> However, the decoder might want to signal to the encoder that decoding
>> requires too many computational resources and that a less complex coding
>> mode (or a lower sampling frequency) should be used.
>>
>>
>>
>> Christian
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------
>>
>> Dr.-Ing. Christian Hoene
>>
>> Interactive Communication Systems (ICS), University of Tübingen
>>
>> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
>> http://www.net.uni-tuebingen.de/
>>
>>
>>
>> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
>> *Sent:* Tuesday, April 13, 2010 3:21 PM
>> *To:* Kevin P. Fleming
>> *Cc:* Christian Hoene; codec@ietf.org
>> *Subject:* Re: [codec] #8: Sample rates?
>>
>>
>>
>> Though I generally avoid MAY, this could be a case where it makes sense.
>>
>> Something like:
>>
>> CODEC MAY reduce the acoustic bandwidth at lower bit rates in order to
>> optimize audio quality.
>>
>> This is free of any technology assumption about *how* the acoustic
>> bandwidth is reduced.  The MAY indicates that it is permissible.  But if the
>> CODEC algorithm doesn't need to reduce the acoustic bandwidth, then we are
>> making no statement that it SHOULD (or SHOULD NOT).
>>
>> Kevin is distinguishing dynamic changes to the sample rate (for bandwidth
>> management) from multiple fixed sample rates; and I agree that is a key
>> distinction.
>>
>> I have not heard any clear application requirement for more than one fixed
>> sampling rate.  Though if there is such a requirement, IMHO we would have to
>> negotiate the rate within SDP in the usual way, and it would affect the RTP
>> timestamps, jitter buffers, etc.  G.722.1 / G.722.1C is one precedent - it
>> is the same core codec, but can run at two different sample rates
>> (negotiated by SDP).
>>
>> Stephen Botzko
>>
>> On Tue, Apr 13, 2010 at 7:41 AM, Kevin P. Fleming <kpfleming@digium.com>
>> wrote:
>>
>> stephen botzko wrote:
>>
>> > Dynamically changing sample rates on the system level adds some
>> > complexity for RTP, since the timestamp granularity is supposed to be
>> > the sample rate.
>>
>> And jitter buffers, and anything else that is based on timestamps and
>> sample rates/counts. If the desire is for the codec to be able to change
>> sample rates to adjust to network conditions, then I agree with
>> Stephen... the 'external' sample rate (input to the encoder and output
>> from the decoder) should be fixed, and this is what would be negotiated
>> in SDP and used for RTP timestamps. The codec can downsample in the
>> encoder and upsample in the decoder if it has decided to transmit fewer
>> bits across the network.
>>
>> --
>> Kevin P. Fleming
>> Digium, Inc. | Director of Software Technologies
>> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
>> skype: kpfleming | jabber: kfleming@digium.com
>> Check us out at www.digium.com & www.asterisk.org
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> codec mailing list
>> codec@ietf.org
>> https://www.ietf.org/mailman/listinfo/codec
>>