Re: [codec] #8: Sample rates?

I agree.

The changes in audio bandwidth will certainly be audible, and personally I
would prefer a stable experience.
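One way to get that stability is a minimum hold time between bandwidth
switches. The following is only a minimal sketch: the class name
BandwidthSwitcher and the 60 second default are hypothetical, chosen for
illustration, and do not come from any codec proposal on this list.

```python
class BandwidthSwitcher:
    """Hold the current audio bandwidth for a minimum time before switching."""

    def __init__(self, initial_hz, min_hold_s=60.0):
        self.current_hz = initial_hz
        self.min_hold_s = min_hold_s  # hypothetical value, not from any spec
        self.last_switch_s = None     # time of the most recent switch, if any

    def request(self, wanted_hz, now_s):
        """Return the bandwidth to use at time now_s (in seconds)."""
        if wanted_hz == self.current_hz:
            return self.current_hz
        held_long_enough = (
            self.last_switch_s is None
            or now_s - self.last_switch_s >= self.min_hold_s
        )
        if held_long_enough:
            # Accept the change and restart the hold timer.
            self.current_hz = wanted_hz
            self.last_switch_s = now_s
        # Otherwise keep the current bandwidth until the hold time expires.
        return self.current_hz
```

A request that arrives too soon after the previous switch is simply ignored,
so the listener never hears more than one change per hold period.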

Also, increasing the audio bandwidth might result in audible temporary echo
(in the newly opened frequencies), since the acoustic echo cancellers will
often need to re-adapt.  The acoustic feedback paths can change pretty
quickly at higher frequencies.

Stephen Botzko

On Tue, Apr 13, 2010 at 3:23 PM, Raymond (Juin-Hwey) Chen <
rchen@broadcom.com> wrote:

>  I would agree with Roman that for speech the difference between wideband
> (16 kHz sampling) and super-wideband/full-band (32 ~ 48 kHz sampling) is
> there but very small and many people cannot even distinguish between them,
> while for music the difference can be much more noticeable. From all the
> codec WG emails dating back to last year, it appears to me that most people
> want the IETF codec to handle music as well as speech.  Therefore, it seems
> appropriate to have the maximum sampling rate up to 48 kHz if we want the
> codec to handle music.
>
>
>
> Regarding the dynamic switching of the sampling rate or audio bandwidth, if
> this is to be done, I think we need to be careful not to change the audio
> bandwidth too frequently, otherwise the frequent change of the audio
> bandwidth can be quite disturbing to the listener.  It would certainly be
> disturbing to me personally if the audio bandwidth changes more frequently
> than once every few seconds, but if it changes once every few minutes,
> that’s probably tolerable to me.
>
>
>
> Raymond
>
>
>
> *From:* codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] *On Behalf
> Of *stephen botzko
> *Sent:* Tuesday, April 13, 2010 11:43 AM
> *To:* Roman Shpount
>
> *Cc:* codec@ietf.org
> *Subject:* Re: [codec] #8: Sample rates?
>
>
>
> >>>
>
> I will not argue superwideband vs wideband, even though we did some, not
> very scientific, blind tests, and in most cases people (especially men)
> cannot even distinguish wideband from superwideband when listening to
> voice samples. Only a very small percentage of voice energy is even present
> above 8 kHz.
>
> >>>
> Though there isn't much voice energy over 8 kHz, in our (equally
> unscientific) tests, sibilants and fricatives are easier to distinguish if
> you are using superwideband or better.  That was one reason we added Annex C
> to G.722.1.
>
> Fullband is (IMHO) a specsmanship thing for speech (and probably for music
> also for most of us).  Though it may not be that hard to get it, if we are
> shooting for superwideband anyway.
>
> Stephen Botzko
>
>  On Tue, Apr 13, 2010 at 2:11 PM, Roman Shpount <roman@telurix.com> wrote:
>
> I will not argue superwideband vs wideband, even though we did some, not
> very scientific, blind tests, and in most cases people (especially men)
> cannot even distinguish wideband from superwideband when listening to voice
> samples. Only a very small percentage of voice energy is even present above
> 8 kHz.
>
> Music on the other hand, especially past a 24 kHz sampling rate, is affected
> more by the CODEC than by bandwidth. For instance, 8 kHz G.711 encoded music
> sounds poor but reasonable, while G.729 encoded music cannot be listened to.
> In most cases (apart from critical listening), music sampled at 16 kHz is
> acceptable, especially for generation iPod.
>
> My remark about RTCP was meant to argue for a CODEC that will function
> properly with RTCP absent. If we require RTCP based mechanisms in order for
> the CODEC to operate properly, this can impede the adoption of this CODEC.
> In no way do I propose to create new signaling mechanisms.
>
>
> ______________________________
> Roman Shpount - www.telurix.com
>
>    On Tue, Apr 13, 2010 at 1:29 PM, stephen botzko <
> stephen.botzko@gmail.com> wrote:
>
> Superwideband (and even fullband) do make speech somewhat more
> intelligible, and also reduce listener fatigue.  Telepresence and other
> videoconferencing equipment use those acoustic bandwidths today, so it would
> be nice if the CODEC supported at least superwideband also.
>
> Personally I see some value in carriage of music.  Sometimes our equipment
> is used for music performance.  Distance learning is another use case where
> music has some value, since course and training materials frequently do
> include videos with music.  Though of course conversational speech is the
> dominant use case.
>
> BTW, videoconferencing devices do almost always support RTCP.  It is
> regrettable that so many VOIP devices do not.  Anyway, I do not think our
> charter scope includes invention of a new mechanism for signaling the
> network quality.
>
> Stephen Botzko
>
>
>
> On Tue, Apr 13, 2010 at 12:41 PM, Roman Shpount <roman@telurix.com> wrote:
>
> I am not sure if this was decided, but should this new CODEC support music
> encoding? If we don't plan to support music, we should probably stick to a
> 16 kHz sampling rate. If we need music, I would suggest having a variant with
> a 24 kHz (or higher) sampling rate. I am not sure how many people here care
> about a non-voice CODEC. For all practical purposes I don't. I would argue,
> at least, for a fixed 16 kHz sampling rate CODEC variant.
>
> P.S. On the same note, does anybody here care about using this CODEC with
> multicast? Is there a single commercial multicast voice deployment? From
> what I've seen, all multicast does is make IETF voice standards harder to
> understand or implement.
>
> P.P.S. RTCP is almost universally not implemented. The biggest VoIP gateway
> on the market does not generate RTCP. If we rely on any RTCP functionality
> for bandwidth control, it will probably be ignored.
> ______________________________
> Roman Shpount -  www.telurix.com
>
>   On Tue, Apr 13, 2010 at 12:26 PM, stephen botzko <
> stephen.botzko@gmail.com> wrote:
>
>  TCP is a different case, since for this we are using RTCP to signal our
> feedback, and I don't think it has the facility you are envisioning.
>
> Also, I disagree with your presumption that multicast is out of scope.  I
> don't know of any other packetization RFCs that expressly rule out
> multicast, and multicast can be used for interactive applications.
>
> This concept seems pretty theoretical to me.  If we need to manage
> complexity / quality tradeoffs, why not just use profiles (as AVC/H.264
> does) or create a low complexity variant (like G.729A).  I really don't see
> the need for *dynamic* complexity management.
>
> BTW, you seem to be assuming that a lower sample rate results in
> significantly less complexity.  The savings there might not be as great as
> you think, especially if the receiver needs to resample anyway (to work
> around those sound card limitations you were talking about before).
>
> Stephen Botzko
>
> On Tue, Apr 13, 2010 at 11:18 AM, Christian Hoene <hoene@uni-tuebingen.de>
> wrote:
>
> Hi,
>
>
>
> comments inline:
>
>
>
>
>
> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
> *Sent:* Tuesday, April 13, 2010 4:56 PM
> *To:* Christian Hoene
> *Cc:* codec@ietf.org
>
>
> *Subject:* Re: [codec] #8: Sample rates?
>
>
>
> This would make the signaling more complicated - personally I am not
> convinced it is worth it.
>
> CH: It is a difficult tradeoff. However, signaling overload is done in
> Skype.  Such signaling might be very useful for mobile devices, which
> want to save power and thus lower their CPU clock, or for wireless IP based
> headphones, which do not have large batteries. I am thinking of signaling
> three states: overloaded, fine, and low. That should be enough for most
> operational cases.
>
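A minimal sketch of the three-state idea above. The names LoadState and
classify_load, the use of a CPU fraction as the input, and both thresholds are
assumptions made for illustration; the thread does not define how a receiver
would measure its load, and "low" is read here as "spare capacity available".

```python
from enum import Enum


class LoadState(Enum):
    OVERLOADED = "overloaded"
    FINE = "fine"
    LOW = "low"


def classify_load(cpu_fraction, high=0.9, low=0.3):
    """Map decoder CPU usage (0.0 to 1.0) to one of the three states.

    The thresholds are hypothetical defaults, not values from the thread.
    """
    if cpu_fraction >= high:
        return LoadState.OVERLOADED
    if cpu_fraction <= low:
        return LoadState.LOW
    return LoadState.FINE
```

The receiver would report only the resulting state, leaving the encoder free
to decide how (or whether) to change its coding mode.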
>
> I think a better avenue is to bound overall complexity, and to focus on
> dynamically adapting to network conditions (as opposed to dynamic complexity
> management).
>
> CH: I would just like to point out that good old TCP supports both:
> congestion control to adapt to network conditions, and flow control to take
> into account an overloaded (=full) receiver.
>
> You can't dynamically negotiate complexity in many scenarios anyway - for
> instance it makes no sense if you are using multicast.
>
> CH: Multicast is out of scope anyhow. We are considering an interactive
> codec.
>
> CH: The conferencing scenario might be somewhat more difficult to handle, but
> it will not be a big problem.
>
> Christian
>
>
>
> Stephen Botzko
>
> On Tue, Apr 13, 2010 at 10:42 AM, Christian Hoene <hoene@uni-tuebingen.de>
> wrote:
>
> Hi,
>
>
>
> It still might make sense to negotiate the maximal supported sampling rate
> via SDP or, if possible, to select one out of multiple sampling rates, if
> the audio receiver can cope with multiple rates well. The internal sampling
> frequency of the codec NEED NOT be affected by the external sampling
> frequency.
>
>
>
> However, the decoder might want to signal to the encoder that decoding
> requires too many computational resources and that a less complex coding
> mode (or a lower sampling frequency) should be used.
>
>
>
> Christian
>
>
>
>
>
> ---------------------------------------------------------------
>
> Dr.-Ing. Christian Hoene
>
> Interactive Communication Systems (ICS), University of Tübingen
>
> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
> http://www.net.uni-tuebingen.de/
>
>
>
> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
> *Sent:* Tuesday, April 13, 2010 3:21 PM
> *To:* Kevin P. Fleming
> *Cc:* Christian Hoene; codec@ietf.org
> *Subject:* Re: [codec] #8: Sample rates?
>
>
>
> Though I generally avoid MAY, this could be a case where it makes sense.
>
> Something like:
>
> CODEC MAY reduce the acoustic bandwidth at lower bit rates in order to
> optimize audio quality.
>
> This is free of any technology assumption about *how* the acoustic
> bandwidth is reduced.  The MAY indicates that it is permissible.  But if the
> CODEC algorithm doesn't need to reduce the acoustic bandwidth, then we are
> making no statement that it SHOULD (or SHOULD NOT).
>
> Kevin is distinguishing dynamic changes to the sample rate (for bandwidth
> management) from multiple fixed sample rates; and I agree that is a key
> distinction.
>
> I have not heard any clear application requirement for more than one fixed
> sampling rate.  Though if there is such a requirement, IMHO we would have to
> negotiate the rate within SDP in the usual way, and it would affect the RTP
> timestamps, jitter buffers, etc.  G.722.1 / G.722.1C is one precedent - it
> is the same core codec, but can run at two different sample rates
> (negotiated by SDP).
>
> Stephen Botzko
>
> On Tue, Apr 13, 2010 at 7:41 AM, Kevin P. Fleming <kpfleming@digium.com>
> wrote:
>
> stephen botzko wrote:
>
> > Dynamically changing sample rates on the system level adds some
> > complexity for RTP, since the timestamp granularity is supposed to be
> > the sample rate.
>
> And jitter buffers, and anything else that is based on timestamps and
> sample rates/counts. If the desire is for the codec to be able to change
> sample rates to adjust to network conditions, then I agree with
> Stephen... the 'external' sample rate (input to the encoder and output
> from the decoder) should be fixed, and this is what would be negotiated
> in SDP and used for RTP timestamps. The codec can downsample in the
> encoder and upsample in the decoder if it has decided to transmit fewer
> bits across the network.
>
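A minimal sketch of the point above: the RTP timestamp advances in units of the
fixed external sample rate, no matter what rate the codec uses internally. The
48 kHz value and the function names are assumptions for illustration only; the
32-bit wraparound reflects the size of the RTP timestamp field.

```python
EXTERNAL_RATE_HZ = 48000  # assumed negotiated (SDP) rate; illustrative only


def rtp_timestamp_step(frame_ms, external_rate_hz=EXTERNAL_RATE_HZ):
    """Timestamp increment per frame, in ticks of the external sample clock."""
    return external_rate_hz * frame_ms // 1000


def next_timestamp(prev_ts, frame_ms, internal_rate_hz):
    """Advance the RTP timestamp by one frame.

    internal_rate_hz is deliberately ignored: the codec may downsample in the
    encoder and upsample in the decoder without touching the timestamps.
    """
    return (prev_ts + rtp_timestamp_step(frame_ms)) % 2**32
```

With a 20 ms frame the timestamp advances by 960 ticks whether the codec is
internally running at 16 kHz or 48 kHz, so jitter buffers see a constant clock.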
> --
> Kevin P. Fleming
> Digium, Inc. | Director of Software Technologies
> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
> skype: kpfleming | jabber: kfleming@digium.com
> Check us out at www.digium.com & www.asterisk.org