Re: [codec] #8: Sample rates?

stephen botzko <stephen.botzko@gmail.com> Tue, 13 April 2010 20:03 UTC

From: stephen botzko <stephen.botzko@gmail.com>
To: "Raymond (Juin-Hwey) Chen" <rchen@broadcom.com>
Cc: "codec@ietf.org" <codec@ietf.org>
Subject: Re: [codec] #8: Sample rates?

I agree.

The changes in audio bandwidth will certainly be audible, and personally I
would prefer a stable experience.

Also, increasing the audio bandwidth might result in audible temporary echo
(in the newly opened frequencies), since the acoustic echo cancellers will
often need to re-adapt.  The acoustic feedback paths can change pretty
quickly at higher frequencies.

Stephen Botzko

On Tue, Apr 13, 2010 at 3:23 PM, Raymond (Juin-Hwey) Chen <
rchen@broadcom.com> wrote:

>  I would agree with Roman that for speech the difference between wideband
> (16 kHz sampling) and super-wideband/full-band (32 ~ 48 kHz sampling) is
> there but very small and many people cannot even distinguish between them,
> while for music the difference can be much more noticeable. From all the
> codec WG emails dating back to last year, it appears to me that most people
> want the IETF codec to handle music as well as speech.  Therefore, it seems
> appropriate to support sampling rates up to 48 kHz if we want the
> codec to handle music.
>
>
>
> Regarding the dynamic switching of the sampling rate or audio bandwidth, if
> this is to be done, I think we need to be careful not to change the audio
> bandwidth too frequently, otherwise the frequent change of the audio
> bandwidth can be quite disturbing to the listener.  It would certainly be
> disturbing to me personally if the audio bandwidth changes more frequently
> than once every few seconds, but if it changes once every few minutes,
> that’s probably tolerable to me.
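
The "not too frequently" constraint above could be sketched as a simple hold-off timer. This is only an illustration: `BandwidthGovernor`, `min_hold_s`, and the 60-second value are hypothetical names and numbers chosen for the example, not anything proposed in this thread or defined by a codec.

```python
from dataclasses import dataclass


@dataclass
class BandwidthGovernor:
    """Rate-limits audio bandwidth switches (hypothetical helper)."""
    min_hold_s: float = 60.0               # assumed minimum time between switches
    current_hz: int = 16000                # sampling rate currently in use
    last_switch_s: float = float("-inf")   # no switch has happened yet

    def request(self, wanted_hz: int, now_s: float) -> int:
        """Return the rate to use at time now_s, given the rate the encoder wants."""
        held_long_enough = now_s - self.last_switch_s >= self.min_hold_s
        if wanted_hz != self.current_hz and held_long_enough:
            self.current_hz = wanted_hz
            self.last_switch_s = now_s
        return self.current_hz


gov = BandwidthGovernor(min_hold_s=60.0, current_hz=48000)
print(gov.request(16000, now_s=0.0))    # 16000: the first switch is allowed
print(gov.request(48000, now_s=5.0))    # 16000: too soon, the request is ignored
print(gov.request(48000, now_s=61.0))   # 48000: hold time has elapsed
```

A real encoder would probably also want to let downward switches through faster than upward ones when the network degrades; the sketch treats both directions the same.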
>
>
>
> Raymond
>
>
>
> *From:* codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] *On Behalf
> Of *stephen botzko
> *Sent:* Tuesday, April 13, 2010 11:43 AM
> *To:* Roman Shpount
>
> *Cc:* codec@ietf.org
> *Subject:* Re: [codec] #8: Sample rates?
>
>
>
> >>>
>
> I will not argue superwideband vs wideband, even though we did some, not
> very scientific, blind tests, and in most cases people (especially men)
> cannot even distinguish wideband from superwideband when listening to
> voice samples. Only a very small percentage of voice energy is even present
> above 8 kHz.
>
> >>>
> Though there isn't much voice energy over 8 kHz, in our (equally
> unscientific) tests, sibilants and fricatives are easier to distinguish if
> you are using superwideband or better.  That was one reason we added Annex C
> to G.722.1.
>
> Fullband is (IMHO) a specsmanship thing for speech (and probably for music
> also for most of us).  Though it may not be that hard to get it, if we are
> shooting for superwideband anyway.
>
> Stephen Botzko
>
>  On Tue, Apr 13, 2010 at 2:11 PM, Roman Shpount <roman@telurix.com> wrote:
>
> I will not argue superwideband vs wideband, even though we did some, not
> very scientific, blind tests, and in most cases people (especially men)
> cannot even distinguish wideband from superwideband when listening to voice
> samples. Only a very small percentage of voice energy is even present above
> 8 kHz.
>
> Music, on the other hand, especially past a 24 kHz sampling rate, is affected
> more by the CODEC than by bandwidth. For instance, 8 kHz G.711-encoded music
> sounds poor but tolerable, while G.729-encoded music cannot be listened to.
> In most cases (apart from critical listening), music sampled at 16 kHz is
> acceptable, especially for the iPod generation.
>
> My remark about RTCP was to try to develop a CODEC that will function
> properly with RTCP absent. If we require RTCP based mechanisms in order for
> the CODEC to operate properly, this can impede the adoption of this CODEC.
> In no way do I propose to create new signaling mechanisms.
>
>
> ______________________________
> Roman Shpount - www.telurix.com
>
>    On Tue, Apr 13, 2010 at 1:29 PM, stephen botzko <
> stephen.botzko@gmail.com> wrote:
>
> Superwideband (and even fullband) do make speech somewhat more
> intelligible, and also reduce listener fatigue.  Telepresence and other
> videoconferencing equipment use those acoustic bandwidths today, so it would
> be nice if CODEC supported at least superwideband also.
>
> Personally I see some value in carriage of music.  Sometimes our equipment
> is used for music performance.  Distance learning is another use case where
> music has some value, since course and training materials frequently do
> include videos with music.  Though of course conversational speech is the
> dominant use case.
>
> BTW, Videoconferencing devices do almost always support RTCP.  It is
> regrettable that so many VOIP devices do not.  Anyway, I do not think our
> charter scope includes invention of a new mechanism for signaling the
> network quality.
>
> Stephen Botzko
>
>
>
> On Tue, Apr 13, 2010 at 12:41 PM, Roman Shpount <roman@telurix.com> wrote:
>
> I am not sure if this was decided, but should this new CODEC support music
> encoding? If we don't plan to support music, we should probably stick to a
> 16 kHz sampling rate. If we need music, I would suggest having a 24 kHz (or
> higher sampling rate) variant. I am not sure how many people here care about
> a non-voice CODEC. For all practical purposes I don't. I would argue,
> at least, for a fixed 16 kHz sampling rate CODEC variant.
>
> P.S. On the same note, does anybody here care about using this CODEC with
> multicast? Is there a single commercial multicast voice deployment? From
> what I've seen, all multicast does is make IETF voice standards harder to
> understand or implement.
>
> P.P.S. RTCP is almost universally not implemented. The biggest VoIP gateway
> on the market does not generate RTCP. If we rely on any RTCP
> functionality for bandwidth control, it will probably be ignored.
> ______________________________
> Roman Shpount -  www.telurix.com
>
>   On Tue, Apr 13, 2010 at 12:26 PM, stephen botzko <
> stephen.botzko@gmail.com> wrote:
>
>  TCP is a different case, since for this we are using RTCP to signal our
> feedback, and I don't think it has the facility you are envisioning.
>
> Also, I disagree with your presumption that multicast is out of scope.  I
> don't know of any other packetization RFCs that expressly rule out
> multicast, and multicast can be used for interactive applications.
>
> This concept seems pretty theoretical to me.  If we need to manage
> complexity / quality tradeoffs, why not just use profiles (as AVC/H.264
> does) or create a low complexity variant (like G.729A).  I really don't see
> the need for *dynamic* complexity management.
>
> BTW, you seem to be assuming that a lower sample rate results in
> significantly less complexity.  The savings there might not be as great as
> you think, especially if the receiver needs to resample anyway (to prevent
> those sound card limitations you were talking about before).
>
> Stephen Botzko
>
> On Tue, Apr 13, 2010 at 11:18 AM, Christian Hoene <hoene@uni-tuebingen.de>
> wrote:
>
> Hi,
>
>
>
> comments inline:
>
>
>
>
>
> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
> *Sent:* Tuesday, April 13, 2010 4:56 PM
> *To:* Christian Hoene
> *Cc:* codec@ietf.org
>
>
> *Subject:* Re: [codec] #8: Sample rates?
>
>
>
> This would make the signaling more complicated - personally I am not
> convinced it is worth it.
>
> CH: It is a difficult tradeoff. However, signaling overload is done in
> Skype.  Such signaling might be very useful for mobile devices, which
> want to save power and thus lower their CPU clock. Or wireless IP based
> headphones which do not have large batteries. I am thinking of signaling the
> states: overloaded, fine, and low. That should be enough for most
> operational cases.
>
>
> I think a better avenue is to bound overall complexity, and to focus on
> dynamically adapting to network conditions (as opposed to dynamic complexity
> management).
>
> CH: I would just like to point out that good old TCP supports both:
> congestion control to adapt to network conditions, and flow control to take
> into account an overloaded (=full) receiver.
>
> You can't dynamically negotiate complexity in many scenarios anyway - for
> instance it makes no sense if you are using multicast.
>
> CH: Multicast is out of scope anyhow. We are considering an interactive
> codec.
>
> CH: The conferencing scenario might be somewhat more difficult to handle but
> will not be a big problem.
>
> Christian
>
>
>
> Stephen Botzko
>
> On Tue, Apr 13, 2010 at 10:42 AM, Christian Hoene <hoene@uni-tuebingen.de>
> wrote:
>
> Hi,
>
>
>
> It still might make sense to negotiate the maximal supported sampling rate
> via SDP or, if possible, to select one out of multiple sampling rates, if
> the audio receiver can cope with multiple rates well. The internal sampling
> frequency of the codec NEED NOT be affected by the external sampling
> frequency.
>
>
>
> However, the decoder might want to signal to the encoder that the decoding
> is requiring too many computational resources and that a less complex coding
> mode (or a lower sampling frequency) should be taken.
>
>
>
> Christian
>
>
>
>
>
> ---------------------------------------------------------------
>
> Dr.-Ing. Christian Hoene
>
> Interactive Communication Systems (ICS), University of Tübingen
>
> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
> http://www.net.uni-tuebingen.de/
>
>
>
> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
> *Sent:* Tuesday, April 13, 2010 3:21 PM
> *To:* Kevin P. Fleming
> *Cc:* Christian Hoene; codec@ietf.org
> *Subject:* Re: [codec] #8: Sample rates?
>
>
>
> Though I generally avoid MAY, this could be a case where it makes sense.
>
> Something like:
>
> CODEC MAY reduce the acoustic bandwidth at lower bit rates in order to
> optimize audio quality.
>
> This is free of any technology assumption about *how* the acoustic
> bandwidth is reduced.  The MAY indicates that it is permissible.  But if the
> CODEC algorithm doesn't need to reduce the acoustic bandwidth, then we are
> making no statement that it SHOULD (or SHOULD NOT).
>
> Kevin is distinguishing dynamic changes to the sample rate (for bandwidth
> management) from multiple fixed sample rates; and I agree that is a key
> distinction.
>
> I have not heard any clear application requirement for more than one fixed
> sampling rate.  Though if there is such a requirement, IMHO we would have to
> negotiate the rate within SDP in the usual way, and it would affect the RTP
> timestamps, jitter buffers, etc.  G.722.1 / G.722.1C is one precedent - it
> is the same core codec, but can run at two different sample rates
> (negotiated by SDP).
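
As an illustration of the "usual way", the sketch below picks the highest clock rate that both sides support from the `a=rtpmap` lines of an offer. The SDP text, the payload type numbers 121 and 122, and the helper names `offered_rates` and `choose_rate` are made up for this example; a real endpoint would use a full SDP parser and the offer/answer rules rather than a regular expression.

```python
import re

# Illustrative offer: the same codec name at two clock rates,
# each with its own dynamic payload type.
OFFER = """\
m=audio 49000 RTP/AVP 121 122
a=rtpmap:121 G7221/16000
a=rtpmap:122 G7221/32000
"""

RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)", re.MULTILINE)


def offered_rates(sdp, encoding):
    """Map clock rate -> payload type for each rtpmap line with this encoding name."""
    return {
        int(rate): int(pt)
        for pt, name, rate in RTPMAP.findall(sdp)
        if name.upper() == encoding.upper()
    }


def choose_rate(sdp, encoding, supported_rates):
    """Return (payload_type, clock_rate) for the highest mutually supported rate, or None."""
    rates = offered_rates(sdp, encoding)
    common = sorted(set(rates) & set(supported_rates))
    if not common:
        return None
    best = common[-1]
    return rates[best], best


print(choose_rate(OFFER, "G7221", {16000}))          # (121, 16000)
print(choose_rate(OFFER, "G7221", {16000, 32000}))   # (122, 32000)
print(choose_rate(OFFER, "G7221", {48000}))          # None
```

Whichever rate is selected then becomes the RTP clock for the whole session, which is why it affects timestamps and jitter buffers.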
>
> Stephen Botzko
>
> On Tue, Apr 13, 2010 at 7:41 AM, Kevin P. Fleming <kpfleming@digium.com>
> wrote:
>
> stephen botzko wrote:
>
> > Dynamically changing sample rates on the system level adds some
> > complexity for RTP, since the timestamp granularity is supposed to be
> > the sample rate.
>
> And jitter buffers, and anything else that is based on timestamps and
> sample rates/counts. If the desire is for the codec to be able to change
> sample rates to adjust to network conditions, then I agree with
> Stephen... the 'external' sample rate (input to the encoder and output
> from the decoder) should be fixed, and this is what would be negotiated
> in SDP and used for RTP timestamps. The codec can downsample in the
> encoder and upsample in the decoder if it has decided to transmit fewer
> bits across the network.
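
A small sketch of the arrangement described above: the RTP timestamp advances by a fixed amount per frame, derived only from the negotiated external rate, while the number of samples the codec actually codes varies with its internal rate. The 48 kHz clock, the 20 ms frame, and the function names are assumptions picked for the example.

```python
RTP_CLOCK_HZ = 48000   # assumed external rate negotiated in SDP
FRAME_MS = 20          # assumed frame duration


def timestamp_step(clock_hz=RTP_CLOCK_HZ, frame_ms=FRAME_MS):
    """RTP timestamp increment per frame; depends only on the negotiated clock."""
    return clock_hz * frame_ms // 1000


def internal_samples(internal_hz, frame_ms=FRAME_MS):
    """Samples coded per frame after the encoder's internal downsampling."""
    return internal_hz * frame_ms // 1000


ts = 0
for internal_hz in (48000, 16000, 16000, 32000):   # codec changes internal rate at will
    print(ts, internal_samples(internal_hz))
    ts = (ts + timestamp_step()) & 0xFFFFFFFF      # RTP timestamps are 32 bits and wrap
```

The timestamps advance 0, 960, 1920, 2880 regardless of the internal rate, so jitter buffers and anything else keyed to the RTP clock never see the switch.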
>
> --
> Kevin P. Fleming
> Digium, Inc. | Director of Software Technologies
> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
> skype: kpfleming | jabber: kfleming@digium.com
> Check us out at www.digium.com & www.asterisk.org