Re: [codec] #8: Sample rates?

Koen Vos <koen.vos@skype.net> Wed, 14 April 2010 00:57 UTC

Date: Tue, 13 Apr 2010 17:57:25 -0700
From: Koen Vos <koen.vos@skype.net>
To: stephen botzko <stephen.botzko@gmail.com>
Cc: codec@ietf.org
Subject: Re: [codec] #8: Sample rates?

Quoting stephen botzko:
> If that is what you mean, then the two API rates are never signaled?

Right, that's what I meant.
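
To make the point concrete, here is a rough sketch (not taken from any
reference code) of how RTP timestamps could advance on a fixed 48 kHz
clock while the encoder API runs at some other rate. The function names
and the choice to reject frames that do not map to a whole number of
ticks are assumptions made for illustration only.

```python
RTP_CLOCK_HZ = 48_000  # fixed timestamp clock, as suggested in this thread


def rtp_timestamp_increment(frame_samples: int, api_rate_hz: int) -> int:
    """Return the 48 kHz tick count covered by one frame.

    frame_samples is the number of samples handed to the encoder API,
    api_rate_hz is the (unsignaled) API sampling rate.
    """
    ticks, remainder = divmod(frame_samples * RTP_CLOCK_HZ, api_rate_hz)
    if remainder:
        # Assumption: frames are expected to align with the 48 kHz clock.
        raise ValueError("frame does not map to a whole number of ticks")
    return ticks


def next_timestamp(current: int, frame_samples: int, api_rate_hz: int) -> int:
    """Advance a 32-bit RTP timestamp by one frame, wrapping at 2**32."""
    step = rtp_timestamp_increment(frame_samples, api_rate_hz)
    return (current + step) % 2**32
```

With this, a 20 ms frame advances the timestamp by the same amount whether
the API delivers 160 samples at 8 kHz, 320 at 16 kHz or 960 at 48 kHz, so
neither API rate needs to appear in the signaling.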

koen.


> I am not sure exactly what you mean by  "API rate".  Seems to me that the
> reference encoder source code might accept any sample rate, but convert it
> to 48 kHz before encoding. If that is what you mean, then the two API rates
> are never signaled?
>
> I rather like the idea of negotiating maximum audio bandwidth.  For me that
> is different from dynamic complexity management, and is being signaled for a
> different purpose (wasting coded bits on unheard spectrum degrades the
> quality of the heard spectrum). Signaling the bandwidth, and defining the
> internal codec rate as fullband should let us lock down the RTP timestamp
> rate at 48 kHz (which I think is desirable).
>
> Stephen Botzko
>
>
> On Tue, Apr 13, 2010 at 7:48 PM, Koen Vos <koen.vos@skype.net> wrote:
>
>> My 2 cents on this thread:
>>
>> - Going beyond WB quality matters a lot, even for speech.  See for example
>> slides 2, 3 of http://www.ietf.org/proceedings/10mar/slides/codec-3.pdf.
>>  Agree with Ben that we might as well go full-band.
>>
>> - The codec has 3 sampling rates:
>>  1. Encoder API
>>  2. Codec internal (not strictly sampling rate, more audio bandwidth as
>> Stephen pointed out)
>>  3. Decoder API
>> I don't see a reason to impose that some or all of these be identical.  For
>> (conference) mixing of several streams it helps if all decoders run at the
>> same API sampling rate. It would be unreasonable to require that every
>> encoder also runs at that API sampling rate.  Some encoders may for instance
>> sit in a PSTN gateway, dealing strictly with narrowband signals.
>>
>> - To keep RTP timestamps simple, they could always be based on the same
>> sampling rate, irrespective of any of the ones above. Maybe 48 kHz is a good
>> choice?
>>
>> - Sometimes a decoder runs on hardware with limited audio bandwidth
>> playback capabilities (e.g. mobile devices).  In those cases it helps if
>> the decoder can request a maximum internal audio bandwidth from the encoder,
>> during call setup. Otherwise the encoder may be wasting bits on unused audio
>> spectrum.  So I agree with Christian on this.
>>
>> - Agree with Raymond that fast switching of audio bandwidth sounds
>> unpleasant.  SILK has a hysteresis mechanism for this, and we rarely get
>> more than one or two switches during a Skype call.
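
A generic hysteresis switch of the kind mentioned in the last point can be
sketched as below. This is not SILK's actual mechanism: the two bitrate
thresholds, the hold count and the class name are hypothetical, chosen
only to show why a gap between the thresholds plus a hold time keeps
switches rare.

```python
from dataclasses import dataclass


@dataclass
class HysteresisSwitch:
    low_kbps: float     # leave the wider band only if the rate stays below this
    high_kbps: float    # enter the wider band only if the rate stays above this
    hold_frames: int    # consecutive frames required before any switch
    wide: bool = False  # current state: False = narrower band, True = wider band
    _count: int = 0     # consecutive frames that argued for a switch

    def update(self, rate_kbps: float) -> bool:
        """Feed one per-frame rate estimate and return the current state."""
        if self.wide:
            wants_switch = rate_kbps < self.low_kbps
        else:
            wants_switch = rate_kbps > self.high_kbps
        # Any frame that does not argue for a switch resets the counter.
        self._count = self._count + 1 if wants_switch else 0
        if self._count >= self.hold_frames:
            self.wide = not self.wide
            self._count = 0
        return self.wide
```

Rates that fall between the two thresholds never trigger a change, and a
brief excursion shorter than hold_frames is ignored, so the audio
bandwidth only moves when conditions have changed for a sustained period.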
>>
>> best,
>> koen.
>>
>>
>>
>>
>> Quoting stephen botzko <stephen.botzko@gmail.com>:
>>
>>  I agree.
>>>
>>> The changes in audio bandwidth will certainly be audible, and personally I
>>> would prefer a stable experience.
>>>
>>> Also, increasing the audio bandwidth might result in audible temporary
>>> echo
>>> (in the newly opened frequencies), since the acoustic echo cancellers will
>>> often need to re-adapt.  The acoustic feedback paths can change pretty
>>> quickly at higher frequencies.
>>>
>>> Stephen Botzko
>>>
>>> On Tue, Apr 13, 2010 at 3:23 PM, Raymond (Juin-Hwey) Chen <
>>> rchen@broadcom.com> wrote:
>>>
>>>   I would agree with Roman that for speech the difference between wideband
>>>> (16 kHz sampling) and super-wideband/full-band (32 ~ 48 kHz sampling) is
>>>> there but very small and many people cannot even distinguish between
>>>> them,
>>>> while for music the difference can be much more noticeable. From all the
>>>> codec WG emails dating back to last year, it appears to me that most
>>>> people
>>>> want the IETF codec to handle music as well as speech.  Therefore, it
>>>> seems
>>>> appropriate to have the maximum sampling rate up to 48 kHz if we want the
>>>> codec to handle music.
>>>>
>>>>
>>>>
>>>> Regarding the dynamic switching of the sampling rate or audio bandwidth,
>>>> if
>>>> this is to be done, I think we need to be careful not to change the audio
>>>> bandwidth too frequently, otherwise the frequent change of the audio
>>>> bandwidth can be quite disturbing to the listener.  It would certainly be
>>>> disturbing to me personally if the audio bandwidth changes more
>>>> frequently
>>>> than once every few seconds, but if it changes once every few minutes,
>>>> that's probably tolerable to me.
>>>>
>>>>
>>>>
>>>> Raymond
>>>>
>>>>
>>>>
>>>> *From:* codec-bounces@ietf.org [mailto:codec-bounces@ietf.org] *On
>>>> Behalf
>>>> Of *stephen botzko
>>>> *Sent:* Tuesday, April 13, 2010 11:43 AM
>>>> *To:* Roman Shpount
>>>>
>>>> *Cc:* codec@ietf.org
>>>> *Subject:* Re: [codec] #8: Sample rates?
>>>>
>>>>
>>>>
>>>> >>>
>>>>
>>>> I will not argue superwideband vs wideband, even though we did some, not
>>>> very scientific, blind tests, and in most cases people (especially men)
>>>> cannot even distinguish wideband from superwideband when listening to
>>>> voice samples. Only a very small percentage of voice energy is even
>>>> present
>>>> above 8 kHz.
>>>>
>>>> >>>
>>>> Though there isn't much voice energy over 8 kHz, in our (equally
>>>> unscientific) tests, sibilants and fricatives are easier to distinguish
>>>> if
>>>> you are using superwideband or better.  That was one reason we added
>>>> Annex C
>>>> to G.722.1.
>>>>
>>>> Fullband is (IMHO) a specsmanship thing for speech (and probably for
>>>> music
>>>> also for most of us).  Though it may not be that hard to get it, if we
>>>> are
>>>> shooting for superwideband anyway.
>>>>
>>>> Stephen Botzko
>>>>
>>>>  On Tue, Apr 13, 2010 at 2:11 PM, Roman Shpount <roman@telurix.com>
>>>> wrote:
>>>>
>>>> I will not argue superwideband vs wideband, even though we did some, not
>>>> very scientific, blind tests, and in most cases people (especially men)
>>>> cannot even distinguish wideband from superwideband when listening to
>>>> voice
>>>> samples. Only a very small percentage of voice energy is even present
>>>> above
>>>> 8 kHz.
>>>>
>>>> Music on the other hand, especially past 24 kHz sampling rate, gets
>>>> affected
>>>> by the CODEC more than by bandwidth. For instance 8 kHz G.711 encoded
>>>> music
>>>> sounds poor, but reasonable, and G.729 encoded music cannot be listened
>>>> to.
>>>> In most cases (apart from critical listening), music sampled at 16 kHz is
>>>> acceptable, especially for the iPod generation.
>>>>
>>>> My remark about RTCP was to try to develop a CODEC that will function
>>>> properly with RTCP absent. If we require RTCP based mechanisms in order
>>>> for
>>>> the CODEC to operate properly, this can impede the adoption of this
>>>> CODEC.
>>>> In no way do I propose to create new signaling mechanisms.
>>>>
>>>>
>>>> ______________________________
>>>> Roman Shpount - www.telurix.com
>>>>
>>>>   On Tue, Apr 13, 2010 at 1:29 PM, stephen botzko <
>>>> stephen.botzko@gmail.com> wrote:
>>>>
>>>> Superwideband (and even fullband) do make speech somewhat more
>>>> intelligible, and also reduce listener fatigue.  Telepresence and other
>>>> videoconferencing equipment use those acoustic bandwidths today, so it
>>>> would
>>>> be nice if CODEC supported at least superwideband also.
>>>>
>>>> Personally I see some value in carriage of music.  Sometimes our
>>>> equipment
>>>> is used for music performance.  Distance learning is another use case
>>>> where
>>>> music has some value, since course and training materials frequently do
>>>> include videos with music.  Though of course conversational speech is the
>>>> dominant use case.
>>>>
>>>> BTW, Videoconferencing devices do almost always support RTCP.  It is
>>>> regrettable that so many VOIP devices do not.  Anyway, I do not think our
>>>> charter scope includes invention of a new mechanism for signaling the
>>>> network quality.
>>>>
>>>> Stephen Botzko
>>>>
>>>>
>>>>
>>>> On Tue, Apr 13, 2010 at 12:41 PM, Roman Shpount <roman@telurix.com>
>>>> wrote:
>>>>
>>>> I am not sure if this was decided, but should this new CODEC support
>>>> music
>>>> encoding? If we don't plan to support music, we should probably stick to
>>>> 16
>>>> kHz sampling rate. If we need music, I would suggest having a 24 kHz (or
>>>> higher sampling rate) variant. I am not sure how many people here care
>>>> about
>>>> a non-voice CODEC. For all practical purposes I don't. I would argue,
>>>> at least, for a fixed 16 kHz sampling rate CODEC variant.
>>>>
>>>> P.S. On the same note, does anybody here care about using this CODEC
>>>> with
>>>> multicast? Is there a single commercial multicast voice deployment? From
>>>> what I've seen all multicast does is make IETF voice standards harder
>>>> to
>>>> understand or implement.
>>>>
>>>> P.P.S. RTCP is almost universally not implemented. The biggest VoIP
>>>> gateway
>>>> on the market does not generate RTCP. If we rely on any RTCP
>>>> functionality for bandwidth control it will probably be ignored.
>>>> ______________________________
>>>> Roman Shpount -  www.telurix.com
>>>>
>>>>  On Tue, Apr 13, 2010 at 12:26 PM, stephen botzko <
>>>> stephen.botzko@gmail.com> wrote:
>>>>
>>>>  TCP is a different case, since for this we are using RTCP to signal our
>>>> feedback, and I don't think it has the facility you are envisioning.
>>>>
>>>> Also, I disagree with your presumption that multicast is out of scope.  I
>>>> don't know of any other packetization RFCs that expressly rule out
>>>> multicast, and multicast can be used for interactive applications.
>>>>
>>>> This concept seems pretty theoretical to me.  If we need to manage
>>>> complexity / quality tradeoffs, why not just use profiles (as AVC/H.264
>>>> does) or create a low complexity variant (like G.729A).  I really don't
>>>> see
>>>> the need for *dynamic* complexity management.
>>>>
>>>> BTW, you seem to be assuming that a lower sample rate results in
>>>> significantly less complexity.  The savings there might not be as great
>>>> as
>>>> you think, especially if the receiver needs to resample anyway (to
>>>> prevent
>>>> those sound card limitations you were talking about before).
>>>>
>>>> Stephen Botzko
>>>>
>>>> On Tue, Apr 13, 2010 at 11:18 AM, Christian Hoene <
>>>> hoene@uni-tuebingen.de>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> comments inline:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
>>>> *Sent:* Tuesday, April 13, 2010 4:56 PM
>>>> *To:* Christian Hoene
>>>> *Cc:* codec@ietf.org
>>>>
>>>>
>>>> *Subject:* Re: [codec] #8: Sample rates?
>>>>
>>>>
>>>>
>>>> This would make the signaling more complicated - personally I am not
>>>> convinced it is worth it.
>>>>
>>>> CH: It is a difficult tradeoff. However, signaling overload is done in
>>>> Skype.  Such signaling might be very useful for mobile devices, which
>>>> want to save power and thus lower their CPU clock. Or wireless IP based
>>>> headphones which do not have large batteries. I am thinking of signaling
>>>> the
>>>> states: overloaded, fine, and low. That should be enough for most
>>>> operational cases.
>>>>
>>>>
>>>> I think a better avenue is to bound overall complexity, and to focus on
>>>> dynamically adapting to network conditions (as opposed to dynamic
>>>> complexity
>>>> management).
>>>>
>>>> CH: I would just like to point out that good old TCP supports both:
>>>> congestion control to adapt to network conditions and flow control to take
>>>> into
>>>> account an overloaded (=full) receiver.
>>>>
>>>> You can't dynamically negotiate complexity in many scenarios anyway - for
>>>> instance it makes no sense if you are using multicast.
>>>>
>>>> CH: Multicast is out of scope anyhow. We are considering an interactive
>>>> codec.
>>>>
>>>> CH: The conferencing scenario might be somewhat more difficult to handle but
>>>> will not be a big problem.
>>>>
>>>> Christian
>>>>
>>>>
>>>>
>>>> Stephen Botzko
>>>>
>>>> On Tue, Apr 13, 2010 at 10:42 AM, Christian Hoene <
>>>> hoene@uni-tuebingen.de>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> It still might make sense to negotiate the maximal supported sampling
>>>> rate
>>>> via SDP or, if possible, to select one out of multiple sampling rates, if
>>>> the audio receiver can cope with multiple rates well. The internal
>>>> sampling
>>>> frequency of the codec NEED NOT be affected by the external sampling
>>>> frequency.
>>>>
>>>>
>>>>
>>>> However, the decoder might want to signal to the encoder that the
>>>> decoding
>>>> is requiring too many computational resources and that a less complex
>>>> coding
>>>> mode (or a lower sampling frequency) should be taken.
>>>>
>>>>
>>>>
>>>> Christian
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------
>>>>
>>>> Dr.-Ing. Christian Hoene
>>>>
>>>> Interactive Communication Systems (ICS), University of Tübingen
>>>>
>>>> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
>>>> http://www.net.uni-tuebingen.de/
>>>>
>>>>
>>>>
>>>> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
>>>> *Sent:* Tuesday, April 13, 2010 3:21 PM
>>>> *To:* Kevin P. Fleming
>>>> *Cc:* Christian Hoene; codec@ietf.org
>>>> *Subject:* Re: [codec] #8: Sample rates?
>>>>
>>>>
>>>>
>>>> Though I generally avoid MAY, this could be a case where it makes sense.
>>>>
>>>> Something like:
>>>>
>>>> CODEC MAY reduce the acoustic bandwidth at lower bit rates in order to
>>>> optimize audio quality.
>>>>
>>>> This is free of any technology assumption about *how* the acoustic
>>>> bandwidth is reduced.  The MAY indicates that it is permissible.  But if
>>>> the
>>>> CODEC algorithm doesn't need to reduce the acoustic bandwidth, then we
>>>> are
>>>> making no statement that it SHOULD (or SHOULD NOT).
>>>>
>>>> Kevin is distinguishing dynamic changes to the sample rate (for bandwidth
>>>> management) from multiple fixed sample rates; and I agree that is a key
>>>> distinction.
>>>>
>>>> I have not heard any clear application requirement for more than one
>>>> fixed
>>>> sampling rate.  Though if there is such a requirement, IMHO we would have
>>>> to
>>>> negotiate the rate within SDP in the usual way, and it would affect the
>>>> RTP
>>>> timestamps, jitter buffers, etc.  G.722.1 / G.722.1C is one precedent -
>>>> it
>>>> is the same core codec, but can run at two different sample rates
>>>> (negotiated by SDP).
>>>>
>>>> Stephen Botzko
>>>>
>>>> On Tue, Apr 13, 2010 at 7:41 AM, Kevin P. Fleming <kpfleming@digium.com>
>>>> wrote:
>>>>
>>>> stephen botzko wrote:
>>>>
>>>> > Dynamically changing sample rates on the system level adds some
>>>> > complexity for RTP, since the timestamp granularity is supposed to be
>>>> > the sample rate.
>>>>
>>>> And jitter buffers, and anything else that is based on timestamps and
>>>> sample rates/counts. If the desire is for the codec to be able to change
>>>> sample rates to adjust to network conditions, then I agree with
>>>> Stephen... the 'external' sample rate (input to the encoder and output
>>>> from the decoder) should be fixed, and this is what would be negotiated
>>>> in SDP and used for RTP timestamps. The codec can downsample in the
>>>> encoder and upsample in the decoder if it has decided to transmit fewer
>>>> bits across the network.
>>>>
>>>> --
>>>> Kevin P. Fleming
>>>> Digium, Inc. | Director of Software Technologies
>>>> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
>>>> skype: kpfleming | jabber: kfleming@digium.com
>>>> Check us out at www.digium.com & www.asterisk.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> codec mailing list
>>>> codec@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/codec