Re: [codec] #8: Sample rates?

stephen botzko <stephen.botzko@gmail.com> Tue, 13 April 2010 18:43 UTC

In-Reply-To: <m2s28bf2c661004131111pd7880c03m5f225ad464819414@mail.gmail.com>
References: <062.89d7aa91c79b145b798b83610e45ce71@tools.ietf.org> <4BC4586F.1010709@digium.com> <o2u6e9223711004130620lb04d335auaafacfa34b0d6fe7@mail.gmail.com> <001e01cadb17$886fcec0$994f6c40$@de> <v2p6e9223711004130756p52726f8bo2db445e749ffe662@mail.gmail.com> <003101cadb1c$828b3990$87a1acb0$@de> <j2l6e9223711004130926nfaa975e3y129cc8cc21c52a84@mail.gmail.com> <m2v28bf2c661004130941g2e2bf956ld512b5d162df9080@mail.gmail.com> <g2h6e9223711004131029m3bfeb1ddq1a0e2bbd8418102f@mail.gmail.com> <m2s28bf2c661004131111pd7880c03m5f225ad464819414@mail.gmail.com>
Date: Tue, 13 Apr 2010 14:43:11 -0400
Message-ID: <s2i6e9223711004131143v3f3d2123pc94fe430a59b5776@mail.gmail.com>
From: stephen botzko <stephen.botzko@gmail.com>
To: Roman Shpount <roman@telurix.com>
Cc: codec@ietf.org
Subject: Re: [codec] #8: Sample rates?

>>>
I will not argue superwideband vs wideband, even though we did some, not
very scientific, blind tests, and in most cases people (especially men)
cannot even distinguish wideband from superwideband when listening to
voice samples. Only a very small percentage of voice energy is even present
above 8 kHz.
>>>
Though there isn't much voice energy over 8 kHz, in our (equally
unscientific) tests, sibilants and fricatives are easier to distinguish if
you are using superwideband or better.  That was one reason we added Annex C
to G.722.1.

Fullband is (IMHO) a specsmanship thing for speech (and probably for music
also, for most of us), though it may not be that hard to get if we are
shooting for superwideband anyway.
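
To make the band names concrete, here is a rough sketch relating each one to
a sampling rate and its Nyquist limit. The rates in the table are the
commonly cited nominal values and are an assumption here, not something
stated in this thread; the function names and the 9 kHz example component
are likewise made up for illustration.

```python
# Nominal sampling rates (Hz) commonly associated with each band name.
# Assumed values for illustration; they are not taken from the thread.
SAMPLE_RATES_HZ = {
    "narrowband": 8000,
    "wideband": 16000,
    "superwideband": 32000,
    "fullband": 48000,
}


def nyquist_hz(sample_rate_hz):
    """Upper frequency limit for a signal sampled at this rate."""
    return sample_rate_hz / 2


def bands_that_carry(frequency_hz):
    """Band names whose Nyquist limit lies above the given frequency."""
    return [
        name
        for name, rate in SAMPLE_RATES_HZ.items()
        if nyquist_hz(rate) > frequency_hz
    ]


if __name__ == "__main__":
    for name, rate in SAMPLE_RATES_HZ.items():
        print(f"{name:14s} {rate:6d} Hz sampling, {nyquist_hz(rate):7.0f} Hz limit")
    # A hypothetical 9 kHz sibilant component falls outside wideband.
    print(bands_that_carry(9000))
```

With these assumed rates, anything above 8 kHz is simply not representable
in wideband, which fits the observation about sibilants and fricatives.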

Stephen Botzko


On Tue, Apr 13, 2010 at 2:11 PM, Roman Shpount <roman@telurix.com> wrote:

> I will not argue superwideband vs wideband, even though we did some, not
> very scientific, blind tests, and in most cases people (especially men)
> cannot even distinguish wideband from superwideband when listening to voice
> samples. Only a very small percentage of voice energy is even present above
> 8 kHz.
>
> Music on the other hand, especially past 24 kHz sampling rate, gets affected
> by the CODEC more than by bandwidth. For instance, 8 kHz G.711 encoded music
> sounds poor but reasonable, while G.729 encoded music cannot be listened to.
> In most cases (apart from critical listening), music sampled at 16 kHz is
> acceptable, especially for the iPod generation.
>
> My remark about RTCP was to try to develop a CODEC that will function
> properly with RTCP absent. If we require RTCP based mechanisms in order for
> the CODEC to operate properly, this can impede the adoption of this CODEC.
> In no way do I propose to create new signaling mechanisms.
>
> ______________________________
> Roman Shpount - www.telurix.com
>
>
> On Tue, Apr 13, 2010 at 1:29 PM, stephen botzko <stephen.botzko@gmail.com> wrote:
>
>> Superwideband (and even fullband) do make speech somewhat more
>> intelligible, and also reduce listener fatigue.  Telepresence and other
>> videoconferencing equipment use those acoustic bandwidths today, so it would
>> be nice if the CODEC supported at least superwideband also.
>>
>> Personally I see some value in carriage of music.  Sometimes our equipment
>> is used for music performance.  Distance learning is another use case where
>> music has some value, since course and training materials frequently do
>> include videos with music.  Though of course conversational speech is the
>> dominant use case.
>>
>> BTW, Videoconferencing devices do almost always support RTCP.  It is
>> regrettable that so many VOIP devices do not.  Anyway, I do not think our
>> charter scope includes invention of a new mechanism for signaling the
>> network quality.
>>
>> Stephen Botzko
>>
>>
>> On Tue, Apr 13, 2010 at 12:41 PM, Roman Shpount <roman@telurix.com> wrote:
>>
>>> I am not sure if this was decided, but should this new CODEC support
>>> music encoding? If we don't plan to support music, we should probably stick
>>> to a 16 kHz sampling rate. If we need music, I would suggest having a 24 kHz
>>> (or higher sampling rate) variant. I am not sure how many people here care
>>> about a non-voice CODEC. For all practical purposes I don't. I would
>>> argue, at least, for a fixed 16 kHz sampling rate CODEC variant.
>>>
>>> P.S. On the same note, does anybody here care about using this CODEC
>>> with multicast? Is there a single commercial multicast voice deployment?
>>> From what I've seen, all multicast does is make IETF voice standards harder
>>> to understand or implement.
>>>
>>> P.P.S. RTCP is almost universally not implemented. The biggest VoIP
>>> gateway on the market does not generate RTCP. If we rely on any RTCP
>>> functionality for bandwidth control, it will probably be ignored.
>>> ______________________________
>>> Roman Shpount -  www.telurix.com
>>>
>>>
>>> On Tue, Apr 13, 2010 at 12:26 PM, stephen botzko <
>>> stephen.botzko@gmail.com> wrote:
>>>
>>>> TCP is a different case, since for this we are using RTCP to signal our
>>>> feedback, and I don't think it has the facility you are envisioning.
>>>>
>>>> Also, I disagree with your presumption that multicast is out of scope.
>>>> I don't know of any other packetization RFCs that expressly rule out
>>>> multicast, and multicast can be used for interactive applications.
>>>>
>>>> This concept seems pretty theoretical to me.  If we need to manage
>>>> complexity / quality tradeoffs, why not just use profiles (as AVC/H.264
>>>> does) or create a low complexity variant (like G.729A).  I really don't see
>>>> the need for *dynamic* complexity management.
>>>>
>>>> BTW, you seem to be assuming that a lower sample rate results in
>>>> significantly less complexity.  The savings there might not be as great as
>>>> you think, especially if the receiver needs to resample anyway (to prevent
>>>> those sound card limitations you were talking about before).
>>>>
>>>> Stephen Botzko
>>>>
>>>> On Tue, Apr 13, 2010 at 11:18 AM, Christian Hoene <
>>>> hoene@uni-tuebingen.de> wrote:
>>>>
>>>>>  Hi,
>>>>>
>>>>>
>>>>>
>>>>> comments inline:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
>>>>> *Sent:* Tuesday, April 13, 2010 4:56 PM
>>>>> *To:* Christian Hoene
>>>>> *Cc:* codec@ietf.org
>>>>>
>>>>> *Subject:* Re: [codec] #8: Sample rates?
>>>>>
>>>>>
>>>>>
>>>>> This would make the signaling more complicated - personally I am not
>>>>> convinced it is worth it.
>>>>>
>>>>> CH: It is a difficult tradeoff. However, signaling overload is done in
>>>>> Skype.  Such signaling might be very useful for mobile devices,
>>>>> which want to save power and thus lower their CPU clock. Or wireless IP
>>>>> based headphones which do not have large batteries. I am thinking of
>>>>> signaling the states: overloaded, fine, and low. That should be enough for
>>>>> most operational cases.
>>>>>
>>>>>
>>>>> I think a better avenue is to bound overall complexity, and to focus on
>>>>> dynamically adapting to network conditions (as opposed to dynamic complexity
>>>>> management).
>>>>>
>>>>> CH: I would just like to point out that good old TCP supports both:
>>>>> congestion control to adapt to network conditions, and flow control to take
>>>>> into account an overloaded (=full) receiver.
>>>>>
>>>>> You can't dynamically negotiate complexity in many scenarios anyway -
>>>>> for instance it makes no sense if you are using multicast.
>>>>>
>>>>> CH: Multicast is out of scope anyhow. We are considering an interactive
>>>>> codec.
>>>>>
>>>>> CH: The conferencing scenario might be somewhat more difficult to handle
>>>>> but will not be a big problem.
>>>>>
>>>>> Christian
>>>>>
>>>>>
>>>>>
>>>>> Stephen Botzko
>>>>>
>>>>>  On Tue, Apr 13, 2010 at 10:42 AM, Christian Hoene <
>>>>> hoene@uni-tuebingen.de> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> It still might make sense to negotiate the maximal supported sampling
>>>>> rate via SDP or, if possible, to select one out of multiple sampling rates,
>>>>> if the audio receiver can cope with multiple rates well. The internal
>>>>> sampling frequency of the codec NEED NOT be affected by the external
>>>>> sampling frequency.
>>>>>
>>>>>
>>>>>
>>>>> However, the decoder might want to signal to the encoder that the
>>>>> decoding is requiring too many computational resources and that a less
>>>>> complex coding mode (or a lower sampling frequency) should be taken.
>>>>>
>>>>>
>>>>>
>>>>> Christian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------
>>>>>
>>>>> Dr.-Ing. Christian Hoene
>>>>>
>>>>> Interactive Communication Systems (ICS), University of Tübingen
>>>>>
>>>>> Sand 13, 72076 Tübingen, Germany, Phone +49 7071 2970532
>>>>> http://www.net.uni-tuebingen.de/
>>>>>
>>>>>
>>>>>
>>>>> *From:* stephen botzko [mailto:stephen.botzko@gmail.com]
>>>>> *Sent:* Tuesday, April 13, 2010 3:21 PM
>>>>> *To:* Kevin P. Fleming
>>>>> *Cc:* Christian Hoene; codec@ietf.org
>>>>> *Subject:* Re: [codec] #8: Sample rates?
>>>>>
>>>>>
>>>>>
>>>>> Though I generally avoid MAY, this could be a case where it makes
>>>>> sense.
>>>>>
>>>>> Something like:
>>>>>
>>>>> CODEC MAY reduce the acoustic bandwidth at lower bit rates in order to
>>>>> optimize audio quality.
>>>>>
>>>>> This is free of any technology assumption about *how* the acoustic
>>>>> bandwidth is reduced.  The MAY indicates that it is permissible.  But if the
>>>>> CODEC algorithm doesn't need to reduce the acoustic bandwidth, then we are
>>>>> making no statement that it SHOULD (or SHOULD NOT).
>>>>>
>>>>> Kevin is distinguishing dynamic changes to the sample rate (for
>>>>> bandwidth management) from multiple fixed sample rates; and I agree that is
>>>>> a key distinction.
>>>>>
>>>>> I have not heard any clear application requirement for more than one
>>>>> fixed sampling rate.  Though if there is such a requirement, IMHO we would
>>>>> have to negotiate the rate within SDP in the usual way, and it would affect
>>>>> the RTP timestamps, jitter buffers, etc.  G.722.1 / G.722.1C is one
>>>>> precedent - it is the same core codec, but can run at two different sample
>>>>> rates (negotiated by SDP).
>>>>>
>>>>> Stephen Botzko
>>>>>
>>>>> On Tue, Apr 13, 2010 at 7:41 AM, Kevin P. Fleming <
>>>>> kpfleming@digium.com> wrote:
>>>>>
>>>>> stephen botzko wrote:
>>>>>
>>>>> > Dynamically changing sample rates on the system level adds some
>>>>> > complexity for RTP, since the timestamp granularity is supposed to be
>>>>> > the sample rate.
>>>>>
>>>>> And jitter buffers, and anything else that is based on timestamps and
>>>>> sample rates/counts. If the desire is for the codec to be able to change
>>>>> sample rates to adjust to network conditions, then I agree with
>>>>> Stephen... the 'external' sample rate (input to the encoder and output
>>>>> from the decoder) should be fixed, and this is what would be negotiated
>>>>> in SDP and used for RTP timestamps. The codec can downsample in the
>>>>> encoder and upsample in the decoder if it has decided to transmit fewer
>>>>> bits across the network.
>>>>>
>>>>> --
>>>>> Kevin P. Fleming
>>>>> Digium, Inc. | Director of Software Technologies
>>>>> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
>>>>> skype: kpfleming | jabber: kfleming@digium.com
>>>>> Check us out at www.digium.com & www.asterisk.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
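
The point made at the start of the thread, that the external sample rate
stays fixed for SDP and RTP timestamps while the codec is free to resample
internally, can be sketched roughly as below. The 32 kHz clock rate, the
20 ms frame size, the sequence of internal rates, and both function names
are assumptions chosen for illustration; none of them come from the thread.

```python
def rtp_timestamps(clock_rate_hz, frame_ms, n_frames, start=0):
    """RTP timestamps for consecutive frames at a fixed negotiated clock rate.

    The timestamp advances by the number of clock ticks per frame, and
    wraps at 32 bits as RTP timestamps do.
    """
    ticks_per_frame = clock_rate_hz * frame_ms // 1000
    return [(start + i * ticks_per_frame) % 2**32 for i in range(n_frames)]


def internal_samples(internal_rate_hz, frame_ms):
    """Samples the codec actually codes per frame at its internal rate."""
    return internal_rate_hz * frame_ms // 1000


if __name__ == "__main__":
    CLOCK_RATE_HZ = 32000   # assumed external rate negotiated in SDP
    FRAME_MS = 20           # assumed frame duration
    # Hypothetical internal rates the encoder picks as conditions change.
    internal_rates = [32000, 16000, 16000, 32000]

    stamps = rtp_timestamps(CLOCK_RATE_HZ, FRAME_MS, len(internal_rates))
    for ts, rate in zip(stamps, internal_rates):
        coded = internal_samples(rate, FRAME_MS)
        print(f"timestamp={ts:5d}  internal rate={rate} Hz  coded samples={coded}")
```

The timestamps advance by the same amount per frame whatever the internal
rate, so jitter buffers and anything else keyed to the RTP clock never see
the internal downsampling.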