Re: [codec] Next Steps for WG

Roman Shpount <roman@telurix.com> Tue, 18 January 2011 22:52 UTC

Thank you, this is very helpful.

What I was considering was something similar. Typically a conference has one
speaker at a time, with short periods of double talk. My idea was to decode,
mix, and re-encode as independent frames both the double-talk period and the
first few frames of the new speaker, and then switch to the new speaker's
stream. I think this would sound the most natural and would require
re-encoding only for short periods of time.
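
A minimal sketch of this idea, assuming the streams have already been decoded
to 16-bit PCM. The names `mix_frames`, `choose_action`, and `lead_in` are
hypothetical and not part of any codec API; the default of two lead-in frames
is an arbitrary placeholder.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Mix two equal-length 16-bit PCM frames, saturating instead of wrapping."""
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


def choose_action(active_talkers: int, frames_since_switch: int,
                  lead_in: int = 2) -> str:
    """Decide what the conference server does with the next frame.

    active_talkers      -- number of talkers currently flagged by the VAD
    frames_since_switch -- frames already sent since the current talker
                           became the only active one
    lead_in             -- assumed number of initial frames to re-encode
    """
    if active_talkers > 1:
        # Double talk: decode, mix, and re-encode as independent frames.
        return "mix_and_reencode"
    if frames_since_switch < lead_in:
        # First few frames of the new speaker: still re-encode independently.
        return "reencode"
    # Steady state: pass the speaker's encoded stream through untouched.
    return "forward"
```

With this split, the expensive decode and re-encode path runs only during
double talk and the short lead-in, and the server forwards encoded frames
unchanged the rest of the time.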

Thank you for your great work,
_____________________________
Roman Shpount - www.telurix.com


On Tue, Jan 18, 2011 at 5:35 PM, Koen Vos <koen.vos@skype.net> wrote:

> Roman:
>
> Jean-Marc and I discussed this a bit, and this is how we see it:
>
> Resetting the decoder state would lead to a discontinuity in the output
> signal and may create an objectionable "click" sound.  Moreover, the
> incoming stream that's enabled in the switching node at the moment of the
> decoder reset may depend on the last frame before the switch. So the first
> few frames would not be decoded correctly.
>
> Coding and transmitting the decoder state doesn't overcome the
> discontinuity problem.  And it would add a lot of complications and code to
> Opus.
>
> A better method seems to be:
> 1. Determine if the first frame after switching has significant
> dependencies on the previous frame (i.e., long-term prediction in the case
> of SILK, or energy prediction in the case of CELT)
> 2a. If yes: decode the first frame and re-encode it independently.  After
> that, switch to the new stream.
> 2b. If no: simply switch to the new stream immediately.
> This ensures a smooth transition and fast convergence to the correct output
> signal for the new stream.
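
The steps above can be sketched roughly as follows. This is only an
illustration: `Frame`, `depends_on_previous`, and `reencode_independently` are
hypothetical names, not part of the Opus API. Detecting the dependency is
assumed to be done elsewhere by a bitstream parser, and the decode and
re-encode step is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """One encoded frame from the newly selected stream (hypothetical type)."""
    payload: bytes
    depends_on_previous: bool  # assumed to be reported by a bitstream parser


def switch_to_stream(
    frames: List[Frame],
    reencode_independently: Callable[[Frame], Frame],
) -> List[Frame]:
    """Return the frames to forward after switching to a new stream.

    Step 1:  inspect the first frame after the switch.
    Step 2a: if it depends on the previous frame, decode it and re-encode it
             independently (delegated to the caller-supplied function).
    Step 2b: otherwise forward the new stream unchanged.
    """
    if not frames:
        return []
    first, rest = frames[0], frames[1:]
    if first.depends_on_previous:
        first = reencode_independently(first)
    return [first] + rest
```

Only the first frame is ever re-encoded here; every later frame of the new
stream is forwarded as received.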
>
> best,
> koen.
>
>
> ------------------------------
> *From: *"Roman Shpount" <roman@telurix.com>
> *To: *"Kevin P. Fleming" <kpfleming@digium.com>
> *Cc: *codec@ietf.org
> *Sent: *Saturday, January 15, 2011 2:07:28 PM
>
> *Subject: *Re: [codec] Next Steps for WG
>
> First of all, the CODEC definition should be independent of the transport
> protocol, and we might need this functionality when RTP SSRCs are not
> available.
>
> Furthermore, in the case of RTP, there are two problems with your suggestion:
>
> 1. Quite a few clients do not allow the remote party to change the SSRC
> without a re-INVITE. SIP clients note the SSRC of the first received RTP
> packet and ignore RTP packets with a different SSRC.
>
> 2. Even when the client allows switching the SSRC on the fly, a few initial
> RTP packets are typically discarded due to RTP probation. After this,
> clients typically need to pre-fill the jitter buffer, and only then do they
> start audio playback. This produces a gap of 40 ms to 100 ms in the audio,
> which is very audible and highly undesirable.
>
> Finally, what I was looking for was a bit more than just a decoder reset.
> Ideally I wanted to set the decoder state to a known value so that it can
> start decoding the audio packets that I will send to it.
>
> P.S. As a side note, Intel Performance Primitives CODECs implement a packet
> type in their wave file format that resets the decoder. This packet is used
> to simplify the implementation of performance and regression tests. I think
> they use a standard-sized packet with all bits set to zero for this purpose.
> We can at least do something similar.
>
> _____________
> Roman Shpount
>
>
> On Sat, Jan 15, 2011 at 8:12 AM, Kevin P. Fleming <kpfleming@digium.com> wrote:
>
>> On 01/14/2011 11:27 PM, Roman Shpount wrote:
>>
>>> My question was not so much about the VAD as about combining multiple
>>> streams. What I want to implement is switching between multiple
>>> talkers based on VAD (which is computed by some external
>>> method). In this case we need to indicate to the remote party that this
>>> is a new stream and ideally get the remote decoder in such a state that
>>> it can immediately start correctly decoding the new audio without a
>>> glitch. So, the minimum requirement would be to have a flag or a packet
>>> that indicates to the decoder that it needs to reset its state. Ideally, we
>>> need an algorithm to get the proper decoder state based on some audio
>>> history stored on the conferencing server, and to send this decoder
>>> state to the remote party so that it can be synchronized. I don't think
>>> this should affect the codec performance. This is just something that
>>> needs to be accommodated in the bitstream by adding a packet type which
>>> either resets the decoder state or sets it to some specified value.
>>>
>>
>> This sort of mechanism already exists when using RTP as the transport
>> mechanism, by setting the marker bit and changing the SSRC to indicate that
>> the payload in the packet is from a different source than the previous
>> packet. In my opinion there's no need for the codec bitstream to have any
>> provisions for such an indication.
>>
>> --
>> Kevin P. Fleming
>> Digium, Inc. | Director of Software Technologies
>> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
>> skype: kpfleming | jabber: kfleming@digium.com
>> Check us out at www.digium.com & www.asterisk.org
>>
>> _______________________________________________
>> codec mailing list
>> codec@ietf.org
>> https://www.ietf.org/mailman/listinfo/codec
>>
>