Re: [codec] Next Steps for WG

Thank you, this is very helpful.

What I was considering was something similar. Typically you have one speaker
in the conference, with short periods of double talk. I was thinking of
decoding and mixing the double-talk period, together with the first few
frames of the new speaker, re-encoding the mix as independent frames, and
then switching to the new speaker's stream. That should sound the most
natural and will only require re-encoding for short periods of time.

Thank you for your great work,
_____________________________
Roman Shpount - www.telurix.com

On Tue, Jan 18, 2011 at 5:35 PM, Koen Vos <koen.vos@skype.net> wrote:

> Roman:
>
> Jean-Marc and I discussed this a bit, and this is how we see it:
>
> Resetting the decoder state would lead to a discontinuity in the output
> signal and may create an objectionable "click" sound.  Moreover, the
> incoming stream that's enabled in the switching node at the moment of the
> decoder reset may depend on the last frame before the switch. So the first
> few frames would not be decoded correctly.
>
> Coding and transmitting the decoder state doesn't overcome the
> discontinuity problem.  And it would add a lot of complications and code to
> Opus.
>
> A better method seems to be:
> 1. Determine if the first frame after switching has significant
> dependencies on the previous frame (i.e., long-term prediction in the case
> of SILK, or energy prediction in the case of CELT)
> 2a. If yes: decode the first frame and re-encode it independently.  After
> that, switch to the new stream.
> 2b. If no: simply switch to the new stream immediately.
> This ensures a smooth transition and fast convergence to the correct output
> signal for the new stream.
>
> best,
> koen.
>
>
> ------------------------------
> *From: *"Roman Shpount" <roman@telurix.com>
> *To: *"Kevin P. Fleming" <kpfleming@digium.com>
> *Cc: *codec@ietf.org
> *Sent: *Saturday, January 15, 2011 2:07:28 PM
>
> *Subject: *Re: [codec] Next Steps for WG
>
> First of all, the codec definition should be independent of the transport
> protocol, and we might need this functionality when RTP SSRCs are not
> available.
>
> Furthermore, in case of RTP, there are two problems with your suggestion:
>
> 1. Quite a few clients do not allow the remote party to change the SSRC
> without a re-INVITE. These SIP clients note the SSRC of the first received
> RTP packet and ignore RTP packets with a different SSRC.
>
> 2. Even when the client allows switching the SSRC on the fly, a few initial
> RTP packets are typically discarded due to RTP probation. After this,
> clients typically need to pre-fill the jitter buffer, and only then do they
> start audio playback. This produces a gap of 40 ms to 100 ms in the audio,
> which is very audible and highly undesirable.
>
> Finally, what I was looking for was a bit more than just a decoder reset.
> Ideally I wanted to set the decoder state to a known value so that it can
> start decoding the audio packets that I will send to it.
>
> P.S. As a side note, the Intel Performance Primitives codecs implement a
> packet type in their wave file format that resets the decoder. This packet
> is used to simplify the implementation of performance and regression tests.
> I think they use a standard-sized packet with all bits set to zero for this
> purpose. We can at least do something similar.
>
> _____________
> Roman Shpount
>
>
> On Sat, Jan 15, 2011 at 8:12 AM, Kevin P. Fleming <kpfleming@digium.com> wrote:
>
>> On 01/14/2011 11:27 PM, Roman Shpount wrote:
>>
>>> My question was not so much about the VAD as about combining multiple
>>> streams. What I want to implement is switching between multiple
>>> talkers based on VAD (computed by some external method). In this case
>>> we need to indicate to the remote party that this
>>> is a new stream and ideally get the remote decoder into such a state that
>>> it can immediately start correctly decoding the new audio without a
>>> glitch. So, the minimum requirement would be to have a flag or a packet
>>> that tells the decoder that it needs to reset its state. Ideally, we
>>> need an algorithm to get the proper decoder state based on some audio
>>> history stored on the conferencing server, and to send this decoder
>>> state to the remote party so that it can be synchronized. I don't think
>>> this should affect the codec performance. This is just something that
>>> needs to be accommodated in the bitstream by adding a packet type which
>>> either resets the decoder state or sets it to some specified value.
>>>
>>
>> This sort of mechanism already exists when using RTP as the transport
>> mechanism, by setting the marker bit and changing the SSRC to indicate that
>> the payload in the packet is from a different source than the previous
>> packet. In my opinion there's no need for the codec bitstream to have any
>> provisions for such an indication.
>>
>> --
>> Kevin P. Fleming
>> Digium, Inc. | Director of Software Technologies
>> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
>> skype: kpfleming | jabber: kfleming@digium.com
>> Check us out at www.digium.com & www.asterisk.org
>>
>> _______________________________________________
>> codec mailing list
>> codec@ietf.org
>> https://www.ietf.org/mailman/listinfo/codec
>>
>