Re: [codec] Next Steps for WG
Roman Shpount <roman@telurix.com> Tue, 18 January 2011 22:52 UTC
Return-Path: <roman@telurix.com>
X-Original-To: codec@core3.amsl.com
Delivered-To: codec@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 3AF1828C0FA for <codec@core3.amsl.com>; Tue, 18 Jan 2011 14:52:37 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.976
X-Spam-Level:
X-Spam-Status: No, score=-2.976 tagged_above=-999 required=5 tests=[AWL=-0.000, BAYES_00=-2.599, FM_FORGED_GMAIL=0.622, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xPWXxr-zJwhd for <codec@core3.amsl.com>; Tue, 18 Jan 2011 14:52:34 -0800 (PST)
Received: from mail-iy0-f172.google.com (mail-iy0-f172.google.com [209.85.210.172]) by core3.amsl.com (Postfix) with ESMTP id 6824728C0D7 for <codec@ietf.org>; Tue, 18 Jan 2011 14:52:34 -0800 (PST)
Received: by iyi42 with SMTP id 42so158074iyi.31 for <codec@ietf.org>; Tue, 18 Jan 2011 14:55:12 -0800 (PST)
Received: by 10.42.169.4 with SMTP id z4mr7141498icy.71.1295391312258; Tue, 18 Jan 2011 14:55:12 -0800 (PST)
Received: from mail-iy0-f172.google.com (mail-iy0-f172.google.com [209.85.210.172]) by mx.google.com with ESMTPS id u5sm4800045ics.6.2011.01.18.14.55.10 (version=TLSv1/SSLv3 cipher=RC4-MD5); Tue, 18 Jan 2011 14:55:11 -0800 (PST)
Received: by iyi42 with SMTP id 42so158046iyi.31 for <codec@ietf.org>; Tue, 18 Jan 2011 14:55:09 -0800 (PST)
MIME-Version: 1.0
Received: by 10.231.16.68 with SMTP id n4mr6865805iba.94.1295391309907; Tue, 18 Jan 2011 14:55:09 -0800 (PST)
Received: by 10.231.20.139 with HTTP; Tue, 18 Jan 2011 14:55:09 -0800 (PST)
In-Reply-To: <148861739.1112857.1295390120406.JavaMail.root@lu2-zimbra>
References: <AANLkTinD-ghqhP_dLkXigBjSZjc4dqZp+q_XY9Vedz9f@mail.gmail.com> <148861739.1112857.1295390120406.JavaMail.root@lu2-zimbra>
Date: Tue, 18 Jan 2011 17:55:09 -0500
Message-ID: <AANLkTinqu9Z3Co9KXQR=9-QF-TXpnN4i6ysT-3AA6V4y@mail.gmail.com>
From: Roman Shpount <roman@telurix.com>
To: Koen Vos <koen.vos@skype.net>
Content-Type: multipart/alternative; boundary="00221532c58cdf480e049a26cabe"
Cc: codec@ietf.org
Subject: Re: [codec] Next Steps for WG
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Jan 2011 22:52:37 -0000
Thank you, this is very helpful. What I was considering was something similar. Typically you have one speaker in the conference with short periods of double talk. I was thinking that decoding, mixing, and re-encoding (using independent frames) the double-talk period together with the first few frames of the new speaker, and then switching to the new speaker's stream, would sound the most natural and would only require re-encoding for short periods of time.

Thank you for your great work,
_____________________________
Roman Shpount - www.telurix.com

On Tue, Jan 18, 2011 at 5:35 PM, Koen Vos <koen.vos@skype.net> wrote:
> Roman:
>
> Jean-Marc and I discussed this a bit, and this is how we see it:
>
> Resetting the decoder state would lead to a discontinuity in the output
> signal and may create an objectionable "click" sound. Moreover, the
> incoming stream that's enabled in the switching node at the moment of the
> decoder reset may depend on the last frame before the switch, so the first
> few frames would not be decoded correctly.
>
> Coding and transmitting the decoder state doesn't overcome the
> discontinuity problem, and it would add a lot of complication and code to
> Opus.
>
> A better method seems to be:
> 1. Determine whether the first frame after switching has significant
>    dependencies on the previous frame (i.e., long-term prediction in the
>    case of SILK, or energy prediction in the case of CELT).
> 2a. If yes: decode the first frame and re-encode it independently. After
>    that, switch to the new stream.
> 2b. If no: simply switch to the new stream immediately.
> This ensures a smooth transition and fast convergence to the correct output
> signal for the new stream.
>
> best,
> koen.
>
>
> ------------------------------
> *From: *"Roman Shpount" <roman@telurix.com>
> *To: *"Kevin P. Fleming" <kpfleming@digium.com>
> *Cc: *codec@ietf.org
> *Sent: *Saturday, January 15, 2011 2:07:28 PM
> *Subject: *Re: [codec] Next Steps for WG
>
> First of all, the codec definition should be independent of the transport
> protocol, and we might need this functionality when RTP SSRCs are not
> available.
>
> Furthermore, in the case of RTP, there are two problems with your suggestion:
>
> 1. Quite a few clients do not allow the remote party to change the SSRC
> without a re-INVITE. These SIP clients note the SSRC of the first received
> RTP packet and ignore RTP packets with a different SSRC.
>
> 2. Even when the client allows switching the SSRC on the fly, a few initial
> RTP packets are typically discarded due to RTP probation. After this,
> clients typically need to pre-fill the jitter buffer, and only then do they
> start audio playback. This produces a 40 ms to 100 ms gap in the audio,
> which is very audible and highly undesirable.
>
> Finally, what I was looking for was a bit more than just a decoder reset.
> Ideally, I wanted to set the decoder state to a known value so that it can
> start decoding the audio packets that I will send to it.
>
> P.S. As a side note, the Intel Integrated Performance Primitives codecs
> implement a packet type in their wave file format that resets the decoder.
> This packet is used to simplify the implementation of performance and
> regression tests. I think they use a standard-sized packet with all bits
> set to zero for this purpose. We can at least do something similar.
>
> _____________
> Roman Shpount
>
>
> On Sat, Jan 15, 2011 at 8:12 AM, Kevin P. Fleming <kpfleming@digium.com> wrote:
>
>> On 01/14/2011 11:27 PM, Roman Shpount wrote:
>>
>>> My question was not so much about the VAD as about combining multiple
>>> streams. What I want to implement is switching between multiple
>>> talkers based on VAD (computed by some external method). In this case
>>> we need to indicate to the remote party that this is a new stream and,
>>> ideally, get the remote decoder into a state where it can immediately
>>> start decoding the new audio correctly, without a glitch. So the minimum
>>> requirement would be a flag or a packet that tells the decoder it needs
>>> to reset its state. Ideally, we need an algorithm to derive the proper
>>> decoder state from some audio history stored on the conferencing server,
>>> and to send this decoder state to the remote party so that it can be
>>> synchronized. I don't think this should affect the codec performance.
>>> This is just something that needs to be accommodated in the bitstream by
>>> adding a packet type which either resets the decoder state or sets it to
>>> some specified value.
>>>
>>
>> This sort of mechanism already exists when using RTP as the transport
>> mechanism, by setting the marker bit and changing the SSRC to indicate that
>> the payload in the packet is from a different source than the previous
>> packet. In my opinion there's no need for the codec bitstream to have any
>> provisions for such an indication.
>>
>> --
>> Kevin P. Fleming
>> Digium, Inc. | Director of Software Technologies
>> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
>> skype: kpfleming | jabber: kfleming@digium.com
>> Check us out at www.digium.com & www.asterisk.org
>>
>> _______________________________________________
>> codec mailing list
>> codec@ietf.org
>> https://www.ietf.org/mailman/listinfo/codec
>>
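The switching method Koen describes (steps 1, 2a and 2b) can be sketched as follows. This is a minimal, hypothetical Python illustration, not Opus code: `Frame`, its `depends_on_prev` flag, and the `make_decoder` and `encode_independent` hooks are assumptions standing in for real bitstream inspection (SILK long-term prediction or CELT energy prediction), a per-stream decoder, and an encoder configured to use no inter-frame prediction. It also assumes the switching node keeps decoding every incoming stream, so each decoder's state is current whenever a switch happens.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional

PCM = List[float]


@dataclass(frozen=True)
class Frame:
    stream_id: str
    seq: int
    # Hypothetical flag: True if the frame relies on the previous frame
    # (e.g. SILK long-term prediction or CELT energy prediction).
    depends_on_prev: bool
    payload: bytes
    reencoded: bool = False


# Hypothetical codec hooks, not a real library API.
Decoder = Callable[[Frame], PCM]            # stateful, one per incoming stream
EncodeIndependent = Callable[[PCM], bytes]  # encodes with no inter-frame prediction


class SwitchingNode:
    """Forwards the active talker's frames, re-encoding the first frame after
    a talker change when that frame depends on its predecessor (step 2a)."""

    def __init__(self, make_decoder: Callable[[], Decoder],
                 encode_independent: EncodeIndependent) -> None:
        self.make_decoder = make_decoder
        self.encode_independent = encode_independent
        self.decoders: Dict[str, Decoder] = {}
        self.active: Optional[str] = None

    def tick(self, incoming: Dict[str, Frame], talker: str) -> Frame:
        # Decode every stream so its decoder state is current at any switch.
        pcm: Dict[str, PCM] = {}
        for sid, frame in incoming.items():
            decoder = self.decoders.get(sid)
            if decoder is None:
                decoder = self.decoders[sid] = self.make_decoder()
            pcm[sid] = decoder(frame)

        frame = incoming[talker]
        switched = self.active is not None and talker != self.active
        self.active = talker

        # Steps 1 and 2a: a dependent first frame is re-encoded independently.
        if switched and frame.depends_on_prev:
            return replace(frame,
                           payload=self.encode_independent(pcm[talker]),
                           depends_on_prev=False,
                           reencoded=True)
        # Step 2b, or no talker change: forward the frame unchanged.
        return frame


if __name__ == "__main__":
    # Toy codec: payload bytes are the "samples" (illustration only).
    node = SwitchingNode(lambda: (lambda f: [float(b) for b in f.payload]),
                         lambda samples: bytes(int(x) for x in samples))
    node.tick({"A": Frame("A", 0, False, b"\x01"),
               "B": Frame("B", 0, True, b"\x02")}, "A")
    out = node.tick({"A": Frame("A", 1, True, b"\x03"),
                     "B": Frame("B", 1, True, b"\x04")}, "B")
    print(out.stream_id, out.reencoded)  # B True
```

Only the first frame after a switch is re-encoded here, as in step 2a. Roman's variant would instead decode, mix, and independently re-encode every frame of the double-talk period plus the first few frames of the new talker before returning to plain forwarding, replacing the single re-encode branch with a short run of mixed frames.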