Re: [codec] Next Steps for WG
Koen Vos <koen.vos@skype.net> Tue, 18 January 2011 23:11 UTC
Date: Wed, 19 Jan 2011 00:13:50 +0100
From: Koen Vos <koen.vos@skype.net>
To: Roman Shpount <roman@telurix.com>
Message-ID: <785740794.1113907.1295392430782.JavaMail.root@lu2-zimbra>
In-Reply-To: <AANLkTinqu9Z3Co9KXQR=9-QF-TXpnN4i6ysT-3AA6V4y@mail.gmail.com>
Cc: codec@ietf.org
Subject: Re: [codec] Next Steps for WG
That should work. Make sure to get the time alignment right by adjusting for look-ahead, etc. Also, only the first frame of the double-talk period has to be coded independently; for the others you can let the encoder decide. In general, whenever you add a stream to or remove a stream from the mix, you want an independent frame.

best,
koen.

----- Original Message -----
From: "Roman Shpount" <roman@telurix.com>
To: "Koen Vos" <koen.vos@skype.net>
Cc: codec@ietf.org
Sent: Tuesday, January 18, 2011 2:55:09 PM
Subject: Re: [codec] Next Steps for WG

Thank you, this is very helpful. What I was considering was something similar. Typically you have one speaker in the conference with short periods of double talk. I was thinking that decoding, mixing and re-encoding (using independent frames) the double-talk period together with the first few frames of the new speaker, and then switching to the new speaker's stream, would sound the most natural and would only require re-encoding for short periods of time.

Thank you for your great work,
_____________________________
Roman Shpount - www.telurix.com

On Tue, Jan 18, 2011 at 5:35 PM, Koen Vos <koen.vos@skype.net> wrote:

Roman:

Jean-Marc and I discussed this a bit, and this is how we see it:

Resetting the decoder state would lead to a discontinuity in the output signal and may create an objectionable "click" sound. Moreover, the incoming stream that is enabled in the switching node at the moment of the decoder reset may depend on the last frame before the switch, so the first few frames would not be decoded correctly.

Coding and transmitting the decoder state doesn't overcome the discontinuity problem, and it would add a lot of complications and code to Opus.

A better method seems to be:

1. Determine whether the first frame after switching has significant dependencies on the previous frame (i.e., long-term prediction in the case of SILK, or energy prediction in the case of CELT).
2a. If yes: decode the first frame and re-encode it independently. After that, switch to the new stream.
2b. If no: simply switch to the new stream immediately.

This ensures a smooth transition and fast convergence to the correct output signal for the new stream.

best,
koen.

From: "Roman Shpount" <roman@telurix.com>
To: "Kevin P. Fleming" <kpfleming@digium.com>
Cc: codec@ietf.org
Sent: Saturday, January 15, 2011 2:07:28 PM
Subject: Re: [codec] Next Steps for WG

First of all, the codec definition should be independent of the transport protocol, and we might need this functionality when RTP SSRCs are not available. Furthermore, in the case of RTP, there are two problems with your suggestion:

1. Quite a few clients do not allow the remote party to change the SSRC without a re-INVITE. SIP clients note the SSRC of the first received RTP packet and ignore RTP packets with a different SSRC.
2. Even when the client allows switching the SSRC on the fly, a few initial RTP packets are typically discarded due to RTP probation. After this, clients typically need to pre-fill the jitter buffer, and only then start audio playback. This produces a gap of 40 ms to 100 ms in the audio, which is very audible and highly undesirable.

Finally, what I was looking for was a bit more than just a decoder reset. Ideally, I wanted to set the decoder state to a known value so that it can start decoding the audio packets that I will send to it.

P.S. As a side note, the Intel Performance Primitives codecs implement a packet type in their wave file format that resets the decoder. This packet is used to simplify the implementation of performance and regression tests. I think they use a standard-sized packet with all bits set to zero for this purpose. We could at least do something similar.

_____________
Roman Shpount

On Sat, Jan 15, 2011 at 8:12 AM, Kevin P. Fleming <kpfleming@digium.com> wrote:

On 01/14/2011 11:27 PM, Roman Shpount wrote:

My question was not so much about the VAD as about combining multiple streams. What I want to implement is switching between multiple talkers based on VAD (computed by some external method). In this case we need to indicate to the remote party that this is a new stream and, ideally, get the remote decoder into a state where it can immediately start decoding the new audio correctly, without a glitch. So the minimum requirement would be a flag or a packet that tells the decoder it needs to reset its state. Ideally, we need an algorithm to derive the proper decoder state from some audio history stored on the conferencing server, and to send this decoder state to the remote party so that it can be synchronized. I don't think this should affect codec performance. This is just something that needs to be accommodated in the bitstream by adding a packet type which either resets the decoder state or sets it to some specified value.

This sort of mechanism already exists when using RTP as the transport, by setting the marker bit and changing the SSRC to indicate that the payload in the packet comes from a different source than the previous packet. In my opinion there's no need for the codec bitstream to have any provisions for such an indication.

--
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
skype: kpfleming | jabber: kfleming@digium.com
Check us out at www.digium.com & www.asterisk.org

_______________________________________________
codec mailing list
codec@ietf.org
https://www.ietf.org/mailman/listinfo/codec
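Kevin's alternative relies on two fields of the fixed RTP header defined in RFC 3550, the marker bit and the SSRC, while Roman's second objection concerns what a receiver does when a new SSRC shows up. The Python sketch below parses the 12-byte fixed header and applies a simplified form of the probation rule from RFC 3550 Appendix A.1, under which a new source is accepted only after MIN_SEQUENTIAL packets with consecutive sequence numbers. `RtpHeader`, `parse_rtp_header` and `SourceTracker` are hypothetical names for illustration, and real receivers differ (Roman notes that some ignore a new SSRC altogether).

```python
import struct
from dataclasses import dataclass
from typing import Optional

MIN_SEQUENTIAL = 2  # in-sequence packets a new source must deliver before acceptance (value used in RFC 3550 A.1)


@dataclass
class RtpHeader:
    version: int
    marker: bool
    payload_type: int
    seq: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        seq=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


class SourceTracker:
    """Hypothetical receiver-side tracker that accepts a new SSRC only after probation."""

    def __init__(self) -> None:
        self.active_ssrc: Optional[int] = None
        self.candidate_ssrc: Optional[int] = None
        self.candidate_last_seq = 0
        self.probation = 0

    def on_packet(self, hdr: RtpHeader) -> str:
        if hdr.ssrc == self.active_ssrc:
            return "play"
        if hdr.ssrc != self.candidate_ssrc:
            # First packet from an unknown source: start probation; this packet is not played.
            self.candidate_ssrc = hdr.ssrc
            self.candidate_last_seq = hdr.seq
            self.probation = MIN_SEQUENTIAL - 1
            return "probation"
        if hdr.seq == (self.candidate_last_seq + 1) & 0xFFFF:
            self.probation -= 1
        else:
            self.probation = MIN_SEQUENTIAL - 1  # out of sequence: restart probation
        self.candidate_last_seq = hdr.seq
        if self.probation > 0:
            return "probation"
        # New source accepted; a real client would reset its decoder and refill its jitter buffer here.
        self.active_ssrc = hdr.ssrc
        self.candidate_ssrc = None
        return "switch"
```

Every packet that comes back as "probation" is audio the listener never hears, and the jitter-buffer refill after "switch" adds further delay on top of that, which is the gap Roman describes. The marker bit is parsed but not used for the decision here, since the thread does not say how receivers treat it.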
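Koen's three-step procedure, extended with Roman's double-talk mixing and Koen's rule that adding or removing a stream from the mix calls for an independent frame, can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions: `decode`, `encode` and `depends_on_previous` are hypothetical callbacks, not Opus API calls (whether a frame relies on long-term prediction in SILK or energy prediction in CELT would have to be exposed by the codec); all frames are time-aligned and equally long; and the look-ahead compensation Koen mentions is deliberately left out.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional

Pcm = List[int]


def mix_pcm(blocks: Iterable[Pcm]) -> Pcm:
    """Sum equally long 16-bit PCM blocks sample by sample, with saturation."""
    mixed = [sum(samples) for samples in zip(*blocks)]
    return [max(-32768, min(32767, s)) for s in mixed]


class SwitchingNode:
    """Hypothetical VAD-driven switching node following the procedure in the thread.

    decode(stream_id, frame) -> Pcm           : per-stream decoder, called for every frame
    encode(pcm, force_independent) -> bytes   : the node's single outgoing encoder
    depends_on_previous(frame) -> bool        : True if the frame uses inter-frame prediction
    """

    def __init__(self, decode: Callable[[str, bytes], Pcm],
                 encode: Callable[[Pcm, bool], bytes],
                 depends_on_previous: Callable[[bytes], bool]) -> None:
        self.decode = decode
        self.encode = encode
        self.depends_on_previous = depends_on_previous
        self.mix: FrozenSet[str] = frozenset()

    def process(self, frames: Dict[str, bytes], talking: Iterable[str]) -> Optional[bytes]:
        """Return the outgoing frame for one frame interval."""
        # Decode every stream so each decoder's state stays current, even when not forwarded.
        pcm = {sid: self.decode(sid, frame) for sid, frame in frames.items()}
        mix = frozenset(talking) or self.mix  # nobody flagged by the VAD: keep the previous selection
        if not mix:
            return None
        changed = mix != self.mix  # a stream was added to or removed from the mix
        self.mix = mix
        if len(mix) == 1:
            (sid,) = mix
            frame = frames[sid]
            if changed and self.depends_on_previous(frame):
                # Step 2a: the receiver lacks this stream's history, so send the
                # decoded frame re-encoded as an independent frame.
                return self.encode(pcm[sid], True)
            # Step 2b, or steady state: forward the original packet untouched.
            return frame
        # Double talk: decode, mix and re-encode. Only the first frame after the mix
        # changes is forced to be independent; afterwards the encoder decides.
        return self.encode(mix_pcm(pcm[sid] for sid in sorted(mix)), changed)
```

Because the node keeps decoding every incoming stream, it always holds the decoder state that step 2a needs; the cost is one decoder per participant, while re-encoding happens only at transitions and during double talk, in line with Roman's goal of re-encoding for short periods only.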