Re: [rtcweb] Interaction between MediaStream API and signaling

Stefan Hakansson LK <> Sat, 31 March 2012 05:17 UTC

On 03/30/2012 11:39 PM, Randell Jesup wrote:
> On 3/30/2012 4:59 AM, Stefan Hakansson LK wrote:
>> The JS API deals with MediaStreams (this is what you send and
>> receive using PeerConnection from an application perspective).
>> A browser receiving RTP streams needs side info to be able to
>> assemble those RTP streams into MediaStreams in a correct way. The
>> current model is that this is signaled using SDP exchanges (where
>> Harald's MSID proposal would tell which MediaStream an RTP stream
>> belongs to).
>> As I brought up at the mike yesterday, I think we may have a race
>> condition for the responder.
>> For the initiator side browser, this is clear: once a (PR-)ANSWER is
>> received, the responder has received the SDP, and hence can map
>> incoming RTP streams into MediaStreams.
>> But for the responder side this is less clear to me. Imagine
>> applications where the responder just mirrors the initiator - if one
>> of the parties adds a MediaStream to PeerConnection, the other end
>> would add the corresponding MediaStream.
>> This can happen any time in the session, so ICE can very well be up
>> and running. One example could be that the data channel is used for
>> text chat, when one side clicks a button to start video. And the
>> application can have asked for permission to use all input devices
>> earlier, so no user interaction may be involved.
>> In this situation the responder's (added) RTP streams can very well
>> arrive before the ANSWER if I understand correctly.
> Yes.  Just like in SIP.  And so when you send an OFFER (or modified
> re-OFFER), you must be ready to receive data per that offer even if no
> ANSWER has been received - just like in SIP.  And if it's a re-offer, you
> need to accept the old, and accept the new (though you could probably
> use reception of obviously new-OFFER media to turn off
> decoding/rendering old-OFFER in preparation for the ANSWER).
> The flip side of this is the responder has to infer when the sender
> switches over to the result of the ANSWER from the media.  For example:
> A                                      B
> <--- H.261 --->
> re-OFFER(VP8) --->
> <-- ANSWER(VP8) (delayed in reception)
> <-----------VP8            (A should infer that B ANSWERed and accepted VP8)
>    ---------->  H.261
> <-- ANSWER(VP8) (received)
> <--------VP8---------->  (B should infer by reception of VP8 that ANSWER
> was received)
> (Personally, I hate inferences, but without a 3 (or 4) way handshake,
> you have to).  If your codec switches are staged, then this isn't
> (much of) a problem.  Either leave the old codec on the list, or leave it
> on the list until accept, and then re-OFFER to remove the unused codec.

I think I understand what you mean, and this would work fine as long as 
you just switch codecs that are used in already set-up MediaStreams.

But suppose that A, as part of the re-OFFER, not only offers a new codec
(VP8) for the already flowing video but also adds a new outgoing video
stream (e.g. front cam). If A then starts receiving VP8 video before the
ANSWER arrives (delayed in reception), it cannot really know whether this
VP8 video is new video from the responder's front cam or just a new codec
for the existing (back cam) video from the responder to the initiator.
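
To make the ambiguity concrete, here is a minimal sketch in Python. It is
purely illustrative: the SSRC values, the stream labels and the `classify`
helper are hypothetical, and it assumes the VP8 packets arrive on an SSRC
that A has not seen before. It shows that the same incoming packet fits both
interpretations until the ANSWER supplies the mapping.

```python
from typing import Dict, Optional

# Hypothetical side info A holds before the ANSWER arrives:
# SSRC -> MediaStream label for video already flowing from B.
known_before_answer: Dict[int, str] = {1111: "B-back-cam"}

# Two ANSWERs B could have sent. A has received neither yet.
# Case 1: B only switched codec, and the VP8 video uses a new SSRC (assumption).
answer_codec_switch_only: Dict[int, str] = {2222: "B-back-cam"}
# Case 2: B kept the back cam and added a front cam stream.
answer_with_new_stream: Dict[int, str] = {1111: "B-back-cam", 2222: "B-front-cam"}


def classify(ssrc: int, mapping: Dict[int, str]) -> Optional[str]:
    """Return the MediaStream label for an SSRC, or None if it is unknown."""
    return mapping.get(ssrc)


incoming_ssrc = 2222  # VP8 packet that arrives before the ANSWER

# Without the ANSWER, A has no basis for choosing a MediaStream.
print(classify(incoming_ssrc, known_before_answer))
# With the ANSWER, the result depends entirely on what B signaled.
print(classify(incoming_ssrc, answer_codec_switch_only))
print(classify(incoming_ssrc, answer_with_new_stream))
```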

> One problem is what to do in the switchover window when you might get a
> mixture of old and new media, especially if you moved them to different
> ports and so can't count on RTP sequence re-ordering to un-mix them; in
> the past I dealt with that (and long codec-switch times) by locking out
> codec changes for a fraction of a second after I do one.  Not a huge
> deal, however.
> My apologies if I've missed something in JSEP; I've been heads-down
> enough in Data Channels and bring-up that I could have a disconnect here
> and be saying something silly.

Actually I don't think this is very JSEP related; it is the generic
problem that a browser receiving RTP streams needs some side info about
them before it can do anything sensible with them.
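
One possible way to cope with late side info, sketched in Python under
stated assumptions: the receiver parks packets from unmapped SSRCs and
releases them once signaling delivers the mapping. The `StreamAssembler`
class and its method names are hypothetical, not an existing browser API,
and a real implementation would have to bound the buffer.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class StreamAssembler:
    """Holds RTP payloads whose SSRC is not yet mapped to a MediaStream."""

    def __init__(self) -> None:
        self.ssrc_to_stream: Dict[int, str] = {}
        self.pending: Dict[int, List[bytes]] = defaultdict(list)
        self.delivered: List[Tuple[str, bytes]] = []

    def on_rtp(self, ssrc: int, payload: bytes) -> None:
        stream = self.ssrc_to_stream.get(ssrc)
        if stream is None:
            # No side info yet: park the payload instead of guessing.
            self.pending[ssrc].append(payload)
        else:
            self.delivered.append((stream, payload))

    def on_side_info(self, mapping: Dict[int, str]) -> None:
        """Called when signaling (e.g. the ANSWER) provides SSRC mappings."""
        self.ssrc_to_stream.update(mapping)
        for ssrc in list(self.pending):
            stream = self.ssrc_to_stream.get(ssrc)
            if stream is not None:
                # Flush parked payloads in arrival order.
                for payload in self.pending.pop(ssrc):
                    self.delivered.append((stream, payload))


asm = StreamAssembler()
asm.on_rtp(2222, b"frame-1")            # arrives before the ANSWER
asm.on_side_info({2222: "front-cam"})   # ANSWER arrives with the mapping
asm.on_rtp(2222, b"frame-2")            # now delivered directly
print(asm.delivered)
```

Buffering only hides the race; it does not remove the delay before the new
video can be rendered.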

>> I think we need to find a way to handle this. One way is to add an
>> "ACK" that indicates to the responder that the initiator has received
>> the ANSWER, but I'm not sure that is the best way.
> If you need to know that, you need a SIP-style ACK.

As explained, I do think we need to know that.
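
A minimal sketch of the "ACK" idea in Python, assuming a hypothetical
`Responder` class and message names (none of this comes from JSEP or any
draft): the responder sends the ANSWER but holds its newly added streams
until the initiator confirms that the ANSWER has arrived.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    IDLE = auto()
    ANSWER_SENT = auto()  # ANSWER sent, not yet known to be received
    CONFIRMED = auto()    # initiator has ACKed the ANSWER


class Responder:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.queued_streams: List[str] = []
        self.sending: List[str] = []

    def on_offer(self, streams_to_add: List[str]) -> str:
        # Queue the new streams rather than sending them right away.
        self.queued_streams = list(streams_to_add)
        self.state = State.ANSWER_SENT
        return "ANSWER"

    def on_ack(self) -> None:
        # Only after the ACK can the initiator be assumed to hold the mapping.
        if self.state is State.ANSWER_SENT:
            self.sending.extend(self.queued_streams)
            self.queued_streams.clear()
            self.state = State.CONFIRMED


r = Responder()
r.on_offer(["front-cam"])
print(r.sending)   # nothing is sent before the ACK
r.on_ack()
print(r.sending)   # the queued stream is released after the ACK
```

The cost is one extra signaling round trip before the responder's new media
starts flowing.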