Re: [rtcweb] Multiple videos in one MediaStream (Re: MediaStream Label and CNAME)

Magnus Westerlund <> Wed, 14 September 2011 13:28 UTC


See below.

On 2011-09-14 12:21, Harald Alvestrand wrote:
> On 09/14/11 11:53, Magnus Westerlund wrote:
>> On 2011-09-13 16:32, Harald Alvestrand wrote:
>>> On 09/13/11 10:43, Magnus Westerlund wrote:
>> If I read this correctly, what you are doing is creating two media
>> streams based on a first media stream. Thus you have multiple
>> mediaStream objects that contain tracks that have a common
>> synchronization context. So from my point of view you have very well
>> demonstrated why you need CNAME to be a property associated with the
>> tracks in the media stream and not be the label for the mediaStream.
> That's a good point.
> My thinking when typing this was that stuff that happens inside a 
> browser is synchronized "enough" by default, so that we don't need to 
> carry the label along in order to keep synchronization between myFirstVideo 
> and mySecondVideo. This may or may not be optimistic; input sought.

I am not certain that would work well. In reality you need to keep
track of all your media resources against some reference clock. Since
offsets etc. depend on the context the streams come from, I think that
in most cases you do need to track them. You can't rely on them being
in sync only because they are local.

My suggestion is that the CNAME is never exposed in the API. It is
something that happens under the hood and is only really present when
transmitting the tracks over RTP, where each SSRC will expose the CNAME
it is associated with.
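
As a rough illustration of that point (a sketch only, not part of any
API proposal): on the wire each SSRC reports a CNAME through RTCP SDES,
so a receiver can recover the synchronization contexts by grouping
SSRCs on that value. The names RtpSourceInfo and groupBySyncContext,
and all example values, are hypothetical.

```typescript
// Hypothetical record of what a receiver learns per RTP source:
// the SSRC and the CNAME that source reported via RTCP SDES.
interface RtpSourceInfo {
  ssrc: number;
  cname: string;
}

// Group SSRCs by CNAME. Each group is one synchronization context.
function groupBySyncContext(sources: RtpSourceInfo[]): Record<string, number[]> {
  const contexts: Record<string, number[]> = {};
  sources.forEach((s) => {
    const ssrcs = contexts[s.cname] || [];
    ssrcs.push(s.ssrc);
    contexts[s.cname] = ssrcs;
  });
  return contexts;
}

// Made-up example: one audio and two video sources from one endpoint,
// plus one source from a different endpoint.
const exampleContexts = groupBySyncContext([
  { ssrc: 1001, cname: "endpoint-a@example.invalid" },
  { ssrc: 1002, cname: "endpoint-a@example.invalid" },
  { ssrc: 1003, cname: "endpoint-a@example.invalid" },
  { ssrc: 2001, cname: "endpoint-b@example.invalid" },
]);
```

Nothing in this grouping needs the CNAME to be visible to the
application; it can stay inside the browser's RTP stack.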

> The audio tracks inside myFirstVideo are declared as synchronized to the 
> myFirstVideo video track by virtue of being inside the same MediaStream 
> object; this carries over to them being synchronized if myFirstVideo is 
> connected to another PeerConnection, if this is allowed (see "recording" 
> discussion). (In this case, I think the CNAME should NOT be the same as 
> that coming in over the incoming PeerConnection).

Can we please keep the forwarding of media streams separate from the
basics? I will come back to the forwarding case and my view on how CNAME
and label need to be handled in those cases.

Let's start with the basic case where you do media stream
cloning/forking on the receiver side. A single mediaStream with a label
and one audio and 2 video tracks is received. They are from one
synchronization context. You clone the media stream into a second one
where only the second video is selected for playback, and the first MS
has the audio and the first video.

If one uses the received CNAME as the indicator that they are from the
same context and plays them back, this works fine. But unless the label
and CNAME are inherited and kept, you have broken sync. And you have in
fact mandated a less than obvious implementation method to achieve
synchronized playback. It also requires a correct implementation on the
sending side.

Also, if the application instances use the data channel or a channel
over the web server to tell each other what they are doing with the
different tracks, the label either needs to be different for the
different clones or you can't tell which mediaStream object you are
referring to.
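
The receiver-side cloning case can be sketched as follows. This is a
hypothetical model, not the real MediaStream API: SketchTrack,
SketchStream, forkStream and shareSyncContext are names invented for
the example, and syncContext merely stands in for the CNAME kept under
the hood. The assumption illustrated is that a clone gets its own
label while each track keeps its synchronization context.

```typescript
// Hypothetical track: the sync context travels with the track,
// not with the stream that happens to contain it.
interface SketchTrack {
  kind: "audio" | "video";
  id: string;
  syncContext: string; // stands in for the under-the-hood CNAME
}

// Hypothetical stream: a label plus a list of tracks.
interface SketchStream {
  label: string;
  tracks: SketchTrack[];
}

// Build a new stream from a subset of tracks. The label is new, but
// the track objects (and so their sync context) are carried over.
function forkStream(
  src: SketchStream,
  newLabel: string,
  keep: (t: SketchTrack) => boolean
): SketchStream {
  return { label: newLabel, tracks: src.tracks.filter(keep) };
}

// Two streams can be played back in sync if all of their tracks
// belong to one sync context.
function shareSyncContext(a: SketchStream, b: SketchStream): boolean {
  const all = a.tracks.concat(b.tracks);
  const first = all[0];
  if (!first) return false;
  return all.every((t) => t.syncContext === first.syncContext);
}

// One received stream: one audio and two video tracks, one context.
const receivedStream: SketchStream = {
  label: "incoming-1",
  tracks: [
    { kind: "audio", id: "a1", syncContext: "ctx-1" },
    { kind: "video", id: "v1", syncContext: "ctx-1" },
    { kind: "video", id: "v2", syncContext: "ctx-1" },
  ],
};

// First stream keeps audio and first video; second gets second video.
const mainView = forkStream(receivedStream, "main-view", (t) => t.id !== "v2");
const secondView = forkStream(receivedStream, "second-view", (t) => t.id === "v2");
```

Here mainView and secondView can be told apart by label when the
applications signal about them, yet shareSyncContext(mainView,
secondView) still holds because the context is a track property.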

If we look at this from the sender's perspective, the sender must create
a single media stream in an object it can't directly play back locally
for self view without splitting, as it contains two video tracks.

And if the application wants to distribute the second camera as an
identifiable labeled entity and still have it synchronized with the
first camera and the audio, I don't understand how to accomplish this
with the current API unless you can have multiple mediaStreams that
still allow for sync.

If relaying of media is allowed, I have the following view. I think the
CNAME should be maintained if the media itself isn't modified. If it
is mixed, changed, or combined in the node, then a new CNAME shall be

When it comes to the MediaStream label, it will implicitly be the same.
You receive the MediaStream from one PeerConnection. Then you add it to
another. In this JS instance it is the same MediaStream. You just have
to keep in mind whether you are referring to operations on the
transmission of that MS or to local operations on it (thus more
associated with what you receive).

>> Or how do you see the label for the media stream being handled when you
>> create two mediaStream objects from the incoming "stream" that has one
>> label?

> My original thinking was that a CNAME is created for any MediaStream at 
> creation time, so that we're operating with three CNAMEs in the example 
> above.

I think that is a bad idea, as it doesn't match the synchronization
context of the actual stream. All tracks in all three MS are in fact
from the same sync context.

It also makes it very easy for a careless implementor to destroy
information they may in fact want.

> Another possibility is that CNAME is a property of the attachment of a 
> MediaStream to a PeerConnection object; this has the advantage that the 
> CNAME property is only defined in the cases where we need it, but we 
> don't have to carry the baggage of actually generating CNAMEs for 
> MediaStreams that are not connected to PeerConnections.

My view is that it should be associated with the actual track source
node. So any media coming from a local device in a browser instance will
have the same CNAME, as they are all related to the same wall clock and
captured in a common time space. It is unlikely that the implementor
of the webapp can actually determine whether any source is coming from
another physical context. And capturing multiple physical contexts on a
common time base isn't a problem as long as all individual tracks that
one wants to capture are on the same time base.

I would also like to point out that an implementor may call
getUserMedia more than once in the program, thus creating different
MediaStream objects for a potentially overlapping set of underlying
media sources. Shouldn't it be possible to sync these, as in reality
they all come from local devices?
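
To make that concrete, here is a minimal sketch under the assumption
that the CNAME is tied to the local endpoint (the common wall clock)
rather than to each stream object. LocalTrack, LOCAL_CNAME and
captureTracks are hypothetical names; captureTracks only stands in for
a capture call and is not the real getUserMedia.

```typescript
// Hypothetical local track: tagged with the CNAME of its source.
interface LocalTrack {
  deviceId: string;
  cname: string;
}

// One CNAME per browser instance, since all local devices are captured
// against the same wall clock. The value is a made-up placeholder.
const LOCAL_CNAME = "local-endpoint@example.invalid";

// Stand-in for a capture call: each call returns new track objects,
// but every track carries the same endpoint-wide CNAME.
function captureTracks(deviceIds: string[]): LocalTrack[] {
  return deviceIds.map((deviceId) => ({ deviceId: deviceId, cname: LOCAL_CNAME }));
}

// Two separate capture calls with an overlapping device set.
const firstCapture = captureTracks(["mic", "camera-1"]);
const secondCapture = captureTracks(["camera-1", "camera-2"]);

// Tracks from both calls share one sync context.
const allLocalTracks = firstCapture.concat(secondCapture);
const sameContext = allLocalTracks.every((t) => t.cname === LOCAL_CNAME);
```

With a per-stream CNAME instead, firstCapture and secondCapture would
end up in different contexts even though camera-1 appears in both.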


Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVM
Ericsson AB                | Phone  +46 10 7148287
Färögatan 6                | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden