Re: [rtcweb] Multiple videos in one MediaStream (Re: MediaStream Label and CNAME)

Harald Alvestrand <harald@alvestrand.no> Tue, 13 September 2011 14:49 UTC

On 09/13/11 16:38, Olle E. Johansson wrote:
> On 13 Sep 2011, at 16:32, Harald Alvestrand wrote:
>
>> On 09/13/11 10:43, Magnus Westerlund wrote:
>>> WG,
>>> (As an individual contributor)
>>>
>>>
>>> There has been some discussion as result of the presentation of
>>> terminology in the RTCWEB Interim meeting last Thursday. The biggest
>>> question was why CNAME can't map to MediaStream label. Below we
>>> clarify why we think CNAME and label are separate entities.
>>>
>>> One part in this reasoning has to do with the current definition of
>>> ‘media resource’
>>> (<http://dev.w3.org/html5/spec/Overview.html#media-resource>) and media
>>> elements of html5. The ‘media resource’ could be a file, or, more
>>> relevant to this discussion, a MediaStream. In that usage only a single
>>> video track can be played simultaneously and in sync with one or more
>>> audio tracks.
>>>
>>> Thus, unless we modify the existing semantics, the only way of playing
>>> multiple video tracks in sync with one or more audio tracks is to have
>>> multiple MediaStream objects.
>> Sorry, I think this can be handled with the existing API without any issue.
>> Apologies to those who know JavaScript better than me, but this is a handler
>> that can handle a new incoming video stream:
>>
>> function NewStreamHandler(stream) {
>>    // Sketch only: wrap the incoming stream twice and select a
>>    // different video track in each wrapper.
>>    var myMultiVideoStream = stream;
>>    var myFirstVideo = new MediaStream(myMultiVideoStream);
>>    myFirstVideo.videoTracks.select(1);
>>    oneVideoObject.setSource(myFirstVideo.getUrl());
>>    var mySecondVideo = new MediaStream(myMultiVideoStream);
>>    mySecondVideo.videoTracks.select(2);
>>    anotherVideoObject.setSource(mySecondVideo.getUrl());
>> }
>>
>> I think the API can be improved, but the improvements are unlikely to lose this functionality.
>> (Remember that the API objects are the control surfaces on the stream - the video data will likely go straight from the incoming buffer, through the codec for decoding, and be blitted onto the screen; the "copying" from one MediaStream to another MediaStream is purely conceptual.)
>>
>> I don't think it makes sense to discuss the other points before getting this one put to rest.
>>
>>
> Just to add to the stew:
>
>   - For each stream  (video, text, audio) you have one outbound and at least one inbound
Nit: I'm not sure what "stream" refers to in this sentence.

For the "hockey game" use case, you have 2 video media stream tracks
outbound, presumably from the same PeerConnection, possibly in the same
MediaStream (they're in sync), and carried over the same RTP session,
too. (That use case was added at least partially to make sure we'd allow
multiple outbound video streams.)
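
A rough sketch of that arrangement, in plain JavaScript with stand-in
classes rather than any real browser API. SketchTrack, SketchStream and
splitVideoTracks are names invented here for illustration, and the track
labels are made up. One stream carries two video tracks; the receiver
derives one single-video view per track, and the track objects themselves
are shared rather than copied.

```javascript
// Stand-in types for illustration only; this is not the W3C MediaStream API.
class SketchTrack {
  constructor(kind, label) {
    this.kind = kind; // "audio" or "video"
    this.label = label;
  }
}

class SketchStream {
  constructor(label, tracks) {
    this.label = label;
    this.tracks = tracks.slice(); // copy the list, share the track objects
  }
  videoTracks() {
    return this.tracks.filter((t) => t.kind === "video");
  }
}

// Build one view per video track. Each view keeps every audio track plus
// exactly one video track, so a single-video player could render it.
function splitVideoTracks(stream) {
  const audio = stream.tracks.filter((t) => t.kind === "audio");
  return stream.videoTracks().map(
    (video, i) => new SketchStream(stream.label + "#" + i, audio.concat([video]))
  );
}

// Hypothetical hockey-game stream: one microphone, two cameras.
const game = new SketchStream("hockey", [
  new SketchTrack("audio", "arena-mic"),
  new SketchTrack("video", "wide-angle"),
  new SketchTrack("video", "puck-cam"),
]);
const views = splitVideoTracks(game);
console.log(views.map((v) => v.videoTracks()[0].label));
```

Sharing the track objects between views mirrors the point that the
"copying" between MediaStream objects is conceptual only.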
>   - If we are going to support SDP and SIP, you will end up with multiple inbound streams - one at a time or at the same time
>
> Forking may lead to multiple incoming streams for one outbound "call". Early media may lead to one incoming stream, which may be replaced by another endpoint at "answer time".
>
> I think one important question here is if we are going to allow this at all. If we follow the SIP model we just have to.
>
> If possible, I would like to see a restriction in rtcweb so that we push this complexity to the server, far away from the browser.
My preferred outcome would be that the browser supports media
streams and media stream tracks being added and removed at any time
during the call, without attaching special semantics to them, and the
JavaScript being left to sort out whether it's "early", "conference" or
"other".
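
To make that split concrete, here is a minimal sketch in plain JavaScript.
Everything in it is an assumption for illustration: SketchCall stands in
for whatever the browser exposes, the "answered" flag stands in for a
signaling event, and the rules in classify are just one policy an
application might choose. The call object only reports additions and
removals; the meaning is decided entirely in application code.

```javascript
// Hypothetical call model: this layer only reports add/remove events and
// attaches no meaning to them.
class SketchCall {
  constructor(onChange) {
    this.streams = new Map(); // stream id -> remote endpoint id
    this.answered = false;
    this.onChange = onChange;
  }
  addStream(id, endpoint) {
    this.streams.set(id, endpoint);
    this.onChange(this);
  }
  removeStream(id) {
    this.streams.delete(id);
    this.onChange(this);
  }
  markAnswered() {
    this.answered = true;
    this.onChange(this);
  }
}

// Application policy, not browser behavior: one possible classification.
function classify(call) {
  const endpoints = new Set(call.streams.values());
  if (endpoints.size === 0) return "idle";
  if (!call.answered) return "early";
  return endpoints.size > 1 ? "conference" : "other";
}

// Record the application's view after every change.
const states = [];
const call = new SketchCall((c) => states.push(classify(c)));

call.addStream("s1", "announcement"); // media before answer
call.removeStream("s1");              // early media goes away
call.addStream("s2", "alice");        // another endpoint starts sending
call.markAnswered();                  // signaling says the call is answered
call.addStream("s3", "bob");          // a second endpoint joins
console.log(states);
```

A different application could swap in its own classify function without
any change to the layer that adds and removes streams.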