[rtcweb] Comments on draft-jennings-rtcweb-plan-01

Stefan Håkansson LK <stefan.lk.hakansson@ericsson.com> Thu, 28 February 2013 15:15 UTC

Message-ID: <512F747D.4050403@ericsson.com>
Date: Thu, 28 Feb 2013 16:15:09 +0100
From: Stefan Håkansson LK <stefan.lk.hakansson@ericsson.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3
MIME-Version: 1.0
To: "rtcweb@ietf.org" <rtcweb@ietf.org>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Subject: [rtcweb] Comments on draft-jennings-rtcweb-plan-01
Precedence: list

I have some questions and comments on the -01 draft [1]:

General
=======
This really combines model A (initial part of the doc) and B (section 
7.2) as discussed in Boston.

Regarding the idea of changing the msid structure, and providing an RTP 
header extension, I'm not sure what the gain really is. Sure, we can 
experience SSRC collisions - but it is a well-known problem and easy to 
recover from.

I also note that in your proposal the MediaStreamTrack id is carried 
("f") but not the _MediaStream_ id (as "msid" and "r" seem to identify 
the PeerConnection).

I would prefer to use SSRC to identify RTP flows, and to signal how 
SSRCs relate to PC-streams and PC-tracks (and possibly sources) in the 
SDP (similar to what is proposed in the current msid draft).
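
To illustrate the kind of SSRC signaling meant here, the following is a
minimal sketch, not the msid draft's definitive syntax. It assumes
SSRC-level attribute lines of the form
"a=ssrc:<ssrc> msid:<stream-id> <track-id>"; the SSRC values, the ids and
both function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical SDP fragment; the line form and all values are assumptions.
SDP_FRAGMENT = """\
a=ssrc:1001 msid:streamA audio0
a=ssrc:1002 msid:streamA video0
a=ssrc:2001 msid:streamB video1
"""


def parse_ssrc_msid(sdp):
    """Map each SSRC to a (PC-stream id, PC-track id) pair."""
    mapping = {}
    for line in sdp.splitlines():
        if not line.startswith("a=ssrc:"):
            continue
        ssrc_part, _, attr = line[len("a=ssrc:"):].partition(" ")
        if not attr.startswith("msid:"):
            continue
        fields = attr[len("msid:"):].split()
        if len(fields) != 2:
            continue  # skip lines that do not match the assumed form
        stream_id, track_id = fields
        mapping[int(ssrc_part)] = (stream_id, track_id)
    return mapping


def tracks_by_stream(mapping):
    """Group (track id, SSRC) pairs under their PC-stream id."""
    streams = defaultdict(list)
    for ssrc, (stream_id, track_id) in sorted(mapping.items()):
        streams[stream_id].append((track_id, ssrc))
    return dict(streams)


ssrc_map = parse_ssrc_msid(SDP_FRAGMENT)
grouped = tracks_by_stream(ssrc_map)
```

With such a table the receiver can route an incoming RTP flow to the right
PC-track from its SSRC alone, without any extra RTP header extension.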

When it comes to the RTP header extension, I can see the advantage in 
some situations (i.e. when the signaling is late), but is it worth it? 
Would we not get a lot of problems with legacy implementations? And 
what about the overhead? Adding 128 bits per packet is quite a lot: 
used with a codec with 20 ms framing (50 packets per second) it would 
add 6.4 kbps, and many speech codecs work fine at similar bitrates. 
(OTOH, it should be easy to compress with header compression if that is 
deployed.)
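
As a rough check of the overhead figure, here is a minimal sketch of the
arithmetic. It assumes one RTP packet per codec frame; the function name is
invented for the example.

```python
def header_extension_overhead_bps(extension_bits, frame_ms):
    """Extra bitrate caused by adding extension_bits to every RTP packet.

    Assumes one RTP packet per codec frame (an illustrative assumption).
    """
    packets_per_second = 1000.0 / frame_ms
    return extension_bits * packets_per_second


overhead_bps = header_extension_overhead_bps(128, 20)
print(f"{overhead_bps / 1000:.1f} kbps")  # 128 bits * 50 packets/s
```

The same function shows how the cost scales with framing: with 40 ms
frames the added bitrate is halved.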

Another item that probably needs some thinking is how this works with 
trickle ICE. With the model of having (at least) one m-line per 
PC-track, the number of m-lines would vary a lot. For example, what 
happens if a PC is created, two PC-streams are added, followed by 
createOffer/setLocal (now the ICE machinery starts); and then, before 
things settle, the app adds two PC-tracks to one of the attached 
PC-streams and removes the other PC-stream from the PeerConnection, 
followed by a new createOffer/setLocal? Is there a risk that the ICE 
candidates' references to m-lines get out of sync?
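
The concern can be sketched with a toy model. Everything below is a
hypothetical illustration: it assumes each PC-stream initially holds one
PC-track, that a trickled candidate refers to its m-line by index only, and
that the second offer simply lists the remaining tracks in order. How
m-lines are really ordered after such changes is exactly the open question.

```python
def resolve(candidate, m_lines):
    """Return the m-line a candidate's index points at, or None."""
    index = candidate["m_line_index"]
    return m_lines[index] if index < len(m_lines) else None


# First offer: two PC-streams, assumed to carry one PC-track each.
first_offer = ["streamA/track1", "streamB/track1"]

# A candidate gathered for the second m-line of the first offer.
late_candidate = {"m_line_index": 1, "gathered_for": "streamB/track1"}

# Second offer: streamB removed, two tracks added to streamA.
second_offer = ["streamA/track1", "streamA/track2", "streamA/track3"]

resolved = resolve(late_candidate, second_offer)
out_of_sync = resolved != late_candidate["gathered_for"]
print(resolved, out_of_sync)
```

In this model the late candidate now points at a different PC-track than
the one it was gathered for, which is the kind of mismatch asked about
above.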


Needs definition
================
LS - I read it as Lip Sync (as specified in RFC 5888); is that what is meant?
Application - is this the browser, or the web application running in the 
browser, or the combination?
System - not defined.
Device in the msid discussion - seems to mean PC, but I'm not sure

"Requirements" section
======================
The list differs a bit from what was discussed at the Boston interim 
[2]. E.g. "Add/remove one way video flows with minimal chance of glare on 
non legacy apps" is missing. What is the reasoning behind that?

"Approach" section
==================
In bullet 7 it is said that "If a PC-Track appears in more than one 
PC-Stream, then all the PC-Streams with that PC-Track MUST have the same 
CNAME." Technically that cannot happen, since they will be individual 
PC-Track instances (as defined in the PC-Stream constructor). Perhaps 
reformulate to something like "If PC-Tracks representing the same 
source appear in more than one PC-Stream, then all the PC-Streams with 
PC-Tracks representing that source MUST have the same CNAME."
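
The reformulated rule can be expressed as a small check. This is only a
sketch under assumed data shapes: each PC-stream is a dict with a "cname"
and a set of source ids, and the function name and sample values are
invented for the example.

```python
def cname_violations(streams):
    """List sources that appear in PC-streams with differing CNAMEs.

    Each result is (source id, first CNAME seen, conflicting CNAME).
    """
    cname_by_source = {}
    violations = []
    for stream in streams:
        for source in sorted(stream["sources"]):
            seen = cname_by_source.setdefault(source, stream["cname"])
            if seen != stream["cname"]:
                violations.append((source, seen, stream["cname"]))
    return violations


streams = [
    {"id": "s1", "cname": "alice@host", "sources": {"cam0", "mic0"}},
    {"id": "s2", "cname": "alice@host", "sources": {"cam0"}},
    {"id": "s3", "cname": "other@host", "sources": {"mic0"}},
]
problems = cname_violations(streams)
```

Here s1 and s2 satisfy the rule for cam0, while s3 breaks it for mic0.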

I don't understand bullet 13. What is really meant? I think that having 
a way for the browsers to establish an understanding of how many 
video/audio tracks can be supported (coded, transmitted, decoded) 
simultaneously makes a lot of sense. But I see no need for an API where 
the web app can define its wishes - this can be handled as app signaling 
and needs no API.

I don't fully understand how you intend "label" to be used. In the W3C 
space, this is now a readonly attribute that IIUC could give info like 
"USB cam makeX" on the local side, but would be null on the remote side. 
I sense that you intend it more to be used to identify the purpose of a 
PC-track or stream. Possibly, the SDP attribute "label" could be used to 
bind PC-tracks for the same purpose to the same m-line, see more below, 
but that is another "label" to me.

"Open Issues" section
=====================
It is said that

"The overall solution is complicated considerably by the fact that
    WebRTC allows a PC-Track to be used in more than one PC-Stream but
    requires only one copy of the RTP data for the track to be sent."

Again, with the current PC-stream constructor, you can't have the same 
PC-track in two PC-streams - those are individual objects. And I don't 
think we've really discussed the optimization of sending only one RTP 
stream for two PC-tracks representing the same source.

Other comments/questions
========================
In an rtcweb context (in a browser-to-browser scenario), it seems most 
natural that all m-lines are unidirectional by default. The reason is 
that the API design, which deals with sending streams, fits better with 
this model. As an example, say that A starts sending two videos to B, 
and B then sends three videos to A. How should those two and three 
streams share m-lines? I think we could add a possibility for apps that 
want them to share to do this via a constraint in the API if we see a 
need (and perhaps the SDP attribute "label" could be used).
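
The two options in the example can be compared with a minimal sketch.
Directions are given from A's point of view, and the pairing used in the
shared case is an arbitrary assumption, since which streams would share an
m-line is precisely what is unclear. Both function names are invented.

```python
def unidirectional_m_lines(a_sends, b_sends):
    """One m-line per sent video, each used in one direction only."""
    return ["sendonly"] * a_sends + ["recvonly"] * b_sends


def shared_m_lines(a_sends, b_sends):
    """Pair up as many videos as possible on bidirectional m-lines."""
    both = min(a_sends, b_sends)
    return (["sendrecv"] * both
            + ["sendonly"] * (a_sends - both)
            + ["recvonly"] * (b_sends - both))


uni = unidirectional_m_lines(2, 3)
shared = shared_m_lines(2, 3)
print(len(uni), uni)
print(len(shared), shared)
```

The unidirectional model needs five m-lines but requires no decision about
pairing; the shared model needs three but has to pick which of B's videos
ride on the m-lines A already uses.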

One approach to handling simulcast would be to use different PC-streams 
(and possibly even different PeerConnections) for the different 
resolutions. But then synchronization across PC-streams (or even 
PeerConnections) is needed.

Stefan


[1] http://tools.ietf.org/html/draft-jennings-rtcweb-plan-01

[2] http://www.ietf.org/proceedings/interim/2013/02/05/rtcweb/slides/slides-interim-2013-rtcweb-1-10.pdf
