[rtcweb] Comments on draft-uberti-rtcweb-plan-00
Bernard Aboba <bernard_aboba@hotmail.com> Sat, 11 May 2013 23:59 UTC
From: Bernard Aboba <bernard_aboba@hotmail.com>
To: "rtcweb@ietf.org" <rtcweb@ietf.org>
Date: Sat, 11 May 2013 16:59:49 -0700
First, some general comments. To some extent this document is arguing against a straw man (individual m= lines for RTP streams) that in its extreme case (dozens of RTP streams, possibly with simulcast and layered coding) is quite impractical. AFAIK, many (if not the majority) of video implementations have rejected that purist "Plan A" approach. So to some extent, the document is beating a horse that is quite dead (or should be). To characterize that dead horse as "legacy" is a mistake; in fact legacy implementations are more sophisticated than that, and several resemble Plan B in their approach, though not quite enough to avoid potential interoperability problems. More later on how to get better interoperability with legacy and also how to reduce the exchange to a single O/A in common cases.

Section 2.4

2.4. Interworking with legacy devices

When interacting with a legacy application that only knows how to deal with a small number of sources, it must be possible to degrade gracefully to a usable basic experience, where at least a single audio and video source are active on each side, using a typical offer/answer exchange.

[BA] I would suggest that only being able to handle a single legacy stream is too limiting. It seems that it should be possible to handle multiple undeclared SSRCs as long as it is assumed that they inherit the parameters declared for the RTP session they are part of.

2.1

These layouts can change dynamically, depending on the conference content and the preferences of the receiver. As such, there are not well-defined 'roles', that could be used to group sources into specific 'large' or 'thumbnail' categories. As such, the requirement Plan B attempts to satisfy is support for sending and receiving up to hundreds of simultaneous, heterogeneous sources.

[BA] While I agree that the layouts can change dynamically, I am wondering if there is an implication that the burden of determining the 'roles' is on the mixer.
For example, it might be assumed that the mixer allocates an SSRC for the 'large' category, and other SSRCs for the 'thumbnails', and then these SSRCs are statically mapped to MSTs and rendered. However, another way to handle it is for the browser to handle the role assignment, and I would argue that this could make more sense in some cases, particularly since this could make the mixer a lot simpler, or even obviate the need for a mixer entirely (e.g. an RTP translator might work in some cases).

2.6. Simple binding of MediaStreamTrack to SDP

In WebRTC, each media source is identified by a MediaStreamTrack object. In order to ensure that the MSTs created by the sender show up at the receiver, each MST's id attribute needs to be reflected in SDP.

[BA] While I might understand why the MST ID might be useful to expose in the API, I don't see why this needs to be signaled over the wire. In general, since WebRTC is supposed to be "signaling independent", we should try to avoid signaling things that don't need to be signaled. With respect to MST id, a number of approaches seem preferable, including use of an RTP SDES item, or (even better) allowing the receiver to determine which MediaStreamTrack an incoming RTP stream belongs to when the SSRC is first received.

2.7. Support for RTX, FEC, simulcast, layered coding

For robust applications, techniques like RTX and FEC are used to protect media, and simulcast/layered coding can be used to provide support to heterogeneous receivers. It needs to be possible to support these techniques, allow the recipient to optionally use or not use them on a source-by-source basis, and for simulcast/layered scenarios, control which simulcast streams or layers are received.

[BA] In practice, control over simulcast streams or layers is often left with the sender. That is, a sender can determine what simulcast stream or set of layers will go to a particular receiver based on the RTCP feedback.
What the receiver needs to indicate is what it is capable of receiving (e.g. "I can handle up to 4 temporal layers"). While I'm not against allowing a receiver to indicate what simulcast or layered streams it wants to receive, I don't think this is the most important use case. In particular, an assumption of receiver-side control isn't compatible with the sender-side congestion control most commonly used along with simulcast/layered coding.

Section 3.1

By only using a m= line for each media type, as opposed to each media source, this approach reduces the number of transports required to 2 even in complex audio/video cases.

[BA] Wasn't sure if the "2" referred to here was one for audio and one for video, or one for RTP and another for RTCP. To clarify I might say "2 (one for audio and one for video, assuming rtcp mux)".

4.1. Negotiation of new or legacy behavior

In order to know whether a given application supports Plan B, an attribute in the offer is needed. There are various options that could be used for this:

o a=ssrc isn't enough, since you might not have any send streams, and therefore no a=ssrc attributes.
o a=max-*-ssrc could work, but has additional semantics
o a=msid-semantic indicates that you understand MSIDs.

Because understanding MSID is a prerequisite to using plan B, the third option (presence of a=msid-semantic) is recommended.

[BA] I would suggest that max-*-ssrc is a better choice because there are legacy scenarios where msid might not be present.

4.2. New signaling flow

When both sides support Plan B, to properly allow both sides to indicate which MSTs they have, and allow the remote side to select the desired MSTs to receive, a 3-way handshake is needed (this is just math; the offer can't select the answerer's MSTs until they know about them).

[BA] While I understand the argument for why you need two O/As for both sides to select the desired streams, I think that it's possible to design the exchange so that only one O/A is needed most of the time.
The key concept is for the offer to contain information on what the offerer is capable of receiving in addition to what it is capable of sending. Yes, another O/A might be needed if it turns out that the Offerer wants something different than what the Answerer chose, but at least the first Answer is guaranteed to be acceptable to the Offerer. That is, I believe we should think of the second O/A as an optional exchange that hopefully won't be needed much of the time, rather than as part of a "3-way" or "4-way" handshake that will execute every time.

The expected flow for this would be for the caller to send an offer with its sources, then the callee would send back an answer with the sources it wants the caller to send, followed immediately by an offer with the sources that the callee has available to send. Finally, the answerer will reply back with the sources that it wants to request from the callee. The entire sequence can be done in 1.5 RTT.

[BA] Why not add the info on what sources the callee has available to send to the first Answer? If the Offer also contains the maximum number of received SSRCs, the Offerer should prepare to receive that many SSRCs, and the Answer could include up to that many sources as enabled and start sending. That way, if the sources sent are OK with the Offerer then we don't need another Offer/Answer exchange, because the Answerer has indicated what sources it wants from the ones the Offerer said it could send. This assumes that the Offerer can handle incoming RTP streams up to the maximum number of receive SSRCs before it receives the Answer which can explicitly declare the SSRCs.

In addition, since the sources are known ahead of time by the recipient of said sources, it is prepared to demux them by SSRC without any signaling/media race.

[BA] If you don't make this a hard requirement, it seems like you could get by with a single O/A exchange much of the time.

4.3. Legacy signaling flow

In the legacy case, Plan B degrades gracefully back to a single offer-answer sequence. Since there's no brokering of which sources should be sent, the "new" endpoint picks a default media source for each m= line, and upon receiving an answer indicating lack of support for Plan B, it sends just the default sources to the legacy endpoint. When receiving media from the legacy endpoint, the new endpoint creates a "default" MediaStream (containing a single MediaStreamTrack) for each m= line, just as when talking to any other legacy endpoint, as specified in the MSID draft.

[BA] I'd claim that not only the "legacy" case but other cases as well can be handled with a single O/A if you are prepared to handle more than one undeclared SSRC. Seems like a win to me.
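
To make the "undeclared SSRC" idea concrete, here is a rough sketch of receive-side handling under the assumptions discussed above: the receiver advertised a maximum number of receive SSRCs, and any SSRC that shows up before it is declared inherits the parameters of the RTP session it arrives on. This is only an illustration, not anything taken from the draft or from an existing implementation; the RtpSession class, the max_recv_ssrc field, and the on_rtp and declare methods are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RtpSession:
    """One m= line's RTP session (hypothetical model, not a real API)."""
    media: str                  # "audio" or "video"
    payload_types: frozenset    # payload types negotiated for the session
    max_recv_ssrc: int          # assumed cap advertised by the receiver
    streams: dict = field(default_factory=dict)  # ssrc -> stream state

    def on_rtp(self, ssrc: int, payload_type: int) -> bool:
        """Return True if the packet is accepted, False if it is dropped."""
        if payload_type not in self.payload_types:
            return False        # not negotiated for this session
        if ssrc not in self.streams:
            if len(self.streams) >= self.max_recv_ssrc:
                return False    # over the advertised receive limit
            # Undeclared SSRC: it inherits the session's parameters.
            self.streams[ssrc] = {"declared": False, "packets": 0}
        self.streams[ssrc]["packets"] += 1
        return True

    def declare(self, ssrc: int) -> None:
        """Mark an SSRC as declared once signaling (e.g. the Answer) names it."""
        state = self.streams.setdefault(ssrc, {"declared": True, "packets": 0})
        state["declared"] = True


# Usage: media may arrive before the Answer that declares the SSRCs.
video = RtpSession("video", frozenset({96}), max_recv_ssrc=2)
video.on_rtp(0x1111, 96)   # accepted although not yet declared
video.declare(0x1111)      # the Answer arrives later and names the SSRC
```

In this sketch the receiver never has to wait for signaling before it can demux, which is what would let a single O/A suffice in the common case; note that declare() does not itself enforce the cap, which a real implementation would need to decide how to handle.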