[rtcweb] Comments on draft-uberti-rtcweb-plan-00

Bernard Aboba <bernard_aboba@hotmail.com> Sat, 11 May 2013 23:59 UTC

From: Bernard Aboba <bernard_aboba@hotmail.com>
To: "rtcweb@ietf.org" <rtcweb@ietf.org>
Date: Sat, 11 May 2013 16:59:49 -0700

First some general comments: 
To some extent this document is arguing against a straw man (individual m= lines for RTP streams) that in its extreme case (dozens of RTP streams, possibly with simulcast and layered coding) is quite impractical. AFAIK, many (if not the majority) of video implementations have rejected that purist "Plan A" approach. So to some extent, the document is beating a horse that is quite dead (or should be). To characterize that dead horse as "legacy" is a mistake: in fact, legacy implementations are more sophisticated than that, and several resemble Plan B in their approach, though not quite enough to avoid potential interoperability problems. More later on how to get better interoperability with legacy and also how to reduce the exchange to a single O/A in common cases.
Section 2.4
2.4.  Interworking with legacy devices

   When interacting with a legacy application that only knows how to
   deal with a small number of sources, it must be possible to degrade
   gracefully to a usable basic experience, where at least a single
   audio and video source are active on each side, using a typical offer
   /answer exchange.
[BA] I would suggest that only being able to handle a single legacy stream is too limiting. It seems that it should be possible to handle multiple undeclared SSRCs as long as it is assumed that they inherit the parameters declared for the RTP session they are part of.
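
To make the suggestion concrete, here is a minimal sketch of a receiver that accepts undeclared SSRCs and lets them inherit the parameters of the RTP session (m= line) they arrive on. The class name, the max_sources cap, and the dictionary layout are hypothetical and not taken from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class RtpSession:
    """Hypothetical receive-side state for one m= line (one RTP session)."""
    payload_types: dict   # payload type -> codec name, as negotiated in SDP
    max_sources: int      # assumed cap on concurrently tracked SSRCs
    sources: dict = field(default_factory=dict)  # SSRC -> per-source state

    def on_packet(self, ssrc: int, payload_type: int) -> bool:
        """Return True if the packet is accepted, False if it is dropped."""
        if payload_type not in self.payload_types:
            return False  # payload type was never negotiated for this session
        if ssrc not in self.sources:
            if len(self.sources) >= self.max_sources:
                return False  # over the cap, so ignore the new source
            # Undeclared SSRC: inherit the session-level parameters.
            self.sources[ssrc] = {
                "codec": self.payload_types[payload_type],
                "declared": False,
            }
        return True
```

With a session created as RtpSession({96: "VP8"}, max_sources=2), the first two new SSRCs using payload type 96 are accepted and a third is dropped.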
Section 2.1
   These layouts can change dynamically, depending on the conference
   content and the preferences of the receiver.  As such, there are not
   well-defined 'roles', that could be used to group sources into
   specific 'large' or 'thumbnail' categories.  As such, the requirement
   Plan B attempts to satisfy is support for sending and receiving up to
   hundreds of simultaneous, hetereogeneous sources.

[BA] While I agree that the layouts can change dynamically, I am wondering if there is an implication that the burden of determining the 'roles' is on the mixer.  For example, it might be assumed that the mixer allocates an SSRC for the 'large' category, and other SSRCs for the 'thumbnails' and then these SSRCs are statically mapped to MSTs and rendered.  However, another way to handle it is for the browser to handle the role assignment, and I would argue that this could make more sense in some cases, particularly since this could make the mixer a lot simpler, or even obviate the need for a mixer entirely (e.g. an RTP translator might work in some cases).  






2.6.  Simple binding of MediaStreamTrack to SDP

   In WebRTC, each media source is identified by a MediaStreamTrack
   object.  In order to ensure that the MSTs created by the sender show
   up at the receiver, each MST's id attribute needs to be reflected in
   SDP.
[BA] While I might understand why the MST ID might be useful to expose in the API, I don't see why this needs to be signaled over the wire. In general, since WebRTC is supposed to be "signaling independent", we should try to avoid signaling things that don't need to be signaled. With respect to MST id, a number of approaches seem preferable, including use of an RTP SDES item, or (even better) allowing the receiver to determine which MediaStreamTrack an incoming RTP stream belongs to when the SSRC is first received.
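
As a rough illustration of receiver-side binding, the sketch below assigns a track id the first time an SSRC is seen, preferring an id carried in-band if one is available. The TrackBinder class, the sdes_track_id parameter, and the "track-N" naming are hypothetical; no such SDES item is defined here.

```python
import itertools


class TrackBinder:
    """Hypothetical receiver-side mapping from SSRC to a local track id."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.track_for_ssrc = {}

    def bind(self, ssrc, sdes_track_id=None):
        """Return the track id for ssrc, creating one on first sight."""
        if ssrc not in self.track_for_ssrc:
            # Prefer an id carried in-band (assumed SDES item);
            # otherwise generate a purely local id.
            self.track_for_ssrc[ssrc] = (
                sdes_track_id or f"track-{next(self._counter)}"
            )
        return self.track_for_ssrc[ssrc]
```

Nothing in this path depends on SDP: the binding is created when media arrives, which is the property argued for above.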

2.7.  Support for RTX, FEC, simulcast, layered coding

   For robust applications, techniques like RTX and FEC are used to
   protect media, and simulcast/layered coding can be used to provide
   support to hetereogeneous receivers.  It needs to be possible to
   support these techniques, allow the receipient to optionally use or
   not use them on a source-by-source basis, and for simulcast/layered
   scenarios, control which simulcast streams or layers are received.
[BA] In practice, control over simulcast streams or layers is often left with the sender. That is, a sender can determine what simulcast stream or set of layers will go to a particular receiver based on the RTCP feedback. What the receiver needs to indicate is what it is capable of receiving (e.g. "I can handle up to 4 temporal layers"). While I'm not against allowing a receiver to indicate what simulcast or layered streams it wants to receive, I don't think this is the most important use case. In particular, assuming receiver-side control isn't compatible with the sender-side congestion control most commonly used along with simulcast/layered coding.
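
A minimal sketch of sender-side layer selection, assuming the receiver only declares a maximum number of temporal layers and the sender derives a bandwidth estimate from RTCP feedback. The function name, parameters, and bitrate figures are illustrative assumptions.

```python
def select_temporal_layers(layer_bitrates_bps, receiver_max_layers, estimated_bps):
    """Pick how many temporal layers the sender forwards to one receiver.

    layer_bitrates_bps: cumulative bitrate needed for layers 1..n, ascending.
    receiver_max_layers: capability declared by the receiver.
    estimated_bps: sender's bandwidth estimate (e.g. derived from RTCP).
    """
    # Never exceed what the receiver said it can handle.
    allowed = layer_bitrates_bps[:receiver_max_layers]
    count = 0
    for cumulative in allowed:
        if cumulative <= estimated_bps:
            count += 1
        else:
            break
    # Always send at least the base layer.
    return max(count, 1)
```

For example, with cumulative bitrates of 150, 300, 600 and 1200 kbit/s, a receiver that can handle 4 layers and an estimate of 700 kbit/s would get 3 layers; the receiver never has to name specific layers.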
Section 3.1
   By only using a m= line for each media type, as opposed to each media
   source, this approach reduces the number of transports required to 2
   even in complex audio/video cases. 
[BA] Wasn't sure if the "2" referred to here was one for audio and one for video, or one for RTP and another for RTCP. To clarify I might say "2 (one for audio and one for video, assuming rtcp mux)".
4.1.  Negotiation of new or legacy behavior
   In order to know whether a given application supports Plan B, an
   attribute in the offer is needed.  There are various options that
   could be used for this:

   o  a=ssrc isn't enough, since you might not have any send streams,
      and therefore no a=ssrc attributes.

   o  a=max-*-ssrc could work, but has additional semantics

   o  a=msid-semantic indicates that you understand MSIDs.

   Because understanding MSID is a prerequisite to using plan B, the
   third option (presence of a=msid-semantic) is recommended.


[BA]  I would suggest that max-*-ssrc is a better choice because there are legacy scenarios where msid might not be present.
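
To show what such a check could look like, the sketch below scans an SDP blob for either indicator. It matches any attribute of the form a=max-<word>-ssrc rather than assuming exact attribute names, and the function name and result keys are hypothetical.

```python
import re

# Matches "a=max-<word>-ssrc" at the start of a line (exact names are not assumed).
MAX_SSRC_RE = re.compile(r"^a=max-[a-z]+-ssrc\b", re.MULTILINE)
MSID_SEMANTIC_RE = re.compile(r"^a=msid-semantic\b", re.MULTILINE)


def plan_b_indicators(sdp: str) -> dict:
    """Report which candidate capability attributes appear in the SDP."""
    sdp = sdp.replace("\r\n", "\n")  # normalize SDP line endings
    return {
        "max_ssrc": bool(MAX_SSRC_RE.search(sdp)),
        "msid_semantic": bool(MSID_SEMANTIC_RE.search(sdp)),
    }
```

An endpoint that keys off max_ssrc would still recognize an offer that carries no msid-semantic attribute, which is the legacy scenario mentioned above.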
4.2.  New signaling flow

   When both sides support Plan B, to properly allow both sides to
   indicate which MSTs they have, and allow the remote side to select
   the desired MSTs to receive, a 3-way handshake is needed (this is
   just math; the offer can't select the answerer's MSTs until they know
   about them).  
[BA] While I understand the argument for why you need two O/As for both sides to select the desired streams, I think that it's possible to design the exchange so that only one O/A is needed most of the time. The key concept is for the offer to contain information on what the offerer is capable of receiving in addition to what it is capable of sending. Yes, another O/A might be needed if it turns out that the Offerer wants something different than what the Answerer chose, but at least the first Answer is guaranteed to be acceptable to the Offerer.
That is, I believe we should think of the second O/A as an optional exchange that hopefully won't be needed much of the time, rather than as part of a "3-way" or "4-way" handshake that will execute every time.
   The expected flow for this would be for the caller to
   send an offer with its sources, then the callee would send back an
   answer with the sources it wants the caller to send, followed
   immediately by an offer with the sources that the callee has
   available to send.  Finally, the answerer will reply back with the
   sources that it wants to request from the callee.  The entire
   sequence can be done in 1.5 RTT.
[BA] Why not add the info on what sources the callee has available to send to the first Answer? If the Offer also contains the maximum number of received SSRCs, the Offerer should prepare to receive that many SSRCs, and the Answer could include up to that many sources as enabled, so the Answerer could start sending. That way, if the sources sent are OK with the Offerer then we don't need another Offer/Answer exchange, because the Answerer has indicated what sources it wants from the ones the Offerer said it could send.
This assumes that the Offerer can handle incoming RTP streams up to the maximum number of receive SSRCs before it receives the Answer, which can explicitly declare the SSRCs.
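
The single-exchange idea can be sketched as follows, modelling the Offer and Answer as plain dictionaries. All field names (send_sources, max_recv_ssrc, requested_from_offerer) and both helper functions are hypothetical and stand in for whatever SDP encoding is chosen.

```python
def build_answer(offer, answerer_sources, answerer_wants, answerer_max_recv):
    """Build an Answer that both selects and declares sources in one step."""
    # Sources the Answerer asks the Offerer to send, capped by what the
    # Answerer itself can receive.
    requested = [s for s in offer["send_sources"] if s in answerer_wants]
    requested = requested[:answerer_max_recv]
    # Sources the Answerer enables right away, capped by the receive limit
    # the Offerer declared in its Offer.
    enabled = answerer_sources[: offer["max_recv_ssrc"]]
    return {"requested_from_offerer": requested, "send_sources": enabled}


def needs_second_exchange(answer, offerer_preferred=None):
    """A second O/A is needed only if the Offerer wants a different set."""
    if offerer_preferred is None:
        return False  # whatever the Answerer enabled is acceptable
    return set(offerer_preferred) != set(answer["send_sources"])
```

In this model the first Answer never exceeds the Offerer's declared receive limit, so it is always acceptable, and the second exchange only happens when the Offerer prefers a different selection.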
   In addition, since the
   sources are known ahead of time by the recipient of said sources, it
   is prepared to demux them by SSRC without any signaling/media race.

[BA] If you don't make this a hard requirement, it seems like you could get by with a single O/A exchange much of the time. 
4.3.  Legacy signaling flow

   In the legacy case, Plan B degrades gracefully back to a single
   offer-answer sequence.  Since there's no brokering of which sources
   should be sent, the "new" endpoint picks a default media source for
   each m= line, and upon receiving an answer indicating lack of support
   for Plan B, it sends just the default sources to the legacy endpoint.
   When receiving media from the legacy endpoint, the new endpoint
   creates a "default" MediaStream (containing a single
   MediaStreamTrack) for each m= line, just as when talking to any other
   legacy endpoint, as specified in the MSID draft.
[BA] I'd claim that not only the "legacy" case but other cases as well can be handled with a single O/A if you are prepared to handle more than one undeclared SSRC. Seems like a win to me.